CBDP

A straggler slows everything down: the part that cannot be parallelized bounds the overall speedup.

Overhead for parallelization
Synchronization is required
Load imbalances
Amdahl’s Law
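
The last item can be made concrete with a small calculation. This is a minimal sketch of Amdahl's Law; the function name `amdahl_speedup` and the 90% parallel fraction are illustrative assumptions, not part of the original notes.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law for a job whose
    parallel_fraction (0..1) can be spread over `workers`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# Illustrative values: 90% of the job is parallelizable.
for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With a 10% serial part, the speedup approaches 10 but never exceeds it, no matter how many workers are added.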

Popular Content
Poor hash functions

Better than linear scalability: each added server supports more than one additional user

Per resource added, exactly one additional user can use the service

Per resource added, fewer than one additional user can use the service

Too expensive to make everything fully redundant

Replication
Reconsumption

Replicate the data and the service, and run the job again.

Remember the data lineage for a job and re-run the failed task. Easier for stateless tasks.

May introduce data consistency problems (e.g. data consumed, new data pushed)

Choosing the location of the DC
- e.g. Iceland
Raising the temperature of the aisles
Reducing energy conversion (e.g. working directly at 12V)
Reusing dissipated heat

1.15-1.18

Power
Cooling
Shelter
Security

Inverse of data center efficiency
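
As a sketch of the relationship: PUE is total facility energy divided by IT equipment energy, and data center infrastructure efficiency (DCiE) is its inverse. The function names and the meter readings below are hypothetical, chosen only for illustration.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    return total_facility_kwh / it_equipment_kwh

def dcie(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Data center infrastructure efficiency, the inverse of PUE."""
    return it_equipment_kwh / total_facility_kwh

# Hypothetical meter readings, not real measurements.
total_kwh, it_kwh = 1150.0, 1000.0
print(pue(total_kwh, it_kwh), dcie(total_kwh, it_kwh))
```

A PUE of 1.0 would mean every unit of energy reaches the IT equipment; everything above that is overhead such as cooling and power conversion.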

A high number of read-only operations

Get a consistent snapshot (read: backup snapshot) of the leader’s database (already saved somewhere)
Copy the snapshot to the follower
Since the leader has likely received updates in the meantime, get the leader’s replication log
Apply the log. Once it is processed, the follower has caught up
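
The steps above can be sketched with a toy in-memory store. The `Leader` and `Follower` classes, their method names, and the idea of tagging a snapshot with a log position are assumptions made for this illustration, not the API of any real database.

```python
import copy

class Leader:
    """Toy leader: a key-value store plus an append-only replication log."""
    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value

    def snapshot(self):
        # Consistent snapshot plus the log position it reflects.
        return copy.deepcopy(self.data), len(self.log)

    def log_since(self, position):
        return self.log[position:]

class Follower:
    def __init__(self):
        self.data = {}
        self.position = 0

    def load_snapshot(self, data, position):
        self.data, self.position = data, position

    def apply(self, entries):
        for key, value in entries:
            self.data[key] = value
            self.position += 1

leader = Leader()
leader.write("a", 1)
snap, pos = leader.snapshot()          # take the snapshot
follower = Follower()
follower.load_snapshot(snap, pos)      # copy it to the follower
leader.write("b", 2)                   # leader keeps accepting writes
follower.apply(leader.log_since(follower.position))  # fetch and apply the log
print(follower.data == leader.data)
```

The same `log_since` call would also serve a follower that reconnects after a crash, since it only needs the entries after its last applied position.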

Catch Up Recovery

Failover

synchronous or asynchronous

Single-leader replication
Multi-leader replication
Leaderless replication

Scalability
Less contention
Improve performance
Optimize storage costs
Improve Security

A shard is skewed when the partitioning is unfair, that is, one partition holds more data or receives more queries than the others.

Disproportionally high load

horizontal
vertical
functional

key range
hash-based
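
A minimal sketch of the two approaches, assuming string keys. The range boundaries, the function names, and the choice of MD5 as the hash are illustrative assumptions. A stable hash is used because Python's built-in `hash()` for strings is randomized per process.

```python
import bisect
import hashlib

# Hypothetical boundaries: keys below "g" go to partition 0,
# "g" up to (not including) "n" to partition 1, and so on.
RANGE_BOUNDS = ["g", "n", "t"]

def key_range_partition(key: str) -> int:
    """Key range: adjacent keys land in the same partition."""
    return bisect.bisect_right(RANGE_BOUNDS, key)

def hash_partition(key: str, num_partitions: int) -> int:
    """Hash-based: keys are spread by a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

for k in ("apple", "melon", "zebra"):
    print(k, key_range_partition(k), hash_partition(k, 4))
```

Key ranges keep range scans cheap but can create hot spots when neighboring keys are popular; hashing spreads load more evenly at the cost of losing key order.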

Network Behavior
Node Behavior
Timing Behavior

Vector
Lamport
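
Both clock types can be sketched in a few lines. The class and function names are hypothetical, and vector clocks are represented here as plain lists with one entry per node.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, message_time: int) -> int:
        """On receipt, move past the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time

def vector_merge(local, received, node_index):
    """Vector clock receive: element-wise max, then increment own entry."""
    merged = [max(a, b) for a, b in zip(local, received)]
    merged[node_index] += 1
    return merged

def happened_before(v1, v2) -> bool:
    """True if v1 is less than or equal to v2 in every entry and differs in one."""
    return all(a <= b for a, b in zip(v1, v2)) and v1 != v2

a, b = LamportClock(), LamportClock()
t_send = a.tick()            # node A sends a message
t_recv = b.receive(t_send)   # node B receives it
print(t_send, t_recv)
```

Lamport timestamps give an order consistent with causality, but only vector clocks can tell that two events are concurrent: neither vector happened before the other.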

Cannot update state immediately, must wait for delivery
Need fault tolerant total order broadcast

Empty AppendEntries RPCs

100-500ms

In consensus, one or more nodes propose values, and exactly one of the proposed values is chosen. Node crashes are tolerated.

In atomic commit, every node votes whether to commit or abort. A commit requires all nodes to vote commit, whilst a single abort vote is enough to abort. A crashed node causes an abort.
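
The voting rule for atomic commit can be sketched as a single decision function. The name `atomic_commit_decision` and the encoding of a crashed or silent node as `None` are assumptions made for this example.

```python
from typing import Optional, Sequence

def atomic_commit_decision(votes: Sequence[Optional[bool]]) -> str:
    """votes: True = commit, False = abort, None = no vote (crash or timeout).

    Commit only if there is at least one participant and every one
    of them voted to commit; anything else aborts.
    """
    if votes and all(v is True for v in votes):
        return "commit"
    return "abort"

print(atomic_commit_decision([True, True, True]))
print(atomic_commit_decision([True, False, True]))
print(atomic_commit_decision([True, None, True]))
```

Only the first call commits. This is the contrast with consensus, which can still decide while some nodes are down.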

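Linearizability is a recency guarantee on reads and writes of a single object and does not group operations into transactions. Serializability is an isolation property of transactions, each of which may touch several objects: transactions can be executed in parallel, as long as the result is the same as some serial order.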

Yes, the combination is known as strict serializability

Operation-based CRDT

Commutative: s1 U s2 = s2 U s1
Associative: (s1 U s2) U s3 = s1 U (s2 U s3)
Idempotent: s1 U s1 = s1
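
The three properties can be checked on two simple state-based CRDTs. This is a sketch with hypothetical function names: a grow-only set merged by union and a grow-only counter merged by per-node maximum.

```python
def merge_gset(s1: frozenset, s2: frozenset) -> frozenset:
    """Grow-only set: merge is set union."""
    return s1 | s2

def merge_gcounter(c1: dict, c2: dict) -> dict:
    """Grow-only counter: per-node maximum of the two states."""
    return {n: max(c1.get(n, 0), c2.get(n, 0)) for n in c1.keys() | c2.keys()}

s1, s2, s3 = frozenset("ab"), frozenset("bc"), frozenset("cd")
assert merge_gset(s1, s2) == merge_gset(s2, s1)                  # commutative
assert merge_gset(merge_gset(s1, s2), s3) == merge_gset(s1, merge_gset(s2, s3))  # associative
assert merge_gset(s1, s1) == s1                                  # idempotent

c1, c2 = {"n1": 3, "n2": 1}, {"n2": 4, "n3": 2}
assert merge_gcounter(c1, c2) == merge_gcounter(c2, c1)
assert merge_gcounter(c1, c1) == c1
print(sorted(merge_gcounter(c1, c2).items()))
```

Because the merge has these properties, replicas can exchange states in any order, any number of times, and still converge to the same value.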

Question: What is the consequence of Amdahl’s law on parallel systems?

Question: Why is linear scalability hard to achieve (4)?

Question: How can load imbalances be created? (2)

Question: What is super linear scalability?

Question: What is linear scalability?

Question: What is sub linear scalability?

Question: Why is full redundancy not used?

Question: What are the two Failure Recovery Methods?

Question: Failure Recovery: Replication

Question: Failure Recovery: Re-Consumption

Question: Failure Recovery: Replication Problem

Question: You can improve the PUE by…

Question: Trivia: 2018 PUE levels

Question: What do data centers (DCs) provide? (4)

Question: PUE is the…

Question: When should you use replication?

Question: How to Enroll Followers

Question: How to handle Follower Failure

Question: How to handle Leader Failure

Question: Replication can be…

Question: Three main approaches to replication are…

Question: Benefits of Partitioning…

Question: What is a skewed shard?

Question: What is a hot spot shard?

Question: Three Main Types of Partitioning?

Question: Two Main Approaches of Horizontal Partitioning?

Question: Three System Models

Question: Two types of Logical Clocks

Question: Limitations of State Machine Replication

Question: What do Raft heartbeats contain?

Question: Election Timeout Typical Duration

Question: Difference between Atomic Commit and Consensus

Question: Linearizability vs Serializability

Question: Can you combine Linearizability and Serializability?

Question: Which type of CRDT requires broadcast?

Question: Merge Operation of Two CRDT States must be…