Ideally, adding N servers allows the service to support proportionally more users (linear scalability).
Linear scalability is hard to achieve because…
Independent parallel processing of tasks / sub-requests ==> additional servers can be added for further concurrency
Must be able to handle and recover from both software and hardware failures. Services and data must be replicated for redundancy.
Failure Recovery Methods
Ideally the service has 100% 24/7 uptime; in practice, availability guarantees are expressed in "nines":
Downtime Guarantee | Max Downtime per Year | Estimated Costs |
---|---|---|
99.9% | 8.77 h | ~$3,000,000 |
99.99% | 52.6 min | ~$300,000 |
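The downtime figures above follow directly from the availability percentage; a minimal sketch (assuming a 365.25-day year):

```python
def max_downtime_per_year(availability_pct):
    """Allowed downtime per year, in hours, for a given availability percentage."""
    HOURS_PER_YEAR = 365.25 * 24  # 8766 h, averaging over leap years
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

print(round(max_downtime_per_year(99.9), 2))        # hours for "three nines"
print(round(max_downtime_per_year(99.99) * 60, 1))  # minutes for "four nines"
```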
Data stored/produced by the services must be consistent
CAP theorem: Consistency, Availability, Partition tolerance
Strong consistency comes at the cost of additional latency.
Weaker consistency gives better performance, but applications are harder to write.
Gmail’s email sending is strongly consistent, but marking a message as read is only eventually consistent.
Predictable low latency processing with high throughput
Tail latency: the slowest 0.X% of the request latency distribution
Remember that overall latency >= latency of slowest component
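Tail latency can be read off the measured latency distribution; a minimal nearest-rank percentile sketch (the function name and the synthetic samples are illustrative, not from the notes):

```python
def percentile(latencies, p):
    """Nearest-rank percentile: latency below which ~p% of requests complete."""
    s = sorted(latencies)
    rank = max(1, round(p / 100 * len(s)))  # 1-based nearest rank
    return s[rank - 1]

samples = list(range(1, 101))  # 100 synthetic latencies: 1..100 ms
print(percentile(samples, 50), percentile(samples, 99))  # median vs. tail
```

Comparing the median with the 99th percentile is what reveals stragglers hiding in the tail.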
In a distributed system, failures happen all the time. Design the application to be self-healing
Build redundancy into your application to avoid having single points of failure.
Minimize coordination between application services to achieve better scalability
Design your application so that it can scale horizontally, adding or removing new instances on demand.
Use partitioning to work around database, network and compute limits.
Scaling stateless services is trivial; state is what makes scaling hard
Latency is king. Caching significantly reduces a job’s latency
Pick the storage technology that is the best fit for your data and how it will be used.
The Partition/Aggregate compute pattern scales well
An evolutionary design is key for continuous innovation
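The partition/aggregate pattern mentioned above can be sketched as fan-out over partitions followed by a merge step (the function names and partition count are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def partition_aggregate(items, worker, num_partitions=4):
    """Partition the input, process partitions concurrently, aggregate results."""
    chunks = [items[i::num_partitions] for i in range(num_partitions)]  # partition
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = pool.map(lambda chunk: sum(map(worker, chunk)), chunks)
    return sum(partials)  # aggregate

print(partition_aggregate(list(range(100)), lambda x: x * x))  # sum of squares
```

Because each partition is processed independently, adding workers adds concurrency, which is exactly the property the notes identify as a prerequisite for linear scalability.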
In general terms, Amdahl’s Law states that in parallelization, if P is the proportion of a system or program that can be made parallel, and 1-P is the proportion that remains serial, then the maximum speedup S(N) that can be achieved using N processors is:
S(N)=1/((1-P)+(P/N))
As N grows the speedup tends to 1/(1-P).
Speedup is limited by the total time needed for the sequential (serial) part of the program. For 10 hours of computing, if 9 hours can be parallelized and 1 hour cannot, then the maximum speedup is limited to 10 times as fast. No matter how many processors we add, the serial hour caps the speedup at 10×.
http://www.umsl.edu/~siegelj/CS4740_5740/Overview/Amdahl's.html
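The formula above, transcribed directly into code:

```python
def amdahl_speedup(p, n):
    """Maximum speedup S(N) for parallel fraction p on n processors."""
    return 1 / ((1 - p) + p / n)

print(amdahl_speedup(0.9, 1))      # one processor: no speedup
print(amdahl_speedup(0.9, 10))
print(amdahl_speedup(0.9, 10**9))  # tends to 1/(1-p) = 10
```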
A straggler slows everything down
Superlinear scalability: per server added, more than one extra user can use the service
Linear scalability: per resource added, exactly one extra user can additionally use the service
Sublinear scalability: per resource added, less than one extra user can additionally use the service
Too expensive to make everything fully redundant
Replication: replicate the data and the service, and have the job run again on a replica.
Lineage: remember the data lineage for a job and re-run the failed task; this is easier for stateless tasks.
Re-running may introduce data consistency problems (e.g. data already consumed, new data pushed)
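Lineage-based recovery can be sketched as follows (the lineage store, task function, and retry count are hypothetical, chosen only to illustrate the idea):

```python
# Hypothetical lineage store: task id -> the inputs that produce its output.
LINEAGE = {"aggregate_1": ["chunk_a", "chunk_b"]}

def run_task(inputs):
    """Stateless, deterministic computation over its inputs (sketch)."""
    return sorted(inputs)

def rerun_from_lineage(task_id, attempts=3):
    """On failure, recompute a task's output from its recorded lineage."""
    last_error = None
    for _ in range(attempts):
        try:
            return run_task(LINEAGE[task_id])  # stateless: safe to re-execute
        except Exception as e:
            last_error = e  # transient failure: retry from the same lineage
    raise RuntimeError(f"task {task_id} failed after {attempts} attempts") from last_error

print(rerun_from_lineage("aggregate_1"))
```

Because the task is stateless and deterministic, re-running it from the same lineage yields the same output, which is what sidesteps the consistency problems that re-running stateful work can cause.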
Service Level Agreement (SLA)