Ideally, adding N servers allows the service to support proportionally more users (linear scalability).
Linear scalability is hard to achieve because…
Independent parallel processing of tasks / sub-requests ==> additional servers can be added for further concurrency
Must be able to handle and recover from both software and hardware failures. Services and data must be replicated for redundancy.
Failure Recovery Methods
Ideally the service has 100% 24/7 uptime; in practice, availability guarantees are expressed in "nines":
Downtime Guarantee | Max Downtime per Year | Estimated Costs |
---|---|---|
99.9% | 8.77 h | ~$3,000,000 |
99.99% | 52.6 min | ~$300,000 |
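The downtime figures above follow directly from the availability percentage; a minimal sketch (assuming a 365.25-day year):

```python
def max_downtime_per_year(availability_pct):
    """Allowed downtime per year, in hours, for a given availability percentage."""
    HOURS_PER_YEAR = 365.25 * 24  # 8766 h, averaging over leap years
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

print(round(max_downtime_per_year(99.9), 2))        # hours for "three nines"
print(round(max_downtime_per_year(99.99) * 60, 1))  # minutes for "four nines"
```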
Data stored/produced by the services must be consistent
CAP theorem: Consistency, Availability, Partition tolerance
Strong consistency comes at the cost of additional latency.
Weaker consistency gives better performance, but applications are harder to write.
Gmail’s email sending is strongly consistent, but marking a message as read is only eventually consistent.
Predictable low latency processing with high throughput
Tail latency: the slowest 0.X% of the request latency distribution
Remember that overall latency >= latency of slowest component
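Tail latency can be read off the measured latency distribution; a minimal nearest-rank percentile sketch (the function name and the synthetic samples are illustrative, not from the notes):

```python
def percentile(latencies, p):
    """Nearest-rank percentile: latency below which ~p% of requests complete."""
    s = sorted(latencies)
    rank = max(1, round(p / 100 * len(s)))  # 1-based nearest rank
    return s[rank - 1]

samples = list(range(1, 101))  # 100 synthetic latencies: 1..100 ms
print(percentile(samples, 50), percentile(samples, 99))  # median vs. tail
```

Comparing the median with the 99th percentile is what reveals stragglers hiding in the tail.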
In a distributed system, failures happen all the time. Design the application to be self-healing
Build redundancy into your application to avoid having single points of failure.
Minimize coordination between application services to achieve better scalability
Design your application so that it can scale horizontally, adding or removing new instances on demand.
Use partitioning to work around database, network and compute limits.
Scaling stateless services is trivial; state is what makes scaling hard
Latency is king. Caching significantly reduces a job’s latency
Pick the storage technology that is the best fit for your data and how it will be used.
The Partition/Aggregate compute pattern scales well
An evolutionary design is key for continuous innovation
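The partition/aggregate pattern mentioned above can be sketched as fan-out over partitions followed by a merge step (the function names and partition count are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def partition_aggregate(items, worker, num_partitions=4):
    """Partition the input, process partitions concurrently, aggregate results."""
    chunks = [items[i::num_partitions] for i in range(num_partitions)]  # partition
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = pool.map(lambda chunk: sum(map(worker, chunk)), chunks)
    return sum(partials)  # aggregate

print(partition_aggregate(list(range(100)), lambda x: x * x))  # sum of squares
```

Because each partition is processed independently, adding workers adds concurrency, which is exactly the property the notes identify as a prerequisite for linear scalability.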
In general terms, Amdahl’s Law states that in parallelization, if P is the proportion of a system or program that can be made parallel, and 1-P is the proportion that remains serial, then the maximum speedup S(N) that can be achieved using N processors is:
S(N)=1/((1-P)+(P/N))
As N grows the speedup tends to 1/(1-P).
Speedup is limited by the total time needed for the sequential (serial) part of the program. For 10 hours of computing, if 9 hours can be parallelized and 1 hour cannot, then the maximum speedup is limited to 10 times as fast. No matter how many processors we add, the serial hour caps the speedup at 10×.
http://www.umsl.edu/~siegelj/CS4740_5740/Overview/Amdahl's.html
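The formula above, transcribed directly into code:

```python
def amdahl_speedup(p, n):
    """Maximum speedup S(N) for parallel fraction p on n processors."""
    return 1 / ((1 - p) + p / n)

print(amdahl_speedup(0.9, 1))      # one processor: no speedup
print(amdahl_speedup(0.9, 10))
print(amdahl_speedup(0.9, 10**9))  # tends to 1/(1-p) = 10
```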
A straggler slows everything down
Superlinear scalability: per server added, more than one extra user can use the service
Linear scalability: per resource added, exactly one extra user can additionally use the service
Sublinear scalability: per resource added, less than one extra user can additionally use the service
Too expensive to make everything fully redundant
Replication: replicate the data and the service, and have the job run again on a replica.
Lineage: remember the data lineage for a job and re-run the failed task; this is easier for stateless tasks.
Re-running may introduce data consistency problems (e.g. data already consumed, new data pushed)
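Lineage-based recovery can be sketched as follows (the lineage store, task function, and retry count are hypothetical, chosen only to illustrate the idea):

```python
# Hypothetical lineage store: task id -> the inputs that produce its output.
LINEAGE = {"aggregate_1": ["chunk_a", "chunk_b"]}

def run_task(inputs):
    """Stateless, deterministic computation over its inputs (sketch)."""
    return sorted(inputs)

def rerun_from_lineage(task_id, attempts=3):
    """On failure, recompute a task's output from its recorded lineage."""
    last_error = None
    for _ in range(attempts):
        try:
            return run_task(LINEAGE[task_id])  # stateless: safe to re-execute
        except Exception as e:
            last_error = e  # transient failure: retry from the same lineage
    raise RuntimeError(f"task {task_id} failed after {attempts} attempts") from last_error

print(rerun_from_lineage("aggregate_1"))
```

Because the task is stateless and deterministic, re-running it from the same lineage yields the same output, which is what sidesteps the consistency problems that re-running stateful work can cause.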
Service Level Agreement (SLA)