Original paper: http://csl.stanford.edu/~christos/publications/2014.mutilate.eurosys.pdf
In this paper, we analyze the challenges of maintaining high QoS for low-latency workloads when sharing servers with other workloads.
The co-located workloads can interfere on shared resources such as processing cores, cache space, and memory or I/O bandwidth.
The goal of this work is to investigate whether workload colocation and good quality of service for latency-critical services are fundamentally incompatible in modern systems, or whether we can instead reconcile the two.
QoS degradation caused by colocation usually has three causes:
- queuing delay: increased queuing delay due to interference on shared resources
- scheduling delay: long scheduling delays when timesharing processor cores
- load imbalance: poor tail latency due to thread load imbalance
Using memcached as the example, the paper analyzes each of these three aspects in detail to show how colocation degrades the QoS of a latency-sensitive service.

1. Queuing delay
What: Queuing delay occurs when requests arrive at the same time or in rapid succession. Interference from co-located workloads increases queuing delay by lengthening service time, which lowers the service rate. Even if the co-located workload runs on separate processor cores, its footprint on shared caches, memory channels, and I/O channels slows down the service rate of the latency-critical workload.
How: The paper therefore proposes provisioning load to services in an interference-aware manner, one that accounts for the reduction in throughput a service may experience when deployed on servers with co-located workloads.
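To make interference-aware provisioning concrete, here is a minimal sketch that models each server as an M/M/1 queue. The M/M/1 model, the 100k req/s standalone rate, the 1 ms p99 SLO, and the 30% slowdown are all assumptions for illustration, not the paper's method or measurements. In M/M/1 the response time is exponential with rate (service rate − arrival rate), so the p99 latency has a closed form and the maximum load that still meets an SLO follows directly.

```python
import math

def p_latency_mm1(arrival_rate, service_rate, percentile=0.99):
    """Percentile of response time in an M/M/1 queue (assumed model).

    Response time is exponential with rate (service_rate - arrival_rate),
    so its p-th percentile is -ln(1 - p) / (service_rate - arrival_rate).
    """
    if arrival_rate >= service_rate:
        return math.inf  # overloaded: the queue grows without bound
    return -math.log(1.0 - percentile) / (service_rate - arrival_rate)

def max_load_for_slo(service_rate, slo_seconds, percentile=0.99):
    """Largest arrival rate whose percentile latency still meets the SLO."""
    return max(0.0, service_rate + math.log(1.0 - percentile) / slo_seconds)

# Hypothetical numbers: 100k req/s when running alone, a 1 ms p99 SLO, and a
# co-located batch job that cuts the service rate by 30%.
standalone_rate = 100_000.0
interfered_rate = standalone_rate * (1 - 0.30)
slo = 0.001

naive_budget = max_load_for_slo(standalone_rate, slo)   # ignores interference
aware_budget = max_load_for_slo(interfered_rate, slo)   # accounts for it
print(f"interference-oblivious budget: {naive_budget:,.0f} req/s")
print(f"interference-aware budget:     {aware_budget:,.0f} req/s")
print("p99 at the oblivious budget under interference:",
      p_latency_mm1(naive_budget, interfered_rate))
```

Under these made-up numbers the interference-oblivious budget (about 95k req/s) exceeds the 70k req/s the service can sustain next to the batch job, so the queue becomes unstable, while the interference-aware budget (about 65k req/s) lands exactly on the SLO.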
2. Scheduling delay
What: Scheduling delay has two main components:
- scheduler wait time
- context switch latency
The main problem with the Linux kernel's default scheduler, CFS, is that its wakeup placement algorithm allows sporadic tasks to impose long wait times on latency-sensitive tasks like memcached.
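To get a feel for how large this wait can be, the sketch below estimates the timeslice CFS would grant a co-runner that has won the CPU. It does not model wakeup placement itself. It is a simplified model of CFS's slice sizing in kernels before the switch to EEVDF in Linux 6.6, assuming default tunables with logarithmic CPU scaling and ignoring wakeup preemption and tick granularity. The helper names are hypothetical; the nice-to-weight values are an excerpt of the kernel's table.

```python
# Simplified, pre-6.6 CFS timeslice model (assumptions: default tunables,
# logarithmic scaling, no wakeup preemption, no tick granularity).
BASE_LATENCY_MS = 6.0           # default sched_latency before scaling
BASE_MIN_GRANULARITY_MS = 0.75  # default sched_min_granularity before scaling
NR_LATENCY = 8                  # above this many tasks the period stretches

# Excerpt of the kernel's nice-to-weight table.
NICE_TO_WEIGHT = {-20: 88761, -10: 9548, -5: 3121, 0: 1024, 5: 335, 10: 110, 19: 15}

def scaling_factor(ncpus):
    """1 + ilog2(min(ncpus, 8)), the logarithmic tunable scaling."""
    return 1 + (min(ncpus, 8).bit_length() - 1)

def cfs_slice_ms(task_nice, runqueue_nices, ncpus):
    """Timeslice of a task with nice `task_nice`.

    `runqueue_nices` lists every runnable task on the runqueue, including it.
    """
    factor = scaling_factor(ncpus)
    latency = BASE_LATENCY_MS * factor
    min_gran = BASE_MIN_GRANULARITY_MS * factor
    nr_running = len(runqueue_nices)
    period = nr_running * min_gran if nr_running > NR_LATENCY else latency
    total_weight = sum(NICE_TO_WEIGHT[n] for n in runqueue_nices)
    return period * NICE_TO_WEIGHT[task_nice] / total_weight

# A memcached thread (nice 0) sharing a core with one batch task, 16-CPU server.
print("batch at nice 0: ", cfs_slice_ms(0, [0, 0], ncpus=16), "ms")
print("batch at nice 19:", cfs_slice_ms(19, [0, 19], ncpus=16), "ms")
```

In this model two nice-0 tasks each get a 12 ms slice, so a runnable memcached thread could wait for milliseconds. Lowering the batch task's weight (nice 19) shrinks its slice to about 0.35 ms, which illustrates why adjusting share values is one of the mitigations below.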
How: Fortunately, there are several strategies for mitigating this wait time for latency-sensitive services, including:
- adjusting task share values in CFS;
- using Linux's POSIX real-time scheduling policies instead of CFS;
- using a general-purpose scheduler with support for latency-sensitive tasks, such as BVT (Borrowed Virtual Time);
- using CPU bandwidth limits to enforce fairness.
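As a concrete illustration of the share and bandwidth-limit options above, here is a minimal sketch that builds Linux cgroup CPU-controller settings capping a co-located batch group. The group path, the 2.5-core cap, and the share value are hypothetical; the file names are the standard cgroup v1 (`cpu.shares`, `cpu.cfs_period_us`, `cpu.cfs_quota_us`) and cgroup v2 (`cpu.max`) interfaces. Writing them requires root and an existing cgroup.

```python
from pathlib import Path

CFS_MIN_QUOTA_US = 1_000                 # kernel rejects quotas below 1 ms
CFS_PERIOD_RANGE_US = (1_000, 1_000_000)  # allowed period: 1 ms to 1 s

def batch_cgroup_settings(max_cores, period_us=100_000, shares=None):
    """Build CPU-controller settings that cap a batch cgroup at `max_cores` CPUs."""
    lo, hi = CFS_PERIOD_RANGE_US
    if not lo <= period_us <= hi:
        raise ValueError("CFS period must be between 1 ms and 1 s")
    quota_us = round(max_cores * period_us)  # CPU time allowed per period
    if quota_us < CFS_MIN_QUOTA_US:
        raise ValueError("CFS quota must be at least 1 ms")
    v1 = {"cpu.cfs_period_us": str(period_us), "cpu.cfs_quota_us": str(quota_us)}
    if shares is not None:
        v1["cpu.shares"] = str(shares)  # relative CFS weight; default is 1024
    v2 = {"cpu.max": f"{quota_us} {period_us}"}  # cgroup v2 "$MAX $PERIOD"
    return {"v1": v1, "v2": v2}

def apply_settings(cgroup_dir, files):
    """Write settings into an existing cgroup directory (requires root)."""
    for name, value in files.items():
        Path(cgroup_dir, name).write_text(value)

settings = batch_cgroup_settings(max_cores=2.5, shares=128)
print(settings["v2"])
# On a cgroup v2 host one might then run, as root (hypothetical path):
# apply_settings("/sys/fs/cgroup/batch", settings["v2"])
```

The cap is expressed as quota/period, so 250000/100000 lets the batch group use at most 2.5 CPUs' worth of time per 100 ms period, leaving headroom for the latency-sensitive service.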
3. Load imbalance
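What: If the scheduler places several of a service's threads on the same core while other cores sit idle, requests handled by those threads wait behind each other and tail latency suffers. A latency-sensitive service's vulnerability to load imbalance can easily be assessed by deliberately putting it in a situation where its threads are unbalanced.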
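The effect of such an unbalanced placement can be estimated with a toy model. This sketch assumes each worker thread behaves as an M/M/1 queue with its own share of the load, and that threads stacked on one core split it evenly with context-switch cost ignored. The function name and all numbers are hypothetical.

```python
import math
from collections import Counter

def per_thread_p99(placement, per_core_rate, per_thread_load, percentile=0.99):
    """Approximate p99 latency of each worker thread for a thread -> core placement.

    Assumptions: each thread is an M/M/1 queue with its own share of the load,
    and threads stacked on one core split its capacity evenly.
    """
    threads_per_core = Counter(placement.values())
    p99 = {}
    for thread, core in placement.items():
        rate = per_core_rate / threads_per_core[core]  # capacity left for this thread
        slack = rate - per_thread_load
        p99[thread] = math.inf if slack <= 0 else -math.log(1 - percentile) / slack
    return p99

# Hypothetical: 4 worker threads, 100k req/s per core, 40k req/s per thread.
balanced = per_thread_p99({0: 0, 1: 1, 2: 2, 3: 3}, 100_000, 40_000)
stacked = per_thread_p99({0: 0, 1: 0, 2: 2, 3: 3}, 100_000, 40_000)
for t in range(4):
    print(f"thread {t}: balanced {balanced[t] * 1e6:.0f} us, "
          f"stacked {stacked[t] * 1e6:.0f} us")
```

With these made-up numbers the two threads sharing core 0 see about six times the p99 latency of the others (roughly 460 µs versus 77 µs), and because they serve half of all requests, they dominate the service's overall tail.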
How: One particularly straightforward and effective solution is to pin threads explicitly to distinct cores, so that Linux can never migrate them on top of each other.
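A minimal sketch of explicit pinning for a service you control is shown below. It assumes Linux, where CPU affinity is per thread, so `os.sched_setaffinity(0, ...)` called inside a worker pins only that worker. `plan_pinning` and `worker` are hypothetical helpers, and the worker's real event loop is elided.

```python
import os
import threading

def plan_pinning(num_threads, cpus):
    """Assign each worker thread a distinct core (hypothetical helper)."""
    cpus = sorted(cpus)
    if num_threads > len(cpus):
        raise ValueError("not enough cores to give every thread its own core")
    return {thread: cpus[thread] for thread in range(num_threads)}

def worker(thread_id, cpu, results):
    # On Linux, pid 0 refers to the calling thread, so only this worker is pinned.
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})
        results[thread_id] = os.sched_getaffinity(0)
    # ... the worker's request-handling loop would run here ...

if hasattr(os, "sched_getaffinity"):
    available = os.sched_getaffinity(0)  # cores this process may use
else:
    available = set(range(os.cpu_count() or 1))

plan = plan_pinning(min(4, len(available)), available)
results = {}
threads = [threading.Thread(target=worker, args=(t, c, results)) for t, c in plan.items()]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(plan, results)
```

Refusing to plan more threads than cores is a deliberate choice: once two pinned threads share a core, pinning reproduces the very imbalance it is meant to prevent.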