In Search of Optimal Data Placement for Eliminating Write Amplification in Log-Structured Storage
Log-structured storage has been widely deployed across many kinds of storage systems for its high performance. However, its garbage collection (GC) incurs severe write amplification (WA) because live data is frequently rewritten. This has motivated many research studies, particularly on data placement strategies, that mitigate WA in log-structured storage. We show that, given future knowledge of the block invalidation time (BIT) of each written block, one can design an optimal data placement scheme that achieves the minimum WA. Guided by this observation, we propose InferBIT, a novel data placement algorithm that aims to minimize WA in log-structured storage. Its core idea is to infer the BITs of written blocks from the underlying storage workloads, so as to place blocks with similar estimated BITs into the same group in a fine-grained manner. We show via both mathematical and trace analyses that InferBIT can infer the BITs by leveraging the write skewness property of real-world storage workloads. Evaluation on block-level I/O traces from real-world cloud block storage workloads shows that InferBIT achieves the lowest WA among the schemes compared, which include eight state-of-the-art data placement schemes.
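The grouping idea can be illustrated with a minimal sketch. It is not the InferBIT algorithm itself, whose details the abstract does not give. It assumes one simple inference rule: a block's lifespan is estimated from the lifespan of the previous version of the same logical block address, measured in number of writes. The function name `place_writes`, the `thresholds` parameter, and the rule that first-time writes go to the coldest group are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def place_writes(
    lbas: List[int], thresholds: List[int]
) -> List[Tuple[int, int, Optional[int]]]:
    """Assign each write to a group by its estimated lifespan.

    Illustrative assumption (not taken from the paper): the lifespan of a
    newly written block is estimated as the lifespan of the version it
    overwrites, counted in writes. `thresholds` must be sorted ascending;
    group i holds blocks whose estimate is <= thresholds[i], and the last
    group (index len(thresholds)) holds everything else.
    """
    last_write: Dict[int, int] = {}  # LBA -> time of its most recent write
    placements: List[Tuple[int, int, Optional[int]]] = []

    for now, lba in enumerate(lbas):
        prev = last_write.get(lba)
        estimate = None if prev is None else now - prev

        if estimate is None:
            # No history for this LBA: assume it is cold (hypothetical rule).
            group = len(thresholds)
        else:
            # First group whose threshold covers the estimate, else coldest.
            group = next(
                (i for i, t in enumerate(thresholds) if estimate <= t),
                len(thresholds),
            )

        placements.append((lba, group, estimate))
        last_write[lba] = now

    return placements


if __name__ == "__main__":
    trace = [1, 2, 1, 1, 3, 2]
    for lba, group, estimate in place_writes(trace, thresholds=[2, 4]):
        print(f"lba={lba} group={group} estimated_lifespan={estimate}")
```

Under a skewed workload, frequently rewritten addresses tend to land in the low-numbered groups, so segments in those groups are likely to become fully invalid before GC needs to relocate anything.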