metadata distribution and consistency techniques for large scale cluster file systems

Most supercomputers nowadays are based on large clusters, which call for sophisticated, scalable, and decentralized metadata processing techniques. From the perspective of maximizing metadata throughput, an ideal metadata distribution policy should automatically balance the namespace locality and even distribution without manual intervention. None of existing metadata distribution schemes is designed to make such a balance. We propose a novel metadata distribution policy, Dynamic Dir-Grain (DDG), which seeks to balance the requirements of keeping namespace locality and even distribution of the load by dynamic partitioning of the namespace into size-adjustable hierarchical units. Extensive simulation and measurement results show that DDG policies with a proper granularity significantly outperform traditional techniques such as the Random policy and the Subtree policy by 40 percent to 62 times. In addition, from the perspective of file system reliability, metadata consistency is an equally important issue. However, it is complicated by dynamic metadata distribution. Metadata consistency of cross-metadata server operations cannot be solved by traditional metadata journaling on each server. While traditional two-phase commit (2PC) algorithm can be used, it is too costly for distributed file systems. We proposed a consistent metadata processing protocol, S2PC-MP, which combines the two-phase commit algorithm with metadata processing to reduce overheads. Our measurement results show that S2PC-MP not only ensures fast recovery, but also greatly reduces fail-free execution overheads.