Many big data applications process large volumes of data using a pipeline of operators, a number of which do not require any ordering of data records. Hurricane is a scalable decentralized system that aggregates secondary storage devices in a cluster with the aim of supporting parallel scans of data stored across them. Hurricane spreads input and output data uniformly at random and leverages the absence of order between data blocks to seamlessly balance load and mitigate the effect of stragglers. We implemented a prototype of Hurricane with an HDFS-like RPC interface to facilitate interoperability and show that the resulting system is scalable and seamlessly achieves I/O balance at near-maximal bandwidth.
Pipelining of order-oblivious operators
Order-oblivious operators have the property that the result of their execution on a list of records does not depend on the ordering of records in that list: any permutation yields a correct answer. These operators are attractive in a cluster setting because they require no centralized control; for example, a system can exploit them to balance I/O load across the available disks.
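As a minimal illustration (not code from the paper), consider a sum aggregation: addition is commutative and associative, so any permutation of the input records produces the same result. The `Record` type and function names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical record type: (key, value) pairs whose values we aggregate.
struct Record { int key; long value; };

// An order-oblivious operator: addition is commutative and associative,
// so the ordering of records cannot affect the result.
long sum_values(const std::vector<Record>& records) {
    long total = 0;
    for (const auto& r : records) total += r.value;
    return total;
}

// Checks the defining property: shuffling the records (taken by value,
// so the caller's list is untouched) leaves the result unchanged.
bool permutation_invariant(std::vector<Record> records, unsigned seed) {
    const long before = sum_values(records);
    std::mt19937 rng(seed);
    std::shuffle(records.begin(), records.end(), rng);
    return sum_values(records) == before;
}
```

Because no ordering has to be preserved, each node can aggregate whatever blocks it receives, in whatever order they arrive.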
Balance I/O load randomly across available disks & saturate available storage bandwidth
This is possible because Hurricane leverages the absence of order between data blocks (order-oblivious operators). It relies on the assumption that remote storage bandwidth equals local storage bandwidth: storage is slower than the network, so the network is not a bottleneck. This assumption can be satisfied by carefully matching network interfaces to storage interfaces, a configuration that is realistic for most clusters at rack scale or even beyond.
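A minimal sketch of the placement idea, under the assumptions above: if each block is written to a disk chosen uniformly at random, the expected load per disk is `num_blocks / num_disks` without any central coordinator. The helper names below are illustrative, not Hurricane's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: for each of `num_blocks` blocks, pick the disk
// it is placed on uniformly at random.
std::vector<std::size_t> place_blocks(std::size_t num_blocks,
                                      std::size_t num_disks,
                                      std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, num_disks - 1);
    std::vector<std::size_t> placement(num_blocks);
    for (auto& disk : placement) disk = pick(rng);
    return placement;
}

// Per-disk block counts; with many blocks these concentrate around
// num_blocks / num_disks, i.e. the I/O load is balanced.
std::vector<std::size_t> disk_loads(const std::vector<std::size_t>& placement,
                                    std::size_t num_disks) {
    std::vector<std::size_t> loads(num_disks, 0);
    for (auto d : placement) ++loads[d];
    return loads;
}
```

Since remote and local storage bandwidth are assumed equal, it does not matter which disk a block lands on, so uniform random placement both balances load and keeps every disk busy.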
Although the server side of Hurricane is written in C++, we provide an HDFS-like RPC interface using Thrift. This allows users to access Hurricane from a large class of programming languages. Moreover, the server-side code could be rewritten in another language without breaking clients.
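To make the interoperability point concrete, an HDFS-like service could be declared in Thrift IDL roughly as follows. This is a hypothetical sketch: the text does not show Hurricane's actual IDL, so every struct, service, and method name here is illustrative. Thrift then generates client stubs for each supported language from this one definition.

```thrift
// Hypothetical HDFS-like interface sketch (names are illustrative).
struct BlockLocation {
  1: string host,
  2: i32 port,
}

service HurricaneFS {
  // Clients in any Thrift-supported language call these methods,
  // independently of the C++ server implementation behind them.
  list<BlockLocation> getBlockLocations(1: string path),
  binary readBlock(1: string path, 2: i64 offset, 3: i32 length),
  void writeBlock(1: string path, 2: binary data),
}
```

Because clients depend only on this interface, swapping out the server implementation (even for one in another language) leaves them unaffected.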