Support for deduplication in Loris

Student: Not assigned yet
Owner: David van Moolenbroek dcvmoole@cs.vu.nl / Raja Appuswamy raja@cs.vu.nl
Git branch name: N/A

The Loris project

The storage industry has witnessed tremendous change on both the hardware and software fronts over the past decade. The storage hardware landscape is witnessing the birth and adoption of new classes of storage, like flash memory-based solid state drives, that possess radically different characteristics compared to traditional disk drives. Storage software, on the other hand, has evolved from simple single-disk file systems to feature-rich, sophisticated storage systems like ZFS and Btrfs that support a suite of features such as snapshotting, cloning, checksumming, and background defragmentation.

Our research involves building a next-generation storage stack called Loris on top of the MINIX 3 operating system. Before we started designing our stack, we analyzed existing solutions along three dimensions: reliability, flexibility, and heterogeneity. We found that the existing solutions fail to satisfy the requirements of an ideal storage stack. Using the modular, layered network stack as a guiding example, we then designed and implemented the Loris stack, which addresses the problems faced by existing approaches.

Loris' modularity makes it possible to implement a highly reliable storage solution that can protect itself from both hardware and software failures. Loris' flexibility makes it possible to deploy a storage stack that can snapshot and clone data at granularities ranging from individual files all the way up to entire file volumes.

Project description

The widespread adoption of virtualization in data centers has resulted in a proliferation of duplicate data being created and stored to hold virtual machine disk images. As most virtual machines hold identical copies of many files, such as system binaries, storing all this data once per virtual machine results in very poor storage efficiency. Deduplication has emerged as the industry standard for tackling this problem.

By computing a hash over each stored data block, deduplication (dedup) attempts to identify all duplicate blocks. By storing only one copy of each unique data block, dedup provides high storage efficiency. In contrast to dedup, which works on a block-by-block basis, the layers in the Loris stack work on a whole-file basis. This project therefore involves exploring several design alternatives for integrating deduplication with the Loris stack, implementing a dedup solution, and evaluating it using a range of benchmarks.
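
To make the block-level idea concrete, here is a minimal sketch of hash-based deduplication with fixed-size blocks. It is not Loris code and does not reflect any Loris interface: the class name DedupStore, its methods, the 4096-byte block size, and the choice of SHA-256 are all assumptions made for illustration, and everything is kept in memory.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for illustration, not taken from Loris


class DedupStore:
    """Toy in-memory store that keeps one copy of each unique block."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}    # digest -> block contents (stored once)
        self.refcount = {}  # digest -> number of file references
        self.files = {}     # file name -> ordered list of digests

    def write_file(self, name, data):
        # Overwriting a file first releases the blocks it referenced.
        if name in self.files:
            self.delete_file(name)
        digests = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).digest()
            # Store the block only if no identical block is already present.
            if digest not in self.blocks:
                self.blocks[digest] = block
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            digests.append(digest)
        self.files[name] = digests

    def read_file(self, name):
        # Reassemble the file from its block digests.
        return b"".join(self.blocks[d] for d in self.files[name])

    def delete_file(self, name):
        # Drop references; free a block when nothing refers to it anymore.
        for digest in self.files.pop(name):
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:
                del self.refcount[digest]
                del self.blocks[digest]

    def stored_bytes(self):
        # Physical space used, counting each unique block once.
        return sum(len(block) for block in self.blocks.values())
```

Writing two disk images that share most of their blocks into such a store and comparing stored_bytes() with the sum of the file sizes gives a rough measure of the space saved. The sketch identifies blocks by digest alone and keeps a per-block reference count; where the equivalent index and reference counts would live in a stack whose layers operate on whole files is one of the design questions this project is meant to explore.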

If you are interested, please come and talk to us!