Advanced File Typing in Loris

Student: Not assigned yet
Owner: David van Moolenbroek dcvmoole@cs.vu.nl / Raja Appuswamy raja@cs.vu.nl
Git branch name: N/A

The Loris project

The storage industry has witnessed tremendous change on both the hardware and software fronts over the past decade. The storage hardware landscape is seeing the birth and adoption of new classes of storage, such as flash memory-based solid state drives, that possess radically different characteristics compared to traditional disk drives. Storage software, on the other hand, has evolved from simple single-disk file systems to feature-rich, sophisticated storage systems like ZFS and Btrfs that support a suite of features such as snapshotting, cloning, checksumming, and background defragmentation.

Our research involves building a next-generation storage stack called Loris on top of the MINIX 3 operating system. Before we started designing our stack, we analyzed existing solutions along three dimensions: reliability, flexibility, and heterogeneity. We found that all of them fail to satisfy the requirements of an ideal storage stack. Using the modular, layered network stack as a guiding example, we then designed and implemented the Loris stack, which solves the problems faced by existing approaches.

Loris' modularity makes it possible to implement a highly reliable storage solution that can protect itself from both hardware and software failures. Loris' flexibility makes it possible to deploy a storage stack that can snapshot and clone data at granularities ranging from individual files all the way to entire volumes.

Project description

File systems have been used as document stores for housing a heterogeneous mix of data, ranging from small text files to large multimedia files like photos, music, and videos. With the amount of data stored by users increasing at an alarming rate, hierarchy-based file access and organization has lost ground to content-based access mechanisms. Most users have resorted to the attribute-based or tag-based naming schemes offered by multimedia and desktop search applications for managing and searching their data.

These applications essentially build a user-level data management system that crawls the file system periodically to extract data and metadata, maintains indices on the extracted information, and offers application-specific search interfaces for querying the gathered data. Such applications also store data in custom file formats and are essentially miniature file systems, managing everything from space allocation to data caching within a file. For instance, a recent study revealed that the DOC file format is modeled on the FAT file system and has an incredibly complex data layout. These applications are also monolithic giants that reference hundreds of libraries and rely on a large number of application frameworks for accessing data. For instance, a recent study showed that the simple process of inserting fifteen images and saving a DOC file results in accesses to over three hundred different files in the file system.
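To make the crawl, extract, index, and query cycle described above concrete, here is a minimal sketch in Python. It is an illustration only and is not taken from Loris or from any particular desktop search application: the function names (extract_metadata, crawl, search), the two attributes it records (file extension and a coarse size class), and the 1 MiB threshold are all assumptions chosen for brevity.

```python
import os
from collections import defaultdict

# Hypothetical threshold separating "small" from "large" files (an assumption).
LARGE_FILE_BYTES = 1024 * 1024


def extract_metadata(path):
    """Return a few (attribute, value) pairs describing one file."""
    st = os.stat(path)
    ext = os.path.splitext(path)[1].lower() or "(none)"
    size_class = "large" if st.st_size >= LARGE_FILE_BYTES else "small"
    return [("ext", ext), ("size", size_class)]


def crawl(root):
    """Walk the tree under root and build an inverted index.

    The index maps each (attribute, value) pair to the set of paths
    that carry it.
    """
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            for pair in extract_metadata(path):
                index[pair].add(path)
    return index


def search(index, **attrs):
    """Return the paths that match every given attribute=value pair."""
    sets = [index.get(item, set()) for item in attrs.items()]
    return set.intersection(*sets) if sets else set()
```

A call such as search(crawl("/home/user"), ext=".jpg", size="small") would return the small JPEG files found under that (hypothetical) directory. Note that an index built this way reflects the file system only as of the last crawl, so it must be rebuilt or updated periodically.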

We recently extended the Loris stack to support metadata management and showed that integrating metadata search into the storage stack provides several benefits. We would now like to investigate the benefits of integrating data management into Loris. There are many open questions that we can investigate as part of this project: 1) Does awareness of complex application-specific file formats have any potential performance or reliability benefits? 2) Can we use Loris as an infrastructure for implementing application frameworks? 3) Can we use type information to automate policy assignment?

If you are interested, please come and talk to us!