Active Storage
The efficient management of enormous and growing volumes of data remains
a challenging problem. Although storage capacity keeps improving, the cost
of moving data between processing nodes and storage devices has not
improved at the same rate. In recent years, several parallel file systems
have been developed to tackle data management in high-performance
computing systems. Some of these parallel file systems, such as Lustre and
PVFS, use mainstream server computers as I/O servers. The combined
computing capacity of these hundreds, or even thousands, of storage nodes
can be considerable. However, it usually goes unexploited because these
nodes serve only as I/O elements that store data.
One approach to reducing the bandwidth requirements between storage and
compute devices, and to leveraging the computing capacity of the storage
nodes, is to move computation closer to the storage devices whenever
possible. We call this approach Active Storage in the context of parallel
file systems. By offloading some computing tasks to the storage nodes,
near the data that they manage, Active Storage makes it possible to
substantially reduce data movement across the network and, hence, the
overall network traffic.
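To make the traffic argument concrete, the following back-of-the-envelope sketch compares the bytes that cross the network when a client processes a file itself with the bytes that cross it when the work is offloaded to the storage node. The function names and all sizes are hypothetical and chosen only for illustration; they are not measurements from this project.

```python
def network_bytes_conventional(file_size: int, result_size: int) -> int:
    """Client reads the whole file over the network, then writes the result back."""
    return file_size + result_size


def network_bytes_active(request_size: int, reply_size: int) -> int:
    """Only a small request and a small status reply cross the network.

    The file is read and the result is written locally on the storage node.
    """
    return request_size + reply_size


# Hypothetical sizes, for illustration only.
GIB = 1024 ** 3
file_size = 10 * GIB      # input file held by the storage node
result_size = 10 * GIB    # e.g. a unit conversion that produces same-size output

conventional = network_bytes_conventional(file_size, result_size)
active = network_bytes_active(request_size=4096, reply_size=4096)
reduction = conventional / active  # ratio of bytes moved, conventional vs. offloaded
```

Under these assumed sizes the offloaded case moves only a few kilobytes of control traffic, while the conventional case moves the full input and output. The actual savings depend on the task and on how large its result is relative to its input.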
Active Storage is targeted at applications with I/O-intensive stages that
involve fundamentally independent data sets. It can be used to process,
either on-line or off-line, output files from scientific simulation runs.
Examples of tasks suitable for Active Storage include compression and
archival of output files, statistical analysis of the output data with the
results stored in an external database, indexing the contents of the
output files, and simple data transformations such as unit conversion,
which multiplies a set of numbers by a scalar. By performing these
operations on the storage nodes, we not only achieve the aforementioned
benefits in resource usage, but also relieve scientific application
programmers of implementing I/O tasks that are "oblivious" to the main
application.
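As an illustration of the simplest task listed above, the following minimal sketch performs a unit conversion by multiplying every number in a file by a scalar. It is not the project's implementation. It assumes the file is a flat sequence of native-endian 64-bit floats, and the name scale_file and its parameters are hypothetical. A task of this kind could run on a storage node against the locally stored file, so the data never crosses the network.

```python
from array import array


def scale_file(src_path, dst_path, factor, chunk_values=1 << 16):
    """Multiply every native-endian float64 in src_path by factor.

    The scaled values are written to dst_path. The file is processed in
    chunks so that memory use stays bounded regardless of file size.
    Returns the number of values processed.
    """
    item_size = array("d").itemsize  # size in bytes of one float64
    count = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            raw = src.read(chunk_values * item_size)
            if not raw:
                break
            if len(raw) % item_size:
                raise ValueError("file size is not a multiple of the value size")
            values = array("d")
            values.frombytes(raw)
            scaled = array("d", (v * factor for v in values))
            scaled.tofile(dst)
            count += len(values)
    return count
```

For example, scale_file("output.dat", "output_mm.dat", 1000.0) would convert values stored in metres to millimetres; the file names here are placeholders.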
Funding
- This effort is a part of the DoE SciDAC SDM project.
Current Focus Areas
- Active Storage for Lustre
- Active Storage for PVFS
- Active Storage with striped files
Contacts
- Jarek Nieplocha
- Juan Piernas-Canovas
- Evan J. Felix
Publications
- Juan Piernas, Jarek Nieplocha. "Efficient Management of
Complex Striped Files in Active Storage". Proceedings of Euro-Par'08, 2008.
- Juan Piernas, Jarek Nieplocha, Evan J. Felix. "Evaluation
of Active Storage Strategies for the Lustre Parallel File
System". Proceedings of the Supercomputing'07 Conference,
November 2007.
- Evan J. Felix, Kevin Fox, Kevin Regimbal, Jarek Nieplocha.
"Active Storage Processing in a Parallel File System".
6th LCI International Conference on Linux Clusters: The HPC Revolution, Chapel Hill, North Carolina, April 26, 2005.
