ChemIO
High-Performance I/O for Computational Chemistry Applications
The ChemIO project is a joint effort of Argonne National Laboratory and Pacific Northwest National Laboratory. It is affiliated with a DOE Grand Challenge project developing Massively Parallel Methods for Computational Chemistry, with the multi-agency Scalable I/O Project, and with the Environmental Molecular Sciences Laboratory (EMSL). It has three main goals:
- To facilitate the development of portable, high-performance computational chemistry codes by defining a standard I/O API that meets the requirements of chemistry applications and by providing high-performance implementations of this API on different high-performance computers;
- To improve the understanding of the requirements for standard high-performance I/O APIs, of the performance these APIs can achieve on various high-performance computer systems, and of the techniques required to optimize performance on those systems; and
- To investigate how the successful Global Array programming model can be extended to secondary storage.
Computational chemistry is an extremely interesting discipline from a high-performance I/O standpoint. Certain computational chemistry codes perform well on scalable parallel computers, often operate on very large data sets, and display irregular access patterns. Furthermore, many state-of-the-art computational chemistry codes have been engineered to recompute data repeatedly rather than store it, so that they can run in memory on vector computers. On scalable parallel computers, it seems likely that I/O-based ("out-of-core") solutions will allow significantly better performance.
A paper accepted for publication in a special issue of the International Journal of Supercomputer Applications and High Performance Computing provides further details on the ChemIO project.
Progress, Software, and Users
So far we have produced a design and implementation for a chemistry I/O API, ChemIO, comprising three components:
- Disk Resident Arrays (DRA), an extension of Global Arrays to secondary storage;
- Exclusive Access Files (EAF), per-processor private files; and
- Shared Files (SF), files to which multiple processors can read and write independently.
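To make the DRA idea concrete, here is a minimal sketch, assuming a single process and a row-major file of 64-bit floats, of how a rectangular section of a 2-D array can be moved between memory and secondary storage. The class DiskArray2D and its methods are hypothetical names chosen for illustration; they are not ChemIO's DRA interface, which transfers sections to and from global arrays in memory rather than the local Python lists used here.

```python
import os
import struct
import tempfile


class DiskArray2D:
    """Hypothetical sketch of a disk-resident 2-D array of float64 values,
    stored row-major in one file. This is NOT the ChemIO DRA API."""

    ITEM = struct.calcsize("<d")  # 8 bytes per little-endian double

    def __init__(self, path, rows, cols):
        self.path, self.rows, self.cols = path, rows, cols
        # Zero-fill the whole file so any section can later be updated in place.
        with open(path, "wb") as f:
            f.write(b"\0" * (rows * cols * self.ITEM))

    def _offset(self, i, j):
        # Byte offset of element (i, j) in row-major order.
        return (i * self.cols + j) * self.ITEM

    def _check(self, ilo, ihi, jlo, jhi):
        if not (0 <= ilo <= ihi < self.rows and 0 <= jlo <= jhi < self.cols):
            raise IndexError("section out of bounds")

    def write_section(self, ilo, jlo, block):
        """Copy an in-memory block (a list of equal-length rows) to the
        section whose upper-left corner is (ilo, jlo)."""
        self._check(ilo, ilo + len(block) - 1, jlo, jlo + len(block[0]) - 1)
        with open(self.path, "r+b") as f:
            for r, row in enumerate(block):
                f.seek(self._offset(ilo + r, jlo))
                f.write(struct.pack(f"<{len(row)}d", *row))

    def read_section(self, ilo, ihi, jlo, jhi):
        """Return section [ilo..ihi] x [jlo..jhi] (inclusive) as a list of rows."""
        self._check(ilo, ihi, jlo, jhi)
        n = jhi - jlo + 1
        rows = []
        with open(self.path, "rb") as f:
            for i in range(ilo, ihi + 1):
                f.seek(self._offset(i, jlo))
                rows.append(list(struct.unpack(f"<{n}d", f.read(n * self.ITEM))))
        return rows


# Usage: write a 2x2 patch into a 4x5 array, then read an overlapping 3x3 section.
path = os.path.join(tempfile.mkdtemp(), "dra_demo.bin")
a = DiskArray2D(path, 4, 5)
a.write_section(1, 2, [[1.0, 2.0], [3.0, 4.0]])
section = a.read_section(0, 2, 1, 3)
print(section)  # [[0.0, 0.0, 0.0], [0.0, 1.0, 2.0], [0.0, 3.0, 4.0]]
```

In this sketch each row of a section costs a separate seek and read, which illustrates why file layout matters for the irregular access patterns typical of chemistry codes.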
All components provide an asynchronous API that allows applications to overlap expensive I/O with computation.
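The following minimal sketch shows the kind of overlap an asynchronous API makes possible. It assumes a POSIX system and uses a Python worker thread (concurrent.futures) in place of ChemIO's own asynchronous calls; compute_block, write_at, and out_of_core_loop are hypothetical names. While block k is being computed, the write of block k-1 is still in progress, and the loop waits for that write to complete before issuing the next one.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def compute_block(k, n=1024):
    """Hypothetical stand-in for an expensive computation producing block k."""
    return bytes((k * 31 + i) % 256 for i in range(n))


def write_at(fd, offset, data):
    """Blocking positional write; running it on a worker thread is what
    makes it asynchronous from the caller's point of view."""
    view = memoryview(data)
    while view:  # loop in case the OS performs a short write
        written = os.pwrite(fd, view, offset)  # POSIX-only
        view, offset = view[written:], offset + written


def out_of_core_loop(path, nblocks, n=1024):
    """Compute blocks one at a time and stream them to disk, overlapping the
    write of block k-1 with the computation of block k."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        with ThreadPoolExecutor(max_workers=1) as io:
            pending = None
            for k in range(nblocks):
                data = compute_block(k, n)  # runs while the previous write is in flight
                if pending is not None:
                    pending.result()  # wait for the previous write (re-raises errors)
                pending = io.submit(write_at, fd, k * n, data)
            if pending is not None:
                pending.result()  # drain the last outstanding write
    finally:
        os.close(fd)
    return nblocks


# Usage: stream 4 blocks of 1 KiB each to a scratch file.
scratch = os.path.join(tempfile.mkdtemp(), "ooc_demo.bin")
out_of_core_loop(scratch, 4)
print(os.path.getsize(scratch))  # 4096
```

Waiting before each new request keeps at most one write outstanding, so memory use stays at roughly two blocks: one being computed and one being written.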
ChemIO is implemented in a modular fashion. The user-level libraries (DRA, EAF, SF) are layered on the ELIO (ELementary I/O) device library, which provides a portable interface to different file systems.
On the IBM SP, Shared Files also use the Distant I/O library in addition to ELIO.
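The layering can be illustrated with a minimal sketch in which a hypothetical Device interface plays the role of ELIO and a hypothetical PrivateFile class plays the role of an EAF-style per-process file. The class names, method signatures, and file-naming scheme (basename.rank) are assumptions for illustration, not ChemIO's actual interfaces. Because PrivateFile calls only Device methods, swapping the POSIX backend for a different one (here, an in-memory buffer) requires no change to the upper layer.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Device(ABC):
    """Hypothetical ELIO-like layer: the only code that talks to a file system."""

    @abstractmethod
    def pwrite(self, data, offset): ...

    @abstractmethod
    def pread(self, size, offset): ...

    @abstractmethod
    def close(self): ...


class PosixDevice(Device):
    """Backend for a POSIX file system (os.pread/os.pwrite are Unix-only)."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def pwrite(self, data, offset):
        view = memoryview(data)
        while view:  # loop in case the OS performs a short write
            n = os.pwrite(self.fd, view, offset)
            view, offset = view[n:], offset + n

    def pread(self, size, offset):
        chunks = []
        while size > 0:
            chunk = os.pread(self.fd, size, offset)
            if not chunk:  # end of file
                break
            chunks.append(chunk)
            size, offset = size - len(chunk), offset + len(chunk)
        return b"".join(chunks)

    def close(self):
        os.close(self.fd)


class MemoryDevice(Device):
    """Second backend (a RAM buffer), showing that upper layers need no changes."""

    def __init__(self, path):
        self.buf = bytearray()  # path is ignored

    def pwrite(self, data, offset):
        end = offset + len(data)
        if end > len(self.buf):
            self.buf.extend(bytes(end - len(self.buf)))  # zero-fill any gap
        self.buf[offset:end] = data

    def pread(self, size, offset):
        return bytes(self.buf[offset:offset + size])

    def close(self):
        pass


class PrivateFile:
    """Hypothetical EAF-like user-level file: one file per process, written
    only against the Device interface."""

    def __init__(self, basename, rank, device_factory=PosixDevice):
        self.dev = device_factory(f"{basename}.{rank}")

    def write(self, offset, data):
        self.dev.pwrite(data, offset)

    def read(self, offset, size):
        return self.dev.pread(size, offset)

    def close(self):
        self.dev.close()


# Usage: two "processes" get separate private files under the same base name.
base = os.path.join(tempfile.mkdtemp(), "scratch")
f0, f1 = PrivateFile(base, 0), PrivateFile(base, 1)
f0.write(0, b"rank0 data")
f1.write(4, b"rank1")
r0, r1 = f0.read(0, 10), f1.read(4, 5)
f0.close()
f1.close()
m = PrivateFile("unused", 0, device_factory=MemoryDevice)
m.write(2, b"xy")
rm = m.read(0, 4)
print(r0, r1, rm)  # b'rank0 data' b'rank1' b'\x00\x00xy'
```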
ChemIO code is in use by several application groups. For example:
- A group at Argonne is using ChemIO's Exclusive Access Files to build a high-performance non-direct (out-of-core) SCF code (see also their benchmark results);
- The RI-SCF and RI-MP2 programs being developed at PNNL and Syracuse use DRA; and
- The semi-direct SCF implementation in NWChem (PNNL) uses EAF.