ScalaBLAST
ScalaBLAST [1] is a parallel implementation of the original NCBI sequence alignment algorithm (BLAST). ScalaBLAST can be used to rapidly identify sequences that are similar to a set of protein sequences supplied by a user. A typical query list might contain thousands or millions of individual sequences, each of which is scored against a large database of publicly available sequence information, such as the nonredundant protein sequence database (nr).
ScalaBLAST achieves speedup in a multiprocessor environment by two concurrent methods: 1) breaking up the query list into smaller lists and scheduling a BLAST search on each smaller list by a given process group, and 2) efficiently managing access to the large target database using Global Arrays, a software implementation of a shared memory interface that can be used on shared or distributed memory architectures. This combination of splitting the query list and managing memory efficiently gives excellent scaling for jobs of sufficient size.
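The first method, splitting the query list across process groups, can be illustrated with a minimal sketch. It assumes a static, near-even split into contiguous sublists and a fixed group size; the source does not say how ScalaBLAST actually sizes or schedules the sublists, and the names `split_queries` and `group_of_rank` are hypothetical.

```python
def split_queries(queries, num_groups):
    """Split a query list into num_groups contiguous sublists of near-equal size.

    Assumption: a static even split. ScalaBLAST's real scheduling may differ.
    """
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    base, extra = divmod(len(queries), num_groups)
    sublists = []
    start = 0
    for group in range(num_groups):
        # The first `extra` groups take one additional query each.
        size = base + (1 if group < extra else 0)
        sublists.append(queries[start:start + size])
        start += size
    return sublists


def group_of_rank(rank, group_size):
    """Map a process rank to a process-group index (hypothetical layout)."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return rank // group_size


if __name__ == "__main__":
    queries = [f"seq{i}" for i in range(10)]
    for index, sublist in enumerate(split_queries(queries, 3)):
        print(index, sublist)
```

Each process would look up its group with `group_of_rank` and search only that group's sublist, while all groups read the same shared target database.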
Supercomputing 2008 Analytics Challenge video, featuring ScalaBLAST and SHOT
Performance and Benchmarks
Work factor is a measure of how many queries are completed by each processor per minute per million sequences in the database. More than 448,000 sequences from the PFAM database were searched with BLAST against the nonredundant protein database (nr) using up to 1500 processors. Work factor was gauged by allowing the processors to get as far into the list as possible in 3.5 hours, dividing the number of queries processed in that time by the number of processors used, and normalizing the result to a database size of 1 million sequences (for future comparison as nr grows in size). [1]
ScalaBLAST was run using 1000 queries against the nr database. Using Global Arrays to provide a software implementation of shared memory resulted in identical run times on two different architectures: a true shared memory architecture (SGI Altix, 1.5 GHz Itanium-2) and a distributed memory architecture (Linux cluster with dual 1.5 GHz Itanium-2 processors and a Quadrics QSnet-II network). [1]
Memory management
ScalaBLAST uses the nonblocking communication available in the Global Arrays toolkit to implement prefetching, which hides virtually all of the communication cost on cluster platforms.
Prefetching hides memory latency for distributed systems. In this figure, an approximately 50% boost in scaling is evident on MPP2 from prefetching, allowing ScalaBLAST to perform on the distributed memory system almost exactly as it does on the shared memory system (see previous figure). [1] Prefetching was implemented by creating local memory buffers that are filled via RDMA calls (transparent to the user) while processing takes place on the previously filled buffer. When the end of a buffer is reached, the one-sided memory operation is checked for completion, the active buffer is swapped, and a new prefetch operation is initiated.
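The buffer-swapping scheme described above can be sketched as follows. This is an illustration only, not ScalaBLAST's code: a background thread stands in for the nonblocking one-sided RDMA get, a Python list stands in for the distributed database, and the names `fetch_block` and `process_with_prefetch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(database, start, size):
    """Stand-in for a one-sided remote get: copy one block of the database."""
    return database[start:start + size]


def process_with_prefetch(database, block_size, process):
    """Process the database block by block, fetching the next block
    in the background while the current block is being processed."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    results = []
    if len(database) == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the first fetch; there is nothing to overlap it with yet.
        pending = pool.submit(fetch_block, database, 0, block_size)
        next_start = block_size
        while pending is not None:
            # Check the outstanding fetch for completion and make its
            # buffer the active one (the "swap").
            active = pending.result()
            # Initiate the next prefetch before processing, if data remains.
            if next_start < len(database):
                pending = pool.submit(fetch_block, database, next_start, block_size)
                next_start += block_size
            else:
                pending = None
            # Work on the active buffer while the next one is being filled.
            results.extend(process(item) for item in active)
    return results


if __name__ == "__main__":
    lengths = process_with_prefetch(["MKV", "AGTT", "LL"], 2, len)
    print(lengths)
```

In this sketch the overlap comes from a thread; on a cluster the same structure would rely on the network hardware completing the transfer while the processor continues with alignment work.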
Portability
| processor | compiler | architecture | interconnect | filesystem |
|---|---|---|---|---|
| ia64 | intel | 1960 processor cluster (MPP2) | Quadrics elanIV | Lustre, /home |
| ia64 | intel | 128 processor SGI Altix | N/A | /home |
| i386 | gnu | 96 processor cluster (Presidio) | Gigabit Ethernet | /scratch, /home |
| em64 | intel | 4 processor cluster (XOA) | Myrinet | /scratch, /home |
ScalaBLAST has been ported to a variety of high-end and commodity architectures, including distributed memory and true shared memory systems. It has been run over several grades of interconnect: Gigabit Ethernet, Myrinet, and Quadrics. ScalaBLAST has been built with the Intel and GNU compilers and has worked on both 64-bit and 32-bit architectures. ScalaBLAST works efficiently when posting results and reading the database files over a globally mounted filesystem (Lustre), from local disk space (/scratch), and over network file systems (/home).
[1] Oehmen, C.S., and J. Nieplocha. "ScalaBLAST: A Scalable Implementation of BLAST for High Performance Data-Intensive Bioinformatics Analysis", IEEE Trans. Parallel Dist. Sys., special issue on high-performance computational biology, 2006, in press.
