High-Performance Computing

ScalaBLAST

ScalaBLAST [1] is a parallel implementation of the NCBI Basic Local Alignment Search Tool (BLAST). ScalaBLAST can be used to rapidly identify sequences that are similar to a set of protein sequences supplied by the user. A typical query list may contain thousands or millions of sequences, each of which is scored against a large public sequence database such as the nonredundant protein sequence database (nr).

ScalaBLAST achieves speedup on multiprocessor systems through two complementary techniques: (1) it breaks the query list into smaller lists and schedules a BLAST search on each smaller list by a given process group, and (2) it manages access to the large target database efficiently using Global Arrays, a software implementation of a shared-memory interface that can be used on both shared- and distributed-memory architectures. Together, partitioning the query list and managing database memory efficiently give excellent scaling for sufficiently large jobs.
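The first technique, splitting a query list into smaller batches and handing each batch to a process group, can be illustrated with a short sketch. It is plain Python for illustration only: the batch size, the number of groups, and the functions `make_batches` and `schedule` are hypothetical and do not reflect ScalaBLAST's actual scheduler. Batch cost is approximated by the number of queries it contains.

```python
import heapq
from typing import Dict, List, Sequence


def make_batches(queries: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split the full query list into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [list(queries[i:i + batch_size]) for i in range(0, len(queries), batch_size)]


def schedule(batches: List[List[str]], n_groups: int) -> Dict[int, List[List[str]]]:
    """Assign each batch to the process group with the least work so far.

    Hypothetical cost model: a batch costs one unit per query. Ties are broken
    by the lower group id because heap entries are (load, group_id) tuples.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be positive")
    assignment: Dict[int, List[List[str]]] = {g: [] for g in range(n_groups)}
    heap = [(0, g) for g in range(n_groups)]  # (assigned queries, group id)
    heapq.heapify(heap)
    for batch in batches:
        load, group = heapq.heappop(heap)
        assignment[group].append(batch)
        heapq.heappush(heap, (load + len(batch), group))
    return assignment


batches = make_batches([f"q{i}" for i in range(10)], 3)
plan = schedule(batches, 2)
```

Because real BLAST cost varies from query to query, a dynamic work queue (groups pulling the next batch as they finish) would balance load better; the greedy static assignment above is only a simple stand-in for that idea.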

Supercomputing 2008 Analytics Challenge video, featuring ScalaBLAST and SHOT

The video is available for download in QuickTime format (1.2 GB). For best results, enable the QuickTime Player preference "Use high quality video setting when available".

Performance and Benchmarks

Work factor measures how many queries each processor completes per minute, normalized to a database of 1 million sequences. More than 448,000 sequences from the Pfam database were searched with BLAST against the nonredundant protein database (nr) using up to 1500 processors. Work factor was gauged by letting the processors get as far through the query list as possible in 3.5 hours, dividing the number of queries processed in that time by the number of processors used and by the elapsed time, and normalizing the result to a database size of 1 million sequences so that results remain comparable as nr grows [1].
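The work-factor arithmetic can be written out explicitly as W = Q / (P × T) × (N / 10^6), where Q is the number of queries completed, P the number of processors, T the elapsed time in minutes, and N the number of database sequences. The normalization step is an assumption: it supposes that per-query cost grows linearly with database size, so throughput measured against N sequences is scaled by N / 10^6. The function name and the numbers in the example call are hypothetical, not measured results.

```python
def work_factor(queries_done: int, processors: int, minutes: float, db_sequences: int) -> float:
    """Queries per processor per minute, normalized to a 1-million-sequence database.

    Assumes search cost is linear in database size, so the raw rate measured
    against db_sequences is scaled by db_sequences / 1e6.
    """
    if processors <= 0 or minutes <= 0 or db_sequences <= 0:
        raise ValueError("processors, minutes and db_sequences must be positive")
    raw_rate = queries_done / (processors * minutes)  # queries per processor-minute
    return raw_rate * (db_sequences / 1_000_000)


# Hypothetical run: 42,000 queries on 100 processors in 3.5 hours against 2 million sequences.
print(work_factor(42_000, 100, 3.5 * 60, 2_000_000))  # prints 4.0
```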

Figure: Run times on distributed- and shared-memory architectures

ScalaBLAST was run with 1000 queries against the nr database. Because Global Arrays provides a software implementation of shared memory, run times were identical on two different architectures: a true shared-memory system (SGI Altix, 1.5 GHz Itanium 2) and a distributed-memory system (Linux cluster with dual 1.5 GHz Itanium 2 nodes and a Quadrics QsNetII network) [1].
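Global Arrays can present distributed memory as if it were shared because a request for a range of global indices is translated into pieces owned by individual processes, which are then fetched with one-sided operations. The toy model below performs only that index translation for a one-dimensional array split into contiguous, nearly equal blocks. The even block distribution and the functions `block_bounds` and `locate` are simplifying assumptions for illustration; they are not the Global Arrays API, which provides its own routines for locating data and for one-sided get and put.

```python
from typing import List, Tuple


def block_bounds(n: int, nprocs: int, rank: int) -> Tuple[int, int]:
    """Half-open [start, end) range of global indices owned by `rank`.

    The first n % nprocs ranks each receive one extra element, so block
    sizes differ by at most one.
    """
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return start, start + size


def locate(n: int, nprocs: int, lo: int, hi: int) -> List[Tuple[int, int, int]]:
    """Split the global range [lo, hi) into (owner, local_lo, local_hi) pieces."""
    if not 0 <= lo <= hi <= n:
        raise ValueError("range out of bounds")
    pieces = []
    for rank in range(nprocs):
        start, end = block_bounds(n, nprocs, rank)
        a, b = max(lo, start), min(hi, end)
        if a < b:  # this rank owns part of the requested range
            pieces.append((rank, a - start, b - start))
    return pieces


# A 10-element array on 3 processes: a request for indices 2..7 touches all three owners.
print(locate(10, 3, 2, 8))  # prints [(0, 2, 4), (1, 0, 3), (2, 0, 1)]
```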

Memory management

ScalaBLAST uses the nonblocking communication available in the Global Arrays toolkit to implement prefetching, which hides virtually all of the communication cost on cluster platforms.

Prefetching hides memory latency on distributed-memory systems. In the accompanying figure, prefetching gives an approximately 50% improvement in scaling on MPP2, allowing ScalaBLAST to perform almost exactly as well on the distributed-memory system as on the shared-memory system (see the previous figure) [1]. Prefetching was implemented with local memory buffers that are filled by RDMA calls (transparent to the user) while processing takes place on the previously filled buffer. When processing reaches the end of a buffer, the one-sided memory operation is checked for completion, the active buffer is swapped, and a new prefetch operation is initiated.
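The double-buffering scheme can be sketched in Python, with a single background thread standing in for the nonblocking one-sided get. `fetch_block` and `process_block` are hypothetical placeholders for an RDMA read of a database chunk and for the BLAST work on that chunk; `Future.result()` plays the role of the completion check performed when the active buffer is exhausted. This illustrates the technique only and is not ScalaBLAST's code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def prefetch_pipeline(n_blocks: int,
                      fetch_block: Callable[[int], T],
                      process_block: Callable[[T], R]) -> List[R]:
    """Process blocks 0..n_blocks-1 while the next block is fetched in the background.

    At most two buffers exist at once: the active one being processed and
    the pending one being filled.
    """
    results: List[R] = []
    if n_blocks <= 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as io:
        # Start filling the first buffer before any computation begins.
        pending = io.submit(fetch_block, 0)
        for i in range(n_blocks):
            # End of the active buffer: wait for the outstanding fetch to complete.
            active = pending.result()
            # Swap buffers and immediately start the next prefetch.
            if i + 1 < n_blocks:
                pending = io.submit(fetch_block, i + 1)
            # Compute on the active buffer while the next fetch is in flight.
            results.append(process_block(active))
    return results


# Hypothetical data: block i holds the integers 4*i .. 4*i+3; processing sums them.
print(prefetch_pipeline(3, lambda i: list(range(4 * i, 4 * i + 4)), sum))  # prints [6, 22, 38]
```

Using one I/O worker keeps at most one fetch in flight, which matches the two-buffer arrangement described above; more workers would amount to deeper prefetching.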

Portability

Processor | Compiler | Architecture                    | Interconnect     | Filesystem
ia64      | Intel    | 1960-processor cluster (MPP2)   | Quadrics ElanIV  | Lustre, /home
ia64      | Intel    | 128-processor SGI Altix         | N/A              | /home
i386      | GNU      | 96-processor cluster (Presidio) | Gigabit Ethernet | /scratch, /home
em64      | Intel    | 4-processor cluster (XOA)       | Myrinet          | /scratch, /home

ScalaBLAST has been ported to a variety of high-end and commodity architectures, including both distributed-memory and true shared-memory systems. It has run over several interconnects (Gigabit Ethernet, Myrinet, and Quadrics), has been built with both the Intel and GNU compilers, and works on both 64-bit and 32-bit architectures. ScalaBLAST reads database files and writes results efficiently over a globally mounted parallel filesystem (Lustre), from local disk (/scratch), and over network file systems (/home).


[1] Oehmen, C.S., and J. Nieplocha. "ScalaBLAST: A Scalable Implementation of BLAST for High-Performance Data-Intensive Bioinformatics Analysis." IEEE Transactions on Parallel and Distributed Systems, special issue on high-performance computational biology, 2006 (in press).


Last Modified: May 2006