PNNL: High Performance Computing

ARMCI Performance

The latencies below were measured with blocking operations. Nonblocking operations can give better results, depending on the benchmark used.

Network protocol (platform)                                               Put latency (µs)   Get latency (µs)
Shared memory (Linux)                                                     0.162              0.160
Myrinet-GM (2.4 GHz Pentium-4, Linux 2.4, Myrinet C card, GM 1.6.4)       12.8               17.8
Quadrics Elan-3 (1 GHz ia64, Linux 2.4.20)                                4.71               6.42
Quadrics Elan-4 (1.4 GHz AMD Opteron, Linux 2.4)                          1.80               2.66
Quadrics Elan-4 (1.5 GHz ia64, Linux 2.4.20)                              2.45               4.56
InfiniBand (1 GHz ia64, Linux 2.4.20)                                     7.4                16.0
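
This page does not describe how the latencies were measured. As a rough sketch only, a blocking-operation latency of this kind is often estimated by timing many back-to-back calls and dividing by the call count. In the C code below, mean_latency_us, blocking_op_fn, and counting_op are hypothetical names introduced for illustration; counting_op is a stand-in for a real blocking call such as ARMCI_Put or ARMCI_Get, and the POSIX clock_gettime function is assumed to be available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* request clock_gettime under strict C modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one blocking operation per call. */
typedef void (*blocking_op_fn)(void *ctx);

/* Current monotonic time in microseconds. */
static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec * 1e-3;
}

/* Mean time per call of op, in microseconds, over iters timed calls.
 * One untimed warm-up call is made first so that one-time setup costs
 * do not distort the average. */
double mean_latency_us(blocking_op_fn op, void *ctx, int iters)
{
    assert(op != NULL && iters > 0);
    op(ctx);                               /* warm-up, not timed */
    double start = now_us();
    for (int i = 0; i < iters; i++) {
        op(ctx);
    }
    return (now_us() - start) / iters;
}

/* Stand-in operation (an assumption, not a real transfer): it only
 * increments a counter so the harness can be exercised on its own. */
void counting_op(void *ctx)
{
    ++*(long *)ctx;
}
```

Calling mean_latency_us(counting_op, &counter, 100000) returns the mean time per call in microseconds. A real benchmark would instead pass a callback that wraps the transfer under test and would typically report results per message size.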


Myrinet-GM (IA32)

Linux cluster at the State University of New York at Buffalo, with dual 2.4 GHz Pentium-4 nodes and Myrinet-2000 (M3F-PCI64C-2 Myrinet interface). It uses the GM (1.6.4) and MPICH-GM libraries provided by Myricom.

Comparison of the latency of the ARMCI get operation (nonblocking get followed by wait) with that of GM.

Nonblocking operations (overlapping communication with computation): percentage overlap for increasing message sizes, for MPI and ARMCI (direct and server-based protocols).
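
The page does not define how the overlap percentage was computed. One common definition, shown here purely as an assumption, compares three timings: the transfer alone (t_comm), the computation alone (t_comp), and the combined run in which the nonblocking transfer is issued, the computation runs, and the transfer is then waited on (t_total). The share of t_comm hidden behind computation is then reported as a percentage. The function name overlap_percent and its parameters are hypothetical.

```c
#include <assert.h>

/* Percentage of communication time hidden behind computation, under the
 * assumed definition 100 * (t_comm + t_comp - t_total) / t_comm.
 * All three times must use the same unit. The hidden time is clamped to
 * the range [0, t_comm] so timing noise cannot push the result outside
 * 0..100. */
double overlap_percent(double t_comm, double t_comp, double t_total)
{
    assert(t_comm > 0.0 && t_comp >= 0.0 && t_total >= 0.0);
    double hidden = t_comm + t_comp - t_total;
    if (hidden < 0.0) {
        hidden = 0.0;      /* no overlap: combined run took at least the sum */
    }
    if (hidden > t_comm) {
        hidden = t_comm;   /* cannot hide more than the whole transfer */
    }
    return 100.0 * hidden / t_comm;
}
```

For example, with t_comm = 10, t_comp = 20, and t_total = 25 (same units), 5 of the 10 units of communication are hidden, so the function returns 50.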

Myrinet-GM (IA64)

Linux cluster at Pacific Northwest National Laboratory, with dual 1 GHz Itanium-2 nodes and Myrinet-2000 (M3F-PCI64B-2 Myrinet interface). It uses the GM (1.6.4) and MPICH-GM libraries provided by Myricom.
