U.S. Department of Energy
PNNL: High Performance Computing

Global Arrays Release Notes

Global Arrays development has moved to GitHub. Please see the release notes in the GitHub repository for the latest updates.

5.5 August 2016

  • Added port for libfabric (--with-ofi) via ComEx. This adds native support for Intel Omni-Path.
  • Numerous bug fixes.

5.4 April 2016

  • Numerous bug fixes.
  • Performed license/copyright audit of source files.
  • Removed BGML and DCMF ports. Use MPI multithreading instead.

5.4b May 2015

  • Added port for MPI progress ranks (--with-mpi-pr) via ComEx.
  • Added port for MPI multithreading with progress threads (--with-mpi-pt) via ComEx.
  • Added port for MPI multithreading (--with-mpi-mt) via ComEx.
  • Changed default network interface from SOCKETS to MPI two-sided (--with-mpi-ts) via ComEx.
  • Improved ScaLAPACK and ELPA integration.
  • Replaced EISPACK with LAPACK/BLAS.

User Guide for New Ports

This release contains a number of new communication runtime implementations which directly use MPI instead of a native communication library such as OpenIB verbs or Cray DMAPP. The basis of all of our MPI-based ports is the use of the MPI two-sided primitives (MPI_Send, MPI_Recv) to implement our ComEx/ARMCI one-sided protocols. The primary benefit of these ports is that Global Arrays and its user applications will now run on any platform where MPI is supported.

The recommended ports, in order of preference, are:

  1. --with-mpi-pr: MPI-1 with progress ranks.
  2. --with-mpi-pt: MPI-2 with progress threads.
  3. --with-mpi-ts: MPI-1 two-sided.
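As a sketch, configuring and building with the recommended progress-ranks port might look like the following (the compiler wrappers and install prefix are illustrative, not prescribed by this document):

```shell
# Configure GA with the MPI progress-ranks port via ComEx
# (mpicc/mpif77 and the prefix are illustrative choices)
./configure --with-mpi-pr CC=mpicc F77=mpif77 --prefix=$HOME/ga-install
make
make install
```

The other ports are selected the same way, substituting --with-mpi-pt or --with-mpi-ts for --with-mpi-pr.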

There are some caveats to using the new MPI ports, described below.

How to Use Progress Ranks

Your application code must not rely on MPI_COMM_WORLD directly. Instead, use the MPI communicator returned by the GA library wherever you would otherwise use the world communicator. Example code follows:


      program main
      implicit none
#include "mpi.fh"
#include "global.fh"
#include "ga-mpi.fh"
      integer comm
      integer ierr
      call mpi_init(ierr)
      call ga_initialize()
      call ga_mpi_comm(comm)
! use the returned comm as usual
      call ga_terminate()
      call mpi_finalize(ierr)
      end


#include "ga.h"
#include "ga-mpi.h"
int main(int argc, char **argv) {
    MPI_Comm comm;
    comm = GA_MPI_Comm();
    return 0;

How to Use Progress Threads

This port uses MPI_Init_thread() internally with a threading level of MPI_THREAD_MULTIPLE. It creates one progress thread per compute node, so it is advised to undersubscribe your compute nodes by one core. Your application code can remain unchanged unless it calls MPI_Init() itself, in which case GA will detect the lower MPI threading level and abort with an error.
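For illustration, an application that initializes MPI itself before GA would need to request the full threading level. The following is a minimal sketch under that assumption (error handling abbreviated; whether your MPI implementation actually grants MPI_THREAD_MULTIPLE depends on how it was built):

```c
#include <mpi.h>
#include "ga.h"

int main(int argc, char **argv) {
    int provided;
    /* Request MPI_THREAD_MULTIPLE so GA's progress thread can make MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        /* A lower threading level is the condition GA aborts on */
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    GA_Initialize();
    /* ... GA and MPI code ... */
    GA_Terminate();
    MPI_Finalize();
    return 0;
}
```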

How to Use the New Default Two-Sided Port

The MPI two-sided port is fully compatible with the MPI-1 standard. However, your application code will require additional GA_Sync() calls before and after any MPI function calls. This effectively splits user application code into blocks/epochs/phases of MPI code and GA code. Omitting these calls will likely cause your application to hang, since our two-sided port can only make communication progress inside of a GA function call.

Any application code which only makes GA function calls can remain unchanged.
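To illustrate the epoch discipline described above, here is a hedged C sketch; the MPI_Allreduce is just a stand-in for any MPI call your application might make between GA phases:

```c
#include <mpi.h>
#include "ga.h"

void mixed_phase_example(void) {
    double local = 1.0, global = 0.0;

    /* ... GA code: puts/gets/accumulates ... */

    GA_Sync(); /* close the GA epoch before entering an MPI block */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    GA_Sync(); /* synchronize again before resuming GA calls */

    /* ... more GA code ... */
}
```

The two GA_Sync() calls bracket the MPI block so that all ranks re-enter GA code together, which is what allows the two-sided port to resume making communication progress.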

5.3 February 2014

  • Fixed bug where incorrect BLAS integer size was used in ComEx.
  • Fixed incompatibilities between this and the 5.2 release with respect to NWChem.
  • Validated this software with the NWChem 6.3 sources.
  • See the 5.3b release notes for further relevant details.

5.3b December 2013

  • Added port for Portals4 (configure --with-portals4).

    When linking to the Portals4 reference implementation, it is highly recommended that the ummunotify driver is installed. Otherwise, memory registration errors and/or dropped messages may occur. This behavior can be verified using the PTL_DEBUG=1 and PTL_LOG_LEVEL=2 Portals4 environment variables.

  • Added ARMCI profiling to ComEx.
  • Updated autotool scripts to latest versions.

5.2 August 2012

  • Added the Communications Runtime for Extreme Scale (ComEx) software to the GA release bundle. ComEx provides the following features to GA:
    • Added port for Cray XE and XC (configure --with-dmapp).
    • Added port for MPI-1 two-sided (configure --with-mpi-ts).
    For additional information on ComEx, please see the ComEx website.
  • Added support for externally linkable ARMCI (configure --with-armci=...).
  • Added ability for users to select a different IB device using the ARMCI_OPENIB_DEVICE environment variable.
  • Added ability for users to specify upper bound on ARMCI shared memory. Added ARMCI_DEFAULT_SHMMAX_UBOUND which is set and parsed at configure time.
  • Changed how users link their applications. You now need "-lga -larmci" since libarmci is possibly an external dependency (see above).
  • Fixed support for Intel/QLogic IB thanks to patches from Dean Luick, Intel.
  • Improved BLAS library parsing for ACML and MKL ('-mkl').
  • Improved ScaLAPACK integration thanks to Edo Apra.

5.1.1 October 2012

  • Added a wrapper for fsync to SF library.
  • Added MA_ACCESS_INDEX_TYPE to ma library.
  • Added missing Python C sources.
  • Changed atomic operations.
  • Fixed numerous bugs for compilation on IBM AIX, as well as IBM xl compilers.
  • Fixed many warnings reported by Intel compilers.
  • Fixed integer overflow for indexing large arrays during accumulate.
  • Fixed bug in GA_Zgemm64.
  • Fixed ghosts hanging.
  • Removed a few debugging print statements from pario.

5.1 February 2012

  • Added unified "NGA" prefix for all functions
  • Added new profiling interface and weak symbol interposition support
  • Added support for struct data types using the new NGA_Register_type(), NGA_Get_field() and NGA_Put_field() functions
  • Added ga-config for 3rd party software to query compilation flags used for GA
  • Added Global Arrays in NumPy (GAiN) interface for a NumPy work-alike with a Global Arrays backend
  • Added GA_MPI_Comm() and other functions to retrieve MPI_Comm object associated with a GA processor group
  • Added MPI-MT (MPI_THREAD_MULTIPLE) port for use when a native port is not available
  • Added armci_msg_finalize() to abstract the message passing function required for application termination
  • Added ability for EAF_Open() to use MA memory operations instead of I/O operations
  • Changed ARMCI directory structure
  • Changed NGA_Add_patch() algorithm to use less memory
  • Changed tascel to no longer be part of the top-level configure (must be installed separately)
  • Changed Python base module from "ga" to "ga4py" since we now have the submodules and ga4py.gain
  • Changed autotools build to use autoconf-2.68 and libtool-2.4.2
  • Changed ARMCI Fortran sources to use integer*4 type rather than an integer size compiler flag
  • Fixed numerous configure and source bugs with our ScaLAPACK integration
  • Fixed bug in NGA_Matmul_patch()
  • Fixed numerous configure bugs
  • Fixed numerous Makefile bugs
  • Fixed support for large files
  • Improved internal code, reducing the amount of dereferenced pointers
  • Improved initialization ordering -- GA_Initialize(), ARMCI_Init(), and MPI_Init()/tcg_pbegin() can now be called in any order, relaxing the restriction that GA_Initialize() follow the messaging layer
  • Removed deprecated interconnects Myrinet GM, VIA, Mellanox Verbs API, Quadrics Elan-3, Quadrics Elan-4
  • Removed vampir support
  • Removed KSR and PARAGON support

5.0.3 February 2012

  • Added support for Cray compilers
  • Fixed shared library linking
  • Fixed a few *critical* bugs in GA_Duplicate()
  • Fixed bugs in strided get/put/acc routines
  • Fixed bugs in GPC support
  • Fixed numerous compilation warnings
  • Fixed numerous valgrind warnings
  • Fixed numerous configure bugs
  • Fixed numerous Makefile bugs
  • Fixed numerous bugs in Python interface
  • Fixed bug in GA_Patch_enum()
  • Fixed bug in TCGMSG-MPI nxtval()
  • Fixed latency reporting in perf.x benchmark
  • Fixed Fortran ordering bug in NGA_Scatter(), NGA_Scatter_acc(), and NGA_Gather()
  • Improved BGP configure
  • Improved TCGMSG-MPI build
  • Improved test suite
  • Improved numerous inefficiencies in Python interface

5.0.2 March 2011

  • Added support for Sun Studio compilers
  • Added support for AMD Open64 compilers
  • Changed ARMCI RMW interface to use void*
  • Changed GA_Patch_enum() to use void*
  • Fixed bugs in processor groups code
  • Fixed numerous compilation warnings
  • Fixed numerous configure bugs
  • Fixed numerous Makefile bugs
  • Fixed bug in GA_Unpack()
  • Fixed bug in GA_Dgemm() concerning transpose
  • Fixed numerous bugs in Python interface
  • Improved ga_scan_copy() and ga_scan_add()

5.0.1 January 2011

  • Fixed numerous configure bugs
  • Fixed numerous Makefile bugs
  • Fixed numerous bugs in test suite
  • Fixed atomics bug
  • Fixed numerous tascel bugs
  • Fixed bug in single complex matrix multiply
  • Fixed bug in destruction of mutexes
  • Fixed bug in process group collectives
  • Fixed bug in GA_Terminate()
  • Improved configure for NEC and HPUX

5.0 November 2010

  • Added Restricted Arrays (see user manual)
  • Added on-demand connection management for infiniband
  • Added new Python interface
  • Added Task Scheduling Library (tascel)
  • Added NGA_Scatter_flat(), NGA_Gather_flat(), NGA_Scatter_acc_flat()
  • Added NGA_Locate_nnodes()
  • Changed build system to use GNU autotools (autoconf,automake,libtool)
  • Improved scalability for fence

5.0b July 2010

  • Changed build system to use GNU autotools (autoconf,automake,libtool)

4.3 May 2010

  • Fixed BlueGene/P port
  • Fixed support for Sparse Data Operations (see GA user manual, Chapter 11, for more details)
  • Improved Portals port to scale up to 200K procs
  • Improved OpenIB port

4.2 July 2009

  • Added support for Sparse Data Operations (See GA user manual - Chapter 11 for more details)
  • Fixed BlueGene/P port
  • Improved portals port for Cray XT5
  • Improved OpenIB port

4.1 May 2008

  • Added port for Cray XT4 network
  • Added port for BlueGene/L network
  • Added port for BlueGene/P network
  • Added port for OpenIB network
  • Added MPI-SPAWN network (one-sided communication through MPI2 Dynamic Process management and Send/Recv)
  • Improved one-sided non-blocking operations

4.0 April 2006

  • Added support for multilevel parallelism: processor group awareness
  • Added support for large arrays (a terabyte Global Array now possible)
  • Improved GA_Dgemm matrix multiplication based on SRUMMA
  • Improved one-sided non-blocking operations
