U.S. Department of Energy
PNNL: High Performance Computing

Global Arrays Release Notes

Global Arrays development has moved to GitHub. Please see the release notes in the GitHub repository for the latest updates.

5.5 August 2016

  • Added port for libfabric (--with-ofi) via ComEx. This adds native support for Intel Omni-Path.
  • Numerous bug fixes.

5.4 April 2016

  • Numerous bug fixes.
  • Performed license/copyright audit of source files.
  • Removed BGML and DCMF ports. Use MPI multithreading instead.

5.4b May 2015

  • Added port for MPI progress ranks (--with-mpi-pr) via ComEx.
  • Added port for MPI multithreading with progress threads (--with-mpi-pt) via ComEx.
  • Added port for MPI multithreading (--with-mpi-mt) via ComEx.
  • Changed default network interface from SOCKETS to MPI two-sided (--with-mpi-ts) via ComEx.
  • Improved ScaLAPACK and ELPA integration.
  • Replaced EISPACK with LAPACK/BLAS.

User Guide for New Ports

This release contains a number of new communication runtime implementations which directly use MPI instead of a native communication library such as OpenIB verbs or Cray DMAPP. The basis of all of our MPI-based ports is the use of the MPI two-sided primitives (MPI_Send, MPI_Recv) to implement our ComEx/ARMCI one-sided protocols. The primary benefit of these ports is that Global Arrays and its user applications will now run on any platform where MPI is supported.

The recommended ports, in order of preference, are:

  1. --with-mpi-pr: MPI-1 with progress ranks.
  2. --with-mpi-pt: MPI-2 with progress threads.
  3. --with-mpi-ts: MPI-1 two-sided.
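As a sketch, configuring and building with the recommended progress-ranks port might look like the following (the compiler wrappers and install prefix are illustrative, not prescribed by this document):

```shell
# Configure GA with the MPI progress-ranks port via ComEx
# (mpicc/mpif77 and the prefix are illustrative choices)
./configure --with-mpi-pr CC=mpicc F77=mpif77 --prefix=$HOME/ga-install
make
make install
```

The other ports are selected the same way, substituting --with-mpi-pt or --with-mpi-ts for --with-mpi-pr.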

There are some caveats to using the new MPI ports, described below.

How to Use Progress Ranks

Your application code must not rely on MPI_COMM_WORLD directly. Instead, use the MPI communicator returned by the GA library wherever you would otherwise use the world communicator. Example code follows:


      program main
      implicit none
#include "mpi.fh"
#include "global.fh"
#include "ga-mpi.fh"
      integer comm
      integer ierr
      call mpi_init(ierr)
      call ga_initialize()
      call ga_mpi_comm(comm)
! use the returned comm as usual
      call ga_terminate()
      call mpi_finalize(ierr)
      end


#include "ga.h"
#include "ga-mpi.h"
int main(int argc, char **argv) {
    MPI_Comm comm;
    comm = GA_MPI_Comm();
    return 0;

How to Use Progress Threads

This port uses MPI_Init_thread() internally with a threading level of MPI_THREAD_MULTIPLE. It creates one progress thread per compute node, so it is advised to undersubscribe your compute nodes by one core. Your application code can remain unchanged unless it calls MPI_Init() itself, in which case GA will detect the lower MPI threading level and abort with an error.
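For illustration, an application that initializes MPI itself before GA would need to request the full threading level. The following is a minimal sketch under that assumption (error handling abbreviated; whether your MPI implementation actually grants MPI_THREAD_MULTIPLE depends on how it was built):

```c
#include <mpi.h>
#include "ga.h"

int main(int argc, char **argv) {
    int provided;
    /* Request MPI_THREAD_MULTIPLE so GA's progress thread can make MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        /* A lower threading level is the condition GA aborts on */
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    GA_Initialize();
    /* ... GA and MPI code ... */
    GA_Terminate();
    MPI_Finalize();
    return 0;
}
```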

How to Use the New Default Two-Sided Port

The MPI two-sided port is fully compatible with the MPI-1 standard. However, your application code will require additional GA_Sync() calls before and after any MPI function calls. This effectively splits user application code into blocks/epochs/phases of MPI code and GA code. Omitting these calls will likely cause your application to hang, since our two-sided port can only make communication progress inside of a GA function call.

Any application code which only makes GA function calls can remain unchanged.
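To illustrate the epoch discipline described above, here is a hedged C sketch; the MPI_Allreduce is just a stand-in for any MPI call your application might make between GA phases:

```c
#include <mpi.h>
#include "ga.h"

void mixed_phase_example(void) {
    double local = 1.0, global = 0.0;

    /* ... GA code: puts/gets/accumulates ... */

    GA_Sync(); /* close the GA epoch before entering an MPI block */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    GA_Sync(); /* synchronize again before resuming GA calls */

    /* ... more GA code ... */
}
```

The two GA_Sync() calls bracket the MPI block so that all ranks re-enter GA code together, which is what allows the two-sided port to resume making communication progress.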

5.3 February 2014

  • Fixed bug where incorrect BLAS integer size was used in ComEx.
  • Fixed incompatibilities between this and the 5.2 release with respect to NWChem.
  • Validated this software with the NWChem 6.3 sources.
  • See the 5.3b release notes for further relevant details.

5.3b December 2013

  • Added port for Portals4 (configure --with-portals4).

    When linking to the Portals4 reference implementation, it is highly recommended that the ummunotify driver is installed. Otherwise, memory registration errors and/or dropped messages may occur. This behavior can be verified using the PTL_DEBUG=1 and PTL_LOG_LEVEL=2 Portals4 environment variables.

  • Added ARMCI profiling to ComEx.
  • Updated autotool scripts to latest versions.

5.2 August 2012

  • Added the Communications Runtime for Extreme Scale (ComEx) software to the GA release bundle. ComEx provides the following features to GA:
    • Added port for Cray XE and XC (configure --with-dmapp).
    • Added port for MPI-1 two-sided (configure --with-mpi-ts).
    For additional information on ComEx, please see the ComEx website.
  • Added support for externally linkable ARMCI (configure --with-armci=...).
  • Added ability for users to select a different IB device using the ARMCI_OPENIB_DEVICE environment variable.
  • Added ability for users to specify upper bound on ARMCI shared memory. Added ARMCI_DEFAULT_SHMMAX_UBOUND which is set and parsed at configure time.
  • Changed how users link their applications. You now need "-lga -larmci" since libarmci is possibly an external dependency (see above).
  • Fixed support for Intel/QLogic IB thanks to patches from Dean Luick, Intel.
  • Improved BLAS library parsing for ACML and MKL ('-mkl').
  • Improved ScaLAPACK integration thanks to Edo Apra.

5.1.1 October 2012

  • Added a wrapper for fsync to SF library.
  • Added MA_ACCESS_INDEX_TYPE to ma library.
  • Added missing Python C sources.
  • Changed atomic operations.
  • Fixed numerous bugs for compilation on IBM AIX, as well as IBM xl compilers.
  • Fixed many warnings reported by Intel compilers.
  • Fixed integer overflow for indexing large arrays during accumulate.
  • Fixed bug in GA_Zgemm64.
  • Fixed ghosts hanging.
  • Removed a few debugging print statements from pario.

5.1 February 2012

  • Added unified "NGA" prefix for all functions
  • Added new profiling interface and weak symbol interposition support
  • Added support for struct data types using the new NGA_Register_type(), NGA_Get_field() and NGA_Put_field() functions
  • Added ga-config for 3rd party software to query compilation flags used for GA
  • Added Global Arrays in NumPy (GAiN) interface for a NumPy work-alike with a Global Arrays backend
  • Added GA_MPI_Comm() and other functions to retrieve MPI_Comm object associated with a GA processor group
  • Added MPI-MT (MPI_THREAD_MULTIPLE) port for use when a native port is not available
  • Added armci_msg_finalize() to abstract the message passing function required for application termination
  • Added ability for EAF_Open() to use MA memory operations instead of I/O operations
  • Changed ARMCI directory structure
  • Changed NGA_Add_patch() algorithm to use less memory
  • Changed tascel to no longer be part of the top-level configure (must be installed separately)
  • Changed Python base module from "ga" to "ga4py" since we now have the submodules and ga4py.gain
  • Changed autotools build to use autoconf-2.68 and libtool-2.4.2
  • Changed ARMCI Fortran sources to use integer*4 type rather than an integer size compiler flag
  • Fixed numerous configure and source bugs with our ScaLAPACK integration
  • Fixed bug in NGA_Matmul_patch()
  • Fixed numerous configure bugs
  • Fixed numerous Makefile bugs
  • Fixed support for large files
  • Improved internal code, reducing the amount of dereferenced pointers
  • Improved initialization ordering -- GA_Initialize(), ARMCI_Init(), and MPI_Init()/tcg_pbegin() can now be called in any order, relaxing the restriction that GA_Initialize() follow the messaging layer
  • Removed deprecated interconnects Myrinet GM, VIA, Mellanox Verbs API, Quadrics Elan-3, Quadrics Elan-4
  • Removed vampir support
  • Removed KSR and PARAGON support

5.0.3 February 2012

  • Added support for Cray compilers
  • Fixed shared library linking
  • Fixed a few *critical* bugs in GA_Duplicate()
  • Fixed bugs in strided get/put/acc routines
  • Fixed bugs in GPC support
  • Fixed numerous compilation warnings
  • Fixed numerous valgrind warnings
  • Fixed numerous configure bugs
  • Fixed numerous Makefile bugs
  • Fixed numerous bugs in Python interface
  • Fixed bug in GA_Patch_enum()
  • Fixed bug in TCGMSG-MPI nxtval()
  • Fixed latency reporting in perf.x benchmark
  • Fixed Fortran ordering bug in NGA_Scatter(), NGA_Scatter_acc(), and NGA_Gather()
  • Improved BGP configure
  • Improved TCGMSG-MPI build
  • Improved test suite
  • Improved numerous inefficiencies in Python interface

5.0.2 March 2011

  • Added support for Sun Studio compilers
  • Added support for AMD Open64 compilers
  • Changed ARMCI RMW interface to use void*
  • Changed GA_Patch_enum() to use void*
  • Fixed bugs in processor groups code
  • Fixed numerous compilation warnings
  • Fixed numerous configure bugs
  • Fixed numerous Makefile bugs
  • Fixed bug in GA_Unpack()
  • Fixed bug in GA_Dgemm() concerning transpose
  • Fixed numerous bugs in Python interface
  • Improved ga_scan_copy() and ga_scan_add()

5.0.1 January 2011

  • Fixed numerous configure bugs
  • Fixed numerous Makefile bugs
  • Fixed numerous bugs in test suite
  • Fixed atomics bug
  • Fixed numerous tascel bugs
  • Fixed bug in single complex matrix multiply
  • Fixed bug in destruction of mutexes
  • Fixed bug in process group collectives
  • Fixed bug in GA_Terminate()
  • Improved configure for NEC and HPUX

5.0 November 2010

  • Added Restricted Arrays (see user manual)
  • Added on-demand connection management for infiniband
  • Added new Python interface
  • Added Task Scheduling Library (tascel)
  • Added NGA_Scatter_flat(), NGA_Gather_flat(), NGA_Scatter_acc_flat()
  • Added NGA_Locate_nnodes()
  • Changed build system to use GNU autotools (autoconf,automake,libtool)
  • Improved scalability for fence

5.0b July 2010

  • Changed build system to use GNU autotools (autoconf,automake,libtool)

4.3 May 2010

  • Fixed BlueGene/P port
  • Fixed support for Sparse Data Operations (see GA user manual, Chapter 11, for more details)
  • Improved Portals port to scale up to 200K procs
  • Improved OpenIB port

4.2 July 2009

  • Added support for Sparse Data Operations (See GA user manual - Chapter 11 for more details)
  • Fixed BlueGene/P port
  • Improved portals port for Cray XT5
  • Improved OpenIB port

4.1 May 2008

  • Added port for Cray XT4 network
  • Added port for BlueGene/L network
  • Added port for BlueGene/P network
  • Added port for OpenIB network
  • Added MPI-SPAWN network (one-sided communication through MPI2 Dynamic Process management and Send/Recv)
  • Improved one-sided non-blocking operations

4.0 April 2006

  • Added support for multilevel parallelism: processor group awareness
  • Added support for large arrays (a terabyte Global Array now possible)
  • Improved GA_Dgemm matrix multiplication based on SRUMMA
  • Improved one-sided non-blocking operations
