subroutine ga_abs_value(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
Collective on the processor group inferred from the arguments.
Take the element-wise absolute value of the array.
subroutine ga_abs_value_patch(g_a, lo, hi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | lo(ndim), hi(ndim) | g_a patch coordinates | input |
Collective on the processor group inferred from the arguments.
Take the element-wise absolute value of the patch.
subroutine ga_acc(g_a, ilo, ihi, jlo, jhi, buf, ld, alpha)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
integer | ilo, ihi, jlo, jhi | input | |
double precision/complex | buf | local buffer containing data | input |
integer | ld | input | |
double precision/complex | alpha | scale argument for accumulate | input |
subroutine nga_acc(g_a, lo, hi, buf, ld, alpha)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | number of dimensions of the global array | input |
integer | lo(ndim) | array of starting indices for global array section | input |
integer | hi(ndim) | array of ending indices for global array section | input |
type | buf | local buffer containing data | output |
integer | ld(ndim-1) | array specifying leading dimensions for buffer array | input |
type | alpha | scale argument for accumulate | input |
One-sided (non-collective).
Combines data from local array buffer with data in the global array section. The local array is assumed to be have the same number of dimensions as the global array.
global array section (lo[],hi[]) += *alpha * buffer
subroutine ga_access(g_a, ilo, ihi, jlo, jhi, index, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
integer | ilo, ihi, jlo, jhi | input | |
integer | index | output | |
integer | ld | output |
subroutine nga_access(g_a, lo, hi, index, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | ndim | number of array dimensions | input |
integer | lo(ndim),hi(ndim) | patch specification | input |
integer | index | reference to local data | output |
integer | ld(ndim-1) | array of leading dimensions | output |
Local operation.
Provides access to the specified patch of array. Returns leading dimension ld and and MA-like index for the data. This routine is intended for writing new GA operations. A call to ga_access should normally follow a call to ga_distribution that returns coordinates of the patch associated with a processor. You need to make sure that the coordinates of the patch are valid (test values returned from ga_distribution).
Your code should include a MA include file, mafdecls.h.
dbl_mb(index) - for double precision data int_mb(index) - for integer data dcpl_mb(index) - for double complex data
The addressing convention refers the first element of the patch. However, you can only pass that reference to another subroutine where it could be used like a normal array, see the following example.
Example
For a given subroutine:
subroutine foo(A, nrows, ncols lda) double precision A(lda,*) integer nrows, ncols .... end
you can reference A(ilo:ihi,jlo:jhi) in the following way:
call foo(dbl_mb(index), ihi-ilo+1, jhi-jlo+1, lda)
Note that the size of the integer index must match the platform, 32-bit platforms need to use 4-byte integers and 64-bit platforms need to use 8-byte integers, regardless of whether the other integers in the program are compiled as 4 or 8-byte integers. This can create portability problems when code is being built on different platforms. Users can use the preprocessors symbol GA_ACCESS_INDEX_TYPE to correctly size the integers used as indices across different architectures. The GA_ACCESS_INDEX_TYPE is used instead of an integer declaration and the the global.fh file needs to be included in the subroutine. This will result in code that works for both 32 and 64-bit platforms without having to explicity convert the integer size.
subroutine nga_access_block(g_a, idx, index, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | ndim | number of array dimensions | input |
integer | idx | block index | input |
integer | index | reference to local data | output |
integer | ld(ndim-1) | array of leading dimensions | output |
Local operation.
This function can be used to gain direct access to the data represented by a single block in a global array with a block-cyclic data distribution. The index idx is the index of the block in the array assuming that blocks are numbered sequentially in a column-major order. A quick way of determining whether a block with index idx is held locally on a processor is to calculate whether mod(idx,nproc) equals the processor ID, where nproc is the total number of processors. Once the index has been returned, local data can be accessed as described in the documentation for nga_access. Each call to nga_access_block should be followed by a call to either nga_release_block or nga_release_update_block.
Please check the documentation for the nga_access function for more information on how to use the index returned by this subroutine to access locally held data.
subroutine nga_access_block_grid(g_a, subscript, index, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | ndim | number of array dimensions | input |
integer | subscript(ndim) | subscript of block in array | input |
integer | index | reference to local data | output |
integer | ld(ndim-1) | array of leading dimensions | output |
Local operation.
This function can be used to gain direct access to the data represented by a single block in a global array with a SCALAPACK block-cyclic data distribution that is based on an underlying processor grid. The subscript array contains the subscript of the block in the array of blocks. This subscript is based on the location of the block in a grid, each of whose dimensions is equal to the number of blocks that fit along that dimension. Once the index has been returned, local data can be accessed as described in the documentation for nga_access. Each call to nga_access_block_grid should be followed by a call to either nga_release_block_grid or nga_release_update_block_grid.
subroutine nga_access_block_segment(g_a, proc, index, len)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | proc | processor ID | input |
integer | index | reference to local data | output |
integer | len | length of data on processor | output |
Local operation.
This function can be used to gain access to the all the locally held data on a particular processor that is associated with a block-cyclic distributed array. Once the index has been returned, local data can be accessed as described in the documentation for nga_access. The parameter len is the number of data elements that are held locally. The data inside this segment has a lot of additional structure so this function is not generally useful to developers. It is primarily used inside the GA library to implement other GA routines. Each call to nga_access_block_segment should be followed by a call to either nga_release_block_segment or nga_release_update_block_segment.
subroutine nga_access_ghost_element(g_a, index, subscript, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | index | index pointing to location of element indexed by subscript() | output |
integer | subscript(ndim) | array of integers that index desired element | input |
integer | ld(ndim-1) | array of strides for local data patch. These include ghost cell widths. | output |
Local operation.
This function can be used to return a pointer to any data element in the locally held portion of the global array and can be used to directly access ghost cell data. The array subscript refers to the local index of the element relative to the origin of the local patch (which is assumed to be indexed by (0,0,...)).
To use the index returned by the nga_access_ghost_element subroutine, see the documentation on nga_access.
subroutine nga_access_ghosts(g_a, dims, index, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | dims(ndim) | array of dimensions of local patch, including ghost cells | output |
integer | index | returns an index corresponding to the origin the global array patch held locally on the processor | output |
integer | ld(ndim) | physical dimenstions of the local array patch, including ghost cells | output |
Local operation.
Provides access to the local patch of the global array. Returns leading dimension ld and and pointer for the data. This routine will provide access to the ghost cell data residing on each processor. Calls to nga_access_ghosts should normally follow a call to nga_distribution that returns coordinates of the visible data patch associated with a processor. You need to make sure that the coordinates of the patch are valid (test values returned from nga_distribution).
You can only access local data. To see how to use the index returned by this subroutine, check the documentation on nga_acces.
subroutine ga_add(alpha, g_a, beta, g_b, g_c)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | g_b | array handle | input |
integer | g_c | array handle | input |
double precision/complex/integer | alpha,beta | input |
Collective on the processor group inferred from the arguments.
The arrays (which must be the same shape and identically aligned) are added together element-wise.
c = alpha * a + beta * b;
The result (c) may replace one of the input arrays (a/b).
subroutine ga_add_constant(g_a, alpha)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
double/complex/integer/float | alpha | added value | input |
Collective on the processor group inferred from the arguments.
Add the constant pointed by alpha to each element of the array.
subroutine ga_add_constant_patch(g_a, lo, hi, alpha)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | ndim | number of dimensions | input |
integer | lo(ndim), hi(ndim) | patch coordinates | input |
double/complex/integer/float | alpha | added value | input |
Collective on the processor group inferred from the arguments.
Add the constant pointed by alpha to each element of the patch.
subroutine ga_add_diagonal(g_a, g_v)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a,g_v | array handles | input |
Collective on the processor group inferred from the arguments.
Adds the elements of the vector g_v to the diagonal of this matrix g_a.
subroutine ga_add_patch(alpha, g_a, ailo, aihi, ajlo, ajhi,
beta, g_b, bilo, bihi, bjlo, bjhi,
g_c, cilo, cihi, cjlo, cjhi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b, g_c | input | |
double precision/complex/integer | alpha, beta | scale factors patches | input |
integer | ailo, aihi, ajlo, ajhi | g_a patch coordinates | input |
integer | bilo, bihi, bjlo, bjhi | g_b patch coordinates | input |
integer | cilo, cihi, cjlo, cjhi | g_c patch coordinates | input |
subroutine nga_add_patch(alpha, g_a, alo, ahi, beta, g_b, blo, bhi
g_c, clo, chi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b, g_c | input | |
double precision/complex/integer | alpha,beta | scale factors for patches | input |
integer | ndim | number of dimensions | input |
integer | alo(ndim), ahi(ndim) | g_a patch coordinates | input |
integer | blo(ndim), bhi(ndim) | g_b patch coordinates | input |
integer | clo(ndim), chi(ndim) | g_c patch coordinates | input |
Collective on the processor group inferred from the arguments.
Patches of arrays (which must have the same number of elements) are added together element-wise.
c[ ][ ] = alpha * a[ ][ ] + beta * b[ ][ ]
void nga_alloc_gatscat_buf(nelems)
Type | Name | Description | Intent |
---|---|---|---|
integer | nelems | maximum number of elements to scatter/gather | input |
Local operation.
This function can be used to enhance the performance when the gather/scatter operations are being called multiple times in succession. If the maximum number of elements being called in any gather/scatter operation is known prior to executing a code segment, then some internal buffers used in the gather/scatter operations can be allocated beforehand instead of at every individual call. This can result in substantial performance boosts in some cases. When the buffers are no longer needed they can be removed using the corresponding free call.
logical function ga_allocate(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
logical | .TRUE. if allocation of g_a was successful | output |
Collective on the processor group inferred from the arguments.
This function allocates the memory for the global array handle originally obtained using the GA_Create_handle function. At a minimum, the GA_Set_data function must be called before the memory is allocated. Other GA_Set_xxx functions can also be called before invoking this function.
Returns True if allocation of g_a was successful.
subroutine ga_brdcst(type, buf, lenbuf, root)
Type | Name | Description | Intent |
---|---|---|---|
integer | type | input | |
byte | buf(lenbuf) | buffer contain broadcasted/received data | input/output |
integer | lenbuf | input | |
integer | root | input |
Collective on the world processor group.
Broadcast from process root to all other processes a message of length lenbuf.
This is operation is provided only for convenience purposes: it is available regardless of the message-passing library that GA is running.
subroutine ga_check_handle(g_a, string)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
character(*)* | string | message string | input |
Local operation.
Check that the global array handle g_a is valid ... if not, call ga_error with the string provided and some more info.
integer function ga_cluster_nnodes()
Local operation.
This functions returns the total number of nodes that the program is running on. On SMP architectures, this will be less than or equal to the total number of processors.
integer function ga_cluster_nodeid()
Local operation.
This function returns the node ID of the process. On SMP architectures with more than one processor per node, several processes may return the same node id.
integer function ga_cluster_nprocs(inode)
Type | Name | Description | Intent |
---|---|---|---|
integer | inode | node id | input |
Local operation.
This function returns the number of processors available on node inode.
integer function ga_cluster_proc_nodeid(proc)
Type | Name | Description | Intent |
---|---|---|---|
integer | proc | process id | input |
Local operation.
This function returns the node ID of the specified process proc. On SMP architectures with more than one processor per node, several processes may return the same node id.
integer function ga_cluster_procid(inode,iproc)
Type | Name | Description | Intent |
---|---|---|---|
integer | inode | node id | input |
integer | iproc | processor id | input |
Local operation.
This function returns the processor id associated with node inode and the local processor ID iproc. If node inode has N processors, then the value of iproc lies between 0 and N-1.
logical function ga_compare_distr(g_a, g_b)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b | array handles | input |
logical | .TRUE. if distributions are identical | output |
Collective on the processor group inferred from the arguments.
Compares distributions of two global arrays. Returns 0 if distributions are identical and 1 when they are not.
subroutine ga_copy(g_a, g_b)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | g_b | array handle | input |
Collective on the processor group inferred from the arguments.
Copies elements in array represented by g_a into the array represented by g_b. The arrays must be the same type, shape, and identically aligned.
For patch operations, the patches of arrays may be of different shapes but must have the same number of elements. Patches must be nonoverlapping (if g_a=g_b). Transposes are allowed for patch operations.
subroutine ga_copy_patch(trans, g_a, ailo, aihi, ajlo, ajhi,
g_b, bilo, bihi, bjlo, bjhi)
Type | Name | Description | Intent |
---|---|---|---|
character | trans | transpose operator | input |
integer | g_a, g_b | input | |
integer | ailo, aihi, ajlo, ajhi | g_a patch coordinates | input |
integer | bilo, bihi, bjlo, bjhi | g_b patch coordinates | input |
subroutine nga_copy_patch(trans, g_a, alo, ahi, g_b, blo, bhi)
Type | Name | Description | Intent |
---|---|---|---|
character | trans | transpose operator | input |
integer | g_a, g_b | input | |
integer | ndim | number of dimensions | input |
integer | alo(ndim), ahi(ndim) | g_a patch coordinates | input |
integer | blo(ndim), bhi(ndim) | g_b patch coordinates | input |
Collective on the processor group inferred from the arguments.
Copies elements in a patch of one array into another one. The patches of arrays may be of different shapes but must have the same number of elements. Patches must be non-overlapping (if g_a=g_b).
trans = `N' or `n' means that the transpose operator should not be applied. trans = `T' or `t' means that transpose operator should be applied.
logical function ga_create(type, dim1, dim2, array_name, chunk1, chunk2, g_a)
Type | Name | Description | Intent |
---|---|---|---|
character*(*) | array_name | a unique character string | input |
integer | type | data type (MT_DBL,MT_INT,MT_DCPL,etc.) | input |
integer | dim1,dim2 | array (dim1,dim2) as in FORTRAN | input |
integer | chunk1,chunk2 | minimum size that dimensions should be chunked up into | input |
integer | g_a | handle for future references | output |
logical function nga_create(type, ndim, dims, array_name, chunk, g_a)
Type | Name | Description | Intent |
---|---|---|---|
character*(*) | array_name | a unique character string | input |
integer | type | data type (MT_DBL,MT_INT,MT_DCPL,etc.) | input |
integer | ndim | number of array dimensions | input |
integer | dims(ndim) | array of dimensions | input |
integer | chunk(ndim) | array of chunks, each element specifies minimum size that given dimensions should be chunked up into | input |
integer | g_a | integer handle for future references | output |
logical | .TRUE. if array creation was successful | output |
Collective on the default processor group.
Creates an ndim-dimensional array using the regular distribution model and returns an integer handle representing the array.
The array can be distributed evenly or not. The control over the distribution is accomplished by specifying chunk (block) size for all or some of array dimensions. For example, for a 2-dimensional array, setting chunk[0]=dim[0] gives distribution by vertical strips (chunk[0]*dims[0]); setting chunk[1]=dim[1] gives distribution by horizontal strips (chunk[1]*dims[1]). Actual chunks will be modified so that they are at least the size of the minimum and each process has either zero or one chunk. Specifying chunk[i] as less than 1 will cause that dimension to be distributed evenly.
As a convenience, when chunk is specified as NULL, the entire array is distributed evenly.
Return value: a non-zero array handle means the call was succesful.
logical function nga_create_config(type, ndim, dims, array_name, chunk,
p_handle, g_a)
Type | Name | Description | Intent |
---|---|---|---|
character*(*) | array_name | a unique character string | input |
integer | type | data type (MT_DBL,MT_INT,MT_DCPL,etc.) | input |
integer | ndim | number of array dimensions | input |
integer | dims(ndim) | array of dimensions | input |
integer | chunk(ndim) | array of chunks, each element specifies minimum size that given dimensions should be chunked up into | input |
integer | p_handle | processor group handle | input |
integer | g_a | integer handle for future references | output |
logical | .TRUE. if array creation successful | output |
Collective on the default processor group.
Creates an ndim-dimensional array using the regular distribution model but with an explicitly specified processor group handle and returns an integer handle representing the array.
This call is essentially the same as the nga_create call, except for the processor group handle p_handle. It can also be used to create mirrored arrays.
Return value: a non-zero array handle means the call was succesful.
logical function nga_create_ghosts(type, ndim, dims, width, array_name,
chunk, g_a)
Type | Name | Description | Intent |
---|---|---|---|
character*(*) | array_name | a unique character string | input |
integer | type | data type (MT_DBL,MT_INT,MT_DCPL) | input |
integer | ndim | number of array dimensions | input |
integer | dims(ndim) | array of dimensions | input |
integer | width(ndim) | array of ghost cell widths | input |
integer | chunk(ndim) | array of chunks, each element specifies minimum size that given dimensions should be chunked up into | input |
integer | g_a | integer handle for future references | output |
logical | .TRUE. if array creation succesful | output |
Collective on the default processor group.
Creates an ndim-dimensional array with a layer of ghost cells around the visible data on each processor using the regular distribution model and returns an integer handle representing the array.
The array can be distributed evenly or not evenly. The control over the distribution is accomplished by specifying chunk (block) size for all or some of the array dimensions. For example, for a 2-dimensional array, setting chunk(1)=dim(1) gives distribution by vertical strips (chunk(1)*dims(1)); setting chunk(2)=dim(2) gives distribution by horizontal strips (chunk(2)*dims(2)). Actual chunks will be modified so that they are at least the size of the minimum and each process has either zero or one chunk. Specifying chunk(i) as < 1 will cause that dimension (i-th) to be distributed evenly. The width of the ghost cell layer in each dimension is specified using the array width(). The local data of the global array residing on each processor will have a layer width[n] ghosts cells wide on either side of the visible data along the dimension n.
Return value: a non-zero array handle means the call was successful.
logical function nga_create_ghosts_config(type, ndim, dims, width, array_name,
chunk, p_handle, g_a)
Type | Name | Description | Intent |
---|---|---|---|
character*(*) | array_name | a unique character string | input |
integer | type | data type (MT_DBL,MT_INT,MT_DCPL,etc.) | input |
integer | ndim | number of array dimensions | input |
integer | dims(ndim) | array of dimensions | input |
integer | width(ndim) | array of ghost cell widths | input |
integer | chunk(ndim) | array of chunks, each element specifies minimum size that given dimensions should be chunked up into | input |
integer | p_handle | processor group handle | input |
integer | g_a | integer handle for future references | output |
logical | .TRUE. if array creation successful | output |
Collective on the default processor group.
Creates an ndim-dimensional array with a layer of ghost cells around the visible data on each processor using the regular distribution model and an explicitly specified processor list and returns an integer handle representing the array.
This call is essentially the same as the NGA_Create_ghosts call, except for the processor list handle p_handle. It can be used to create mirrored arrays.
Return value: a non-zero array handle means the call was successful.
logical function nga_create_ghosts_irreg(type, ndim, dims, width, array_name, map, nblock, g_a)
Type | Name | Description | Intent |
---|---|---|---|
character*(*) | array_name | a unique character string | input |
integer | type | data type (MT_DBL,MT_INT,MT_DCPL,etc.) | input |
integer | ndim | number of array dimensions | input |
integer | dims(ndim) | array of dimensions | input |
integer | width(ndim) | array of ghost cell widths | input |
integer | nblock(ndim) | no. of blocks each dimension is divided into | input |
integer | map(s) | starting index for for each block; the size s is a sum of all elements of nblock array | input |
integer | g_a | integer handle for future references | output |
Collective on the default processor group.
Creates an array with ghost cells by following the user-specified distribution and returns an integer handle representing the array.
The distribution is specified as a Cartesian product of distributions for each dimension.
Figure "crghostir" below demonstrates distribution of a 2-dimensional array 8x10 on 6 (or more) processors.
nblock(2)={3,2}, the size of map array is s=5 and the array map contains the following elements map={1,3,7, 1, 6}. The distribution is nonuniform because, P1 and P4 get 20 elements each and processors P0, P2, P3, and P5 only 10 elements each.
The array width is used to control the width of the ghost cell boundary around the visible data on each processor. The local data of the Global Array residing on each processor will have a layer width[n] ghosts cells wide on either side of the visible data along the dimension n.
Return value: a non-zero array handle means the call was succesful.
logical function nga_create_ghosts_irreg_config(type, ndim,
dims, width, array_name,
map, nblock,
p_handle, g_a)
Type | Name | Description | Intent |
---|---|---|---|
character*(*) | array_name | a unique character string | input |
integer | type | data type (MT_DBL,MT_INT,MT_DCPL,etc.) | input |
integer | ndim | number of array dimensions | input |
integer | dims(ndim) | array of dimensions | input |
integer | width(ndim) | array of ghost cell widths | input |
integer | nblock(ndim) | no. of blocks each dimension is divided into | input |
integer | map(s) | starting index for for each block; the size s is a sum of all elements of nblock array | input |
integer | p_handle | processor group handle | input |
integer | g_a | integer handle for future references | output |
Collective on the default processor group.
Creates an array with ghost cells by following the user-specified distribution and returns an integer handle representing the array. The user can specify that the array is created on a particular processor group.
This call is similar to the nga_create_ghosts_irreg call.
Return value: a non-zero array handle means the call was succesful.
integer function ga_create_handle()
Collective on the default processor group.
This function returns a Global Array handle that can then be used to create a new Global Array. This is part of a new API for creating Global Arrays that is designed to replace the old interface built around the NGA_Create_xxx calls. The sequence of operations is to begin with a call to GA_Greate_handle to get a new array handle. The attributes of the array, such as dimension, size, type, etc., can then be set using successive calls to the GA_Set_xxx subroutines. When all array attributes have been set, the GA_Allocate subroutine is called and the Global Array is actually created and memory for it is allocated.
logical function ga_create_irreg(type, dim1, dim2, array_name, map1,
nblock1, map2, nblock2, g_a)
Type | Name | Description | Intent |
---|---|---|---|
character*(*) | array_name | a unique character string | input |
integer | type | data type (MT_DBL,MT_INT,MT_DCPL,etc.) | input |
integer | dim1,dim2 | array (dim1,dim2) as in FORTRAN | input |
integer | nblock1 | no. of blocks first dimension is divided into | input |
integer | nblock2 | no. of blocks second dimension is divided into | input |
integer | map1(*) | ilo for each block | input |
integer | map2(*) | jlo for each block | input |
integer | g_a | integer handle for future references | output |
logical | .TRUE. if array creation successful | output |
logical function nga_create_irreg(type, ndim, dims, array_name, map,
nblock, g_a)
Type | Name | Description | Intent |
---|---|---|---|
character*(*) | array_name | a unique character string | input |
integer | type | data type (MT_DBL,MT_INT,MT_DCPL,etc.) | input |
integer | ndim | number of array dimensions | input |
integer | dims(ndim) | array of dimensions | input |
integer | nblock(ndim) | no. of blocks each dimension is divided into | input |
integer | map(s) | starting index for for each block; the size s is a sum of all elements of nblock array | input |
integer | g_a | integer handle for future references | output |
Collective on the default processor group.
Creates an array by following the user-specified distribution and returns an integer handle representing the array.
The distribution is specified as a Cartesian product of distributions for each dimension. The array indices start at 0. For example, Figure "crirreg" below demonstrates the distribution of a 2-dimensional 8x10 array on 6 (or more) processors.
nblock(2)={3,2}, the size of the map array is s=5 and the array map contains the following elements map={1,3,7,1,6}. The distribution is nonuniform because P1 and P4 get 20 elements each and processors P0, P2, P3, and P5 only 10 elements each.
Return value: a non-zero array handle means the call was succesful.
logical function nga_create_irreg_config(type, ndim, dims, array_name,
map, nblock, p_handle, g_a)
Type | Name | Description | Intent |
---|---|---|---|
character*(*) | array_name | a unique character string | input |
integer | type | data type (MT_DBL,MT_INT,MT_DCPL,etc.) | input |
integer | ndim | number of array dimensions | input |
integer | dims(ndim) | array of dimensions | input |
integer | nblock(ndim) | no. of blocks each dimension is divided into | input |
integer | map(s) | starting index for for each block; the size s is a sum of all elements of nblock array | input |
integer | p_handle | processor group handle | input |
integer | g_a | integer handle for future references | output |
Collective on the default processor group.
Creates an array by following the user-specified distribution and an explicitly specified processor group handle and returns an integer handle representing the array.
This call is essentially the same as the nga_create_irreg call, except for the processor group handle p_handle. It can also be used to create arrays on mirrored arrays.
Return value: a non-zero array handle means the call was succesful.
logical function ga_create_mutexes(number)
Type | Name | Description | Intent |
---|---|---|---|
integer | number | input |
Collective on the world processor group.
Creates a set containing the number of mutexes. Returns 0 if the operation succeeded or 1 if it has failed. Mutex is a simple synchronization object used to protect Critical Sections. Only one set of mutexes can exist at a time. An array of mutexes can be created and destroyed as many times as needed.
Mutexes are numbered: 0, ..., number-1.
Returns: True on success, False on failure
logical function ga_destroy(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
Collective on the processor group inferred from the arguments.
Deallocates the array and frees any associated resources.
logical function ga_destroy_mutexes()
Collective on the world processor group.
Destroys the set of mutexes created with ga_create_mutexes. Returns 0 if the operation succeeded or 1 when failed.
subroutine ga_diag(g_a, g_s, g_v, eval)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | Matrix to diagonalize | input |
integer | g_s | Metric | input |
integer | g_v | Global matrix to return evecs | output |
double precision | eval(*) | Local array to return evals | output |
Collective on the processor group inferred from the arguments.
Solve the generalized eigenvalue problem returning all eigenvectors and values in ascending order. The input matrices are not overwritten or destroyed.
All eigen-values as a vector in ascending order.
subroutine ga_diag_reuse(control, g_a, g_s, g_v, eval)
Type | Name | Description | Intent |
---|---|---|---|
integer | control | Control flag | input |
integer | g_a | Matrix to diagonalize | input |
integer | g_s | Metric | input |
integer | g_v | Global matrix to return evecs | output |
double precision | eval(*) | Local array to return evals | output |
Collective on the processor group inferred from the arguments.
Solve the generalized eigenvalue problem returning all eigenvectors and values in ascending order. Recommended for REPEATED calls if g_s is unchanged. Values of the control flag:
value action/purpose 0 indicates first call to the eigensolver >0 consecutive calls (reuses factored g_s) <0 only erases factorized g_s; g_v and eval unchanged (should be called after previous use if another eigenproblem, i.e., different g_a and g_s, is to be solved)
The input matrices are not destroyed.
Returns: All eigen-values as a vector in ascending order.
subroutine ga_diag_std(g_a, g_v, eval)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | Matrix to diagonalize | input |
integer | g_v | Global matrix to return evecs | output |
double precision | eval(*) | Local array to return evals | output |
Collective on the processor group inferred from the arguments.
Solve the standard (non-generalized) eigenvalue problem returning all eigenvectors and values in the ascending order. The input matrix is neither overwritten nor destroyed.
Returns: all eigenvectors via the g_v global array, and eigenvalues as an array in ascending order
subroutine ga_distribution(g_a, iproc, ilo, ihi, jlo, jhi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | iproc | process number | input |
integer | ilo,ihi | i-range held by process iproc | output |
integer | jlo,jhi | j-range held by process iproc | output |
subroutine nga_distribution(g_a, iproc, lo, hi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | iproc | process number | input |
integer | ndim | number of dimensions | input |
integer | lo(ndim),hi(ndim) | range held by process iproc | output |
Local operation.
This function returns the bounding indices of the block owned by the process iproc. These indices are inclusive.
If no array elements are owned by process iproc, the range is returned as lo()=0 and hi()= -1 for all dimensions.
double precision function ga_ddot(g_a, g_b)
double complex function ga_zdot(g_a, g_b)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | g_b | array handle | input |
Collective on the processor group inferred from the arguments.
Computes the element-wise dot product of the two arrays which must be of the same types and same number of elements.
Return value = SUM_ij a(i,j)*b(i,j)
double precision function ga_ddot_patch (g_a, ta, ailo, aihi, ajlo, ajhi,
g_b, tb, bilo, bihi, bjlo, bjhi)
double complex function ga_zdot_patch (g_a, ta, ailo, aihi, ajlo, ajhi,
g_b, tb, bilo, bihi, bjlo, bjhi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b | input | |
integer | ailo, aihi, ajlo, ajhi | g_a patch coordinates | input |
integer | bilo, bihi, bjlo, bjhi | g_b patch coordinates | input |
character*1 | ta, tb | transpose flags | input |
double precision function nga_ddot_patch (g_a, ta, alo, ahi,
g_b, tb, bio, bhi)
double complex function nga_zdot_patch (g_a, ta, alo, ahi,
g_b, tb, blo, bhi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b | input | |
integer | alo(ndim), ahi(ndim) | g_a patch coordinates | input |
integer | blo(ndim), bhi(ndim) | g_b patch coordinates | input |
character*1 | ta, tb | transpose flags | input |
Collective on the processor group inferred from the arguments.
Computes the element-wise dot product of the two (possibly transposed) patches which must be of the same type and have the same number of elements.
logical function ga_duplicate(g_a, g_b, array_name)
Type | Name | Description | Intent |
---|---|---|---|
character*(*) | array_name | a character string | input |
integer | g_a | Integer handle for reference array | input |
integer | g_b | Integer handle for new array | output |
logical | .TRUE. if array creation successful | output |
Collective on the processor group inferred from the arguments.
Creates a new array by applying all the properties of another existing array. It returns an array handle.
Return value: a non-zero array handle means the call was succesful.
subroutine ga_elem_divide(g_a, g_b, g_c)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b | array handles | input |
integer | g_c | array handle | output |
Collective on the processor group inferred from the arguments.
Computes the element-wise quotient of the two arrays which must be of the same types and same number of elements. For two-dimensional arrays,
c(i,j) = a(i,j)/b(i,j)
The result (c) may replace one of the input arrays (a/b). If one of the elements of array g_b is zero, the quotient for the element of g_c will be set to GA_NEGATIVE_INFINITY.
subroutine ga_elem_divide_patch(g_a, alo, ahi,
g_b, blo, bhi,
g_c, clo, chi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b | array handles | input |
integer | g_c | array handle | output |
integer | ndim | number of dimensions | input |
integer | alo(ndim), ahi(ndim) | g_a patch dimensions | input |
integer | blo(ndim), bhi(ndim) | g_b patch dimensions | input |
integer | clo(ndim), chi(ndim) | g_c patch dimensions | input |
Collective on the processor group inferred from the arguments.
Computes the element-wise quotient of the two patches which must be of the same types and same number of elements. For two-dimensional arrays,
c(i,j) = a(i,j)/b(i,j)
The result (c) may replace one of the input arrays (a/b).
subroutine ga_elem_maximum(g_a, g_b, g_c)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b | array handles | input |
integer | g_c | array handle | output |
Collective on the processor group inferred from the arguments.
Computes the element-wise maximum of the two arrays which must be of the same types and same number of elements. For two dimensional arrays,
c(i,j) = max{a(i,j), b(i,j)}
The result (c) may replace one of the input arrays (a/b).
subroutine ga_elem_maximum_patch(g_a, alo, ahi,
g_b, blo, bhi,
g_c, clo, chi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b | array handles | input |
integer | g_c | array handle | output |
integer | ndim | number of dimensions | input |
integer | alo(ndim), ahi(ndim) | g_a patch dimensions | input |
integer | blo(ndim), bhi(ndim) | g_b patch dimensions | input |
integer | clo(ndim), chi(ndim) | g_c patch dimensions | input |
Collective on the processor group inferred from the arguments.
Computes the element-wise maximum of the two patches which must be of the same types and same number of elements. For two-dimensional noncomplex arrays,
c(i,j) = max{a(i,j), b(i,j)}
If the data type is complex, then
c(i,j).real = max{ |a(i,j)|, |b(i,j)| } while c(i,j).image = 0.
The result (c) may replace one of the input arrays (a/b).
subroutine ga_elem_minimum(g_a, g_b, g_c)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b | array handles | input |
integer | g_c | array handle | output |
Collective on the processor group inferred from the arguments.
Computes the element-wise minimum of the two arrays which must be of the same types and same number of elements. For two dimensional arrays,
c(i,j) = min{a(i,j), b(i,j)}
The result (c) may replace one of the input arrays (a/b).
subroutine ga_elem_minimum_patch(g_a, alo, ahi,
g_b, blo, bhi,
g_c, clo, chi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a,g_b | array handles | input |
integer | g_c | array handle | output |
integer | ndim | number of dimensions | input |
integer | alo(ndim),ahi(ndim) | g_a patch dimensions | input |
integer | blo(ndim),bhi(ndim) | g_b patch dimensions | input |
integer | clo(ndim),chi(ndim) | g_c patch dimensions | input |
Collective on the processor group inferred from the arguments.
Computes the element-wise minimum of the two patches which must be of the same types and same number of elements. For two-dimensional of noncomplex arrays,
c(i,j) = min{a(i,j), b(i,j)}
If the data type is complex, then
c(i,j).real = min{ |a(i,j)|, |b(i,j)| } while c(i,j).image = 0.
The result (c) may replace one of the input arrays (a/b).
subroutine ga_elem_multiply(g_a, g_b, g_c)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b | input array handles | input |
integer | g_c | output array handle | output |
Collective on the processor group inferred from the arguments.
Computes the element-wise product of the two arrays which must be of the same types and same number of elements. For two-dimensional arrays,
c(i, j) = a(i,j)*b(i,j)
The result (c) may replace one of the input arrays (a/b).
subroutine ga_elem_multiply_patch(g_a, alo, ahi,
g_b, blo, bhi,
g_c, clo, chi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b | array handles | input |
integer | g_c | array handle | output |
integer | ndim | number of dimensions | input |
integer | alo(ndim), ahi(ndim) | g_a patch dimensions | input |
integer | blo(ndim), bhi(ndim) | g_b patch dimensions | input |
integer | clo(ndim), chi(ndim) | g_c patch dimensions | input |
Collective on the processor group inferred from the arguments.
Computes the element-wise product of the two patches which must be of the same types and same number of elements. For two-dimensional arrays,
c(i,j) = a(i,j)*b(i,j)
The result (c) may replace one of the input arrays (a/b).
subroutine ga_error(message, code)
Type | Name | Description | Intent |
---|---|---|---|
character*1 | message(*) | input | |
integer | code | input |
Local operation.
To be called in case of an error. Print an error message and an integer value that represents an error code as well as releasing some system resources. This is the required way of aborting the program execution.
subroutine ga_fence()
One-sided (non-collective).
Blocks the calling process until all the data transfers corresponding to GA operations called after ga_init_fence complete. For example, since ga_put might return before the data reaches the final destination, ga_init_fence and ga_fence allow the process to wait until the data tranfer is fully completed:
ga_init_fence(); ga_put(g_a, ...); ga_fence();
ga_fence must be called after ga_init_fence. A barrier, ga_sync, assures the completion of all data transfers and implicitly cancels all outstanding ga_init_fence calls. ga_init_fence and ga_fence must be used in pairs, multiple calls to ga_fence require the same number of corresponding ga_init_fence calls. ga_init_fence/ga_fence pairs can be nested.
ga_fence works for multiple GA operations. For example:
ga_init_fence(); ga_put(g_a, ...); ga_scatter(g_a, ...); ga_put(g_b, ...); ga_fence();
The calling process will be blocked until data movements initiated by two calls to ga_put and one ga_scatter complete.
subroutine ga_fill(g_a, s)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
double precision/complex/integer | s | fill value | input |
Collective on the processor group inferred from the arguments.
Assign a single value to all elements in the array.
subroutine ga_fill_patch(g_a, ailo, aihi, ajlo, ajhi, s)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
double precision/complex/integer | s | fill value | input |
integer | ailo, aihi, ajlo, ajhi | g_a patch coordinates | input |
subroutine nga_fill_patch (g_a, alo, ahi, s)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
double precision/complex/integer | s | fill value | input |
integer | ndim | number of dimensions | input |
integer | alo(ndim), ahi(ndim) | g_a patch coordinates | input |
Collective on the processor group inferred from the arguments.
Fill the patch of g_a with value of `val'
void nga_free_gatscat_buf()
Local operation.
This function is used to free up internal buffers that were set with the corresponding allocation call. The buffers can be used to improve performance if multiple calls are being made to the gather/scatter operations.
subroutine ga_gather(g_a, v, i, j, n)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
double precision | v(n) | output | |
integer | i(n), j(n), n | input |
subroutine nga_gather(g_a, v, subsArray, n)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | n | number of elements | input |
type | v(n) | array containing values | output |
integer | ndim | number of array dimensions | input |
integer | subsArray(ndim,n) | array of subscripts for each element | input |
One-sided (non-collective).
Gathers array elements from a global array into a local array. The contents of the input arrays (v, subsArray) are preserved.
for (k=0; k<= n; k++) {v[k] = a[subsArray[k][0]][subsArray[k][1]][subsArray[k][2]]...;}
subroutine GA_Dgemm(ta, tb, m, n, k, alpha, g_a, g_b, beta, g_c)
subroutine GA_Sgemm(ta, tb, m, n, k, alpha, g_a, g_b, beta, g_c)
subroutine GA_Zgemm(ta, tb, m, n, k, alpha, g_a, g_b, beta, g_c)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, g_b | handles to input arrays | input |
integer | g_c | handle to output array | output |
character(1) | ta, tb | transpose operators | input |
integer | m | number of rows of op(A) and of matrix C | input |
integer | n | number of columns of op(B) and of matrix C | input |
integer | k | number of columns of op(A) and rows of matrix op(B) | input |
double precision/double complex/real | alpha, beta | scale factors | input |
Collective on the processor group inferred from the arguments.
Performs one of the matrix-matrix operations:
C := alpha*op( A )*op( B ) + beta*C,
where op( X ) is one of
op( X ) = X or op( X ) = X',
alpha and beta are scalars, and A, B, and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
On entry, transa specifies the form of op( A ) to be used in the matrix multiplication as follows:
ta = `N' or `n', op( A ) = A. ta = `T' or `t', op( A ) = A'.
subroutine ga_get(g_a, ilo, ihi, jlo, jhi, buf, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | ilo, ihi, jlo, jhi | input | |
double precision/complex/integer | buf | local buffer where data goes to | output |
integer | ld | leading dimension | input |
subroutine nga_get(g_a, lo, hi, buf, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | number of dimensions of the global array | input |
integer | lo(ndim) | array of starting indices for global array section | input |
integer | hi(ndim) | array of ending indices for global array section | input |
type | buf | local buffer array where the data goes to | output |
integer | ld(ndim-1) | array specifying leading dimensions for buffer array | input |
One-sided (non-collective).
Copies data from global array section to the local array buffer. The local array is assumed to be have the same number of dimensions as the global array. Any detected inconsistencies or errors in the input arguments are fatal.
Example: For the ga_get operation transfering data from the [10:14, 0:4] section of 2-dimensional 15x10 global array into a local buffer 5x10 array we have:
lo={10,0,} hi={14,4}, ld={10}
Figure "get" below shows the GET operation.
Return: The local array buffer.
subroutine ga_get_block_info(g_a, num_blocks, block_dims)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | num_blocks(ndim) | number of blocks along each axis | output |
integer | block_dims(ndim) | dimensions of block | output |
Local operation.
This subroutine returns information about the block-cyclic distribution associated with global array g_a. The number of blocks along each of the array axes are returned in the array num_blocks and the dimensions of the individual blocks, specified in the GA_Set_block_cyclic or GA_Set_block_cyclic_proc_grid subroutines, are returned in block_dims.
This is a local function.
logical function ga_get_debug()
Local operation.
This function returns the value of an internal flag in the GA library whose value can be set using the GA_Set_debug subroutine.
subroutine ga_get_diag(g_a, g_v)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | g_v | array handle | input |
Collective on the processor group inferred from the arguments.
Inserts the diagonal elements of this matrix g_a into the vector g_v.
subroutine nga_get_ghost_block(g_a, lo, hi, buf, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | number of dimensions of the global array | input |
integer | lo(ndim) | array of starting indices for global array section | input |
integer | hi(ndim) | array of ending indices for global array section | input |
type | buf | local buffer array where the data goes to | output |
integer | ld(ndim-1) | array specifying leading dimensions for buffer array | input |
One-sided (non-collective).
This operation behaves similarly to the GA Get operation and will default to a standard Get if the global array has no ghost cells or the request does not fit into the ghost cell region of the calling processor. It is easier to use than the NGA_Access_ghosts function but it may be slower and requires more memory. The operation will copy data from the locally held portions of the global array, including the ghost cells, if the requested block falls within the region defined by visible block held by the process plus the ghost cell region. For example, if the process holds the visible block [2:8, 2:8] with a ghost cell width that is one element deep, then a request for the block [1:9,1:9] will use the local ghost cells to fill the local buffer. In this case, the data transfer is completely local. If the request is for a block such as [1:10,1:10], which would require data from another process, then the NGA_Get_ghost_block call reverts to an ordinary NGA_Get operation and ignores the locally held ghost data.
Return: The local array buffer.
subroutine ga_igop(type, x, n, op)
subroutine ga_sgop(type, x, n, op)
subroutine ga_dgop(type, x, n, op)
subroutine ga_cgop(type, x, n, op)
subroutine ga_zgop(type, x, n, op)
Type | Name | Description | Intent |
---|---|---|---|
integer | type | this argument is deprecated | input |
integer | x(n) | local array of initial/final values | input/output |
real | x(n) | local array of initial/final values | input/output |
double precision | x(n) | local array of initial/final values | input/output |
single complex | x(n) | local array of initial/final values | input/output |
double complex | x(n) | local array of initial/final values | input/output |
character*(*) | op | global operation | input |
Collective on the world processor group.
Global OPeration.
X(1:N) is a vector present on each process. GOP `sums' elements of X accross all nodes using the commutative operator OP. The result is broadcast to all nodes. Supported operations include `+', `*', `max', `min', `absmax', `absmin'. The use of lowerecase for operators is necessary.
This operation is provided only for convenience purposes: it is available regardless of the message-passing library that GA is running with.
logical function ga_has_ghosts(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
logical | .TRUE. if array has ghost cells | output |
Collective on the processor group inferred from the arguments.
This function returns 1 if the global array has some dimensions for which the ghost cell width is greater than zero, it returns 0 otherwise.
subroutine ga_init_fence()
Local operation.
Initializes tracing of the completion status of data movement operations.
subroutine nga_initialize()
subroutine ga_initialize()
Collective on the processor group inferred from the arguments.
Allocate and initialize internal data structures in Global Arrays.
subroutine ga_initialize_ltd(limit)
Type | Name | Description | Intent |
---|---|---|---|
integer | limit | amount of memory in bytes per process | input |
Collective on the processor group inferred from the arguments.
Allocate and initialize internal data structures and set the limit for memory used in Global Arrays. The limit is per process: it is the amount of memory that the given processor can contribute to collective allocation of Global Arrays. It does not include temporary storage that GA might be allocating (and releasing) during execution of a particular operation.
*limit < 0 means "allow unlimited memory usage" in which case this operation is equivalent to GA_initialize.
subroutine ga_inquire(g_a, type, dim1, dim2)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
integer | type | output | |
integer | dim1 | output | |
integer | dim2 | output |
subroutine nga_inquire(g_a, type, ndim, dims)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | type | data type id | output |
integer | ndim | number of dimensions | output |
integer | dims(ndim) | array of dimensions | output |
Local operation.
Returns data type and dimensions of the array.
integer function ga_inquire_memory()
Returns the amount of memory (in bytes) used in the allocated global arrays on the calling processor.
subroutine ga_inquire_name(g_a, array_name)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
character*(*) | array_name | output |
Local operation.
Returns the name of an array represented by the handle g_a.
integer ga_is_mirrored(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
Local operation.
This subroutine checks if the array is a mirrored array or not. Returns 1 if it is a mirrored array, else it returns 0.
integer function ga_llt_solve(g_a, g_b)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | coefficient matrix | input |
integer | g_b | rhs/solution matrix | input/output |
Collective on the processor group inferred from the arguments.
Solves a system of linear equations
A * X = B
using the Cholesky factorization of an NxN double precision symmetric positive definite matrix A (represented by handle g_a). On successful exit B will contain the solution X.
It returns:
= 0 : successful exit > 0 : the leading minor of this order is not positive definite and the factorization could not be completed.
logical function ga_locate(g_a, i, j, owner)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | i, j | element subscript | input |
integer | owner | process id | output |
logical function nga_locate(g_a, subscript, owner)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | subscript | element subscript | input |
integer | owner | process id | output |
Local operation.
Return the GA compute process ID that 'owns' the data. If any element of subscript[] is out of bounds "-1" is returned.
logical function ga_locate_region(g_a, ilo, ihi, jlo, jhi, map, np)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ilo, ihi, jlo, jhi | bounding indices for array section | input |
integer | map(5,*) | array containing mapping information | output |
integer | np | number of processors containing data | output |
logical function nga_locate_region(g_a, lo, hi, map, proclist, np)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | ndim | number of dimensions | input |
integer | lo(ndim),hi(ndim) | region(patch) specifications | input |
integer | map(2*ndim,*) | patch ownership array | output |
integer | proclist(np) | list of processes | output |
integer | np | number of processes containing data | output |
map(1:ndim,i) | contains lower bound dimensions for part owned by process proclist(i) | input | |
map(ndim+1:2*ndim,i) | contains upper bound dimensions for part owned by process proclist(i) | input |
Local operation.
Return a list of the GA processes ID that `own' the data. Parts of the specified patch might be actually `owned' by several processes. If lo/hi are out of bounds "0" is returned, otherwise the return value is equal to the number of processes that hold the data.
map(1:ndim,i) - lo(1:ndim) map(ndim+1:2*ndim,i) - hi(1:ndim) procs(i) - processor id that owns data in patch described by lo,hi
subroutine ga_lock(mutex)
Type | Name | Description | Intent |
---|---|---|---|
integer | mutex | mutex id | input |
One-sided (non-collective).
Locks a mutex object identified by the mutex number. It is a fatal error for a process to attempt to lock a mutex which was already locked by this process.
subroutine ga_lu_solve(trans, g_a, g_b)
Type | Name | Description | Intent |
---|---|---|---|
character | trans | transpose or not transpose | input |
integer | g_a | coefficient matrix | input |
integer | g_b | rhs/solution matrix | input/output |
Collective on the processor group inferred from the arguments.
Solve the system of linear equations op(A)X = B based on the LU factorization.
op(A) = A or A' depending on the parameter trans:
trans = `N' or `n' means that the transpose operator should not be applied. trans = `T' or `t' means that the transpose operator should be applied.
Matrix A is a general real matrix. Matrix B contains possibly multiple rhs vectors. The array associated with the handle g_b is overwritten by the solution matrix X.
subroutine ga_mask_sync(first,last)
Type | Name | Description | Intent |
---|---|---|---|
logical | first | mask for prior internal synchronization | input |
logical | last | mask for post internal synchronization | input |
Collective on the default processor group.
This subroutine can be used to remove synchronization calls from around collective operations. Setting the parameter first = .false. removes the synchronization prior to the collective operation, setting last = .false. removes the synchronization call after the collective operation. This call is applicable to all collective operations. It most be invoked before each collective operation.
subroutine ga_matmul_patch (transa, transb, alpha, beta,
g_a, ailo, aihi, ajlo, ajhi,
g_b, bilo, bihi, bjlo, bjhi,
g_c, cilo, cihi, cjlo, cjhi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, ailo, aihi, ajlo, ajhi | patch of g_a | input |
integer | g_b, bilo, bihi, bjlo, bjhi | patch of g_b | input |
integer | g_c, cilo, cihi, cjlo, cjhi | patch of g_c | input |
double precision/complex | alpha, beta | scale factors | input |
character*1 | transa, transb | transpose operators | input |
subroutine nga_matmul_patch(transa, transb, alpha, beta,
g_a, alo, ahi,
g_b, blo, bhi,
g_c, clo, chi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a, alo, ahi | patch of g_a | input |
integer | g_b, blo, bhi | patch of g_b | input |
integer | g_c, clo, chi | patch of g_c | input |
double precision/complex | alpha, beta | scale factors | input |
character*1 | transa, transb | transpose operators | input |
Collective on the processor group inferred from the arguments.
ga_matmul_patch is a patch version of ga_dgemm and comes in 2-D and N-D versions. The 2-D interface performs the operation:
C[cilo:cihi,cjlo:cjhi] := alpha* AA[ailo:aihi,ajlo:ajhi] * BB[bilo:bihi,bjlo:bjhi] ) + beta*C[cilo:cihi,cjlo:cjhi],
where AA = op(A), BB = op(B), and op(X) is one of
op(X) = X or op(X) = X',
Valid values for transpose arguments: 'n', 'N', 't', 'T'. It works for both double and double complex data tape.
nga_matmul_patch is a N-dimensional patch version of ga_dgemm and is similar to the 2-D interface:
C[clo[]:chi[]] := alpha* AA[alo[]:ahi[]] * BB[blo[]:bhi[]] ) + beta*C[clo[]:chi[]],
subroutine ga_median(g_a, g_b, g_c, g_m)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input array handle | input |
integer | g_b | input array handle | input |
integer | g_c | input array handle | input |
integer | g_m | output array handle | input |
Collective on the processor group inferred from the arguments.
Computes the componentwise Median of three arrays g_a, g_b, and g_c, and stores the result in this array g_m. The result (m) may replace one of the input arrays (a/b/c).
subroutine ga_median_patch(g_a, alo, ahi,
g_b, blo, bhi,
g_c, clo, chi,
g_m, mlo, mhi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input array handle | input |
integer | g_b | input array handle | input |
integer | g_c | input array handle | input |
integer | g_m | output array handle | input |
integer | ndim | number of dimensions | input |
integer | alo(ndim),ahi(ndim) | g_a patch dimensions | input |
integer | blo(ndim),bhi(ndim) | g_b patch dimensions | input |
integer | clo(ndim),chi(ndim) | g_c patch dimensions | input |
integer | mlo(ndim),mhi(ndim) | g_m patch dimensions | input |
Collective on the processor group inferred from the arguments.
Computes the componentwise Median of three patches g_a, g_b, and g_c, and stores the result in this patch g_m. The result (m) may replace one of the input patches (a/b/c).
integer function ga_memory_avail()
Local operation.
Returns amount of memory (in bytes) left for allocation of new global arrays on the calling processor.
Note: If GA_uses_ma returns true, then GA_Memory_avail returns the lesser of the amount available under the GA limit and the amount available from MA (according to ma_inquire_avail operation). If no GA limit has been set, it returns what MA says is available.
If ( !GA_Uses_ma() && !GA_Memory_limited() ) returns < 0, indicating that the bound on currently available memory cannot be determined.
logical function ga_memory_limited()
Local operation.
Indicates if limit is set on memory usage in Global Arrays on the calling processor. "1" means "yes", "0" means "no".
Returns: True for "yes", False for "no"
integer nga_merge_distr_patch(g_a, alo, ahi, g_b, blo, bhi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a,g_b | array handles | input |
integer | alo(ndim),ahi(ndim) | g_a patch coordinates | input |
integer | blo(ndim),bhi(ndim) | g_b patch coordinates | input |
Collective on the processor group inferred from the arguments.
This function merges all copies of a patch of a mirrored array (g_a) into a patch in a distributed array (g_b).
subroutine ga_merge_mirrored(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
Collective on the processor group inferred from the arguments.
This subroutine merges mirrored arrays by adding the contents of each array across nodes. The result is that each mirrored copy of the array represented by g_a is the sum of the individual arrays before the merge operation. After the merge, all mirrored arrays are equal.
subroutine ga_nbacc(g_a, ilo, ihi, jlo, jhi, buf, ld, alpha, nbhandle)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ilo,ihi | starting indices for global array section | input |
integer | jlo,jhi | ending indices for global array section | input |
buf | local buffer array where the data is | input | |
integer | ld | leading dimension/stride/extent for buffer array | input |
double precision/complex | alpha | scale factor | input |
integer | nbhandle | non-blocking request handle | output |
subroutine nga_nbacc(g_a, lo, hi, buf, ld, alpha, nbhandle)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | number of dimensions of the global array | input |
integer | lo[ndim] | array of starting indices for global array section | input |
integer | hi[ndim] | array of ending indices for global array section | input |
buf | local buffer array where the data is | input | |
integer | ld[ndim-1] | array specifying leading dimensions/strides/extents for buffer array | input |
double precision/complex | alpha | scale factor | input |
integer | nbhandle | non-blocking request handle | output |
One-sided (non-collective).
A non-blocking version of the blocking accumulate operation. The accumulate operation can be completed locally by making a call to the wait (e.g., NGA_NbWait) routine.
Non-blocking version of ga.acc.
The accumulate operation can be completed locally by making a call to the ga.nbwait() routine.
Combines data from buffer with data in the global array patch.
The buffer array is assumed to be have the same number of dimensions as the global array. If the buffer is not contiguous, a contiguous copy will be made.
global array section (lo[],hi[]) += alpha * buffer
subroutine ga_nbget(g_a, ilo, ihi, jlo, jhi, buf, ld, nbhandle)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ilo,ihi | starting indices for global array section | input |
integer | jlo,jhi | ending indices for global array section | input |
type | buf | local buffer array where the data goes | input/output |
integer | ld | leading dimension/stride/extent for buffer array | input |
integer | nbhandle | non-blocking request handle | output |
subroutine nga_nbget(g_a, lo, hi, buf, ld, nbhandle)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | lo[ndim] | array of starting indices for global array section | input |
integer | hi[ndim] | array of ending indices for global array section | input |
type | buf | local buffer array where the data goes | input/output |
integer | ld[ndim-1] | array specifying leading dimensions/strides/extents for buffer array | input |
integer | nbhandle | non-blocking request handle | output |
One-sided (non-collective).
A non-blocking version of the blocking get operation. The get operation can be completed locally by making a call to the wait (e.g., NGA_NbWait) routine.
Copies data from global array section to the local array buffer.
The local array is assumed to be have the same number of dimensions as the global array. Any detected inconsitencies/errors in the input arguments are fatal.
Returns: The local array buffer.
subroutine ga_nblock(g_a, nblock)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | nblock[ndim] | number of partitions for each dimension | output |
Local operation.
Given a distribution of an array represented by the handle g_a, returns the number of partitions of each array dimension.
subroutine ga_nbput(g_a, ilo, ihi, jlo, jhi, buf, ld, nbhandle)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ilo,ihi | starting indices for global array section | input |
integer | jlo,jhi | ending indices for global array section | input |
typebuf | local buffer array where the data is | input | |
integer | ld | leading dimension/stride/extent for buffer array | input |
integer | nbhandle | non-blocking request handle | output |
subroutine nga_nbput(g_a, lo, hi, buf, ld, nbhandle)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | lo[ndim] | array of starting indices for global array section | input |
integer | hi[ndim] | array of ending indices for global array section | input |
typebuf | local buffer array where the data is | input | |
integer | ld[ndim-1] | array specifying leading dimensions/strides/extents for buffer array | input |
integer | nbhandle | non-blocking request handle | output |
One-sided (non-collective).
A non-blocking version of the blocking put operation. The put operation can be completed locally by making a call to the wait (e.g., NGA_NbWait) routine.
Copies data from local array buffer to the global array section.
The local array is assumed to be have the same number of dimensions as the global array. Any detected inconsitencies/errors in input arguments are fatal.
logical function ga_nbtest(nbhandle)
Type | Name | Description | Intent |
---|---|---|---|
integer | nbhandle | non-blocking request handle | input |
logical | .TRUE. if operation has completed | output |
logical function nga_nbtest(nbhandle)
Type | Name | Description | Intent |
---|---|---|---|
integer | nbhandle | non-blocking request handle | input |
logical | .TRUE. if operation has completed | output |
One-sided (non-collective).
This function tests a non-blocking one-sided operation for completion. If true, the function is completed locally and the buffer is available for either use or reuse, depending on the operation. Once this operation has returned true, there is no need to call nbwait on the handle. The test function is properly implemented only on the Progress Ranks and RMA runtimes. For other runtimes, it defaults to the non-blocking wait function and always returns true.
subroutine ga_nbwait(nbhandle)
Type | Name | Description | Intent |
---|---|---|---|
integer | nbhandle | non-blocking request handle | input |
subroutine nga_nbwait(nbhandle)
Type | Name | Description | Intent |
---|---|---|---|
integer | nbhandle | non-blocking request handle | input |
One-sided (non-collective).
This function completes a non-blocking one-sided operation locally. Waiting on a nonblocking put or an accumulate operation assures that data was injected into the network and the user buffer can now be reused. Completing a get operation assures data has arrived into the user memory and is ready for use. The wait operation ensures only local completion. Not all runtimes support true non-blocking capability. For those that don't all operations are blocking and the wait function is a no-op.
Unlike their blocking counterparts, the nonblocking operations are not ordered with respect to the destination. Performance being one reason, the other reason is that ensuring ordering would incur additional and possibly unnecessary overhead on applications that do not require their operations to be ordered. For cases where ordering is necessary, it can be done by calling a fence operation. The fence operation is provided to the user to confirm remote completion if needed.
integer function ga_ndim(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input |
Local operation.
Returns the number of dimensions in the array represented by the handle g_a.
integer function ga_nnodes()
Local operation.
Returns the number of the GA compute (user) processes.
integer function ga_nodeid()
Local operation.
Returns the GA process id (0, ..., ga_Nnodes()-1) of the requesting compute process.
subroutine ga_norm_infinity(g_a, nm)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
double precision | nm | matrix/vector infinity-norm value | output |
Collective on the processor group inferred from the arguments.
Computes the infinity-norm of the matrix or vector g_a.
subroutine ga_norm1(g_a, nm)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
double precision | nm | matrix/vector 1-norm value | output |
Collective on the processor group inferred from the arguments.
Computes the 1-norm of the matrix or vector g_a.
logical function ga_overlay(g_a, g_p)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | handle for overlay array | input |
integer | g_p | handle for parent array | output |
Collective on the processor group inferred from the arguments.
The overlay function is designed to allow users to create a global array on top of an existing global array. This can be used in situations where it is desirable to create and destroy a large number of global arrays in rapid succession and where the size of these global arrays can be bounded beforehand. A large global array is created at the start using conventional create or allocate calls and then is used as the parent for new global arrays that are allocated using the overlay call. This approach removes some collectives and memory allocation and deallocation calls from the process of creating a global array and should result in improved performance. Note that the parent global array should not be used while another array is overlayed on top of it.
Collective on the processor group inferred from the arguments.
subroutine ga_pack(g_src, g_dest, g_mask, lo, hi, icount)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_src | handle for source array | input |
integer | g_dest | handle for destination array | output |
integer | g_mask | handle for integer array representing a bit mask | input |
integer | lo,hi | low and high values of range on which operation is performed | input |
integer | icount | number of values in compressed array | output |
Collective on the processor group inferred from the arguments.
The pack subroutine is designed to compress the values in the source vector g_src into a smaller destination array g_dest based on the values in an integer mask array g_mask. The values lo and hi denote the range of elements that should be compressed and icount is a variable that on output lists the number of values placed in the compressed array. This operation is the complement of the GA_Unpack operation. An example is shown below
GA_Pack(g_src, g_dest, g_mask, 1, n, &icount); g_mask: 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 g_src: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 g_dest: 1 7 9 12 15 16 icount: 6
The current implementation requires that the distribution of the g_mask array matches the distribution of the g_src array.
subroutine ga_patch_enum(g_a, lo, hi, start, inc)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | lo,hi | low and high values of array patch | input |
start | integer/double precision/complex starting value of enumeration | input | |
inc | integer/double precision/complex increment value | input |
Collective on the processor group inferred from the arguments.
This subroutine enumerates the values of an array between elements lo and hi starting with the value start and incrementing each subsequent value by inc. This operation is only applicable to 1-dimensional arrays. An example of its use is shown below:
GA_Patch_enum(g_a, 1, n, 7, 2); g_a: 7 9 11 13 15 17 19 21 23 ...
subroutine nga_periodic_acc(g_a, lo, hi, buf, ld, alpha)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | number of dimensions of the global array | input |
integer | lo(ndim) | array of starting indices for global array section | input |
integer | hi(ndim) | array of ending indices for global array section | input |
type | buf | local buffer array where the local data is | output |
integer | ld(ndim-1) | array specifying leading dimensions for buffer array | input |
type | alpha | scale argument for accumulate | input |
One-sided (non-collective).
Same as nga_acc except the indices can extend beyond the array boundary/dimensions in which case the library wraps them around. For Python, this is the periodic version of ga.acc.
Combines data from buffer with data in the global array patch.
The buffer array is assumed to be have the same number of dimensions as the global array. If the buffer is not contiguous, a contiguous copy will be made.
global array section (lo[],hi[]) += alpha * buffer
subroutine nga_periodic_get(g_a, lo, hi, buf, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | number of dimensions of the global array | input |
integer | lo(ndim) | array of starting indices for global array section | input |
integer | hi(ndim) | array of ending indices for global array section | input |
type | buf | local buffer array where the data goes to | output |
integer | ld(ndim-1) | array specifying leading dimensions for buffer array | input |
One-sided (non-collective).
Same as nga_get except the indices can extend beyond the array boundary/dimensions in which case the library wraps them around.
The local array is assumed to be have the same number of dimensions as the global array. Any detected inconsitencies/errors in the input arguments are fatal.
Returns: The local Array buffer.
subroutine nga_periodic_put(g_a, lo, hi, buf, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | number of dimensions of the global array | input |
integer | lo(ndim) | array of starting indices for global array section | input |
integer | hi(ndim) | array of ending indices for global array section | input |
type | buf | local buffer array where the data comes from | output |
integer | ld(ndim-1) | array specifying leading dimensions for buffer array | input |
One-sided (non-collective).
Same as nga_put except the indices can extend beyond the array boundary/dimensions in which case the library wraps them around. The indices can extend beyond the array boundary/dimensions in which case the libray wraps them around. Copies data from local array buffer to the global array section. The local array is assumed to be have the same number of dimensions as the global array. Any detected inconsitencies/errors in input arguments are fatal.
subroutine ga_pgroup_brdcst(p_handle, type, buf, lenbuf, root)
Type | Name | Description | Intent |
---|---|---|---|
integer | p_handle | processor group handle | input |
integer | type | message index | input |
byte | buf(lenbuf) | local message buffer | input/output |
integer | lenbuf | length of message | input |
integer | root | processor sending message | input |
Collective on the processor group inferred from the arguments.
Broadcast data from processor specified by root to all other processors in the processor group specified by p_handle. The length of the message in bytes is specified by lenbuf. The initial and broadcasted data can be found in the buffer specified by the pointer buf.
If the buffer is not contiguous, an error is raised. This operation is provided only for convenience purposes: it is available regardless of the message-passing library that GA is running with.
Returns: The buffer in case a temporary was passed in.
integer function ga_pgroup_create(list, size)
Type | Name | Description | Intent |
---|---|---|---|
integer | size | number of processors in group | input |
integer | list(size) | list of processors in processor group | input |
integer | pgroup handle | output |
Collective on the default processor group.
This command is used to create a processor group. At present, it must be invoked by all processors in the current default processor group. The list of processors use the indexing scheme of the default processor group. If the default processor group is the world group, then these indices are the usual processor indices. This function returns a process group handle that can be used to reference this group by other functions.
logical function ga_pgroup_destroy(p_handle)
Type | Name | Description | Intent |
---|---|---|---|
integer | p_handle | processor group handle | input |
integer | .FALSE. if processor group was not previously active | output |
Collective on the processor group inferred from the arguments.
This command is used to free up a processor group handle. It returns 0 if the processor group handle was not previously active.
integer function ga_pgroup_duplicate(p_handle)
Type | Name | Description | Intent |
---|---|---|---|
integer | p_handle | processor group handle | input |
integer | handle to new processor group | output |
Return a copy of an existing processor group.
Collective on the processor group inferred from the arguments.
integer function ga_pgroup_get_default()
Local operation.
This function will return a handle to the default processor group, which can then be used to create a global array using one of the NGA_create_*_config or GA_Set_pgroup calls.
integer function ga_pgroup_get_mirror()
Local operation.
This function will return a handle to the mirrored processor group, which can then be used to create a global array using one of the NGA_create_*_config or GA_Set_pgroup calls.
integer function ga_pgroup_get_world()
Local operation.
This function will return a handle to the world processor group, which can then be used to create a global array using one of the NGA_create_*_config or GA_Set_pgroup calls.
subroutine ga_pgroup_igop(p_handle, type, buf, n, op)
subroutine ga_pgroup_sgop(p_handle, type, buf, n, op)
subroutine ga_pgroup_dgop(p_handle, type, buf, n, op)
subroutine ga_pgroup_cgop(p_handle, type, buf, n, op)
subroutine ga_pgroup_zgop(p_handle, type, buf, n, op)
Type | Name | Description | Intent |
---|---|---|---|
integer | p_handle | processor group handle | input |
integer | type | message index (deprecated) | input |
integer | buf(n) | array | input/output |
real | buf(n) | array | input/output |
double precision | buf(n) | array | input/output |
single complex | buf(n) | array | input/output |
double complex | buf(n) | array | input/output |
integer | n | number elements in array | input |
character*(*) | op | operation on data | input |
Collective on the processor group inferred from the arguments.
The buf[n] is an array present on each processor in the processor group p_handle. The GA_Pgroup_dgop `sums' all elements in buf[n] across all processors in the group specified by p_handle using the commutative operation specified by the character string op. The result is broadcast to all processor in p_handle. Allowed strings are `+', `*', `max', `min', `absmax', `absmin'. The use of lowerecase for operators is necessary.
integer function ga_pgroup_nnodes(p_handle)
Type | Name | Description | Intent |
---|---|---|---|
integer | p_handle | processor group handle | input |
Local operation.
This function returns the number of processors contained in the group specified by p_handle.
Returns the number of processors contained in the group specified by pgroup.
integer function ga_pgroup_nodeid(p_handle)
Type | Name | Description | Intent |
---|---|---|---|
integer | p_handle | processor group handle | input |
Local operation.
This function returns the relative index of the processor in the processor group specified by p_handle. This index will generally differ from the absolute processor index returned by GA_Nodeid if the processor group is not the world group.
Returns the relative index of the processor in the processor group specified by pgroup.
integer function ga_pgroup_self()
Type | Name | Description | Intent |
---|---|---|---|
integer | handle to new processor group | output |
Return a processor group that only contains the calling process.
One-sided (non-collective).
subroutine ga_pgroup_set_default(p_handle)
Type | Name | Description | Intent |
---|---|---|---|
integer | p_handle | processor group handle | input |
Collective on the processor group inferred from the arguments.
This function can be used to reset the default processor group on a collection of processors. All processors in the group referenced by p_handle must make a call to this function. Any standard global array call that is made after resetting the default processor group will be restricted to processors in that group. Global arrays that are created after resetting the default processor group will only be defined on that group and global operations, such as GA_Sync or GA_Igop, and will be restricted to processors in that group. The GA_Pgroup_set_default call can be used to rapidly convert large applications, written with GA, into routines that run on processor groups.
The default processor group can be overridden by using GA calls that require an explicit group handle as one of the arguments.
subroutine ga_pgroup_sync(p_handle)
Type | Name | Description | Intent |
---|---|---|---|
integer | p_handle | processor group handle | input |
Collective on the processor group inferred from the arguments.
This operation executes a synchronization group across the processors in the processor group specified by p_handle. Nodes outside this group are unaffected.
subroutine ga_print(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input |
Collective on the processor group inferred from the arguments.
Prints an entire array to the standard output.
subroutine ga_print_distribution(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input |
Collective on the processor group inferred from the arguments.
Prints the array distribution.
Collective on the processor group inferred from the arguments.
Prints an entire array to a file.
subroutine ga_print_patch(g_a,ilo,ihi,jlo,jhi,pretty)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
integer | ilo,ihi,jlo,jhi | input | |
integer | pretty | input |
Collective on the processor group inferred from the arguments.
Prints a patch of g_a array to the standard output. If the variable pretty has the value 0 then output is printed in a dense fashion. If pretty has the value 1 then output is formatted and rows/columns are labeled.
subroutine ga_print_stats()
Local operation.
This non-collective (MIMD) operation prints information about:
subroutine ga_proc_topology(g_a, proc, prow, pcol)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
integer | proc | input | |
integer | prow, pcol | output |
Local operation.
Based on the distribution of an array associated with handle g_a, determines coordinates of the specified processor in the virtual processor grid corresponding to the distribution of array g_a. The numbering starts from 0. The values of -1 means that the processor doesn't "own" any section of the array represented by g_a.
subroutine ga_put(g_a, ilo, ihi, jlo, jhi, buf, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
integer | ilo, ihi, jlo, jhi | bounding indices for global array section | input |
double precision/complex/integer | buf | local buffer where data comes from | output |
integer | ld | input |
subroutine nga_put(g_a, lo, hi, buf, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | number of dimensions of the global array | input |
integer | lo(ndim) | array of starting indices for global array section | input |
integer | hi(ndim) | array of ending indices for global array section | input |
type | buf | local buffer array where the data comes from | output |
integer | ld(ndim-1) | array specifying leading dimensions for buffer array | input |
One-sided (non-collective).
Copies data from the local array buffer to the global array section. The local array is assumed to have the same number of dimensions as the global array. Any detected inconsistencies or errors in input arguments are fatal.
integer function ga_read_inc(g_a, i, j, inc)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
integer | i, j, inc | input |
integer function nga_read_inc(g_a, subscript, inc)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
subscript | (ndim) | subscript array for the referenced element | input |
inc | amount element is incremented after read | input |
One-sided (non-collective).
Atomically read and increment an element in an integer array.
*BEGIN CRITICAL SECTION* old_value = a(subscript) a(subscript) += inc *END CRITICAL SECTION* return old_value
subroutine ga_recip(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
Collective on the processor group inferred from the arguments.
Take the element-wise reciprocal of the array.
subroutine ga_recip_patch(g_a, lo, hi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | ndim | number of dimensions | input |
integer | lo(ndim), hi(ndim) | patch coordinates | input |
Collective on the processor group inferred from the arguments.
Take element-wise reciprocal of the patch.
subroutine ga_release(g_a, ilo, ihi, jlo, jhi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
integer | ilo, ihi, jlo, jhi | input |
subroutine nga_release(g_a, lo, hi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | ndim | number of array dimensions | input |
integer | lo(ndim),hi(ndim) | patch specification | input |
Local operation.
Releases access to a global array when the data was read only.
Your code should look like:
NGA_Distribution(g_a, myproc, lo,hi); NGA_Access(g_a, lo, hi, \&ptr, ld);GA_Release(g_a, lo, hi);
NOTE: see restrictions specified for ga_access.
subroutine nga_release_block(g_a, index)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | index | block index | input |
Local operation.
Releases access to the block of data specified by the integer index when data was accessed as read only. This is only applicable to block-cyclic data distributions created using the simple block-cyclic distribution.
subroutine nga_release_block_grid(g_a, subscript)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | subscript(ndim) | indices of block in array | input |
Local operation.
Releases access to the block of data specified by the subscript array when data was accessed as read only. This is only applicable to block-cyclic data distributions created using the SCALAPACK data distribution.
subroutine nga_release_block_segment(g_a, iproc)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | iproc | processor ID | input |
Local operation.
Releases access to the block of locally held data for a block-cyclic array, when data was accessed as read-only.
subroutine nga_release_ghost_element(g_a, subscript)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | subscript(ndim) | element subscript | input |
Local operation.
Releases access to the locally held data for an array with ghost elements, when data was accessed as read-only.
subroutine nga_release_ghosts(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
Local operation.
Releases access to the locally held block of data containing ghost elements, when data was accessed as read-only.
subroutine ga_release_update(g_a, ilo, ihi, jlo, jhi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
integer | ilo, ihi, jlo, jhi | input |
subroutine nga_release_update(g_a, lo, hi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | ndim | number of array dimensions | input |
integer | lo(ndim),hi(ndim) | patch specification | input |
Local operation.
Releases access to the data. It must be used if the data was accessed for writing.
NOTE: see restrictions specified for ga_access.
subroutine nga_release_update_block(g_a, index)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | index | block index | input |
Local operation.
Releases access to the block of data specified by the integer index when data was accessed in read-write mode. This is only applicable to block-cyclic data distributions created using the simple block-cyclic distribution.
subroutine nga_release_update_block_grid(g_a, subscript)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | subscript(ndim) | indices of block in array | input |
Local operation.
Releases access to the block of data specified by the subscript array when data was accessed in read-write mode. This is only applicable to block-cyclic data distributions created using the SCALAPACK data distribution.
subroutine nga_release_update_block_segment(g_a, iproc)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | iproc | processor ID | input |
Local operation.
Releases access to the block of locally held data for a block-cyclic array, when data was accessed as read-only.
subroutine nga_release_update_ghost_element(g_a, subscript)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | subscript(ndim) | element subscript | input |
Local operation.
Releases access to the locally held data for an array with ghost elements, when data was accessed in read-write mode.
subroutine nga_release_update_ghosts(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
Local operation.
Releases access to the locally held block of data containing ghost elements, when data was accessed in read-write mode.
subroutine ga_scale(g_a, s)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
double precision/complex/integer | s | scale factor | input |
Collective on the processor group inferred from the arguments.
Scales an array by the constant s. Note that the library is unable to detect errors when the pointed value is of a different type than the array.
subroutine ga_scale_cols(g_a, g_v)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a,g_v | array handles | input |
Collective on the processor group inferred from the arguments.
Scales the columns of this matrix g_a using the vector g_v.
subroutine ga_scale_patch(g_a, ailo, aihi, ajlo, ajhi, s)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
double precision/complex/integer | s | scale factor | input |
integer | ailo, aihi, ajlo, ajhi | g_a patch coordinates | input |
subroutine nga_scale_patch(g_a, alo, ahi, s)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
double precision/complex/integer | s | scale factor | input |
integer | ndim | number of dimensions | input |
integer | alo(ndim), ahi(ndim) | g_a patch coordinates | input |
Collective on the processor group inferred from the arguments.
Scale an array by the factor `val'
subroutine ga_scale_rows(g_a, g_v)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a,g_v | array handles | input |
Collective on the processor group inferred from the arguments.
Scales the rows of this matrix g_a using the vector g_v.
subroutine ga_scan_add(g_src, g_dest, g_mask, lo, hi, excl)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_src | handle for source array | input |
integer | g_dest | handle for destination array | output |
integer | g_mask | handle for integer array representing a bit mask | input |
integer | lo,hi | low and high values of range on which operation is performed | input |
integer | excl | value to signify if masked values are included in add | input |
Collective on the processor group inferred from the arguments.
This operation will add successive elements in a source vector g_src and put the results in a destination vector g_dest. The addition will restart based on the values of the integer mask vector g_mask. The scan is performed within the range specified by the integer values lo and hi. Note that this operation can only be applied to 1-dimensional arrays. The excl flag determines whether the sum starts with the value in the source vector corresponding to the location of a 1 in the mask vector (excl=0) or whether the first value is set equal to 0 (excl=1). Some examples of this operation are given below.
GA_Scan_add(g_src, g_dest, g_mask, 1, n, 0); g_mask: 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 g_src: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 g_dest: 1 3 6 10 16 21 7 15 9 19 30 12 25 39 15 16 33 GA_Scan_add(g_src, g_dest, g_mask, 1, n, 1); g_mask: 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 g_src: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 g_dest: 0 1 3 6 10 15 0 7 0 9 19 0 12 25 0 0 16
subroutine ga_scan_copy(g_src, g_dest, g_mask, lo, hi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_src | handle for source array | input |
integer | g_dest | handle for destination array | output |
integer | g_mask | handle for integer array representing a bit mask | input |
integer | lo,hi | low and high values of range on which operation is performed | input |
Collective on the processor group inferred from the arguments.
This subroutine does a segmented scan-copy of values in the source array g_src into a destination array g_dest with segments defined by values in the integer mask array g_mask. The scan-copy operation is only applied to the range between the lo and hi indices. This operation is restriced to 1-dimensional arrays. The resulting destination array will consist of segments of consecutive elements with the same value. An example is shown below.
GA_Scan_copy(g_src, g_dest, g_mask, 1, n); g_mask: 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 g_src: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 g_dest: 1 1 1 1 1 1 7 7 9 9 9 12 12 12 15 16 16
subroutine ga_scatter(g_a, v, i, j, n)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
double precision | v(n) | vector of values to be scattered | input |
integer | i(n), j(n) | array of subscripts for each destination element | input |
integer | n | number of elements to be scattered | input |
subroutine nga_scatter(g_a, v, subsArray, n)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | n | number of elements | input |
type | v(n) | array containing values | input |
integer | ndim | number of array dimensions | input |
integer | subsArray(ndim,n) | array of subscripts for each element | input |
One-sided (non-collective).
Scatters array elements into a global array. The contents of the input arrays (v,subsArray) are preserved.
for (k=0; k<= n; k++) {a[[subsArray[k][0]][subsArray[k][1]][subsArray[k][2]]... = v[k];}
subroutine ga_scatter_acc(g_a, v, i, j, n, alpha)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | n | number of elements | input |
type | v(n) | array containing value | input |
integer | i(n),j(n) | arrays of indices | input |
double precision/complex | alpha | multiplicative value | input |
subroutine nga_scatter_acc(g_a, v, subsArray, n, alpha)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | n | number of elements | input |
type | v(n) | array containing value | input |
integer | ndim | number of array dimensions | input |
integer | subsArray(ndim,n) | array of subscripts | input |
double precision/complex | alpha | multiplicative value | input |
One-sided (non-collective).
Scatters array elements from a local array into a global array. Adds values from the local array to existing values in the global array after multiplying by alpha. The contents of the input arrays (v, subsArray) are preserved.
for (k=0; k<= n; k++) {a[subsArray[k][0]][subsArray[k][1]][subsArray[k][2]]... += v[k];}
Like scatter, but adds values to existing values in the global array after multiplying by alpha.
subroutine nga_select_elem(g_a, op, val, index)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle Control | input |
character(*)* | op | operator {`min',`max'} | input |
val | address where selected value should be stored | output | |
integer | index(ndim) | array index for the selected element | output |
Collective on the processor group inferred from the arguments.
Returns the value and index for an element that is selected by the specified operator ("min" or "max") in a global array corresponding to g_a handle.
subroutine ga_set_array_name(g_a, name)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
character*(*) | name | a unique character string | input |
Collective on the processor group inferred from the arguments.
This function can be used to assign a unique character string name to a Global Array handle that was obtained using the GA_Create_handle function.
subroutine ga_set_block_cyclic(g_a, dims)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | dims(ndim) | array of block dimensions | input |
Collective on the processor group inferred from the arguments.
This subroutine is used to create a global array with a simple block-cyclic data distribution. The array is broken up into blocks of size dims and each block is numbered sequentially using a column major indexing scheme. The blocks are then assigned in a simple round-robin fashion to processors.
Figure "stblkcy" below illustrates an array containing 25 blocks distributed on 4 processors.
Blocks at the edge of the array may be smaller than the block size specified in dims. In the example below, blocks 4, 9, 14, 19, 20, 21, 22, 23, and 24 might be smaller than the remaining blocks. Most global array operations are insensitive to whether or not a block-cyclic data distribution is used, although performance may be slower in some cases if the global array is using a block-cyclic data distribution. Individual data blocks can be accessesed using the block-cyclic access functions.
subroutine ga_set_block_cyclic_proc_grid(g_a, dims, proc_grid)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | dims(ndim) | array of block dimensions | input |
Collective on the processor group inferred from the arguments.
This subroutine is used to create a global array with a SCALAPACK-type block cyclic data distribution. The user specifies the dimensions of the processor grid in the array proc_grid. The product of the processor grid dimensions must equal the number of total number of processors and the number of dimensions in the processor grid must be the same as the number of dimensions in the global array. The data blocks are mapped onto the processor grid in a cyclic manner along each of the processor grid axes.
Figure "setblkcyprocgrid" below illustrates an array consisting of 25 data blocks distributed on 6 processors.
The 6 processors are configured in a 3 by 2 processor grid. Blocks at the edge of the array may be smaller than the block size specified in dims. Most global array operations are insensitive to whether or not a block-cyclic data distribution is used, although performance may be slower in some cases if the global array is using a block-cyclic data distribution. Individual data blocks can be accessesed using the block-cyclic access functions.
subroutine ga_set_chunk(g_a, chunk)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | chunk(ndim) | array of chunk widths | input |
Collective on the processor group inferred from the arguments.
This function is used to set the chunk array for a global array handle that was obtained using the GA_Create_handle function. The chunk array is used to determine the minimum number of array elements assigned to each processor along each coordinate direction.
subroutine ga_set_data (g_a, ndim, dims, type)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | dimension of array | input |
integer | dims(ndim) | array dimensions | input |
integer | type | data type (MT_DBL,MT_INT,etc.) | input |
Collective on the processor group inferred from the arguments.
This function can be used to set the array dimension, the coordinate dimensions, and the data type assigned to a Global Array handle obtained using the GA_Create_handle function.
subroutine ga_set_debug(flag)
Type | Name | Description | Intent |
---|---|---|---|
logical | flag | value to set internal debug flag | input |
Local operation.
This function sets an internal flag in the GA library to either true or false. The value of this flag can be recovered at any time using the GA_Get_debug function. The flag is set to false when the the GA library is initialized. This can be useful in a number of debugging situations, especially when examining the behavior of routines that are called in multiple locations in a code.
subroutine ga_set_diagonal(g_a, g_v)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a,g_v | array handles | input |
Collective on the processor group inferred from the arguments.
Sets the diagonal elements of this matrix g_a with the elements of the vector g_v.
subroutine ga_set_ghosts(g_a, width)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | width(ndim) | array of ghost cell widths | input |
Collective on the processor group inferred from the arguments.
This function can be used to set the ghost cell widths for a global array handle that was obtained using the GA_Create_handle function. The ghosts cells widths indicate how many ghost cells are used to pad the locally held array data along each dimension. The padding can be set independently for each coordinate dimension.
subroutine ga_set_irreg_distr(g_a, mapc, nblock)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | map(s) | starting index for for each block; the size s is a sum of all elements of nblock array | input |
integer | nblock(ndim) | no. of blocks each dimension is divided into | input |
Collective on the processor group inferred from the arguments.
This function can be used to partition the array data among the individual processors for a global array handle obtained using the GA_Create_handle function.
The distribution is specified as a Cartesian product of distributions for each dimension. For example, the following figure demonstrates the distribution of a 2-dimensional array 8x10 on 6 processors. nblock(2)={3,2}, the size of mapc array is s=5 and array mapc contains the following elements mapc={1, 3, 7, 1, 6}. The distribution is nonuniform because, P1 and P4 get 20 elements each and processors P0, P2, P3, and P5 only 10 elements each.
The array width() is used to control the width of the ghost cell boundary around the visible data on each processor. The local data of the global array residing on each processor will have a layer width(n) ghosts cells wide on either side of the visible data along the dimension n.
An example is shown in Figure "setirregdist" below .
subroutine ga_set_memory_dev(g_a, device)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | character*(*) | character string specifying the memory type | input |
Collective on the processor group inferred from the arguments.
This function allows users to specify what type of memory is used for allocating global arrays. This functionality relies on the Simpified Interface for Complex Memory library, available at https://github.com/lanl/SICM. If Global Arrays is not linked to SICM, then this property is ignored and the array will be allocated on ordinary memory. This should still produce correct code, but performance may be lower.
The type of memory available depends on the hardware being used. Currently supported options are "dram" (this is widely available but only corresponds to regular memory), "knl_hbm", and "ppc_hbm". The "hbm" options are high-bandwidth memory on the KNL and Power PC chips, respectively. If Global Arrays has not been compiled using SICM, the memory device option is ignored and a conventional global array is created.
subroutine ga_set_memory_limit(limit)
Type | Name | Description | Intent |
---|---|---|---|
integer | limit | the amount of memory in bytes per process | input |
Local operation.
Sets the amount of memory to be used (in bytes) per process
subroutine ga_set_pgroup(g_a, p_handle)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | p_handle | processor group handle | input |
Collective on the processor group inferred from the arguments.
This function can be used to set the processor configuration assigned to a global array handle that was obtained using the GA_Create_handle function. It can be used to create mirrored arrays by using the mirrored array processor configuration in this function call. It can also be used to create an array on a processor group by using a processor group handle in this call.
subroutine ga_set_property(g_a, property)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | character*(*) | character string representing the property | input |
Collective on the processor group inferred from the arguments.
This function allows users to assign properties to an array if there is advanced knowledge that an array will be used in a certain way. The two properties currently supported are "read_only" and "read_cache". These can both be set after the global array is allocated.
The read_only property is similar to creating a mirrored array in that the global array is replicated across an SMP node. This speeds up access to array data since all transfers can occur by shared memory copies on the same node. However, memory utilization increases since the global array is replicated across nodes even though it is distributed within a node. Setting the read_only property results in significant overhead since data for the array must be copied and redistributed. However, if the array is only used for read operations after initialization, this setting may be usefull in applications.
The read_cache property is useful when an application may be repeatedly accessing the same data. The read_cache property stores the recent blocks of data obtained via "get" operations. If a subsequent "get" requests the same data or a subset of the data, the data is obtained from the stored request instead of requesting it over the network. This can speed up performance if there are many duplicate requests for data. The total number of requests that are cached is limited and requests are stored in the cache on a first in, first out basis.
subroutine ga_set_restricted(g_a, list, nproc)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | list(nproc) | list of processor IDs that contain data | input |
integer | nproc | number of processors that contain data | input |
Collective on the processor group inferred from the arguments.
This function restricts data in the global array g_a to only the nproc processors listed in the array list. The value of nproc must be less than or equal to the number of available processors. If this call is used in conjunction with GA_Set_irreg_distr, then the decomposition in the GA_Set_irreg_distr call must be done assuming that the number of processors is nproc. The data that ordinarily would be mapped to process 0 is mapped to the process in list[0], the data that would be mapped to process 1 will be mapped to list[1], etc. This can be used to remap the data distribution to different processors, even if nproc equals the number of available processors.
subroutine ga_set_restricted_range(g_a, lo_proc, hi_proc)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | lo_proc | range of processors (inclusive) that contain data | input |
integer | hi_proc | range of processors (inclusive) that contain data | input |
Collective on the processor group inferred from the arguments.
This function restricts data in the global array to the range of processors beginning with lo_proc and ending with hi_proc. Both lo_proc and hi_proc must be less than or equal to the total number of processors minus one (e.g., in the range [0,N-1], where N is the total number of processors) and lo_proc must be less than or equal to hi_proc. If lo_proc = 0 and hi_proc = N-1 then this call has no effect on the data distribution.
subroutine ga_set_tiled_irreg_proc_grid(g_a, mapc, nblocks, proc_grid)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | mapc(s) | starting index of each block. s is equal to the sum of indices in nblocks array | input |
integer | nblocks(ndim) | array containing the number of blocks in each dimension | input |
integer | proc_grid(ndim) | processor grid dimensions | input |
Collective on the processor group inferred from the arguments.
This subroutine is a combination of the function that set an irregular distribution and the function for creating a tiled block-cyclic distribution. Like the conventional tiled block-cylic distribution, individual blocks are contiguous in memory. Unlike the conventional tiled distribution, individual blocks can vary in size. The mapc array consists of the starting index of each block along each dimension and the number of blocks along each axis is stored in the nblocks array. The number of entries in mapc equals the sum of the entries in nblocks. The proc_grid array describes how blocks are distributed over processors using a block-cylic distribution. The product of the entries in proc_grid is equal to the number of processors in the group that the global array is created on.
Note that the values in the nblocks arrays should be greater than or equal to the corresponding values in the proc_grid array.
subroutine ga_set_tiled_proc_grid(g_a, block_dims, proc_grid)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | block_dims(ndim) | array of block dimensions | input |
integer | proc_grid(ndim) | array of processor grid dimensions | input |
Collective on the processor group inferred from the arguments.
This subroutine is used to create a global array with a SCALAPACK-type block cyclic data distribution but with individual blocks that are contiguous in memory. This layout may be preferable for algorithms that are moving single blocks of data since the corresponding messages will involve contiguous data transfers instead of strided calls. This may result in performance gains for communication bound programs. Otherwise, the data layout is similar to that created by the block-cyclic processor grid function.
The user specifies the dimensions of the processor grid in the array proc_grid. The product of the processor grid dimensions must equal the number of total number of processors and the number of dimensions in the processor grid must be the same as the number of dimensions in the global array. The data blocks are mapped onto the processor grid in a cyclic manner along each of the processor grid axes. The dimensions of individual blocks are located in the block_dims array.
subroutine ga_shift_diagonal(g_a, c)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
double/complex/integer/float | c | shift value | input |
Collective on the processor group inferred from the arguments.
Adds this constant to the diagonal elements of the matrix.
integer function ga_solve(g_a, g_b)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | coefficient matrix | input |
integer | g_b | rhs/solution matrix | input |
Collective on the processor group inferred from the arguments.
Solves a system of linear equations
A * X = B
It first will call the Cholesky factorization routine and, if sucessfully, will solve the system with the Cholesky solver. If Cholesky will be not be able to factorize A, then it will call the LU factorization routine and will solve the system with forward/backward substitution. On exit B will contain the solution X.
It returns
= 0 : Cholesky factoriztion was succesful > 0 : the leading minor of this order is not positive definite, Cholesky factorization could not be completed and LU factoriztion was used
integer function ga_spd_invert(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | matrix | input/output |
Collective on the processor group inferred from the arguments.
It computes the inverse of a double precision using the Cholesky factorization of a NxN double precision symmetric positive definite matrix A stored in the global array represented by g_a. On successful exit, A will contain the inverse.
It returns
= 0 : successful exit > 0 : the leading minor of this order is not positive definite and the factorization could not be completed < 0 : it returns the index i of the (i,i) element of the factor L/U that is zero and the inverse could not be computed
subroutine ga_step_max(g_a, g_b, step)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a,g_b | array handles | input |
step | double precision/integer; the maximum step | output |
Collective on the processor group inferred from the arguments.
Calculates the largest multiple of a vector g_b that can be added to this vector g_a while keeping each element of this vector non-negative.
subroutine ga_step_max_patch(g_a, alo, ahi, g_b, blo, bhi, step)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a,g_b | array handles where g_b is step direction | input |
integer | alo,ahi,blo,bhi | patch coordinates of g_a and g_b | input |
double precision | step | the maximum step | output |
Collective on the processor group inferred from the arguments.
Calculates the largest multiple of a vector g_b that can be added to this vector g_a while keeping each element of this vector non-negative.
subroutine nga_strided_acc(g_a, lo, hi, skip, buf, ld, alpha)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | number of dimensions of the global array | input |
integer | lo(ndim) | array of starting indices for global array section | input |
integer | hi(ndim) | array of ending indices for global array section | input |
integer | skip(ndim) | array of strides for each dimension | input |
type | buf | local buffer array where the data comes from | input |
integer | ld(ndim-1) | array specifying leading dimensions for buffer array | input |
type | alpha | scale argument for accumulate | input |
One-sided (non-collective).
This operation is the same as NGA_Acc, except that the values corresponding to dimension n in buf are accumulated to every skip[n] values of the global array g_a.
Combines data from buffer with data in the global array patch.
The buffer array is assumed to be have the same number of dimensions as the global array.
global array section (lo[],hi[]) += alpha * buffer
subroutine nga_strided_get(g_a, lo, hi, skip, buf, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | number of dimensions of the global array | input |
integer | lo(ndim) | array of starting indices for global array section | input |
integer | hi(ndim) | array of ending indices for global array section | input |
integer | skip(ndim) | array of strides for each dimension | input |
type | buf | local buffer array where the data comes from | output |
integer | ld(ndim-1) | array specifying leading dimensions for buffer array | input |
One-sided (non-collective).
This operation is the same as NGA_Get, except that the values corresponding to dimension n in buf correspond to every skip[n] values of the global array g_a.
The local array is assumed to be have the same number of dimensions as the global array. Any detected inconsitencies/errors in the input arguments are fatal.
Returns: The local array buffer.
subroutine nga_strided_put(g_a, lo, hi, skip, buf, ld)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
integer | ndim | number of dimensions of the global array | input |
integer | lo(ndim) | array of starting indices for global | input |
integer | hi(ndim) | array of ending indices for global array | input |
integer | skip(ndim) | array of strides for each dimension | input |
type | buf | local buffer array where the data comes from | output |
integer | ld(ndim-1) | array specifying leading dimensions for array section | input |
One-sided (non-collective).
Strided version of put. This operation is the same as NGA_Put, except that the values corresponding to dimension n in buf are copied to every skip[n] values of the global array g_a.
Copies data from local array buffer to the global array section.
The local array is assumed to be have the same number of dimensions as the global array.
Any detected inconsitencies/errors in input arguments are fatal.
subroutine ga_summarize(verbose)
Type | Name | Description | Intent |
---|---|---|---|
logical | verbose | If true print distribution info | input |
Local operation.
Prints info about allocated arrays.
subroutine ga_symmetrize(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input/output |
Collective on the processor group inferred from the arguments.
Symmetrizes matrix A represented with handle g_a: A:= .5 * (A+A').
subroutine ga_sync()
Collective on the default processor group.
Synchronize processes (a barrier) and ensure that all GA operations completed.
subroutine ga_terminate()
Collective on the world processor group.
Delete all active arrays and destroy internal data structures.
integer function ga_total_blocks(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | total number of blocks in the block-cyclic distribution | output |
Local operation.
This function returns the total number of blocks contained in a global array with a block-cyclic data distribution.
subroutine ga_transpose(g_a, g_b)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | remains unchanged | input |
integer | g_b | solution matrix | input |
Collective on the processor group inferred from the arguments.
Transposes a matrix: B = A', where A and B are represented by handles g_a and g_b.
subroutine ga_unlock(mutex)
Type | Name | Description | Intent |
---|---|---|---|
integer | mutex | mutex id | input |
One-sided (non-collective).
Unlocks a mutex object identified by the mutex number. It is a fatal error for a process to attempt to unlock a mutex which has not been locked by this process.
subroutine ga_unpack(g_src, g_dest, g_mask, lo, hi, icount)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_src | handle for source array | input |
integer | g_dest | handle for destination array | output |
integer | g_mask | handle for integer array representing a bit mask | input |
integer | lo,hi | low and high values of range on which operation is performed | input |
integer | icount | number of values in uncompressed array | output |
Collective on the processor group inferred from the arguments.
The unpack subroutine is designed to expand the values in the source vector g_src into a larger destination array g_dest based on the values in an integer mask array g_mask. The values lo and hi denote the range of elements that should be compressed and icount is a variable that on output lists the number of values placed in the uncompressed array. This operation is the complement of the GA_Pack operation. An example is shown below.
GA_Unpack(g_src, g_dest, g_mask, 1, n, &icount); g_src: 1 7 9 12 15 16 g_mask: 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 g_dest: 1 0 0 0 0 0 7 0 9 0 0 12 0 0 15 16 0 icount: 6
The current implementation requires that the distribution of the g_mask array matches the distribution of the g_dest array.
subroutine ga_unset_property(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | global array handle | input |
Collective on the processor group inferred from the arguments.
This function clears properties from the global array.
logical function nga_update_ghost_dir(g_a,dimension,idir,cflag)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
integer | dimension | array dimension that is to be updated | input |
integer | idir | direction of update (+/-1) | input |
logical | cflag | flag to include corners in update | input |
Collective on the processor group inferred from the arguments.
This function can be used to update the ghost cells along individual directions. It is designed for algorithms that can overlap updates with computation. The variable dimension indicates which coordinate direction is to be updated (e.g. dimension = 1 would correspond to the y axis in a two or three dimensional system), the variable idir can take the values +/-1 and indicates whether the side that is to be updated lies in the positive or negative direction, and cflag indicates whether or not the corners on the side being updated are to be included in the update. The following calls would be equivalent to a call to GA_Update_ghosts for a 2-dimensional system:
status = NGA_Update_ghost_dir(g_a,0,-1,1); status = NGA_Update_ghost_dir(g_a,0,1,1); status = NGA_Update_ghost_dir(g_a,1,-1,0); status = NGA_Update_ghost_dir(g_a,1,1,0);
The variable cflag is set equal to 1 (or non-zero) in the first two calls so that the corner ghost cells are update, it is set equal to 0 in the second two calls to avoid redundant updates of the corners. Note that updating the ghosts cells using several independent calls to the nga_update_ghost_dir functions is generally not as efficient as using GA_Update_ghosts unless the individual calls can be effectively overlapped with computation.
subroutine ga_update_ghosts(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
Collective on the processor group inferred from the arguments.
This call updates the ghost cell regions on each processor with the corresponding neighbor data from other processors. The operation assumes that all data is wrapped around using periodic boundary data so that ghost cell data that goes beyound an array boundary is wrapped around to the other end of the array. The GA_Update_ghosts call contains two GA_Sync calls before and after the actual update operation. For some applications these calls may be unecessary, if so they can be removed using the GA_Mask_sync subroutine.
logical function ga_uses_ma()
Local operation.
Returns "1" if memory in global arrays comes from the Memory Allocator (MA). "0" means that memory comes from another source, for example System V shared memory is used.
TODO
double precision function ga_wtime()
Local operation.
This function returns a wall (or elapsed) time on the calling processor. Returns time in seconds representing elapsed wall-clock time since an arbitrary time in the past. Example:
double starttime, endtime; starttime = GA_Wtime(); .... code snippet to be timed .... endtime = GA_Wtime(); printf("Time taken = %lf seconds\n", endtime-starttime);
This function is only available in release 4.1 or greater.
subroutine ga_zero(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
Collective on the processor group inferred from the arguments.
Sets value of all elements in the array to zero.
subroutine ga_zero_diagonal(g_a)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | array handle | input |
Collective on the processor group inferred from the arguments.
Sets the diagonal elements of this matrix g_a with zeros.
subroutine nga_zero_patch(g_a, alo, ahi)
Type | Name | Description | Intent |
---|---|---|---|
integer | g_a | input | |
integer | ndim | number of dimensions | input |
integer | alo(ndim), ahi(ndim) | g_a patch coordinates | input |
Collective on the processor group inferred from the arguments.
Set all the elements in the patch to zero.