CrossPy Quickstart#

Designed as a superset of the well-known NumPy and CuPy packages, CrossPy's main object is the heterogeneous multidimensional array. Like a NumPy/CuPy array, it is a table of numbers indexed by a tuple of non-negative integers; under the hood, however, the table can be composed of arrays of different types. Following NumPy conventions, dimensions are called axes, and currently at most one axis may be heterogeneous.

Array Creation#

A CrossPy array can be created from:

  • a NumPy/CuPy array;

  • a composition of them.

You can create a CrossPy array using the array function:

>>> import crosspy as xp
>>> import numpy as np
>>> xp.array(np.arange(4))  
array {((0, 4),): array([0, 1, 2, 3])}
>>> import cupy as cp
>>> xp.array(cp.zeros((2, 3)))  
array {((0, 2), (0, 3)): array([[0., 0., 0.], [0., 0., 0.]])}

A CrossPy array x_cross is printed as a dictionary, where:

  • the key stands for a span, which is a tuple of size equal to the number of dimensions and each element is a pair representing a left-closed-right-open interval. For example, ((0, 2), (0, 3)) stands for a 2-D span ranging [0:2] on the first dimension and [0:3] on the second.

  • the value is the data component that makes up the span.

>>> a = xp.array([np.arange(3), cp.arange(3)])
>>> a  
array {((0, 1), (0, 3)): array([0, 1, 2]),
       ((1, 2), (0, 3)): array([0, 1, 2])}
>>> a.shape
(2, 3)
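To make the span notation concrete, here is a small pure-Python sketch. shape_from_spans is a hypothetical helper, not part of the CrossPy API; it only illustrates how the overall shape follows from the span keys shown above:

```python
# Hypothetical helper (not CrossPy API): recover the overall shape of
# an array from its span keys. Each span is a tuple with one
# (start, stop) half-open interval per dimension.
def shape_from_spans(spans):
    spans = list(spans)
    ndim = len(spans[0])
    # The extent of each dimension is the largest interval end seen.
    return tuple(max(span[d][1] for span in spans) for d in range(ndim))

# Spans of the example above: two 1x3 blocks stacked on axis 0.
print(shape_from_spans([((0, 1), (0, 3)), ((1, 2), (0, 3))]))  # (2, 3)
```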

In this example, the input object is a composition of a NumPy array and a CuPy array. Typically, you may want to “concatenate” the heterogeneous components along some axis (currently only axis 0 is supported). This can be specified with the axis parameter.

>>> a = xp.array([np.arange(3), cp.arange(3)], axis=0)
>>> a  
array {((0, 3),): array([0, 1, 2]), ((3, 6),): array([0, 1, 2])}
>>> a.shape
(6,)
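The spans in the output above follow from the block lengths. The following sketch shows the assumed bookkeeping (this mirrors the printed result, not CrossPy's actual internals): a running sum of block lengths yields each block's half-open span.

```python
import numpy as np

# Sketch (assumed semantics, not CrossPy internals): axis-0
# concatenation derives each block's half-open span from a running
# sum of the block lengths.
blocks = [np.arange(3), np.arange(3)]
bounds = np.cumsum([0] + [len(b) for b in blocks])  # array([0, 3, 6])
spans = [((int(lo), int(hi)),) for lo, hi in zip(bounds[:-1], bounds[1:])]
print(spans)  # [((0, 3),), ((3, 6),)]
```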

Indexing, Slicing and Iterating#

A CrossPy array is a heterogeneous array that can be sliced with integers or Python built-in slices. The result of slicing is also a CrossPy array.

>>> import crosspy as xp
>>> import numpy as np
>>> import cupy as cp
>>> a = xp.array([np.array([0, 2, 5]), cp.array([4, 1])], axis=0)
>>> a  
array {((0, 3),): array([0, 2, 5]),
       ((3, 5),): array([4, 1])}
>>> a[0]             # indexing with a single integer  
0
>>> a[[1, 4, 3]]     # indexing with a list of integers  
array {((0, 1),): array([2]),
       ((1, 3),): array([1, 4])}
>>> a[2:4]           # slicing with Python built-in slices  
array {((0, 1),): array([5]),
       ((1, 2),): array([4])}
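Indexing a blocked array requires translating a global index into a block and a local offset. The sketch below illustrates one way this could work for the 1-D array above; it is an assumed mechanism for illustration, not CrossPy source code.

```python
import bisect

# Sketch of global-to-local index translation on a blocked 1-D array
# (assumed mechanics, not CrossPy source). `bounds` holds the
# cumulative end of each block: the blocks cover [0, 3) and [3, 5).
bounds = [3, 5]

def locate(i):
    block = bisect.bisect_right(bounds, i)   # which block holds index i
    start = 0 if block == 0 else bounds[block - 1]
    return block, i - start                  # (block id, local offset)

print(locate(2), locate(4))  # (0, 2) (1, 1)
```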

Iterating over a CrossPy array can be done with respect to either partition blocks or devices.

block_view() returns the list of underlying partition blocks.

>>> a.block_view()
[array([0, 2, 5]), array([4, 1])]

You can also apply a function to each partition block, equivalent to map(func, a.block_view()).

>>> a.block_view(lambda x: x + 1)
[array([1, 3, 6]), array([5, 2])]
>>> a  # unchanged since the lambda is not inplace  
array {((0, 3),): array([0, 2, 5]),
       ((3, 5),): array([4, 1])}
>>> a.block_view(lambda x: x.sort())  # apply inplace changes
[None, None]
>>> a  
array {((0, 3),): array([0, 2, 5]),
       ((3, 5),): array([1, 4])}

device_view() returns an iterable of lists where each list holds partition blocks on the same device.

>>> type(a.device_view())
<class 'generator'>
>>> tuple(a.device_view())  
([array([0, 2, 5])], [array([1, 4])])
>>> for device_id, blocks_on_device_i in enumerate(a.device_view()):
...   print(device_id, blocks_on_device_i)
0 [array([0, 2, 5])]
1 [array([1, 4])]

Basic Operations#

CrossPy arrays support common arithmetic operations without worrying about the underlying data distribution.

>>> import crosspy as xp
>>> import numpy as np
>>> import cupy as cp
>>> a = xp.array([cp.arange(3), np.arange(3, 6)], axis=0)
>>> a  
array {((0, 3),): array([0, 1, 2]), ((3, 6),): array([3, 4, 5])}
>>> a[0] = a[2] + a[4]
>>> a  
array {((0, 1),): array([6]),
       ((1, 3),): array([1, 2]),
       ((3, 4),): array([3]),
       ((4, 6),): array([4, 5])}

Note

While performing an arithmetic operation, the operator only knows about the source operands, including their residing devices, but not the destination to which the result will be assigned. There is also no way to specify a canonical device among those of the operands. Therefore, CrossPy by default performs the operation on the device of the first (left) operand, which thus becomes the “canonical device”, pulling operands from other devices to it. If the canonical device differs from that of the destination, an extra copy is unavoidable.
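The rule can be modeled with a toy class; this is an illustration of the stated behavior, not CrossPy internals, and Tagged is a hypothetical type.

```python
# Toy model of the "canonical device" rule (not CrossPy internals):
# a binary op runs on the left operand's device, pulling the right
# operand to it first.
class Tagged:
    def __init__(self, data, device):
        self.data, self.device = data, device

    def __add__(self, other):
        pulled = other.data  # a real implementation would copy to self.device here
        return Tagged(self.data + pulled, self.device)

result = Tagged(2, "gpu:0") + Tagged(4, "cpu")
print(result.data, result.device)  # 6 gpu:0
```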

Interoperability with NumPy/CuPy#

Values can be assigned between CrossPy arrays and NumPy/CuPy arrays in both directions.

When assigning values from NumPy/CuPy arrays to CrossPy arrays, two behaviors are possible. The first scatters the data from the source array across the underlying devices of the CrossPy array, i.e., the heterogeneity of the CrossPy array is unchanged. The second overwrites the corresponding part of the target array with both the data and the device of the source array. Since Python's built-in assignment operation cannot be overloaded to distinguish the two, we chose to implement it with the scatter behavior. In the examples below, x_cross is a 1×5 CrossPy array whose columns [0:3] and [3:5] reside on different devices.

>>> x_cross[0] = np.array([5, 4, 3, 2, 1])
>>> x_cross 
array {((0, 1), (0, 3)): array([[5, 4, 3]]),
       ((0, 1), (3, 5)): array([[2, 1]])}
>>> x_cross[0] = cp.array([6, 7, 8, 9, 0])
>>> x_cross 
array {((0, 1), (0, 3)): array([[6, 7, 8]]),
       ((0, 1), (3, 5)): array([[9, 0]])}

For assigning values from CrossPy arrays to NumPy/CuPy arrays, since the target type is determined by device (NumPy arrays always live on the CPU while CuPy arrays live on GPU devices), we use the to method to convert the array. Negative integers denote CPU devices; non-negative integers denote GPU devices.

>>> y_cpu = x_cross.to(-1)
>>> y_cpu
array([[6, 7, 8, 9, 0]])
>>> type(y_cpu)
<class 'numpy.ndarray'>
>>> y_gpu0 = x_cross[:1, (0, 2, 4)].to(0)
>>> y_gpu0
array([[6, 8, 0]])
>>> type(y_gpu0)
<class 'cupy.core.core.ndarray'>
>>> y_gpu0.device
<CUDA Device 0>
>>> y_gpu1 = x_cross.to(1)
>>> y_gpu1
array([[6, 7, 8, 9, 0]])
>>> type(y_gpu1)
<class 'cupy.core.core.ndarray'>
>>> y_gpu1.device
<CUDA Device 1>
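The device-id convention above can be summarized in a few lines; resolve_device is a hypothetical helper written only to restate the rule, not a CrossPy function.

```python
# Sketch of the device-id convention described above (assumed helper,
# not CrossPy API): negative ids select the CPU, non-negative ids
# select the CUDA device with that number.
def resolve_device(device_id):
    return "cpu" if device_id < 0 else f"cuda:{device_id}"

print(resolve_device(-1), resolve_device(0), resolve_device(1))  # cpu cuda:0 cuda:1
```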

With to, we can use NumPy/CuPy computational functions as usual.

>>> np.linalg.norm(x_cross.to(-1))
15.165750888103101
>>> cp.linalg.norm(x_cross.to(0))
array(15.16575089)

Note

It may seem tedious to append the trailing to. However, third-party APIs have fixed signatures, and those in CuPy, for example, are inherently incompatible with CrossPy objects (the same holds for NumPy). An explicit conversion is therefore necessary to satisfy the input requirements of third-party APIs.

Heterogeneous Partitioning#

CrossPy provides notations for heterogeneous devices. Specifically, crosspy.cpu and crosspy.gpu refer to CPU and GPU devices, respectively.

>>> import crosspy as xp
>>> from crosspy import cpu, gpu

To refer to a specific CPU/GPU device, pass an integer device ID, e.g. gpu(0). With these device notations, a CrossPy array can be created with an initial distribution: the data are divided evenly among the specified devices.

>>> import numpy as np
>>> a = xp.array(np.arange(6), distribution=[cpu(0), gpu(0), gpu(1)])
>>> a  
array {((0, 1), (0, 2)): array([0, 1]),
       ((1, 2), (0, 2)): array([2, 3]),
       ((2, 3), (0, 2)): array([4, 5])}

Note that if the parameter axis is not specified, an additional dimension is introduced for partitioning. To keep the shape of the original object, set axis to the dimension along which the partitioning should be performed.

>>> a = xp.array(np.arange(6), distribution=[cpu(0), gpu(0), gpu(1)], axis=0)
>>> a  
array {((0, 2),): array([0, 1]),
       ((2, 4),): array([2, 3]),
       ((4, 6),): array([4, 5])}
>>> a.device_map  
{((0, 2),): 'cpu',
 ((2, 4),): <CUDA Device 0>,
 ((4, 6),): <CUDA Device 1>}

More flexible partitioning schemes (aka “coloring”) can be expressed with the help of an auxiliary PartitionScheme, which is conceptually a mask over a shape indicating the device of each element. The following example creates a partitioning scheme for any 1-D array of size 6.

>>> from crosspy import PartitionScheme
>>> partition = PartitionScheme(6, default_device=cpu(0))

Note that one can specify default_device for the scheme so that all elements are mapped to this device by default. If default_device is not specified or is None, the mapping is uninitialized; in that case, the scheme is invalid until all elements have their devices specified.

To specify the coloring scheme, assign devices to corresponding parts.

>>> partition[0:2] = cpu(0)
>>> partition[2:6] = gpu(1)

With the PartitionScheme object, a CrossPy array can be created accordingly by passing the scheme as distribution.

>>> a = xp.array(np.arange(6), distribution=partition, axis=0)
>>> a 
array {((0, 2),): array([0, 1]),
       ((2, 6),): array([2, 3, 4, 5])}
>>> a.device_map 
{((0, 2),): 'cpu',
 ((2, 6),): <CUDA Device 1>}
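The “mask” idea behind PartitionScheme can be sketched in plain Python. This is an assumed model of the semantics, not CrossPy's implementation: one device label per element, with contiguous runs of the same label becoming partitions.

```python
import itertools

# Minimal sketch of the "mask" idea behind PartitionScheme (assumed
# semantics, not CrossPy internals): one device label per element;
# contiguous runs of the same label become partitions.
mask = ["cpu:0"] * 2 + ["gpu:1"] * 4  # mirrors partition[0:2], partition[2:6]
spans, start = [], 0
for device, run in itertools.groupby(mask):
    n = len(list(run))
    spans.append(((start, start + n), device))
    start += n
print(spans)  # [((0, 2), 'cpu:0'), ((2, 6), 'gpu:1')]
```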

Data Exchange#

In scientific computing, a common data exchange pattern is b = a[indices], where a and b are arrays and indices is a collection of integer indices. This communication pattern is similar to MPI's alltoallv call. CrossPy supports this data exchange pattern by providing the alltoallv function.

>>> import crosspy as xp
>>> import numpy as np
>>> import cupy as cp
>>> with cp.cuda.Device(0):
...   a0 = cp.array([1, 3, 5])
...   b0 = cp.array([22, 44])
>>> with cp.cuda.Device(1):
...   a1 = cp.array([2, 4])
...   b1 = cp.array([11, 33, 55])
>>> a = xp.array([a0, a1], axis=0)
>>> b = xp.array([b0, b1], axis=0)
>>> a  
array {((0, 3),): array([1, 3, 5]), ((3, 5),): array([2, 4])}
>>> b  
array {((0, 2),): array([22, 44]), ((2, 5),): array([11, 33, 55])}
>>> xp.alltoallv(a, np.array([0, 3, 1, 4, 2]), b)  # semantics: b = a[[0, 3, 1, 4, 2]]
>>> b  
array {((0, 2),): array([1, 2]), ((2, 5),): array([3, 4, 5])}
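The same exchange can be expressed with plain NumPy fancy indexing on the concatenated contents (no devices involved), which is exactly the stated semantics of alltoallv:

```python
import numpy as np

# alltoallv(a, idx, b) has the semantics b[:] = a[idx] on the
# concatenated contents of the two distributed arrays.
a = np.concatenate([np.array([1, 3, 5]), np.array([2, 4])])
b = a[np.array([0, 3, 1, 4, 2])]
print(b)  # [1 2 3 4 5]
```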

CrossPy also provides assignment for the writeback pattern b[indices] = a.

>>> xp.assignment(b, np.arange(len(b)), a, None)  # semantics: b[[0, 1, 2, 3, 4]] = a
>>> b  
array {((0, 2),): array([1, 3]), ((2, 5),): array([5, 2, 4])}
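Rendered in plain NumPy on the concatenated contents, the writeback semantics look as follows:

```python
import numpy as np

# Plain-NumPy rendering of the writeback semantics b[indices] = a,
# using the concatenated contents of the arrays above.
a = np.array([1, 3, 5, 2, 4])  # contents of `a`
b = np.array([1, 2, 3, 4, 5])  # `b` after the earlier alltoallv
b[np.arange(len(b))] = a
print(b)  # [1 3 5 2 4]
```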