Thursday, March 27, 2008

Homogeneous, N-Dimensional Arrays (ndarrays)

The basic representational element for PLT Scheme Schemelab is a homogeneous, n-dimensional array - or ndarray. This post will discuss the initial design for ndarrays and discuss some of the operations on them.

Internally, an ndarray is represented by a structure with the following elements:
  • ordering - (symbols 'row 'column), specifies the order in which dimensions are stored ('row for row major and 'column for column major). Generally, this isn't that useful except for (eventually) interfacing with foreign code (C uses row major and FORTRAN uses column major). Also, the transpose operation converts from one to the other (although that is more a side effect that its real functionality). [read only]
  • shape - (listof natural-number/c), specifies the shape of the array. The length of the shape list is the number of dimensions for the array and each element is the cardinality of the corresponding dimension. This may be set to reshape the array.
  • ndim - natural-number/c, the number of dimensions for the array. This is equal to (length shape). [read only]
  • size - natural-number/c, the total number of elements in the array. [read only]
  • disp - natural-number/c, the displacement (in elements) of this subarray is the data vector. This will be zero for any ndarray that owns its own data. [read only]
  • strides - (listof natural-number/c), a list of the number of elements to skip to move to the next element in the corresponding dimension. [read only]
  • offset - natural-number/c, the offset (in elements) for each addressed element. This will be zero for any ndarray that owns its own data. [read only]
  • data - typed-vector?, the typed vector containing the data for the array. [A typed vector encapsulates an SRFI 4 vector (for u8, u16, u32, u64, s8, s16, s32, s64, f32, or f64 arrays), a complex vector (for c64 or c128 arrays), or a Scheme vector (for Scheme object arrays).] [read only]
  • base - (or/c array? false/c), the base array for this (sub)array or #f if the array owns its own data (i.e. it is a base array). Note that the referenced array may itself base a non-#f base entry. [read only]
Note that I may not need both the displacement (disp) and offset fields. But, the general idea for array iteration is to add the displacement once when the iterator is initialized and to add the offset for each element. It might be possible to collapse those into one offset without loss of generality.

The most general array constructor is make-array, which has the following contract:


(->* ((listof natural-number/c))
(#:vtype (or/c vtype? symbol?)
#:ordering (symbols 'row 'column)
#:fill any/c)
array?)


The required argument is the shape of the array. The optional keyword arguments are #:vtype, which specifies the type of the array (default object), #:ordering, which defaults to 'row, and #:fill. If #fill is not specified, the array elements are not initialized.

Example:

(define a1 (make-array '(3 2) #:vtype f32 #:fill 0.0))

Creates an array whose shape is '(3 2) (i.e. three rows and two columns) and whose elements are 32-bit floating-point numbers that are initially 0.0.

A fully qualified reference for the array returns the corresponding (scalar) element. For example, (array-ref a1 '(1 1) returns the value of the element at '(1 1).

A partially qualified reference for the array returns a reference to the corresponding subarray. For example, (array-ref a1 '(: 1) ) returns the subarray of shape '(3) that corresponds to the second column of a1. [Note that this is a reference to the second column of a1 and not a copy of it. Array referencing never copies anything.] Likewise, (array-ref a1 '(1 :)) [or equivalently, (array-ref a1 '(1))] returns a reference to the second row of a1.

Array mutation also allows partially qualified references. In that case, the value specified must be broadcastable (described in a future post) to the shape of the reference. For example, (array-set! a1 '(: 1) 1.0) sets the elements of the second column of a1 to 1,0 (yes, the scalar 1.0 is broadcastable to shape '(3)). Likewise, (array-set! a1 '(1) '(4.0 8.0)) sets the elements of the second row of a1 to 4.0 and 8.0.

A complete set of array manipulation functions will be provided. This will include: build-array, array-map, array-map!, array-for-each (i.e., similar to what SRFI 43 provides for vectors).

I haven't though about the semantics for things like array-fold and array-unfold. They may be well-defined someplace - I really haven't looked. If anyone knows where or has any ideas, please let me know.

Note that it is not my intention to define functions like add, subtract, multiply, divide, ... (as numpy does for Python). Rather, we will rely on the array mapping functions to extend scalar operations to arrays.

Labels:

0 Comments:

Post a Comment

<< Home