Mathematica Programming » Performance Tuning

Packed Arrays

Mathematica has a wide variety of low-level optimizations that we never run into in our high-level usage. One of the most useful of these is the concept of the PackedArray .

These are efficiently stored arrays of a single type of object that perform much faster on many simple numerical manipulations than their "unpacked" companions.

Most internal functions that return arrays will return them in packed form. We can check that an array is packed with Developer`PackedArrayQ :

 reals = RandomReal[{-1, 1}, {1000, 1000}];
Developer`PackedArrayQ@reals
 True

The memory footprint of these is much lower than standard arrays:

 ureals=Developer`FromPackedArray@reals;
Developer`PackedArrayQ@ureals
ByteCount@reals
ByteCount@ureals
 False
 8000152
 24208200

And operations that can return a packed array are faster than their unpacked variants:

 1+reals//RepeatedTiming//First
1+ureals//RepeatedTiming//First
 0.0024543400000000002`3.
 0.0099898333333333315`2.

Unpacking

The biggest thing to worry about with packed arrays is the time/memory footprint that occurs when a packed array must be unpacked to be processed by a function that can't handle it in packed form.

For instance the function Chop necessarily unpacks as it replaces the Real number 0. with the Integer number 0 . This means Chop will actually be slower on a packed array:

 Chop@reals//RepeatedTiming//First
Chop@ureals//RepeatedTiming//First
 0.1129493750000000046`2.
 0.052316666666666671`2.

But if we use a function that can leverage the packing of the array it will be faster. For instance, let's look at the Real variant of Chop , a function called Threshold :

 Threshold@reals//RepeatedTiming//First
Threshold@ureals//RepeatedTiming//First
 0.0030930624999999999`2.
 0.0107768695652173899`2.

The packed version is much faster now.

RawArray

The RawArray is a relatively new addition to the language. It works rather like PackedArray in its requirement ot use a single type of object in it, but is a fundamentally different object.

It's main benefit is how little memory it consumes:

 reals//ByteCount
 8000152

If we use a 64-bit real we get similar memory consumption to the packed array:

 raw64=RawArray["Real64", reals];
raw64//ByteCount
 8000096

But we can specify that it out to use half as many bytes, as the memory usage will drop correspondingly:

 raw32=RawArray["Real32", reals];
raw32//ByteCount
 4000096

These can also be used more effectively via Library Link but we won't get into that now. For many data-intensive custom Import formats RawArray can help cut down on the memory consumption, particularly where some large portions of the data may not be immediately useful.

SparseArray

We won't discuss SparseArray much, as the sparse array is a well-known concept especially in numerical linear algebra.

It is worth keeping in mind, though, for places where one has highly sparse data. One classic place this can show up is in quantum mechanical calculations in a direct-product basis constructed from orthonormal bases. Here often one will obtain expressions that looks like

 H[n, m, i, j] = H1*δ[i, j] + H2*δ[n, m]

Assuming N1 elements in the first basis and N2 in the second, this can be more efficiently represented as:

 H = KroneckerProduct[H1, IdentityMatrix[N2]] + KroneckerProduct[IdentityMatrix[N1], H2]

And now if we visualize this with H1 and H2 being odd-looking matrices which arise in this context:

 H1=Table[(1/2)(-1)^(i-j)If[i==j, π^2/3, 2/(i-j)^2], {i, 50}, {j, 50}];
H2=Table[((-1)^(i-j)/(2*(.05)^2))If[i==j, π^2/3, 2/(i-j)^2], {i, 75}, {j, 75}];
H1//MatrixPlot
252specialarrays-7323871970862718619

We can directly construct H in a the standard fashion:

 H = 
  KroneckerProduct[H1, IdentityMatrix[Length@H2]] + 
    KroneckerProduct[IdentityMatrix[Length@H1], H2];//AbsoluteTiming//First
 4.523075`

It takes a while and we see it has a large memory footprint:

 H // ByteCount
 340260728

But if we visualize it we see it's most zeros:

 H // MatrixPlot
252specialarrays-7668129619617107787

If we build it instead use SparseArray nothing will be done with that vast field of zeros, so it'll be faster and less memory intensive:

 HS = 
  KroneckerProduct[H1, IdentityMatrix[Length@H2, SparseArray]] + 
    KroneckerProduct[IdentityMatrix[Length@H1, SparseArray], H2];//AbsoluteTiming//First
HS//ByteCount
 0.118826`
 11191512

And even better, when we ask for its smallest 5 Eigenvalues these will be computed faster, and increasingly so with matrix size:

 Eigenvalues[H, -5]//AbsoluteTiming
Eigenvalues[HS, -5]//AbsoluteTiming
 {4.943048`,{0.3917953580905429`,0.3745580141408224`,0.3611510903074643`,0.35157483336542145`,0.34582905618130677`}}
 {3.733548`,{0.39179535809072363`,0.37455801414116613`,0.3611510903079675`,0.35157483336567635`,0.34582905618161475`}}