Mathematica Programming » Performance Tuning
Packed Arrays
Mathematica has a wide variety of low-level optimizations that we never run into in our high-level usage. One of the most useful of these is the concept of the PackedArray
.
These are efficiently stored arrays of a single type of object that perform much faster on many simple numerical manipulations than their "unpacked" companions.
Most internal functions that return arrays will return them in packed form. We can check that an array is packed with Developer`PackedArrayQ
:
reals = RandomReal[{-1, 1}, {1000, 1000}];
Developer`PackedArrayQ@reals
True
The memory footprint of these is much lower than standard arrays:
ureals=Developer`FromPackedArray@reals;
Developer`PackedArrayQ@ureals
ByteCount@reals
ByteCount@ureals
False
8000152
24208200
And operations that can return a packed array are faster than their unpacked variants:
1+reals//RepeatedTiming//First
1+ureals//RepeatedTiming//First
0.0024543400000000002`3.
0.0099898333333333315`2.
Unpacking
The biggest thing to worry about with packed arrays is the time/memory footprint that occurs when a packed array must be unpacked to be processed by a function that can't handle it in packed form.
For instance the function Chop
necessarily unpacks as it replaces the Real
number 0.
with the Integer
number 0
. This means Chop
will actually be slower on a packed array:
Chop@reals//RepeatedTiming//First
Chop@ureals//RepeatedTiming//First
0.1129493750000000046`2.
0.052316666666666671`2.
But if we use a function that can leverage the packing of the array it will be faster. For instance, let's look at the Real
variant of Chop
, a function called Threshold
:
Threshold@reals//RepeatedTiming//First
Threshold@ureals//RepeatedTiming//First
0.0030930624999999999`2.
0.0107768695652173899`2.
The packed version is much faster now.
RawArray
The RawArray
is a relatively new addition to the language. It works rather like PackedArray
in its requirement ot use a single type of object in it, but is a fundamentally different object.
It's main benefit is how little memory it consumes:
reals//ByteCount
8000152
If we use a 64-bit real we get similar memory consumption to the packed array:
raw64=RawArray["Real64", reals];
raw64//ByteCount
8000096
But we can specify that it out to use half as many bytes, as the memory usage will drop correspondingly:
raw32=RawArray["Real32", reals];
raw32//ByteCount
4000096
These can also be used more effectively via Library Link but we won't get into that now. For many data-intensive custom Import
formats RawArray
can help cut down on the memory consumption, particularly where some large portions of the data may not be immediately useful.
SparseArray
We won't discuss SparseArray
much, as the sparse array is a well-known concept especially in numerical linear algebra.
It is worth keeping in mind, though, for places where one has highly sparse data. One classic place this can show up is in quantum mechanical calculations in a direct-product basis constructed from orthonormal bases. Here often one will obtain expressions that looks like
H[n, m, i, j] = H1*δ[i, j] + H2*δ[n, m]
Assuming N1
elements in the first basis and N2
in the second, this can be more efficiently represented as:
H = KroneckerProduct[H1, IdentityMatrix[N2]] + KroneckerProduct[IdentityMatrix[N1], H2]
And now if we visualize this with H1
and H2
being odd-looking matrices which arise in this context:
H1=Table[(1/2)(-1)^(i-j)If[i==j, π^2/3, 2/(i-j)^2], {i, 50}, {j, 50}];
H2=Table[((-1)^(i-j)/(2*(.05)^2))If[i==j, π^2/3, 2/(i-j)^2], {i, 75}, {j, 75}];
H1//MatrixPlot
We can directly construct H
in a the standard fashion:
H =
KroneckerProduct[H1, IdentityMatrix[Length@H2]] +
KroneckerProduct[IdentityMatrix[Length@H1], H2];//AbsoluteTiming//First
4.523075`
It takes a while and we see it has a large memory footprint:
H // ByteCount
340260728
But if we visualize it we see it's most zeros:
H // MatrixPlot
If we build it instead use SparseArray
nothing will be done with that vast field of zeros, so it'll be faster and less memory intensive:
HS =
KroneckerProduct[H1, IdentityMatrix[Length@H2, SparseArray]] +
KroneckerProduct[IdentityMatrix[Length@H1, SparseArray], H2];//AbsoluteTiming//First
HS//ByteCount
0.118826`
11191512
And even better, when we ask for its smallest 5 Eigenvalues
these will be computed faster, and increasingly so with matrix size:
Eigenvalues[H, -5]//AbsoluteTiming
Eigenvalues[HS, -5]//AbsoluteTiming
{4.943048`,{0.3917953580905429`,0.3745580141408224`,0.3611510903074643`,0.35157483336542145`,0.34582905618130677`}}
{3.733548`,{0.39179535809072363`,0.37455801414116613`,0.3611510903079675`,0.35157483336567635`,0.34582905618161475`}}