# Random stuff

I had considered putting a blog in this space, but decided that would be too structured and that I would feel obligated to update it regularly. I do occasionally have the urge to write something down or share a great recipe. This page won't be updated regularly, but it should fill with interesting tidbits eventually. There will almost certainly not be any overarching theme, and I reserve the right to post about personal, culinary, philosophical, and highly technical (research related) issues.

# Use more memory than your machine has

In performance computing, there are times when the size of the
problem
you can treat is limited not by computing speed, but by your
machine's total memory. This is the case for parts of the map making
code I am developing. For linear algebra, I'm using AMD graphics
cards and the *acml-gpu* libraries to speed things up, and am now
completely limited by the amount of memory on the machine. I want to
hold off working on a cluster as long as possible because waiting in a
queue and transferring data back and forth from the cluster would slow
development. Just to put some numbers to things, the server I'm using
has 144 GB of memory and 4 GPUs, meaning I can deal with square matrices
of order 100 000 on a side, where each operation takes about an hour (in
double precision).
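To put those numbers in perspective, here is the back-of-envelope arithmetic (my own sanity check, not from any library):

```python
# Memory footprint of a dense, double-precision square matrix.
n = 100_000                         # matrix order quoted above
bytes_per_double = 8
footprint_gb = n * n * bytes_per_double / 1e9
print(footprint_gb)                 # 80.0 GB -- fits in 144 GB of RAM
```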

I have two tricks for stretching my memory further. The first is
to use memory maps. A memory map is essentially a file on disk that
your machine treats as memory. Now, your hard disk is a lot slower than
main memory, but the beauty of a memory map is that the parts of the
file that you are currently using get cached in memory, and so access
and assignment are fast. This works great if you have a
lot of operations to perform at a time on a small part of a huge piece
of data. However, this requires you to be very careful: if your access
pattern is too scattered, the program will grind to a halt. Still,
I've found that memory maps are unbelievably forgiving, and tend to
perform quite a bit better than you would initially expect. The memory
map functionality is available to python users through
*numpy*, and is trivial to use.
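A minimal sketch of the *numpy* interface (the file path and the small shape here are placeholders for illustration; scale the shape up to go past your RAM):

```python
import os
import tempfile

import numpy as np

# A matrix backed by a file on disk rather than by RAM; only the pages
# actually touched get cached in physical memory.
path = os.path.join(tempfile.mkdtemp(), "big_matrix.dat")
mat = np.memmap(path, dtype=np.float64, mode="w+", shape=(4096, 4096))

# Access and assignment look exactly like normal ndarray operations.
mat[0, :] = 1.0
row_sum = mat[0, :].sum()   # 4096.0
mat.flush()                 # push dirty pages back to disk
```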

My second trick is much more subtle, and relies on modern operating
systems being virtual memory systems. In a virtual memory system, physical
memory is not allocated when a process allocates (mallocs) memory.
Instead, virtual memory is. Physical memory is only allocated when
the virtual memory is used. So when programming, you can feel free to
allocate as much memory as you want, as long as you don't use it. This
is useful for me because the matrices I deal with are symmetric. The
linear algebra routines I'm using (standard *LAPACK* routines such
as a Cholesky decomposition) never touch
the lower triangular part of the matrix. So, what I do is allocate a
huge chunk of memory, only fill half of it, and have matrix operations
performed in place. On the machine I'm using, the maximum matrix size
I can work with is 180 000 on each side, which takes about 6 hours.
Not bad for not having to use a cluster.
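To see that a Cholesky routine really does ignore half the matrix, here is a toy check (not my map making code; note that *numpy*'s *np.linalg.cholesky* is documented to read only the lower triangle and the diagonal, the opposite convention from the upper-triangle *LAPACK* calls above, but the same idea):

```python
import numpy as np

# Build a small symmetric positive-definite matrix.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 10))
a = x @ x.T + 50 * np.eye(50)

# Poison the half that should never be read. np.linalg.cholesky is
# documented to use only the lower triangle and the diagonal, so the
# NaNs in the strict upper triangle must not affect the result.
a_half = a.copy()
a_half[np.triu_indices(50, k=1)] = np.nan

l_full = np.linalg.cholesky(a)
l_half = np.linalg.cholesky(a_half)
assert np.allclose(l_full, l_half)
```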

There is a slight complication in that *numpy* will not allow
you to allocate more memory than your machine has (using the command
*np.empty*). I had to hack together some *cython* code so I could
malloc the memory in *C* and convert it to a *numpy* array.
I've posted the code below; feel free to use it, change it,
whatever.
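The *cython* snippet itself isn't reproduced here, but the same effect can be had from pure *python*: an anonymous *mmap* reserves virtual address space that the kernel backs with physical pages only as they are touched (behavior assumed Linux-like; the function below is my own illustration, not the posted code):

```python
import mmap

import numpy as np

def overcommitted_array(shape, dtype=np.float64):
    """Allocate an array whose physical pages are committed lazily.

    The anonymous mmap reserves virtual address space only; RAM is
    consumed page by page as elements are actually written.
    """
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    buf = mmap.mmap(-1, nbytes)
    return np.frombuffer(buf, dtype=dtype).reshape(shape)

# Reserve a matrix, then fill (and pay for) only its upper triangle.
n = 1000   # tiny here; the same call works for n = 180_000
mat = overcommitted_array((n, n))
iu = np.triu_indices(n)
mat[iu] = 1.0
```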

# The binomial inverse theorem

Here is a matrix trick that I'm sure I wasn't taught in school. Let me preface everything by saying that I'm currently working on a map making code for the 21 cm experiment I'm involved in. Map making is big business in the CMB community, and I'm trying to translate what is used there to 21 cm. Map making is all about linear algebra and dealing with large matrices, and one of the big challenges is inverting some of these matrices.

The binomial inverse theorem
(a.k.a. the Woodbury matrix identity or
the Sherman-Morrison-Woodbury formula) tells you how to invert sums of
matrices for which you already know the inverses. For a general pair of
matrices it isn't that useful, since applying the formula is as much work
as computing an inverse the normal way. Where the theorem comes in
handy is when one of the matrices is of low rank. Let's say you have an
*(n × n)* matrix *A* for which you already have the inverse.
You have another matrix *B* that is the same size but only has rank
*m*, where *m* is relatively small. The inverse of their sum
can then be computed in *O(mn^{2})* operations instead of the normal
*O(n^{3})*. This assumes you already know how to rotate *B* to an
*(m × m)* dense matrix, but that is probably how you got
*B* in the first place (or else why would it have such a low rank?).
Explicitly, writing *B = USV^{T}* with *U* and *V* of size
*(n × m)* and *S* of size *(m × m)*, the identity reads
*(A + USV^{T})^{-1} = A^{-1} - A^{-1}U(S^{-1} + V^{T}A^{-1}U)^{-1}V^{T}A^{-1}*.

All fine and good, but I still need to invert *A*, so I haven't
gained anything. This brings us to where the theorem is **really**
useful: when *A* is full rank but diagonal or block diagonal.
This structure lets you invert and multiply by *A* quickly, but
does nothing for *A + B* unless you use the binomial inverse theorem.
This might seem like a pretty limited range of applicability, but this
situation has come up no less than three times in my map making code.

An example is instructive. Let's say you have a set of *n*
random numbers arranged in a vector *d*. Additionally, these
numbers are uncorrelated, that is, their covariance matrix
*D* is diagonal, and *D^{-1}* is trivially computed. Now let's
say you want to add a highly correlated component,
*c = d + sv*, where *s* is a random number with variance *S* and
*v* is some projection vector. The new covariance matrix
*C = D + vSv^{T}* is dense, but its inverse can still be computed
trivially:
*C^{-1} = D^{-1} - D^{-1}v(S^{-1} + v^{T}D^{-1}v)^{-1}v^{T}D^{-1}*.
This example is somewhat simplistic but is vaguely representative of
the kind of situation that comes up all the time in cosmology.

Now for the catch. For the most part the identity is numerically
fairly stable. However, a problem arises when the update term (*S*
in the above example) is much larger than the corresponding part of the
original matrix (*v^{T}Dv*).
I will try to motivate the problem using the above example,
but if you're not into statistics you can skip to the bottom of the
paragraph. First consider the quantities *v^{T}Dv* and
*v^{T}D^{-1}v*. These are the variance and information for mode
*v* before we added the extra variance *S*. Now, when you add
variance, you subtract information, that is,
*v^{T}C^{-1}v < S^{-1} << v^{T}D^{-1}v*. So here we see the problem.
If *S* is large, then the update going from *D^{-1}* to *C^{-1}*
needs to subtract out and nearly cancel all the information about mode
*v*. This is a numerical disaster and tends to make your matrices
cease to be positive definite. Bottom line: the identity is
numerically unstable if *S >> v^{T}Dv* (if *v* isn't normalized you
need a factor of the normalization squared on the right hand side).
The generalization for multiple modes (*v* a rectangular matrix
instead of a vector) is obvious. Despite a pretty lengthy search for
any reference or warning about using the identity, I had to learn this
the hard way (weeks of debugging).
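A toy illustration of the cancellation (numbers invented): with *D = I* and *v* a unit vector, the information surviving in mode *v* is exactly *S^{-1}/(1 + S^{-1})*, but for large *S* the identity computes it as zero:

```python
import numpy as np

n = 4
v = np.zeros(n)
v[0] = 1.0              # unit mode, with D = identity
S = 1e20                # huge added variance

# Woodbury update: C^{-1} = I - v v^T / (S^{-1} + 1)
C_inv = np.eye(n) - np.outer(v, v) / (1.0 / S + 1.0)

info = v @ C_inv @ v                   # computed information in mode v
exact = (1.0 / S) / (1.0 + 1.0 / S)    # about 1e-20

# 1/S underflows against the 1.0 it is added to, so the subtraction
# cancels exactly and all the information in mode v is lost.
print(info, exact)
```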