Sparse Matrix Conjugate Gradient
In the previous blog post, I introduced Hotelling's method for computing the inverse of a matrix. In this post, I would like to describe a similar tool: the conjugate gradient method for sparse matrices. We would like to solve the following equation:
$$\begin{equation} AX = B \end{equation}$$
where $A$, $X$, and $B$ are all sparse matrices. This can of course be used to invert a matrix by setting $B = I$.
I will presume that you are familiar with the standard linear equation:
$$\begin{equation} Ax = b \end{equation}$$
where now $x$ and $b$ are vectors. One way to solve this is the conjugate gradient method. Below I've converted the algorithm shown on Wikipedia [1] into Python:
# Libraries
from scipy.sparse import rand
import numpy
from sys import argv
# Input Parameters
dimension = int(argv[1])
sparsity = float(argv[2])
# Initial Matrix and Vectors
A = rand(dimension, dimension, density=sparsity, format="csr")
A = A + A.T  # symmetrize; CG assumes A is symmetric positive definite
x = numpy.zeros(dimension)
x[0] = 1  # arbitrary initial guess
b = numpy.random.rand(dimension)
# Iterate
r = b - A.dot(x)  # initial residual
p = r.copy()      # initial search direction
for i in range(0, 100):
    Ap = A.dot(p)
    top = numpy.dot(r.T, r)
    bottom = numpy.dot(p.T, Ap)
    alpha = top / bottom  # step length along p
    x = x + alpha * p
    r = r - alpha * Ap
    norm_value = numpy.linalg.norm(r)
    if norm_value < 1e-8:
        break
    new_top = numpy.dot(r.T, r)
    beta = new_top / top  # weight for building the next search direction
    p = r + beta * p
print("Done:", i, norm_value)
The usual caveats apply about the condition number, existence of solutions, and so on. Also, check out the cleverly titled reference [2] if you want a more detailed review of this method.
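As a quick sanity check on the loop above, you can compare against SciPy's built-in conjugate gradient solver. This is just a small sketch that assumes the $A$, $b$, and $x$ from the script above are still in scope:
# Sanity check: compare against SciPy's built-in CG solver
from scipy.sparse.linalg import cg
x_ref, info = cg(A, b)  # info == 0 means SciPy's solver converged
print("SciPy converged:", info == 0)
print("Difference from our solution:", numpy.linalg.norm(x - x_ref))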
Now let's expand to the matrix case. It's easiest to think of the matrix case as the vector case, except that you're solving for multiple right-hand sides at once. Matrix-vector multiplication becomes matrix-matrix multiplication, because that's how you multiply a matrix by a set of vectors. For the dot products, we just multiply the matrices element-wise and sum up the entries (see the last blog post).
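Concretely, the vector dot product $r^T r$ turns into the element-wise (Frobenius) inner product of the corresponding matrices:
$$\begin{equation} \langle R, S \rangle = \mathrm{Tr}\left(R^T S\right) = \sum_{ij} R_{ij} S_{ij} \end{equation}$$
In the code below the transpose is taken before the element-wise multiply; since everything in this example stays symmetric, that gives the same number.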
# Libraries
import numpy
from scipy.sparse import identity, rand
from scipy.sparse.linalg import norm
from sys import argv
# Input Parameters
dimension = int(argv[1])
sparsity = float(argv[2])
# Initial Matrix and Vectors
A = rand(dimension, dimension, density=sparsity, format="csr")
A = A + A.T + identity(dimension)  # symmetric, with a boosted diagonal
X = identity(dimension)  # initial guess
B = identity(dimension)  # right-hand sides; B = I means X converges to the inverse of A
# Iterate
R = B - A.dot(X)  # residual matrix
P = R.copy()      # search direction matrix
for i in range(0, 1000):
    AP = A.dot(P)
    AP = 0.5 * (AP.T + AP)  # symmetrize to counteract drift
    top = ((R.T).multiply(R)).sum()      # matrix analogue of the dot product: multiply element-wise, then sum
    bottom = ((P.T).multiply(AP)).sum()
    alpha = top / bottom
    X = X + alpha * P
    R = R - alpha * AP
    norm_value = norm(R)
    if norm_value < 1e-8:
        break
    new_top = ((R.T).multiply(R)).sum()
    beta = new_top / top
    P = R + beta * P
print("Done:", i, norm_value)
I snuck in a symmetrizing operation on $AP$ because I noticed the iterates drifting away from symmetry.
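As a quick check that the result really approximates $A^{-1}$, you can look at how far $AX$ is from the identity; this assumes the variables from the script above are still in scope:
# How good an inverse did we get? A.dot(X) should be close to the identity
print("Inverse residual:", norm(A.dot(X) - identity(dimension)))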
One important difference between CG and Hotelling's method is that with CG we can also introduce a preconditioner into the mix. Additionally, there are often situations where we already have a good guess for the solution and just want to use CG to refine it. Perhaps in a future blog post we can take a closer look at how the two methods compare in performance...
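To give a flavor of the preconditioner idea, here is a minimal sketch of a Jacobi (diagonal) preconditioner folded into the matrix loop. The problem setup and the name M_inv are my own illustration rather than part of the scripts above, and I use the plain element-wise inner products without the symmetrizing trick:
# Libraries
import numpy
from scipy.sparse import diags, identity, rand
from scipy.sparse.linalg import norm
# A small example problem, built the same way as above
dimension = 100
A = rand(dimension, dimension, density=0.05, format="csr")
A = A + A.T + identity(dimension)
B = identity(dimension)
X = identity(dimension)
# Jacobi preconditioner: the inverse of the diagonal of A
M_inv = diags(1.0 / A.diagonal())
# Preconditioned iteration
R = B - A.dot(X)
Z = M_inv.dot(R)  # preconditioned residual
P = Z.copy()
top = (R.multiply(Z)).sum()
for i in range(0, 1000):
    AP = A.dot(P)
    bottom = (P.multiply(AP)).sum()
    alpha = top / bottom
    X = X + alpha * P
    R = R - alpha * AP
    norm_value = norm(R)
    if norm_value < 1e-8:
        break
    Z = M_inv.dot(R)
    new_top = (R.multiply(Z)).sum()
    beta = new_top / top
    top = new_top
    P = Z + beta * P
print("Done:", i, norm_value)
Jacobi is the simplest choice since inverting a diagonal is trivial and keeps everything sparse; fancier preconditioners would slot into the same Z = M_inv.dot(R) step.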
[1] https://en.wikipedia.org/wiki/Conjugate_gradient_method
[2] Shewchuk, Jonathan Richard. "An introduction to the conjugate gradient method without the agonizing pain." (1994).