Understanding Matrices | Part 1: Matrix-Vector Multiplication

Contents

Multiplication of a matrix by a vector Rectangular matrices Observing several special matrices in X-way interpretation Scale / diagonal matrix 90° rotation matrix Exchange matrix Shift matrix Permutation matrix Triangular matrix Conclusion

are fundamental objects in various fields of modern computer science and mathematics, including but not limited to linear Algebra, machine learning, and computer graphics.

In the current series of 4 stories, I will present a way of interpreting algebraic matrices so that the physical meaning of various Matrix analysis formulas will become clearer. For example, the formula for multiplying 2 matrices:

\[\begin{equation}
c_{i,j} = \sum_{k=1}^{p} a_{i,k}*b_{k,j}
\end{equation}\]

or the formula for inverting a chain of matrices:

\[\begin{equation}
(ABC)^{-1} = C^{-1}B^{-1}A^{-1}
\end{equation}\]

Probably for most of us, when we were reading matrix-related definitions and formulas for the first time, questions like the following ones arose:

what does a matrix actually represent,
what is the physical meaning of multiplying a matrix by a vector,
why multiplication of 2 matrices is performed by such a non-standard formula,
why for multiplication the number of columns of the first matrix must be equal to the number of rows of the second one,
what is the meaning of transposing a matrix,
why for certain types of matrices, inversion equals to transposition,
… and so on.

In this series, I plan to present one way of answering most of the listed questions. So let’s dive in!

But before starting, here are a couple of notation rules that I use throughout this series:

Matrices are denoted by uppercase (like A, B), while vectors and scalars are denoted by lowercase (like x, y or m, n),
a_i,j – The value of i-th row and j-th column of matrix ‘A‘,
x_i – the i-th value of vector ‘x‘.

Multiplication of a matrix by a vector

Let’s put aside for now the simplest operations on matrices, which are addition and subtraction. The next simplest manipulation is probably the multiplication of a matrix by a vector:

\[\begin{equation}
y = Ax
\end{equation}\]

We know that the result of such an operation is another vector ‘y‘, which has a length equal to the number of rows of ‘A‘, while the length of ‘x‘ should be equal to the number of columns of ‘A‘.

Let’s concentrate on “n*n” square matrices for now (those with equal numbers of rows and columns). We will observe the behavior of rectangular matrices a bit later.

The formula for calculating y_i is:

\[\begin{equation}
y_i = \sum_{j=1}^{n} a_{i,j}*x_j
\end{equation}\]

… which, if written in the expanded way, is:

\[\begin{equation}
\begin{cases}
y_1 = a_{1,1}x_1 + a_{1,2}x_2 + \dots + a_{1,n}x_n \\
y_2 = a_{2,1}x_1 + a_{2,2}x_2 + \dots + a_{2,n}x_n \\
\;\;\;\;\; \vdots \\
y_n = a_{n,1}x_1 + a_{n,2}x_2 + \dots + a_{n,n}x_n
\end{cases}
\end{equation}\]

Such expanded notation clearly shows that every cell a_i,j is present in the system of equations only once. More precisely, a_i,j is present as the factor of x_j, and participates only in the sum of y_i. This leads us to the following interpretation:

In the product of a matrix by a vector “y = Ax”, a certain cell a_i,j describes how much the output value y_i is affected by the input value x_j.

Having that said, we can draw the matrix geometrically, in the following way:

Geometrical interpretation of a 3×3 matrix “A” (written on the left). The right stack (purple items) corresponds to the inputs of the matrix, which are the values of vector ‘x’. The left stack (green items) corresponds to the outputs of the matrix, which are the values of vector ‘y’. Every arrow starting at ‘x_j‘ and finishing at ‘y_i‘ corresponds to a certain cell “a_i,j“.

And as we are going to interpret matrix ‘A‘ as influences of values x_j on values y_i, it is reasonable to attach values of ‘x‘ to the right stack, which will result in values of ‘y‘ being present at the left stack.

*Placing values of an input vector “x = (x₁, x₂, x₃)” at the right stack clearly shows how the values of the output vector “y = (y₁, y₂, y₃)” are obtained at the left stack.*

I prefer to call this interpretation of matrices as “X-way interpretation”, as the placement of presented arrows looks like the English letter “X”. And for a certain matrix ‘A‘, I prefer to call such a drawing as “X-diagram” of ‘A‘.

Such interpretation clearly shows that the input vector ‘x‘ goes through some kind of transformation, from right to left, and becomes vector ‘y‘. This is the reason why in Linear Algebra, matrices are also called “transformation matrices”.

If looking at any k-th item of the left stack, we can see how all the values of ‘x‘ are being accumulated towards it, while being multiplied by coefficients a_k,j (which are the k-th row of the matrix).

The accumulation of all input values (x₁, x₂, x₃) towards the output value ‘y₂‘ is highlighted with red arrows. The input values are multiplied by coefficients (9, 4, 6) respectively, which are the 2nd row of matrix ‘A’.

At the same time, if looking at any k-th item of the right stack, we can see how the value x_k is being distributed over all values of ‘y’, while being multiplied by coefficients a_i,k (which are now the k-th column of the matrix).

The distribution of the input value ‘x₃‘ towards all output values (y₁, y₂, y₃) is highlighted with red arrows. The input value ‘x₃‘ is being multiplied by coefficients (7, 6, 8) respectively, which are now the 3rd column of matrix ‘A’.

This already gives us another insight, that when interpreting a matrix in the X-way, the left stack can be associated with rows of the matrix, while the right stack can be associated with its columns.

Surely, if we are interested in locating some value a_i,j, looking at its X-diagram is not as convenient as looking at the matrix in its ordinary way – as a rectangular table of numbers. But, as we will see later and in the next stories of this series, X-way interpretation explicitly presents the meaning of various algebraic operations over matrices.

Rectangular matrices

Multiplication of the form “y = Ax” is allowed only if the length of vector ‘x‘ is equal to the number of columns of matrix ‘A‘. At the same time, the result vector ‘y‘ will have a length equal to the number of rows of ‘A‘. So, if ‘A‘ is a rectangular matrix, vector ‘x‘ will change its length while passing through its transformation. We can observe it by looking at X-way interpretation:

X-way interpretation of a 3*4 matrix ‘A’. We see that its left stack has a height of 3 (count of rows of ‘A’), while its right stack has a height of 4 (count of columns of ‘A’). There are 3*4=12 arrows overall, each corresponding to a single cell a_i,j.

Now it is clear why we can multiply ‘A‘ only on such a vector ‘x‘, the length of which is equal to the number of columns of ‘A‘: because otherwise the vector ‘x‘ will simply not fit on the right side of the X-diagram.

Similarly, it is clear why the length of the result vector “y = Ax” is equal to the number of rows of ‘A‘.

Viewing rectangular matrices in the X-way strokes, we have previously made an insight, which is that items of the left stack of the X-diagram correspond to rows of the illustrated matrix, while items of its right stack correspond to columns.

Observing several special matrices in X-way interpretation

Let’s see how X-way interpretation will help us to understand the behavior of certain special matrices:

Scale / diagonal matrix

A scale matrix is such a square matrix that has all cells of its main diagonal equal to some value ‘s‘, while having all other cells equal to 0. Multiplying a vector “x” by such a matrix results in every value of “x” being multiplied by ‘s‘:

\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n
\end{pmatrix}
=
\begin{bmatrix}
s & 0 & \dots & 0 & 0 \\
0 & s & \dots & 0 & 0 \\
& & \vdots \\
0 & 0 & \dots & s & 0 \\
0 & 0 & \dots & 0 & s
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n
\end{pmatrix}
=
\begin{pmatrix}
s x_1 \\ s x_2 \\ \vdots \\ s x_{n-1} \\ s x_n
\end{pmatrix}
\end{equation*}\]

The X-way interpretation of a scale matrix shows its physical meaning. As the only non-zero cells here are the ones on the diagonal – a_i,i, the X-diagram will have arrows only between corresponding pairs of input and output values, which are x_i and y_i.

X-diagram of a scale matrix. Every output value ‘y_i‘ is affected only by the input value ‘x_i‘, which is why all the arrows in the diagram are strictly horizontal. Scale matrix multiples values of input vector “x” by ‘s’, which is why coefficients near all arrows are equal to ‘s’.

A special case of a scale matrix is the diagonal matrix (also called an “identity matrix”), often denoted with the letters “E” or “I” (we will use “E” in the current writing). It is a scale matrix with the parameter “s=1″.

\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n
\end{pmatrix}
=
\begin{bmatrix}
1 & 0 & \dots & 0 & 0 \\
0 & 1 & \dots & 0 & 0 \\
& & \vdots \\
0 & 0 & \dots & 1 & 0 \\
0 & 0 & \dots & 0 & 1
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n
\end{pmatrix}
=
\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n
\end{pmatrix}
\end{equation*}\]

*Identity matrix “E” is a scale matrix with the value “s=1” on the main diagonal.*

We see that doing the multiplication “y = Ex” will just leave the vector ‘x‘ unchanged, as every value x_i is just multiplied by 1.

90° rotation matrix

A matrix, which rotates a given point (x₁, x₂) around the zero-point (0,0) by 90 degrees counter-clockwise, has a simple form:

\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2
\end{pmatrix}
=
\begin{bmatrix}
0 & -1 \\
1 & \phantom{-}0
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2
\end{pmatrix}
=
\begin{pmatrix}
-x_2 \\ \phantom{-}x_1
\end{pmatrix}
\end{equation*}\]

*Counter-clockwise rotation on a plane. We see that if the original (red) point has coordinates (x1, x2), then the rotated (blue) point’s coordinates are (y1, y2) = (-x2, x1).*

X-way interpretation of the 90° rotation matrix shows that behavior:

*The X-way interpretation of 90° rotation matrix shows the exchange between x₁ and x₂ coordinates on the plane.*

Exchange matrix

An exchange matrix ‘J‘ is such a matrix that has 1s on its anti-diagonal, and has 0s at all other cells. Multiplying it by a vector ‘x‘ results in reversing the order of values of ‘x‘:

\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n
\end{pmatrix}
=
\begin{bmatrix}
0 & 0 & \dots & 0 & 1 \\
0 & 0 & \dots & 1 & 0 \\
& & \vdots \\
0 & 1 & \dots & 0 & 0 \\
1 & 0 & \dots & 0 & 0
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n
\end{pmatrix}
=
\begin{pmatrix}
x_n \\ x_{n-1} \\ \vdots \\ x_2 \\ x_1
\end{pmatrix}
\end{equation*}\]

This fact is explicitly shown in the X-way interpretation of the exchange matrix ‘J‘:

X-way interpretation of ‘J’ shows that the i-th from the top value of input vector “x” goes only to the i-th from the bottom value of output vector “y”. The coefficients of those arrows are always 1. That’s why vector “y” becomes the reverse of sequence “x”.

The 1s reside only on the anti-diagonal here, which means that output value y₁ is affected only by input value x_n, then y₂ is affected only by x_n-1, and so on, having y_n affected only by x₁. This is visible on the X-diagram of the exchange matrix ‘J‘.

Shift matrix

A shift matrix is such a matrix that has 1s on some diagonal, parallel to the main diagonal, and has 0s at all remaining cells:

\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5
\end{pmatrix}
=
\begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
=
\begin{pmatrix}
x_2 \\ x_3 \\ x_4 \\ x_5 \\ 0
\end{pmatrix}
\end{equation*}\]

Multiplying such a matrix by a vector “x” results in the same vector but all values shifted by ‘k‘ positions up or down. ‘k‘ is equal to the distance between the diagonal with 1s and the main diagonal. In the presented example, we have “k=1″ (diagonal with 1s is only one position above the main diagonal). If the diagonal with 1s is in the upper-right triangle, as it is in the presented example, then the shift of values of “x” is performed upwards. Otherwise, the shift of values is performed downwards.

Shift matrix can also be illustrated explicitly in the X-way:

The X-diagram of a shift matrix shows that every input value “x_i” is just being transferred to the output value “y_i-k“, where ‘k’ is the distance between the diagonal with 1s and the main diagonal. This results in the values of the input vector “x” being shifted by ‘k’ positions. Here we have “k=1”.

Permutation matrix

A permutation matrix is a matrix composed of 0s and 1s, which rearranges all values of the input vector “x” in a certain way. The impression is that when multiplied by such a matrix, the values of “x” are being permuted.

To achieve that, the n*n-sized permutation matrix ‘P‘ must have ‘n‘ 1s, while all other cells must be 0. Also, no two 1s must appear in the same row or the same column. An example of a permutation matrix is:

\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5
\end{pmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
=
\begin{pmatrix}
x_4 \\ x_1 \\ x_5 \\ x_3 \\ x_2
\end{pmatrix}
\end{equation*}\]

If drawing the X-diagram of the mentioned permutation matrix ‘P‘, we will see the explanation of such behavior:

*X-diagram of the permutation matrix presented above.*

The constraint that no two 1s must appear in the same column means that only one arrow should depart from any item of the right stack. The constraint that no two 1s must appear at the same row means that only one arrow must arrive at every item of the left stack. Finally, the constraint that all the non-zero cells of a permutation matrix must be 1 means that a certain input value x_j, while arriving at output value y_i, will not be multiplied by any coefficient. All this results in the values of vector “x” being rearranged in a certain manner.

Triangular matrix

A triangular matrix is a matrix that has 0s at all cells either below or above its main diagonal. Let’s observe upper-triangular matrices (where 0s are below the main diagonal), as the lower-triangular ones have similar properties.

\[
\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3 \\ y_4
\end{pmatrix}
=
\begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\
0 & a_{2,2} & a_{2,3} & a_{2,4} \\
0 & 0 & a_{3,3} & a_{3,4} \\
0 & 0 & 0 & a_{4,4}
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4
\end{pmatrix}
=
\begin{pmatrix}
\begin{aligned}
a_{1,1}x_1 + a_{1,2}x_2 + a_{1,3}x_3 + a_{1,4}x_4 \\
a_{2,2}x_2 + a_{2,3}x_3 + a_{2,4}x_4 \\
a_{3,3}x_3 + a_{3,4}x_4 \\
a_{4,4}x_4
\end{aligned}
\end{pmatrix}
\end{equation*}
\]

Such an expanded notation illustrates that any output value y_i is affected only by input values with greater or equal indexes, which are x_i, x_i+1, x_i+2, …, x_N. If drawing the X-diagram of the mentioned upper-triangular matrix, that fact becomes obvious:

In the X-diagram of an upper-triangular matrix, all arrows are either horizontal or directed upwards, which illustrates the fact that any output value ‘y_i‘ is affected only by input values with the same or greater index – ‘x_i‘, ‘x_i+1‘, ‘x_i+2‘, …, ‘x_N‘.

Conclusion

In the first story of the series, which is devoted to the interpretation of algebraic matrices, we looked at how matrices can be presented geometrically, and called it “X-way interpretation”. Such interpretation explicitly highlights various properties of matrix-vector multiplication, as well as the behavior of matrices of several special types.

In the next story of this series, we will find an interpretation of the multiplication of two matrices by operating on their X-diagrams, so stay tuned for the second arrival!

My gratitude to:
– Roza Galstyan, for careful review of the draft,
– Asya Papyan, for the precise design of all the used illustrations ( https://www.behance.net/asyapapyan ).

If you enjoyed reading this story, feel free to connect with me on LinkedIn, where, among other things, I will also post updates ( https://www.linkedin.com/in/tigran-hayrapetyan-cs/ ).

All used images, unless otherwise noted, are designed by request of the author.