Matrices are fundamental objects in various fields of modern computer science and mathematics, including, but not limited to, linear algebra, machine learning, and computer graphics.
In the current series of four stories, I will present a way of interpreting algebraic matrices so that the physical meaning of various matrix analysis formulas becomes clearer. For example, the formula for multiplying two matrices:
\[\begin{equation}
c_{i,j} = \sum_{k=1}^{p} a_{i,k}*b_{k,j}
\end{equation}\]
or the formula for inverting a chain of matrices:
\[\begin{equation}
(ABC)^{-1} = C^{-1}B^{-1}A^{-1}
\end{equation}\]
Probably, for most of us, questions like the following arose when we read matrix-related definitions and formulas for the first time:
- what does a matrix actually represent,
- what is the physical meaning of multiplying a matrix by a vector,
- why is the multiplication of two matrices performed by such a non-standard formula,
- why must the number of columns of the first matrix equal the number of rows of the second one,
- what is the meaning of transposing a matrix,
- why, for certain types of matrices, does inversion equal transposition,
- … and so on.
In this series, I plan to present one way of answering most of the listed questions. So let’s dive in!
But before starting, here are a couple of notation rules that I use throughout this series:
- Matrices are denoted by uppercase (like A, B), while vectors and scalars are denoted by lowercase (like x, y or m, n),
- ai,j – the value at the i-th row and j-th column of matrix ‘A‘,
- xi – the i-th value of vector ‘x‘.
Multiplication of a matrix by a vector
Let’s put aside for now the simplest operations on matrices, which are addition and subtraction. The next simplest manipulation is probably the multiplication of a matrix by a vector:
\[\begin{equation}
y = Ax
\end{equation}\]
We know that the result of such an operation is another vector ‘y‘, which has a length equal to the number of rows of ‘A‘, while the length of ‘x‘ should be equal to the number of columns of ‘A‘.
Let’s concentrate on “n*n” square matrices for now (those with equal numbers of rows and columns). We will look at the behavior of rectangular matrices a bit later.
The formula for calculating yi is:
\[\begin{equation}
y_i = \sum_{j=1}^{n} a_{i,j}*x_j
\end{equation}\]
… which, if written in the expanded way, is:
\[\begin{equation}
\begin{cases}
y_1 = a_{1,1}x_1 + a_{1,2}x_2 + \dots + a_{1,n}x_n \\
y_2 = a_{2,1}x_1 + a_{2,2}x_2 + \dots + a_{2,n}x_n \\
\;\;\;\;\; \vdots \\
y_n = a_{n,1}x_1 + a_{n,2}x_2 + \dots + a_{n,n}x_n
\end{cases}
\end{equation}\]
Such expanded notation clearly shows that every cell ai,j appears in the system of equations only once. More precisely, ai,j appears as the factor of xj, and participates only in the sum for yi. This leads us to the following interpretation:
In the product of a matrix by a vector “y = Ax”, a certain cell ai,j describes how much the output value yi is affected by the input value xj.
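The summation formula can be turned directly into code. Here is a minimal sketch using NumPy (the article itself uses no code, and the function name `matvec` is my own), where the inner loop makes explicit how each cell a[i, j] scales the contribution of input x[j] to output y[i]:

```python
import numpy as np

def matvec(A, x):
    """Compute y = A x cell by cell: A[i, j] describes how much
    the output value y[i] is affected by the input value x[j]."""
    n_rows, n_cols = A.shape
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for j in range(n_cols):
            y[i] += A[i, j] * x[j]
    return y

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([5.0, 6.0])
print(matvec(A, x))   # [17. 39.] -- same as A @ x
```

The double loop is, of course, exactly what the built-in `A @ x` computes; writing it out only makes the “influence of xj on yi” reading visible.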
With that in mind, we can draw the matrix geometrically, in the following way:
And as we are going to interpret matrix ‘A‘ as the influences of values xj on values yi, it is reasonable to attach the values of ‘x‘ to the right stack, which results in the values of ‘y‘ appearing at the left stack.

I prefer to call this interpretation of matrices the “X-way interpretation”, as the placement of the presented arrows looks like the English letter “X”. And for a certain matrix ‘A‘, I prefer to call such a drawing the “X-diagram” of ‘A‘.
Such an interpretation clearly shows that the input vector ‘x‘ goes through some kind of transformation, from right to left, and becomes vector ‘y‘. This is why, in linear algebra, matrices are also called “transformation matrices”.
Looking at any k-th item of the left stack, we can see how all the values of ‘x‘ are accumulated towards it, each multiplied by a coefficient ak,j (which together form the k-th row of the matrix).


At the same time, looking at any k-th item of the right stack, we can see how the value xk is distributed over all the values of ‘y’, each time multiplied by a coefficient ai,k (which together now form the k-th column of the matrix).


This already gives us another insight: when interpreting a matrix in the X-way, the left stack can be associated with the rows of the matrix, while the right stack can be associated with its columns.
Surely, if we are interested in locating some value ai,j, looking at its X-diagram is not as convenient as looking at the matrix in its ordinary way – as a rectangular table of numbers. But, as we will see later and in the next stories of this series, X-way interpretation explicitly presents the meaning of various algebraic operations over matrices.
Rectangular matrices
Multiplication of the form “y = Ax” is allowed only if the length of vector ‘x‘ is equal to the number of columns of matrix ‘A‘. At the same time, the result vector ‘y‘ will have a length equal to the number of rows of ‘A‘. So, if ‘A‘ is a rectangular matrix, vector ‘x‘ will change its length while passing through its transformation. We can observe it by looking at X-way interpretation:


Now it is clear why we can multiply ‘A‘ only by a vector ‘x‘ whose length is equal to the number of columns of ‘A‘: otherwise, the vector ‘x‘ simply would not fit on the right side of the X-diagram.
Similarly, it is clear why the length of the result vector “y = Ax” is equal to the number of rows of ‘A‘.
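These shape rules are easy to check numerically. A small sketch using NumPy (an illustration assumption, not something the article relies on):

```python
import numpy as np

A = np.ones((3, 5))   # 3 rows, 5 columns
x = np.ones(5)        # the length of x must equal the number of columns of A
y = A @ x
print(y.shape)        # (3,) -- the output length equals the number of rows of A

try:
    A @ np.ones(4)    # wrong length: the vector does not "fit" the right stack
except ValueError as e:
    print("shape mismatch:", e)
```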
Viewing rectangular matrices in the X-way also confirms the insight we made previously: the items of the left stack of the X-diagram correspond to the rows of the illustrated matrix, while the items of its right stack correspond to its columns.
Observing several special matrices in X-way interpretation
Let’s see how the X-way interpretation helps us understand the behavior of certain special matrices:
Scale / diagonal matrix
A scale matrix is a square matrix that has all the cells of its main diagonal equal to some value ‘s‘, and all other cells equal to 0. Multiplying a vector “x” by such a matrix results in every value of “x” being multiplied by ‘s‘:
\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n
\end{pmatrix}
=
\begin{bmatrix}
s & 0 & \dots & 0 & 0 \\
0 & s & \dots & 0 & 0 \\
& & \vdots \\
0 & 0 & \dots & s & 0 \\
0 & 0 & \dots & 0 & s
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n
\end{pmatrix}
=
\begin{pmatrix}
s x_1 \\ s x_2 \\ \vdots \\ s x_{n-1} \\ s x_n
\end{pmatrix}
\end{equation*}\]
The X-way interpretation of a scale matrix shows its physical meaning. As the only non-zero cells here are the diagonal ones ai,i, the X-diagram has arrows only between corresponding pairs of input and output values, xi and yi.
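The effect is easy to verify. A minimal sketch, using NumPy for illustration:

```python
import numpy as np

s, n = 2.5, 4
S = s * np.eye(n)                     # scale matrix: s on the diagonal, 0 elsewhere
x = np.array([1.0, -2.0, 3.0, 0.5])
print(S @ x)                          # [ 2.5  -5.    7.5   1.25] -- every value scaled by s
```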

A special case of a scale matrix is the identity matrix, often denoted by the letter “E” or “I” (we will use “E” in the current writing). It is a scale matrix with the parameter “s = 1”.
\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n
\end{pmatrix}
=
\begin{bmatrix}
1 & 0 & \dots & 0 & 0 \\
0 & 1 & \dots & 0 & 0 \\
& & \vdots \\
0 & 0 & \dots & 1 & 0 \\
0 & 0 & \dots & 0 & 1
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n
\end{pmatrix}
=
\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n
\end{pmatrix}
\end{equation*}\]

We see that the multiplication “y = Ex” just leaves the vector ‘x‘ unchanged, as every value xi is multiplied by 1.
90° rotation matrix
A matrix which rotates a given point (x1, x2) around the origin (0, 0) by 90 degrees counter-clockwise has a simple form:
\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2
\end{pmatrix}
=
\begin{bmatrix}
0 & -1 \\
1 & \phantom{-}0
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2
\end{pmatrix}
=
\begin{pmatrix}
-x_2 \\ \phantom{-}x_1
\end{pmatrix}
\end{equation*}\]
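We can check the rotation on a concrete point. A minimal sketch, using NumPy for illustration:

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # 90-degree counter-clockwise rotation matrix
p = np.array([1.0, 0.0])      # a point on the positive x-axis
print(R @ p)                  # [0. 1.] -- rotated onto the positive y-axis
```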

X-way interpretation of the 90° rotation matrix shows that behavior:

Exchange matrix
An exchange matrix ‘J‘ is a matrix that has 1s on its anti-diagonal and 0s in all other cells. Multiplying it by a vector ‘x‘ reverses the order of the values of ‘x‘:
\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n
\end{pmatrix}
=
\begin{bmatrix}
0 & 0 & \dots & 0 & 1 \\
0 & 0 & \dots & 1 & 0 \\
& & \vdots \\
0 & 1 & \dots & 0 & 0 \\
1 & 0 & \dots & 0 & 0
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n
\end{pmatrix}
=
\begin{pmatrix}
x_n \\ x_{n-1} \\ \vdots \\ x_2 \\ x_1
\end{pmatrix}
\end{equation*}\]
This fact is explicitly shown in the X-way interpretation of the exchange matrix ‘J‘:

The 1s reside only on the anti-diagonal here, which means that output value y1 is affected only by input value xn, y2 is affected only by xn-1, and so on, with yn affected only by x1. This is visible on the X-diagram of the exchange matrix ‘J‘.
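The reversal is simple to confirm. A minimal sketch, using NumPy for illustration:

```python
import numpy as np

n = 5
J = np.fliplr(np.eye(n))      # 1s on the anti-diagonal, 0s elsewhere
x = np.arange(1.0, n + 1)     # [1, 2, 3, 4, 5]
print(J @ x)                  # [5. 4. 3. 2. 1.] -- the order of values is reversed
```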
Shift matrix
A shift matrix is such a matrix that has 1s on some diagonal, parallel to the main diagonal, and has 0s at all remaining cells:
\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5
\end{pmatrix}
=
\begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
=
\begin{pmatrix}
x_2 \\ x_3 \\ x_4 \\ x_5 \\ 0
\end{pmatrix}
\end{equation*}\]
Multiplying such a matrix by a vector “x” results in the same vector, but with all values shifted by ‘k‘ positions up or down, where ‘k‘ is the distance between the diagonal of 1s and the main diagonal. In the presented example we have “k = 1” (the diagonal of 1s is one position above the main diagonal). If the diagonal of 1s lies in the upper-right triangle, as in the presented example, the values of “x” are shifted upwards; otherwise, they are shifted downwards.
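The shift behavior can be reproduced in a few lines. A minimal sketch, using NumPy for illustration (`np.eye` with the `k` parameter places the 1s on the k-th diagonal above the main one):

```python
import numpy as np

n, k = 5, 1
U = np.eye(n, k=k)            # 1s on the diagonal k positions above the main one
x = np.arange(1.0, n + 1)     # [1, 2, 3, 4, 5]
print(U @ x)                  # [2. 3. 4. 5. 0.] -- values shifted up, 0 fills the gap
```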
A shift matrix can also be illustrated explicitly in the X-way:

Permutation matrix
A permutation matrix is a matrix composed of 0s and 1s which rearranges the values of the input vector “x” in a certain way: when “x” is multiplied by such a matrix, its values are permuted.
To achieve that, an n*n permutation matrix ‘P‘ must have exactly ‘n‘ 1s, with all other cells equal to 0, and no two 1s may appear in the same row or the same column. An example of a permutation matrix is:
\[\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5
\end{pmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5
\end{pmatrix}
=
\begin{pmatrix}
x_4 \\ x_1 \\ x_5 \\ x_3 \\ x_2
\end{pmatrix}
\end{equation*}\]
If drawing the X-diagram of the mentioned permutation matrix ‘P‘, we will see the explanation of such behavior:

The constraint that no two 1s appear in the same column means that only one arrow departs from any item of the right stack. The constraint that no two 1s appear in the same row means that only one arrow arrives at every item of the left stack. Finally, the constraint that all non-zero cells of a permutation matrix equal 1 means that an input value xj, on its way to the output value yi, is not multiplied by any coefficient. All this results in the values of vector “x” being rearranged in a certain manner.
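The example permutation matrix from above can be checked directly, including the one-1-per-row and one-1-per-column constraints. A minimal sketch, using NumPy for illustration:

```python
import numpy as np

# The permutation matrix from the example above
P = np.array([[0, 0, 0, 1, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(P @ x)          # [4. 1. 5. 3. 2.] -- the values of x, rearranged

# Exactly one 1 per row and per column: P is a valid permutation matrix
assert (P.sum(axis=0) == 1).all() and (P.sum(axis=1) == 1).all()
```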
Triangular matrix
A triangular matrix is a matrix that has 0s at all cells either below or above its main diagonal. Let’s observe upper-triangular matrices (where 0s are below the main diagonal), as the lower-triangular ones have similar properties.
\[
\begin{equation*}
\begin{pmatrix}
y_1 \\ y_2 \\ y_3 \\ y_4
\end{pmatrix}
=
\begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\
0 & a_{2,2} & a_{2,3} & a_{2,4} \\
0 & 0 & a_{3,3} & a_{3,4} \\
0 & 0 & 0 & a_{4,4}
\end{bmatrix}
*
\begin{pmatrix}
x_1 \\ x_2 \\ x_3 \\ x_4
\end{pmatrix}
=
\begin{pmatrix}
\begin{aligned}
a_{1,1}x_1 + a_{1,2}x_2 + a_{1,3}x_3 + a_{1,4}x_4 \\
a_{2,2}x_2 + a_{2,3}x_3 + a_{2,4}x_4 \\
a_{3,3}x_3 + a_{3,4}x_4 \\
a_{4,4}x_4
\end{aligned}
\end{pmatrix}
\end{equation*}
\]
Such an expanded notation illustrates that any output value yi is affected only by input values with greater or equal indexes: xi, xi+1, xi+2, …, xn. The X-diagram of the mentioned upper-triangular matrix makes that fact obvious:
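This dependency structure can be tested numerically: perturbing x1 must leave all outputs except y1 untouched. A minimal sketch, using NumPy for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.triu(rng.random((4, 4)))   # upper-triangular: 0s below the main diagonal
x = rng.random(4)
y = A @ x

# y[i] depends only on x[i], x[i+1], ..., so changing x[0]
# may only change y[0], never y[1], y[2], y[3]
x2 = x.copy()
x2[0] += 10.0
y2 = A @ x2
print(np.allclose(y[1:], y2[1:]))   # True
```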

Conclusion
In this first story of the series, devoted to the interpretation of algebraic matrices, we looked at how matrices can be presented geometrically, and called that presentation the “X-way interpretation”. Such an interpretation explicitly highlights various properties of matrix-vector multiplication, as well as the behavior of several special types of matrices.
In the next story of this series, we will find an interpretation of the multiplication of two matrices by operating on their X-diagrams, so stay tuned for the second installment!
My gratitude to:
– Roza Galstyan, for careful review of the draft,
– Asya Papyan, for the precise design of all the used illustrations ( https://www.behance.net/asyapapyan ).

If you enjoyed reading this story, feel free to connect with me on LinkedIn, where, among other things, I will also post updates ( https://www.linkedin.com/in/tigran-hayrapetyan-cs/ ).
All used images, unless otherwise noted, are designed by request of the author.