Jordan Normal Forms
This section addresses the question of how to determine whether two matrices are similar.
Definition. An $\alpha$-invariant subspace $U$ of $V$ is called cyclic relative to $\alpha$ if $U \not= 0$ and there exists a vector $u \in U$ such that $U$ is generated by $\Set{u, \alpha u, \alpha^2 u, …, \alpha^k u}$ for some $k$.
Proposition. A subspace $U$ is cyclic relative to $\alpha$ iff there exists a vector $u_1 \in U$ such that every vector in $U$ can be expressed in the form $f(\alpha) u_1$ for some $f \in \mathbf{F}[x]$.
Proof.
($\Rightarrow$) Since $U = \langle u_1, \alpha u_1, …, \alpha^k u_1 \rangle$, for any $u \in U$, we can write $u = a_0 u_1 + a_1(\alpha u_1) + … + a_k (\alpha^k u_1)$, so we may take $f(x) = a_0 + a_1 x + … + a_k x^k$.
($\Leftarrow$) Since every $u \in U$ has the form $f(\alpha)u_1$, we have $U = \langle u_1, \alpha u_1, … \rangle$, and by finite dimensionality there exists $k$ such that $U = \langle u_1, \alpha u_1, …, \alpha^k u_1 \rangle$.
In other words, the subspace of $V$ consisting of all vectors of the form $f(\alpha) v$, which we denote by $\langle v \rangle$, is a cyclic subspace generated by $v$.
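As a computational illustration (a sketch using SymPy; the helper name `cyclic_basis` is our own), one can find a basis of the cyclic subspace generated by a vector by applying the map repeatedly until the next power becomes linearly dependent on the earlier ones:

```python
from sympy import Matrix

def cyclic_basis(A, v):
    """Basis {v, Av, A^2 v, ...} of the cyclic subspace generated by v,
    stopping as soon as the next power becomes linearly dependent."""
    basis = [v]
    while Matrix.hstack(*basis, A * basis[-1]).rank() > len(basis):
        basis.append(A * basis[-1])
    return basis

# a single nilpotent Jordan block acting on v = e_3 generates all of F^3
A = Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
v = Matrix([0, 0, 1])
print(len(cyclic_basis(A, v)))  # 3
```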
Proposition. For each nonzero $v \in V$ there exists a nonzero polynomial in $\mathbf{F}[x]$ with leading coefficient $1$, i.e.
\[m_v(x) = x^r - a_{r-1} x^{r-1} - ... - a_0\]such that $m_v(\alpha)v = 0$, and for any polynomial $f \in \mathbf{F}[x]$, $f(\alpha) v = 0$ implies $m_v(x) \mid f(x)$.
Proof.
By finite dimensionality, there exists $r \ge 0$ such that $\Set{v, \alpha v, …, \alpha^{r-1} v}$ are linearly independent and $\Set{v, \alpha v, …, \alpha^{r-1} v, \alpha^r v}$ are linearly dependent, i.e.
\[\alpha^r v = a_{r-1} \alpha^{r-1} v + ... + a_0 v\]so the $m_v(x)$ stated satisfies $m_v(\alpha) v = 0$.
Now suppose $f(\alpha) v = 0$. By division with remainder, write $f(x) = m_v(x) q(x) + r(x)$ with $\deg r(x) < \deg m_v(x)$. Substituting $\alpha$ and applying to $v$ gives $f(\alpha) v = q(\alpha) m_v(\alpha) v + r(\alpha) v = r(\alpha) v$. Thus $r(\alpha) v = 0$, but $\Set{v, \alpha v, …, \alpha^{\deg m_v - 1} v}$ is linearly independent, so the only possibility is $r(x) = 0$ and $m_v(x) \mid f(x)$.
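The construction in the proof can be carried out directly (a SymPy sketch; the helper name `order_of` is our own): find the first power $A^r v$ that depends on the earlier powers, then read the coefficients off the linear relation.

```python
from sympy import Matrix, symbols, Poly

x = symbols('x')

def order_of(A, v):
    """Order m_v(x) = x^r - a_{r-1} x^{r-1} - ... - a_0 of a nonzero v,
    found from the first power A^r v depending on v, Av, ..., A^{r-1} v."""
    powers = [v]
    while Matrix.hstack(*powers, A * powers[-1]).rank() > len(powers):
        powers.append(A * powers[-1])
    r = len(powers)
    # solve A^r v = a_0 v + a_1 (A v) + ... + a_{r-1} (A^{r-1} v)
    a, _ = Matrix.hstack(*powers).gauss_jordan_solve(A * powers[-1])
    return Poly(x**r - sum(a[i] * x**i for i in range(r)), x)

A = Matrix([[2, 1], [0, 2]])
print(order_of(A, Matrix([0, 1])))  # order x^2 - 4x + 4 = (x - 2)^2
print(order_of(A, Matrix([1, 0])))  # order x - 2
```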
Definition. The unique monic polynomial $m_v(x)$ of least degree such that $m_v(\alpha)v = 0$ is called the order of $v$.
Proposition. Let $\langle v \rangle$ be a nonzero cyclic subspace of $V$ and $\alpha_v$ be the restriction of $\alpha$ to the subspace $\langle v \rangle$. Then the order $m_v(x)$ of $v$ is equal to the minimal polynomial $m_{\alpha_v}(x)$ of $\alpha_v$.
Proof.
A polynomial $f \in \mathbf{F}[x]$ satisfies $f(\alpha) v = 0$ iff $f(\alpha_v) = 0$: if $f$ annihilates the generating vector $v$, then for any $g(\alpha)v \in \langle v \rangle$ we have $f(\alpha)g(\alpha)v = g(\alpha)f(\alpha)v = 0$, so $f$ annihilates every vector in $\langle v \rangle$. In particular $m_v(\alpha_v) = 0$, so $m_{\alpha_v}(x) \mid m_v(x)$. Conversely, $m_{\alpha_v}(\alpha) v = 0$, so $m_v(x) \mid m_{\alpha_v}(x)$. Since both are monic, they are equal.
Another useful observation regarding the minimal polynomial and order is the following one.
Proposition. Suppose that $m(x)$ is the minimal polynomial of $\alpha$ on $V$. Then for every vector $v \in V$, the order $m_v(x) \mid m(x)$.
Proof.
Since $m(\alpha) v = 0$ for any vector $v \in V$, the result follows from the above.
We will now state the main governing theorem without proof, since its proof is quite involved.
Theorem. [Elementary Divisor Theorem] Let $\alpha \in \text{End}(V)$ where $V$ is a finite dimensional vector space over an arbitrary field such that $V \not= 0$. Then there exist nonzero vectors $\Set{v_1, …, v_r}$ in $V$, whose orders are powers of prime polynomials in $\mathbf{F}[x]$, say $\Set{p_1(x)^{e_1}, …, p_r(x)^{e_r}}$, such that $V$ is the direct sum of the cyclic subspaces
\[V = \langle v_1 \rangle \oplus \cdots \oplus \langle v_r \rangle\]The number $r$ is uniquely determined and the orders $\Set{p_i(x)^{e_i}}$ may contain repetitions and are uniquely determined up to a rearrangement.
Definition. The set of prime powers $\Set{p_1(x)^{e_1}, …, p_r(x)^{e_r}}$, with repetitions if necessary, is called the set of elementary divisors of $\alpha$.
Rational Canonical Form
We will now see that the elementary divisors describe the matrix of $\alpha$ with respect to a certain basis in a completely unambiguous way.
Proposition. Suppose that $\langle v \rangle$ is a cyclic subspace of $V$ and the order of $v$ is the prime polynomial
\[p(x) = x^d - a_{d-1} x^{d-1} - ... - a_0\]Then the matrix of $\alpha_v$ with respect to the basis $\Set{v, \alpha v, …, \alpha^{d-1} v}$ of $\langle v \rangle$ is
\[A_{p(x)} = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & a_0 \\ 1 & 0 & 0 & \cdots & 0 & a_1 \\ 0 & 1 & 0 & \cdots & 0 & a_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & a_{d-1} \\ \end{pmatrix}\]Proof.
From the choice of basis, we have $\alpha(\alpha^i(v)) = 1(\alpha^{i+1}(v))$ for $0 \le i \le d-2$ and
\[\alpha(\alpha^{d-1}(v)) = \alpha^d(v) = a_0(v) + a_1 (\alpha(v)) + ... + a_{d-1}(\alpha^{d-1}(v))\]so the matrix with respect to the basis is as shown.
Definition. The matrix $A_{p(x)}$ is called the companion matrix of polynomial $p(x)$.
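The companion matrix is easy to build and check by machine (a SymPy sketch; the helper name `companion` is our own). Note that its characteristic polynomial recovers $p(x)$, consistent with $p$ being the order of the generator.

```python
from sympy import Matrix, Poly, symbols, zeros

x = symbols('x')

def companion(p):
    """Companion matrix A_p in the convention of these notes:
    p(x) = x^d - a_{d-1} x^{d-1} - ... - a_0, with 1s on the subdiagonal
    and the coefficients a_i down the last column."""
    p = Poly(p, x)
    d = p.degree()
    a = [-p.nth(i) for i in range(d)]  # a_i = -(coefficient of x^i in p)
    C = zeros(d, d)
    for i in range(d - 1):
        C[i + 1, i] = 1
    for i in range(d):
        C[i, d - 1] = a[i]
    return C

C = companion(x**2 + 1)           # irreducible over Q: a_0 = -1, a_1 = 0
print(C)                          # Matrix([[0, -1], [1, 0]])
print(C.charpoly(x).as_expr())    # x**2 + 1
```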
Proposition. Suppose that $\langle v \rangle$ is a cyclic subspace of $V$ and the order of $v$ is $p(x)^e$ where
\[p(x) = x^d - a_{d-1} x^{d-1} - ... - a_0\]Then the matrix of $\alpha_v$ with respect to basis
\[\begin{align*} \{ &p(\alpha)^{e-1}v, \alpha p(\alpha)^{e-1}v, \cdots, \alpha^{d-1} p(\alpha)^{e-1}v, \\ &p(\alpha)^{e-2}v, \alpha p(\alpha)^{e-2}v, \cdots, \alpha^{d-1} p(\alpha)^{e-2}v, \\ &\cdots, \\ &v, \alpha v, \cdots, \alpha^{d-1} v \} \end{align*}\]is
\[C = \begin{pmatrix} A_{p(x)} & B & 0 & \cdots & 0 & 0 \\ 0 & A_{p(x)} & B & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & A_{p(x)} & B \\ 0 & 0 & 0 & \cdots & 0 & A_{p(x)} \\ \end{pmatrix}\]where the matrix has $e$ blocks of the companion matrix $A_{p(x)}$ and $B$ is the $d$-by-$d$ matrix
\[B = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 0 & 0 \\ \end{pmatrix}\]Proof.
There are $de$ vectors listed and they are linearly independent, since a relation of linear dependence would produce a nonzero polynomial $g(x)$ such that $g(\alpha)v = 0$ and $\deg g(x) < \deg p(x)^e = de$, contradicting the fact that the order of $v$ is $p(x)^e$.
We can see that $\alpha$ maps each of these vectors to the next one, except for
\[\alpha^{d-1} p(\alpha)^{e-1}v, \alpha^{d-1} p(\alpha)^{e-2}v, ..., \alpha^{d-1}v\]By the fact that $x^d = a_0 + a_1 x + … + a_{d-1} x^{d-1} + p(x)$, for these vectors, we have
\[\alpha(\alpha^{d-1} p(\alpha)^i v) = a_0(p(\alpha)^i v) + a_1(\alpha p(\alpha)^i v) + ... + a_{d-1}(\alpha^{d-1} p(\alpha)^i v) + p(\alpha)^{i+1}v\]which matches the form of the matrix described.
Definition. The $de$-by-$de$ matrix $C$ is called the companion matrix of polynomial $p(x)^e$.
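A small concrete check of this block structure (a SymPy sketch with $p(x) = x^2 + 1$ and $e = 2$, so $d = 2$): assembling the $4$-by-$4$ matrix $C$ from two companion blocks $A_p$ linked by $B$, its characteristic and minimal polynomials both come out as $p(x)^e$.

```python
from sympy import Matrix, eye, symbols, zeros

x = symbols('x')
# p(x) = x^2 + 1 (d = 2) with e = 2: two copies of the companion block
# A_p on the diagonal, linked by B with a single 1 in its top-right corner
Ap = Matrix([[0, -1], [1, 0]])   # companion matrix of x^2 + 1
B  = Matrix([[0, 1], [0, 0]])
C  = Matrix([[Ap, B], [zeros(2, 2), Ap]])
print(C.charpoly(x).as_expr().factor())   # (x**2 + 1)**2
# minimal polynomial is also (x^2 + 1)^2: p(C) is nonzero but p(C)^2 = 0
print((C**2 + eye(4)).is_zero_matrix)          # False
print(((C**2 + eye(4))**2).is_zero_matrix)     # True
```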
Definition. Suppose that the elementary divisors of a linear map $\alpha$ of $V$ are
\[\Set{p_1(x)^{e_1}, ..., p_r(x)^{e_r}}\]and $V = \langle v_1 \rangle \oplus \cdots \oplus \langle v_r \rangle$ where the order of $v_i$ is $p_i(x)^{e_i}$. Choose a basis of $V$ consisting of bases for the subspaces $\langle v_i \rangle$ according to the above. The rational canonical form of $\alpha$ is the matrix with respect to this basis, which is of the form
\[C = \begin{pmatrix} C_1 & 0 & \cdots & 0 \\ 0 & C_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & C_r \\ \end{pmatrix}\]where $C_i$ is the companion matrix of $p_i(x)^{e_i}$.
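For instance (a SymPy sketch with hypothetical elementary divisors $x^2 + 1$ and $x - 2$ over $\mathbf{Q}$), the rational canonical form is the block diagonal sum of the two companion matrices, and its characteristic polynomial is the product of the elementary divisors:

```python
from sympy import Matrix, diag, symbols

x = symbols('x')
C1 = Matrix([[0, -1], [1, 0]])   # companion matrix of x^2 + 1
C2 = Matrix([[2]])               # companion matrix of x - 2
C  = diag(C1, C2)                # rational canonical form
# characteristic polynomial is (x - 2)(x^2 + 1)
print(C.charpoly(x).as_expr().factor())
```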
Proposition. The rational canonical form of a linear transformation is uniquely determined up to a rearrangement of the blocks on the diagonal.
Proof.
By the elementary divisor theorem, the elementary divisors are determined up to a rearrangement and the corresponding companion matrices are uniquely determined.
Proposition. Two $n$-by-$n$ matrices are similar if they have the same rational canonical form (up to a rearrangement of blocks on the diagonal).
Proof.
If $C = P^{-1}AP$ and $C = Q^{-1}BQ$, then $B = QP^{-1}APQ^{-1} = (PQ^{-1})^{-1}A(PQ^{-1})$, so $A$ and $B$ are similar.
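This argument can be replayed numerically (a SymPy sketch using the Jordan form as the canonical form, since SymPy computes it directly; `jordan_form` returns $P, J$ with $A = PJP^{-1}$):

```python
from sympy import Matrix

A = Matrix([[2, 1], [0, 2]])
S = Matrix([[1, 1], [1, 2]])
B = S * A * S.inv()              # similar to A by construction
P, JA = A.jordan_form()          # A = P * JA * P^{-1}
Q, JB = B.jordan_form()
print(JA == JB)                  # True: same canonical form
R = Q * P.inv()
print(B == R * A * R.inv())      # True: explicit similarity B = R A R^{-1}
```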
Jordan Normal Form
Definition. The Jordan normal form of a linear map is defined to be the rational canonical form in the case that all the eigenvalues belong to the field $\mathbf{F}$. In that case the elementary divisors all have the form $(x - \lambda_i)^{e_i}$ and the $e_i$-by-$e_i$ companion matrix, called a Jordan block, is of the form
\[J_{e_i}(\lambda_i) = \begin{pmatrix} \lambda_i & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda_i & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_i & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda_i \\ \end{pmatrix}\]The matrix representing the linear map is a block diagonal matrix
\[A = \begin{pmatrix} J_{e_1}(\lambda_1) & 0 & \cdots & 0 \\ 0 & J_{e_2}(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{e_r}(\lambda_r) \\ \end{pmatrix}\]
Note that $J_m(\lambda) = \lambda I_m + J_m(0)$.
Consider a single Jordan block $J_n(\lambda)$, i.e.
\[J_n(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & 0 \\ 0 & \lambda & \ddots & 0 \\ 0 & 0 & \ddots & 1 \\ 0 & 0 & 0 & \lambda \\ \end{pmatrix}\]For the standard basis, we have $(J_n(\lambda) - \lambda I_n)e_1 = 0$ and $(J_n(\lambda) - \lambda I_n)e_i = e_{i-1}$ for $1 < i \le n$. Thus, $(J_n(\lambda) - \lambda I_n)^k$ maps $e_i$ to $e_{i-k}$ for $i > k$ and to $0$ for $i \le k$, i.e.
\[(J_n(\lambda) - \lambda I_n)^k = \begin{pmatrix} 0 & I_{n-k} \\ 0 & 0 \\ \end{pmatrix}\]for $k < n$ and $(J_n(\lambda) - \lambda I_n)^k = 0$ for $k \ge n$. Thus, $\chi_{J_n(\lambda)}(x) = m_{J_n(\lambda)}(x) = (x - \lambda)^n$. Moreover, $\dim E(\lambda) = 1$ since the kernel of $J_n(\lambda) - \lambda I_n$ is spanned by $e_1$. Hence, $\lambda$ is the only eigenvalue, with $a_\lambda = c_\lambda = n$ and $g_\lambda = 1$.
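These facts are easy to verify by machine (a SymPy sketch for a single block $J_4(5)$, built as $\lambda I + J_4(0)$):

```python
from sympy import Matrix, eye, symbols

x = symbols('x')
lam, n = 5, 4
N = Matrix(n, n, lambda i, j: 1 if j == i + 1 else 0)  # nilpotent J_4(0)
J = lam * eye(n) + N                                   # J_4(5)
print(J.charpoly(x).as_expr().factor())        # (x - 5)**4
print(((J - lam * eye(n))**3).is_zero_matrix)  # False: min poly degree is 4
print(((J - lam * eye(n))**4).is_zero_matrix)  # True
print(len((J - lam * eye(n)).nullspace()))     # 1: eigenspace dimension
```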
In general, $a_\lambda$ is the sum of the sizes of the blocks with eigenvalue $\lambda$, i.e. the number of $\lambda$s on the diagonal; $g_\lambda$ is the number of blocks with eigenvalue $\lambda$; and $c_\lambda$ is the size of the largest block with eigenvalue $\lambda$.
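These three quantities can be read off computationally (a SymPy sketch for a matrix with blocks $J_2(3)$ and $J_1(3)$, a single eigenvalue, so $c_\lambda$ is the smallest $k$ with $(A - \lambda I)^k = 0$):

```python
from sympy import Matrix, eye, roots, symbols

x = symbols('x')
# Jordan form with blocks J_2(3) and J_1(3) for the single eigenvalue 3
A = Matrix([[3, 1, 0], [0, 3, 0], [0, 0, 3]])
lam = 3
a = roots(A.charpoly(x).as_expr(), x)[lam]   # multiplicity in char poly
g = len((A - lam * eye(3)).nullspace())      # number of blocks
c = min(k for k in range(1, 4)
        if ((A - lam * eye(3))**k).is_zero_matrix)  # largest block size
print(a, g, c)  # 3 2 2
```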
References
- Simon Wadsley, Linear Algebra Lecture Notes, 2016 - Chapter 6
- Charles W. Curtis, Linear Algebra: An Introductory Approach, 1984 - Section 25