1.2.1 Minus Inverse

Definition 1.2.1 (Minus Inverse). Let $A$ be an $n \times p$ matrix. If there exists a $p \times n$ matrix $X$ such that
$$A\,X\,A = A,$$
then $X$ is called a minus inverse of $A$, denoted by $A^-$.
Property 1.2.1. If $A$ is nonsingular (invertible), then $A^-$ is unique and
$$A^- = A^{-1}.$$
Proof. Since $A\,A^{-1}\,A = A$, the matrix $A^{-1}$ is a minus inverse of $A$. If $X$ is any minus inverse, then $A\,X\,A = A$; multiplying on the left and right by $A^{-1}$ yields $X = A^{-1}$. Hence the minus inverse is unique and equals $A^{-1}$.
Property 1.2.2. Every matrix $A$ has at least one minus inverse, but it may not be unique (unless $A$ is invertible).
Sketch of construction/proof. Suppose $\mathrm{rank}(A) = r$. There exist nonsingular matrices $P$ (of size $n \times n$) and $Q$ (of size $p \times p$) such that
$$A = P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q.$$
Equivalently, factoring the middle block,
$$A = P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q = P \begin{pmatrix} I_r \\ 0 \end{pmatrix} \begin{pmatrix} I_r & 0 \end{pmatrix} Q,$$
where $P$ and $Q$ are invertible and the middle block has rank $r$.
To construct $A^-$, consider
$$X = Q^{-1} \begin{pmatrix} I_r & T_{12} \\ T_{21} & T_{22} \end{pmatrix} P^{-1},$$
where $T_{12}, T_{21}, T_{22}$ can be arbitrarily chosen (of compatible dimensions). A direct computation gives
$$A\,X\,A = P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} I_r & T_{12} \\ T_{21} & T_{22} \end{pmatrix} \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q = P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q = A,$$
so $X$ is a minus inverse of $A$. This proves existence; since the blocks $T_{12}, T_{21}, T_{22}$ are free, it also shows that the minus inverse is in general not unique (unless $A$ is invertible, in which case these blocks are empty).
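The existence and non-uniqueness can be checked numerically. The NumPy sketch below (an assumption of this note, not part of the text) avoids computing $P$ and $Q$ explicitly: it uses the Moore-Penrose pseudoinverse $A^+$ as one particular minus inverse, and the standard perturbation $X = A^+ + (I - A^+A)W$, which also satisfies $AXA = A$ for any $W$, to exhibit a second, different one.

```python
import numpy as np

rng = np.random.default_rng(0)
# A rank-2 matrix of size 4 x 3 (rank deficient, so A is not invertible).
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))

# The Moore-Penrose pseudoinverse is one particular minus inverse: A X1 A = A.
X1 = np.linalg.pinv(A)
assert np.allclose(A @ X1 @ A, A)

# A second minus inverse: X = A^+ + (I - A^+ A) W for arbitrary W,
# since A (I - A^+ A) = A - A A^+ A = 0 makes the extra term vanish in A X A.
W = rng.standard_normal((3, 4))
X2 = X1 + (np.eye(3) - X1 @ A) @ W
assert np.allclose(A @ X2 @ A, A)

# The two minus inverses differ, confirming non-uniqueness.
assert not np.allclose(X1, X2)
```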
Property 1.2.3. For any $n \times p$ matrix $A$,
$$\mathrm{rank}(A^-) \;\ge\; \mathrm{rank}(A).$$
Proof. From the same decomposition
$$A = P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q,$$
every minus inverse has the form
$$A^- = Q^{-1} \begin{pmatrix} I_r & T_{12} \\ T_{21} & T_{22} \end{pmatrix} P^{-1}$$
(the relation $A\,A^-\,A = A$ forces exactly the $(1,1)$ block of $Q\,A^-\,P$ to equal $I_r$). Because
$$\begin{pmatrix} I_r & T_{12} \\ T_{21} & T_{22} \end{pmatrix}$$
contains $I_r$ as a submatrix, it has rank at least $r$; multiplying by the invertible matrices $Q^{-1}$ and $P^{-1}$ does not change the rank, so
$$\mathrm{rank}(A^-) \;\ge\; r \;=\; \mathrm{rank}(A).$$
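As a numerical illustration (a NumPy sketch, an assumption of this note), one can generate several minus inverses of a rank-1 matrix via the perturbation $X = A^+ + (I - A^+A)W$ used above and confirm the rank inequality for each:

```python
import numpy as np

rng = np.random.default_rng(1)
# Rank-1 matrix of size 3 x 3.
A = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, -1.0])
r = np.linalg.matrix_rank(A)  # r = 1

# Any X = A^+ + (I - A^+ A) W satisfies A X A = A,
# and its rank is always at least rank(A).
Aplus = np.linalg.pinv(A)
for _ in range(5):
    W = rng.standard_normal((3, 3))
    X = Aplus + (np.eye(3) - Aplus @ A) @ W
    assert np.allclose(A @ X @ A, A)
    assert np.linalg.matrix_rank(X) >= r
```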
Property 1.2.4. For any $n \times p$ matrix $A$,
$$\mathrm{rank}(A) = \mathrm{rank}\bigl(A\,A^-\bigr) = \mathrm{rank}\bigl(A^-\,A\bigr) = \mathrm{tr}\bigl(A\,A^-\bigr) = \mathrm{tr}\bigl(A^-\,A\bigr).$$
Proof. Let $\mathrm{rank}(A) = r$. Then
$$A = P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q, \qquad A^- = Q^{-1} \begin{pmatrix} I_r & T_{12} \\ T_{21} & T_{22} \end{pmatrix} P^{-1}.$$
Hence
$$A\,A^- = P \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q\,Q^{-1} \begin{pmatrix} I_r & T_{12} \\ T_{21} & T_{22} \end{pmatrix} P^{-1} = P \begin{pmatrix} I_r & T_{12} \\ 0 & 0 \end{pmatrix} P^{-1},$$
and it follows that
$$\mathrm{rank}(A\,A^-) = r.$$
Also,
$$(A\,A^-)^2 = A\,A^-\,A\,A^- = (A\,A^-\,A)\,A^- = A\,A^-,$$
so $A\,A^-$ is idempotent; since the rank of an idempotent matrix equals its trace,
$$\mathrm{rank}(A\,A^-) = \mathrm{tr}(A\,A^-) = r.$$
A similar argument shows
$$\mathrm{rank}(A^-\,A) = \mathrm{tr}(A^-\,A) = r.$$
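These equalities are easy to verify numerically; the NumPy sketch below (an assumption of this note) checks them for a rank-deficient matrix, using the pseudoinverse as one minus inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
# Rank-2 matrix of size 5 x 4.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
r = np.linalg.matrix_rank(A)   # r = 2
X = np.linalg.pinv(A)          # one minus inverse of A

# rank(A) = rank(A A^-) = rank(A^- A) = tr(A A^-) = tr(A^- A)
assert np.linalg.matrix_rank(A @ X) == r
assert np.linalg.matrix_rank(X @ A) == r
assert np.isclose(np.trace(A @ X), r)
assert np.isclose(np.trace(X @ A), r)

# A A^- is idempotent, which is why its trace equals its rank.
assert np.allclose((A @ X) @ (A @ X), A @ X)
```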
In particular:

If $\mathrm{rank}(A) = p$ (full column rank), then $A^-\,A = I_p$, since $A^-\,A$ is a $p \times p$ idempotent matrix of rank $p$. Similarly, if $\mathrm{rank}(A) = n$ (full row rank), then $A\,A^- = I_n$.

Property 1.2.5. For any $n \times p$ matrix $A$,
$$A'A\,(A'A)^-\,A' = A', \qquad A\,(A'A)^-\,A'A = A.$$
(Here $A'$ denotes the transpose of $A$, and $(A'A)^-$ is any minus inverse of $A'A$.)
Sketch of the proof. First show
$$A\,x = 0 \;\Longleftrightarrow\; A'A\,x = 0,$$
since $A'A\,x = 0$ implies $x'A'A\,x = \|Ax\|^2 = 0$. It follows that
$$A\,x = A\,y \;\Longleftrightarrow\; A'A\,x = A'A\,y.$$
Now the defining relation $(A'A)(A'A)^-(A'A) = A'A$ says that $A'A\bigl[(A'A)^- A'A\,x\bigr] = A'A\,x$ for every $x$; by the equivalence above, $A\bigl[(A'A)^- A'A\,x\bigr] = A\,x$ for every $x$, that is,
$$A\,(A'A)^-\,A'A = A.$$
Applying this identity to the minus inverse $\bigl((A'A)^-\bigr)'$ (the transpose of a minus inverse of the symmetric matrix $A'A$ is again a minus inverse) and transposing gives
$$A'A\,(A'A)^-\,A' = A'.$$
The key step is "canceling" a factor under appropriate rank conditions.
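Notably, both identities hold for an arbitrary minus inverse of $A'A$, not just the pseudoinverse. The NumPy sketch below (an assumption of this note) checks this for two different minus inverses:

```python
import numpy as np

rng = np.random.default_rng(3)
# Rank-2 matrix of size 5 x 3.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))
AtA = A.T @ A

# Two different minus inverses of A'A: the pseudoinverse, and a perturbation
# G2 = G1 + (I - G1 A'A) W, which still satisfies (A'A) G2 (A'A) = A'A.
G1 = np.linalg.pinv(AtA)
W = rng.standard_normal((3, 3))
G2 = G1 + (np.eye(3) - G1 @ AtA) @ W

for G in (G1, G2):
    assert np.allclose(AtA @ G @ AtA, AtA)  # G is a minus inverse of A'A
    assert np.allclose(A @ G @ AtA, A)      # A (A'A)^- A'A = A
    assert np.allclose(AtA @ G @ A.T, A.T)  # A'A (A'A)^- A' = A'
```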
(Additional note on "cancellation".) If $\mathrm{rank}(A\,B) = \mathrm{rank}(B)$, then
$$A\,B\,C = 0 \;\Longrightarrow\; B\,C = 0,$$
because the rank condition forces the null spaces of $AB$ and $B$ to coincide. Symmetrically, if $\mathrm{rank}(A\,B) = \mathrm{rank}(A)$, then
$$C\,A\,B = 0 \;\Longrightarrow\; C\,A = 0.$$
Property 1.2.6. For any matrix $A$, the matrix
$$A\,(A'A)^-\,A'$$
is a projection matrix (symmetric and idempotent) and does not depend on the particular choice of the minus inverse $(A'A)^-$.
1. Independence of the choice. If $(A'A)^-_1$ and $(A'A)^-_2$ are two minus inverses of $A'A$, then
$$A'A\,(A'A)^-_1\,A'A = A'A = A'A\,(A'A)^-_2\,A'A,$$
which implies
$$A\,(A'A)^-_1\,A' = A\,(A'A)^-_2\,A'.$$
Indeed, write $H_i = A\,(A'A)^-_i\,A'$. By Property 1.2.5, $H_1 A = A = H_2 A$, so $M = H_1 - H_2$ satisfies $M A = 0$; since $M' = A\,\bigl[(A'A)^-_1 - (A'A)^-_2\bigr]'\,A'$, we get $M M' = (M A)\,\bigl[(A'A)^-_1 - (A'A)^-_2\bigr]'\,A' = 0$, hence $M = 0$. Thus the product $A\,(A'A)^-\,A'$ is indeed independent of which minus inverse is chosen.
2. Symmetry. Since $A'A$ is symmetric, a symmetric minus inverse of $A'A$ can always be chosen: if $G$ is any minus inverse, so is its transpose $G'$, and hence so is the symmetric matrix $(G + G')/2$. Using such a symmetric choice, and invoking the independence shown above,
$$A\,(A'A)^-\,A'$$
is itself symmetric for every choice of minus inverse.
3. Idempotence. Using Property 1.2.5, one checks
$$\bigl(A\,(A'A)^-\,A'\bigr)^2 = \underbrace{A\,(A'A)^-\,A'A}_{=\,A}\,(A'A)^-\,A' = A\,(A'A)^-\,A'.$$
Hence
$$A\,(A'A)^-\,A'$$
is idempotent; together with symmetry, it is a projection matrix.
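All three facts can be checked numerically. The NumPy sketch below (an assumption of this note) builds two distinct minus inverses of $A'A$ and confirms that they yield the same symmetric, idempotent matrix $A\,(A'A)^-\,A'$:

```python
import numpy as np

rng = np.random.default_rng(4)
# Rank-2 matrix of size 6 x 4.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
AtA = A.T @ A

# Two distinct minus inverses of A'A.
G1 = np.linalg.pinv(AtA)
W = rng.standard_normal((4, 4))
G2 = G1 + (np.eye(4) - G1 @ AtA) @ W
assert not np.allclose(G1, G2)

P1 = A @ G1 @ A.T
P2 = A @ G2 @ A.T
assert np.allclose(P1, P2)       # independent of the choice of (A'A)^-
assert np.allclose(P1, P1.T)     # symmetric
assert np.allclose(P1 @ P1, P1)  # idempotent
```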
Property 1.2.7. For any matrix $A$, the matrix
$$A\,(A'A)^-\,A'$$
is the projection matrix onto the column space $R(A)$. Denote it by
$$P_A \;\equiv\; A\,(A'A)^-\,A'.$$
For every $x \in R(A)$, we have $P_A\,x = x$; and for every $u \in \mathbb{R}^n$, we have $P_A\,u \in R(A)$.
Proof. For every $x \in R(A)$, there exists some $y \in \mathbb{R}^p$ such that $x = A\,y$. Then, by Property 1.2.5,
$$P_A\,x = A\,(A'A)^-\,A'A\,y = A\,y = x.$$
For every $u \in \mathbb{R}^n$,
$$P_A\,u = A\,(A'A)^-\,A'\,u = A\bigl((A'A)^-\,A'\,u\bigr) \in R(A).$$
Therefore,
$$P_A = A\,(A'A)^-\,A'$$
is indeed the projection matrix onto $R(A)$.
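A final numerical illustration (NumPy, an assumption of this note): $P_A$ fixes vectors of $R(A)$, maps arbitrary vectors into $R(A)$, and the residual $u - P_A u$ is orthogonal to $R(A)$.

```python
import numpy as np

rng = np.random.default_rng(5)
# Rank-2 matrix of size 5 x 3.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))
P = A @ np.linalg.pinv(A.T @ A) @ A.T  # P_A

# P_A fixes every vector in the column space R(A) ...
y = rng.standard_normal(3)
x = A @ y                      # x = A y lies in R(A)
assert np.allclose(P @ x, x)

# ... and maps any vector into R(A): P_A u = A ((A'A)^- A' u).
u = rng.standard_normal(5)
Pu = P @ u
# Pu lies in R(A): appending it to the columns of A does not raise the rank.
assert np.linalg.matrix_rank(np.column_stack([A, Pu])) == np.linalg.matrix_rank(A)

# The residual u - P_A u is orthogonal to R(A).
assert np.allclose(A.T @ (u - Pu), 0)
```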