MSA_Week5 : Matrix Differentiation and Jacobian of Transformation

Matrix differentiation is a generalization of ordinary differentiation. It is a basic tool for deriving maximum likelihood and least squares estimates, and it is also convenient for computing the Jacobian determinant of a change of variables in multivariate integrals.

Four Types of Derivatives

  • Derivative of a matrix with respect to a scalar
  • Derivative of a scalar function of a matrix with respect to the matrix
  • Derivative of a vector with respect to a vector
  • Derivative of a matrix with respect to a matrix

I. Derivative of a Matrix with Respect to a Scalar

Definition 1.4.1

Let $Y=(y_{ij}(t))$ be a $p \times q$ matrix, whose elements are functions of the variable $t$. The derivative of $Y$ with respect to $t$ is

$$ \frac{\partial\{Y\}}{\partial t} = \left( \frac{\partial y_{ij}(t)}{\partial t} \right) = \begin{pmatrix} \frac{\partial y_{11}(t)}{\partial t} & \dots & \frac{\partial y_{1q}(t)}{\partial t} \\[6pt] \vdots & \ddots & \vdots \\[6pt] \frac{\partial y_{p1}(t)}{\partial t} & \dots & \frac{\partial y_{pq}(t)}{\partial t} \end{pmatrix}. $$

Property 1.4.1

If the elements of matrices $X$ and $Y$ are functions of the variable $t$, then

$$ (1) \quad \frac{\partial\{X+Y\}}{\partial t} = \frac{\partial\{X\}}{\partial t} + \frac{\partial\{Y\}}{\partial t} $$

$$ (2) \quad \frac{\partial\{XY\}}{\partial t} = \frac{\partial\{X\}}{\partial t} Y + X \frac{\partial\{Y\}}{\partial t} $$

$$ (3) \quad \frac{\partial\{X \otimes Y\}}{\partial t} = \frac{\partial\{X\}}{\partial t} \otimes Y + X \otimes \frac{\partial\{Y\}}{\partial t} $$

$$ (4) \quad \left( \frac{\partial\{X\}}{\partial t} \right)' = \frac{\partial\{X'\}}{\partial t} $$

$$ (5) \quad \frac{\partial\{X^{-1}\}}{\partial t} = -X^{-1} \frac{\partial\{X\}}{\partial t} X^{-1} $$

In (5), $X(t)$ is assumed nonsingular; the formula follows by differentiating $XX^{-1} = I$ with the product rule (2).
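
As a quick numerical sanity check of (5) — a minimal sketch assuming NumPy; the curve $X(t)$, the point $t=0.5$, and the step size $h$ are illustrative choices, not from the original notes:

```python
import numpy as np

def X(t):
    # An arbitrary smooth 3x3 matrix function, chosen to stay nonsingular near t = 0.5.
    return np.array([[2 + t,     t**2, 0.0  ],
                     [np.sin(t), 3.0,  t    ],
                     [0.0,       t**3, 4 - t]])

def dX(t):
    # Elementwise derivative of X(t).
    return np.array([[1.0,       2*t,    0.0 ],
                     [np.cos(t), 0.0,    1.0 ],
                     [0.0,       3*t**2, -1.0]])

t, h = 0.5, 1e-6
# Central difference of X(t)^{-1} versus the closed form -X^{-1} (dX/dt) X^{-1}.
lhs = (np.linalg.inv(X(t + h)) - np.linalg.inv(X(t - h))) / (2 * h)
rhs = -np.linalg.inv(X(t)) @ dX(t) @ np.linalg.inv(X(t))
print(np.max(np.abs(lhs - rhs)))  # tiny (finite-difference error only)
```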

Property 1.4.2

If the elements of matrices $A$ and $B$ do not depend on the elements of matrix $X$, and the elements of $X$ are functionally independent of one another, then

$$ (1) \quad \frac{\partial\{X\}}{\partial x_{ij}} = E_{ij} $$

$$ (2) \quad \frac{\partial\{AXB\}}{\partial x_{ij}} = AE_{ij}B $$

where $E_{ij}$ is an elementary matrix with 1 in position $(i,j)$ and 0 elsewhere.

II. Derivative of a Scalar Function of a Matrix with Respect to the Matrix

Definition 1.4.2

Let $y=f(X)$ be a scalar function of an $m \times n$ matrix $X$. The derivative of $y$ with respect to $X$ is

$$ \frac{\partial y}{\partial\{X\}} = \left( \frac{\partial y}{\partial x_{ij}} \right) = \begin{pmatrix} \frac{\partial y}{\partial x_{11}} & \dots & \frac{\partial y}{\partial x_{1n}} \\[6pt] \vdots & \ddots & \vdots \\[6pt] \frac{\partial y}{\partial x_{m1}} & \dots & \frac{\partial y}{\partial x_{mn}} \end{pmatrix}. $$

Property 1.4.3

If $X$ is an $m \times n$ matrix and $f(X)$ is a scalar function of $X$, then

$$ \left( \frac{\partial f(X)}{\partial\{X\}} \right)' = \frac{\partial f(X)}{\partial\{X'\}}. $$

Property 1.4.4

If $X$ is an $n \times n$ matrix, then

$$ \frac{\partial \mathrm{tr}(X)}{\partial\{X\}} = I_n. $$

Property 1.4.5

If the elements of the column vector $a$ do not depend on the elements of the column vector $x$, and the elements of $x$ are functionally independent of one another, then

$$ \frac{\partial a'x}{\partial\{x\}} = a. $$

Property 1.4.6

If the elements of matrices $A$ and $B$ do not depend on the elements of matrix $X$, and the elements of $X$ are functionally independent of one another, then

$$ \frac{\partial \mathrm{tr}(AXB)}{\partial\{X\}} = A'B'. $$

In particular,

$$ \frac{\partial \mathrm{tr}(AX)}{\partial\{X\}} = A'. $$

Proof

$\mathrm{tr}(AXB) = \sum_i (e_i'AXBe_i)$,

$$ \frac{\partial \mathrm{tr}(AXB)}{\partial x_{kl}} = \frac{\partial}{\partial x_{kl}} \sum_i (e_i'AXBe_i) = \sum_i (e_i'A \frac{\partial X}{\partial x_{kl}} Be_i) = \sum_i (e_i'AE_{kl}Be_i) $$ $$ = \sum_i (e_i'Ae_k e_l'Be_i) = \sum_i (a_{ik} b_{li}) = \sum_i (b_{li} a_{ik}) = (BA)_{lk} = (A'B')_{kl}. $$

Thus, the matrix $(\frac{\partial \mathrm{tr}(AXB)}{\partial x_{kl}})$ is $(A'B')$.

Note: the step $\frac{\partial X}{\partial x_{kl}} = E_{kl}$ uses Property 1.4.2 (1); passing the derivative through the constant matrices $A$ and $B$ inside the trace is exactly Property 1.4.2 (2).
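
The identity is easy to verify numerically — a minimal sketch assuming NumPy; the shapes and the random seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
# Shapes: A is 2x3, X is 3x4, B is 4x2, so AXB is square and the trace is defined.
A = rng.normal(size=(2, 3))
X = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))

def f(X):
    return np.trace(A @ X @ B)

# Build the gradient matrix (d tr(AXB) / d x_ij) by central differences.
h = 1e-6
grad = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += h
        Xm[i, j] -= h
        grad[i, j] = (f(Xp) - f(Xm)) / (2 * h)

print(np.max(np.abs(grad - A.T @ B.T)))  # tiny: the gradient is A'B'
```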

Property 1.4.7

If $A$ and $X$ are both $n \times n$ matrices, the elements of $A$ do not depend on the elements of $X$, and $X$ is a symmetric matrix, then

$$ \frac{\partial \mathrm{tr}(AX)}{\partial\{X\}} = A + A' - \mathrm{diag}(a_{11}, a_{22}, \dots, a_{nn}). $$

Proof

For the diagonal element $x_{kk}$,

$$ \frac{\partial \mathrm{tr}(AX)}{\partial x_{kk}} = \frac{\partial \sum_i (e_i'AXe_i)}{\partial x_{kk}} = \sum_i (e_i'A \frac{\partial X}{\partial x_{kk}} e_i) $$

For a diagonal entry, $\frac{\partial X}{\partial x_{kk}} = E_{kk}$ (symmetry imposes no extra constraint on a diagonal entry).

$$ = \sum_i (e_i' A E_{kk} e_i) = \sum_i (e_i' A e_k e_k' e_i) = a_{kk}. $$

If $k \neq l$, then for the off-diagonal element $x_{kl}$, since $X$ is symmetric ($x_{kl} = x_{lk}$),

$$ \frac{\partial X}{\partial x_{kl}} = E_{kl} + E_{lk}. $$

So,

$$ \frac{\partial \mathrm{tr}(AX)}{\partial x_{kl}} = \frac{\partial \sum_i (e_i'AXe_i)}{\partial x_{kl}} = \sum_i (e_i' A \frac{\partial X}{\partial x_{kl}} e_i) = \sum_i (e_i' A (E_{kl} + E_{lk}) e_i) $$ $$ = \sum_i (e_i'AE_{kl}e_i) + \sum_i (e_i'AE_{lk}e_i) = \sum_i (e_i'Ae_k e_l'e_i) + \sum_i (e_i'Ae_l e_k'e_i) $$ $$ = a_{lk} + a_{kl}. $$

Therefore, the $(k,l)$-th element of $\frac{\partial \mathrm{tr}(AX)}{\partial\{X\}}$ is $a_{kk}$ if $k=l$ and $a_{lk} + a_{kl}$ if $k \neq l$. This corresponds to the matrix $A + A' - \mathrm{diag}(a_{11}, a_{22}, \dots, a_{nn})$.
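
Because the symmetry constraint $x_{kl} = x_{lk}$ changes the gradient, a numerical check is worthwhile — a minimal sketch assuming NumPy; the size $n=4$ and the seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))
X = rng.normal(size=(n, n)); X = (X + X.T) / 2   # symmetric evaluation point

def f(X):
    return np.trace(A @ X)

h = 1e-6
grad = np.zeros((n, n))
for k in range(n):
    for l in range(k, n):
        Xp, Xm = X.copy(), X.copy()
        Xp[k, l] += h; Xm[k, l] -= h
        if k != l:                 # keep the perturbed matrix symmetric
            Xp[l, k] += h; Xm[l, k] -= h
        grad[k, l] = grad[l, k] = (f(Xp) - f(Xm)) / (2 * h)

target = A + A.T - np.diag(np.diag(A))
print(np.max(np.abs(grad - target)))  # tiny
```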

Property 1.4.8

If the elements of matrix $A$ do not depend on the elements of the column vector $x$, and the elements of $x$ are functionally independent of one another, then

$$ \frac{\partial x'Ax}{\partial\{x\}} = (A + A')x. $$

Similarly, if $A$ is constant and the elements of the matrix $X$ are functionally independent of one another, we have

$$ \frac{\partial \mathrm{tr}(X'AX)}{\partial\{X\}} = (A + A')X. $$

Proof

$x'Ax = \mathrm{tr}(x'Ax) = \mathrm{tr}(Axx')$. We calculate the derivative with respect to $x_l$.

$$ \frac{\partial (x'Ax)}{\partial x_l} = \frac{\partial}{\partial x_l} \sum_{j,k} a_{jk} x_j x_k = \sum_{j,k} a_{jk} (\frac{\partial x_j}{\partial x_l} x_k + x_j \frac{\partial x_k}{\partial x_l}) $$ $$ = \sum_{j,k} a_{jk} (\delta_{jl} x_k + x_j \delta_{kl}) = \sum_k a_{lk} x_k + \sum_j a_{jl} x_j $$ $$ = (Ax)_l + (A'x)_l = ((A+A')x)_l. $$

Thus, the gradient vector $\frac{\partial x'Ax}{\partial\{x\}}$ is $(A+A')x$.

(Alternatively, the same result can be obtained via the trace trick $x'Ax = \mathrm{tr}(x'Ax) = \mathrm{tr}(Axx')$):

$$ \frac{\partial (x'Ax)}{\partial x_l} = \frac{\partial}{\partial x_l} \mathrm{tr}(Axx') = \mathrm{tr}\left( \frac{\partial}{\partial x_l}(Axx') \right) $$

Using the product rule for differentiation within the trace:

$$ = \mathrm{tr}\left( A \left(\frac{\partial x}{\partial x_l}\right) x' + A x \left(\frac{\partial x'}{\partial x_l}\right) \right) $$

Since $\frac{\partial x}{\partial x_l} = e_l$ and $\frac{\partial x'}{\partial x_l} = e_l'$:

$$ = \mathrm{tr}(A e_l x' + A x e_l') $$

Using linearity and cyclic property of trace:

$$ = \mathrm{tr}(A e_l x') + \mathrm{tr}(A x e_l') = \mathrm{tr}(x' A e_l) + \mathrm{tr}(e_l' A x) $$

Since the trace of a scalar is the scalar itself:

$$ = x' A e_l + e_l' A x $$

Recognizing these as components of vectors: $(A'x)_l = x' A e_l$ and $(Ax)_l = e_l' A x$

$$ = (A'x)_l + (Ax)_l = ((A' + A)x)_l. $$

This shows the $l$-th component of the gradient $\frac{\partial (x'Ax)}{\partial x}$ is the $l$-th component of $(A'+A)x$.
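
A minimal numerical sketch of the gradient formula, assuming NumPy; the size and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.normal(size=(n, n))
x = rng.normal(size=n)

# Central-difference gradient of the quadratic form x'Ax.
h = 1e-6
grad = np.array([(np.dot(x + h*e, A @ (x + h*e)) - np.dot(x - h*e, A @ (x - h*e))) / (2*h)
                 for e in np.eye(n)])
print(np.max(np.abs(grad - (A + A.T) @ x)))  # tiny: the gradient is (A + A')x
```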

Property 1.4.9

If $X$ is a square matrix whose elements are functionally independent of one another, then

$$ \frac{\partial \det(X)}{\partial\{X\}} = \det(X)(X')^{-1}. $$

(Note: This matrix is the transpose of the adjugate matrix, or the matrix of cofactors).

Proof

By cofactor expansion, $\det(X) = \sum_j x_{ij}X_{ij}$ for any $i$, where $X_{ij}$ is the cofactor of $x_{ij}$.

Since the cofactor $X_{ij}$ involves no entries from row $i$ or column $j$ of $X$, it does not depend on $x_{ij}$; hence $\frac{\partial \det(X)}{\partial x_{ij}} = X_{ij}$.

Therefore, the derivative matrix is the matrix of cofactors $C = (X_{ij})$:

$$ \frac{\partial \det(X)}{\partial\{X\}} = \begin{pmatrix} X_{11} & \dots & X_{1n} \\[6pt] \vdots & \ddots & \vdots \\[6pt] X_{n1} & \dots & X_{nn} \end{pmatrix} = C. $$

Also, recall the formula for the inverse matrix using the adjugate matrix ($\mathrm{adj}(X) = C'$):

$$ X^{-1} = \frac{1}{\det(X)} \mathrm{adj}(X) = \frac{1}{\det(X)} C'. $$

From this, $C' = \det(X) X^{-1}$, so $C = (\det(X) X^{-1})' = \det(X) (X^{-1})'$.

Since $(X^{-1})' = (X')^{-1}$, we have $C = \det(X) (X')^{-1}$.

Therefore,

$$ \frac{\partial \det(X)}{\partial\{X\}} = \det(X) (X')^{-1}. $$
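
A minimal numerical sketch of this formula, assuming NumPy; the size and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
X = rng.normal(size=(n, n))

# Entrywise central-difference gradient of det(X).
h = 1e-6
grad = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += h; Xm[i, j] -= h
        grad[i, j] = (np.linalg.det(Xp) - np.linalg.det(Xm)) / (2 * h)

target = np.linalg.det(X) * np.linalg.inv(X.T)   # det(X) (X')^{-1}
print(np.max(np.abs(grad - target)))  # tiny
```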

III. Derivative of a Vector with Respect to a Vector

Definition 1.4.3

Let $x$ be an $n$-dimensional column vector and let $y = (y_1, y_2, \dots, y_m)' = (f_1(x), f_2(x), \dots, f_m(x))'$. The derivative of $y$ with respect to $x$ is defined as

$$ \frac{\partial y'}{\partial x} = \left( \frac{\partial y_j}{\partial x_i} \right) = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \dots & \frac{\partial y_m}{\partial x_1} \\[6pt] \vdots & \ddots & \vdots \\[6pt] \frac{\partial y_1}{\partial x_n} & \dots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}. $$

(Note the layout: the $(i,j)$ element is $\partial y_j / \partial x_i$. This is the transpose of what is sometimes called the Jacobian matrix.)

Property 1.4.10

If $y = Ax$, where $A$ is a constant $m \times n$ matrix and the elements of the column vector $x$ are functionally independent of one another, then

$$ \frac{\partial y'}{\partial x} = A'. $$

Proof

$y_j = \sum_k a_{jk} x_k$, for $j = 1, 2, \dots, m$.

$$ \frac{\partial y_j}{\partial x_i} = a_{ji} = (A')_{ij}. $$

Therefore, the matrix $(\frac{\partial y_j}{\partial x_i})$ is $A'$.

So,

$$ \frac{\partial y'}{\partial x} = A'. $$
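
A one-screen numerical check, assuming NumPy; the shapes and seed are illustrative. Row $i$ of the derivative matrix collects $\partial y_j/\partial x_i$, matching Definition 1.4.3:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.normal(size=(3, 4))   # y = Ax maps R^4 to R^3
x = rng.normal(size=4)

h = 1e-6
# Row i holds (dy_1/dx_i, ..., dy_m/dx_i).
D = np.array([(A @ (x + h*e) - A @ (x - h*e)) / (2*h) for e in np.eye(4)])
print(np.max(np.abs(D - A.T)))  # tiny: the derivative is A'
```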

IV. Derivative of a Matrix with Respect to a Matrix

Definition 1.4.4

Let $X$ be an $m \times n$ matrix, $F(X)$ be a $p \times q$ matrix. Denote $F(X) = (f_{ij}(X)) = (f_{ij})$. Then the derivative of $F(X)$ with respect to $X$ is defined as:

$$ \frac{\partial F(X)}{\partial X} = \frac{\partial [\mathrm{vec}(F(X))]'}{\partial \mathrm{vec}(X)}. $$

This is an $(mn) \times (pq)$ matrix.

Property 1.4.11 (Restated using Definition 1.4.4)

If $Y = AXB$, where the elements of the $m \times p$ matrix $X$ are functionally independent of one another and $A$ ($n \times m$) and $B$ ($p \times q$) are constant matrices, then

$$ \frac{\partial Y}{\partial X} = \frac{\partial (\mathrm{vec}(Y))'}{\partial \mathrm{vec}(X)} = B \otimes A'. $$

Proof

We know $\mathrm{vec}(Y) = \mathrm{vec}(AXB) = (B' \otimes A)\mathrm{vec}(X)$. Let $y_{vec} = \mathrm{vec}(Y)$, $x_{vec} = \mathrm{vec}(X)$, and $M = B' \otimes A$. Then $y_{vec} = M x_{vec}$. By Definition 1.4.4 and Property 1.4.10 (applied to vectors $y_{vec}$ and $x_{vec}$),

$$ \frac{\partial Y}{\partial X} = \frac{\partial y_{vec}'}{\partial x_{vec}} = M' = (B' \otimes A)' = (B')' \otimes A' = B \otimes A'. $$
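
Both the vec identity and the resulting derivative can be verified numerically — a minimal sketch assuming NumPy, where `vec` is the column-stacking operator (`order='F'` in NumPy); the shapes and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
m, p, n, q = 2, 3, 4, 2          # X: m x p, A: n x m, B: p x q
A = rng.normal(size=(n, m))
X = rng.normal(size=(m, p))
B = rng.normal(size=(p, q))

vec = lambda M: M.flatten(order='F')  # column-stacking vec

# vec(AXB) = (B' kron A) vec(X), exact up to rounding:
print(np.max(np.abs(vec(A @ X @ B) - np.kron(B.T, A) @ vec(X))))

# Definition 1.4.4 takes the transpose of the usual Jacobian B' kron A:
print(np.allclose(np.kron(B.T, A).T, np.kron(B, A.T)))  # True: dY/dX = B kron A'
```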

Property 1.4.12

If $X$ is an $m \times n$ matrix whose elements are functionally independent of one another, then

$$ \frac{\partial X}{\partial X} = \frac{\partial (\mathrm{vec}(X))'}{\partial \mathrm{vec}(X)} = I_{mn}. $$

Proof

Let $Y=X$. Then $A=I_m$, $B=I_n$. Using Property 1.4.11:

$$ \frac{\partial X}{\partial X} = I_n \otimes I_m' = I_n \otimes I_m = I_{mn}. $$

Alternatively, let $y_{vec} = \mathrm{vec}(X)$ and $x_{vec} = \mathrm{vec}(X)$. Then $y_{vec} = I_{mn} x_{vec}$. By Property 1.4.10, $\frac{\partial y_{vec}'}{\partial x_{vec}} = (I_{mn})' = I_{mn}$.

Property 1.4.13

If the elements of the matrices $A$ ($n \times m$) and $B$ ($p \times q$) do not depend on the elements of the matrix $X$ ($m \times p$), and the elements of $X$ are functionally independent of one another, then

$$ \frac{\partial (AXB)}{\partial X} = B \otimes A'. $$

This is identical to Property 1.4.11.

Property 1.4.14 (Product Rule)

If $F(X)$ is a $p \times q$ matrix function, $G(X)$ is a $q \times r$ matrix function, and $X$ is an $m \times n$ matrix whose elements are functionally independent of one another, then

$$ \frac{\partial F(X)G(X)}{\partial X} = \frac{\partial F(X)}{\partial X} (G(X) \otimes I_p) + \frac{\partial G(X)}{\partial X} (I_r \otimes F'(X)). $$

Property 1.4.15

If $X$ is a nonsingular $n \times n$ matrix whose elements are functionally independent of one another, then

$$ \frac{\partial X^{-1}}{\partial X} = -[X^{-1} \otimes (X^{-1})']. $$

Proof

Differentiate $XX^{-1} = I$ using the product rule (Property 1.4.14) with $F(X)=X$ and $G(X)=X^{-1}$; the derivative of the constant matrix $I$ is the zero matrix:

$$ \frac{\partial (XX^{-1})}{\partial X} = \frac{\partial X}{\partial X} (X^{-1} \otimes I_n) + \frac{\partial X^{-1}}{\partial X} (I_n \otimes X') = 0. $$

Using Property 1.4.12, $\frac{\partial X}{\partial X} = I_{n^2}$.

$$ I_{n^2} (X^{-1} \otimes I_n) + \frac{\partial X^{-1}}{\partial X} (I_n \otimes X') = 0. $$ $$ (X^{-1} \otimes I_n) + \frac{\partial X^{-1}}{\partial X} (I_n \otimes X') = 0. $$ $$ \frac{\partial X^{-1}}{\partial X} (I_n \otimes X') = -(X^{-1} \otimes I_n). $$

Multiply on the right by $(I_n \otimes X')^{-1} = I_n \otimes (X')^{-1} = I_n \otimes (X^{-1})'$.

$$ \frac{\partial X^{-1}}{\partial X} = -(X^{-1} \otimes I_n) (I_n \otimes (X^{-1})') $$

Using the mixed product property $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$:

$$ \frac{\partial X^{-1}}{\partial X} = -( (X^{-1}I_n) \otimes (I_n (X^{-1})') ) = -( X^{-1} \otimes (X^{-1})' ). $$

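
A numerical sketch of this formula, assuming NumPy and the column-stacking vec; the derivative matrix of Definition 1.4.4 is built entry by entry by central differences (the size, diagonal shift, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
X = rng.normal(size=(n, n)) + n * np.eye(n)   # comfortably nonsingular
vec = lambda M: M.flatten(order='F')

h = 1e-6
D = np.zeros((n * n, n * n))   # D[i, j] = d(vec X^{-1})_j / d(vec X)_i
for i in range(n * n):
    E = np.zeros(n * n); E[i] = h
    Xp = X + E.reshape(n, n, order='F')
    Xm = X - E.reshape(n, n, order='F')
    D[i] = (vec(np.linalg.inv(Xp)) - vec(np.linalg.inv(Xm))) / (2 * h)

Xinv = np.linalg.inv(X)
print(np.max(np.abs(D + np.kron(Xinv, Xinv.T))))  # tiny: D = -(X^{-1} kron (X^{-1})')
```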

V. Formulas for Chain Rule

Theorem 1.4.1 (Chain Rule Formula for Scalar Function)

Let $\psi(X)$ be a scalar function of a matrix variable $X$ (size $m \times n$), and each element $x_{ij}$ of $X$ is a function of a variable $t$. Then

$$ \frac{d \psi(X)}{d t} = \sum_{j=1}^n \sum_{i=1}^m \frac{\partial \psi(X)}{\partial x_{ij}} \frac{\partial x_{ij}}{\partial t} = \mathrm{tr} \left[ \left( \frac{\partial \psi(X)}{\partial \{X\}} \right)' \left( \frac{\partial \{X\}}{\partial t} \right) \right]. $$

(Note: the original notes write this as $\mathrm{tr}[\frac{\partial\{X\}}{\partial t} (\frac{\partial \psi}{\partial\{X\}})']$, which is equivalent because $\mathrm{tr}(AB)=\mathrm{tr}(BA)$.)
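
A minimal numerical sketch of the theorem, assuming NumPy, with $\psi(X) = \mathrm{tr}(X'X)$, whose gradient is $2X$ (Property 1.4.8 with $A = I$); the curve $X(t)$ and the evaluation point are illustrative:

```python
import numpy as np

X  = lambda t: np.array([[t,         1 + t**2],
                         [np.cos(t), 2*t     ]])
dX = lambda t: np.array([[1.0,        2*t],
                         [-np.sin(t), 2.0]])
psi = lambda M: np.trace(M.T @ M)   # psi(X) = tr(X'X), gradient 2X

t, h = 0.3, 1e-6
lhs = (psi(X(t + h)) - psi(X(t - h))) / (2 * h)
rhs = np.trace((2 * X(t)).T @ dX(t))   # tr[(d psi/d{X})' (d{X}/dt)]
print(abs(lhs - rhs))  # tiny
```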

Theorem 1.4.2 (Chain Rule Formula for Matrices)

Let $F(G(X))$ be a composite function of two matrix functions $F$ and $G$. Let $U = G(X)$. Then

$$ \frac{\partial F(G(X))}{\partial X} = \frac{\partial G(X)}{\partial X} \frac{\partial F(U)}{\partial U}. $$
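
Note that, with the layout of Definition 1.4.4, the factor $\partial G/\partial X$ comes first. A numerical sketch assuming NumPy, composing $G(X) = AXB$ with $F(U) = U^{-1}$; the matrices, the diagonal shifts that keep everything nonsingular, and the seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2
A = rng.normal(size=(n, n)) + 2 * np.eye(n)
B = rng.normal(size=(n, n)) + 2 * np.eye(n)
X = rng.normal(size=(n, n)) + 2 * np.eye(n)
vec = lambda M: M.flatten(order='F')

U = A @ X @ B
Uinv = np.linalg.inv(U)
# Chain rule: (dG/dX)(dF/dU) = (B kron A') (-(U^{-1} kron (U^{-1})')).
chain = np.kron(B, A.T) @ (-np.kron(Uinv, Uinv.T))

# Finite-difference derivative matrix of the composite map X -> (AXB)^{-1}.
h = 1e-5
D = np.zeros((n * n, n * n))
for i in range(n * n):
    E = np.zeros(n * n); E[i] = h
    Xp = X + E.reshape(n, n, order='F')
    Xm = X - E.reshape(n, n, order='F')
    D[i] = (vec(np.linalg.inv(A @ Xp @ B)) - vec(np.linalg.inv(A @ Xm @ B))) / (2 * h)

print(np.max(np.abs(D - chain)))  # small: the chain rule holds
```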

Property 1.4.16

If the elements of matrix $X$ are functions of a variable $t$, then

$$ (1) \quad \frac{d \det(X(t))}{d t} = \det(X) \, \mathrm{tr}\left(X^{-1} \frac{\partial\{X\}}{\partial t}\right) $$

$$ (2) \quad \frac{d \ln |\det(X(t))|}{d t} = \mathrm{tr}\left(X^{-1} \frac{\partial\{X\}}{\partial t}\right) $$

(For the second formula it suffices that $X(t)$ is nonsingular, so that $\ln|\det(X)|$ is defined.)

Proof

Using Theorem 1.4.1 (Chain Rule) and Property 1.4.9: Let $\psi(X) = \det(X)$. Then $\frac{\partial \psi(X)}{\partial \{X\}} = \det(X) (X')^{-1}$.

$$ \frac{d \det(X(t))}{d t} = \mathrm{tr} \left[ \left( \frac{\partial \det(X)}{\partial \{X\}} \right)' \left( \frac{\partial \{X\}}{\partial t} \right) \right] $$ $$ = \mathrm{tr} \left[ (\det(X) (X')^{-1})' \left( \frac{\partial \{X\}}{\partial t} \right) \right] = \mathrm{tr} \left[ \det(X) ((X')^{-1})' \left( \frac{\partial X}{\partial t} \right) \right] $$ $$ = \det(X) \mathrm{tr} \left[ X^{-1} \frac{\partial X}{\partial t} \right]. $$

(Equivalently, using the other ordering of the chain-rule identity, $\frac{d\psi}{dt} = \mathrm{tr}\left[ \frac{\partial\{X\}}{\partial t} \left( \frac{\partial \psi}{\partial\{X\}} \right)' \right]$):

$$ \frac{d \det(X)}{dt} = \mathrm{tr}\left[ \frac{\partial \{X\}}{\partial t} \left( \det(X) (X')^{-1} \right)' \right] $$ $$ = \mathrm{tr}\left[ \frac{\partial \{X\}}{\partial t} \det(X) ((X')^{-1})' \right] = \mathrm{tr}\left[ \frac{\partial \{X\}}{\partial t} \det(X) X^{-1} \right] $$ $$ = \det(X) \mathrm{tr}\left[ \frac{\partial \{X\}}{\partial t} X^{-1} \right] = \det(X) \mathrm{tr}\left[ X^{-1} \frac{\partial \{X\}}{\partial t} \right]. $$

For the second formula:

$$ \frac{d \ln |\det(X)|}{d t} = \frac{1}{\det(X)} \frac{d \det(X)}{d t} $$ $$ = \frac{1}{\det(X)} \left( \det(X) \mathrm{tr}\left(X^{-1} \frac{\partial \{X\}}{\partial t}\right) \right) $$ $$ = \mathrm{tr}\left(X^{-1} \frac{\partial \{X\}}{\partial t}\right). $$
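
A quick numerical sketch of formula (2), assuming NumPy; the curve and the evaluation point are illustrative:

```python
import numpy as np

X  = lambda t: np.array([[2 + t,     t**2 ],
                         [np.sin(t), 3 - t]])
dX = lambda t: np.array([[1.0,       2*t ],
                         [np.cos(t), -1.0]])

t, h = 0.7, 1e-6
lhs = (np.log(abs(np.linalg.det(X(t + h)))) -
       np.log(abs(np.linalg.det(X(t - h))))) / (2 * h)
rhs = np.trace(np.linalg.inv(X(t)) @ dX(t))
print(abs(lhs - rhs))  # tiny
```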

VI. Calculation of the Jacobian Determinant

For a change of variables $y=f(x)$ in a multiple integral:

$$ \int_D g(x_1, x_2, \dots, x_n) \, dx_1 dx_2 \dots dx_n = \int_T g(f^{-1}(y)) \, J(x \to y) \, dy $$

where $D \subset \mathbb{R}^n$, $T = \{y | y = f(x), x \in D\}$, and $J(x \to y)$ is the Jacobian determinant.

$$ J(x \to y) = \left| \frac{\partial x'}{\partial y} \right|_+ = \left| \det \left( \frac{\partial x_i}{\partial y_j} \right) \right|, $$

where $|A|_+$ denotes $|\det(A)|$; since a determinant is unchanged by transposition, the layout of the derivative matrix does not matter here. The transformation $f$ is assumed to be a differentiable one-to-one map from $D$ onto $T$.

Definition 1.4.5

Let $X \in \mathbb{R}^{m \times n}$ be a matrix variable and let $Y = F(X) \in \mathbb{R}^{m \times n}$ be a transformation. If $F(X)$ is differentiable, then the Jacobian determinant of the transformation $Y=F(X)$ is defined as

$$ J(Y \to X) = J(Y: X) = \left| \frac{\partial Y}{\partial X} \right|_+ = \left| \det \left( \frac{\partial (\mathrm{vec}(Y))'}{\partial \mathrm{vec}(X)} \right) \right|. $$

(Equivalently, writing $y=\mathrm{vec}(Y)$ and $x=\mathrm{vec}(X)$, this is $\left| \frac{\partial y'}{\partial x} \right|_+$, the Jacobian determinant of the vectorized transformation, which by Definition 1.4.4 equals $|\det(\frac{\partial Y}{\partial X})|$.)

Property 1.4.17

If $Y = AXB$, where $X$ is $m \times n$, $A$ is an $m \times m$ nonsingular square matrix, and $B$ is an $n \times n$ nonsingular square matrix, then

$$ J(Y \to X) = |(\det(A))^n (\det(B))^m| = |\det(A)|^n |\det(B)|^m = |A|_+^n |B|_+^m. $$

In the special vector case $y = Ax$, where $x, y$ are $m \times 1$ vectors and $A$ is $m \times m$, $J(y \to x) = |\det(A)| = |A|_+$.

Proof

Because $\mathrm{vec}(Y) = (B' \otimes A)\mathrm{vec}(X)$. Let $y_{vec} = \mathrm{vec}(Y)$ and $x_{vec} = \mathrm{vec}(X)$. The Jacobian matrix (using derivative definition 1.4.4) is

$$ \frac{\partial Y}{\partial X} = \frac{\partial y_{vec}'}{\partial x_{vec}} = (B' \otimes A)' = B \otimes A'. $$

The Jacobian determinant is

$$ J(Y \to X) = \left| B \otimes A' \right|_+ = \left| \det \left( B \otimes A' \right) \right|. $$

Using the property $\det(P \otimes Q) = (\det P)^q (\det Q)^p$ where $P$ is $p \times p$ and $Q$ is $q \times q$. Here $B$ is $n \times n$ and $A'$ is $m \times m$.

$$ \det(B \otimes A') = (\det B)^m (\det A')^n = (\det B)^m (\det A)^n. $$

So,

$$ J(Y \to X) = |(\det A)^n (\det B)^m| = |\det A|^n |\det B|^m = |A|_+^n |B|_+^m. $$
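
Since $Y = AXB$ is linear in $X$, a finite-difference Jacobian is exact up to rounding, which makes for a clean numerical sketch (assuming NumPy; the shapes and seed are illustrative). The derivative matrix below uses the transposed layout of Definition 1.4.4, but $|\det|$ is unaffected:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 3, 2
A = rng.normal(size=(m, m))
B = rng.normal(size=(n, n))
X = rng.normal(size=(m, n))
vec = lambda M: M.flatten(order='F')

# Finite-difference derivative matrix of vec(X) -> vec(AXB), then its determinant.
h = 1e-6
D = np.zeros((m * n, m * n))
for i in range(m * n):
    E = np.zeros(m * n); E[i] = h
    Xp = X + E.reshape(m, n, order='F')
    Xm = X - E.reshape(m, n, order='F')
    D[i] = (vec(A @ Xp @ B) - vec(A @ Xm @ B)) / (2 * h)

J = abs(np.linalg.det(D))
target = abs(np.linalg.det(A))**n * abs(np.linalg.det(B))**m
print(abs(J - target) / target)  # tiny relative error
```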

Property 1.4.18

If $X$ is an invertible $n \times n$ matrix, consider the transformation $Y = X^{-1}$. Then

$$ J(Y \to X) = |\det(X)|^{-2n} = |X|_+^{-2n}. $$

Proof

From Property 1.4.15, the derivative of the transformation $Y=X^{-1}$ is

$$ \frac{\partial Y}{\partial X} = \frac{\partial X^{-1}}{\partial X} = -[X^{-1} \otimes (X^{-1})']. $$

Therefore, the Jacobian determinant is

$$ J(Y \to X) = \left| \frac{\partial Y}{\partial X} \right|_+ = \left| \det \left( -[X^{-1} \otimes (X^{-1})'] \right) \right|. $$

The matrix size is $(n^2) \times (n^2)$.

$$ \det(-M) = (-1)^{n^2} \det(M), $$

$$ \det(X^{-1} \otimes (X^{-1})') = (\det(X^{-1}))^n (\det((X^{-1})'))^n = (\det X)^{-n} (\det X)^{-n} = (\det X)^{-2n}. $$

So,

$$ J(Y \to X) = \left| (-1)^{n^2} (\det X)^{-2n} \right| = |\det X|^{-2n} = |X|_+^{-2n}. $$
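
The same finite-difference construction used for Property 1.4.15 gives a direct numerical check of this Jacobian determinant — a minimal sketch assuming NumPy (size, diagonal shift, and seed illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
X = rng.normal(size=(n, n)) + n * np.eye(n)   # comfortably nonsingular
vec = lambda M: M.flatten(order='F')

# Finite-difference derivative matrix of the map vec(X) -> vec(X^{-1}).
h = 1e-6
D = np.zeros((n * n, n * n))
for i in range(n * n):
    E = np.zeros(n * n); E[i] = h
    Xp = X + E.reshape(n, n, order='F')
    Xm = X - E.reshape(n, n, order='F')
    D[i] = (vec(np.linalg.inv(Xp)) - vec(np.linalg.inv(Xm))) / (2 * h)

J = abs(np.linalg.det(D))
target = abs(np.linalg.det(X))**(-2 * n)
print(abs(J - target) / target)  # small relative error
```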


Chapter 1 Summary

  • Operations, determinant, and inverse formulas for partitioned matrices.
  • Definition and properties of generalized inverse.
  • Vectorization operation and Kronecker product of matrices.
  • Definition and properties of matrix derivatives.