MSA_Week5 : Matrix Differentiation and Jacobian of Transformation
Matrix differentiation is a generalization of ordinary differentiation. It is a tool for finding maximum likelihood estimates and least squares estimates. It is also a convenient way to compute the Jacobian determinant of a change of variables in a multiple integral.
Four Types of Derivatives
- Derivative of a matrix with respect to a scalar
- Derivative of a scalar function of a matrix with respect to the matrix
- Derivative of a vector with respect to a vector
- Derivative of a matrix with respect to a matrix
I. Derivative of a Matrix with Respect to a Scalar
Definition 1.4.1
Let $Y=(y_{ij}(t))$ be a $p \times q$ matrix, whose elements are functions of the variable $t$. The derivative of $Y$ with respect to $t$ is
$$ \frac{\partial\{Y\}}{\partial t} = \left( \frac{\partial y_{ij}(t)}{\partial t} \right) = \begin{pmatrix} \frac{\partial y_{11}(t)}{\partial t} & \dots & \frac{\partial y_{1q}(t)}{\partial t} \\[6pt] \vdots & \ddots & \vdots \\[6pt] \frac{\partial y_{p1}(t)}{\partial t} & \dots & \frac{\partial y_{pq}(t)}{\partial t} \end{pmatrix}. $$
Property 1.4.1
If the elements of matrices $X$ and $Y$ are functions of the variable $t$, then
$$ (1) \quad \frac{\partial\{X+Y\}}{\partial t} = \frac{\partial\{X\}}{\partial t} + \frac{\partial\{Y\}}{\partial t} $$ $$ (2) \quad \frac{\partial\{XY\}}{\partial t} = \frac{\partial\{X\}}{\partial t} Y + X \frac{\partial\{Y\}}{\partial t} $$ $$ (3) \quad \frac{\partial\{X \otimes Y\}}{\partial t} = \frac{\partial\{X\}}{\partial t} \otimes Y + X \otimes \frac{\partial\{Y\}}{\partial t} $$ $$ (4) \quad \left( \frac{\partial\{X\}}{\partial t} \right)' = \frac{\partial\{X'\}}{\partial t} $$ $$ (5) \quad \frac{\partial\{X^{-1}\}}{\partial t} = -X^{-1} \frac{\partial\{X\}}{\partial t} X^{-1} $$
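As a quick numerical sanity check of items (2) and (5), the sketch below compares central finite differences with the closed-form expressions. The matrix function `X_of`, its elementwise derivative `dX_of`, and the helper `central_diff` are hypothetical names chosen only for this illustration.

```python
import numpy as np

def X_of(t):
    # Hypothetical smooth 2x2 matrix function of t (invertible near t = 0.3).
    return np.array([[2.0 + t, np.sin(t)],
                     [t ** 2, 3.0 + np.cos(t)]])

def dX_of(t):
    # Elementwise derivative of X_of, as in Definition 1.4.1.
    return np.array([[1.0, np.cos(t)],
                     [2.0 * t, -np.sin(t)]])

def central_diff(F, t, h=1e-6):
    # Central finite difference of a matrix-valued function of a scalar.
    return (F(t + h) - F(t - h)) / (2.0 * h)

t0 = 0.3
X0, dX0 = X_of(t0), dX_of(t0)
X0_inv = np.linalg.inv(X0)

# Item (5): d(X^{-1})/dt = -X^{-1} (dX/dt) X^{-1}
inv_numeric = central_diff(lambda t: np.linalg.inv(X_of(t)), t0)
inv_formula = -X0_inv @ dX0 @ X0_inv

# Item (2) with Y = X': d(XX')/dt = (dX/dt) X' + X (dX'/dt)
prod_numeric = central_diff(lambda t: X_of(t) @ X_of(t).T, t0)
prod_formula = dX0 @ X0.T + X0 @ dX0.T

print(np.allclose(inv_numeric, inv_formula, atol=1e-6))
print(np.allclose(prod_numeric, prod_formula, atol=1e-6))
```

Note that the order of the factors in (2) and (5) matters, since matrix multiplication does not commute.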
Property 1.4.2
If the elements of matrices $A$ and $B$ are not related to the elements of matrix $X$ (that is, they do not depend on them), and the elements of $X$ are also not related to each other (that is, they are functionally independent variables), then
$$ (1) \quad \frac{\partial\{X\}}{\partial x_{ij}} = E_{ij} $$ $$ (2) \quad \frac{\partial\{AXB\}}{\partial x_{ij}} = AE_{ij}B $$
where $E_{ij}$ is the matrix unit (single-entry matrix) of the same size as $X$, with 1 in position $(i,j)$ and 0 elsewhere.
II. Derivative of a Scalar Function of a Matrix with Respect to the Matrix
Definition 1.4.2
Let $y=f(X)$ be a function of an $m \times n$ matrix $X$. The derivative of $y$ with respect to $X$ is
$$ \frac{\partial y}{\partial\{X\}} = \left( \frac{\partial y}{\partial x_{ij}} \right) = \begin{pmatrix} \frac{\partial y}{\partial x_{11}} & \dots & \frac{\partial y}{\partial x_{1n}} \\[6pt] \vdots & \ddots & \vdots \\[6pt] \frac{\partial y}{\partial x_{m1}} & \dots & \frac{\partial y}{\partial x_{mn}} \end{pmatrix}. $$
Property 1.4.3
If $X$ is an $m \times n$ matrix, and $f(X)$ is a function of matrix $X$, then
$$ \left( \frac{\partial f(X)}{\partial\{X\}} \right)' = \frac{\partial f(X)}{\partial\{X'\}}. $$
Property 1.4.4
If $X$ is a square matrix of order $n$, then
$$ \frac{\partial \mathrm{tr}(X)}{\partial\{X\}} = I_n. $$
Property 1.4.5
If the elements of column vector $a$ are not related to the elements of column vector $x$, and the elements of $x$ are also not related to each other, then
$$ \frac{\partial a'x}{\partial\{x\}} = a. $$
Property 1.4.6
If the elements of matrices $A$ and $B$ are not related to the elements of matrix $X$, and the elements of $X$ are also not related to each other, then
$$ \frac{\partial \mathrm{tr}(AXB)}{\partial\{X\}} = A'B'. $$
In particular,
$$ \frac{\partial \mathrm{tr}(AX)}{\partial\{X\}} = A'. $$
Proof
Let $e_i$ denote the $i$-th standard basis (column) vector. Then $\mathrm{tr}(AXB) = \sum_i (e_i'AXBe_i)$, and
$$ \frac{\partial \mathrm{tr}(AXB)}{\partial x_{kl}} = \frac{\partial}{\partial x_{kl}} \sum_i (e_i'AXBe_i) = \sum_i (e_i'A \frac{\partial X}{\partial x_{kl}} Be_i) = \sum_i (e_i'AE_{kl}Be_i) $$ $$ = \sum_i (e_i'Ae_k e_l'Be_i) = \sum_i (a_{ik} b_{li}) = \sum_i (b_{li} a_{ik}) = (BA)_{lk} = (A'B')_{kl}. $$
Thus, the matrix $(\frac{\partial \mathrm{tr}(AXB)}{\partial x_{kl}})$ is $(A'B')$.
Note: The step $\frac{\partial X}{\partial x_{kl}} = E_{kl}$ is Property 1.4.2 (1); together with moving the derivative past the constant factors $A$ and $B$, it amounts to Property 1.4.2 (2). The following step uses $E_{kl} = e_k e_l'$.
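The result of Property 1.4.6 can be checked numerically by perturbing one element of $X$ at a time. The following is a minimal sketch; `numerical_gradient` is a hypothetical helper, and the matrix sizes and random seed are arbitrary choices made for the illustration.

```python
import numpy as np

def numerical_gradient(f, X, h=1e-6):
    # Matrix of partial derivatives (df/dx_ij), laid out as in Definition 1.4.2.
    G = np.zeros(X.shape)
    for idx in np.ndindex(*X.shape):
        E = np.zeros(X.shape)
        E[idx] = 1.0  # the matrix E_ij with a single 1
        G[idx] = (f(X + h * E) - f(X - h * E)) / (2.0 * h)
    return G

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # A is 4 x 3
X = rng.standard_normal((3, 2))   # X is 3 x 2
B = rng.standard_normal((2, 4))   # B is 2 x 4, so AXB is 4 x 4

grad_numeric = numerical_gradient(lambda M: np.trace(A @ M @ B), X)
grad_formula = A.T @ B.T          # Property 1.4.6: A'B'

print(np.allclose(grad_numeric, grad_formula, atol=1e-6))
```

The gradient has the same shape as $X$ (here $3 \times 2$), which is a useful way to remember that the answer is $A'B'$ rather than $BA$.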
Property 1.4.7
If $A$ and $X$ are both square matrices of order $n$, the elements of $A$ are not related to the elements of $X$, and $X$ is symmetric (so that $x_{kl}$ and $x_{lk}$ are the same variable), then
$$ \frac{\partial \mathrm{tr}(AX)}{\partial\{X\}} = A + A' - \mathrm{diag}(a_{11}, a_{22}, \dots, a_{nn}). $$
Proof
For the diagonal element $x_{kk}$, which occupies a single position in $X$, we have $\frac{\partial X}{\partial x_{kk}} = E_{kk}$. Hence
$$ \frac{\partial \mathrm{tr}(AX)}{\partial x_{kk}} = \frac{\partial \sum_i (e_i'AXe_i)}{\partial x_{kk}} = \sum_i (e_i'A \frac{\partial X}{\partial x_{kk}} e_i) = \sum_i (e_i' A E_{kk} e_i) = \sum_i (e_i' A e_k e_k' e_i) = a_{kk}. $$
If $k \neq l$, then for the off-diagonal element $x_{kl}$, since $X$ is symmetric ($x_{kl} = x_{lk}$),
$$ \frac{\partial X}{\partial x_{kl}} = E_{kl} + E_{lk}. $$
So,
$$ \frac{\partial \mathrm{tr}(AX)}{\partial x_{kl}} = \frac{\partial \sum_i (e_i'AXe_i)}{\partial x_{kl}} = \sum_i (e_i' A \frac{\partial X}{\partial x_{kl}} e_i) = \sum_i (e_i' A (E_{kl} + E_{lk}) e_i) $$ $$ = \sum_i (e_i'AE_{kl}e_i) + \sum_i (e_i'AE_{lk}e_i) = \sum_i (e_i'Ae_k e_l'e_i) + \sum_i (e_i'Ae_l e_k'e_i) $$ $$ = a_{lk} + a_{kl}. $$
Therefore, the $(k,l)$-th element of $\frac{\partial \mathrm{tr}(AX)}{\partial\{X\}}$ is $a_{kk}$ if $k=l$ and $a_{lk} + a_{kl}$ if $k \neq l$. This corresponds to the matrix $A + A' - \mathrm{diag}(a_{11}, a_{22}, \dots, a_{nn})$.
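The symmetric case can be illustrated numerically by perturbing $x_{kl}$ and $x_{lk}$ together, which mirrors $\frac{\partial X}{\partial x_{kl}} = E_{kl} + E_{lk}$ in the proof. This is only a sketch; the size `n`, the seed, and the variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))   # A is deliberately not symmetric
S = rng.standard_normal((n, n))
X = S + S.T                       # a symmetric X

def f(M):
    return np.trace(A @ M)

h = 1e-6
grad_numeric = np.zeros((n, n))
for k in range(n):
    for l in range(n):
        # Moving x_kl also moves x_lk, so the direction is E_kl + E_lk
        # for k != l, and simply E_kk on the diagonal.
        D = np.zeros((n, n))
        D[k, l] = 1.0
        D[l, k] = 1.0
        grad_numeric[k, l] = (f(X + h * D) - f(X - h * D)) / (2.0 * h)

grad_formula = A + A.T - np.diag(np.diag(A))   # Property 1.4.7
grad_unconstrained = A.T                       # Property 1.4.6 with B = I

print(np.allclose(grad_numeric, grad_formula, atol=1e-6))
```

Comparing `grad_formula` with `grad_unconstrained` shows how the symmetry constraint changes the derivative: the off-diagonal entries pick up the extra term from the mirrored element.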
Property 1.4.8
If the elements of matrix $A$ are not related to the elements of column vector $x$, and the elements of $x$ are also not related to each other, then
$$ \frac{\partial x'Ax}{\partial\{x\}} = (A + A')x. $$
Similarly, if $X$ is a matrix whose elements are unrelated, we have
$$ \frac{\partial \mathrm{tr}(X'AX)}{\partial\{X\}} = (A + A')X. $$
Proof
Write $x'Ax = \sum_{j,k} a_{jk} x_j x_k$ and differentiate with respect to $x_l$:
$$ \frac{\partial (x'Ax)}{\partial x_l} = \frac{\partial}{\partial x_l} \sum_{j,k} a_{jk} x_j x_k = \sum_{j,k} a_{jk} (\frac{\partial x_j}{\partial x_l} x_k + x_j \frac{\partial x_k}{\partial x_l}) $$ $$ = \sum_{j,k} a_{jk} (\delta_{jl} x_k + x_j \delta_{kl}) = \sum_k a_{lk} x_k + \sum_j a_{jl} x_j $$ $$ = (Ax)_l + (A'x)_l = ((A+A')x)_l. $$
Thus, the gradient vector $\frac{\partial x'Ax}{\partial\{x\}}$ is $(A+A')x$.
An alternative derivation uses the trace trick $x'Ax = \mathrm{tr}(x'Ax) = \mathrm{tr}(Axx')$:
$$ \frac{\partial (x'Ax)}{\partial x_l} = \frac{\partial}{\partial x_l} \mathrm{tr}(Axx') = \mathrm{tr}\left( \frac{\partial}{\partial x_l}(Axx') \right) $$
Using the product rule for differentiation within the trace:
$$ = \mathrm{tr}\left( A \left(\frac{\partial x}{\partial x_l}\right) x' + A x \left(\frac{\partial x'}{\partial x_l}\right) \right) $$
Since $\frac{\partial x}{\partial x_l} = e_l$ and $\frac{\partial x'}{\partial x_l} = e_l'$:
$$ = \mathrm{tr}(A e_l x' + A x e_l') $$
Using linearity and cyclic property of trace:
$$ = \mathrm{tr}(A e_l x') + \mathrm{tr}(A x e_l') = \mathrm{tr}(x' A e_l) + \mathrm{tr}(e_l' A x) $$
Since the trace of a scalar is the scalar itself:
$$ = x' A e_l + e_l' A x $$
Recognizing these as components of vectors: $(A'x)_l = x' A e_l$ and $(Ax)_l = e_l' A x$
$$ = (A'x)_l + (Ax)_l = ((A' + A)x)_l. $$
This shows the $l$-th component of the gradient $\frac{\partial (x'Ax)}{\partial x}$ is the $l$-th component of $(A'+A)x$.
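Both formulas in Property 1.4.8 can be checked with finite differences. The sketch below uses a non-symmetric $A$ so that the $A'$ term is actually exercised; `numerical_gradient` is a hypothetical helper and the dimensions are arbitrary.

```python
import numpy as np

def numerical_gradient(f, X, h=1e-6):
    # Array of partial derivatives of the scalar f with respect to each entry of X.
    G = np.zeros(X.shape)
    for idx in np.ndindex(*X.shape):
        E = np.zeros(X.shape)
        E[idx] = 1.0
        G[idx] = (f(X + h * E) - f(X - h * E)) / (2.0 * h)
    return G

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))   # not symmetric in general
x = rng.standard_normal(3)        # vector argument
X = rng.standard_normal((3, 2))   # matrix argument

# d(x'Ax)/dx = (A + A')x
grad_vec_numeric = numerical_gradient(lambda v: v @ A @ v, x)
grad_vec_formula = (A + A.T) @ x

# d tr(X'AX)/dX = (A + A')X
grad_mat_numeric = numerical_gradient(lambda M: np.trace(M.T @ A @ M), X)
grad_mat_formula = (A + A.T) @ X

print(np.allclose(grad_vec_numeric, grad_vec_formula, atol=1e-6))
print(np.allclose(grad_mat_numeric, grad_mat_formula, atol=1e-6))
```

When $A$ is symmetric the gradient reduces to $2Ax$, which is the form that appears in least squares derivations.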
Property 1.4.9
If the elements of matrix $X$ are not related to each other, then
$$ \frac{\partial \det(X)}{\partial\{X\}} = \det(X)(X')^{-1}. $$
(Note: This matrix is the transpose of the adjugate matrix, or the matrix of cofactors).
Proof
By cofactor expansion, $\det(X) = \sum_j x_{ij}X_{ij}$ for any $i$, where $X_{ij}$ is the cofactor of $x_{ij}$.
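Since none of the cofactors $X_{i1}, \dots, X_{in}$ in this expansion involves an element of row $i$, differentiating with respect to $x_{ij}$ gives $\frac{\partial \det(X)}{\partial x_{ij}} = X_{ij}$.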
Therefore, the derivative matrix is the matrix of cofactors $C = (X_{ij})$:
$$ \frac{\partial \det(X)}{\partial\{X\}} = \begin{pmatrix} X_{11} & \dots & X_{1n} \\[6pt] \vdots & \ddots & \vdots \\[6pt] X_{n1} & \dots & X_{nn} \end{pmatrix} = C. $$
Also, recall the formula for the inverse matrix using the adjugate matrix ($\mathrm{adj}(X) = C'$):
$$ X^{-1} = \frac{1}{\det(X)} \mathrm{adj}(X) = \frac{1}{\det(X)} C'. $$
From this, $C' = \det(X) X^{-1}$, so $C = (\det(X) X^{-1})' = \det(X) (X^{-1})'$.
Since $(X^{-1})' = (X')^{-1}$, we have $C = \det(X) (X')^{-1}$.
Therefore,
$$ \frac{\partial \det(X)}{\partial\{X\}} = \det(X) (X')^{-1}. $$
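A numerical check of Property 1.4.9 is sketched below. The test matrix is an arbitrary non-symmetric, diagonally dominant (hence invertible) choice, and `numerical_gradient` is a hypothetical helper.

```python
import numpy as np

def numerical_gradient(f, X, h=1e-6):
    # Matrix of partial derivatives (df/dx_ij).
    G = np.zeros(X.shape)
    for idx in np.ndindex(*X.shape):
        E = np.zeros(X.shape)
        E[idx] = 1.0
        G[idx] = (f(X + h * E) - f(X - h * E)) / (2.0 * h)
    return G

# Non-symmetric, strictly diagonally dominant, so X is invertible.
X = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, -1.0, 2.0]])

grad_numeric = numerical_gradient(np.linalg.det, X)
grad_formula = np.linalg.det(X) * np.linalg.inv(X).T   # det(X) (X')^{-1}

print(np.allclose(grad_numeric, grad_formula, atol=1e-5))
```

Using a non-symmetric matrix matters here: for symmetric $X$ the formulas $\det(X)X^{-1}$ and $\det(X)(X')^{-1}$ would coincide and a transposition error would go unnoticed.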
III. Derivative of a Vector with Respect to a Vector
Definition 1.4.3
Let $x$ be an $n$-dimensional column vector, $y = (y_1, y_2, \dots, y_m)' = (f_1(x), f_2(x), \dots, f_m(x))'$. Then the derivative of $y$ with respect to $x$ is defined as the Jacobian matrix:
$$ \frac{\partial y'}{\partial x} = \left( \frac{\partial y_j}{\partial x_i} \right) = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \dots & \frac{\partial y_m}{\partial x_1} \\[6pt] \vdots & \ddots & \vdots \\[6pt] \frac{\partial y_1}{\partial x_n} & \dots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}. $$
(Note the layout: the $(i,j)$ element is $\partial y_j / \partial x_i$, so the matrix is $n \times m$. This is the transpose of the more common Jacobian convention, whose $(i,j)$ element is $\partial y_i / \partial x_j$.)
Property 1.4.10
If $y = Ax$, and the elements of column vector $x$ are not related to each other, then
$$ \frac{\partial y'}{\partial x} = A'. $$
Proof
$y_j = \sum_k a_{jk} x_k$, for $j = 1, 2, \dots, m$.
$$ \frac{\partial y_j}{\partial x_i} = a_{ji} = (A')_{ij}. $$
Therefore, the matrix $(\frac{\partial y_j}{\partial x_i})$ is $A'$.
So,
$$ \frac{\partial y'}{\partial x} = A'. $$
IV. Derivative of a Matrix with Respect to a Matrix
Definition 1.4.4
Let $X$ be an $m \times n$ matrix, $F(X)$ be a $p \times q$ matrix. Denote $F(X) = (f_{ij}(X)) = (f_{ij})$. Then the derivative of $F(X)$ with respect to $X$ is defined as:
$$ \frac{\partial F(X)}{\partial X} = \frac{\partial [\mathrm{vec}(F(X))]'}{\partial \mathrm{vec}(X)}. $$
This is a $(mn) \times (pq)$ matrix.
Property 1.4.11 (Restated using Definition 1.4.4)
If $Y = AXB$, and the elements of matrix $X$ ($m \times p$) are not related to each other, and $A$ ($n \times m$), $B$ ($p \times q$) are constant matrices, then
$$ \frac{\partial Y}{\partial X} = \frac{\partial (\mathrm{vec}(Y))'}{\partial \mathrm{vec}(X)} = B \otimes A'. $$
Proof
We know $\mathrm{vec}(Y) = \mathrm{vec}(AXB) = (B' \otimes A)\mathrm{vec}(X)$. Let $y_{vec} = \mathrm{vec}(Y)$, $x_{vec} = \mathrm{vec}(X)$, and $M = B' \otimes A$. Then $y_{vec} = M x_{vec}$. By Definition 1.4.4 and Property 1.4.10 (applied to vectors $y_{vec}$ and $x_{vec}$),
$$ \frac{\partial Y}{\partial X} = \frac{\partial y_{vec}'}{\partial x_{vec}} = M' = (B' \otimes A)' = (B')' \otimes A' = B \otimes A'. $$
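The identity $\mathrm{vec}(AXB) = (B' \otimes A)\mathrm{vec}(X)$ and the resulting derivative can be checked numerically. In this sketch `vec` stacks columns (Fortran order in NumPy) and `derivative_layout` builds the derivative in the layout of Definition 1.4.4; both are hypothetical helpers, and the matrix sizes are arbitrary.

```python
import numpy as np

def vec(M):
    # Column-stacking vectorization.
    return M.reshape(-1, order="F")

def derivative_layout(F, X, h=1e-6):
    # d vec(F(X))' / d vec(X): row i indexes vec(X), column j indexes vec(F).
    x = vec(X)
    rows = []
    for i in range(x.size):
        step = np.zeros(x.size)
        step[i] = h
        X_plus = (x + step).reshape(X.shape, order="F")
        X_minus = (x - step).reshape(X.shape, order="F")
        rows.append((vec(F(X_plus)) - vec(F(X_minus))) / (2.0 * h))
    return np.array(rows)

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))   # n x m
X = rng.standard_normal((3, 4))   # m x p
B = rng.standard_normal((4, 2))   # p x q

vec_identity_ok = np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))

deriv_numeric = derivative_layout(lambda M: A @ M @ B, X)
deriv_formula = np.kron(B, A.T)   # Property 1.4.11: B (x) A'

print(vec_identity_ok, np.allclose(deriv_numeric, deriv_formula, atol=1e-6))
```

The derivative has $mp = 12$ rows and $nq = 4$ columns, matching the $(mn) \times (pq)$ shape rule of Definition 1.4.4 with the roles of the dimensions adapted to this example.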
Property 1.4.12
If $X$ is an $m \times n$ matrix, and the elements of $X$ are not related to each other, then
$$ \frac{\partial X}{\partial X} = \frac{\partial (\mathrm{vec}(X))'}{\partial \mathrm{vec}(X)} = I_{mn}. $$
Proof
Let $Y=X$. Then $A=I_m$, $B=I_n$. Using Property 1.4.11:
$$ \frac{\partial X}{\partial X} = I_n \otimes I_m' = I_n \otimes I_m = I_{mn}. $$
Alternatively, let $y_{vec} = \mathrm{vec}(X)$ and $x_{vec} = \mathrm{vec}(X)$. Then $y_{vec} = I_{mn} x_{vec}$. By Property 1.4.10, $\frac{\partial y_{vec}'}{\partial x_{vec}} = (I_{mn})' = I_{mn}$.
Property 1.4.13
If the elements of matrices $A$ ($n \times m$) and $B$ ($p \times q$) are not related to the elements of matrix $X$ ($m \times p$), and the elements of $X$ are also not related to each other, then
$$ \frac{\partial (AXB)}{\partial X} = B \otimes A'. $$
This is identical to Property 1.4.11.
Property 1.4.14 (Product Rule)
If $F(X)$ is a $p \times q$ matrix function, $G(X)$ is a $q \times r$ matrix function, $X$ is an $m \times n$ matrix, and the elements of $X$ are not related to each other, then
$$ \frac{\partial F(X)G(X)}{\partial X} = \frac{\partial F(X)}{\partial X} (G(X) \otimes I_p) + \frac{\partial G(X)}{\partial X} (I_r \otimes F'(X)). $$
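Since the product rule is stated without proof, a numerical check may help confirm the placement of the Kronecker factors. The sketch below uses the hypothetical choices $F(X) = AX$ and $G(X) = X'C$; the helpers `vec` and `derivative_layout` are likewise assumptions introduced for the illustration.

```python
import numpy as np

def vec(M):
    # Column-stacking vectorization.
    return M.reshape(-1, order="F")

def derivative_layout(F, X, h=1e-6):
    # d vec(F(X))' / d vec(X) in the layout of Definition 1.4.4.
    x = vec(X)
    rows = []
    for i in range(x.size):
        step = np.zeros(x.size)
        step[i] = h
        X_plus = (x + step).reshape(X.shape, order="F")
        X_minus = (x - step).reshape(X.shape, order="F")
        rows.append((vec(F(X_plus)) - vec(F(X_minus))) / (2.0 * h))
    return np.array(rows)

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 2))
X = rng.standard_normal((2, 3))    # m x n = 2 x 3

F = lambda M: A @ M                # p x q = 3 x 3
G = lambda M: M.T @ C              # q x r = 3 x 2
p, r = 3, 2

lhs = derivative_layout(lambda M: F(M) @ G(M), X)
rhs = (derivative_layout(F, X) @ np.kron(G(X), np.eye(p))
       + derivative_layout(G, X) @ np.kron(np.eye(r), F(X).T))

print(np.allclose(lhs, rhs, atol=1e-6))
```

Both sides are $mn \times pr = 6 \times 6$ matrices in this example.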
Property 1.4.15
If $X$ is a nonsingular square matrix of order $n$, and the elements of $X$ are not related to each other, then
$$ \frac{\partial X^{-1}}{\partial X} = -[X^{-1} \otimes (X^{-1})']. $$
Proof
Differentiate both sides of $XX^{-1} = I_n$ using Property 1.4.14 (Product Rule) with $F(X)=X$ and $G(X)=X^{-1}$. The derivative of the constant matrix $I_n$ is the zero matrix, so
$$ \frac{\partial (XX^{-1})}{\partial X} = \frac{\partial X}{\partial X} (X^{-1} \otimes I_n) + \frac{\partial X^{-1}}{\partial X} (I_n \otimes X') = 0. $$
Using Property 1.4.12, $\frac{\partial X}{\partial X} = I_{n^2}$.
$$ I_{n^2} (X^{-1} \otimes I_n) + \frac{\partial X^{-1}}{\partial X} (I_n \otimes X') = 0. $$ $$ (X^{-1} \otimes I_n) + \frac{\partial X^{-1}}{\partial X} (I_n \otimes X') = 0. $$ $$ \frac{\partial X^{-1}}{\partial X} (I_n \otimes X') = -(X^{-1} \otimes I_n). $$
Multiply on the right by $(I_n \otimes X')^{-1} = I_n \otimes (X')^{-1} = I_n \otimes (X^{-1})'$.
$$ \frac{\partial X^{-1}}{\partial X} = -(X^{-1} \otimes I_n) (I_n \otimes (X^{-1})') $$
Using the mixed product property $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$:
$$ \frac{\partial X^{-1}}{\partial X} = -( (X^{-1}I_n) \otimes (I_n (X^{-1})') ) = -( X^{-1} \otimes (X^{-1})' ). $$
This is the stated formula.
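The formula of Property 1.4.15 can be compared with a finite-difference derivative of the matrix inverse. This is a sketch under assumptions: the test matrix is an arbitrary non-symmetric invertible choice, and `vec` and `derivative_layout` are hypothetical helpers.

```python
import numpy as np

def vec(M):
    # Column-stacking vectorization.
    return M.reshape(-1, order="F")

def derivative_layout(F, X, h=1e-6):
    # d vec(F(X))' / d vec(X) in the layout of Definition 1.4.4.
    x = vec(X)
    rows = []
    for i in range(x.size):
        step = np.zeros(x.size)
        step[i] = h
        X_plus = (x + step).reshape(X.shape, order="F")
        X_minus = (x - step).reshape(X.shape, order="F")
        rows.append((vec(F(X_plus)) - vec(F(X_minus))) / (2.0 * h))
    return np.array(rows)

# Strictly diagonally dominant, so invertible; not symmetric on purpose.
X = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, -1.0, 2.0]])
X_inv = np.linalg.inv(X)

deriv_numeric = derivative_layout(np.linalg.inv, X)
deriv_formula = -np.kron(X_inv, X_inv.T)   # -[X^{-1} (x) (X^{-1})']

print(np.allclose(deriv_numeric, deriv_formula, atol=1e-6))
```

Swapping the two Kronecker factors gives the derivative in the transposed (more common) layout, so a non-symmetric test matrix is needed to tell the two apart.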
V. Formulas for Chain Rule
Theorem 1.4.1 (Chain Rule Formula for Scalar Function)
Let $\psi(X)$ be a scalar function of a matrix variable $X$ (size $m \times n$), and each element $x_{ij}$ of $X$ is a function of a variable $t$. Then
$$ \frac{d \psi(X)}{d t} = \sum_{j=1}^n \sum_{i=1}^m \frac{\partial \psi(X)}{\partial x_{ij}} \frac{\partial x_{ij}}{\partial t} = \mathrm{tr} \left[ \left( \frac{\partial \psi(X)}{\partial \{X\}} \right)' \left( \frac{\partial \{X\}}{\partial t} \right) \right]. $$
(Note: Because $\mathrm{tr}(AB)=\mathrm{tr}(BA)$, this can equivalently be written as $\mathrm{tr}[\frac{\partial\{X\}}{\partial t} (\frac{\partial \psi}{\partial\{X\}})']$.)
Theorem 1.4.2 (Chain Rule Formula for Matrices)
Let $F(G(X))$ be a composite function of two matrix functions $F$ and $G$. Let $U = G(X)$. Then
$$ \frac{\partial F(G(X))}{\partial X} = \frac{\partial G(X)}{\partial X} \frac{\partial F(U)}{\partial U}. $$
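In this layout the inner derivative multiplies on the left, which is the reverse of the usual Jacobian convention. The sketch below illustrates this with the hypothetical choice $G(X) = AXB$ and $F(U) = U^{-1}$, using Properties 1.4.11 and 1.4.15 for the two factors; the matrices and helper names are assumptions made for the example.

```python
import numpy as np

def vec(M):
    # Column-stacking vectorization.
    return M.reshape(-1, order="F")

def derivative_layout(F, X, h=1e-6):
    # d vec(F(X))' / d vec(X) in the layout of Definition 1.4.4.
    x = vec(X)
    rows = []
    for i in range(x.size):
        step = np.zeros(x.size)
        step[i] = h
        X_plus = (x + step).reshape(X.shape, order="F")
        X_minus = (x - step).reshape(X.shape, order="F")
        rows.append((vec(F(X_plus)) - vec(F(X_minus))) / (2.0 * h))
    return np.array(rows)

A = np.array([[3.0, 1.0], [0.0, 2.0]])
B = np.array([[2.0, 0.0], [1.0, 4.0]])
X = np.array([[1.0, 2.0], [3.0, 5.0]])

U = A @ X @ B                      # inner function G(X) = AXB
U_inv = np.linalg.inv(U)

dG_dX = np.kron(B, A.T)            # Property 1.4.11
dF_dU = -np.kron(U_inv, U_inv.T)   # Property 1.4.15 evaluated at U

chain_formula = dG_dX @ dF_dU      # Theorem 1.4.2: inner derivative first
reversed_order = dF_dU @ dG_dX     # the order used in the transposed layout
chain_numeric = derivative_layout(lambda M: np.linalg.inv(A @ M @ B), X)

print(np.allclose(chain_numeric, chain_formula, atol=1e-6))
print(np.allclose(chain_numeric, reversed_order, atol=1e-6))
```

For these matrices the first comparison succeeds and the second does not, which shows that the order of the factors in Theorem 1.4.2 cannot be swapped.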
Property 1.4.16
If the elements of matrix $X$ are functions of a variable $t$, then
$$ (1) \quad \frac{d \det(X(t))}{d t} = \det(X) \mathrm{tr}\left(X^{-1} \frac{\partial\{X\}}{\partial t}\right) $$ $$ (2) \quad \frac{d \ln |\det(X)|}{d t} = \mathrm{tr}\left(X^{-1} \frac{\partial\{X\}}{\partial t}\right) $$
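(Both formulas require $X$ to be nonsingular. Because the second formula involves $\ln|\det(X)|$, it holds whenever $\det(X) \neq 0$, for either sign of the determinant.)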
Proof
Using Theorem 1.4.1 (Chain Rule) and Property 1.4.9: Let $\psi(X) = \det(X)$. Then $\frac{\partial \psi(X)}{\partial \{X\}} = \det(X) (X')^{-1}$.
$$ \frac{d \det(X(t))}{d t} = \mathrm{tr} \left[ \left( \frac{\partial \det(X)}{\partial \{X\}} \right)' \left( \frac{\partial \{X\}}{\partial t} \right) \right] $$ $$ = \mathrm{tr} \left[ (\det(X) (X')^{-1})' \left( \frac{\partial \{X\}}{\partial t} \right) \right] = \mathrm{tr} \left[ \det(X) ((X')^{-1})' \left( \frac{\partial X}{\partial t} \right) \right] $$ $$ = \det(X) \mathrm{tr} \left[ X^{-1} \frac{\partial X}{\partial t} \right]. $$
Equivalently, starting from the identity $\frac{d\psi}{dt} = \mathrm{tr}\left[ \frac{\partial\{X\}}{\partial t} \left( \frac{\partial \psi}{\partial\{X\}} \right)' \right]$:
$$ \frac{d \det(X)}{dt} = \mathrm{tr}\left[ \frac{\partial \{X\}}{\partial t} \left( \det(X) (X')^{-1} \right)' \right] $$ $$ = \mathrm{tr}\left[ \frac{\partial \{X\}}{\partial t} \det(X) ((X')^{-1})' \right] = \mathrm{tr}\left[ \frac{\partial \{X\}}{\partial t} \det(X) X^{-1} \right] $$ $$ = \det(X) \mathrm{tr}\left[ \frac{\partial \{X\}}{\partial t} X^{-1} \right] = \det(X) \mathrm{tr}\left[ X^{-1} \frac{\partial \{X\}}{\partial t} \right]. $$
For the second formula:
$$ \frac{d \ln |\det(X)|}{d t} = \frac{1}{\det(X)} \frac{d \det(X)}{d t} $$ $$ = \frac{1}{\det(X)} \left( \det(X) \mathrm{tr}\left(X^{-1} \frac{\partial \{X\}}{\partial t}\right) \right) $$ $$ = \mathrm{tr}\left(X^{-1} \frac{\partial \{X\}}{\partial t}\right). $$
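Both formulas of Property 1.4.16 can be checked on a small example. The matrix function `X_of` below is a hypothetical choice whose determinant, $t^3 - 6$, is negative at the evaluation point, so the example also exercises the absolute value in formula (2).

```python
import numpy as np

def X_of(t):
    # Hypothetical 2x2 matrix function with det(X(t)) = t**3 - 6.
    return np.array([[t, 2.0],
                     [3.0, t ** 2]])

def dX_of(t):
    # Elementwise derivative of X_of.
    return np.array([[1.0, 0.0],
                     [0.0, 2.0 * t]])

t0, h = 0.5, 1e-6
X0, dX0 = X_of(t0), dX_of(t0)
trace_term = np.trace(np.linalg.inv(X0) @ dX0)

# Formula (1): d det(X)/dt = det(X) tr(X^{-1} dX/dt)
det_numeric = (np.linalg.det(X_of(t0 + h)) - np.linalg.det(X_of(t0 - h))) / (2.0 * h)
det_formula = np.linalg.det(X0) * trace_term

# Formula (2): d ln|det(X)|/dt = tr(X^{-1} dX/dt)
logdet = lambda t: np.log(abs(np.linalg.det(X_of(t))))
logdet_numeric = (logdet(t0 + h) - logdet(t0 - h)) / (2.0 * h)
logdet_formula = trace_term

print(np.isclose(det_numeric, det_formula, atol=1e-6))
print(np.isclose(logdet_numeric, logdet_formula, atol=1e-6))
```

Here `det_formula` equals $3t^2 = 0.75$ at $t = 0.5$, in agreement with differentiating $t^3 - 6$ directly.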
VI. Calculation of the Jacobian Determinant
For a change of variables $y=f(x)$ in a multiple integral:
$$ \int_D g(x_1, x_2, \dots, x_n) dx_1 dx_2 \dots dx_n = \int_T g(f^{-1}(y)) |J(x \to y)| dy $$
where $D \subset \mathbb{R}^n$, $T = \{y | y = f(x), x \in D\}$, and $J(x \to y)$ is the Jacobian determinant.
$$ |J(x \to y)| = \left| \det \left( \frac{\partial x}{\partial y'} \right) \right| = \left| \det \left( \frac{\partial x_i}{\partial y_j} \right) \right|. $$
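Here $f$ is the transformation, assumed one-to-one and differentiable so that $f^{-1}$ exists. With the notation $|A|_+ = |\det(A)|$ for the absolute value of a determinant, the same quantity is also written $|\frac{\partial x'}{\partial y}|_+$; transposing the matrix does not change its determinant.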
Definition 1.4.5
Let $X \in \mathbb{R}^{m \times n}$ be a matrix variable, $Y = F(X) \in \mathbb{R}^{m \times n}$ be a transformation. If $F(X)$ is differentiable, then the Jacobian determinant of the transformation $Y=F(X)$ is defined as
$$ J(Y \to X) = J(Y: X) = \left| \frac{\partial Y}{\partial X} \right|_+ = \left| \det \left( \frac{\partial (\mathrm{vec}(Y))'}{\partial \mathrm{vec}(X)} \right) \right|. $$
(Note: With $y=\mathrm{vec}(Y)$ and $x=\mathrm{vec}(X)$, this is also written $|\frac{\partial y}{\partial x}|_+$; by Definition 1.4.4 it equals $|\det(\frac{\partial Y}{\partial X})|$.)
Property 1.4.17
If $Y = AXB$, where $X$ is $m \times n$, $A$ is an $m \times m$ nonsingular square matrix, and $B$ is an $n \times n$ nonsingular square matrix, then
$$ J(Y \to X) = |(\det(A))^n (\det(B))^m| = |\det(A)|^n |\det(B)|^m = |A|_+^n |B|_+^m. $$
In the special vector case $y = Ax$ (where $x, y$ are $m \times 1$ vectors, $A$ is $m \times m$), then $J(y \to x) = |\det(A)| = |A|_+$.
Proof
Since $\mathrm{vec}(Y) = (B' \otimes A)\mathrm{vec}(X)$, write $y_{vec} = \mathrm{vec}(Y)$ and $x_{vec} = \mathrm{vec}(X)$. By Definition 1.4.4 and Property 1.4.10, the Jacobian matrix is
$$ \frac{\partial Y}{\partial X} = \frac{\partial y_{vec}'}{\partial x_{vec}} = (B' \otimes A)' = B \otimes A'. $$
The Jacobian determinant is
$$ J(Y \to X) = \left| B \otimes A' \right|_+ = \left| \det \left( B \otimes A' \right) \right|. $$
We use the property $\det(P \otimes Q) = (\det P)^q (\det Q)^p$, where $P$ is $p \times p$ and $Q$ is $q \times q$. Here $B$ is $n \times n$ and $A'$ is $m \times m$, so
$$ \det(B \otimes A') = (\det B)^m (\det A')^n = (\det B)^m (\det A)^n. $$
So,
$$ J(Y \to X) = |(\det A)^n (\det B)^m| = |\det A|^n |\det B|^m = |A|_+^n |B|_+^m. $$
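As an illustration of Property 1.4.17, the sketch below builds the Jacobian matrix of $Y = AXB$ numerically and compares the absolute value of its determinant with $|\det A|^n |\det B|^m$. The matrices ($m = 2$, $n = 3$) and the helper names are arbitrary assumptions for the example.

```python
import numpy as np

def vec(M):
    # Column-stacking vectorization.
    return M.reshape(-1, order="F")

def derivative_layout(F, X, h=1e-6):
    # d vec(F(X))' / d vec(X) in the layout of Definition 1.4.4.
    x = vec(X)
    rows = []
    for i in range(x.size):
        step = np.zeros(x.size)
        step[i] = h
        X_plus = (x + step).reshape(X.shape, order="F")
        X_minus = (x - step).reshape(X.shape, order="F")
        rows.append((vec(F(X_plus)) - vec(F(X_minus))) / (2.0 * h))
    return np.array(rows)

m, n = 2, 3
A = np.array([[2.0, -1.0],
              [1.0, 3.0]])               # m x m, det(A) = 7
B = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, -1.0, 2.0]])         # n x n, det(B) = 40
X = np.arange(1.0, 7.0).reshape(m, n)    # any m x n point; the map is linear

jac_matrix = derivative_layout(lambda M: A @ M @ B, X)
jac_numeric = abs(np.linalg.det(jac_matrix))
jac_kron = abs(np.linalg.det(np.kron(B, A.T)))
jac_formula = abs(np.linalg.det(A)) ** n * abs(np.linalg.det(B)) ** m

print(np.isclose(jac_numeric, jac_formula, rtol=1e-6))
print(np.isclose(jac_kron, jac_formula, rtol=1e-9))
```

Note which exponent goes with which matrix: $\det A$ is raised to the number of columns of $X$ and $\det B$ to the number of rows.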
Property 1.4.18
If $X$ is an invertible square matrix of order $n$, consider the transformation $Y = X^{-1}$. Then
$$ J(Y \to X) = |\det(X)|^{-2n} = |X|_+^{-2n}. $$
Proof
From Property 1.4.15, the derivative of the transformation $Y=X^{-1}$ is
$$ \frac{\partial Y}{\partial X} = \frac{\partial X^{-1}}{\partial X} = -[X^{-1} \otimes (X^{-1})']. $$
Therefore, the Jacobian determinant is
$$ J(Y \to X) = \left| \frac{\partial Y}{\partial X} \right|_+ = \left| \det \left( -[X^{-1} \otimes (X^{-1})'] \right) \right|. $$
The matrix size is $(n^2) \times (n^2)$.
$$ \det(-M) = (-1)^{n^2} \det(M). $$ $$ \det(X^{-1} \otimes (X^{-1})') = (\det(X^{-1}))^n (\det((X^{-1})'))^n = (\det X)^{-n} (\det X^{-1})^n = (\det X)^{-n} (\det X)^{-n} = (\det X)^{-2n}. $$
So,
$$ J(Y \to X) = \left| (-1)^{n^2} (\det X)^{-2n} \right| = |\det X|^{-2n} = |X|_+^{-2n}. $$
Written as a single chain, using $|\det(A \otimes B)| = |\det(A)|^q |\det(B)|^p$ for $A$ of size $p \times p$ and $B$ of size $q \times q$, the notation $|M|_+ = |\det(M)|$, and the fact that the derivative matrix is $(n^2) \times (n^2)$:
$$ J(Y \to X) = \left| -\left[X^{-1} \otimes (X^{-1})'\right] \right|_+ = \left| (-1)^{n^2} \det(X^{-1} \otimes (X^{-1})') \right| $$ $$ = \left| (-1)^{n^2} (\det(X^{-1}))^n (\det((X^{-1})'))^n \right| = \left| ((\det X)^{-1})^n ((\det X)^{-1})^n \right| $$ $$ = \left| (\det X)^{-2n} \right| = |\det X|^{-2n} = |X|_+^{-2n}. $$
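Property 1.4.18 can be illustrated on a $2 \times 2$ example. In the sketch below the matrix $X$ (with $\det X = -2$) is an arbitrary choice, and `vec` and `derivative_layout` are hypothetical helpers that build the Jacobian matrix by finite differences.

```python
import numpy as np

def vec(M):
    # Column-stacking vectorization.
    return M.reshape(-1, order="F")

def derivative_layout(F, X, h=1e-6):
    # d vec(F(X))' / d vec(X) in the layout of Definition 1.4.4.
    x = vec(X)
    rows = []
    for i in range(x.size):
        step = np.zeros(x.size)
        step[i] = h
        X_plus = (x + step).reshape(X.shape, order="F")
        X_minus = (x - step).reshape(X.shape, order="F")
        rows.append((vec(F(X_plus)) - vec(F(X_minus))) / (2.0 * h))
    return np.array(rows)

n = 2
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])               # det(X) = -2
X_inv = np.linalg.inv(X)

# Jacobian matrix of Y = X^{-1}, obtained in two ways.
jac_matrix_numeric = derivative_layout(np.linalg.inv, X)
jac_matrix_formula = -np.kron(X_inv, X_inv.T)

jac_numeric = abs(np.linalg.det(jac_matrix_numeric))
jac_kron = abs(np.linalg.det(jac_matrix_formula))
jac_formula = abs(np.linalg.det(X)) ** (-2 * n)   # |det X|^{-2n}

print(np.isclose(jac_numeric, jac_formula, rtol=1e-5))
print(np.isclose(jac_kron, jac_formula, rtol=1e-9))
```

With $\det X = -2$ and $n = 2$ the Jacobian determinant is $2^{-4} = 0.0625$; the sign factor $(-1)^{n^2}$ disappears once the absolute value is taken.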
Chapter 1 Summary
- Operations, determinant, and inverse formulas for partitioned matrices.
- Definition and properties of generalized inverse.
- Vectorization operation and Kronecker product of matrices.
- Definition and properties of matrix derivatives.