Multivariate Statistical Analysis

created : Fri, 24 Sep 2021 12:38:16 +0900
modified : Sun, 12 Jan 2025 23:33:56 +0900

Chapter 0. Introduction

0.1 Visualization of Multivariate Data

library(psych)    # pairs.panels(): scatter plot matrix
pairs.panels(USArrests)

# Chernoff's Faces
library(aplpack)  # faces()
faces(USArrests, face.type=1, cex=0.5)

# Star plot

# 3-D scatter plot
library(scatterplot3d)
# USArrests[, -1] drops Murder and plots Assault, UrbanPop, Rape
scatterplot3d(USArrests[, -1], type="h", highlight.3d=TRUE, angle=55, scale.y=0.7, pch=16, main="USArrests")

# 3-D rotated plot

# Profile plot
library(MASS)     # parcoord(): profile (parallel coordinate) plot
parcoord(USArrests, col=1 + (1:50), var.label=TRUE)

# Growth curves for longitudinal data
library(ggplot2)
library(nlme)     # Orthodont data set
p <- ggplot(data = Orthodont, aes(x = age, y = distance, group = Subject, colour = Subject))
p + geom_line()

p + geom_line() + facet_grid(. ~ Sex)

Summary of Introduction

Chapter 1. Linear algebra

1.1 Scalars, vectors, matrices

1.2 Operations of matrices

1.3 Trace and determinant for square matrices

1.4 Rank of a matrix

1.5 Inverse matrix

1.6 Partitioned matrices

Example 1.6.2

1.7 Positive definite matrix

1.8 Orthogonal vectors and matrices

1.9 Eigenvalues and eigenvectors

1.10 Spectral decomposition

1.11 Cauchy-Schwarz inequality

1.12 Differentiation in Vectors and Matrices

1.13 Some useful quantities

1.14 Random vectors and matrices

1.14.1 Parameter vectors and matrices

Chapter 2. Multivariate Normal Distribution

2.1 Definitions

2.2 Properties of multivariate normal distribution

2.3 Estimation for sampling from a multivariate normal distribution

2.3.1 Likelihood function of a sample from a multivariate normal distribution

2.3.2 Maximum likelihood estimates (MLEs) from a multivariate normal distribution

2.4 Sampling distributions of $\bar X$ and $S$

2.5 Definition and Properties of the Wishart Distribution

2.6 Large sample distributions for $\bar X$ and $S$

2.7 Assessing the assumption of multivariate normality

2.8 Transformations to near normality

  1. Theoretical transformations
  2. Power transformations: When all observations are nonnegative, we may consider a family of power transformations $x \mapsto x^\lambda$. If some measurements are negative, we first add a constant to all measurements and then apply a power transformation.
  3. Box-Cox transformations: The Box-Cox family is similar to the power family, $x^{(\lambda)} = (x^\lambda - 1)/\lambda$ for $\lambda \ne 0$ and $x^{(0)} = \log x$, defined for $x > 0$. Because $(x^\lambda - 1)/\lambda \rightarrow \log x$ as $\lambda \rightarrow 0$, the family passes continuously into the logarithmic transformation.
  4. Note that no transformation is guaranteed to bring the data close to normality.
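The Box-Cox parameter is usually chosen by maximum likelihood. A minimal R sketch for a single positive variable, assuming the profile log-likelihood $\ell(\lambda) = -\frac{n}{2}\log\left[\frac{1}{n}\sum_{j=1}^n \left(x_j^{(\lambda)} - \overline{x^{(\lambda)}}\right)^2\right] + (\lambda - 1)\sum_{j=1}^n \log x_j$ (additive constants dropped) and a simple grid search. The helpers `box_cox` and `bc_loglik`, the grid, and the simulated log-normal data are illustrative choices, not part of these notes; `MASS::boxcox` offers a packaged version for the response of a linear model.

```r
# Box-Cox transform x^(lambda); defined only for x > 0.
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Profile log-likelihood of lambda (mu and sigma^2 maximised out, constants dropped).
bc_loglik <- function(x, lambda) {
  n <- length(x)
  y <- box_cox(x, lambda)
  sigma2_hat <- mean((y - mean(y))^2)   # MLE of the variance of the transformed data
  -n / 2 * log(sigma2_hat) + (lambda - 1) * sum(log(x))   # second term: Jacobian
}

set.seed(1)
x <- rlnorm(200)                        # log-normal data: log(x) is exactly normal
grid <- seq(-1, 1, by = 0.01)
ll <- sapply(grid, function(l) bc_loglik(x, l))
lambda_hat <- grid[which.max(ll)]
lambda_hat                              # expected to be close to 0 (the log transform)
x_transformed <- box_cox(x, lambda_hat)
```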

Chapter 3. Hypothesis tests

Reasons to prefer a multivariate test over $p$ separate univariate tests:
  1. The use of $p$ univariate tests inflates the Type I error rate $\alpha$, whereas the multivariate test preserves the exact $\alpha$ level.
  2. The univariate tests completely ignore the correlations among the variables, whereas the multivariate tests make direct use of them.
  3. The multivariate tests are more powerful than the univariate tests in many cases.
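To make the Type I error inflation concrete: if the $p$ univariate tests were independent, each run at level $\alpha$, the probability of at least one false rejection would be $1 - (1 - \alpha)^p$. The short R check below evaluates that arithmetic, together with the Bonferroni remedy of testing each variable at $\alpha/p$. Independence is a simplifying assumption; with correlated variables the exact rate depends on the correlation structure.

```r
alpha <- 0.05
p <- c(1, 2, 5, 10)
# Familywise Type I error rate of p independent level-alpha tests
fwer <- 1 - (1 - alpha)^p
# Bonferroni: run each test at alpha / p instead
fwer_bonferroni <- 1 - (1 - alpha / p)^p
round(cbind(p, fwer, fwer_bonferroni), 4)   # fwer reaches about 0.40 at p = 10
```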

3.1 Review of hypothesis tests for a univariate normal mean

3.1.1 When $\sigma^2$ is known

3.1.2 When $\sigma^2$ is unknown

3.2 Hypothesis test on a one-sample multivariate normal mean vector

3.2.1 When the covariance matrix $\Sigma$ is known

3.2.2 Hotelling’s $T^2$ Statistic: when $\Sigma$ is unknown

  1. Note that $T^2 = Z'(\frac{W}{v})^{-1}Z \sim \frac{vp}{v + 1 - p}F_{p, v + 1 - p}$, where $Z \sim N_p(0, \Sigma)$ and $W \sim \text{Wishart}(p, v, \Sigma)$ are independent.
  2. Note that $pF_{p, n-p} \rightarrow \chi_p^2$ in distribution as $n \rightarrow \infty$, so that $T^2$ is approximately $\chi_p^2$ for a large sample under $H_0$.
  3. The $T^2$ statistic is invariant under nonsingular linear transformations $x \mapsto Cx + d$; that is, Hotelling's $T^2$ statistic does not depend on the measurement units.
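A minimal R sketch of the one-sample test, using $T^2 = n(\bar x - \mu_0)' S^{-1} (\bar x - \mu_0)$ with the null distribution $T^2 \sim \frac{(n-1)p}{n-p} F_{p, n-p}$ (the case $v = n - 1$ of the first note). The function name `hotelling_t2`, the simulated data, and the matrix `C` and vector `shift` are hypothetical; the last lines check the invariance note numerically by re-running the test on $y_j = C x_j + d$ with $\mu_0$ mapped to $C\mu_0 + d$.

```r
# One-sample Hotelling's T^2 test of H0: mu = mu0 when Sigma is unknown.
hotelling_t2 <- function(X, mu0) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  d <- colMeans(X) - mu0                   # xbar - mu0
  S <- cov(X)                              # sample covariance, divisor n - 1
  T2 <- n * drop(crossprod(d, solve(S, d)))
  F_stat <- (n - p) / ((n - 1) * p) * T2   # ~ F_{p, n-p} under H0
  p_value <- pf(F_stat, p, n - p, lower.tail = FALSE)
  list(T2 = T2, F = F_stat, df = c(p, n - p), p.value = p_value)
}

set.seed(123)
n <- 30
X <- cbind(rnorm(n, 5, 1), rnorm(n, 3, 2), rnorm(n, 0, 0.5))
mu0 <- c(5, 3, 0)
res <- hotelling_t2(X, mu0)

# Change of units and origin: nonsingular C (det = 60) and shift d
C <- matrix(c(2, 0, 0,
              1, 3, 0,
              0, 1, 10), nrow = 3, byrow = TRUE)
shift <- c(100, -4, 7)
Y <- sweep(X %*% t(C), 2, shift, "+")     # row j is (C x_j + d)'
res_y <- hotelling_t2(Y, drop(C %*% mu0) + shift)
c(res$T2, res_y$T2)                       # the two statistics agree
```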

3.3 Hotelling’s $T^2$ and likelihood ratio tests

3.4 Confidence regions and multiple testing

3.4.1 Simultaneous confidence intervals

Chapter 4. Two-Sample Comparison and MANOVA

4.1 Paired Comparisons and a Repeated Measures Design

| Method | Confidence region / intervals |
|---|---|
| Hotelling's $T^2$ confidence region | $n(\delta - \bar D)' S_d^{-1} (\delta - \bar D) \le \frac{(n-1)p}{n-p} F_{p, n-p, \alpha}$ |
| Scheffé's simultaneous CIs | $\bar d_i \pm \sqrt{\frac{p(n-1)}{n-p} F_{p, n-p, \alpha}} \sqrt{\frac{s_{d_i}^2}{n}}$ |
| Bonferroni's simultaneous CIs | $\bar d_i \pm t_{n-1, \frac{\alpha}{2p}} \sqrt{\frac{s_{d_i}^2}{n}}$ |
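The three rows of the table can be computed directly from the differences $d_j = x_{1j} - x_{2j}$. A minimal R sketch, assuming $F_{p,n-p,\alpha}$ and $t_{n-1,\alpha/(2p)}$ denote upper-tail quantiles (hence `qf(1 - alpha, ...)` and `qt(1 - alpha / (2 * p), ...)`); `paired_intervals` and the simulated before/after data are hypothetical.

```r
# Simultaneous CIs for the mean difference vector delta in a paired design.
paired_intervals <- function(X1, X2, alpha = 0.05) {
  D <- as.matrix(X1) - as.matrix(X2)      # differences d_j = x_1j - x_2j
  n <- nrow(D)
  p <- ncol(D)
  dbar <- colMeans(D)
  S_d <- cov(D)
  se <- sqrt(diag(S_d) / n)               # sqrt(s_{d_i}^2 / n)
  F_crit <- qf(1 - alpha, p, n - p)       # upper-alpha point of F_{p, n-p}
  c_scheffe <- sqrt(p * (n - 1) / (n - p) * F_crit)
  c_bonf <- qt(1 - alpha / (2 * p), df = n - 1)
  # Is a hypothesised delta0 inside the T^2 confidence region?
  in_region <- function(delta0) {
    dev <- dbar - delta0
    n * drop(crossprod(dev, solve(S_d, dev))) <= (n - 1) * p / (n - p) * F_crit
  }
  list(
    scheffe = cbind(lower = dbar - c_scheffe * se, upper = dbar + c_scheffe * se),
    bonferroni = cbind(lower = dbar - c_bonf * se, upper = dbar + c_bonf * se),
    in_region = in_region
  )
}

set.seed(7)
n <- 20
before <- matrix(rnorm(2 * n, mean = 10), n, 2)
after <- before - matrix(rnorm(2 * n, mean = 0.5, sd = 0.8), n, 2)
ci <- paired_intervals(before, after)
ci$scheffe
ci$bonferroni
ci$in_region(c(0, 0))                     # TRUE means delta = 0 is not rejected
```

For small $p$ the Bonferroni intervals are typically shorter than Scheffé's, which is why they are often preferred when only the $p$ component-wise intervals are of interest.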

4.2 Comparing Mean Vectors from Independent Two Samples

4.2.1 When $\Sigma = \Sigma_1 = \Sigma_2$

4.4 Simultaneous Confidence Intervals for Treatment Effects

4.5 Testing for Equality of Covariance Matrices

Chapter 5. Discriminant analysis and classification

5.1 Discriminant function

5.2 Discriminant functions for two groups

5.3 Classification analysis

5.4 Classification for multivariate normal distributions

5.5 Discriminant analysis for several groups

5.5.1 Discriminant functions

5.6 Stepwise Discriminant Analysis

Chapter 6. Principal Component Analysis (PCA)

6.1 Introduction

6.2 Method

6.3 PCA from the correlation matrix

6.4 Plotting of principal components

6.5 How many components to retain?

Chapter 7. Factor Analysis (FA)

7.1 Orthogonal factor model

7.2 Estimations

  1. (Principal component method) By the spectral decomposition, $$\begin{aligned} \Sigma &= \sum_{i=1}^p \lambda_i e_i e_i' = \sum_{i=1}^p (\sqrt{\lambda_i} e_i)(\sqrt{\lambda_i}e_i)' \\ & = (\sqrt{\lambda_1} e_1 : \cdots : \sqrt{\lambda_p} e_p) \begin{pmatrix}\sqrt{\lambda_1}e_1' \\ \vdots \\ \sqrt{\lambda_p} e_p'\end{pmatrix} \end{aligned}$$ If $\lambda_{m+1}, \cdots, \lambda_p$ are small, then we can approximate the covariance matrix by $$\Sigma \approx ( \sqrt{\lambda_1} e_1 : \cdots : \sqrt{\lambda_m} e_m) \begin{pmatrix} \sqrt{\lambda_1} e_1' \\ \vdots \\ \sqrt{\lambda_m} e_m' \end{pmatrix} + \begin{pmatrix} \psi_1 & 0 & \cdots & 0 \\ 0 & \psi_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \psi_p\end{pmatrix} = LL' + \Psi$$ where $L = (\sqrt{\lambda_1} e_1 : \cdots : \sqrt{\lambda_m} e_m)$ has entries $l_{ij}$ and $\psi_i = \Sigma_{ii} - \sum_{j=1}^m l_{ij}^2$. The communalities are $$h_i^2 = l_{i1}^2 + \cdots + l_{im}^2$$
  2. (Principal factor method) Start from an initial estimate $\Psi^{(0)}$ and, at step $r$, apply the principal component solution to the reduced matrix $S - \Psi^{(r)}$: $$\begin{aligned} S - \Psi^{(r)} &= \sum_{j=1}^m \lambda_j^{(r)}e_j^{(r)}e_j^{(r)T} + \sum_{j=m+1}^p \lambda_j^{(r)} e_j^{(r)} e_j^{(r)T} \\ \Psi^{(r+1)} & = diag(S - L^{(r)}L^{(r)T}) \end{aligned}$$ where $L^{(r)} = (\sqrt{\lambda_1^{(r)}} e_1^{(r)} : \cdots : \sqrt{\lambda_m^{(r)}} e_m^{(r)})$.
    • Repeat these steps until convergence. A common initial diagonal matrix is $\Psi^{(0)} = [diag(S^{-1})]^{-1}$, i.e. $\psi_i^{(0)} = 1/s^{ii}$, for factoring the sample covariance matrix and $\Psi^{(0)} = [diag(R^{-1})]^{-1}$, i.e. $\psi_i^{(0)} = 1/r^{ii}$, for factoring the sample correlation matrix, where $s^{ii}$ and $r^{ii}$ denote the $i$th diagonal elements of $S^{-1}$ and $R^{-1}$.
  3. (Maximum likelihood method) Assume that $X_j - \mu = LF_j + \epsilon_j$ has a multivariate normal distribution. The likelihood function is $$\begin{aligned} L(\mu, \Sigma) &= \prod_{i=1}^N \left[\frac{1}{(2 \pi)^{p/2} \vert \Sigma \vert ^{1/2}} e^{-\frac{1}{2}(x_i - \mu)' \Sigma^{-1} (x_i - \mu)}\right] \\ &= \prod_{i=1}^N \left[\frac{1}{(2 \pi)^{p/2} \vert LL^T + \Psi \vert ^{1/2}} e^{-\frac{1}{2}(x_i - \mu)' (LL^T + \Psi)^{-1} (x_i - \mu)}\right] \end{aligned}$$ Since $LQQ^TL^T = LL^T$ for any $m \times m$ orthogonal matrix $Q$, the loadings are determined only up to rotation, so a unique maximum likelihood solution requires $m(m-1)/2$ constraints. Note that $$(LL^T + \Psi)^{-1} = \Psi^{-1} - \Psi^{-1}L(I + L^T \Psi^{-1}L)^{-1}L^T \Psi^{-1}$$ Requiring $L^T \Psi^{-1}L$ to be diagonal sets the $m(m-1)/2$ off-diagonal elements of this symmetric $m \times m$ matrix to zero, which is exactly the required number of constraints, and the MLEs can then be found numerically. Hence, we assume $$L^T\Psi^{-1}L = \Delta, \text{ a diagonal matrix,}$$ and obtain $\hat L$ and $\hat \Psi$ numerically under this condition.
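A minimal R sketch of the three estimation methods on the correlation matrix of the built-in `ability.cov` data, with $m = 2$ chosen only for illustration. `pc_factor` and `principal_factor` are hypothetical helpers written from the formulas above; the principal factor iteration starts from $\psi_i^{(0)} = 1/s^{ii}$ (with $s^{ii}$ the $i$th diagonal element of $S^{-1}$) and truncates negative eigenvalues of $S - \Psi^{(r)}$ at zero, which is an assumption of this sketch. For the maximum likelihood solution it calls `stats::factanal`, which fits on the correlation scale with its own optimiser, so its uniquenesses are comparable to the $\psi_i$ of the other two methods but need not match them.

```r
# Principal component solution: L = (sqrt(lambda_1) e_1 : ... : sqrt(lambda_m) e_m)
pc_factor <- function(S, m) {
  eig <- eigen(S, symmetric = TRUE)
  L <- eig$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(eig$values[1:m]), nrow = m)
  h2 <- rowSums(L^2)                        # communalities h_i^2
  list(loadings = L, psi = diag(S) - h2, communality = h2)
}

# Iterated principal factor method, starting from psi_i^(0) = 1 / s^{ii}
principal_factor <- function(S, m, tol = 1e-6, max_iter = 500) {
  psi <- 1 / diag(solve(S))
  for (r in seq_len(max_iter)) {
    eig <- eigen(S - diag(psi), symmetric = TRUE)   # reduced matrix S - Psi^(r)
    # negative eigenvalues of the reduced matrix are truncated at 0
    L <- eig$vectors[, 1:m, drop = FALSE] %*%
      diag(sqrt(pmax(eig$values[1:m], 0)), nrow = m)
    psi_new <- diag(S - L %*% t(L))         # Psi^(r+1) = diag(S - L L')
    converged <- max(abs(psi_new - psi)) < tol
    psi <- psi_new
    if (converged) break
  }
  list(loadings = L, psi = psi, iterations = r, converged = converged)
}

R <- cov2cor(ability.cov$cov)               # 6 x 6 correlation matrix
pc  <- pc_factor(R, m = 2)
pfa <- principal_factor(R, m = 2)
ml  <- factanal(factors = 2, covmat = ability.cov, rotation = "none")

# Specific variances psi_i from the three methods, side by side
round(cbind(pc = pc$psi, principal_factor = pfa$psi, ml = ml$uniquenesses), 3)
```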

7.3 Hypothesis Testing on the Number of Factors

7.4 Factor Rotation

7.5 Factor scores

  1. (Weighted Least Squares Method) Bartlett suggested using weighted least squares to estimate the common factor values: $$x - \mu = Lf + \epsilon, \quad Var(\epsilon_i) = \psi_i$$ $$\text{Minimize } \sum_{i=1}^p \frac{\epsilon_i^2}{\psi_i} = \epsilon' \Psi^{-1} \epsilon = (x - \mu - Lf)' \Psi^{-1}(x - \mu - Lf)$$ $$\hat f = (L' \Psi^{-1} L)^{-1} L' \Psi^{-1} (x - \mu)$$ Hence, the estimated factor score is $$\begin{aligned} \hat f_j &= (\hat L' \hat \Psi^{-1} \hat L)^{-1} \hat L' \hat \Psi^{-1} (x_j - \bar x) \\ & = \hat \Delta ^{-1} \hat L' \hat \Psi^{-1} (x_j - \bar x) \end{aligned}$$ When the correlation matrix is factored, $$\hat f_j = (\hat L_z' \hat \Psi_z ^{-1} \hat L_z)^{-1} \hat L_z' \hat \Psi_z^{-1} z_j = \hat \Delta_z ^{-1} \hat L_z' \hat \Psi_z^{-1} z_j$$ where $z_j = D^{-1/2}(x_j - \bar x)$ and $\hat \rho = \hat L_z \hat L_z' + \hat \Psi_z$. When $\hat L$ and $\hat \Psi$ are determined by the maximum likelihood method, these estimates satisfy the uniqueness condition $\hat L' \hat \Psi^{-1} \hat L = \hat \Delta$, a diagonal matrix, which gives the second equality in each display.

  2. (Regression Method) Since $X - \mu = LF + \epsilon \sim N_p(0, LL' + \Psi)$ and $F \sim N_m(0, I)$, they have a joint normal distribution $N_{p+m} (0, \Sigma^*)$ where $$\Sigma^* = \begin{pmatrix} \Sigma = LL' + \Psi & L \\ L' & I\end{pmatrix}$$ Using the conditional mean of one block of a partitioned normal random vector given the other block, $$E(F \vert x) = L'(LL' + \Psi)^{-1} (x - \mu)$$ so the estimated factor score is $$\hat f_j = \hat L'(\hat L \hat L' + \hat \Psi)^{-1} (x_j - \bar x)$$ To reduce the effects of a (possibly) incorrect determination of the number of factors, practitioners tend to calculate the factor scores using $S$ (the original sample covariance matrix) instead of $\hat \Sigma = \hat L \hat L' + \hat \Psi$: $$\hat f_j = \hat L' S^{-1} (x_j - \bar x)$$ If a correlation matrix is factored, $$\hat f_j = \hat L_z' R^{-1} z_j$$

    • Remark 1. If rotated loadings $\hat L^* = \hat L T$ are used in place of the original loadings, the subsequent factor scores $\hat f_j^*$ are obtained by $\hat f_j^* = T' \hat f_j$.
  3. (Principal component method) When the principal component solution is used, it is common to estimate the factor scores by ordinary least squares: $$\hat F = (L'L)^{-1} L'(X- \mu)$$ $$\hat f_j = (\hat L' \hat L)^{-1} \hat L'(x_j - \bar x)$$ Since $\hat L = (\sqrt{\hat \lambda_1} \hat e_1 : \cdots : \sqrt{\hat \lambda_m} \hat e_m)$ with orthonormal $\hat e_k$, we have $\hat L' \hat L = diag(\hat \lambda_1, \cdots, \hat \lambda_m)$ and $$\begin{aligned} \hat f_j & = (\hat L' \hat L)^{-1} \hat L' (x_j - \bar x) \\ & = \begin{pmatrix} \frac{1}{\hat \lambda_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{\hat \lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\hat \lambda_m} \end{pmatrix} \begin{pmatrix} \sqrt{\hat \lambda_1} \hat e_1' \\ \sqrt{\hat \lambda_2} \hat e_2' \\ \vdots \\ \sqrt{\hat \lambda_m} \hat e_m' \end{pmatrix} (x_j - \bar x) \\ &= \begin{pmatrix} \frac{1}{\sqrt{\hat \lambda_1}} \hat e_1' (x_j - \bar x) \\ \frac{1}{\sqrt{\hat \lambda_2}} \hat e_2' (x_j - \bar x) \\ \vdots \\ \frac{1}{\sqrt{\hat \lambda_m}} \hat e_m' (x_j - \bar x) \end{pmatrix} \end{aligned}$$
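A minimal R sketch of the three score formulas for standardized data $z_j$, assuming ML estimates from `stats::factanal` for the Bartlett and regression scores (one factor, since `USArrests` has only four variables) and the principal component solution of $R$ for the least squares scores. The helpers `bartlett_scores`, `regression_scores`, and `pc_scores` are hypothetical; each returns an $n \times m$ matrix whose $j$th row is $\hat f_j'$.

```r
# Bartlett (weighted least squares): f_j = (L' Psi^-1 L)^-1 L' Psi^-1 z_j
bartlett_scores <- function(Z, L, psi) {
  L_psi <- L / psi                          # Psi^-1 L (row i divided by psi_i)
  W <- solve(t(L) %*% L_psi, t(L_psi))      # m x p weight matrix
  Z %*% t(W)
}

# Regression: f_j = L' R^-1 z_j, using the sample correlation matrix R
regression_scores <- function(Z, L, R) {
  Z %*% solve(R, L)
}

# Principal component (OLS): k-th score is e_k' z_j / sqrt(lambda_k)
pc_scores <- function(Z, R, m) {
  eig <- eigen(R, symmetric = TRUE)
  Z %*% eig$vectors[, 1:m, drop = FALSE] %*% diag(1 / sqrt(eig$values[1:m]), nrow = m)
}

X <- as.matrix(USArrests)
Z <- scale(X)                               # z_j = D^{-1/2} (x_j - xbar)
R <- cor(X)
fit <- factanal(X, factors = 1, rotation = "none")   # ML on the correlation scale
L <- unclass(fit$loadings)                  # p x m loading matrix
psi <- fit$uniquenesses

f_bartlett <- bartlett_scores(Z, L, psi)
f_regression <- regression_scores(Z, L, R)
f_pc <- pc_scores(Z, R, m = 2)
head(cbind(bartlett = f_bartlett[, 1], regression = f_regression[, 1]))
```

Two properties are easy to check with this code: applied to a noise-free point $z = \hat L_z f$, the Bartlett weights return $f$ exactly because $(\hat L_z' \hat \Psi_z^{-1} \hat L_z)^{-1} \hat L_z' \hat \Psi_z^{-1} \hat L_z = I$; and each principal component score column has sample variance 1, since the sample variance of $\hat e_k' z_j$ is $\hat \lambda_k$.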

7.6 Strategy for Factor Analysis

  1. Perform a principal component factor analysis, including a varimax rotation
  2. Perform a maximum likelihood factor analysis, including a varimax rotation
  3. Compare the solutions
  4. Repeat steps 1-3 for other numbers of common factors $m$
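Steps 1-4 can be scripted. A minimal R sketch, assuming the built-in `ability.cov` data and $m \in \{1, 2\}$ as the candidate numbers of factors (both illustrative choices); `pc_varimax` is a hypothetical helper, while `stats::varimax` and `stats::factanal` (whose default rotation is varimax) are base R. Communalities are compared because an orthogonal rotation leaves them unchanged, which sidesteps sign and column-order differences between the two rotated solutions.

```r
R <- cov2cor(ability.cov$cov)

# Step 1: principal component solution followed by a varimax rotation
pc_varimax <- function(R, m) {
  eig <- eigen(R, symmetric = TRUE)
  L <- eig$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(eig$values[1:m]), nrow = m)
  rownames(L) <- rownames(R)
  if (m > 1) unclass(varimax(L)$loadings) else L   # rotation needs m > 1
}

for (m in 1:2) {
  L_pc <- pc_varimax(R, m)                          # step 1
  ml <- factanal(factors = m, covmat = ability.cov) # step 2: ML + varimax
  # Step 3: compare communalities, then inspect the rotated ML loadings
  cat("m =", m, "\n")
  print(round(cbind(pc = rowSums(L_pc^2), ml = 1 - ml$uniquenesses), 3))
  print(round(unclass(ml$loadings), 3))
}                                                   # step 4: the loop over m
```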

Chapter 8. Multivariate regression

8.1 The Classical (Univariate) Linear Regression Model

8.2 Least Squares Estimation

8.3 Sum of Squares Decomposition

8.4 Inferences About the Regression Model

8.4.1 Likelihood ratio tests (LRTs)

8.5 Inferences from the Estimated Regression Function

8.6 Model Checking and Other Aspects of Regression

8.7 Multivariate Multiple Regression