Title: | Partial Principal Component Analysis of Partitioned Large Sparse Matrices |
---|---|
Description: | Performs partial principal component analysis of a large sparse matrix. The matrix may be stored as a list of matrices to be concatenated (implicitly) horizontally. Useful applications include cases where the total number of nonzero entries exceeds the capacity of 32-bit integers (e.g., with large Single Nucleotide Polymorphism data). |
Authors: | Srika Raja [aut, cre], Somak Dutta [aut] |
Maintainer: | Srika Raja <[email protected]> |
License: | GPL-3 |
Version: | 1.1 |
Built: | 2025-02-19 05:14:38 UTC |
Source: | https://github.com/srika1919/ppca |
Performs a partial principal component analysis on a large sparse matrix or a list of large sparse matrices and returns the results as an object compatible with class prcomp. Uses the RSpectra package to compute the largest eigenvalues.
pPCA(x, rank, retX = TRUE, scale. = TRUE, normalize = FALSE, sd.tol = 1e-05)
x |
A matrix, sparse matrix (Matrix::dgCMatrix), or a list of these. When a list is supplied, the entries are concatenated horizontally (implicitly). See description. |
rank |
An integer specifying the number of principal components to compute. |
retX |
A logical value indicating whether the rotated variables (PC scores) should be returned. |
scale. |
A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. |
normalize |
A logical value indicating whether the principal component scores should be normalized. |
sd.tol |
A positive number, warnings are printed if the standard deviation of any column is less than this threshold. |
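The role of sd.tol can be illustrated with a minimal base R sketch. The helper check_column_sd below is hypothetical and is not part of the package; it only assumes that the check compares each column's standard deviation against the threshold, as described above, and it works on a dense matrix for simplicity.

```r
# Hypothetical helper (not part of the package): flag columns whose
# standard deviation falls below sd.tol and warn about them.
check_column_sd <- function(x, sd.tol = 1e-05) {
  sds <- apply(x, 2, stats::sd)   # column standard deviations
  low <- which(sds < sd.tol)      # indices of near-constant columns
  if (length(low) > 0) {
    warning(sprintf("%d column(s) have standard deviation below %g",
                    length(low), sd.tol))
  }
  invisible(low)
}

set.seed(1)
x <- cbind(rnorm(20), rep(3, 20), rnorm(20))  # second column is constant
low_cols <- suppressWarnings(check_column_sd(x))
```

Near-constant columns matter mainly when scale. = TRUE, because dividing by a standard deviation close to zero inflates those columns.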
When the input argument is a matrix (of class "matrix" or "dgCMatrix"), principal component analysis
is performed to extract the few largest components. When a list of matrices is passed, the partial PCA
is performed on the horizontally concatenated matrix, i.e., if x = list(X1,X2,X3) then the
partial PCA is done on the matrix [X1 X2 X3], without concatenating the matrices explicitly. This can be
useful when the matrix is so high-dimensional that the total number of non-zero entries
exceeds 2^31-1 (roughly 2.15e9), the capacity of a 32-bit integer. For example, in PCA with very
high-dimensional SNP data, the sparse matrix for each chromosome can be stored separately, each within
the capacity of 32-bit integers.
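The idea of working with [X1 X2 X3] without forming it can be sketched in base R. The code below is an illustration under stated assumptions, not the package's implementation: the functions block_multiply and block_crossprod are hypothetical names, the blocks are small dense matrices, and centring and scaling are left out. It shows that the two products an iterative eigen-solver needs can be accumulated block by block.

```r
# Three hypothetical column blocks sharing the same number of rows.
set.seed(1)
blocks <- list(matrix(rnorm(50 * 4), 50, 4),
               matrix(rnorm(50 * 3), 50, 3),
               matrix(rnorm(50 * 5), 50, 5))

# Column positions of each block inside the implicit matrix [X1 X2 X3].
widths <- vapply(blocks, ncol, integer(1))
ends   <- cumsum(widths)
starts <- ends - widths + 1L

# y = [X1 X2 X3] %*% v, accumulated one block at a time.
block_multiply <- function(blocks, v) {
  y <- numeric(nrow(blocks[[1]]))
  for (k in seq_along(blocks)) {
    y <- y + as.vector(blocks[[k]] %*% v[starts[k]:ends[k]])
  }
  y
}

# z = t([X1 X2 X3]) %*% u, obtained by stacking the per-block results.
block_crossprod <- function(blocks, u) {
  unlist(lapply(blocks, function(b) as.vector(crossprod(b, u))))
}

v <- rnorm(sum(widths))
u <- rnorm(50)

# Explicit concatenation, used here only to check the block-wise results.
full <- do.call(cbind, blocks)
ok_multiply  <- all.equal(block_multiply(blocks, v), as.vector(full %*% v))
ok_crossprod <- all.equal(block_crossprod(blocks, u),
                          as.vector(crossprod(full, u)))

# The 32-bit limit referred to above.
int_limit <- .Machine$integer.max   # equals 2^31 - 1
```

Because each block is only ever multiplied on its own, no single object has to hold more non-zero entries than a 32-bit index can address.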
pPCA returns a list with class "pPCA" (compatible with "prcomp") containing the following components:
sdev |
A vector of the singular values (standard deviations of the principal components). |
rotation |
A matrix whose columns contain the eigenvectors (loadings). |
x |
A matrix of the principal component scores, returned if retX is true. This is the centred (and scaled if requested) data multiplied by the rotation matrix. |
center |
The column means. |
scale |
The column standard deviations if scale. is true; otherwise FALSE. |
The partial SVD is computed through the RSpectra package. All elements in the first row of the rotation matrix are positive.
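How these components relate to an SVD can be sketched with base R alone. This is an illustration, not the package's code: it uses a full dense svd() instead of the partial SVD from RSpectra, assumes the same scaling by sqrt(n - 1) that stats::prcomp uses for sdev, and applies the sign convention stated above by flipping columns so that the first row of the rotation matrix is positive.

```r
set.seed(1)
x    <- matrix(rnorm(30 * 6), 30, 6)
rank <- 2

# Centre and scale the columns, then take the leading singular triplets.
z <- scale(x, center = TRUE, scale = TRUE)
s <- svd(z, nu = rank, nv = rank)

# Standard deviations of the components (prcomp's convention).
sdev     <- s$d[seq_len(rank)] / sqrt(nrow(x) - 1)
rotation <- s$v

# Flip column signs so that every entry in the first row is positive.
signs    <- ifelse(rotation[1, ] < 0, -1, 1)
rotation <- sweep(rotation, 2, signs, `*`)

# Scores: centred and scaled data multiplied by the rotation matrix.
scores <- z %*% rotation

# Reference values from stats::prcomp for comparison.
ref <- prcomp(x, scale. = TRUE)
same_sdev <- all.equal(sdev, ref$sdev[seq_len(rank)])
```

Flipping the sign of a loading vector and of its scores together leaves the decomposition unchanged, so fixing the first row only makes results reproducible across runs.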
Srika Raja and Somak Dutta
Raja, S. and Dutta, S. (2024). Matrix-free partial PCA of partitioned genetic data. REU project 2024, Iowa State University.
Dai, F., Dutta, S., and Maitra, R. (2020). A Matrix-Free Likelihood Method for Exploratory Factor Analysis of High-Dimensional Gaussian Data. Journal of Computational and Graphical Statistics, 29(3), 675–680.
library(Matrix)
set.seed(20190329)
m <- rsparsematrix(50, 100, density = 0.35)
results <- pPCA(m, rank = 2)
biplot(results)
data <- list(rsparsematrix(nrow = 50, ncol = 10, density = 0.35),
             rsparsematrix(nrow = 50, ncol = 40, density = 0.35))
# Using a list of matrices
result <- pPCA(data, rank = 3)
print(result)
biplot(result)
Prints the output of pPCA.
## S3 method for class 'pPCA'
print(x, digits = 3, ...)
x |
An object of class |
digits |
The number of decimal places to use in printing results such as variance explained and PC scores. Defaults to 3. |
... |
Further arguments passed to |
None.