PCA — 2D → 1D
Project a 2D point cloud onto k principal components (k∈{0,1,2}) and reconstruct. Adjust angle and anisotropy; watch retained variance and reconstruction error.
[Interactive demo: PCA — 2D → 1D projection & reconstruction. Controls: speed, PC1, PC2, projection (reconstruction) view; readouts: dims 2 → 1, retained variance, reconstruction MSE.]
Idea: PCA finds directions of maximum variance. Keeping the top k components compresses the data; decoding from k shows what was lost.
What PCA does (in one line)
- Rotates data onto new axes (principal components) so that PC1 captures the most variance, PC2 the next, and so on; keeping only the top k compresses or denoises the data (a minimal sketch follows).
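A minimal numpy sketch of that one-liner, using a toy correlated 2D cloud of our own (all names and data here are illustrative, not part of the demo): center, take the SVD, keep PC1, decode.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 2D cloud with correlated features, so PC1 picks up the shared direction.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])

Xc = X - X.mean(axis=0)                       # center
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                               # rows = principal directions (PC1, PC2)
explained_var = S**2 / (len(X) - 1)           # variance along each PC
ratio = explained_var / explained_var.sum()

Z = Xc @ components[:1].T                     # keep only PC1 (compress 2D -> 1D)
X_hat = Z @ components[:1] + X.mean(axis=0)   # decode (reconstruct)
print(ratio[0], ((X - X_hat) ** 2).mean())    # retained variance for k=1, recon MSE
```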
Where PCA is used
- Visualization: project high-D features to 2D/3D for plots.
- Preprocessing: whitening/orthogonalization before clustering or regression.
- Noise reduction / compression: keep top-k PCs; drop small-variance directions.
- Image processing: eigenfaces, background subtraction, low-rank denoising.
- Genomics & biology: reduce thousands of gene features to a few components.
- Finance: factor extraction from many correlated indicators.
- Recommenders / NLP: compress sparse embeddings or term–document matrices.
- Anomaly detection: model the “normal” subspace; a large reconstruction residual ⇒ anomaly (sketch after this list).
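As a concrete instance of the anomaly-detection bullet, a sketch with scikit-learn's `PCA` (the synthetic data, the 5-component choice, and the 99th-percentile threshold are our placeholders, not a recipe from the demo):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# "Normal" data lives near a 5-dimensional subspace of a 20-dimensional space.
W = rng.normal(size=(5, 20))
X_train = rng.normal(size=(1000, 5)) @ W + 0.1 * rng.normal(size=(1000, 20))
X_new = np.vstack([rng.normal(size=(5, 5)) @ W,     # normal-looking points
                   rng.normal(size=(5, 20)) * 3])   # points far off the subspace

pca = PCA(n_components=5).fit(X_train)              # model the normal subspace
X_hat = pca.inverse_transform(pca.transform(X_new))
residual = np.linalg.norm(X_new - X_hat, axis=1)    # distance to that subspace

# Flag anything whose residual exceeds, say, the 99th percentile seen in training.
train_hat = pca.inverse_transform(pca.transform(X_train))
train_res = np.linalg.norm(X_train - train_hat, axis=1)
print(residual > np.quantile(train_res, 0.99))      # True => flagged as anomaly
```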
Why it helps
- Removes redundant correlations between features.
- Often improves training speed and stability for simple models.
- Acts like a low-pass filter: keeps strong signal, drops small noisy parts.
How to use it
- Center the data (subtract mean). (Standardize to unit variance if scales differ.)
- Fit PCA on training data → pick k by explained variance (e.g., 95%).
- Transform train/val/test with the fitted PCA; don’t refit on val/test.
- Keep the mean + components so we can reconstruct or invert later (see the sketch after this list).
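A sketch of that recipe with scikit-learn (the dataset and split are placeholders; passing a float to `n_components` asks PCA to keep enough components for that fraction of variance):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 30)) @ rng.normal(size=(30, 30))   # placeholder data
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Standardize (scales differ), then fit PCA on the training split only.
pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95))
Z_train = pipe.fit_transform(X_train)    # fit + transform train
Z_test = pipe.transform(X_test)          # transform only: never refit on val/test

pca = pipe.named_steps["pca"]
print(pca.n_components_, pca.explained_variance_ratio_.sum())
# pca.mean_ and pca.components_ (plus the scaler) are what we keep for later;
# pipe.inverse_transform(Z_test) approximately reconstructs X_test.
```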
How to choose k
- Use the cumulative explained variance curve; look for the elbow.
- If we need a single number: start at 95% cumulative variance and adjust up or down for our task (sketch below).
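One way to read that curve in code (placeholder data; the 95% target is just the starting point suggested above):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 20)) @ rng.normal(size=(20, 20))  # placeholder data

# Fit once with all components, then inspect the cumulative explained variance.
cumvar = np.cumsum(PCA().fit(X_train).explained_variance_ratio_)

k = int(np.searchsorted(cumvar, 0.95)) + 1   # smallest k with cumulative variance >= 95%
print(k, cumvar[k - 1])

# Plotting cumvar against the component index (1..d) is the usual way to spot the elbow.
```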
Limitations / gotchas
- Linear only: can’t capture curved manifolds (consider kernel PCA or UMAP).
- Variance ≠ importance: low-variance directions can still matter for labels.
- Scale sensitive: without standardization, large-scale features dominate (illustrated after this list).
- Outliers: can tilt PCs; use robust scaling or outlier trimming first.
- Interpretability: PCs are linear mixes of features; naming them is hard.
- Data leakage risk: never fit PCA on the full dataset before splitting.
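To make the scale-sensitivity gotcha concrete, a small sketch with synthetic data of our own in which one feature is recorded in units 1000× larger than the other:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.5 * rng.normal(size=500)     # correlated with a
X = np.column_stack([a, b * 1000])     # second feature in much larger units

raw = PCA(n_components=1).fit(X)
scaled = PCA(n_components=1).fit(StandardScaler().fit_transform(X))

print(raw.components_[0])     # ~[0, 1]: the large-scale feature dominates PC1
print(scaled.components_[0])  # ~[0.71, 0.71]: both features contribute after standardizing
```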
When not to use PCA
- When our downstream model already learns good low-dimensional structure (e.g., modern deep nets with normalization) and we don't need PCA for visualization.
- When features are categorical or nonlinear interactions dominate.
Good companions / alternatives
- Kernel PCA / Isomap / LLE: nonlinear structure.
- t-SNE / UMAP: visualization of clusters in 2D (not for downstream features).
- ICA / NMF: additive or statistically independent components.
- Autoencoders: learn nonlinear low-dim representations (label-aware if supervised).
Checklist before using PCA
- Standardize features (if scales differ)
- Fit on train only; transform val/test
- Pick k via explained variance or cross-val on downstream metric
- Save the mean + components to reproduce results (sketch below)
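A sketch of that last checklist item: persisting the fitted mean and components (the `np.savez` choice and file name are ours; if we also standardized, the scaler parameters need to be saved too) so the exact transform can be reproduced by hand later:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))  # placeholder data

pca = PCA(n_components=3).fit(X_train)
np.savez("pca_params.npz", mean=pca.mean_, components=pca.components_)

# Later (or in another process): reproduce transform / reconstruction by hand.
params = np.load("pca_params.npz")
Z = (X_train - params["mean"]) @ params["components"].T   # transform
X_hat = Z @ params["components"] + params["mean"]         # inverse transform
assert np.allclose(Z, pca.transform(X_train))
print(((X_train - X_hat) ** 2).mean())                    # reconstruction MSE
```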