PCA — 2D → 1D

Math for AI
Dimensionality Reduction
Project a 2D point cloud onto k principal components (k∈{0,1,2}) and reconstruct. Adjust angle and anisotropy; watch retained variance and reconstruction error.
Published August 9, 2025


Interactive demo: PCA — 2D → 1D projection & reconstruction. Idea: PCA finds directions of maximum variance; keeping the top k components compresses the data, and decoding from k shows what was lost. The widget steps through the projection and reports the dimensions (2 → k), the retained variance, and the reconstruction MSE.

What PCA does (in one line)

  • Rotates data to new axes (principal components) so that PC1 captures the most variance, PC2 the next, and so on; we can then keep only the top k to compress or denoise.
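
A minimal NumPy sketch of the same 2D → 1D pipeline as the demo above; the mixing matrix, seed, and k are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2D point cloud, like the one in the demo (mixing matrix is arbitrary)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])

# 1) Center the data
mu = X.mean(axis=0)
Xc = X - mu

# 2) Principal components = eigenvectors of the covariance matrix,
#    sorted by decreasing eigenvalue (variance along that direction)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3) Keep the top k components: project (encode), then reconstruct (decode)
k = 1
W = eigvecs[:, :k]                               # 2 x k
Z = Xc @ W                                       # compressed codes, n x k
X_hat = Z @ W.T + mu                             # back in the original 2D space

retained = eigvals[:k].sum() / eigvals.sum()     # retained variance ratio
mse = np.mean((X - X_hat) ** 2)                  # reconstruction error
print(f"retained variance ≈ {retained:.3f}, recon MSE ≈ {mse:.4f}")
```

These are exactly the two numbers the demo reports: how much variance the kept components explain, and how far the reconstruction is from the original points.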

Where PCA is used

  • Visualization: project high-D features to 2D/3D for plots.
  • Preprocessing: whitening/orthogonalization before clustering or regression.
  • Noise reduction / compression: keep top-k PCs; drop small-variance directions.
  • Image processing: eigenfaces, background subtraction, low-rank denoising.
  • Genomics & biology: reduce thousands of gene features to a few components.
  • Finance: factor extraction from many correlated indicators.
  • Recommenders / NLP: compress sparse embeddings or term–document matrices.
  • Anomaly detection: model “normal” subspace; large residual ⇒ anomaly.
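
To make the last bullet concrete, here is a small reconstruction-error anomaly score with scikit-learn; the synthetic data, the subspace dimension, and the 99th-percentile threshold are illustrative assumptions, not a fixed recipe:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# "Normal" data: 20-D points that mostly live in a 3-D subspace, plus small noise
basis = rng.normal(size=(3, 20))
X_train = rng.normal(size=(1000, 3)) @ basis + 0.05 * rng.normal(size=(1000, 20))

# Model the "normal" subspace with the top-k principal components
pca = PCA(n_components=3).fit(X_train)

def residual_score(X):
    """Squared reconstruction error per sample = distance from the normal subspace."""
    X_hat = pca.inverse_transform(pca.transform(X))
    return ((X - X_hat) ** 2).sum(axis=1)

# Threshold from the training distribution (99th percentile is an arbitrary choice)
threshold = np.quantile(residual_score(X_train), 0.99)

x_new = rng.normal(size=(1, 20)) * 3.0            # a point far from the subspace
print("anomaly:", bool(residual_score(x_new)[0] > threshold))
```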

Why it helps

  • Removes redundant correlations between features.
  • Often improves training speed and stability for simple models.
  • Acts like a low-pass filter: keeps strong signal, drops small noisy parts.
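
A quick check of the first point: on two strongly correlated (made-up) features, the PCA coordinates come out uncorrelated:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Two strongly correlated (made-up) features
x1 = rng.normal(size=2000)
X = np.column_stack([x1, 0.9 * x1 + 0.1 * rng.normal(size=2000)])
print(np.corrcoef(X, rowvar=False).round(2))      # off-diagonal ≈ 0.99

# In PCA coordinates the covariance is diagonal: the redundancy is gone
Z = PCA(n_components=2).fit_transform(X)
print(np.cov(Z, rowvar=False).round(4))           # off-diagonal ≈ 0
```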

How to use it

  • Center the data (subtract mean). (Standardize to unit variance if scales differ.)
  • Fit PCA on training data → pick k by explained variance (e.g., 95%).
  • Transform train/val/test with the fitted PCA; don’t refit on val/test.
  • Keep the mean + components so we can reconstruct or invert later.
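
A minimal scikit-learn sketch of these steps; the feature matrix here is a random placeholder, and the 95% variance target is just the rule of thumb above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30)) @ rng.normal(size=(30, 30))   # placeholder feature matrix
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Standardize (scales may differ), then keep enough PCs for ~95% explained variance
pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95))
Z_train = pipe.fit_transform(X_train)       # fit on train only
Z_test = pipe.transform(X_test)             # reuse the fitted transform; don't refit

pca = pipe.named_steps["pca"]
print("k =", pca.n_components_, "explained =", round(pca.explained_variance_ratio_.sum(), 3))

# The fitted pipeline keeps the mean, scale, and components, so we can reconstruct
X_test_recon = pipe.inverse_transform(Z_test)
```

Bundling the scaler and PCA in one pipeline makes it hard to accidentally refit on validation or test data.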

How to choose k

  • Use the cumulative explained variance curve; look for the elbow.
  • If we need a number: start with 95% variance; adjust up/down for our task.
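
One way to read that curve off in code (placeholder data; the 95% target is just the default suggestion):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 30)) @ rng.normal(size=(30, 30))   # placeholder, correlated features

pca = PCA().fit(X_train)                          # fit all components to inspect the curve
cum = np.cumsum(pca.explained_variance_ratio_)    # cumulative explained variance
k_95 = int(np.searchsorted(cum, 0.95)) + 1        # smallest k reaching 95%
print("k for 95% variance:", k_95, "of", X_train.shape[1])
```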

Limitations / gotchas

  • Linear only: can’t capture curved manifolds (consider kernel PCA or UMAP).
  • Variance ≠ importance: low-variance directions can still matter for labels.
  • Scale sensitive: without standardization, large-scale features dominate.
  • Outliers: can tilt PCs; use robust scaling or outlier trimming first.
  • Interpretability: PCs are linear mixes of features; naming them is hard.
  • Data leakage risk: never fit PCA on the full dataset before splitting.
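
A tiny illustration of the scale-sensitivity gotcha: two features carrying the same latent signal on very different (hypothetical) scales:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Same latent signal measured on wildly different scales
t = rng.normal(size=300)
X = np.column_stack([1000.0 * t + rng.normal(scale=50.0, size=300),    # large-scale feature
                     0.01 * t + rng.normal(scale=0.001, size=300)])    # small-scale feature

# Without standardization, PC1 is essentially just the large-scale feature
print(PCA(n_components=1).fit(X).components_.round(3))    # ≈ [[1. 0.]]

# After standardization, both features contribute to PC1
Xs = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(Xs).components_.round(3))   # ≈ [[0.707 0.707]] (up to sign)
```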

When not to use PCA

  • When our downstream model already learns good low-dimensional structure (e.g., modern deep nets with normalization) and we don't need a 2D/3D view for visualization.
  • When features are categorical or nonlinear interactions dominate.

Good companions / alternatives

  • Kernel PCA / Isomap / LLE: nonlinear structure.
  • t-SNE / UMAP: visualization of clusters in 2D (not for downstream features).
  • ICA / NMF: additive or statistically independent components.
  • Autoencoders: learn nonlinear low-dim representations (label-aware if supervised).

Checklist before using PCA

  • Standardize features (if scales differ)
  • Fit on train only; transform val/test
  • Pick k via explained variance or cross-validation on a downstream metric
  • Save mean + components to reproduce results
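
For the last item, a small sketch of persisting the fitted transform with joblib (the filename and training data are placeholders):

```python
import joblib
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_train = np.random.default_rng(0).normal(size=(200, 10))    # placeholder training data

pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95)).fit(X_train)

# Persist the fitted transform (it carries the mean, scale, and components)
joblib.dump(pipe, "pca_pipeline.joblib")          # filename is a placeholder

# Later / elsewhere: reload and apply the exact same transform, no refitting
pipe_loaded = joblib.load("pca_pipeline.joblib")
Z = pipe_loaded.transform(X_train)
```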
