PCA — 2D → 1D
Project a 2D point cloud onto k principal components (k∈{0,1,2}) and reconstruct. Adjust angle and anisotropy; watch retained variance and reconstruction error.
[Interactive demo: PCA — 2D → 1D projection & reconstruction. Controls: speed, PC1, PC2, projection (reconstruction) view; readouts: dims 2 → 1, retained variance, reconstruction MSE.]
Idea: PCA finds directions of maximum variance. Keeping the top k components compresses the data; decoding from k shows what was lost.
What PCA does (in one line)
- Rotates data onto new axes (principal components) so that PC1 captures the most variance, PC2 the next, and so on; keeping only the top k compresses or denoises the data (a minimal sketch follows).
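A minimal numpy sketch of that one-liner, using a toy correlated 2D cloud of our own (all names and data here are illustrative, not part of the demo): center, take the SVD, keep PC1, decode.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 2D cloud with correlated features, so PC1 picks up the shared direction.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])

Xc = X - X.mean(axis=0)                       # center
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                               # rows = principal directions (PC1, PC2)
explained_var = S**2 / (len(X) - 1)           # variance along each PC
ratio = explained_var / explained_var.sum()

Z = Xc @ components[:1].T                     # keep only PC1 (compress 2D -> 1D)
X_hat = Z @ components[:1] + X.mean(axis=0)   # decode (reconstruct)
print(ratio[0], ((X - X_hat) ** 2).mean())    # retained variance for k=1, recon MSE
```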
Where PCA is used
- Visualization: project high-D features to 2D/3D for plots.
- Preprocessing: whitening/orthogonalization before clustering or regression.
- Noise reduction / compression: keep top-k PCs; drop small-variance directions.
- Image processing: eigenfaces, background subtraction, low-rank denoising.
- Genomics & biology: reduce thousands of gene features to a few components.
- Finance: factor extraction from many correlated indicators.
- Recommenders / NLP: compress sparse embeddings or term–document matrices.
- Anomaly detection: model the “normal” subspace; a large reconstruction residual ⇒ anomaly (sketch after this list).
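As a concrete instance of the anomaly-detection bullet, a sketch with scikit-learn's `PCA` (the synthetic data, the 5-component choice, and the 99th-percentile threshold are our placeholders, not a recipe from the demo):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# "Normal" data lives near a 5-dimensional subspace of a 20-dimensional space.
W = rng.normal(size=(5, 20))
X_train = rng.normal(size=(1000, 5)) @ W + 0.1 * rng.normal(size=(1000, 20))
X_new = np.vstack([rng.normal(size=(5, 5)) @ W,     # normal-looking points
                   rng.normal(size=(5, 20)) * 3])   # points far off the subspace

pca = PCA(n_components=5).fit(X_train)              # model the normal subspace
X_hat = pca.inverse_transform(pca.transform(X_new))
residual = np.linalg.norm(X_new - X_hat, axis=1)    # distance to that subspace

# Flag anything whose residual exceeds, say, the 99th percentile seen in training.
train_hat = pca.inverse_transform(pca.transform(X_train))
train_res = np.linalg.norm(X_train - train_hat, axis=1)
print(residual > np.quantile(train_res, 0.99))      # True => flagged as anomaly
```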
Why it helps
- Removes redundant correlations between features.
- Often improves training speed and stability for simple models.
- Acts like a low-pass filter: keeps strong signal, drops small noisy parts.
How to use it
- Center the data (subtract mean). (Standardize to unit variance if scales differ.)
- Fit PCA on training data → pick k by explained variance (e.g., 95%).
- Transform train/val/test with the fitted PCA; don’t refit on val/test.
- Keep the mean + components so we can reconstruct or invert later (see the sketch after this list).
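A sketch of that recipe with scikit-learn (the dataset and split are placeholders; passing a float to `n_components` asks PCA to keep enough components for that fraction of variance):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 30)) @ rng.normal(size=(30, 30))   # placeholder data
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Standardize (scales differ), then fit PCA on the training split only.
pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95))
Z_train = pipe.fit_transform(X_train)    # fit + transform train
Z_test = pipe.transform(X_test)          # transform only: never refit on val/test

pca = pipe.named_steps["pca"]
print(pca.n_components_, pca.explained_variance_ratio_.sum())
# pca.mean_ and pca.components_ (plus the scaler) are what we keep for later;
# pipe.inverse_transform(Z_test) approximately reconstructs X_test.
```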
How to choose k
- Use the cumulative explained variance curve; look for the elbow.
- If we need a single number: start at 95% cumulative variance and adjust up or down for our task (sketch below).
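One way to read that curve in code (placeholder data; the 95% target is just the starting point suggested above):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 20)) @ rng.normal(size=(20, 20))  # placeholder data

# Fit once with all components, then inspect the cumulative explained variance.
cumvar = np.cumsum(PCA().fit(X_train).explained_variance_ratio_)

k = int(np.searchsorted(cumvar, 0.95)) + 1   # smallest k with cumulative variance >= 95%
print(k, cumvar[k - 1])

# Plotting cumvar against the component index (1..d) is the usual way to spot the elbow.
```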
Limitations / gotchas
- Linear only: can’t capture curved manifolds (consider kernel PCA or UMAP).
- Variance ≠ importance: low-variance directions can still matter for labels.
- Scale sensitive: without standardization, large-scale features dominate (illustrated after this list).
- Outliers: can tilt PCs; use robust scaling or outlier trimming first.
- Interpretability: PCs are linear mixes of features; naming them is hard.
- Data leakage risk: never fit PCA on the full dataset before splitting.
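To make the scale-sensitivity gotcha concrete, a small sketch with synthetic data of our own in which one feature is recorded in units 1000× larger than the other:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.5 * rng.normal(size=500)     # correlated with a
X = np.column_stack([a, b * 1000])     # second feature in much larger units

raw = PCA(n_components=1).fit(X)
scaled = PCA(n_components=1).fit(StandardScaler().fit_transform(X))

print(raw.components_[0])     # ~[0, 1]: the large-scale feature dominates PC1
print(scaled.components_[0])  # ~[0.71, 0.71]: both features contribute after standardizing
```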
When not to use PCA
- When our downstream model already learns good low-dimensional structure (e.g., modern deep nets with normalization) and we don't need PCA for visualization.
- When features are categorical or nonlinear interactions dominate.
Good companions / alternatives
- Kernel PCA / Isomap / LLE: nonlinear structure.
- t-SNE / UMAP: visualization of clusters in 2D (not for downstream features).
- ICA / NMF: additive or statistically independent components.
- Autoencoders: learn nonlinear low-dim representations (label-aware if supervised).
Checklist before using PCA
- Standardize features (if scales differ)
- Fit on train only; transform val/test
- Pick k via explained variance or cross-val on downstream metric
- Save the mean + components to reproduce results (sketch below)
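A sketch of that last checklist item: persisting the fitted mean and components (the `np.savez` choice and file name are ours; if we also standardized, the scaler parameters need to be saved too) so the exact transform can be reproduced by hand later:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))  # placeholder data

pca = PCA(n_components=3).fit(X_train)
np.savez("pca_params.npz", mean=pca.mean_, components=pca.components_)

# Later (or in another process): reproduce transform / reconstruction by hand.
params = np.load("pca_params.npz")
Z = (X_train - params["mean"]) @ params["components"].T   # transform
X_hat = Z @ params["components"] + params["mean"]         # inverse transform
assert np.allclose(Z, pca.transform(X_train))
print(((X_train - X_hat) ** 2).mean())                    # reconstruction MSE
```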