2.3 Dimensionality Reduction

In dimensionality reduction, the goal is to transform an N-dimensional dataset (i.e., a dataset with N independent variables) into a dataset with fewer than N dimensions, usually by applying a single transformation to all variables simultaneously.

2.3.1 Principal component analysis (PCA)

The goal of PCA is to find the directions along which the data varies the most, i.e., has the largest variance. These “directions”, called principal components (PCs), are linear combinations of the original variables.

The first PC is the direction along which the data is most variable. The second PC is the direction, among all those orthogonal to the first, along which the data is most variable. Each subsequent PC is the direction, among all those orthogonal to every previous PC, along which the data is most variable.
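
The definition above can be illustrated with a minimal sketch that computes PCs from the singular value decomposition of the mean-centered data. This is one common way to compute PCA, not necessarily the method intended in these notes. The function name pca, the toy data, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Sketch of PCA via SVD. Rows of X are samples, columns are variables.

    Returns the principal directions (one per row), the variance of the
    data along each direction, and the data projected onto the directions.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # orthonormal directions, sorted by variance
    variances = (S ** 2) / (X.shape[0] - 1)      # sample variance along each direction
    scores = Xc @ components.T                   # coordinates in the reduced space
    return components, variances[:n_components], scores

# Hypothetical toy data: 3 variables, most variation lies along (1, 2, 0).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + 0.1 * rng.normal(size=200),
    0.05 * rng.normal(size=200),
])

components, variances, scores = pca(X, n_components=2)
```

Here components[0] plays the role of the first PC and components[1] the second. The rows are mutually orthogonal and ordered by decreasing variance, and scores is the 2-dimensional version of the original 3-dimensional dataset. Each direction is determined only up to sign.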

2.3.2 t-distributed Stochastic Neighbor Embedding (tSNE)

The goal of tSNE is to find a representation of the data in a lower-dimensional space that preserves, as well as possible, the distribution of local distances in the high-dimensional space.
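
To make "distribution of local distances" concrete, the sketch below turns pairwise distances into, for each point, a probability distribution over its neighbors using a Gaussian kernel. It is a simplification: tSNE chooses a separate bandwidth for each point from a perplexity setting, whereas this sketch assumes a single fixed bandwidth sigma. The function name neighbor_probabilities and the toy points are hypothetical.

```python
import numpy as np

def neighbor_probabilities(X, sigma=1.0):
    """Row i holds the probability that point i picks each other point as a neighbor.

    Assumption: one fixed Gaussian bandwidth `sigma` for all points,
    instead of the per-point, perplexity-based bandwidth used by tSNE.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)                      # guard against tiny negative round-off
    logits = -d2 / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)             # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability before exp
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)             # normalize each row to sum to 1
    return P

# Hypothetical toy data: two nearby points and one far-away point.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
P = neighbor_probabilities(points, sigma=1.0)
```

In this example, nearly all of the probability in row 0 falls on the nearby point 1 rather than the distant point 2. tSNE defines a similar distribution in the low-dimensional space, using a Student t kernel in place of the Gaussian, and moves the low-dimensional points to make the two distributions match as closely as possible.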