Principal Component Analysis Algorithm
Some datasets have so many features that further processing or inference is hampered. In these cases Principal Component Analysis (PCA), an Unsupervised Learning algorithm, can be used to reduce the number of features while retaining as much information as possible. This is a form of Feature Engineering.
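To make the idea of reducing features while keeping information concrete, here is a minimal sketch of PCA using NumPy's singular value decomposition. The function name `pca_project` and the toy data are assumptions made for illustration; this shows the general technique, not the code the managed algorithm runs.

```python
import numpy as np


def pca_project(X, num_components):
    """Project the rows of X onto the top `num_components` principal components."""
    X = np.asarray(X, dtype=float)
    # Centre each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:num_components]
    # Fraction of the total variance captured by each component.
    explained = (singular_values ** 2) / np.sum(singular_values ** 2)
    return X_centered @ components.T, explained[:num_components]


# Toy data (hypothetical): the third feature is the sum of the first two,
# so it carries no extra information.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + base[:, 1]])

projected, explained = pca_project(X, num_components=2)
print(projected.shape)  # (200, 2)
print(explained.sum())
```

Because the third column is redundant, two components capture essentially all of the variance, so the dataset shrinks from three features to two with almost no loss.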
PCA has two modes: Regular and Randomized.
Dataset characteristic | Regular mode | Randomized mode |
Data density | Sparse | Denser |
Observations | Moderate | Large |
Features | Moderate | Large |
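One common way to approximate PCA on large datasets is a randomized range finder followed by an exact decomposition of a much smaller matrix. The sketch below illustrates that general technique only; whether the Randomized mode works this way internally is an assumption, and the names `randomized_pca` and `oversample` are hypothetical.

```python
import numpy as np


def randomized_pca(X, num_components, oversample=10, seed=0):
    """Approximate the top principal components with a random range finder."""
    rng = np.random.default_rng(seed)
    X_centered = np.asarray(X, dtype=float)
    X_centered = X_centered - X_centered.mean(axis=0)
    n_features = X_centered.shape[1]
    k = min(num_components + oversample, n_features)
    # Random test matrix: samples the column space of the data.
    omega = rng.normal(size=(n_features, k))
    # Orthonormal basis for the sampled column space.
    Q, _ = np.linalg.qr(X_centered @ omega)
    # Exact SVD of the much smaller matrix B = Q^T X.
    B = Q.T @ X_centered
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:num_components], s[:num_components]


# Toy low-rank data (hypothetical): 500 observations, 50 features, rank 3.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 50))

components, approx_s = randomized_pca(X, num_components=3)
exact_s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)[:3]
print(components.shape)  # (3, 50)
print(np.allclose(approx_s, exact_s))
```

The exact SVD here runs on a matrix with only `k` rows rather than on all observations, which is what makes this kind of approximation attractive when both observations and features are numerous.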
Attributes
Problem attribute | Description |
Data types and format | Tabular |
Learning paradigm or domain | Unsupervised Learning |
Problem type | Feature engineering: dimensionality reduction |
Use case examples | Drop those columns from a dataset that have a weak relation with the label/target variable |
Training
Training data must be in CSV or recordIO-wrapped protobuf format.
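As a rough illustration of the CSV option, the sketch below serialises feature rows to CSV text with Python's standard library. It assumes a header-less file that contains only feature columns; that layout and the helper name `rows_to_csv` are assumptions here, so check the algorithm's documentation for the exact requirements.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise feature rows to header-less CSV text (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical feature values, one observation per row.
features = [[1.5, 16.0, 14.0], [0.2, 3.0, 7.5]]
csv_text = rows_to_csv(features)
print(csv_text)
```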
Model artifacts and inference
Attribute | Description |
Learning paradigm | Unsupervised Learning |
Request format | CSV, JSON, x-recordio-protobuf |
Response format | JSON, x-recordio-protobuf |
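For the JSON option, a request can be assembled and a response unpacked along the lines sketched below. The payload shapes used here (`instances`/`features` for the request, `projections`/`projection` for the response) are assumptions that should be checked against the service documentation, and the sample response values are made up.

```python
import json


def build_request(rows):
    # Assumed request shape: {"instances": [{"features": [...]}, ...]}
    return json.dumps({"instances": [{"features": list(row)} for row in rows]})


def parse_projections(body):
    # Assumed response shape: {"projections": [{"projection": [...]}, ...]}
    return [item["projection"] for item in json.loads(body)["projections"]]


request_body = build_request([[1.5, 16.0, 14.0]])

# Made-up response, used only to exercise the parser.
sample_response = '{"projections": [{"projection": [0.3, -1.2]}]}'
projections = parse_projections(sample_response)
print(projections)  # [[0.3, -1.2]]
```

Each returned projection is the reduced-dimension representation of the corresponding input row.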
Processing environment
Both CPU and GPU instances can be used.
Credits
Photo by Najib Kalil on Unsplash