
Tag: PCA

A Step By Step Implementation of Principal Component Analysis

A step-by-step tutorial explaining how PCA works and how to implement it from scratch in Python.

Principal Component Analysis, or PCA, is a commonly used dimensionality reduction method. It works by computing the principal components and performing a change of basis, retaining the data in the directions of maximum variance. The reduced features are uncorrelated with each other and can be used for unsupervised clustering and classification. To reduce…
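The excerpt above describes the core idea; as a quick illustration (my own sketch, not the article's code), here is how the change of basis and the retained variance can be inspected with scikit-learn on the Iris dataset:

```python
# Minimal sketch of the idea above: project data onto its principal
# components and check how much variance each direction retains.
# The Iris dataset is used here only as a stand-in example.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                      # shape (150, 4)

pca = PCA(n_components=2)                 # keep the 2 strongest directions
X_reduced = pca.fit_transform(X)          # change of basis + projection

print(pca.explained_variance_ratio_)      # variance retained per component
# The new features are uncorrelated: their covariance matrix is (near) diagonal.
print(np.round(np.cov(X_reduced.T), 3))
```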

Eigenvalues and eigenvectors

What do they tell us about our data?


I first learned about eigenvalues and eigenvectors in a university linear algebra course. It was very dry and mathematical, so I never grasped what it was all about. Here I want to present the topic in a more intuitive way, using many animations to illustrate it.

First, we will look at how applying a matrix to a vector rotates and scales a vector. This will show us what eigenvalues and eigenvectors are. Then we will learn about principal components and that they are the eigenvectors of the covariance matrix. This knowledge will help us understand our final topic, principal component analysis.

To understand eigenvalues and eigenvectors, we have to first take a look at matrix multiplication.
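As a small aside (my own illustration, not taken from the article), the defining property can be checked directly with NumPy: applying a matrix to one of its eigenvectors only scales the vector by its eigenvalue, without rotating it.

```python
# Applying a matrix to one of its eigenvectors only scales the vector.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns are eigenvectors

v = eigenvectors[:, 0]                         # first eigenvector
lam = eigenvalues[0]                           # its eigenvalue

print(A @ v)          # matrix applied to the eigenvector ...
print(lam * v)        # ... equals the eigenvector scaled by its eigenvalue
```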

Read more...

Clustering types with various applications

Clustering types and their application areas, explained with a Python implementation

Ibrahim Kovan

Unlabeled datasets can be grouped according to their similar properties with unsupervised learning techniques. However, each algorithm looks at these similarities from a different point of view. Unsupervised learning provides detailed information about the dataset as well as labels for the data.
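As a rough sketch of that idea (my own example, not the article's code), two algorithms can group the same unlabeled points quite differently; here KMeans and DBSCAN are compared on synthetic two-moons data:

```python
# Two clustering algorithms group the same unlabeled data from
# different "points of view".
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# KMeans assumes roughly spherical groups around centroids.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN groups points by density and can follow the curved shapes.
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

print(set(kmeans_labels), set(dbscan_labels))
```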

Read more...

Comprehensive guide for Principal Component Analysis

The theoretical and practical parts of Principal Component Analysis, with a Python implementation

This article covers the definition of PCA, a Python implementation of its theoretical part without the Sklearn library, the difference between PCA and feature selection & feature extraction, its use in machine learning & deep learning, and the different types of PCA, with examples.
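For a flavour of the from-scratch part, here is a minimal PCA sketch of my own using only NumPy; the steps (center, covariance, eigendecomposition, projection) follow the standard theory rather than the article's exact code:

```python
# Minimal from-scratch PCA using only NumPy.
import numpy as np

def pca(X, n_components):
    X_centered = X - X.mean(axis=0)                  # 1. center the data
    cov = np.cov(X_centered, rowvar=False)           # 2. covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # 3. eigendecomposition
    order = np.argsort(eigenvalues)[::-1]            # 4. sort by variance
    components = eigenvectors[:, order[:n_components]]
    return X_centered @ components                   # 5. project (change of basis)

X = np.random.rand(100, 5)
print(pca(X, 2).shape)   # (100, 2)
```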

Read more...

Data Science Cheat Sheet 2.0

By Aaron Wang, Master of Business Analytics @ MIT | Data Science.

This Data Science cheat sheet covers over a semester of introductory machine learning and is based on MIT’s Machine Learning courses 6.867 and 15.072. You should have at least a basic understanding of statistics and linear algebra, although beginners may still find this resource helpful.

Inspired by Maverick’s Data Science Cheatsheet (hence the 2.0 in the name), located here.

Topics covered:

  • Linear and Logistic Regression
  • Decision Trees and Random Forest
  • SVM
  • K-Nearest Neighbors
  • Clustering
  • Boosting
  • Dimension Reduction (PCA, LDA, Factor Analysis)
  • Natural Language Processing
  • Neural Networks
  • Recommender Systems
  • Reinforcement Learning
  • Anomaly Detection
  • Time Series
  • A/B Testing

This cheat sheet will be occasionally updated with new and improved info, so consider following or starring the GitHub repo to stay up to date.

Read more...

PCA on HyperSpectral Data

Hyperspectral data expands the capability of image classification. It not only distinguishes different land cover types but also provides detailed characteristics of each land cover, such as minerals, soil, man-made structures (buildings, roads, etc.), and vegetation types.

One disadvantage of working with hyperspectral data is that there are too many bands to process. It is also a challenge to store such a large amount of data, and processing time grows along with it.

Thus, it becomes crucial either to reduce the amount of data or to select only the relevant bands, while keeping in mind that classification quality should not degrade as the number of bands is reduced.
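As a hedged sketch of that band-reduction step (the cube shape and the 99% variance threshold are placeholder choices of mine, not values from the article), PCA can collapse the spectral bands while keeping most of the variance:

```python
# Reducing hyperspectral bands with PCA on a hypothetical data cube.
import numpy as np
from sklearn.decomposition import PCA

# Placeholder hyperspectral cube: height x width x bands (random data;
# real imagery would typically need far fewer components).
cube = np.random.rand(100, 100, 200)
pixels = cube.reshape(-1, cube.shape[-1])     # one row per pixel, one column per band

pca = PCA(n_components=0.99)                  # keep enough components for 99% variance
reduced = pca.fit_transform(pixels)

print(cube.shape[-1], "bands ->", reduced.shape[1], "components")
```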

Read more...

Isomap Embedding — An Awesome Approach to Non-linear Dimensionality Reduction

Isomap is an Unsupervised Machine Learning technique aimed at Dimensionality Reduction.

It differs from a few other techniques in the same category by using a non-linear approach to dimensionality reduction instead of the linear mappings used by algorithms such as PCA. We will see how linear vs. non-linear approaches differ in the next section.

How does Isometric Mapping (Isomap) work?

Isomap is a technique that combines several different algorithms, enabling it to use a non-linear way to reduce dimensions while preserving local structures.

Before we look at the example of Isomap and compare it to a linear method of Principal Components Analysis (PCA), let’s list the high-level steps that Isomap performs:

  1. Use a KNN approach to find the k nearest neighbors of every data point.
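The excerpt stops at the first step, but as a rough illustration of the end result (my own example, not the article's code), scikit-learn's Isomap can be run next to PCA on the classic S-curve dataset:

```python
# Isomap vs. PCA on the S-curve dataset, where a non-linear method
# can preserve the curved structure that a linear projection flattens.
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

X, _ = make_s_curve(n_samples=1000, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)                      # linear projection
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)   # non-linear embedding

print(X_pca.shape, X_iso.shape)   # both reduce 3D points to 2D
```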
Read more...

Credit Card Fraud Detection Using Machine Learning & Python

As we move towards a digital world, cybersecurity is becoming a crucial part of our lives. When we talk about security in digital life, the main challenge is detecting abnormal activity.

When purchasing products online, many people prefer to pay by credit card. The credit limit sometimes lets us make purchases even if we don't have the money at that moment, but, on the other hand, these features are misused by cyber attackers.

To tackle this problem, we need a system that can abort a transaction if it finds it suspicious.

Hence the need for a system that can track the pattern of all transactions and abort any transaction whose pattern is abnormal.
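As a hedged sketch of that idea (the model and the made-up transaction features are my choices, not necessarily the article's), an anomaly detector can flag a transaction whose pattern looks abnormal:

```python
# Flag transactions whose pattern looks abnormal so they can be
# aborted or sent for review.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical features per transaction: amount and hour of day
normal = np.column_stack([rng.normal(50, 15, 1000), rng.normal(14, 3, 1000)])
odd = np.array([[5000.0, 3.0]])               # unusually large, late-night purchase

model = IsolationForest(contamination=0.01, random_state=0).fit(normal)
print(model.predict(odd))                     # -1 means "abnormal", 1 means "normal"
```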

Read more...

Mastering Clustering with a Segmentation Problem

By Indraneel Dutta Baruah, AI Driven Solutions Developer


In the current age, the availability of granular data for a large pool of customers/products and the technological capability to handle petabytes of data efficiently are growing rapidly. Because of this, it is now possible to come up with very strategic and meaningful clusters for effective targeting, and identifying the target segments requires a robust segmentation exercise. In this blog, we will discuss the most popular unsupervised clustering algorithms and how to implement them in Python.

In this blog, we will be working with clickstream data from an online store offering clothing for pregnant women.
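As a short, hedged sketch of a typical segmentation step (using placeholder blob data rather than the article's clickstream dataset), one common approach is to scale the features and pick the number of clusters with the silhouette score:

```python
# Scale the features, then pick the number of clusters by silhouette score.
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    print(k, round(silhouette_score(X_scaled, labels), 3))
```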

Read more...