An engineer by profession, a bibliophile at heart!

Implementing Machine Learning Algorithms to classify emails

Email Classification is a Machine Learning problem that falls under the category of Supervised Learning.

Photo by Sara Kurfeß on Unsplash

This mini-project of Email Classification is inspired by J.K. Rowling’s publishing of a book under a pen-name. Udacity’s “Introduction to Machine Learning” provides a comprehensive study of the algorithms and the project.

A couple of years ago, Rowling wrote a book, “The Cuckoo’s Calling,” under the name Robert Galbraith. The book received some good reviews, but no one paid much attention to it — until an anonymous tipster on Twitter said it was J.K. Rowling. The London Sunday Times enlisted two experts to compare the linguistic patterns of “Cuckoo” to Rowling’s “The Casual Vacancy,” as well as to books by several other authors. After the results of their analysis pointed strongly toward Rowling as the author, the Times directly asked the publisher if they were the same person, and the publisher confirmed. …


Using the K-Means Algorithm for Vector Quantization of a Raccoon Grayscale Image

Photo by Gary Bendig on Unsplash

Vector Quantization

Vector Quantization is a lossy data compression technique. It allows the modeling of the probability density function by the distribution of the prototype vectors. There is some modification of data that renders the compression lossy.

Vector Quantization works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms.

(Definition from Wikipedia)

K-Means Algorithm

K-Means is a clustering algorithm that groups data points into a number of clusters you choose in advance. In K-Means clustering, K is the number of clusters! …
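To make the idea concrete, here is a minimal sketch of vector quantization with K-Means. It assumes scikit-learn's KMeans and uses a small random array as a stand-in for a grayscale image (the raccoon image itself is not loaded here); the names n_colors, codebook and quantized are illustrative choices, not taken from the article.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for a grayscale image: 32x32 random intensities in [0, 255].
# (Assumption: a real image would be loaded here instead.)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32)).astype(np.float64)

n_colors = 4  # K: the number of gray levels (clusters) to keep

# Each pixel is a one-dimensional vector, so reshape to (n_pixels, 1).
pixels = image.reshape(-1, 1)
kmeans = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)

# The centroids form the codebook; every pixel is replaced by its centroid.
codebook = kmeans.cluster_centers_.ravel()
quantized = codebook[kmeans.labels_].reshape(image.shape)

print("gray levels before:", len(np.unique(image)))
print("gray levels after: ", len(np.unique(quantized)))
```

The quantized image can be stored as K centroid values plus one small label per pixel, which is where the (lossy) compression comes from.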


Using Python to implement the various SVC kernels on the Iris Dataset

Photo by Fanny Côté on Unsplash

In this article, we will go through the SVC algorithm in the Sklearn library and experiment with the different kernels on the Iris Dataset.

Support Vector Classifier

Support Vector Classifier (SVC) is a supervised machine learning model used for two-group classification problems. Given a set of labeled training data for each category, an SVC model is able to categorize new test data.
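A minimal sketch of comparing kernels on the Iris dataset might look like the following. The train/test split, the random seed and the list of kernels are assumptions made for illustration. Iris has three classes; scikit-learn's SVC handles this internally by combining two-class classifiers in a one-vs-one scheme.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the four Iris features and the three class labels.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit one SVC per kernel and record its accuracy on the held-out data.
scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)

for kernel, accuracy in scores.items():
    print(f"{kernel:>8}: {accuracy:.3f}")
```

The scores depend on the split and on the default hyperparameters, so they are a starting point for experimentation rather than a ranking of the kernels.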


Implementing the DBSCAN Algorithm to find Core Samples

(Image from Pixabay)

DBSCAN, short for Density-Based Spatial Clustering of Applications with Noise, is a density-based clustering algorithm. Clusters are formed based on the density parameters.

Density, in terms of DBSCAN, means the number of points that are located in a given area. The closer the points are to each other, the greater the density will be.

The DBSCAN algorithm takes 2 parameters: ε (epsilon), which is the radius of the neighborhood around each point, and minPts, the minimum number of data points that must fall within that radius for a point to count as a core point.

In the diagram below, which is taken from Wikipedia, the minimum number of points is set to 4 (minPts = 4).

Point A and all the other red points are called core points because each encloses at least 4 points (including the point itself) within its circle. Points B and C are boundary points: they are not core points because their circles do not enclose the minimum of 4 points. …
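The distinction between core, boundary and noise points can be sketched with scikit-learn's DBSCAN. The six points below and the values eps=0.75 and min_samples=4 are made up for illustration; in scikit-learn, min_samples counts the point itself.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: a tight square of four points, one point on its
# edge, and one point far away from everything else.
X = np.array([
    [0.0, 0.0],
    [0.0, 0.5],
    [0.5, 0.0],
    [0.5, 0.5],
    [0.9, 0.5],   # close to the square, but with too few neighbours
    [5.0, 5.0],   # isolated
])

db = DBSCAN(eps=0.75, min_samples=4).fit(X)

core_indices = db.core_sample_indices_   # indices of the core points
labels = db.labels_                      # cluster id per point, -1 = noise

for i, label in enumerate(labels):
    if i in core_indices:
        kind = "core"
    elif label != -1:
        kind = "boundary"
    else:
        kind = "noise"
    print(i, X[i], kind)
```

Here the four square corners come out as core points, the fifth point is a boundary point (it belongs to the cluster only because it lies within ε of a core point), and the last point is labeled as noise.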


Implementing Machine Learning Classification Algorithms to Recognize Handwritten Digits

(Image from Pixabay)

Handwritten Digit Recognition is an interesting machine learning problem in which we have to identify handwritten digits using classification algorithms. A number of algorithms can recognize handwritten digits, including Deep Learning/CNN, SVM, Gaussian Naive Bayes, KNN, Decision Trees, Random Forests, etc.

In this article, we will apply a variety of machine learning algorithms from the Sklearn library to our dataset to classify the digits into their categories.

Let us first look at the dataset:

Downloading the Dataset

We will use Sklearn’s load_digits dataset, which is a collection of 8x8 images (64 features) of digits. …
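As a quick sketch, the dataset can be loaded and inspected as below. The train/test split and the choice of Gaussian Naive Bayes as a first classifier are assumptions for illustration; any of the algorithms listed above could be swapped in.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

digits = load_digits()
print(digits.images.shape)  # (1797, 8, 8): the 8x8 images
print(digits.data.shape)    # (1797, 64): the same images flattened to 64 features

# Hold out a quarter of the samples to evaluate the classifier.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

clf = GaussianNB().fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Gaussian Naive Bayes accuracy: {accuracy:.3f}")
```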


A step by step guide to using PCA’s Eigenfaces & SVM for Facial Recognition

Photo by Sam Burriss on Unsplash

In this article, we will learn to use Principal Component Analysis and Support Vector Machines for building a facial recognition model.

First, let us understand what PCA and SVM are:

Principal Component Analysis:

Principal Component Analysis (PCA) is a machine learning algorithm that is widely used in exploratory data analysis and for making predictive models. It is commonly used for dimensionality reduction by projecting each data point onto only the first few principal components to obtain lower-dimensional data while preserving as much of the data’s variation as possible.
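The projection onto the first few principal components can be sketched with scikit-learn's PCA. The data below is synthetic (random 10-dimensional points whose variance is concentrated in the first two dimensions) and stands in for face images, which are not loaded here.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in data: 200 samples, 10 features, with most of the
# variance in the first two features (an assumption for illustration).
rng = np.random.default_rng(0)
scales = np.array([10.0, 5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
X = rng.normal(size=(200, 10)) * scales

# Keep only the first three principal components.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print("original shape:", X.shape)
print("reduced shape: ", X_reduced.shape)
print("variance kept per component:", pca.explained_variance_ratio_.round(3))
```

The explained_variance_ratio_ attribute reports how much of the data's variation each retained component preserves, which helps when choosing how many components to keep.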


Using Python to dive into the biggest corporate fraud in American History and derive insights

Photo by Tierra Mallorca on Unsplash

The Enron fraud is a big, messy and totally fascinating story about corporate malfeasance of nearly every imaginable type.

In this article, we will use Python to analyze the dataset, uncover patterns and clues through data exploration, and build a regression model that could predict the bonus of a person at Enron based on the salary they receive.
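The regression step could be sketched roughly as follows, using scikit-learn's LinearRegression. The numbers here are synthetic and entirely made up (they are not Enron figures); they only show the shape of the code for fitting bonus against salary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data, NOT real Enron values: salaries drawn at random
# and bonuses generated as roughly twice the salary plus noise.
rng = np.random.default_rng(0)
salary = rng.uniform(50_000, 500_000, size=50)
bonus = 2.0 * salary + rng.normal(0, 20_000, size=50)

# scikit-learn expects a 2D feature matrix, hence the reshape.
model = LinearRegression().fit(salary.reshape(-1, 1), bonus)

slope = model.coef_[0]
intercept = model.intercept_
predicted = model.predict([[200_000]])[0]

print(f"slope: {slope:.2f}, intercept: {intercept:.0f}")
print(f"predicted bonus for a 200,000 salary: {predicted:.0f}")
```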

But first, we need to know a bit about the biggest corporate fraud in American history!

The Enron Case

Enron Corporation was an American energy, commodities, and services company based in Houston, Texas. …


The 5 Technical Indicators to Predict the Market

Image by Gerd Altmann from Pixabay

In this article, we will go through the 5 basic (yet powerful) technical indicators to understand the bullish and bearish market trends.

We will start by understanding stock market prediction and then dive into a few indicators in order to understand bullish and bearish trends!

Stock market prediction is the act of trying to determine the future value of company stock or other financial instruments traded on an exchange. The successful prediction of a stock’s future price could yield a significant profit. …


A step-by-step guide to build a Python-based Movie Recommender System using Cosine Similarity

Image by Jade87 from Pixabay

Have you ever imagined that a simple formula you studied in high school would play a part in recommending a movie to you on the basis of one you already like?

Well, here we are, using the Cosine Similarity (the dot product for normalized vectors) to build a Movie Recommender System!
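As a minimal sketch, cosine similarity is the dot product of two vectors divided by the product of their lengths. The movie names and genre vectors below are hypothetical, chosen only to show how the most similar movie would be picked.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors: [action, comedy, romance] flags per movie.
movies = {
    "Movie A": [1, 1, 0],
    "Movie B": [1, 1, 1],
    "Movie C": [0, 0, 1],
}

liked = "Movie A"
ranked = sorted(
    ((title, cosine_similarity(movies[liked], vector))
     for title, vector in movies.items() if title != liked),
    key=lambda pair: pair[1],
    reverse=True,
)
for title, score in ranked:
    print(f"{title}: {score:.3f}")
```

A real recommender would build these vectors from movie metadata rather than hand-written flags, but the ranking step stays the same.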

What are Recommender Systems?

Recommender systems are an important class of machine learning algorithms that offer “relevant” suggestions to users. YouTube, Amazon, and Netflix all rely on recommendation systems that recommend the next video or product based on your past activity (Content-based Filtering) or on the activities and preferences of other users similar to you (Collaborative Filtering). …


“Becoming” is an autobiography of Michelle Obama, taking us through her life from childhood to the day she sits at home while the family is adjusting to normal life — life after a two-term presidency of the United States.

Photo by Alex Nemo Hanse on Unsplash

As the First Lady to the first African American US President, Michelle narrates her personal experience of life, dividing it into 3 phases.

Becoming Me

“Becoming Me” covers Michelle’s journey from her upbringing as a child to her work as a professional lawyer at Sidley Austin, Chicago. Michelle narrates her childhood stories of being raised in a humble environment on the South Side of Chicago, Illinois. She learned the piano at her great-aunt Robbie’s place, studied at Bryn Mawr Elementary School, saw her father going through multiple sclerosis, and lived at a time when racial discrimination was highly prevalent in America. Amidst the many doubters she had, including a career counselor at her school who judged her not to be ‘Princeton material’, she worked her way to Princeton, majoring in Sociology and minoring in African American studies, and later pursued professional study at Harvard Law School. Time and again she was told by her teachers not to ‘set her sights too high’, and time and again she proved them wrong by working hard and succeeding. She credits her achievements to her parents, who invested a lot in her education and upbringing by treating her and her brother as adults. …
