Email Classification is a Machine Learning problem that falls under the category of Supervised Learning.
This Email Classification mini-project is inspired by J.K. Rowling publishing a book under a pen name. Udacity’s “Introduction to Machine Learning” course covers the algorithms and the project in detail.
A couple of years ago, Rowling wrote a book, “The Cuckoo’s Calling,” under the name Robert Galbraith. The book received some good reviews, but no one paid much attention to it — until an anonymous tipster on Twitter said it was J.K. Rowling. The London Sunday Times enlisted two experts to compare the linguistic patterns of “Cuckoo” to Rowling’s “The Casual Vacancy,” as well as to books by several other authors. After the results of their analysis pointed strongly toward Rowling as the author, the Times directly asked the publisher if they were the same person, and the publisher confirmed. …
Vector Quantization is a lossy data compression technique. It allows the modeling of the probability density function by the distribution of the prototype vectors. There is some modification of data that renders the compression lossy.
Vector Quantization works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms.
(Definition from Wikipedia)
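To make the idea concrete, here is a minimal sketch of the encode/decode step of vector quantization. It assumes a codebook of prototype vectors is already available; the toy `points`, the `codebook`, and the helper names `quantize` and `reconstruct` are made up for illustration and do not come from any library.

```python
import math

def nearest_index(point, codebook):
    """Return the index of the codebook vector closest to `point` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(point, codebook[i]))

def quantize(points, codebook):
    """Encode each point as the index of its nearest prototype vector."""
    return [nearest_index(p, codebook) for p in points]

def reconstruct(codes, codebook):
    """Decode indices back into prototype vectors; the original points are not recovered."""
    return [codebook[c] for c in codes]

# Hypothetical toy data: two tight groups of 2-D points and one prototype per group.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
codebook = [(0.1, 0.05), (4.95, 5.05)]

codes = quantize(points, codebook)
decoded = reconstruct(codes, codebook)

# Mean squared reconstruction error: non-zero, which is what makes the compression lossy.
mse = sum(math.dist(p, d) ** 2 for p, d in zip(points, decoded)) / len(points)

print("codes:", codes)
print("decoded:", decoded)
print("mean squared error:", mse)
```

Only the indices in `codes` (plus the small codebook) need to be stored, which is where the compression comes from; the non-zero error shows what is lost.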
K-Means is a clustering algorithm that groups data points into a number of clusters that you choose in advance. In K-Means clustering, K is that number of clusters! …
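A minimal sketch of how K-Means alternates between assigning points to the nearest centroid and recomputing the centroids is shown here. It is a simplified illustration, not Sklearn's implementation: the `kmeans` helper, the toy `points`, and the choice of using the first K points as starting centroids are all assumptions made to keep the example short and deterministic.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain K-Means (Lloyd's algorithm) on a list of coordinate tuples."""
    # Assumption for determinism: start from the first k points instead of random picks.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical toy data: three points near (1, 1) and three near (8, 8).
points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
centroids, labels = kmeans(points, k=2)

print("labels:", labels)
print("centroids:", centroids)
```

In practice the starting centroids are usually chosen randomly or with a smarter seeding scheme, and the loop stops once the assignments no longer change.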
In this article, we will go through the SVC algorithm in the Sklearn library and experiment with the different kernels on the Iris Dataset.
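Support Vector Classifier (SVC) is a supervised machine learning model that is, at its core, used for two-group classification problems; Sklearn’s SVC also handles more than two classes, such as the three species in the Iris Dataset, by combining several two-group classifiers. After being given a set of labeled training data for each category, an SVC model is able to categorize new test data.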
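As a rough sketch of what such an experiment could look like, the snippet below trains Sklearn's `SVC` on the Iris Dataset once per kernel and records the test accuracy. The 70/30 split, the `random_state`, and the default hyperparameters are assumptions chosen for illustration, so the exact scores will differ from any tuned setup.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the Iris Dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Assumed split: 70% training, 30% testing, stratified so each class is represented.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = SVC(kernel=kernel)  # default hyperparameters, no tuning
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)  # mean accuracy on the test split

for kernel, accuracy in scores.items():
    print(f"{kernel}: {accuracy:.3f}")
```

Comparing the entries of `scores` gives a first impression of which kernel suits the data; a fair comparison would also tune parameters such as `C` and `gamma` for each kernel.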
DBSCAN, short for Density-Based Spatial Clustering of Applications with Noise, is a density-based clustering algorithm. Clusters are formed based on the density parameters.
Density, in terms of DBSCAN, means the number of points that are located in a given area. The closer the points are to each other, the greater the density will be.
The DBSCAN algorithm takes two parameters: ε (epsilon), the radius of the circle drawn around each point, and minPts, the minimum number of data points that must fall inside that circle for the point to count as a core point.
In the diagram below, which is taken from Wikipedia, the minimum number of points has been set to 4 (minPts = 4).
Point A and all the other red points are called core points because their circles enclose at least 4 points. Points B and C are boundary points: they are not core points because their circles do not enclose the minimum of 4 points. …
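The distinction between core, boundary, and noise points can be sketched in a few lines of plain Python. This is only the point-labeling step, not the full clustering, and it assumes that a point counts itself among the points inside its circle; the `classify_points` helper and the toy coordinates are hypothetical.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'boundary', or 'noise' for the given eps and min_pts."""
    # Neighbors of a point: every point within distance eps, including the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}

    kinds = []
    for i, nb in enumerate(neighbors):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in nb):
            kinds.append("boundary")  # not dense enough itself, but inside a core point's circle
        else:
            kinds.append("noise")     # not reachable from any core point
    return kinds

# Hypothetical toy data: a dense square, one point on its edge, and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
kinds = classify_points(points, eps=1.5, min_pts=4)

for point, kind in zip(points, kinds):
    print(point, kind)
```

The full algorithm then grows clusters outward from the core points, attaching boundary points along the way and leaving noise points unassigned.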
Handwritten Digit Recognition is an interesting machine learning problem in which we have to identify the handwritten digits through various classification algorithms. There are a number of ways and algorithms to recognize handwritten digits, including Deep Learning/CNN, SVM, Gaussian Naive Bayes, KNN, Decision Trees, Random Forests, etc.
In this article, we will apply a variety of machine learning algorithms from the Sklearn library to our dataset to classify the digits into their categories.
Let us first look at the dataset:
We will use Sklearn’s load_digits dataset, which is a collection of 8x8 images (64 features) of digits. …
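A short sketch of loading the dataset and trying a few of the classifiers mentioned above could look like this. The choice of three models, the 75/25 split, and the `random_state` values are assumptions for illustration only.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
print("images:", digits.images.shape)  # one 8x8 grid of pixel intensities per digit
print("data:", digits.data.shape)      # the same images flattened to 64 features

# Assumed split: 75% training, 25% testing.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)  # mean accuracy on held-out digits
    print(f"{name}: {accuracies[name]:.3f}")
```

Each model sees the same 64 pixel features, so the differences in `accuracies` come from the algorithms themselves rather than from the preprocessing.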
In this article, we will learn to use Principal Component Analysis and Support Vector Machines for building a facial recognition model.
First, let us understand what PCA and SVM are:
Principal Component Analysis (PCA) is a machine learning algorithm that is widely used in exploratory data analysis and for making predictive models. It is commonly used for dimensionality reduction by projecting each data point onto only the first few principal components to obtain lower-dimensional data while preserving as much of the data’s variation as possible.
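The projection described above can be sketched with Sklearn's `PCA`. To keep the example self-contained, it uses the small `load_digits` images as a stand-in for face images, and the choice of 10 components is arbitrary; both are assumptions, not the setup of the facial recognition model itself.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Stand-in data: 8x8 digit images flattened to 64 features (not a faces dataset).
X, _ = load_digits(return_X_y=True)

# Keep only the first 10 principal components (an arbitrary choice for this sketch).
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

# Fraction of the data's total variance preserved by the kept components.
retained = pca.explained_variance_ratio_.sum()

print(X.shape, "->", X_reduced.shape)
print(f"variance retained: {retained:.3f}")
```

The reduced matrix `X_reduced` is what would then be passed to a classifier such as an SVM, which trains faster on 10 features than on 64.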
The Enron fraud is a big, messy and totally fascinating story about corporate malfeasance of nearly every imaginable type.
In this article, we will use Python to analyze the dataset and look for patterns and clues through data exploration, as well as build a regression model that could predict the bonus of a person at Enron based on the salary they receive.
But first, we need to know a bit about the biggest corporate fraud in American history!
Enron Corporation was an American energy, commodities, and services company based in Houston, Texas. …
In this article, we will go through five basic (yet powerful) technical indicators to understand bullish and bearish market trends.
We will start by understanding stock market prediction and then dive into a few indicators in order to understand bullish and bearish trends!
Stock market prediction is the act of trying to determine the future value of company stock or other financial instruments traded on an exchange. The successful prediction of a stock’s future price could yield a significant profit. …
Have you ever imagined that a simple formula that you have studied in high school would play a part in recommending you a movie on the basis of the one you already like?
Well, here we are, using the Cosine Similarity (the dot product for normalized vectors) to build a Movie Recommender System!
Recommender systems are an important class of machine learning algorithms that offer “relevant” suggestions to users. YouTube, Amazon, and Netflix all rely on recommendation systems, where the system recommends the next video or product based on your past activity (Content-based Filtering) or based on the activities and preferences of other users similar to you (Collaborative Filtering). …
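As a minimal sketch of the idea, cosine similarity is the dot product of two vectors divided by the product of their lengths, and a content-based recommender can rank movies by that score. The movie titles and the three-number genre vectors below are invented placeholders, and `recommend` is a hypothetical helper, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors: [action, comedy, romance]
movies = {
    "Movie A": [1, 0, 0],
    "Movie B": [1, 1, 0],
    "Movie C": [0, 0, 1],
}

def recommend(liked, movies):
    """Rank all other movies by their similarity to the liked one, most similar first."""
    others = [title for title in movies if title != liked]
    return sorted(
        others,
        key=lambda title: cosine_similarity(movies[liked], movies[title]),
        reverse=True,
    )

ranking = recommend("Movie A", movies)
print(ranking)
```

Because the score depends only on the angle between the vectors, a movie described by `[2, 2, 0]` would rank exactly like `[1, 1, 0]`; real systems typically build these vectors from text features such as genres, cast, or keywords.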
“Becoming” is an autobiography of Michelle Obama, taking us through her life from childhood to the day she sits at home while the family is adjusting to normal life — life after a two-term presidency of the United States.
As First Lady to the first African American US President, Michelle narrates her personal experience of life, dividing it into three phases.
“Becoming Me” covers Michelle’s journey from her upbringing as a child to her work as a professional lawyer at Sidley Austin, Chicago. Michelle narrates her childhood stories of being raised in a humble environment on the South Side of Chicago, Illinois. She learned the piano at her great-aunt Robbie’s place, studied at Bryn Mawr Elementary School, saw her father live with multiple sclerosis, and grew up at a time when racial discrimination was highly prevalent in America. Despite her many doubters, including a career counselor at her school who judged her not to be ‘Princeton material’, she worked her way to Princeton, majoring in Sociology and minoring in African American studies, and later pursued professional study at Harvard Law School. Time and again she was told by her teachers not to ‘set her sights too high’, and time and again she proved them wrong by working hard and succeeding. She credits her achievements to her parents, who invested a lot in her education and upbringing by treating her and her brother as adults. …