An engineer by profession, a bibliophile by heart!

Have you ever imagined that a simple formula you studied in high school could help recommend a movie based on one you already like?

Well, here we are, using the **Cosine Similarity** (the dot product for normalized vectors) to build a **Movie Recommender System**!
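As a rough illustration of the idea, here is a minimal sketch in plain Python. The three movie titles and their genre vectors are made up for this example, and treating a zero vector as having similarity 0 is an assumption, not something the article specifies.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors: [action, comedy, romance]
movies = {
    "Movie A": [1, 0, 1],
    "Movie B": [1, 0, 0],
    "Movie C": [0, 1, 0],
}


def recommend(liked, catalog):
    """Rank every other title by its similarity to the liked one."""
    scores = {
        title: cosine_similarity(catalog[liked], vector)
        for title, vector in catalog.items()
        if title != liked
    }
    return sorted(scores, key=scores.get, reverse=True)


ranking = recommend("Movie A", movies)
print(ranking)
```

Movie B shares a genre with Movie A while Movie C shares none, so Movie B ranks first. A real recommender would build the vectors from richer features such as keywords, cast, or genres.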

**Recommender systems** are an important class of machine learning algorithms that offer “relevant” suggestions to users. YouTube, Amazon, and Netflix all rely on recommender systems that suggest the next video or product based on your past activity (**Content-based Filtering**) or based on activities and preferences…

**Email Classification** is a Machine Learning problem that falls under the category of **Supervised Learning.**

This Email Classification mini-project is inspired by J.K. Rowling’s publication of a book under a pen name. Udacity’s **“Introduction to Machine Learning”** course provides a comprehensive study of the algorithms and the project.

A couple of years ago, Rowling wrote a book, **“The Cuckoo’s Calling,”** under the name Robert Galbraith. The book received some good reviews, but no one paid much attention to it — until an anonymous tipster on Twitter said it was J.K. Rowling. The London Sunday Times enlisted two experts to compare the…

“Animal Farm” is a political satire by George Orwell, first published in 1945. A short novella of 112 pages, it tells the story of a farm whose animals start a revolution to free themselves from human control.

Through the use of animal characters and subtle meanings embedded in the text, Orwell talks about revolutions: how they start and how much benefit they actually bring.

George Orwell was the pen name of Eric Arthur Blair, a critic and journalist best known for “Animal Farm” and the dystopian novel “1984”.

**Vector Quantization** is a lossy data compression technique. It allows the modeling of the probability density function by the distribution of the prototype vectors. Because each vector is replaced by its nearest prototype, some detail is lost, which is what makes the compression lossy.

Vector Quantization works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms.

(Definition from Wikipedia)
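The quantization step itself can be sketched in a few lines of NumPy. The sample points and the two-entry codebook below are hypothetical. In practice the codebook would be learned from the data, for example as k-means centroids.

```python
import numpy as np


def quantize(points, codebook):
    """Map each point to the index of its nearest prototype vector."""
    # Squared Euclidean distance from every point to every prototype,
    # giving an array of shape (n_points, n_prototypes).
    diffs = points[:, None, :] - codebook[None, :, :]
    distances = (diffs ** 2).sum(axis=2)
    codes = distances.argmin(axis=1)
    # The "decompressed" data is just the prototype each point was mapped to.
    return codes, codebook[codes]


# Hypothetical data: two tight groups of 2-D points
points = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]])
# Hypothetical codebook with one prototype per group
codebook = np.array([[0.0, 0.0], [5.0, 5.0]])

codes, reconstructed = quantize(points, codebook)
print(codes)
print(reconstructed)
```

Only the small integer codes and the codebook need to be stored. The reconstruction differs from the original points, which is where the loss comes from.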

**K-Means** is a clustering algorithm, which clusters together data points based on the number of clusters you want to identify in your data…

In this article, we will go through the SVC algorithm in the Sklearn library and experiment with the different kernels on the Iris Dataset.

**Support Vector Classifier** (**SVC**) is a supervised machine learning model used for two-group classification problems. Once an **SVC** model is given a set of labeled training data for each category, it can categorize new test data.
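A kernel comparison of this kind might look like the sketch below. The 70/30 split, the `random_state`, and the default hyperparameters are assumptions chosen for illustration, not the article's settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumed split: 70% training, 30% testing, stratified by class
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

accuracy = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = SVC(kernel=kernel)  # default C and gamma
    model.fit(X_train, y_train)
    accuracy[kernel] = model.score(X_test, y_test)

for kernel, score in accuracy.items():
    print(f"{kernel}: {score:.3f}")
```

Iris has three classes. Scikit-learn's `SVC` handles this by training one binary classifier per pair of classes (one-vs-one) behind the scenes.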

**DBSCAN**, short for **Density-Based Spatial Clustering of Applications with Noise**, is a density-based clustering algorithm. Clusters are formed based on the density parameters.

Density, in terms of DBSCAN, means the number of points that are located in a given area. The closer the points are to each other, the greater the density will be.

The DBSCAN algorithm takes two parameters: *ε* (epsilon), the radius of the neighborhood around each point, and minPts, the minimum number of data points that must fall within that radius for a point to count as a core point.

In the diagram below, taken from Wikipedia, the minimum number of points is set to 4 (minPts = 4).

…
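To see the two DBSCAN parameters in action, here is a small sketch using scikit-learn, where they are called `eps` and `min_samples`. The nine points and the value `eps=0.5` are made up for illustration, and `min_samples=4` mirrors the minPts = 4 setting mentioned above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 2-D points: two dense groups and one isolated point
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],  # first dense group
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],  # second dense group
    [10.0, 0.0],                                     # far from everything
])

# eps is the neighborhood radius; min_samples counts the point itself
labels = DBSCAN(eps=0.5, min_samples=4).fit_predict(points)
print(labels)
```

Points in the same cluster share a label, and scikit-learn marks noise points with the label `-1`. Here the isolated point has no neighbors within `eps`, so it is treated as noise.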

Handwritten Digit Recognition is an interesting machine learning problem in which we have to identify the handwritten digits through various classification algorithms. There are a number of ways and algorithms to recognize handwritten digits, including Deep Learning/CNN, SVM, Gaussian Naive Bayes, KNN, Decision Trees, Random Forests, etc.

In this article, we will apply a variety of machine learning algorithms from the Sklearn library to our dataset to classify the digits into their categories.

Let us first look at the dataset:

We will use Sklearn’s **load_digits** dataset, which is a collection of **8x8** images (**64 features**) of digits. …
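As a quick sketch, the dataset can be loaded and classified as follows. K-nearest neighbors stands in for the algorithms listed earlier, and the choice of `n_neighbors=3`, the 75/25 split, and the `random_state` are assumptions for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
print(digits.images.shape)  # one 8x8 array per image
print(digits.data.shape)    # the same images flattened to 64 features

# Assumed split: 75% training, 25% testing
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
score = knn.score(X_test, y_test)
print(f"KNN test accuracy: {score:.3f}")
```

Swapping `KNeighborsClassifier` for another scikit-learn classifier only changes the two lines that create and fit the model, which makes it easy to compare algorithms on the same split.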

In this article, we will learn to use **Principal Component Analysis** and **Support Vector Machines** for building a facial recognition model.

First, let us understand what **PCA** and **SVM** are:

**Principal Component Analysis (PCA)** is a machine learning algorithm that is widely used in exploratory data analysis and for making predictive models. It is commonly used for **dimensionality reduction** by projecting each data point onto only the first few principal components to obtain lower-dimensional data while preserving as much of the data’s variation as possible.
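The projection step can be illustrated with scikit-learn's `PCA` on synthetic data. The data below is generated for this sketch only: 200 five-dimensional samples that vary mostly along two hidden directions, plus a little noise.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data (an assumption for this example): two latent factors
# mixed into five observed features, with a small amount of noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

# Keep only the first two principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

explained = pca.explained_variance_ratio_.sum()
print(X_reduced.shape)
print(f"Variance kept by 2 components: {explained:.4f}")
```

Because the data really has only two underlying directions, two components keep almost all of the variance. On real images, the number of components is usually chosen by inspecting `explained_variance_ratio_`.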

The Enron fraud is a big, messy and totally fascinating story about corporate malfeasance of nearly every imaginable type.

In this article, we will use Python to analyze the dataset, uncover patterns and clues through data exploration, and build a regression model that predicts a person’s bonus at Enron from their salary.

But first, we need to know a bit about the biggest corporate fraud in American history!

**Enron Corporation** was an American energy, commodities, and services company based in Houston, Texas. …

In this article, we will go through five basic (yet powerful) technical indicators to understand bullish and bearish market trends.

We will start by understanding the stock market prediction and then dive into a few indicators in order to understand bullish and bearish trends!

**Stock market prediction** is the act of trying to determine the future value of company stock or other financial instruments traded on an exchange. The successful prediction of a stock’s future price could yield a significant profit. …