Choose the correct graph or chart style for the task you want your audience to accomplish.

Photo by Morgan Housel on Unsplash

This is the second installment in a two-part series on Data Visualization. If you haven’t read Part 1 of this series, I recommend checking that out!

In Part 1 of this series, we walked through the first three data visualization functions: relationship, data over time, and ranking. In case you need a quick refresher:

  • Data over time: This visualization method shows data over a period of time to find trends or…

Exploring the MovieLens 100k dataset with SGD, autograd, and the Surprise package.

Photo by Charles Deluvio on Unsplash

By Gavin Smith and XuanKhanh Nguyen

This project was the third project for my machine learning class this semester. The project aims to train a machine learning algorithm on the MovieLens 100k dataset for movie recommendation, optimizing the model's predictive power. We were given a clean, preprocessed version of the MovieLens 100k dataset with 943 users' ratings of 1682 movies. The input to our prediction system is a (user id, movie id) pair. Our predictor's output will be a scalar rating y in the range [1, 5]; a rating of 1 is the worst possible, a rating of 5 is the…
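The preview mentions SGD; one common way to use SGD for (user id, movie id) → rating prediction is a biased matrix-factorization model. The sketch below is a minimal pure-Python illustration under that assumption, not the authors' code: the names `train_mf` and `rmse`, the latent dimension `k`, the learning rate, the regularization strength, and the toy ratings are all hypothetical choices. Predictions are clipped to [1, 5] so they stay in the valid rating range.

```python
import math
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Hypothetical sketch: biased matrix factorization fitted with plain SGD.

    ratings: list of (user_id, item_id, rating) triples with 0-based ids.
    Returns predict(user_id, item_id), clipped to the rating range [1, 5].
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_u = [0.0] * n_users  # per-user bias
    b_i = [0.0] * n_items  # per-item bias
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item factors

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit ratings in a random order each epoch
        for u, i, r in data:
            pred = mu + b_u[u] + b_i[i] + sum(p * q for p, q in zip(P[u], Q[i]))
            err = r - pred
            # Gradient steps with L2 regularization on biases and factors.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(k):
                p, q = P[u][f], Q[i][f]  # use old values for both updates
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)

    def predict(u, i):
        raw = mu + b_u[u] + b_i[i] + sum(p * q for p, q in zip(P[u], Q[i]))
        return min(5.0, max(1.0, raw))  # clip to [1, 5]

    return predict

def rmse(predict, ratings):
    """Root-mean-squared error of predict() over (user, item, rating) triples."""
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Made-up toy ratings: 3 users, 3 items.
toy = [(0, 0, 5), (0, 1, 5), (0, 2, 4), (1, 0, 1), (1, 1, 2), (1, 2, 1), (2, 0, 4), (2, 2, 5)]
model = train_mf(toy, n_users=3, n_items=3)
print(round(rmse(model, toy), 3))
```

On the real data you would pass the 943 users and 1682 movies (with ids mapped to 0-based indices if needed) and measure RMSE on a held-out split rather than on the training ratings.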

Bag-of-words feature representations and word embedding feature representations.

By Gavin Smith and XuanKhanh Nguyen

This project was the second project for my machine learning class this semester. We were given a dataset of several thousand single-sentence reviews collected from three domains (…). Each review consists of a sentence and a binary label indicating the sentence's emotional sentiment (1 for positive feelings; 0 for negative feelings). All the provided reviews in the training and test sets were scraped from websites whose assumed audience is primarily English speakers. There are 2400 input-output pairs in the training set, with 4510 unique words, and 600 inputs in the test…
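A bag-of-words representation turns each sentence into a vector of word counts over a fixed vocabulary built from the training set. The following is a minimal pure-Python sketch under our own assumptions (a simple lowercase regex tokenizer, the hypothetical helper names `tokenize`, `build_vocab`, and `bag_of_words`, and made-up example sentences); it is not the authors' feature pipeline. Words never seen in training are simply dropped.

```python
import re
from collections import Counter

def tokenize(sentence):
    # Lowercase and keep runs of letters/apostrophes: a deliberately simple tokenizer.
    return re.findall(r"[a-z']+", sentence.lower())

def build_vocab(sentences):
    # Map each unique training token to a column index, sorted for determinism.
    vocab = sorted({tok for s in sentences for tok in tokenize(s)})
    return {tok: j for j, tok in enumerate(vocab)}

def bag_of_words(sentence, vocab):
    # Count vector of length len(vocab); tokens unseen in training are ignored.
    counts = Counter(tokenize(sentence))
    return [counts.get(tok, 0) for tok in vocab]

train = ["Great phone, great battery.", "Terrible service.", "Not great."]
vocab = build_vocab(train)
print(list(vocab))                             # ['battery', 'great', 'not', 'phone', 'service', 'terrible']
print(bag_of_words("great great movie", vocab))  # [0, 2, 0, 0, 0, 0]
```

The vocabulary size depends on the tokenizer, so a different tokenizer would not necessarily reproduce the 4510 unique words reported for the training set.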

Exploring the MNIST and Fashion-MNIST datasets with Logistic Regression and Random Forest

By Gavin Smith and XuanKhanh Nguyen

This semester, I took a Machine Learning class at Tufts University. It was one of my favorite Data Science courses I have taken thus far. It taught me how to tell whether machine learning is actually solving a problem, and, most importantly, it made me a better data scientist.

We were given three projects throughout the semester. Each project had a structured problem and an open-ended problem. The open-ended specification you could imagine. …

Choose the correct graph or chart style for the task you want your audience to accomplish

Photo by Morgan Housel on Unsplash

According to the World Economic Forum, the world produces 2.5 quintillion bytes of data every day. With so much data, it has become increasingly difficult to manage and make sense of it all. It would be impossible for any person to wade through that data line by line, spot distinct patterns, and make observations.

Data visualization is one step in the data science process, a framework for approaching data science tasks. After data is collected, processed, and modeled, the relationships need to be visualized so that conclusions can be drawn.

We use data visualization as a technique to communicate insights from data through visual representation…

Exploratory Data Analysis on World Happiness Report.

Photo by Freddy Do on Unsplash

What is the purpose of life? Is it to be happy? Why do people go through all the pain and hardship? Is it to achieve happiness in some way?

I’m not the only person who believes the purpose of life is happiness. If you look around, most people are pursuing happiness in their lives.

On March 20th, the world celebrates the International Day of Happiness. The 2020 report ranked 156 countries by how happy their citizens perceive themselves to be, based on their evaluations of their own lives. The rankings of national happiness are based on a Cantril ladder survey. Nationally representative samples…
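On the Cantril ladder, respondents rate their current life on a scale from 0 (worst possible life) to 10 (best possible life). A simplified way to turn such answers into a ranking is to average them per country and sort. The sketch below assumes that simplification and uses made-up countries and scores; the actual report works with nationally representative samples averaged over several survey years, which this does not reproduce. The function name `rank_by_ladder` is hypothetical.

```python
from statistics import mean

def rank_by_ladder(responses):
    """Hypothetical, simplified ranking: unweighted mean Cantril ladder score per country.

    responses: dict mapping country -> list of ladder answers in 0..10.
    Returns (country, mean score) pairs, highest score first.
    """
    for country, scores in responses.items():
        if not scores or any(not 0 <= s <= 10 for s in scores):
            raise ValueError(f"{country}: need at least one score, each in 0..10")
    averages = {country: mean(scores) for country, scores in responses.items()}
    # Highest average first, i.e., the "happiest" country gets rank 1.
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

# Made-up survey answers, not real report data.
survey = {"Country A": [7, 8, 6], "Country B": [5, 4, 6], "Country C": [9, 8, 7]}
for rank, (country, score) in enumerate(rank_by_ladder(survey), start=1):
    print(rank, country, round(score, 2))
```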

A simplified explanation of the two traversal algorithms.

Photo by Christian Lambert on Unsplash

When it comes to learning, there are generally two approaches: we can go wide and try to cover as much of the spectrum of a field as possible, or we can go deep and try to get specific with the topic that we are learning. Most good learners know that, to some extent, everything we learn in life — from algorithms to necessary life skills — involves some combination of these two approaches. …
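Assuming the two traversals are breadth-first search (BFS, which goes wide) and depth-first search (DFS, which goes deep), the contrast can be sketched on a small hypothetical graph stored as an adjacency dict. The function names and the example graph are illustrative, not taken from the article.

```python
from collections import deque

def bfs(graph, start):
    # Breadth-first: visit every neighbor of a node before going deeper ("go wide").
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order

def dfs(graph, start, visited=None):
    # Depth-first: follow one branch as far as possible before backtracking ("go deep").
    if visited is None:
        visited = set()
    visited.add(start)
    order = [start]
    for nxt in graph.get(start, []):
        if nxt not in visited:
            order.extend(dfs(graph, nxt, visited))
    return order

# Hypothetical example graph.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs(graph, "A"))  # ['A', 'B', 'D', 'C', 'E']
```

BFS uses a queue (first in, first out), so it finishes each level before moving on; DFS uses the call stack, so it exhausts one branch first.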

When to use supervised learning or unsupervised learning?

Photo by Julian O’hayon on Unsplash

If we don’t know what the objective of a machine learning algorithm is, we may fail to build an accurate model. Knowing the types of machine learning algorithms is essential. It helps us see the bigger picture of machine learning and the goal of all the work being done in the field, and, especially, it puts us in a better position to break down a real problem and design a machine learning system.

The goal of most machine learning algorithms is to construct a model or a hypothesis. All machine learning models can be categorized as either supervised or…
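The difference can be illustrated on the same one-dimensional data. In the sketch below, a nearest-centroid classifier (supervised) is given labels, while 1-D k-means (unsupervised) must discover the groups from the values alone. Both algorithms, the function names, and the toy data are our own illustrative choices, not methods from the article.

```python
def nearest_centroid_fit(xs, labels):
    # Supervised: labels tell us which group each training point belongs to.
    groups = {}
    for x, y in zip(xs, labels):
        groups.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in groups.items()}

def nearest_centroid_predict(centroids, x):
    # Predict the label whose centroid is closest to x.
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def kmeans_1d(xs, k=2, iters=20):
    # Unsupervised: no labels; the algorithm finds k group centers on its own.
    centers = sorted(xs)[:: max(1, len(xs) // k)][:k]  # spread-out initial centers
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in xs:
            j = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
            clusters[j].append(x)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return sorted(centers)

xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = nearest_centroid_fit(xs, labels)     # needs labels
print(nearest_centroid_predict(centroids, 4.0))  # high
centers = kmeans_1d(xs, k=2)                     # needs no labels
print(centers)
```

Note that k-means returns group centers but no names for them: without labels, interpreting what each cluster means is left to us.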

Basic plots, including code samples.

Photo by Giorgio Trovato on Unsplash

Matplotlib is a plotting library for the Python programming language. The most used module of Matplotlib is Pyplot, which provides a MATLAB-like interface but runs in Python and is open source.

In this note, we will focus on basic Matplotlib to help visualize our data. This is not a comprehensive list but contains common types of data visualization formats. Let’s hop to it!

The structure of this note:

  1. Start with Pyplot
  2. Chart Types

Anatomy of a Matplotlib Figure
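A minimal Pyplot sketch of the main parts of a figure: the `Figure` (the whole canvas), an `Axes` (one plotting area), a plotted line, a title, axis labels, and a legend. The data values and the output filename are hypothetical, and the non-interactive `Agg` backend is an assumption so the script also runs without a display; in a notebook or interactive session you would typically call `plt.show()` instead of saving.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5, 6]
sales = [10, 12, 9, 15, 18, 17]  # hypothetical values for illustration

fig, ax = plt.subplots(figsize=(6, 4))  # Figure = whole canvas, Axes = one plot area
line, = ax.plot(months, sales, marker="o", label="sales")  # a Line2D artist on the Axes
ax.set_title("Monthly sales")  # Axes title
ax.set_xlabel("Month")         # x-axis label
ax.set_ylabel("Units sold")    # y-axis label
ax.legend()                    # legend built from the line's label
fig.savefig("line_plot.png")   # hypothetical output path
```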

How to perform multiple linear regression in Python using sklearn?

Photo by Isaac Benhesed on Unsplash

Linear regression is a standard statistical data analysis technique. We use linear regression to determine the linear relationship between a dependent variable and one or more independent variables. The dependent variable must be measured on a continuous measurement scale, and the independent variable(s) can be measured on either a categorical or continuous measurement scale.

In linear regression, we want to draw the line that comes closest to the data by finding the slope and intercept that define the line and minimize the regression errors. There are two types of linear regression: simple linear regression and multiple linear regression. …
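For simple linear regression, the slope and intercept that minimize the sum of squared errors have a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and intercept = ȳ − slope · x̄. The pure-Python sketch below (the name `fit_line` is hypothetical) computes exactly that. For multiple linear regression, scikit-learn's `LinearRegression` generalizes the same least-squares idea: after `fit(X, y)`, the fitted coefficients are in `coef_` and the intercept in `intercept_`.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (simple linear regression)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # co-variation of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # variation of x
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```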

XuanKhanh Nguyen

Interests: Data Science, Machine Learning, AI, Stats, Python | Minimalist | A fan of odd things.
