My Badge

I had been thinking about taking the AWS Machine Learning Specialty exam for a while. Finally, during the Covid-19 lockdown, I booked the exam and started studying. I got a free AWS exam voucher by joining the AWS ML community (I would recommend that anyone interested in cloud computing or ML join: lots of great resources and free gifts! Big thanks to AWS), and booked the earliest available exam, which gave me only one week to study. I wanted to see how well I could do starting from zero with only a week of study. …


Implement Matrix Factorization from Scratch in Python

Photo by Nick Hillier on Unsplash

What is Matrix Factorization

Matrix Factorization (MF) techniques (e.g., Probabilistic Matrix Factorization and Non-Negative Matrix Factorization) have become central to many real-world applications, including graph representation and recommendation systems (RecSys), because they are powerful models for uncovering the hidden (latent) properties behind the data. More specifically, Non-Negative Matrix Factorization (NNMF) is a group of models in multivariate analysis and linear algebra in which a non-negative matrix A (dimension B×C) is decomposed into two non-negative factor matrices of dimension B×d and C×d, whose product (the first multiplied by the transpose of the second) approximates A.
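To make the decomposition concrete, here is a minimal pure-Python sketch of NNMF. It uses the Lee and Seung multiplicative update rules for the squared reconstruction error, which may differ from the method used in the full article; the function names (nmf, matmul, frobenius_error), the W/H naming of the two factors, and the toy matrix are illustrative assumptions.

```python
import random


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Plain-Python matrix product X @ Y."""
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def nmf(A, d, iters=500, eps=1e-9, seed=0):
    """Factor non-negative A (m x n) into W (m x d) and H (n x d), A ~ W @ H^T.

    Illustrative sketch (assumption): Lee & Seung multiplicative updates,
    not necessarily the method used in the article.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Random positive start; multiplicative updates never make entries negative.
    W = [[rng.random() for _ in range(d)] for _ in range(m)]
    H = [[rng.random() for _ in range(d)] for _ in range(n)]
    At = transpose(A)
    for _ in range(iters):
        # H <- H * (A^T W) / (H W^T W), element-wise; eps avoids division by 0
        num = matmul(At, W)
        den = matmul(H, matmul(transpose(W), W))
        H = [[H[i][k] * num[i][k] / (den[i][k] + eps) for k in range(d)]
             for i in range(n)]
        # W <- W * (A H) / (W H^T H), element-wise
        num = matmul(A, H)
        den = matmul(W, matmul(transpose(H), H))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(d)]
             for i in range(m)]
    return W, H


def frobenius_error(A, W, H):
    """Squared reconstruction error ||A - W H^T||_F^2."""
    R = matmul(W, transpose(H))
    return sum((a - r) ** 2 for ra, rr in zip(A, R) for a, r in zip(ra, rr))


# Toy example (made-up data): a rank-2 non-negative 3x3 matrix.
A = [[1, 0, 1], [0, 1, 1], [1, 1, 2]]
W, H = nmf(A, d=2)
print(frobenius_error(A, W, H))
```

Multiplicative updates only ever multiply by non-negative ratios, so the non-negativity constraint holds without any projection step. NumPy would make this much faster; plain lists just keep the sketch dependency-free.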


DBSCAN Algorithm Step by Step, Python Implementation, and Visualization.

What is DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a commonly used unsupervised clustering algorithm proposed in 1996. Unlike the well-known k-means, DBSCAN does not require you to specify the number of clusters in advance; it detects the number of clusters automatically from the input data and its parameters. More importantly, DBSCAN can find arbitrarily shaped clusters that k-means cannot, for example, a cluster surrounded by a different cluster.

DBSCAN vs K-means, credit

DBSCAN also handles noise and outliers: outliers are identified and marked as noise instead of being assigned to any cluster. …
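A compact sketch of the algorithm described above, assuming Euclidean distance and the common convention (also used by scikit-learn) that a point counts itself toward min_pts. The names dbscan and region_query, the label -1 for noise, and the toy points are hypothetical, not taken from the full article.

```python
import math

NOISE = -1  # label for outliers (hypothetical convention)


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and expand it
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return labels


# Two small square blobs and one far-away outlier (made-up data).
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

The neighbor search here is brute force (O(n²) distance computations); production implementations usually rely on a spatial index such as a k-d tree instead.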


Python Optimization on finding Closed and Maximal Frequent Itemsets

In the last article, I discussed in detail what FP-growth is and how it finds frequent itemsets, and I demonstrated a Python implementation from scratch. In this article, I would like to introduce two important concepts in Association Rule Mining: closed and maximal frequent itemsets. To understand these concepts, you need some basic knowledge of FP-trees and frequent itemsets; my last article covers all the basics.

Understand and Build FP-Growth Algorithm in Python

Photo by Markus Spiske on Unsplash

What are Closed and Maximal Frequent Itemsets

Here we quickly review the…
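As a quick reference, using the standard definitions (a frequent itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent), both groups can be filtered out of a table of frequent itemsets. This sketch assumes transactions are Python sets and uses a brute-force, Apriori-style enumeration in place of FP-growth to keep it short; the function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force stand-in for FP-growth: {frozenset: support count}."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                result[candidate] = count
                found = True
        if not found:
            break  # no frequent k-itemset means no frequent (k+1)-itemset
    return result


def closed_and_maximal(frequent):
    """Split a {frozenset: support} table into closed and maximal itemsets."""
    closed, maximal = {}, {}
    for itemset, support in frequent.items():
        supersets = [s for s in frequent if itemset < s]  # proper supersets
        # A superset never has higher support, so "closed" means every
        # frequent superset has strictly lower support.
        if all(frequent[s] < support for s in supersets):
            closed[itemset] = support
        if not supersets:  # no frequent proper superset at all
            maximal[itemset] = support
    return closed, maximal


transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
closed, maximal = closed_and_maximal(frequent_itemsets(transactions, 2))
print(sorted(map(sorted, closed)), sorted(map(sorted, maximal)))
```

Checking only frequent supersets is enough for closedness: an infrequent superset cannot share the support of a frequent itemset, or it would be frequent too.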


Frequency Pattern Mining using FP-tree and conditional FP-tree in Python

Photo by Luke Richardson on Unsplash

What is FP-Growth

FP-growth is an improved version of the Apriori algorithm and is widely used for frequent pattern mining (a.k.a. Association Rule Mining). It is an analytical process that finds frequent patterns or associations in data sets. For example, grocery store transaction data might contain a frequent pattern that people usually buy chips and beer together. The Apriori algorithm finds frequent patterns by generating candidate itemsets and keeping those whose support reaches a threshold called the “minimum support count”. It greatly reduces the number of candidate itemsets it has to examine with one simple principle:

If an itemset is frequent, then all of its…
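FP-growth avoids Apriori's repeated candidate generation by first compressing the transactions into an FP-tree. The following is a minimal sketch of that construction step only (it does not mine conditional FP-trees), assuming transactions are Python sets and that ties in item frequency are broken alphabetically; FPNode, build_fp_tree, and the grocery data are hypothetical.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Return (root, header table, {item: support}) for the frequent items."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    header = {}  # item -> list of every node holding that item
    for t in transactions:
        # Drop infrequent items; order by descending support, then by name.
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:  # start a new branch
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            node = node.children[item]     # shared prefixes reuse nodes
            node.count += 1
    return root, header, frequent


tx = [{"chips", "beer", "eggs"}, {"chips", "beer", "milk"}, {"chips"}, {"milk"}]
root, header, freq = build_fp_tree(tx, min_support=2)
print(freq, {item: node.count for item, node in root.children.items()})
```

Each item's header list links all nodes holding that item, so summing their counts recovers the item's support; FP-growth follows these links (and the parent pointers) to collect conditional pattern bases.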


Build a serverless ML application to predict flight delay on AWS

AWS SageMaker logo

Most data enthusiasts know how to build and train a model, but deploying a model and making it useful in real life can be challenging for beginner data scientists. Luckily, many platforms and tools are available to help with model deployment. Amazon SageMaker is one of my favorites, as it greatly reduces the effort of building, training, and deploying models. …


Photo by Ryan Quintal on Unsplash

I am a big fighting-game fan and go to Street Fighter tournaments whenever I can. After learning about all these machine learning techniques, I wondered whether it was possible to build a Street Fighter AI that could defeat real human players, and whether I could learn from the AI to improve my own skills.

In video games, artificial intelligence techniques have been used in a variety of ways, ranging from non-player character control to procedural content generation. The built-in AIs in most current fighting games are far from the level needed to compete with real…


Detail Guide on How to Install Pyspark and use Spark GraphFrames on different OSs

Image credit: Databricks (https://databricks.com/spark/about)

Linux(Ubuntu)

All the following operations should be done under Terminal.

1. Download Spark
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz

2. Unpack the file

tar xf spark-2.2.0-bin-hadoop2.7.tgz

3. Install Java8 if necessary

sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-8-jdk

You can check your installation with “java -version”. If it does not report “1.8.xxx”, select Java 8 as the version for Spark to use with the following command:

sudo update-java-alternatives --set java-1.8.0-openjdk-amd64

Restart your terminal.

4. (Optional) To use Spark more effectively, it helps to be familiar with basic Linux commands and basic Bash operations. …
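If you would rather script the “java -version” check from step 3, a small Python helper can parse its output. This is a sketch under two assumptions: the output contains a line such as openjdk version "1.8.0_292" (java prints it to stderr, not stdout), and the helper names are hypothetical. Note that installed_java_version() raises FileNotFoundError if java is not on the PATH.

```python
import re
import subprocess


def parse_java_version(output):
    """Extract the quoted version from `java -version` output, e.g. '1.8.0_292'."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def is_java8(version):
    """Java 8 reports itself as 1.8.x; Java 9 and later drop the leading '1.'."""
    return version is not None and version.startswith("1.8")


def installed_java_version():
    """Run `java -version` (it writes to stderr) and return the parsed version."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr)


sample = 'openjdk version "1.8.0_292"\nOpenJDK Runtime Environment'
print(parse_java_version(sample), is_java8(parse_java_version(sample)))
```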


Utilizing Machine Learning on Horse Racing Betting Strategy

Note from the editor: This article is for educational and entertainment purposes only. If you want to use the presented model for real money bets, you do it at your own risk. Please make sure that it is in alignment with the terms and conditions of your bookmaker.

Machine learning has been widely used in time series analysis and forecasting. With large amounts of historical data and today's computing power, ML models can sometimes provide very useful insight to guide sports-betting decisions.

Photo by Julia Joppien on Unsplash

This article illustrates how machine learning could help with horse…


This article introduces the key concepts of adversarial attacks at a high level: the background knowledge, several types of attacks, and how to generate adversarial examples. It also presents several real-world attack examples and experimental efforts to prevent those attacks.

Photo by Nahel Abdul Hadi on Unsplash

Background Knowledge

Machine learning technologies have evolved rapidly in the past decade. As more real-world use cases such as image recognition and autonomous driving are deployed, the potential security threats of the technology have become a significant topic for researchers. Researchers found that adversarial attacks, which add small perturbations to images that human vision cannot notice, could pose a…
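One widely used way to generate adversarial examples is the Fast Gradient Sign Method (FGSM) of Goodfellow et al., which moves every input feature by ε in the direction that increases the model's loss: x_adv = x + ε · sign(∇x L). The sketch below applies it to a toy logistic-regression model so that the gradient has a closed form, (p − y) · w for binary cross-entropy. The weights, input, and ε are made-up values, and the article itself may cover different attacks.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """P(y = 1 | x) for a toy logistic-regression model (hypothetical weights)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(g):
    """Return -1, 0, or 1 depending on the sign of g."""
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, epsilon):
    """FGSM: shift each feature by epsilon along the sign of dLoss/dx."""
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]  # gradient of binary cross-entropy w.r.t. x
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad)]


w, b = [2.0, -1.0], 0.0  # made-up model
x, y = [0.3, 0.2], 1     # correctly classified as class 1
x_adv = fgsm(w, b, x, y, epsilon=0.2)
print(predict_proba(w, b, x), predict_proba(w, b, x_adv))
```

No feature moves by more than ε, yet the prediction flips from class 1 to class 0. On images, the same idea spreads a tiny change across thousands of pixels, which is why the perturbation can stay invisible to humans.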

Andrewngai

Big Data Specialist, AWS Certified Solutions Architect, experienced Project Manager specializing in AWS/Azure cloud infrastructure and Big Data projects
