====== Data Mining and Supervised Machine Learning (USTH) ======

===== General Information =====

Your Instructor: Ass. Prof. Axel Carlier, University of Toulouse

 --- //[[Axel.Carlier@enseeiht.fr|Ass. Prof. Axel CARLIER, University of Toulouse]] //

===== Lecture #1 =====

== Lecture time: ==

Monday Nov 6, 2017, 5:30pm-9:30pm Hanoi Time

General introduction to the course and the topic. A few (already-known) administrative points can be found in the slides below.

== Slides ==

{{:public:res-ens:dmsml:intro.pdf|}}

== Practical Assignment #1 ==

Download the following data: {{:public:res-ens:dmsml:data.zip|}}

This archive contains data for both the regression and the classification exercises. The first assignment consists of preparing and visualizing the data.

The regression data are just 2D points; you only need to plot them.

----

===== Lecture #2 =====

== Lecture time: ==

Tuesday Nov 7, 2017, 5:30pm-9:30pm Hanoi Time

Regression:
  - Cost function
  - Polynomial predictors
  - Optimization: Gradient Descent

== Practical Assignment #2 ==

In this lab, your task is to find the cubic polynomial that best approximates the data points provided earlier. To do so, you will implement Gradient Descent.

----

===== Lecture #3 =====

== Lecture time: ==

Wednesday Nov 8, 2017, 5:30pm-9:30pm Hanoi Time

Regression:
  - Cost function
  - Polynomial predictors
  - Optimization: Gradient Descent and Maximum-Likelihood
  - Expected Prediction Error

== Practical Assignment #3 ==

First, determine the Maximum Likelihood predictor and compare it to the one you obtained using Gradient Descent.

Then, implement Leave-One-Out Cross-Validation to determine the optimal degree of the polynomial used to model the data.

----

===== Lecture #4 =====

== Lecture time: ==

Thursday Nov 9, 2017, 5:30pm-9:30pm Hanoi Time

Classification:
  - Definition
  - K-nearest neighbors
  - Logistic Regression

== Practical Assignment #4 ==

Today, we start working on the classification data.
The classification data are images of three types of flowers. From these images, you must extract features called mean normalized colors.

{{ :public:res-ens:dmsml:normalizedcolor.png?direct |}}

Then, plot the features, using a separate color for each type of flower. You should obtain a plot that looks like this:

{{ :public:res-ens:dmsml:data_classif.png?direct |}}

Your goal for these labs is to partition the feature space into areas of influence for each class, using the techniques learned in class. In other words, you will classify each point in the space using the classifiers we have studied.

Download a skeleton for this assignment here: {{:public:res-ens:dmsml:skeleton.txt|}}

Here is an example of what you should obtain using a K-nearest neighbors classifier with K = 1:

{{ :public:res-ens:dmsml:knn-classif.png?direct |}}

Try different values of K and comment on the results.

Then, apply Logistic Regression to classify chrysanthemums vs. other flowers. What happens when you try to classify pansies vs. other flowers?
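
To make the idea of "classifying each point in the space" concrete, here is a minimal sketch in Python with NumPy. It is not the provided skeleton and not a reference solution. It assumes each image has already been reduced to a 2D feature vector, and it uses synthetic points in place of the real flower features. The names ''knn_predict'', ''train_X'', ''train_y'' and the grid bounds are hypothetical.

```python
import numpy as np

def knn_predict(train_X, train_y, query_X, k=1):
    """Label each row of query_X by majority vote among its k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train)
    diffs = query_X[:, None, :] - train_X[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training points for every query point
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]  # shape (n_query, k)
    n_classes = int(train_y.max()) + 1
    # Count the votes for each class; argmax breaks ties toward the lowest class index
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
    return counts.argmax(axis=1)

# Synthetic stand-in for the flower features (assumption: NOT the course data)
rng = np.random.default_rng(0)
centers = np.array([[0.2, 0.2], [0.5, 0.3], [0.3, 0.5]])
train_X = np.vstack([c + 0.03 * rng.standard_normal((30, 2)) for c in centers])
train_y = np.repeat(np.arange(3), 30)

# Regular grid covering the feature plane (bounds chosen arbitrarily for this sketch)
xs = np.linspace(0.0, 0.7, 50)
ys = np.linspace(0.0, 0.7, 50)
grid = np.array([[x, y] for y in ys for x in xs])

# One class label per grid cell: these are the "areas of influence"
regions = knn_predict(train_X, train_y, grid, k=1).reshape(len(ys), len(xs))
```

The ''regions'' array can then be displayed as an image, with the training points drawn on top, to obtain a plot in the spirit of the K = 1 example above. Changing ''k'' in the last call shows how the boundaries between classes change with the number of neighbors.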