GitHub - BargavS/MachineLearning: This repository contains all assignments for a statistical machine learning course, completed as part of my Master's program.

WEEK3.PY: DENSITY ESTIMATION AND CLASSIFICATION

This project implements a Naïve Bayes classifier, which assumes that the features are independent of each other given the class label. Given 10,000 training examples (5,000 images of the handwritten digit 0 and 5,000 images of the handwritten digit 1), the algorithm fits a classifier to the training set and determines whether a given image is a 0 or a 1.

Mathematical modelling and defining variables: Each of the 10,000 sample images is a 28x28 array of pixel values. Two features are extracted from each image: the brightness of the image's pixels (feature1) and the deviation of the pixel brightness from its mean (feature2). For each label, the mean and variance of each feature are then calculated, and each feature is modelled and visualized as a Gaussian distribution, giving two means and two variances per digit.
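The feature extraction and parameter estimation described above can be sketched as follows. This is a minimal illustration, not the code in week3.py: it assumes that feature1 is the per-image mean brightness and feature2 is the per-image standard deviation of brightness, and the function names `extract_features` and `fit_gaussians` as well as the synthetic images are hypothetical.

```python
import numpy as np

def extract_features(images):
    """Return an (n, 2) array: per-image mean brightness and standard deviation.

    Assumption: feature1 = mean pixel value, feature2 = std of pixel values.
    """
    flat = images.reshape(len(images), -1).astype(np.float64)
    return np.column_stack([flat.mean(axis=1), flat.std(axis=1)])

def fit_gaussians(features):
    """Estimate the mean and variance of each feature over one class."""
    return features.mean(axis=0), features.var(axis=0)

# Synthetic stand-in for one class of 28x28 grayscale images (not the real data).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28))

feats = extract_features(fake_images)
mu, var = fit_gaussians(feats)
print("feature means:", mu)
print("feature variances:", var)
```

In practice `fit_gaussians` would be called once per digit, on the features of that digit's training images only.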

Results

(No.1) Mean of feature1 for digit0: 44.14111343
(No.2) Variance of feature1 for digit0: 115.50482373
(No.3) Mean of feature2 for digit0: 87.36453556
(No.4) Variance of feature2 for digit0: 102.17048578
(No.5) Mean of feature1 for digit1: 19.37864435
(No.6) Variance of feature1 for digit1: 30.8808238
(No.7) Mean of feature2 for digit1: 61.39433494
(No.8) Variance of feature2 for digit1: 80.83307837

Accuracy for Digit 0: 0.92777
Accuracy for Digit 1: 0.92343
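As an illustration of how such parameters feed a Naïve Bayes decision, the sketch below scores a feature pair under each digit's Gaussians and picks the larger log-posterior. It uses the means and variances reported above; the equal priors (motivated by the 5,000/5,000 split) and the names `PARAMS`, `log_gaussian` and `predict` are assumptions, not taken from week3.py.

```python
import numpy as np

# (mean, variance) of [feature1, feature2] for each digit, from the reported results.
PARAMS = {
    0: (np.array([44.14111343, 87.36453556]), np.array([115.50482373, 102.17048578])),
    1: (np.array([19.37864435, 61.39433494]), np.array([30.8808238, 80.83307837])),
}

def log_gaussian(x, mean, var):
    """Element-wise log density of a univariate Gaussian."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(features, priors=(0.5, 0.5)):
    """Return the most probable digit (0 or 1) for each row of features.

    Assumption: equal class priors. The independence assumption lets the
    per-feature log densities simply be summed.
    """
    features = np.atleast_2d(np.asarray(features, dtype=np.float64))
    scores = np.column_stack([
        np.log(priors[label]) + log_gaussian(features, *PARAMS[label]).sum(axis=1)
        for label in (0, 1)
    ])
    return scores.argmax(axis=1)

# Two feature pairs close to the reported class means.
print(predict([[44.0, 87.0], [19.0, 61.0]]))
```

Working in log space avoids underflow when the densities are very small.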

UTF8_STRATEGY1: K-MEANS UNSUPERVISED CLASSIFIER

This project implements the K-means clustering algorithm on 2-dimensional data points. K-means is an unsupervised algorithm that clusters data based on the Euclidean distance of the points to the cluster centroids.

Mathematical modelling and defining variables: The algorithm follows an iterative approach, starting by initialising the cluster centroids. The number of clusters is chosen by looking at the data and intuitively deciding what suits the kind of data being clustered. In the program, the number of clusters is set to k=3 and k=5. The Euclidean distance between each data sample and every cluster centroid is calculated, and each sample is assigned the index of the cluster at minimum distance. The mean of all the data now in each cluster is then calculated, and these means form the new centroids. This process is repeated until the means converge, that is, stay constant across iterations.

RESULTS:

(No.1) Centroid for K=3, centeriod1 = [[4.84461158, 7.30111158], [3.34467115, 2.61868728], [7.3773277, 2.37886035]]
(No.2) cost1 = 1338.133047467403
(No.3) Centroid2 for K=5, centeriod2 = [[3.21257461, 2.49658087], [7.75648325, 8.55668928], [2.51976116, 7.02028909], [7.25262683, 2.40015826], [5.29629878, 6.64908797]]
(No.4) cost2 = 613.98662860666343
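The assign-then-update loop described for this strategy can be sketched as follows. This is a minimal illustration under stated assumptions rather than the repository's code: the initial centroids are taken to be k distinct samples drawn at random, the cost is taken to be the sum of squared distances from each point to its assigned centroid, and the names `assign`, `kmeans` and `max_iter` plus the synthetic points are hypothetical.

```python
import numpy as np

def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, rng, max_iter=100):
    # Assumption: initial centroids are k distinct samples chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # New centroid = mean of the cluster's members; an empty cluster keeps its old centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # the means have stopped moving
        centroids = new_centroids
    labels = assign(points, centroids)
    # Assumption: cost is the sum of squared point-to-centroid distances.
    cost = float(((points - centroids[labels]) ** 2).sum())
    return centroids, labels, cost

# Synthetic 2-D data (three blobs), standing in for the real sample set.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in ([3, 3], [7, 2], [5, 7])
])
for k in (3, 5):
    centroids, labels, cost = kmeans(points, k, rng)
    print(f"k={k}: cost={cost:.3f}")
```

Because the starting centroids are random, different runs can converge to different centroids and costs.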

UTF8_STRATEGY2: The algorithm follows the same iterative approach, starting by initialising the cluster centroids. The only difference from the previous algorithm is how the centroids are initialised. The first centroid is chosen at random. Each subsequent centroid is chosen as the data point in the sample set that is furthest from the centroids already chosen, found by calculating the maximum distance between the data points and the centroids. The number of clusters is chosen by looking at the data and intuitively deciding what suits the kind of data being clustered. In the program, the number of clusters is set to k=4 and k=6. The Euclidean distance between each data sample and every cluster centroid is calculated, and each sample is assigned the index of the cluster at minimum distance. The mean of all the data now in each cluster is then calculated, and these means form the new centroids. This process is repeated until the means converge, that is, stay constant across iterations.
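The initialisation step described here can be sketched as follows. The text does not specify exactly how "furthest" is measured once several centroids exist, so this sketch assumes one common reading: each new centroid is the point whose distance to its nearest already-chosen centroid is largest. The function name `farthest_point_init` and the synthetic points are hypothetical.

```python
import numpy as np

def farthest_point_init(points, k, rng):
    """Pick k initial centroids: one at random, the rest by farthest-point selection.

    Assumption: "furthest" means the largest distance to the nearest
    centroid chosen so far.
    """
    first = int(rng.integers(len(points)))
    centroids = [points[first]]
    # Distance from every point to its nearest chosen centroid.
    nearest = np.linalg.norm(points - points[first], axis=1)
    for _ in range(1, k):
        idx = int(nearest.argmax())
        centroids.append(points[idx])
        # Update the nearest-centroid distance with the newly added centroid.
        nearest = np.minimum(nearest, np.linalg.norm(points - points[idx], axis=1))
    return np.array(centroids)

# Synthetic 2-D data standing in for the real sample set.
rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(200, 2))
for k in (4, 6):
    print(f"k={k} initial centroids:\n{farthest_point_init(points, k, rng)}")
```

The returned centroids would then seed the same assign-and-update iterations used in the first strategy.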

Results

(No.1) Centroid for K=4, centeriod1 = [[3.21257461, 2.49658087], [7.13560727, 7.91651726], [3.39262114, 6.8928815], [7.22707673, 2.52234361]]
(No.2) cost1 = 797.9601840
(No.3) Centroid2 for K=6, centeriod2 = [[7.75648325, 8.55668928], [3.14506148, 0.90770655], [2.52382885, 7.02897469], [7.41419243, 2.32169114], [3.502455, 3.62870476], [5.46427736, 6.83771354]]
(No.4) cost2 = 476.29657052696626

Repository files:
README.md
utf-8''Strategy1 (1).py
utf-8''Strategy2.py
week3.py
