Due in 5 hours. Do not waste my time and yours if you cannot do it.

Task description: The data set comes from the Kaggle Digit Recognizer competition. The goal is to recognize digits 0 to 9 in handwriting images. Because the original data set is large, I have systematically sampled 10% of the data by selecting the 10th, 20th examples and so on. You are going to use the sampled data to construct prediction models using multiple machine learning algorithms that we have learned recently: naïve Bayes, kNN and SVM algorithms. Tune their parameters to get the best model (measured by cross validation) and compare which algorithms provide a better model for this task.

Report structure:

Section 1: Introduction. Briefly describe the classification problem and general data preprocessing. Note that some data preprocessing steps may be specific to a particular algorithm. Report those steps under each algorithm section.

Section 2: Naïve Bayes. Build a naïve Bayes model. Tune the parameters, such as the discretization options, to compare results.

Section 3: K-Nearest Neighbor method

Section 4: Support Vector Machine (SVM)

Section 5: Algorithm performance comparison. Compare the results from the three algorithms. Which one reached higher accuracy? Which one runs faster? Can you explain why?


The Kaggle Digit Recognizer competition is a machine learning task where the goal is to recognize handwritten digits from 0 to 9. The dataset used here is a systematic 10% sample of the original data, obtained by keeping every 10th example (the 10th, the 20th, and so on). This reduced dataset will be used to construct prediction models with three machine learning algorithms: Naïve Bayes, k-Nearest Neighbor (kNN), and Support Vector Machine (SVM).
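The sampling scheme can be illustrated with a minimal sketch. The function name `systematic_sample` and the toy list of 100 numbered rows are assumptions made for illustration; the real input would be the rows of the competition's training file.

```python
def systematic_sample(rows, step=10):
    """Return every step-th row, counting from 1 (the 10th, 20th, ... for step=10)."""
    return rows[step - 1::step]


# Stand-in for 100 training examples, numbered from 1 (hypothetical data).
rows = list(range(1, 101))
sample = systematic_sample(rows)
print(sample)
```

Because the rows are taken at a fixed interval rather than at random, the sample keeps roughly the same class mix as the full data only if the original file is not ordered by label.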

In this report, we will compare the performance of these algorithms by tuning their parameters and evaluating their accuracy through cross-validation. Additionally, we will analyze the specific data preprocessing steps associated with each algorithm.
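Cross-validation, as used throughout this report, can be sketched in plain Python as follows. The helper names `k_fold_indices`, `cross_val_accuracy`, and `majority_baseline` are hypothetical, and the folds here are contiguous; real data would normally be shuffled (or stratified by digit) first.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(fit_predict, X, y, k=5):
    """Mean held-out accuracy over k folds.

    fit_predict(X_train, y_train, X_test) must return predicted labels.
    """
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        X_train = [X[i] for i in range(len(X)) if i not in held_out]
        y_train = [y[i] for i in range(len(X)) if i not in held_out]
        preds = fit_predict(X_train, y_train, [X[i] for i in fold])
        correct = sum(p == y[i] for p, i in zip(preds, fold))
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Trivial classifier: always predict the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


# Toy data (hypothetical): 7 examples of class 0 followed by 3 of class 1.
X = [[i] for i in range(10)]
y = [0] * 7 + [1] * 3
acc = cross_val_accuracy(majority_baseline, X, y, k=5)
print(round(acc, 3))
```

Any of the three classifiers can be plugged in as `fit_predict`, so each parameter setting is scored on data it was not trained on.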

Section 1: Data Preprocessing

Before constructing the prediction models, the dataset needs some preprocessing. Each example is a 28×28 grayscale image flattened into 784 pixel intensities between 0 and 255, together with its digit label. General preprocessing steps can include scaling the pixel values to a common range, reducing dimensionality (for example by dropping pixels that never vary), and checking for missing values. Some steps are specific to a particular algorithm; these are discussed in the corresponding sections.
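Two of the general steps, scaling and a simple form of dimensionality reduction, might look like the sketch below. It assumes pixel intensities in the range 0 to 255; the function names and the three tiny "images" are made up for illustration.

```python
def scale_pixels(image, max_value=255):
    """Rescale raw intensities (assumed 0..max_value) to the range [0, 1]."""
    return [p / max_value for p in image]


def drop_constant_columns(X):
    """Remove features that take the same value in every example.

    Returns the reduced data and the indices of the columns that were kept.
    """
    keep = [j for j in range(len(X[0])) if len({row[j] for row in X}) > 1]
    return [[row[j] for j in keep] for row in X], keep


# Three hypothetical 3-pixel images; the first pixel is always 0.
X = [[0, 0, 255], [0, 128, 255], [0, 64, 0]]
X_reduced, kept = drop_constant_columns(X)
X_scaled = [scale_pixels(row) for row in X_reduced]
print(kept, X_scaled[0])
```

Pixels that are blank in every image carry no information for any of the three classifiers, so removing them shrinks the feature set at no cost in accuracy.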

Section 2: Naïve Bayes

The Naïve Bayes algorithm will be used to build a prediction model for digit recognition. Naïve Bayes assumes that the features are conditionally independent given the class, and it uses Bayes' rule to compute the probability of each class given the observed features. In this section, we will tune the parameters of the Naïve Bayes model and compare the results.

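One important preprocessing step for Naïve Bayes is discretization. The algorithm needs a likelihood model for each feature, and a Gaussian is often a poor fit for pixel intensities, which tend to cluster near zero or near the maximum. Discretizing the intensities into a small number of bins (in the simplest case, "off" and "on") lets the model estimate categorical likelihoods instead, which can improve accuracy. In this section, we will explore different discretization options and evaluate their impact on the model’s performance.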
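The simplest discretization option, binarizing each pixel, pairs naturally with a Bernoulli naïve Bayes model. The following is a minimal sketch under stated assumptions: the threshold of 127, the smoothing constant `alpha`, the function names, and the four-pixel toy images are all illustrative choices, not settings taken from the assignment.

```python
import math
from collections import defaultdict


def binarize(image, threshold=127):
    """Discretize intensities into two bins: 1 if above threshold, else 0."""
    return [1 if p > threshold else 0 for p in image]


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit per-class log priors and per-pixel 'on' probabilities (Laplace-smoothed)."""
    n_features = len(X[0])
    on_counts = defaultdict(lambda: [0] * n_features)
    class_totals = defaultdict(int)
    for row, label in zip(X, y):
        class_totals[label] += 1
        for j, v in enumerate(row):
            on_counts[label][j] += v
    model = {}
    for label, total in class_totals.items():
        log_prior = math.log(total / len(X))
        probs = [(on_counts[label][j] + alpha) / (total + 2 * alpha)
                 for j in range(n_features)]
        model[label] = (log_prior, probs)
    return model


def predict_bernoulli_nb(model, row):
    """Return the class with the highest log posterior for a binary feature row."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior + sum(math.log(p if v else 1 - p)
                                for v, p in zip(row, probs))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical 4-pixel "images": class 0 is bright on the left, class 1 on the right.
raw = [[200, 210, 0, 10], [220, 190, 5, 0], [0, 10, 230, 200], [5, 0, 210, 250]]
labels = [0, 0, 1, 1]
model = train_bernoulli_nb([binarize(r) for r in raw], labels)
print(predict_bernoulli_nb(model, binarize([180, 200, 20, 0])))
print(predict_bernoulli_nb(model, binarize([0, 0, 255, 255])))
```

The threshold and the number of bins are the tunable parts of this step, and each setting can be scored by cross-validation.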

Section 3: k-Nearest Neighbor Method

The k-Nearest Neighbor (kNN) algorithm is a non-parametric method used for both classification and regression tasks. It works by assigning a new data point to the majority class of its k nearest neighbors in the training dataset. In this section, we will use the kNN algorithm to build a prediction model for digit recognition.
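The majority-vote rule described above can be sketched in a few lines. This is an illustrative brute-force version using Euclidean distance; the function name `knn_predict` and the two-dimensional toy points are assumptions, and a library implementation would normally be preferred on the real 784-pixel data.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Distance to every training point, smallest first.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D points forming two well-separated groups.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [0.5, 0.5]))
print(knn_predict(X_train, y_train, [5.5, 5.5]))
```

The main parameter to tune is `k`: small values follow the training data closely, while larger values smooth the decision boundary.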

The main preprocessing step for kNN is feature scaling. Because kNN relies on distances between data points, features measured on larger ranges would otherwise dominate the distance. In this dataset all pixels already share the same intensity range, so scaling is expected to matter less than it would for data with mixed units. We will still apply it before constructing the kNN model and evaluate its impact on the model’s performance.

Section 4: Support Vector Machine (SVM)

The Support Vector Machine (SVM) algorithm is a powerful classifier commonly used for various machine learning tasks. SVM aims to find an optimal hyperplane that separates the data points of different classes with the maximum margin. In this section, we will use the SVM algorithm to construct a prediction model for digit recognition.
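To make the maximum-margin idea concrete, here is a minimal sketch of a linear SVM for two classes, trained by batch subgradient descent on the regularized hinge loss. It is an illustration only: the learning rate, the regularization strength `lam`, the epoch count, and the toy data are assumptions, and a ten-digit problem would additionally need a one-vs-rest or one-vs-one scheme (and possibly a nonlinear kernel), which a library implementation provides.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Fit w, b for labels in {-1, +1}.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical linearly separable 2-D data.
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print(svm_predict(w, b, [0.2, 0.2]), svm_predict(w, b, [4, 4]))
```

Only the points with margin below 1 contribute to each update, which mirrors the fact that the final hyperplane is determined by the support vectors rather than by every training example.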

The preprocessing steps for SVM may include feature scaling and dimensionality reduction. Scaling matters because the kernel value for a pair of points depends on their inner product or distance, so features with large ranges would dominate, and very large inputs can also slow down training. Dimensionality reduction techniques such as Principal Component Analysis (PCA) can shorten training time on high-dimensional data and may also improve accuracy. We will apply these preprocessing steps and analyze their impact in this section.

Section 5: Algorithm Performance Comparison

Finally, we will compare the results obtained from the Naïve Bayes, kNN, and SVM algorithms. We will assess the accuracy achieved by each algorithm and determine which one performed better in terms of classification. Furthermore, we will analyze the speed of execution for each algorithm and attempt to provide explanations for any observed differences.
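Execution speed can be measured separately for training and prediction with a small timing helper. The sketch below uses only the standard library; the name `timed` is hypothetical, and `sum` stands in for a real fit or predict call.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds of wall-clock time)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; in the report this would be a model's fit or predict call.
result, elapsed = timed(sum, range(1000))
print(result, f"{elapsed:.6f} s")
```

Timing the two phases separately matters for the comparison: kNN does almost no work at training time but must compare each new image against every stored example at prediction time, whereas naïve Bayes and SVM spend their effort during training.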


In summary, this report outlines the classification problem in the Kaggle Digit Recognizer competition and the general data preprocessing steps required for constructing prediction models with the Naïve Bayes, kNN, and SVM algorithms. Each algorithm section covers the preprocessing steps specific to that method, and the final section compares the algorithms on cross-validated accuracy and execution speed.
