Use file to conduct the classification task. Target variable: income. Predictors: all other variables in the file. Partition the data into one training sample (the first 50% of rows, in row-index order) and two testing samples (the next 25% of rows as test1 and the remaining 25% as test2). Build a classification model on the training sample and evaluate it on the two testing samples. Evaluation metrics that need to be generated: overall accuracy, recall (TPR), precision, and F-measure for each of the two classes: >50k and

To perform the classification task, we will first partition the data into a training sample and two testing samples. The target variable for this task is “income”, while the predictors are all the other variables in the file.

The data will be partitioned as follows: the first 50% of the rows will be used as the training sample, and the next 25% rows will be used as test1, followed by the remaining 25% rows as test2.
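The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `partition_rows` and the stand-in `rows` list are hypothetical, and rounding the cut points down when the row count is not divisible by four is an assumption the task does not specify.

```python
def partition_rows(rows):
    """Split rows, in their existing order, into train, test1 and test2."""
    n = len(rows)
    train_end = n // 2               # first 50% of rows (assumption: round down)
    test1_end = train_end + n // 4   # next 25% of rows (assumption: round down)
    train = rows[:train_end]
    test1 = rows[train_end:test1_end]
    test2 = rows[test1_end:]         # everything that is left
    return train, test1, test2


# Hypothetical stand-in for the data file: eight rows named by their row index.
rows = list(range(8))
train, test1, test2 = partition_rows(rows)
print(len(train), len(test1), len(test2))
```

Because the split follows row order, the rows must not be shuffled before partitioning.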

To build and evaluate the classification model, we can choose one of the following models: ksvm (support vector machine), C5.0 (decision tree), NB (naïve Bayes), KNN (k-nearest neighbors), or glm (logistic regression).
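Whichever model is chosen, evaluation comes down to comparing the predicted labels with the actual labels on each test sample. The following is a minimal sketch of the metrics named in the task, computed per class. The function names and the toy label lists are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention rather than anything the task specifies.

```python
def class_metrics(actual, predicted, positive):
    """Recall, precision and F-measure, treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    # Assumption: report 0.0 when a denominator is zero.
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"recall": recall, "precision": precision, "f_measure": f_measure}


def evaluate(actual, predicted, classes=(">50k", "<=50k")):
    """Overall accuracy plus per-class metrics for each income class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    report = {"accuracy": correct / len(actual)}
    for label in classes:
        report[label] = class_metrics(actual, predicted, label)
    return report


# Hypothetical labels for a five-row test sample.
actual = [">50k", ">50k", "<=50k", "<=50k", "<=50k"]
predicted = [">50k", "<=50k", "<=50k", "<=50k", ">50k"]
report = evaluate(actual, predicted)
print(report)
```

Calling `evaluate` once with the test1 labels and once with the test2 labels would give the two reports the task asks for.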

Once the model is built, we will evaluate it using several metrics. The metrics that need to be generated include overall accuracy, recall (true positive rate), precision, and f-measure for each of the two classes: >50k and <=50k. These metrics will help assess the performance of the model in predicting the income levels accurately.

Moving on to the hierarchical clustering task, we will remove the "state" column from the dataset as it is not needed for the distance calculation in the clustering task. All the remaining columns will be used for the distance calculation. To generate the clustering results, we will use the "hierarchical" function. This will create a hierarchy of clusters based on the calculated distances between the data points. We can then check the plot of the family of clusters, which will give us a visual representation of the clustering results.

Selecting the number of clusters (k) is an important step in hierarchical clustering. We will have to determine the appropriate value for k based on the nature of the data and the objective of the analysis. Once we have determined the value for k, we can assign a cluster ID to each data point and check the state name and the corresponding cluster ID.

By performing these tasks, we will gain insights into the classification of income levels and the clustering patterns within the dataset. The generated evaluation metrics and cluster results will provide valuable information for further analysis and decision-making purposes.
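The clustering steps described above (drop the "state" column, compute distances on the remaining columns, build the hierarchy, cut it at k, and list each state with its cluster ID) can be illustrated with a small pure-Python sketch. Several things here are assumptions rather than part of the task: Euclidean distance, complete linkage, the function name `agglomerative`, and the made-up records with columns `x` and `y`. It also stops merging once k clusters remain instead of building and plotting the full hierarchy.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two numeric vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, k):
    """Merge the two closest clusters (complete linkage) until k remain.

    Returns one cluster ID per point, numbered from 1.
    """
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest members.
                d = max(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    ids = [0] * len(points)
    for cluster_id, members in enumerate(clusters, start=1):
        for m in members:
            ids[m] = cluster_id
    return ids


# Hypothetical records; the real dataset's columns are not given in the text.
records = [
    {"state": "A", "x": 1.0, "y": 1.0},
    {"state": "B", "x": 1.2, "y": 0.9},
    {"state": "C", "x": 8.0, "y": 8.0},
    {"state": "D", "x": 8.1, "y": 7.9},
]
states = [r["state"] for r in records]
# Drop the "state" column; every remaining column feeds the distance calculation.
features = [[v for key, v in r.items() if key != "state"] for r in records]
cluster_ids = agglomerative(features, k=2)
state_clusters = dict(zip(states, cluster_ids))
print(state_clusters)
```

On real data the columns would usually be scaled before the distances are computed, so that no single variable dominates the result.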
