Objective: Use Naïve Bayes to predict flight delays. Given the FlightDelay.csv file, use a Naïve Bayes model to determine whether each flight is delayed or arrives at its destination on time.

We start by clicking “Install” in the R plot window (as shown below) and installing the following packages, one at a time: naivebayes, dplyr, ggplot2, and psych. After installing all the packages, load them into memory with these commands:

> library(naivebayes)
> library(dplyr)
> library(ggplot2)
> library(psych)

Next, we load the .csv file and check its properties as follows:

> setwd("C:/RData") # your working directory
> tumor <- read.csv("FlightDelay.csv") # loading the file
> str(FlightDelay) # check the properties of the file

. . . continue from here!

Important Note:
• You need to split your data into training data (tdata) and validation data (vdata).
• Use tdata to build the Naïve Bayes model and use vdata to test the model's predictions.
• The dependent variable (y) of the model is delay.
• The independent variables are dest, origin, carrier, deptime, weather, & dayweek.
• Show your conclusion.

Mandatory video on Naïve Bayes classification using R programming: https://www.youtube.com/watch?v=RLjSQdcg8AM

In this assignment, we are tasked with utilizing Naïve Bayes to predict flight delays. To proceed, we will be required to install and load several packages in R, including naivebayes, dplyr, ggplot2, and psych. Once these packages have been successfully installed and loaded, we can proceed with loading the FlightDelay.csv file and examining its statistical properties.
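As a rough sketch of the install-and-load step, the snippet below installs only the packages that are not already present and then attaches all four. The CRAN mirror URL and the helper variable names (`pkgs`, `to_install`) are assumptions made for illustration; installing through the RStudio interface works just as well.

```r
# Packages named in the assignment
pkgs <- c("naivebayes", "dplyr", "ggplot2", "psych")

# Find the packages that are not installed yet
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Install only what is missing (the mirror URL is an assumption; any CRAN mirror works)
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Attach every package to the current session
invisible(lapply(pkgs, library, character.only = TRUE))
```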

First, let’s set our working directory using the setwd() command. This command allows us to specify the directory where the FlightDelay.csv file is located. For example, if the file is located in the “C:/RData” directory, we would use the following command:

> setwd("C:/RData")

Make sure to modify the directory path to match the actual location of your FlightDelay.csv file.

Next, we will use the read.csv() function to load the FlightDelay dataset into R. The instructions assign the loaded dataset to a variable called “tumor” but then call str() on “FlightDelay”, a name that is never defined, so that call would fail. The two names must match. We keep “tumor” here and use it consistently; a more descriptive name would work equally well. The command to load the file is:

> tumor <- read.csv("FlightDelay.csv")

After loading the dataset, we can use the str() function to check the properties of the file. This will give us information about the structure of the dataset, including the number of observations, the number of variables, and the data type of each variable. The correct command to check the properties of the dataset is:

> str(tumor)
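Naïve Bayes in R treats factor columns as categorical and numeric columns as continuous, so it usually helps to convert coded columns to factors after inspecting them with str(). The sketch below illustrates this on a tiny stand-in data frame; the rows and the specific codes are invented for illustration and are not taken from FlightDelay.csv. Only the column names come from the assignment.

```r
# Toy stand-in for the real file; these rows are invented for illustration only.
tumor <- data.frame(
  delay   = c("ontime", "delayed", "ontime", "ontime"),
  dest    = c("JFK", "LGA", "EWR", "JFK"),
  origin  = c("DCA", "IAD", "BWI", "DCA"),
  carrier = c("DL", "US", "DH", "DL"),
  deptime = c(1455, 1640, 1245, 1715),
  weather = c(0, 0, 1, 0),
  dayweek = c(4, 4, 5, 5),
  stringsAsFactors = FALSE
)

# Columns assumed to be categorical codes rather than quantities
cat_cols <- c("delay", "dest", "origin", "carrier", "weather", "dayweek")
tumor[cat_cols] <- lapply(tumor[cat_cols], factor)

# deptime is left numeric here; binning it into hours is another reasonable choice
str(tumor)
```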

Now that we have successfully performed the initial setup and loading of the dataset, we can move on to the next steps of building the Naïve Bayes model and predicting flight delays.

It is important to note that we need to split our data into two sets: a training set (tdata) and a validation set (vdata). We use tdata to fit the Naïve Bayes model and vdata to check how well the model predicts delays for flights it has not seen during training.
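One common way to make this split is to draw a random sample of row indices. The sketch below uses base R on an invented stand-in data frame; the 80/20 ratio and the seed value are assumptions, since the assignment does not specify either.

```r
set.seed(1234)  # arbitrary seed so the split is reproducible

# Toy stand-in with 100 invented rows; replace with the data read from FlightDelay.csv
tumor <- data.frame(
  delay   = factor(sample(c("delayed", "ontime"), 100, replace = TRUE)),
  weather = factor(sample(c(0, 1), 100, replace = TRUE))
)

n <- nrow(tumor)
train_idx <- sample(n, size = round(0.8 * n))  # 80% of the rows, chosen at random

tdata <- tumor[train_idx, ]   # training set: used to fit the model
vdata <- tumor[-train_idx, ]  # validation set: held out for prediction

nrow(tdata)
nrow(vdata)
```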

The dependent variable (y) in our model will be “delay”, indicating whether a flight experienced a delay or arrived on time. The independent variables we will be using are “dest”, “origin”, “carrier”, “deptime”, “weather”, and “dayweek”.

After building the Naïve Bayes model using the tdata, we will use the model to predict flight delays in the vdata. Finally, we will draw conclusions based on the results of our analysis.
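The fit-and-predict step might look like the sketch below, which uses naive_bayes() and predict() from the naivebayes package. The data are randomly generated stand-ins with the assignment's column names, so the numbers it prints say nothing about the real flights. The 80/20 split, the seed, and `laplace = 1` (add-one smoothing to avoid zero probabilities for unseen categories) are assumptions rather than requirements of the assignment.

```r
library(naivebayes)

set.seed(1234)
n <- 200

# Invented stand-in data; replace with the data frame read from FlightDelay.csv
tumor <- data.frame(
  dest    = factor(sample(c("EWR", "JFK", "LGA"), n, replace = TRUE)),
  origin  = factor(sample(c("BWI", "DCA", "IAD"), n, replace = TRUE)),
  carrier = factor(sample(c("DH", "DL", "US"), n, replace = TRUE)),
  deptime = as.numeric(sample(600:2100, n, replace = TRUE)),
  weather = factor(sample(c(0, 1), n, replace = TRUE, prob = c(0.8, 0.2)),
                   levels = c(0, 1)),
  dayweek = factor(sample(1:7, n, replace = TRUE), levels = 1:7)
)
# In this toy data, bad weather always causes a delay; otherwise 15% are delayed
tumor$delay <- factor(
  ifelse(tumor$weather == 1 | runif(n) < 0.15, "delayed", "ontime"),
  levels = c("delayed", "ontime")
)

# Split into training and validation sets
idx   <- sample(n, size = round(0.8 * n))
tdata <- tumor[idx, ]
vdata <- tumor[-idx, ]

# Fit the model on the training set
model <- naive_bayes(
  delay ~ dest + origin + carrier + deptime + weather + dayweek,
  data = tdata, laplace = 1
)

# Predict classes for the validation set, passing only the predictor columns
predictors <- c("dest", "origin", "carrier", "deptime", "weather", "dayweek")
pred <- predict(model, vdata[, predictors])

# Confusion matrix and overall accuracy on the validation set
conf_mat <- table(predicted = pred, actual = vdata$delay)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

The confusion matrix and the accuracy on vdata are the natural basis for the conclusion the assignment asks for: they show how often the model's predicted class matches the actual outcome, and which kind of error (missed delays or false alarms) is more frequent.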

To gain a better understanding of Naïve Bayes classification using R programming, a mandatory video is provided as a resource. We can watch the video at the provided link: https://www.youtube.com/watch?v=RLjSQdcg8AM
