CSC498H/BIF524 Solved

Task I: Data Preprocessing
Suppose that you want to build a model to predict whether a client has health insurance coverage. You collected a set of information that you think can help predict the probability of health coverage.
1- Load the "Clients" dataset into R. What does the collected information on each client consist of?
2- Using the summary function, comment on age, income, and housingStatus.
3- Based on a subset of the data with 0 < age < 100 and income > 0, is there a correlation between age and income? Plot both variables against each other and comment on the variations.
4- What is the number of missing values in the housingStatus, recentlyChangedHousing, and numberCars variables? Find out whether those missing values come from the same observations. Comment on how to deal with them accordingly.
5- Now check the variable works. What can you say about the number of missing values? Does it make sense to remove all observations with NAs? If you think about the possible meaning of NA for this variable, it seems more reasonable to create a new variable fixedWorks with an additional level "missing" for those observations, while keeping "employed" and "not employed" for the two other possible levels. Find a way to define such a variable in R. Hint: use ifelse.
6- What is the type of the variable income? What is the number of missing values in this variable? Assuming that the observations with missing income values have the same distribution as clients with specified income values, create a new variable fixedIncome in which missing income values are replaced by the mean income. Hint: you can also use ifelse.
7- Use the function cut to define a new variable age.range by dividing age into ranges, with breaks 0, 25, 65, and Inf.
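
The subsetting, ifelse, and cut items above can be sketched on a small stand-in data frame. This is only an illustration under stated assumptions: the real Clients data is not shown here, so the rows below are invented, and works is assumed to be a logical column (TRUE, FALSE, or NA). Only the column names come from the task text.

```r
# Hypothetical stand-in for the Clients data. Column names follow the task
# text; the values are invented, and `works` is ASSUMED to be logical.
clients <- data.frame(
  age    = c(22, 40, 70, 35, 58),
  income = c(30000, NA, 52000, 41000, NA),
  works  = c(TRUE, NA, FALSE, TRUE, NA)
)

# Item 3: keep rows with 0 < age < 100 and income > 0, then correlate.
# subset() drops rows where the condition evaluates to NA.
valid <- subset(clients, age > 0 & age < 100 & income > 0)
cor(valid$age, valid$income)

# Count missing values per column.
colSums(is.na(clients))

# fixedWorks: three-level factor in which NA becomes its own level "missing".
clients$fixedWorks <- factor(
  ifelse(is.na(clients$works), "missing",
         ifelse(clients$works, "employed", "not employed"))
)

# fixedIncome: fill missing incomes with the mean of the observed incomes.
mean_income <- mean(clients$income, na.rm = TRUE)
clients$fixedIncome <- ifelse(is.na(clients$income), mean_income, clients$income)

# age.range: bin age using breaks at 0, 25, 65 and Inf.
clients$age.range <- cut(clients$age, breaks = c(0, 25, 65, Inf))

table(clients$fixedWorks)
table(clients$age.range)
```

Keeping the original works and income columns next to the fixed versions makes it easy to check afterwards which observations were imputed.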

Task II: The goal is to generate a model to predict diamond prices based on a set of features.
1- Use ggplot to:
a. Plot price against carat.
b. Plot cut against price.
2- Generate a simple linear regression model with price as response and carat as predictor. Comment on the model summary.
3- Generate a multiple linear regression model with price as response and carat, clarity, and color as predictors. Comment on the model summary.
4- Generate a multiple linear regression model with price as response and all attributes in the dataset as predictors. Comment on the model summary and compare it to the previous model.
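
The three regression items can be sketched with lm on simulated data. The text does not name the dataset, so the data frame toy below is a hypothetical stand-in that borrows the column names price, carat, clarity, and color; its values and the coefficients used to simulate price are invented for illustration.

```r
# Simulated stand-in data; NOT the dataset used in the task.
set.seed(1)
n <- 200
toy <- data.frame(
  carat   = runif(n, 0.2, 2),
  clarity = factor(sample(c("SI1", "VS1", "IF"), n, replace = TRUE)),
  color   = factor(sample(c("D", "G", "J"), n, replace = TRUE))
)
# Invented relationship: price grows with carat, plus a bonus for clarity "IF".
toy$price <- 4000 * toy$carat + 300 * (toy$clarity == "IF") + rnorm(n, sd = 500)

# Simple linear regression: price explained by carat only.
fit_simple <- lm(price ~ carat, data = toy)

# Multiple regression with the three named predictors.
fit_multi <- lm(price ~ carat + clarity + color, data = toy)

# "price ~ ." uses every other column as a predictor. In this toy frame that
# is the same set as fit_multi; on a wider dataset it would add more terms.
fit_full <- lm(price ~ ., data = toy)

summary(fit_simple)

# Compare the fits by adjusted R-squared, and the nested pair with an F-test.
c(simple = summary(fit_simple)$adj.r.squared,
  multi  = summary(fit_multi)$adj.r.squared,
  full   = summary(fit_full)$adj.r.squared)
anova(fit_simple, fit_multi)
```

Adjusted R-squared is used for the comparison because plain R-squared never decreases when predictors are added, so on its own it cannot show whether the extra terms are worth keeping.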
