The uscrime dataset has 15 predictors and one response. In this homework, the best factors that describe
the response are found using the variable selection methods Stepwise Regression, LASSO, and Elastic
Net. To do Stepwise Regression, the function regsubsets in R is used. This function has the argument
nvmax, which sets the maximum number of factors used to fit a model. Since the dataset has 15
predictors, nvmax = 15 is set so that models of every size, from one factor up to all 15 factors, are considered.
The method argument is set to seqrep, which refers to sequential
replacement, a combination of forward selection and backward elimination.
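The search described above can be sketched roughly as follows. This is a minimal illustration, not the homework's actual script: regsubsets comes from the leaps package, and because the path to the uscrime file is not given here, a simulated data frame with 47 rows and 15 predictors stands in for it. The column names x1 to x15 and the response y are hypothetical.

```r
# Sketch only: simulated data stands in for uscrime (assumed 47 rows, 15 predictors).
library(leaps)

set.seed(1)
n <- 47
p <- 15
X <- as.data.frame(matrix(rnorm(n * p), n, p))
names(X) <- paste0("x", 1:p)                 # hypothetical predictor names
# Hypothetical response that depends on three of the predictors
X$y <- 2 * X$x1 - 1.5 * X$x4 + X$x9 + rnorm(n)

# nvmax = 15 lets the search consider every model size up to 15 predictors;
# method = "seqrep" requests sequential replacement.
search <- regsubsets(y ~ ., data = X, nvmax = 15, method = "seqrep")
res <- summary(search)

# Pick the model size with the largest adjusted R^2
best_size <- which.max(res$adjr2)
# res$which is a logical matrix (one row per size); drop the intercept column
best_vars <- names(which(res$which[best_size, -1]))

print(best_size)
print(best_vars)
```

With the real data, the formula would use the actual response column in place of y, and plotting res$adjr2 against model size would give a curve like the one described for Figure 2.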
For each number of factors, the best factors that the method recommends
can be seen in Figure 1. The method also gives the adjusted R^2 for each number of factors, which
is shown in Figure 2. From 5 factors through 15 factors, except when 10
factors are considered, the adjusted R^2 values are very similar. Nonetheless, the model with the largest adjusted R^2
is selected, which is the one with 8 predictors. These predictors are M, Ed, Po1, M.F, U1, U2, Ineq, and Prob.
The function regsubsets does not fit a linear model. Therefore, using the factors selected by Stepwise
Regression, a linear model is fitted with the function lm. This model has an adjusted R^2 of 0.7444 and
is given by equation 1. The predictors M.F and U1 have a p-value > 0.05, so they are removed to
see if the model improves. However, the reduced model has a lower adjusted R^2 of 0.7307, which can be interpreted in
two ways. Either both predictors are relevant for describing the response variable, or the model
is overfitted. The latter is probably the case, since overfitting on this dataset was observed in past homeworks. The dataset
has too few data points to draw a firm conclusion. The predictions made by the model, compared to
the original response variable, can be seen in Figure 3.
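\[
\begin{aligned}
y ={} & 93.32\,\mathrm{M} + 180.12\,\mathrm{Ed} + 102.65\,\mathrm{Po1} + 22.34\,\mathrm{M.F} - 6086.63\,\mathrm{U1} \\
      & + 187.35\,\mathrm{U2} + 61.33\,\mathrm{Ineq} - 3796.03\,\mathrm{Prob} - 2802.1126335
\end{aligned}
\tag{1}
\]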
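The refit-and-prune step described above (fit with lm, drop predictors whose p-value exceeds 0.05, then compare adjusted R^2) can be sketched as follows. This is an illustrative sketch on simulated data, not the homework's code: the data frame d and the columns a, b, c, and y are hypothetical stand-ins for the selected uscrime predictors and the response.

```r
# Sketch only: simulated data with hypothetical columns a, b, c and response y.
set.seed(2)
n <- 47
d <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
d$y <- 3 * d$a + 0.2 * d$b + rnorm(n)

# Fit the full linear model on the selected predictors
full <- lm(y ~ a + b + c, data = d)

# p-values of the predictors (first row is the intercept, so drop it)
pvals <- summary(full)$coefficients[-1, "Pr(>|t|)"]

# Keep only predictors with p-value <= 0.05 and refit
keep <- names(pvals)[pvals <= 0.05]
reduced <- lm(reformulate(keep, response = "y"), data = d)

# Compare adjusted R^2 of the two models
adj <- c(full = summary(full)$adj.r.squared,
         reduced = summary(reduced)$adj.r.squared)
print(adj)
```

A drop in adjusted R^2 after pruning, as reported in the text, means the removed predictors explained more variance than their degrees of freedom cost. With few data points, that alone does not settle whether they are truly relevant or whether the larger model is overfitted.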