EXECUTIVE SUMMARY
Your Team Name (pick one): Dataminers
⧠ Individual Submission
⧠ Group Submission. List group member names: Paulami Ray, Alejandro Jose Colindres Galindo, Allie Mouche
Case Overview
Concisely describe the problem and your objectives as the data analyst.
Problem Statement: Chess Bank processes home loan applications and grants loans based on the profit they are expected to bring the bank. The bank wants a list of the applicants to whom it should lend so that the total profit from these loans is maximized. The application file for 2017 contains 250 loan applications, including the loan amount and personal characteristics such as income, credit score, age, and education level.
Methodology
Describe the data approach and methodology you use. Justify your choice of methodology. You should be precise, but also avoid jargon. This is a document that will be going to the CFO, so make the language appropriate for them.
We analyzed the loan applications in the following steps:
• Data Cleaning: In our initial review, we found that the 2017 data set has 19 features and 250 applicants. The data set is complete and balanced, with no null values in any column, so no data cleaning was needed.
• Exploratory Analysis & Feature Engineering: The main objective of the case is to identify the applicants to whom the bank should grant a loan, based on the estimated profit on each account. We calculated the profit as the difference between the amount paid and the loan amount. We defined the following new features:
1) profit, a binary indicator that takes the value ‘1’ if the calculated profit is greater than or equal to zero and ‘0’ if it is less than zero.
2) meanincome, the average of the applicant's wage income 1 and 2 years ago.
3) li_ratio, the ratio of the loan amount to the applicant's mean income.
Integer-coded categorical variables such as statecode, married, educ, and taxdependent needed to be converted into factor variables. The main advantage of this conversion is that statistical models then treat these variables as categories rather than as numeric quantities.
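The three derived features can be illustrated with a short Python sketch. It is not the team's actual code: the field names loan_amount, amount_paid, wage_income_1yr_ago, and wage_income_2yr_ago are hypothetical stand-ins for the columns in the 2017 file, the helper engineer_features is invented for illustration, and the two example rows are made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Application:
    # All field names are hypothetical stand-ins for the 2017 file's columns.
    loan_amount: float
    amount_paid: float
    wage_income_1yr_ago: float
    wage_income_2yr_ago: float


def engineer_features(app: Application) -> dict:
    """Derive the three features described above for one application."""
    raw_profit = app.amount_paid - app.loan_amount
    meanincome = mean([app.wage_income_1yr_ago, app.wage_income_2yr_ago])
    return {
        "raw_profit": raw_profit,
        # Binary indicator: 1 if the loan at least breaks even, else 0.
        "profit": 1 if raw_profit >= 0 else 0,
        "meanincome": meanincome,
        # Loan-to-mean-income ratio; left undefined when mean income is zero.
        "li_ratio": app.loan_amount / meanincome if meanincome else None,
    }


# Made-up example rows, for illustration only (not taken from the bank's data).
examples = [
    Application(100_000, 112_000, 50_000, 46_000),
    Application(200_000, 150_000, 40_000, 40_000),
]
features = [engineer_features(a) for a in examples]
```

In this sketch the first example row is labeled profitable and the second is not, and a higher li_ratio means the loan is large relative to the applicant's recent income.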
• Model Selection and Evaluation: We used the following algorithms to predict whether a loan would be profitable:
1) Linear Regression: We first fitted a multiple linear regression model to see how well the independent variables predict the dependent variable, profit. Because the data set is relatively small, we split the loans_2017 data set into a training set with 90% of the records and a test set with the remaining 10%. The training set was used to fit the model, and the test set was used to evaluate it and to estimate its generalization error and accuracy. The linear model achieved an accuracy of 0.60.
2) Logistic Regression: We used logistic regression as our next predictive model because it is well suited to describing the relationship between a binary dependent variable and the independent variables. For this model we used cross-validation to resample the data set. The reported accuracy was 0.64.
3) Decision Tree: Lastly, we used a decision tree model. Decision trees perform feature selection well, which gives a clear understanding of which features are the important deciding factors in the model. Moreover, they are easy to interpret and to draw insights from. The reported accuracy of this model was 0.72.
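The evaluation ideas used above (a 90/10 hold-out split, cross-validation, and accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the team's code: the function names are hypothetical, the number of folds (5) and the 0.5 cut-off for turning a continuous score into a 0/1 prediction are assumptions the report does not state, and the scores and outcomes are made up.

```python
import random


def holdout_split(rows, test_fraction=0.10, seed=0):
    """Shuffle rows and hold out test_fraction of them for evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, test_idx) pairs so every row is tested exactly once."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def to_label(score, threshold=0.5):
    """Turn a continuous model score into a 0/1 prediction (assumed cut-off)."""
    return 1 if score >= threshold else 0


def accuracy(actual, predicted):
    """Share of applications whose 0/1 label was predicted correctly."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# 250 placeholder application ids, matching the size of the 2017 file.
train, test = holdout_split(range(250))
folds = list(k_fold_indices(250))

# Made-up scores and outcomes, for illustration only.
scores = [0.9, 0.2, 0.6, 0.4, 0.7]
actual = [1, 0, 0, 0, 1]
predicted = [to_label(s) for s in scores]
acc = accuracy(actual, predicted)
```

With 250 applications, a 10% hold-out set contains only 25 loans, so each misclassified loan moves the accuracy by 4 percentage points. Cross-validation reduces this sensitivity by testing every loan exactly once.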
Conclusion
Describe your results, including to how many people you intend to give loans, the total loan amounts, and the anticipated resulting profits. Describe your data file which reports to which individual loans should be given.