REQUEST FOR PROPOSAL
RFP #: IP – F3.H2
TITLE: BANKING INSURANCE PRODUCT – PHASE 2
Banking Insurance Product – Phase 2: IP – F3.H2
Purpose
By responding to this Request for Proposal (RFP), the Proposer agrees that s/he has read and understood all documents within this RFP package.
Submission Details
Responders to this RFP should supply:
• A business report up to 4 pages (not including cover page, table of contents, or any needed appendix), including any supporting plots and tables.
• The commented code used to produce the results.
The report should address all points described in the “Objective” section below.
The report should be returned in the following way:
• Electronic (submit via Moodle)
Background
The Commercial Banking Corporation (hereafter the “Bank”), acting by and through its department of Customer Services and New Products, is seeking proposals for banking services. The Bank ultimately wants to predict which customers will buy a variable rate annuity product. Previously, the Bank sought consulting work on the same project, with an additional focus on understanding the factors involved. In this phase, the focus is more on predictive power.
A variable annuity offers a range of investment options. The value of your investment as a variable annuity owner will vary depending on the performance of the investment options you choose. The investment options for a variable annuity are typically mutual funds that invest in stocks, bonds, money market instruments, or some combination of the three. If you are interested in more information, see:
http://www.sec.gov/investor/pubs/varannty.htm
The project will be broken down into 3 phases:
• Phase 1 – MARS and GAMs
• Phase 2 – Tree-Based Models
• Phase 3 – Model Interpretation
Objective – Phase 2
The scope of services in this phase includes the following:
• For this phase use only the insurance_t data set.
• Previous analysis has identified potential predictor variables related to the purchase of the insurance product so no initial variable selection before model building is necessary.
• The data has missing values that need to be imputed.
o Typically, the Bank has used median and mode imputation for continuous and categorical variables, respectively, but is open to other techniques if they are justified in the report.
• The Bank is interested in the value of random forest models.
o Build a random forest model.
§ (HINT: You CANNOT just copy and paste the code from class. In class we built a model to predict a continuous variable. Make sure your target variable is a factor for the random forest.)
o Tune the model parameters and recommend a final random forest model.
§ You are welcome to consider variable selection as well for building your final model. Describe your process for arriving at your final model.
o Report the variable importance for each of the variables in the model.
§ Pick one metric to rank the variables by – no need to report multiple metrics for each variable.
o Report the area under the ROC curve as well as a plot of the ROC curve.
§ (HINT: Use the same approaches you used back in the logistic regression class.)
• The Bank is also interested in the value of an XGBoost model.
o Build an XGBoost model.
§ (HINT: You CANNOT just copy and paste the code from class. In class we built a model to predict a continuous variable. You will need to look up the documentation for the objective = "binary:logistic" option.)
§ Use the area under the ROC curve (AUC) as your evaluation metric instead of the default in XGBoost.
o Tune the model parameters and recommend a final XGBoost model.
§ You are welcome to consider variable selection as well for building your final model. Describe your process for arriving at your final model.
o Report the variable importance for each of the variables in the model.
o Report the area under the ROC curve as well as a plot of the ROC curve.
§ (HINT: Use the same approaches you used back in the logistic regression class.)
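Both models above are to be reported with an area under the ROC curve and a ROC plot. As a library-independent illustration of what those two deliverables compute, here is a minimal Python sketch. It is not the code from class and not a required approach. The function names roc_points and auc, and the toy labels and predicted probabilities, are assumptions made up for this example.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per distinct score."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one buyer and one non-buyer")
    # Sort by descending score so lowering the threshold admits one score group at a time.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Observations with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under a sequence of ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical data: INS-style labels (1 = bought, 0 = did not buy)
# and made-up predicted probabilities from some fitted model.
ins = [1, 0, 1, 1, 0, 0, 1, 0]
p_hat = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

curve = roc_points(ins, p_hat)
print(auc(curve))  # 0.6875
```

Plotting the pairs in curve (false positive rate on the horizontal axis, true positive rate on the vertical axis) gives the ROC curve. In practice the predicted probabilities would come from the tuned random forest or XGBoost model, and an established ROC routine from the chosen modeling environment would normally replace this hand-rolled version.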
Data Provided
The following two sets of data are provided for the proposal:
• The training data set insurance_t contains 8,495 observations and selected variables.
o All of these customers have been offered the product. The outcome is recorded in the variable INS, which takes a value of 1 if they bought and 0 if they did not buy.
o There are selected variables describing the customer’s attributes before they were offered the new insurance product.
• The validation data set insurance_v contains 2,124 observations and selected variables.
• The table below lists the name, model role, and description of each variable found in both data sets.
o Except for Branch of Bank, consider anything with more than 10 distinct values as continuous.
Name      Model Role   Description
ACCTAGE   Input        Age of oldest account
DDA       Input        Indicator for checking account
DDABAL    Input        Checking account balance
DEP       Input        Checking deposits
DEPAMT    Input        Total amount deposited
CHECKS    Input        Number of checks written
DIRDEP    Input        Indicator for direct deposit
NSF       Input        Number of insufficient fund issues
NSFAMT    Input        Amount of NSF
PHONE     Input        Number of telephone banking interactions
TELLER    Input        Number of teller visit interactions
SAV       Input        Indicator for savings account
SAVBAL    Input        Savings account balance
ATM       Input        Indicator for ATM interaction
ATMAMT    Input        Total ATM withdrawal amount
POS       Input        Number of point of sale interactions
POSAMT    Input        Total amount for point of sale interactions
CD        Input        Indicator for certificate of deposit account
CDBAL     Input        CD balance
IRA       Input        Indicator for retirement account
IRABAL    Input        IRA balance
INV       Input        Indicator for investment account
INVBAL    Input        INV balance
MM        Input        Indicator for money market account
MMBAL     Input        MM balance
MMCRED    Input        Number of money market credits
CC        Input        Indicator for credit card
CCBAL     Input        CC balance
CCPURC    Input        Number of credit card purchases
SDB       Input        Indicator for safety deposit box
INCOME    Input        Income
LORES     Input        Length of residence in years
HMVAL     Input        Value of home
AGE       Input        Age
CRSCORE   Input        Credit score
INAREA    Input        Indicator for local address
INS       Target       Indicator for purchase of insurance product
BRANCH    Input        Branch of bank
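The rule that anything with more than 10 distinct values is continuous (Branch of Bank excepted), combined with the Bank's usual median and mode imputation, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Bank's procedure or the code from class. The helper names is_continuous and impute are hypothetical, missing values are assumed to be represented as None, and the toy values are invented (only the column names come from the table).

```python
from statistics import median, mode
from typing import Any, Dict, List

Column = List[Any]


def is_continuous(name: str, values: Column, max_levels: int = 10) -> bool:
    """More than max_levels distinct non-missing values means continuous, except BRANCH."""
    if name == "BRANCH":
        return False
    distinct = {v for v in values if v is not None}
    return len(distinct) > max_levels


def impute(table: Dict[str, Column]) -> Dict[str, Column]:
    """Return a copy with None replaced by the median (continuous) or mode (categorical)."""
    filled: Dict[str, Column] = {}
    for name, values in table.items():
        observed = [v for v in values if v is not None]
        if not observed:
            # Nothing to learn a fill value from, so leave the column unchanged.
            filled[name] = list(values)
            continue
        fill = median(observed) if is_continuous(name, values) else mode(observed)
        filled[name] = [fill if v is None else v for v in values]
    return filled


# Invented toy rows; None marks a missing value.
toy = {
    "ACCTAGE": [0.5, 1.0, None, 2.5, 3.0, 4.5, 6.0, 7.0, 8.5, 9.0, 10.5, 12.0, 14.0],
    "DDA": [1, 1, 0, 1, None, 1, 0, 1, 1, 0, 1, 1, 0],
    "BRANCH": ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B9", "B10", "B11", "B1", None],
}

filled = impute(toy)
print(filled["ACCTAGE"][2])   # 6.5 (median of the 12 observed values)
print(filled["DDA"][4])       # 1 (most frequent value)
print(filled["BRANCH"][12])   # B1 (BRANCH is treated as categorical despite 11 levels)
```

When a validation set such as insurance_v is scored later, a common choice is to reuse the fill values learned from the training data rather than recomputing them, so that both data sets are treated identically.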



