Does anyone know anything about logistic regression, ROC Curves, and AUC (Area Under the Curve)?
I'm working on a project that I gave myself at work. Basically this is what it entails:
1) Get default data and the principal and APR associated with it.
2) Use logistic regression to forecast default rates based on principal and APR.
3) Express the default rate as a function of average loan size and interest rate (yes, I know APR and interest rate are technically not the same thing, but I figured this would be easiest). Then treat average loan size (or its determinants) and interest rate as decision variables, and optimize net income for one report and cash for another report.
The hoped-for outcome is that we optimize the loan amounts and interest rates we charge based on the financials.
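To make step 3 concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the coefficients `B0`, `B_PRINCIPAL` and `B_APR` are made up (the real ones would come from the fitted regression), the income formula is a simplified one-year view where a default loses the whole principal, and the grid of loan sizes and rates is arbitrary. It also ignores the fact that demand for loans changes with the rate.

```python
import math

# Hypothetical coefficients; in practice these would come from the fitted model.
B0, B_PRINCIPAL, B_APR = -4.0, 0.0004, 6.0

def default_prob(principal, apr):
    """Logistic default probability as a function of principal and APR."""
    z = B0 + B_PRINCIPAL * principal + B_APR * apr
    return 1.0 / (1.0 + math.exp(-z))

def expected_income(principal, apr):
    """Simplified one-year view: interest earned if repaid, principal lost on default."""
    p = default_prob(principal, apr)
    return (1.0 - p) * principal * apr - p * principal

def best_terms(principals, aprs):
    """Brute-force grid search for the (principal, apr) pair with the highest expected income."""
    candidates = ((pr, a) for pr in principals for a in aprs)
    return max(candidates, key=lambda t: expected_income(*t))

principals = [500 * k for k in range(1, 21)]  # 500 .. 10000 (arbitrary grid)
aprs = [0.05 * k for k in range(1, 13)]       # 0.05 .. 0.60 (arbitrary grid)

best = best_terms(principals, aprs)
print("best (principal, apr):", best)
print("expected income per loan:", round(expected_income(*best), 2))
```

A grid search is crude, but with only two variables it is easy to check by eye, and the same `expected_income` function could be swapped for a cash-based objective for the second report.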
Now, this is all well and good, BUT I ran a logistic regression on the aforementioned data and it doesn't seem very predictive. One way to measure the strength of the model is the ROC curve, which plots the true positive rate against the false positive rate at every possible cutoff on the model's predicted probability, comparing predictions with actual results. The AUC (area under the curve) summarizes this: the higher it is, the better the model separates defaults from non-defaults. An AUC of 50% means there's no predictive power, so the model might as well be a coin flip. I got about 62%, which is generally considered poor.
Does anyone know of a way that I can adjust either the model or, more usefully, the probability calculation to make it work? My model is underpredicting the number of defaults. I was considering scaling the predictions up by that shortfall, but then I realized I could quite easily end up with probabilities over 100%.
So either the data is bad, or defaults really aren't predicted by principal and interest (at least at my company), or I'm missing something here, such as other variables. Can anyone help? Is this analysis just not feasible?
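For reference, here is a minimal sketch of the two ideas above, on made-up data. The `auc` function uses the pairwise definition (the chance that a randomly chosen defaulter gets a higher score than a randomly chosen non-defaulter). The adjustment is one possible alternative to multiplying the probabilities: add a constant to the log-odds instead, which can never push a probability past 100%. The helper names, the bisection search and the sample numbers are all my own assumptions.

```python
import math

def auc(labels, scores):
    """Share of (defaulter, non-defaulter) pairs where the defaulter scores higher; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def shift_log_odds(p, delta):
    """Add delta to the log-odds of p (p strictly between 0 and 1); the result stays in (0, 1)."""
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-(logit + delta)))

def calibrate_shift(probs, target_rate, lo=-20.0, hi=20.0, iters=100):
    """Bisection for the delta that makes the mean shifted probability equal target_rate."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        mean = sum(shift_log_odds(p, mid) for p in probs) / len(probs)
        if mean < target_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Made-up illustration data: 1 = default, 0 = repaid.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.05, 0.10, 0.12, 0.15, 0.20, 0.08, 0.07, 0.30]

target = sum(labels) / len(labels)  # observed default rate
delta = calibrate_shift(probs, target)
adjusted = [shift_log_odds(p, delta) for p in probs]

print("AUC before:", auc(labels, probs))
print("AUC after: ", auc(labels, adjusted))
print("max adjusted probability:", max(adjusted))
```

Note that the shift keeps the ranking of loans the same, so it fixes the overall level of the predictions but leaves the AUC where it was. Getting the AUC up would need a better model, not a rescaling.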
Thanks!