Reject Inference Technique
Predictive models are used to form a “credit score” that determines the probability of an application being good/bad at a future date. Different types of models are used in this context, the most common being a regression formula of the form

y = b0 + b1*x1 + b2*x2 + ... + bn*xn

Here, y is our target variable, that is, whether the loan would be good or bad. It could be represented through a dummy variable, a probability unit (probit) or a logistic unit (logit). The x's are the variables the model considers to reach the decision. Each of these x's has a corresponding score that gets activated when the variable attached to it is activated, and the regression coefficients b1, ..., bn determine the weightage of the respective variables, that is, the factor by which the score attached to each variable is scaled.
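For example, here is a minimal sketch in Python of how such a score could be computed; the characteristic names and coefficient values are made up for illustration and would in practice come from a fitted model:

```python
import math

# Hypothetical coefficients (weights); in practice these are estimated,
# e.g., by fitting a logistic regression on historical loan performance.
coefficients = {
    "intercept": -2.0,
    "age_25_to_35": 0.40,       # dummy: 1 if the applicant is aged 25-35
    "owns_home": 0.65,          # dummy: 1 if the applicant owns a home
    "prior_delinquency": -1.10, # dummy: 1 if a past delinquency exists
}

def probability_good(application):
    """Weighted sum of the activated dummies, passed through a logistic unit."""
    z = coefficients["intercept"]
    for name, weight in coefficients.items():
        if name != "intercept":
            z += weight * application.get(name, 0)
    return 1.0 / (1.0 + math.exp(-z))  # logistic unit maps the score into (0, 1)

# A 30-year-old homeowner with no past delinquency.
print(probability_good({"age_25_to_35": 1, "owns_home": 1, "prior_delinquency": 0}))
```

Each activated dummy contributes its weighted score to the total, and the logistic unit converts that total into a probability of the loan being good.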
As discussed, there are different techniques to form the models. Broadly, they can be classified into two:
1. Parametric techniques: these include discriminant analysis, linear probability modelling and logistic regression. Such techniques require strong assumptions which need not always hold.
2. Non-parametric techniques: these overcome the problem of strong assumptions, since non-parametric models do not require them. Such models include neural networks, genetic algorithms, k-nearest neighbours, decision trees, etc. However, they have their own set of challenges, for example, lack of transparency and a potential over-fitting problem (see the sketch after this list).
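To make the distinction concrete, here is a small sketch (assuming scikit-learn and purely synthetic data) that fits one model from each family on the same sample:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # parametric
from sklearn.tree import DecisionTreeClassifier      # non-parametric
from sklearn.model_selection import train_test_split

# Synthetic data standing in for application characteristics and outcomes.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LogisticRegression(), DecisionTreeClassifier(max_depth=4)):
    model.fit(X_train, y_train)
    print(type(model).__name__, "accuracy:", model.score(X_test, y_test))
```

The logistic regression yields transparent coefficients but assumes a particular functional form; the tree makes no such assumption but must be kept shallow to avoid over-fitting.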
In order to obtain good predictions for the applications, we also have other requirements alongside the modelling techniques:
1. We require our models to be transparent.
2. The model structures should be easily analysable.
3. We need a sufficient amount of data.
4. The data should be relevant, accurate, consistent and complete.
Even after all this, with current technologies there is only an extent to which we can predict. The margin by which we miss the truth is called the bias. There are many ways in which bias may arise:
1. Data quality: models rely heavily on data. As discussed, “hallmarked” data is required to be fed into the models; sub-standard data leads to compromises in model performance. Major sources of issues:
   a. Missing data
   b. Misrepresentation
   c. Miscapture
2. Omitted characteristics: some variables may be hidden, not known or unavailable, creating sub-optimality. Major sources of issues:
   a. Compliance
   b. Poor quality
   c. Lacking infrastructure
   d. Ignorance
3. Sample selection: the modelling sample should be representative. Major sources of issues:
   a. Improper inclusions
   b. Improper exclusions
4. Transformation: data needs to be transformed for the model to capture the underlying relationships. Major sources of issues:
   a. No transformation
   b. Improper transformation
5. Misapplication: many times, firms may borrow scorecards that were originally developed for other areas. Major sources of issues:
   a. Inappropriate use
   b. Shock events
To deal with the issue of sample selection, the reject inference technique is used. Our sample should ideally contain both “accepts” and “rejects”. However, the data used to form the scorecard contains only those applications that were funded, because only in the funded application pool do we know the good-bad loan distribution. If we use this biased sample to form the scorecard, the results would be erroneous, since the model would also be used to assess applications similar to those that were previously ignored. Reject inference offers help in this regard by providing inferred performance for the non-funded applications.
So, any received application goes through certain steps for its risk to be assessed. The steps vary from institution to institution, but let's say an application goes through two steps before the risk is underwritten:
1. Exclusions: applications that do not meet set criteria are rejected right away. These applications might be very poor in terms of regulatory requirements or the firm's policies.
2. Risk assessment: applications that pass the first step then go through the model, which calculates a probability of default (PD) for each of them. The PD is used to place applications on a risk grade and consequently becomes a major input to the loan approval/rejection decision (a sketch of this two-step flow follows below).
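Here is a sketch of this two-step flow; the policy rules, PD model and grade cut-offs are invented placeholders, since the real ones are institution-specific:

```python
def passes_exclusions(app):
    """Step 1: hard regulatory/policy rules; failing any one rejects outright."""
    return app.get("age", 0) >= 18 and not app.get("on_sanctions_list", False)

def risk_grade(pd_value):
    """Step 2: map the model's probability of default (PD) to a risk grade."""
    if pd_value < 0.02:
        return "A"
    if pd_value < 0.05:
        return "B"
    if pd_value < 0.10:
        return "C"
    return "Decline"

def assess(app, pd_model):
    if not passes_exclusions(app):
        return "Decline (exclusion)"
    return risk_grade(pd_model(app))

# Usage with a stand-in PD model that returns a flat 4% PD.
print(assess({"age": 30}, pd_model=lambda app: 0.04))  # -> "B"
```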
As discussed, there are other, rejected applications whose good-bad distribution is not known; whether they would have turned out good or bad could only be known if they had been approved. Yet similar applications will keep arriving and will have to go through the risk scorecard as well. Note that there might be good loans in this pool of rejected applications, but since this sub-sample is not considered in constructing the risk scorecard, such loan applications stand a chance of being denied in the future too. Therefore, to infer the good-bad distribution of this sub-sample, the Reject Inference (RI) technique is used. It allows us to form better scorecards using the entire set of applications (including those previously declined). In this way, the RI technique ensures that the modelling population is a better representation of the actual scenario.
The need for reject inference depends on factors like:
1. Heterogeneity of the population
2. Reject rates
3. Effectiveness of the current process
A statistical tool used to measure the appropriateness of RI is the known-to-inferred odds ratio. The higher the value, the greater the risk associated with the inferred group. The ratio is considered appropriate if it falls between 2 and 4.
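As a worked example, assuming the ratio is defined as the good:bad odds among the known (funded) accounts divided by the good:bad odds among the inferred rejects:

```python
def known_to_inferred_odds_ratio(known_goods, known_bads, inferred_goods, inferred_bads):
    known_odds = known_goods / known_bads          # good:bad odds, funded pool
    inferred_odds = inferred_goods / inferred_bads # good:bad odds, inferred pool
    return known_odds / inferred_odds

# Accepts: 9,000 good / 1,000 bad      -> odds of 9:1
# Inferred rejects: 750 good / 250 bad -> odds of 3:1
print(known_to_inferred_odds_ratio(9000, 1000, 750, 250))  # 3.0, within the 2-4 band
```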
To obtain the good-bad distribution for the rejected pool, the RI method scores the rejects with the known good-bad model and then assigns them an inferred performance using a performance manipulation tool. There are 3 main types of performance manipulation tools:
1. Reweighting
2. Reclassification
3. Parcelling, which comes in three flavours:
   a. Random
   b. Polarised (some say simple augmentation)
   c. Fuzzy
A sketch of the random parcelling variant follows this list.
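As an illustration, here is a minimal sketch of the random variant of parcelling; the score bands, per-band bad rates and the "inflation" factor are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_parcelling(reject_scores, band_edges, band_bad_rates, inflation=1.25):
    """Assign inferred good/bad labels to rejects, score band by score band.

    Within each band, labels are drawn randomly at the band's observed bad
    rate, inflated because rejects are expected to perform somewhat worse
    than accepts with the same score.
    """
    bands = np.digitize(reject_scores, band_edges)
    bad_probs = np.clip(band_bad_rates[bands] * inflation, 0.0, 1.0)
    return (rng.random(len(reject_scores)) < bad_probs).astype(int)  # 1 = bad

# Hypothetical rejects scored by the known good-bad model.
scores = rng.uniform(300, 700, size=10)
edges = np.array([400, 500, 600])               # four score bands
bad_rates = np.array([0.30, 0.15, 0.07, 0.03])  # bad rate per band among accepts
print(random_parcelling(scores, edges, bad_rates))
```

In the polarised (simple augmentation) variant, rejects above or below a cut-off would instead be labelled good or bad deterministically, while in the fuzzy variant each reject enters the modelling sample twice, once as good and once as bad, weighted by its estimated good and bad probabilities.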