Reject Inference Technique

Predictive models are used to produce a “credit score” that estimates the probability of an application turning out good or bad at a future date. Different types of models are used in this context, the most common being a regression formula that expresses a target variable y as a weighted combination of explanatory variables X.

Here, y is our target variable, that is, whether the loan will be good or bad. It can be represented as a dummy variable, a probability unit (probit) or a logistic unit (logit). The X’s are the variables that the model considers in reaching the decision. Each X has a corresponding score that is activated when the variable attached to it is activated. The regression coefficients determine the weight of the respective variables, that is, the factor by which the score attached to each variable is weighted.
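For reference, a generic regression scorecard of the kind described here can be written as shown in this sketch. It is the standard textbook form, not necessarily the exact formula the original article displayed. The symbols \beta_0 (intercept), \beta_i (coefficients), n (number of variables), \varepsilon (error term) and p (probability of the modelled outcome) are assumed notation.

```latex
% Linear form: y is the target, X_i the explanatory variables
y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon

% Logistic-unit (logit) form: the log-odds are modelled as the weighted sum
\ln\!\left(\frac{p}{1 - p}\right) = \beta_0 + \sum_{i=1}^{n} \beta_i X_i
```

In the logit form, each term \beta_i X_i plays the role of the weighted score contributed by variable X_i.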

As discussed, there are different techniques for building the models. Broadly, they fall into two classes:

1.     Parametric techniques: These include discriminant analysis, linear probability modelling and logistic regression. They rely on strong assumptions that do not always hold.

2.     Non-parametric techniques: These avoid the problem of strong assumptions, as non-parametric models do not require them. Such models include neural networks, genetic algorithms, k-nearest neighbours, decision trees etc. However, they have their own challenges, for example a lack of transparency and a tendency to over-fit.

To obtain good predictions for the applications, we have other requirements besides the modelling technique:

1.     We require our models to be transparent

2.     The model structures should be easily analysable

3.     We need a sufficient amount of data

4.     The data should be relevant, accurate, consistent and complete

Even after all this, with current technologies there is a limit to how well we can predict. The margin by which we miss the truth is called the bias. Bias may arise in several ways:

1.     Data quality: Models rely heavily on data. As discussed, “hallmarked” data needs to be fed into the models. Sub-standard data compromises model performance. Major sources of issues:

a.      Missing data

b.     Misrepresentation

c.      Miscapture

2.     Omitted characteristics: Some variables may be hidden, unknown or unavailable, which leads to a sub-optimal model. Major sources of issues:

a.      Compliance

b.     Poor quality

c.      Lacking infrastructure

d.     Ignorance

3.     Sample selection: The modelling sample should be representative. Major sources of issues:

a.      Improper inclusions

b.     Improper exclusions

4.     Transformation: Data needs to be transformed so that the model can capture the relationships within it. Major sources of issues:

a.      No transformation

b.     Improper transformation

5.     Misapplication: Firms often borrow scorecards that were originally developed for other areas. Major sources of issues:

a.      Inappropriate use

b.     Shock events

To deal with the issue of sample selection, the reject inference technique is used. Our sample should ideally contain both “accepts” and “rejects”. However, the data used to build the scorecard contains only those applications that were funded, because only in the funded pool do we know the good-bad loan distribution. If we use this biased sample to build the scorecard, the results will be erroneous, since the model will also be used to score applicants who resemble those that were rejected and therefore left out of the sample. Reject inference helps here by providing inferred performance for non-funded applications.
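The effect of building on funded applications only can be illustrated with a small sketch. The applicant list, the scores and the cutoff of 650 are all made-up values, and the outcome of rejected applicants is assumed to be known here purely for illustration (in practice it is exactly what is missing).

```python
# Hypothetical applicants as (score, is_bad). In reality is_bad is only
# observed for funded applications; it is filled in for everyone here
# so that the bias can be made visible.
applicants = [
    (720, False), (700, False), (690, False), (680, True), (660, False),
    (640, False), (620, True), (600, False), (580, True), (560, True),
]
CUTOFF = 650  # assumed approval cutoff, not taken from the article


def bad_rate(records):
    """Return the share of records whose outcome is bad."""
    return sum(1 for _, is_bad in records if is_bad) / len(records)


accepts = [a for a in applicants if a[0] >= CUTOFF]
rejects = [a for a in applicants if a[0] < CUTOFF]

accepts_bad_rate = bad_rate(accepts)        # what the scorecard developer sees
rejects_bad_rate = bad_rate(rejects)        # normally unobserved
population_bad_rate = bad_rate(applicants)  # what the model will face

print(accepts_bad_rate, rejects_bad_rate, population_bad_rate)
```

In this toy data the accepts look much safer than the full applicant population, which is why a scorecard built on accepts alone can understate risk when applied to everyone who walks through the door.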

Any application received goes through certain steps for its risk to be assessed. The steps vary from one institution to another, but let’s say it goes through two steps before the risk is underwritten:

1.     Exclusions: Applications that do not meet set criteria are rejected right away. These applications might fall well short of regulatory requirements or the firm’s policies

2.     Risk assessment: Applications that pass the first step then go through the model, which calculates the probability of default (PD) for each of them. The PD is then used to assign each application a risk grade, which becomes a major input to the decision to approve or reject the loan
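The two steps above can be sketched as a small pipeline. Everything specific here is hypothetical: the `Application` fields, the exclusion rules, and the PD band edges for grades A to D are illustrative assumptions, and the PD is taken as given rather than computed by a real scorecard.

```python
from dataclasses import dataclass


@dataclass
class Application:
    applicant_age: int
    monthly_income: float
    pd_estimate: float  # probability of default, assumed to come from the scorecard


def passes_exclusions(app: Application) -> bool:
    """Step 1: hypothetical policy rules; real criteria vary by institution."""
    return app.applicant_age >= 18 and app.monthly_income > 0


def risk_grade(pd_estimate: float) -> str:
    """Step 2: map a PD to a risk grade using assumed, illustrative bands."""
    if pd_estimate < 0.02:
        return "A"
    if pd_estimate < 0.05:
        return "B"
    if pd_estimate < 0.10:
        return "C"
    return "D"


def assess(app: Application) -> str:
    """Run exclusions first, then grade the applications that remain."""
    if not passes_exclusions(app):
        return "excluded"
    return risk_grade(app.pd_estimate)


print(assess(Application(applicant_age=30, monthly_income=3000.0, pd_estimate=0.07)))
```

The grade is only an input to the final approve or reject decision; the decision rule itself is left out of the sketch because the text does not specify one.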

As discussed, the good-bad distribution is not known for rejected applications. Whether such applications would have been good or bad could only be known had they been approved. Nevertheless, they have gone through the risk scorecard, and similar applications will go through it again. Note that there may be good loans in this pool of rejected applications too. But since this sub-sample is not considered when constructing the risk scorecard, such applications are likely to be denied in the future as well. Therefore, to infer the performance of this sub-sample, whose good-bad distribution is not known, the Reject Inference (RI) technique is used. It allows us to build better scorecards using the entire set of applications (including those previously declined). In this way, the RI technique ensures that the modelling population better represents the actual applicant population.

The need for reject inference depends on factors such as:

1.     Heterogeneity of population

2.     Reject rates

3.     Effectiveness of current process

A statistical tool used to measure the appropriateness of RI is the known-to-inferred odds ratio. The higher the value, the greater the risk associated with the inferred group. The ratio is considered appropriate if it falls between 2 and 4.
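As a rough sketch, the ratio can be computed as follows, assuming it is defined as the good/bad odds of the known (funded) group divided by the good/bad odds of the inferred (rejected) group. The function names and the counts are hypothetical; only the 2 to 4 range comes from the text.

```python
def good_bad_odds(goods: int, bads: int) -> float:
    """Return the good/bad odds for a group."""
    if bads <= 0:
        raise ValueError("bads must be positive to compute odds")
    return goods / bads


def known_to_inferred_odds_ratio(known_goods, known_bads,
                                 inferred_goods, inferred_bads) -> float:
    """Odds of the known group divided by odds of the inferred group."""
    return (good_bad_odds(known_goods, known_bads)
            / good_bad_odds(inferred_goods, inferred_bads))


def is_plausible(ratio: float, low: float = 2.0, high: float = 4.0) -> bool:
    """Check the ratio against the 2-to-4 range mentioned in the text."""
    return low <= ratio <= high


# Made-up counts: known odds 9000/1000 = 9, inferred odds 1500/500 = 3.
ratio = known_to_inferred_odds_ratio(9000, 1000, 1500, 500)
print(ratio, is_plausible(ratio))
```

Under this definition a ratio well above 1 means the inferred rejects are treated as riskier than the funded accounts, which is what one would expect if the existing approval process has any discriminating power.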

[Figure: steps used to obtain the good-bad distribution for the rejected pool using the RI method]

There are three main types of performance manipulation tools:

1.     Reweighting

2.     Reclassification

3.     Parcelling

a.      Random (left)

b.     Polarised; some say simple augmentation (centre)

c.      Fuzzy (right)
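The three parcelling variants can be sketched as below. The definitions follow common textbook descriptions and are an assumption here, since the text only names the variants. Each reject is assumed to carry `p_bad`, a probability of being bad estimated from a model built on the funded accounts; the function names, the 0.5 cutoff and the sample probabilities are hypothetical.

```python
import random


def random_parcelling(p_bads, seed=0):
    """Label each reject 'bad' with probability p_bad, otherwise 'good'."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return ["bad" if rng.random() < p else "good" for p in p_bads]


def polarised_parcelling(p_bads, cutoff=0.5):
    """Label every reject at or above the cutoff 'bad', the rest 'good'."""
    return ["bad" if p >= cutoff else "good" for p in p_bads]


def fuzzy_parcelling(p_bads):
    """Split each reject into a partial 'bad' and a partial 'good' record,
    weighted by p_bad and 1 - p_bad respectively."""
    records = []
    for p in p_bads:
        records.append(("bad", p))
        records.append(("good", 1.0 - p))
    return records


reject_p_bads = [0.10, 0.45, 0.80]  # made-up estimates for three rejects

random_labels = random_parcelling(reject_p_bads)
polarised_labels = polarised_parcelling(reject_p_bads)  # ['good', 'good', 'bad']
fuzzy_records = fuzzy_parcelling(reject_p_bads)

print(random_labels)
print(polarised_labels)
print(fuzzy_records)
```

Random and polarised parcelling give each reject a single hard label, while fuzzy parcelling keeps the uncertainty by carrying two weighted records per reject into the modelling sample.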


