Glossary – Predictive Modeling

Predictive Modeling Terms:

Backtesting: The process of using historical data to test the performance of a predictive model.
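
As an illustration only, a walk-forward backtest can be sketched as follows. The "model" here is a hypothetical stand-in (the mean of all earlier observations), and the function name, the `min_train` parameter, and the sample data are assumptions made for this example rather than a prescribed method.

```python
def walk_forward_backtest(series, min_train=3):
    """Forecast each period using only earlier data, then score the forecasts."""
    forecasts, actuals = [], []
    for i in range(min_train, len(series)):
        train = series[:i]                    # data available before period i
        forecast = sum(train) / len(train)    # hypothetical model: historical mean
        forecasts.append(forecast)
        actuals.append(series[i])
    # Mean absolute error between forecasts and what actually happened
    mae = sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(forecasts)
    return forecasts, actuals, mae


history = [10.0, 12.0, 11.0, 13.0, 12.0]      # made-up historical values
forecasts, actuals, mae = walk_forward_backtest(history)
print(forecasts, actuals, mae)
```

The key design point is that each forecast sees only data from before the period being predicted, which avoids look-ahead bias.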

Censoring: In survival analysis, censoring occurs when the time to an event of interest is not observed for some subjects in a study. Censoring occurs when a subject leaves the study before the event of interest occurs, or when the study ends before the event of interest occurs for all subjects.

Censoring is a common issue in survival analysis because it is often impossible to observe the exact time to event for all subjects. For example, in a study of cancer patients, some patients may be lost to follow-up before the end of the study period. Their time to death (or other event of interest) is then known only to exceed the time of their last recorded contact.

Censoring can be classified into three types:

Right-censoring: This is the most common type of censoring in survival analysis. Right-censoring occurs when the event of interest has not occurred for some subjects at the end of the study period. For example, if a study follows patients for five years and some patients are still alive at the end of the study, their time to death is right-censored.
Left-censoring: This occurs when the event of interest has already occurred for some subjects before they enter the study, so the event time is known only to be earlier than the time of entry. For example, if a study examines the time to HIV diagnosis, subjects who are already diagnosed with HIV before the study begins are left-censored.
Interval-censoring: This occurs when the exact time to the event of interest is not known, but it is known to have occurred within a certain interval. For example, if a study follows patients for regular check-ups and a patient’s cancer is detected during one of the check-ups, but the exact time of cancer onset is unknown, the time to cancer onset is interval-censored.

Censoring is an important issue to consider in survival analysis because it can bias the estimates of the survival function and other related statistics if not handled properly. Special statistical methods, such as the Kaplan-Meier estimator or the Cox proportional hazards model, have been developed to account for censoring in survival analysis.
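
In practice, right-censored data is commonly stored as a duration plus a flag showing whether the event was observed. The sketch below shows one possible encoding. The function name and inputs are hypothetical, and it handles only censoring at the end of the study, not loss to follow-up.

```python
def encode_right_censored(entry_times, event_times, study_end):
    """Return (duration, event_observed) pairs.

    event_times holds None when no event was ever recorded for a subject.
    """
    records = []
    for entry, event in zip(entry_times, event_times):
        if event is not None and event <= study_end:
            records.append((event - entry, True))        # event seen during the study
        else:
            records.append((study_end - entry, False))   # right-censored at study end
    return records


# Made-up example: a five-year study with three subjects
records = encode_right_censored(
    entry_times=[0.0, 0.0, 1.0],
    event_times=[2.0, None, 7.0],
    study_end=5.0,
)
print(records)
```

A censored duration is a lower bound on the true time to event, which is why it is flagged rather than dropped.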

Hazard Rate: Represents the instantaneous rate at which the event occurs at time t, given that it has not already occurred. In other words, h(t) multiplied by a short time interval approximates the probability that the event occurs in that interval, conditional on survival up to time t. It is represented by the formula: h(t) = f(t) / S(t), where f(t) is the probability density of the event time, and S(t) is the Survival Function.

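The cumulative hazard rate, H(t), is the integral of h(u) from u = 0 to u = t (a sum over periods when time is measured in discrete steps). It represents accumulated risk over time, and in continuous time it is linked to the Survival Function by S(t) = exp(-H(t)).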
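
A small numerical sketch may help here. It assumes an exponentially distributed event time with a made-up rate of 0.2 per unit of time. That distribution is chosen only because its hazard is constant, which makes the results easy to check. In continuous time the cumulative hazard equals -ln S(t).

```python
import math

RATE = 0.2  # hypothetical constant event rate per unit of time


def density(t):
    """f(t): probability density of the event time (exponential case)."""
    return RATE * math.exp(-RATE * t)


def survival(t):
    """S(t) = P(T > t)."""
    return math.exp(-RATE * t)


def hazard(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survival(t)


def cumulative_hazard(t):
    """H(t) = -ln S(t) in continuous time."""
    return -math.log(survival(t))


print(hazard(3.0))             # equals RATE at every t for this distribution
print(cumulative_hazard(5.0))  # equals RATE * t, about 1.0 here
```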

Kaplan-Meier Estimator: A non-parametric estimator of the Survival Function. It allows the analyst to use observed data, including censored observations, to estimate the survival distribution. The Kaplan-Meier Curve plots the estimated probability of survival beyond each point in time. Plotting separate Kaplan-Meier Curves by category allows the analyst to visually inspect whether survival rates appear to differ across those categories.
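
The product-limit calculation behind the estimator can be sketched in a few lines of plain Python. The function name and the sample data are made up for illustration. Production work would normally rely on a statistical library rather than this hand-rolled version.

```python
def kaplan_meier(durations, events):
    """Return [(time, estimated survival probability)] at each observed event time.

    events[i] is True if the event was observed, False if the subject was censored.
    """
    event_times = sorted({d for d, e in zip(durations, events) if e})
    curve, surv = [], 1.0
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)   # still under observation at t
        occurred = sum(1 for d, e in zip(durations, events) if d == t and e)
        surv *= 1.0 - occurred / at_risk                # product-limit update
        curve.append((t, surv))
    return curve


durations = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]   # False marks a censored subject
curve = kaplan_meier(durations, events)
print(curve)  # survival is about 0.8 after t=2, 0.6 after t=3, 0.3 after t=5
```

Censored subjects stay in the at-risk count until their censoring time, which is how the estimator uses their partial information.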

Machine Learning: A branch of artificial intelligence that uses algorithms to learn from data and make predictions or decisions without explicit programming.

Predictive Modeling: The process of using statistical techniques and algorithms to analyze historical data and make predictions about future events or outcomes.

Risk-Adjusted Return: The return on an investment after adjusting for the level of risk.
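
One widely used risk-adjusted measure is the Sharpe ratio: mean excess return divided by the standard deviation of excess returns. The definition above does not name a specific measure, so the choice of the Sharpe ratio is an assumption for this sketch, as are the sample returns.

```python
import statistics


def sharpe_ratio(returns, risk_free_rate):
    """Mean excess return per unit of volatility (sample standard deviation)."""
    excess = [r - risk_free_rate for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


period_returns = [0.02, 0.01, 0.03, 0.02]   # made-up per-period returns
sharpe = sharpe_ratio(period_returns, risk_free_rate=0.005)
print(round(sharpe, 2))  # roughly 1.84
```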

Spline Variables: Refers to a technique used to model a non-linear relationship between two variables. A spline is a smooth curve that is constructed by joining together several polynomial segments, usually of the same degree, at points known as knots.

Spline variables are often used to capture the effects of nonlinear relationships in financial models. For example, if the relationship between a company’s revenue and its advertising spend is not linear, spline variables can be used to model the relationship accurately.

Spline variables can also help manage overfitting in regression analysis. With a small number of well-chosen knots, the modeler can fit a curve that captures the underlying relationship between the variables without introducing unnecessary complexity into the model. Using too many knots has the opposite effect and can cause the model to fit noise.

Spline variables are a useful tool in financial modeling for capturing nonlinear relationships between variables and improving the accuracy and robustness of the model.
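
The simplest case is a linear spline, where each knot adds a "hinge" variable that lets the slope change at that point. The sketch below uses hypothetical knots and coefficients. In a real model the coefficients would be estimated by regression.

```python
def linear_spline_basis(x, knots):
    """Return [x, max(0, x - k1), max(0, x - k2), ...] for one observation."""
    return [x] + [max(0.0, x - k) for k in knots]


def piecewise_linear(x, intercept, coefs, knots):
    """Evaluate a linear spline given coefficients on the spline variables."""
    basis = linear_spline_basis(x, knots)
    return intercept + sum(c * b for c, b in zip(coefs, basis))


knots = [10.0, 20.0]        # hypothetical knot locations
coefs = [2.0, -1.0, -0.5]   # slope 2 below 10, 1 between 10 and 20, 0.5 above 20

print(linear_spline_basis(25.0, knots))            # [25.0, 15.0, 5.0]
print(piecewise_linear(25.0, 0.0, coefs, knots))   # 32.5
```

Each spline variable enters the regression like any other feature, so the model stays linear in its coefficients even though the fitted curve bends at the knots.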

Survival Analysis: Allows the analyst to calculate the probability of a defined terminal event, such as death, breakdown, loan prepayment or default, or some other event of interest at, by, or after a certain time. In analyzing survival (or failure), analysts use specialized regression models to calculate the contributions of various factors that influence the length of time before a failure occurs.

Survival Analysis allows us to consider cases with incomplete or censored data.

The Survival Function is defined as S(t)=P(T>t). It measures the probability that a subject will survive past time t.

This function:

Is decreasing (non-increasing) over time
Starts at 1 for all observations when t=0
Approaches 0 as t becomes large, provided every subject eventually experiences the event

Common survival models include:

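Cox Proportional Hazards (CPH) model: Assumes each feature multiplies the baseline hazard rate by a factor that stays constant over time.
Accelerated Failure Time (AFT) models: Assume features speed up or slow down the time to the event by a constant factor. There are several variants, including the Weibull AFT model.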
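
The proportional hazards assumption can be illustrated with a short sketch. The baseline hazard shape and the coefficient below are made up. The point is only that the hazard ratio between two subjects is the same at every time.

```python
import math


def baseline_hazard(t):
    """Hypothetical baseline hazard that rises with time."""
    return 0.15 * t ** 0.5


def cox_hazard(t, features, coefs):
    """h(t | x) = h0(t) * exp(sum of coefficient * feature)."""
    linear_predictor = sum(b * x for b, x in zip(coefs, features))
    return baseline_hazard(t) * math.exp(linear_predictor)


coefs = [0.7]  # made-up coefficient for a single binary feature

ratio_early = cox_hazard(1.0, [1.0], coefs) / cox_hazard(1.0, [0.0], coefs)
ratio_late = cox_hazard(9.0, [1.0], coefs) / cox_hazard(9.0, [0.0], coefs)
print(ratio_early, ratio_late)  # both equal exp(0.7), about 2.01
```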

Valuation Modeling Approaches:

Black-Scholes Model: This method applies option pricing theory to value a financial asset portfolio, or options embedded in it, taking into account the volatility of the underlying asset's value.
Discounted Cash Flow (DCF) Model: This method calculates the present value of future cash flows generated by an asset portfolio using a discount rate that reflects the risk of the portfolio.
Comparable Transactions: This method compares the subject financial asset portfolio to similar portfolios that have been recently sold in the market.
Mark-to-Market: This technique values the financial asset portfolio at the current market prices of the same or similar assets.
Mark-to-Model: This method uses a proprietary model to value the financial asset portfolio, taking into account factors such as credit risk and cash flow.
Monte Carlo Simulation: This method uses computer simulations to model the potential performance of the financial asset portfolio under different economic scenarios.
Risk-Adjusted Return on Capital (RAROC) Model: This technique calculates the return on capital of the financial asset portfolio adjusted for risk.
Statistical Modeling: This technique uses historical data to predict future performance of the financial asset portfolio. It can include factors such as credit score, loan-to-value ratio, and geographic location.
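
Two of the approaches above, Discounted Cash Flow and Monte Carlo Simulation, can be combined in a short sketch. All numbers, function names, and the normally distributed cash-flow shock are assumptions made for illustration. A real valuation would use scenario models specific to the asset class.

```python
import random


def present_value(cash_flows, discount_rate):
    """Discount end-of-period cash flows back to time 0."""
    return sum(cf / (1.0 + discount_rate) ** (i + 1)
               for i, cf in enumerate(cash_flows))


def simulate_portfolio_value(base_flows, discount_rate, shock_sd, n_paths, seed=0):
    """Average present value across randomly shocked cash-flow scenarios."""
    rng = random.Random(seed)   # seeded so the run is reproducible
    values = []
    for _ in range(n_paths):
        # Hypothetical scenario: each cash flow is scaled by a random factor, floored at 0
        shocked = [cf * max(0.0, 1.0 + rng.gauss(0.0, shock_sd)) for cf in base_flows]
        values.append(present_value(shocked, discount_rate))
    return sum(values) / n_paths


flows = [100.0, 100.0, 100.0]   # made-up expected cash flows
dcf_value = present_value(flows, 0.10)
mc_value = simulate_portfolio_value(flows, 0.10, shock_sd=0.1, n_paths=2000)
print(round(dcf_value, 2))      # 248.69
print(round(mc_value, 2))
```

With symmetric shocks the simulated average should land close to the plain DCF value. The simulation adds a view of the spread of outcomes, which the single DCF number does not provide.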
