Three Basic Predictive Analysis Models

It used to be that basic data was enough to make successful decisions within an organization. A CEO could look at common key performance indicators such as net profit margin, debt-to-income ratio, and return on investment and make the best decisions available at the time.

For the past several decades, companies have collected large amounts of data in order to evaluate why they performed the way they did and to understand their customers’ needs and preferences. They built data warehouses and advanced reports to improve accuracy, improve key processes, and optimize performance.

As time went on, companies learned that they could use historical data and trends to predict future behavior and to make decisions. For example, a call center manager might use call-volume-by-hour statistics to staff a call center for peak and off-peak times.

Then organizations moved beyond reporting and began gathering even larger amounts of data, applying statistical analysis to predict future trends and behavioral patterns. For example, the banking industry uses credit history, residential information, job information, debts, etc. to calculate a credit score that indicates whether a person is likely to pay off a loan. This is an example of predictive analytics, and organizations in all industries are learning to apply it to their reporting capabilities. Predictive analytics uses large volumes of past data to capture relationships between explanatory variables (variables used to explain or predict changes in the values of another variable) and predicted variables, and then applies those relationships to predict future outcomes.

Predictive modeling is the process by which data is modeled and diagnosed to predict the probability of an outcome as accurately as possible. In many cases the model is chosen on the basis of detection theory, to estimate the probability of a signal given a set amount of input data. Models can use one or more classifiers to determine the probability that a set of data belongs to another set.

There are three main types of models associated with predictive analytics: predictive models, descriptive models, and decision models.

Predictive models predict future behavior and anticipate the consequences of change. Predictive models are made up of a number of predictors (factors likely to affect future behavior or results). For example, in marketing, a customer’s age, sex, and income can be used to predict the likelihood of buying.
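
To make the idea of combining predictors more concrete, here is a minimal sketch that turns age, sex, and income into a purchase probability. The article does not name a technique, so the logistic function is an assumption, and the coefficient values, field names, and the function purchase_probability are hypothetical; in practice the coefficients would be estimated from past purchase data.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
# A real model would estimate these from historical purchase data.
INTERCEPT = -3.0
WEIGHTS = {"age": 0.02, "is_female": 0.3, "income_thousands": 0.03}

def purchase_probability(customer):
    """Combine the predictors into a probability using the logistic function."""
    score = INTERCEPT + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))

# Two made-up customers described by the three predictors.
younger = {"age": 25, "is_female": 0, "income_thousands": 30}
older = {"age": 50, "is_female": 1, "income_thousands": 90}

for label, customer in (("younger", younger), ("older", older)):
    print(label, round(purchase_probability(customer), 3))
```

With these made-up coefficients, the second customer receives the higher probability because every predictor pushes the score upward.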

Predictive analytics’ central building block is the predictor, a single value measured for each customer. For example, ‘most recent’, which is based on the number of weeks since the customer’s last purchase, has higher values for more recent customers. This is usually a reliable predictor of campaign response: you will receive more responses from customers ranked more highly by ‘most recent’. That means that if you contact your customers in order of ‘most recent’ – first, call the most-recent customer; next, call the next-most-recent customer; and so on – you will improve your response rate. For each prediction goal, there is an abundance of predictors that will help rank your customer database. For example, consider a customer’s online behavior: customers who spend less time logged on may be less likely to renew their annual subscription. In this case, retention campaigns can be cost-effectively targeted to customers with a low monthly usage predictor value.
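
The ranking described above can be sketched in a few lines. The customer records and the helper names recency_score and contact_order are hypothetical; the only idea taken from the text is that fewer weeks since the last purchase means a higher ‘most recent’ value and an earlier call.

```python
# Hypothetical customers with the number of weeks since their last purchase.
customers = [
    {"name": "Avery", "weeks_since_purchase": 12},
    {"name": "Blake", "weeks_since_purchase": 2},
    {"name": "Casey", "weeks_since_purchase": 30},
    {"name": "Devon", "weeks_since_purchase": 5},
]

def recency_score(customer):
    """Higher score means a more recent purchase (fewer weeks elapsed)."""
    return -customer["weeks_since_purchase"]

def contact_order(customer_list):
    """Rank customers from most recent buyer to least recent buyer."""
    return sorted(customer_list, key=recency_score, reverse=True)

call_list = [c["name"] for c in contact_order(customers)]
print(call_list)
```

The same pattern works for any single predictor, such as monthly usage: compute one value per customer, sort, and work down the list.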

Descriptive models quantify the relationships between data in order to classify customers into groups. While predictive models focus on predicting one customer’s behavior, descriptive models identify relationships between several customers or products. Descriptive models do not predict a target value; they focus on the intrinsic structure, relations, and interconnectedness of the data. In our earlier example of the financial industry, a descriptive model could be used to group customers with similar credit profiles.

Cluster analysis is a descriptive modeling technique that identifies clusters embedded in the data. A cluster is a collection of data objects that are similar in some sense to one another.

One widely used clustering technique is the k-means algorithm. K-means is a distance-based clustering algorithm that partitions the data into a predetermined number of clusters (provided there are enough distinct cases). The k-means algorithm works only with numerical attributes. Distance-based algorithms rely on a distance metric (function) to measure the similarity between data points.
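
As a rough illustration of how k-means partitions numerical data, the sketch below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. Several choices here are assumptions made for the example: Euclidean distance as the metric, the first k points as starting centroids (to keep the run deterministic), and the made-up customer data. It requires Python 3.8 or later for math.dist.

```python
import math

def kmeans(points, k, iterations=20):
    """Basic k-means on numeric tuples using Euclidean distance.

    Assumption: the first k points are used as the starting centroids,
    which keeps this sketch deterministic. Production implementations
    usually choose starting centroids more carefully.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # No centroid moved, so the clustering has converged.
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers described by (monthly visits, monthly spend).
data = [(1, 10), (9, 95), (2, 12), (1, 14), (8, 90), (10, 100)]
centroids, labels = kmeans(data, k=2)
print(labels)
print(centroids)
```

Note that the number of clusters, k, has to be supplied up front, which matches the "predetermined number of clusters" described above.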

Decision models describe the relationship between all decision elements and predict the results of decisions, allowing you to try different scenarios and optimize results. Clinical Decision Support Systems use predictive analysis in the health care industry to identify at-risk patients and sometimes to determine which course of action would be best given a wide array of variables.

Rational decision models are based on a cognitive judgment of the pros and cons of various options. They are organized around selecting the most logical and sensible alternative that will have the desired effect. The decisions are normally reached through a detailed analysis of the alternatives and a comparative assessment of the advantages of each. Weighted criteria scoring is an example of a rational decision model.
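
Weighted criteria scoring can be sketched as follows: each option is rated on every criterion, each rating is multiplied by the criterion's weight, and the option with the highest total is selected. The criteria, weights, vendor names, and ratings below are all hypothetical values chosen for illustration.

```python
# Hypothetical criteria weights (summing to 1.0) and 1-10 ratings per option.
weights = {"cost": 0.5, "quality": 0.3, "speed": 0.2}
options = {
    "Vendor A": {"cost": 8, "quality": 6, "speed": 7},
    "Vendor B": {"cost": 5, "quality": 9, "speed": 8},
    "Vendor C": {"cost": 7, "quality": 7, "speed": 5},
}

def weighted_score(ratings, criteria_weights):
    """Sum each rating multiplied by the weight of its criterion."""
    return sum(criteria_weights[c] * ratings[c] for c in criteria_weights)

# Score every option, then pick the one with the highest total.
scores = {name: weighted_score(r, weights) for name, r in options.items()}
best = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(name, round(score, 2))
print("Selected:", best)
```

Changing the weights is a simple way to try different scenarios: an organization that values quality over cost could raise that weight and see whether the preferred option changes.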

Hopefully this has given you a better understanding of the basic predictive analysis models that drive predictive analytics. Check out my article on predictive modeling techniques to learn about 12 common techniques used to predict future behavior.