Data Converter Decision Model

Build your first decision model by setting boundaries, calculating accuracy, and understanding the power of interpretability.

May 15, 2026 · 3 min read · Part 2 of 5

Here is the part nobody tells you about ML models: the first version is almost always wrong. Not slightly wrong, but structurally wrong. The process of making it right is where the learning actually happens.

In the previous post, we established our sample: four users with features (years active, refund amount, IP accounts) and labels (1 or -1). Now we need to turn those features into a working converter, a model.

Setting the Boundaries

We have four users in our sample:

User     Years Active   Refund Amount   IP Accounts   Label
James    4.0            $80             2              1
Denver   0.01           $500            7             -1
Neal     5.0            $45             1              1
Peter    1.0            $200            2              1

A Decision Tree makes its prediction by checking a set of conditions. Each condition compares one feature against a parameter: a boundary value such as the 1 year in "greater than 1 year" or the $200 in "less than $200".

Model v1

Three conditions:

  1. Years Active > 1.0
  2. Refund Amount < $200
  3. IP Accounts < 3

If all three pass: output 1. Otherwise: output -1.
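Model v1 is small enough to write as a single function. Below is a minimal Python sketch, assuming the features arrive as plain numbers (refund in dollars, IP accounts as a count); the function name `predict_v1` and its argument names are hypothetical.

```python
def predict_v1(years_active: float, refund: float, ip_accounts: int) -> int:
    """Model v1 (hypothetical sketch): approve (1) only if all three conditions pass."""
    passes_all = (
        years_active > 1.0     # condition 1: Years Active > 1.0
        and refund < 200       # condition 2: Refund Amount < $200
        and ip_accounts < 3    # condition 3: IP Accounts < 3
    )
    return 1 if passes_all else -1


print(predict_v1(4.0, 80, 2))   # James → 1
print(predict_v1(1.0, 200, 2))  # Peter → -1
```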

Measuring Accuracy

Running every user through Model v1:

  • James: Passes all three? Yes. Output: 1. Correct.
  • Denver: Passes all three? No. Output: -1. Correct.
  • Neal: Passes all three? Yes. Output: 1. Correct.
  • Peter: Passes all three? No. He has exactly 1 year (not greater) and exactly $200 (not less). Output: -1. Incorrect.

Accuracy: 3 out of 4 (75%). Not good enough for a four-user sample.
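Accuracy is just the number of matches divided by the sample size. This sketch runs the four users from the sample table through Model v1 and reproduces the 3 out of 4 result; `SAMPLE`, `predict_v1`, and `accuracy` are illustrative names, not part of any library.

```python
# Sample as (name, years_active, refund_usd, ip_accounts, label)
SAMPLE = [
    ("James", 4.0, 80, 2, 1),
    ("Denver", 0.01, 500, 7, -1),
    ("Neal", 5.0, 45, 1, 1),
    ("Peter", 1.0, 200, 2, 1),
]


def predict_v1(years_active, refund, ip_accounts):
    # Model v1: approve only if all three conditions pass.
    return 1 if (years_active > 1.0 and refund < 200 and ip_accounts < 3) else -1


def accuracy(predict, sample):
    """Fraction of users whose prediction matches their label."""
    correct = sum(predict(y, r, ip) == label for _, y, r, ip, label in sample)
    return correct / len(sample)


for name, y, r, ip, label in SAMPLE:
    output = predict_v1(y, r, ip)
    verdict = "correct" if output == label else "incorrect"
    print(f"{name}: output {output}, label {label}, {verdict}")

print(accuracy(predict_v1, SAMPLE))  # → 0.75
```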

Tweak and Optimize

This is the core of training: adjusting parameters until the model fits the sample better.

Model v2

  1. Years Active > 0.9
  2. Refund Amount < $201
  3. IP Accounts < 3

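Peter now passes conditions 1 and 2 (1.0 > 0.9 and $200 < $201), and his 2 IP accounts already passed condition 3, so he passes all three. James, Denver, and Neal get the same outputs as before. Accuracy: 4 out of 4 (100%).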
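Turning the thresholds into arguments makes the "tweak" step explicit: v1 and v2 are the same model with different parameter values. A minimal sketch, where the hypothetical helper `make_model` builds an all-three-must-pass predictor from three threshold parameters:

```python
SAMPLE = [
    ("James", 4.0, 80, 2, 1),
    ("Denver", 0.01, 500, 7, -1),
    ("Neal", 5.0, 45, 1, 1),
    ("Peter", 1.0, 200, 2, 1),
]


def make_model(min_years, max_refund, max_ip):
    """Return an all-three-must-pass predictor for the given threshold parameters."""
    def predict(years_active, refund, ip_accounts):
        passes_all = (years_active > min_years
                      and refund < max_refund
                      and ip_accounts < max_ip)
        return 1 if passes_all else -1
    return predict


def accuracy(predict, sample):
    """Fraction of users whose prediction matches their label."""
    correct = sum(predict(y, r, ip) == label for _, y, r, ip, label in sample)
    return correct / len(sample)


model_v1 = make_model(min_years=1.0, max_refund=200, max_ip=3)
model_v2 = make_model(min_years=0.9, max_refund=201, max_ip=3)

print(accuracy(model_v1, SAMPLE))  # → 0.75
print(accuracy(model_v2, SAMPLE))  # → 1.0
```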

I also added a "2 out of 3" rule: if at least two of the three conditions pass, still approve. This makes the model more lenient and handles edge cases, but it also introduces risk, as we'll see in the next post.
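The "2 out of 3" rule replaces "all conditions pass" with a vote: count the passing conditions and approve once the count reaches a minimum. A sketch of that idea using the v2 thresholds; `predict_v2_vote` and `min_passes` are hypothetical names, and the 0.5-year user at the end is made up for illustration, not part of the sample.

```python
def predict_v2_vote(years_active, refund, ip_accounts, min_passes=2):
    """Model v2 with the "2 out of 3" rule: approve if at least min_passes conditions pass."""
    conditions = [
        years_active > 0.9,   # condition 1
        refund < 201,         # condition 2
        ip_accounts < 3,      # condition 3
    ]
    passed = sum(conditions)  # True counts as 1, False as 0
    return 1 if passed >= min_passes else -1


# Sample users: the vote gives the same outputs as the all-three rule here.
print(predict_v2_vote(4.0, 80, 2))    # James → 1
print(predict_v2_vote(0.01, 500, 7))  # Denver → -1 (0 of 3 pass)

# Hypothetical new user: 0.5 years, $50 refund, 1 account. Fails only condition 1.
print(predict_v2_vote(0.5, 50, 1))                # → 1 (approved by the vote)
print(predict_v2_vote(0.5, 50, 1, min_passes=3))  # → -1 (denied by the all-three rule)
```

The last two lines show the extra leniency: a user who fails one condition is now approved.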

[Figure: Decision tree model v2 showing three parameter conditions leading to approve or deny]

Why Interpretability Is Worth Caring About

A neural network might reach 95% accuracy on this kind of task. But if the Head of Compliance asks why Peter got a refund and Denver didn't, a neural network has no straightforward answer. A Decision Tree does: "Peter's account is over 0.9 years old, his refund is under $201, and he only has 2 accounts on his IP." For Denver: "The account is only 0.01 years old, the refund is $500, and 7 accounts share that IP, so all three conditions fail."

This is called interpretability: the ability to explain each prediction. In regulated domains (insurance, finance, lending), it is often not optional but a legal requirement.
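Because each condition is a readable rule, the model can report why it decided, not just what it decided. A minimal sketch, assuming Model v2 with the all-three rule; `explain_v2` is a hypothetical helper that returns the decision alongside a pass/fail line for each condition.

```python
def explain_v2(years_active, refund, ip_accounts):
    """Return (decision, reasons) for Model v2 with the all-three rule."""
    checks = [
        ("Years Active > 0.9", years_active, years_active > 0.9),
        ("Refund Amount < $201", refund, refund < 201),
        ("IP Accounts < 3", ip_accounts, ip_accounts < 3),
    ]
    decision = 1 if all(passed for _, _, passed in checks) else -1
    reasons = [
        f"{rule} (value {value}): {'pass' if passed else 'fail'}"
        for rule, value, passed in checks
    ]
    return decision, reasons


for name, features in [("Peter", (1.0, 200, 2)), ("Denver", (0.01, 500, 7))]:
    decision, reasons = explain_v2(*features)
    print(name, decision)
    for reason in reasons:
        print("  " + reason)
```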

Many teams will knowingly give up some accuracy in exchange for interpretability. Knowing the trade-offs is what separates engineering a model from just running one.

In the next post, we test our perfect 4/4 model against data it has never seen, and watch it fail.

The Essentials

  1. Parameters: The boundary values inside a model (e.g., the 0.9 in "Years Active > 0.9"). These are what we tune during training.
  2. Interpretability: The ability to explain the reason behind each individual prediction. Decision Trees have it. Many other models don't.
  3. Accuracy: The fraction of sample predictions that match the label. We want this as high as possible on the sample, but not at the cost of generalization.
  4. Iterative Tweaking: Training a model is not a one-shot operation. It is a loop: predict, measure, adjust, repeat.
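The predict, measure, adjust, repeat loop can be automated. One simple way, swapped in here for illustration and not necessarily how v2 was found, is a brute-force grid search: try every combination from a small hand-picked set of thresholds and keep whichever scores best on the sample. The candidate values below are hypothetical.

```python
import itertools

SAMPLE = [
    ("James", 4.0, 80, 2, 1),
    ("Denver", 0.01, 500, 7, -1),
    ("Neal", 5.0, 45, 1, 1),
    ("Peter", 1.0, 200, 2, 1),
]


def predict(params, years_active, refund, ip_accounts):
    min_years, max_refund, max_ip = params
    passes_all = years_active > min_years and refund < max_refund and ip_accounts < max_ip
    return 1 if passes_all else -1


def accuracy(params):
    correct = sum(predict(params, y, r, ip) == label for _, y, r, ip, label in SAMPLE)
    return correct / len(SAMPLE)


# Hypothetical candidate values for (min_years, max_refund, max_ip).
CANDIDATES = itertools.product([1.0, 0.9, 0.5], [200, 201, 300], [3, 4])

best = (1.0, 200, 3)        # start from Model v1
best_acc = accuracy(best)   # predict + measure: 0.75
for params in CANDIDATES:   # adjust: try the next parameter setting
    acc = accuracy(params)  # predict + measure
    if acc > best_acc:      # keep it only if it strictly improves
        best, best_acc = params, acc

print(best, best_acc)  # → (0.9, 201, 3) 1.0
```

Several settings tie at 1.0 on this sample (for example (0.9, 300, 4)); the loop keeps the first one it meets, which happens to match Model v2. A perfect score on four users does not pin down a unique model.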
