Data Converter Decision Model
Build your first decision model by setting boundaries, calculating accuracy, and understanding the power of interpretability.
Here is the part nobody tells you about ML models: the first version is almost always wrong. Not slightly wrong, but structurally wrong. The process of making it right is where the learning actually happens.
In the previous post, we established our sample: four users with features (years active, refund amount, IP accounts) and labels (1 or -1). Now we need to turn those features into a working converter, a model.
Setting the Boundaries
We have four users in our sample:
| User | Years Active | Refund Amount | IP Accounts | Label |
|---|---|---|---|---|
| James | 4.0 | $80 | 2 | 1 |
| Denver | 0.01 | $500 | 7 | -1 |
| Neal | 5.0 | $45 | 1 | 1 |
| Peter | 1.0 | $200 | 2 | 1 |
A Decision Tree makes its prediction by checking a set of conditions. Each condition is a parameter: a boundary value like "greater than 1 year" or "less than $200".
Model v1
Three conditions:
- Years Active > 1.0
- Refund Amount < $200
- IP Accounts < 3
If all three pass: output 1. Otherwise: output -1.
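The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the post's actual implementation: the function name `model_v1` and its argument names are assumptions, while the thresholds are the Model v1 values listed above.

```python
def model_v1(years_active, refund_amount, ip_accounts):
    """Return 1 (approve) only if all three conditions pass, else -1."""
    passes_all = (
        years_active > 1.0        # condition 1: Years Active > 1.0
        and refund_amount < 200   # condition 2: Refund Amount < $200
        and ip_accounts < 3       # condition 3: IP Accounts < 3
    )
    return 1 if passes_all else -1


print(model_v1(4.0, 80, 2))    # James
print(model_v1(0.01, 500, 7))  # Denver
```

Note that the comparisons are strict, so a value sitting exactly on a boundary fails that condition.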
Measuring Accuracy
Running every user through Model v1:
- James: Passes all three? Yes. Output: 1. Correct.
- Denver: Passes all three? No. Output: -1. Correct.
- Neal: Passes all three? Yes. Output: 1. Correct.
- Peter: Passes all three? No. He has exactly 1 year (not greater) and exactly $200 (not less). Output: -1. Incorrect.
Accuracy: 3 out of 4 (75%). Not good enough for a four-user sample.
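The same check can be done mechanically. The sketch below assumes the sample is stored as plain tuples and uses hypothetical names (`SAMPLE`, `model_v1`, `accuracy`); the data and thresholds come from the table and Model v1 above.

```python
# (name, years_active, refund_amount, ip_accounts, label)
SAMPLE = [
    ("James", 4.0, 80, 2, 1),
    ("Denver", 0.01, 500, 7, -1),
    ("Neal", 5.0, 45, 1, 1),
    ("Peter", 1.0, 200, 2, 1),
]


def model_v1(years, refund, ips):
    """Model v1: output 1 only if all three conditions pass."""
    return 1 if (years > 1.0 and refund < 200 and ips < 3) else -1


def accuracy(model, sample):
    """Fraction of users whose prediction matches their label."""
    correct = sum(
        1 for _, years, refund, ips, label in sample
        if model(years, refund, ips) == label
    )
    return correct / len(sample)


for name, years, refund, ips, label in SAMPLE:
    print(name, "predicted:", model_v1(years, refund, ips), "label:", label)

print(accuracy(model_v1, SAMPLE))  # 0.75
```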
Tweak and Optimize
This is the core of training: adjusting parameters until the model fits the sample better.
Model v2
- Years Active > 0.9
- Refund Amount < $201
- IP Accounts < 3
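Peter now passes conditions 1 and 2 (1.0 > 0.9 and $200 < $201), and he already passed condition 3 (2 < 3), so he passes all three. The other three users get the same outputs as before. Accuracy: 4 out of 4 (100%).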
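One way to make the tweak explicit is to treat the thresholds as arguments, so v1 and v2 differ only in their parameter values. This is a sketch under that assumption; `make_model` and `accuracy` are hypothetical helper names, not something the post defines.

```python
# (name, years_active, refund_amount, ip_accounts, label)
SAMPLE = [
    ("James", 4.0, 80, 2, 1),
    ("Denver", 0.01, 500, 7, -1),
    ("Neal", 5.0, 45, 1, 1),
    ("Peter", 1.0, 200, 2, 1),
]


def make_model(min_years, max_refund, max_ips):
    """Build an all-conditions-must-pass model from three boundary values."""
    def model(years, refund, ips):
        passes_all = years > min_years and refund < max_refund and ips < max_ips
        return 1 if passes_all else -1
    return model


def accuracy(model, sample):
    """Fraction of users whose prediction matches their label."""
    correct = sum(
        1 for _, years, refund, ips, label in sample
        if model(years, refund, ips) == label
    )
    return correct / len(sample)


model_v1 = make_model(1.0, 200, 3)
model_v2 = make_model(0.9, 201, 3)

print(accuracy(model_v1, SAMPLE))  # 0.75
print(accuracy(model_v2, SAMPLE))  # 1.0
```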
I also added a "2 out of 3" rule: if two conditions pass rather than all three, still approve. This makes the model more lenient and handles edge cases, but it also introduces risk, as we'll see in the next post.
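A possible reading of the "2 out of 3" rule is to count how many conditions pass and compare that count to a minimum. The sketch below assumes that reading; the function names and the extra user are hypothetical and exist only to show where the strict and lenient variants disagree.

```python
def count_passed(years, refund, ips):
    """Number of Model v2 conditions that pass (0 to 3)."""
    return sum([years > 0.9, refund < 201, ips < 3])


def model_v2_strict(years, refund, ips):
    """Approve only when all three conditions pass."""
    return 1 if count_passed(years, refund, ips) == 3 else -1


def model_v2_lenient(years, refund, ips):
    """Approve when at least two of the three conditions pass."""
    return 1 if count_passed(years, refund, ips) >= 2 else -1


# Hypothetical user, NOT part of the four-user sample:
# a very new account with a small refund and a single account on the IP.
new_user = (0.1, 50, 1)

print(model_v2_strict(*new_user))   # -1
print(model_v2_lenient(*new_user))  # 1
```

On the four sample users both variants give the same outputs, so the sample alone cannot tell us whether the extra leniency is safe.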
Figure: Decision tree model v2 showing three parameter conditions leading to approve or deny.
Why Interpretability Is Worth Caring About
A neural network might reach higher accuracy on this kind of task, say 95%. But if the Head of Compliance asks why Peter got a refund and Denver didn't, a neural network cannot easily answer. A Decision Tree can: "Peter's account is over 0.9 years old, his refund is under $201, and he only has 2 accounts on his IP."
This is called interpretability: the ability to explain each prediction. For regulated domains (insurance, finance, lending), this is often not optional, because regulators can require that individual decisions be explainable.
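Because the model is just three comparisons, an explanation can be produced alongside every prediction. The sketch below is one hypothetical way to do that with the Model v2 thresholds; `explain_v2` and the wording of its output are assumptions for illustration.

```python
def explain_v2(years, refund, ips):
    """Return (decision, reasons) using the Model v2 thresholds."""
    checks = [
        (years > 0.9, f"Years Active > 0.9 (actual: {years})"),
        (refund < 201, f"Refund Amount < $201 (actual: ${refund})"),
        (ips < 3, f"IP Accounts < 3 (actual: {ips})"),
    ]
    # All three conditions must pass for an approval.
    decision = 1 if all(ok for ok, _ in checks) else -1
    reasons = [f"{'PASS' if ok else 'FAIL'}: {text}" for ok, text in checks]
    return decision, reasons


for name, user in [("Peter", (1.0, 200, 2)), ("Denver", (0.01, 500, 7))]:
    decision, reasons = explain_v2(*user)
    print(name, "->", decision)
    for reason in reasons:
        print("  ", reason)
```

Each line of the output maps directly to one boundary value, which is the property a compliance reviewer would be looking for.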
Many more powerful models trade interpretability for accuracy. Knowing the trade-offs is what separates engineering a model from just running one.
In the next post, we test our perfect 4/4 model against data it has never seen, and watch it fail.
The Essentials
- Parameters: The boundary values inside a model (e.g., "Years > 0.9"). These are what we tune during training.
- Interpretability: The ability to explain the reason behind each individual prediction. Decision Trees have it. Many other models don't.
- Accuracy: The fraction of sample predictions that match the label. We want this as high as possible on the sample, but not at the cost of generalization.
- Iterative Tweaking: Training a model is not a one-shot operation. It is a loop: predict, measure, adjust, repeat.
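The predict, measure, adjust loop can also be automated. The sketch below swaps the manual tweaking from this post for a simple exhaustive search over a few candidate thresholds; the candidate lists and all names are assumptions, and the IP Accounts boundary is held fixed at 3.

```python
from itertools import product

# (name, years_active, refund_amount, ip_accounts, label)
SAMPLE = [
    ("James", 4.0, 80, 2, 1),
    ("Denver", 0.01, 500, 7, -1),
    ("Neal", 5.0, 45, 1, 1),
    ("Peter", 1.0, 200, 2, 1),
]


def accuracy(min_years, max_refund, max_ips, sample):
    """Predict every user with the given boundaries and measure the hit rate."""
    correct = 0
    for _, years, refund, ips, label in sample:
        passes_all = years > min_years and refund < max_refund and ips < max_ips
        prediction = 1 if passes_all else -1
        correct += prediction == label
    return correct / len(sample)


best = None
# Adjust: try each combination of candidate boundaries, keep the best one.
for min_years, max_refund in product([1.0, 0.9], [200, 201]):
    acc = accuracy(min_years, max_refund, 3, SAMPLE)
    print(min_years, max_refund, acc)
    if best is None or acc > best[0]:
        best = (acc, min_years, max_refund)

print(best)  # (1.0, 0.9, 201)
```

A search like this only optimizes fit to the sample, so a perfect score here says nothing yet about users the model has never seen.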