What is "Data Converter Decision Model" about?

Build a decision tree model from scratch, understand parameter tuning, and learn why interpretability matters in regulated domains.

What topics does "Data Converter Decision Model" cover?

This article covers: decision tree, ml model, model parameters, interpretability, model accuracy, machine learning training.

Building a Decision Tree: Parameters & Interpretability

Here is the part nobody tells you about ML models: the first version is almost always wrong. Not slightly wrong, but structurally wrong. The process of making it right is where the learning actually happens.

In the previous post, we established our sample: four users with features (years active, refund amount, IP accounts) and labels (1 or -1). Now we need to turn those features into a working converter, a model.

Setting the Boundaries

We have four users in our sample:

User	Years Active	Refund Amount	IP Accounts	Label
James	4.0	$80	2	1
Denver	0.01	$500	7	-1
Neal	5.0	$45	1	1
Peter	1.0	$200	2	1

A Decision Tree makes its prediction by checking a set of conditions. Each condition is built around a parameter: a boundary value, as in "greater than 1 year" or "less than $200".

Model v1

Three conditions:

Years Active > 1.0
Refund Amount < $200
IP Accounts < 3

If all three pass: output 1. Otherwise: output -1.
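As a minimal sketch, the three conditions can be written as a plain function. The function and argument names are illustrative assumptions; only the three thresholds come from the text.

```python
def model_v1(years_active, refund_amount, ip_accounts):
    """Model v1: output 1 only if all three conditions pass, else -1."""
    passes_all = (
        years_active > 1.0        # condition 1: Years Active > 1.0
        and refund_amount < 200   # condition 2: Refund Amount < $200
        and ip_accounts < 3       # condition 3: IP Accounts < 3
    )
    return 1 if passes_all else -1


print(model_v1(4.0, 80, 2))    # James -> 1
print(model_v1(0.01, 500, 7))  # Denver -> -1
```

Note that the comparisons are strict, so a value sitting exactly on a boundary does not pass.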

Measuring Accuracy

Running every user through Model v1:

James: Passes all three? Yes. Output: 1. Correct.
Denver: Passes all three? No. Output: -1. Correct.
Neal: Passes all three? Yes. Output: 1. Correct.
Peter: Passes all three? No. He has exactly 1 year (not greater) and exactly $200 (not less). Output: -1. Incorrect.

Accuracy: 3 out of 4 (75%). With only four users, a single miss is a quarter of the sample, so this is not good enough.
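The same check can be sketched in code. The `SAMPLE` list and the `accuracy` helper are names assumed for illustration; the rows and thresholds are the ones from the table and Model v1 above.

```python
# (name, years_active, refund_amount, ip_accounts, label)
SAMPLE = [
    ("James", 4.0, 80, 2, 1),
    ("Denver", 0.01, 500, 7, -1),
    ("Neal", 5.0, 45, 1, 1),
    ("Peter", 1.0, 200, 2, 1),
]


def model_v1(years, refund, ip):
    """Model v1: all three conditions must pass."""
    return 1 if (years > 1.0 and refund < 200 and ip < 3) else -1


def accuracy(model, sample):
    """Fraction of users whose predicted output matches the label."""
    correct = sum(
        1 for _, years, refund, ip, label in sample
        if model(years, refund, ip) == label
    )
    return correct / len(sample)


print(accuracy(model_v1, SAMPLE))  # 0.75
```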

Tweak and Optimize

This is the core of training: adjusting parameters until the model fits the sample better.

Model v2

Years Active > 0.9
Refund Amount < $201
IP Accounts < 3

Peter now passes conditions 1 and 2 (1.0 > 0.9 and $200 < $201), and he already passed condition 3, so he passes all three. The other three users' outputs are unchanged. Accuracy: 4 out of 4 (100%).

I also added a "2 out of 3" rule: if at least two of the three conditions pass, still approve. This makes the model more lenient and handles edge cases, but it also introduces risk, as we'll see in the next post.
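A rough sketch of Model v2 with the rule made configurable. The `min_passes` argument is a hypothetical knob added here for illustration: 3 reproduces the strict all-three version, 2 gives the lenient "2 out of 3" version. The last user in the example is made up to show where the two versions disagree.

```python
def model_v2(years, refund, ip, min_passes=3):
    """Model v2 thresholds; approve when at least `min_passes` conditions hold."""
    checks = [
        years > 0.9,    # condition 1
        refund < 201,   # condition 2
        ip < 3,         # condition 3
    ]
    # True counts as 1, so sum() gives the number of passing conditions.
    return 1 if sum(checks) >= min_passes else -1


sample = [
    (4.0, 80, 2, 1),     # James
    (0.01, 500, 7, -1),  # Denver
    (5.0, 45, 1, 1),     # Neal
    (1.0, 200, 2, 1),    # Peter
]

for min_passes in (3, 2):
    hits = sum(model_v2(y, r, ip, min_passes) == label for y, r, ip, label in sample)
    print(min_passes, hits, "out of", len(sample))

# Hypothetical user (not in the sample): a 6-month-old account, $100 refund, 2 IP accounts.
print(model_v2(0.5, 100, 2, min_passes=3))  # -1: fails the years condition
print(model_v2(0.5, 100, 2, min_passes=2))  # 1: two of three conditions pass
```

On the four-user sample both settings score 4 out of 4, so the sample alone cannot tell the strict and lenient versions apart.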

Figure: Decision tree model v2 showing three parameter conditions leading to approve or deny.

Why Interpretability Is Worth Caring About

A neural network might reach higher accuracy on this kind of task, say 95%. But if the Head of Compliance asks why Peter got a refund and Denver didn't, a neural network cannot readily answer in terms a person can check. A Decision Tree can: "Peter's account is over 0.9 years old, his refund is under $201, and he only has 2 accounts on his IP."

This is called interpretability: the ability to explain each prediction. For regulated domains (insurance, finance, lending), this is not optional. It is often a legal or regulatory requirement.
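To make the idea concrete, here is a small sketch of a function that returns the reasons along with the decision. The name `explain_v2` and the wording of the reason strings are assumptions for illustration. It uses the Model v2 thresholds with the strict all-three rule.

```python
def explain_v2(years, refund, ip):
    """Return (decision, reasons) for Model v2 with the all-three rule."""
    checks = [
        (years > 0.9, f"years active {years} > 0.9"),
        (refund < 201, f"refund ${refund} < $201"),
        (ip < 3, f"IP accounts {ip} < 3"),
    ]
    # One human-readable line per condition, marked PASS or FAIL.
    reasons = [f"{'PASS' if ok else 'FAIL'}: {text}" for ok, text in checks]
    decision = 1 if all(ok for ok, _ in checks) else -1
    return decision, reasons


for name, row in {"Peter": (1.0, 200, 2), "Denver": (0.01, 500, 7)}.items():
    decision, reasons = explain_v2(*row)
    print(name, "->", decision)
    for line in reasons:
        print("   ", line)
```

Every line of the explanation maps to one threshold, which is what lets a reviewer verify the decision by hand.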

Many teams will trade some accuracy for interpretability. Knowing the trade-offs is what separates engineering a model from just running one.

In the next post, we test our perfect 4/4 model against data it has never seen, and watch it fail.

The Essentials

Parameters: The boundary values inside a model (e.g., "Years > 0.9"). These are what we tune during training.
Interpretability: The ability to explain the reason behind each individual prediction. Decision Trees have it. Many other models don't.
Accuracy: The fraction of sample predictions that match the label. We want this as high as possible on the sample, but not at the cost of generalization.
Iterative Tweaking: Training a model is not a one-shot operation. It is a loop: predict, measure, adjust, repeat.
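The predict, measure, adjust, repeat loop can be sketched as a brute-force search over a few candidate thresholds. This is an illustrative stand-in for the manual tweaking described in this post, not how production decision tree learners choose splits, and the candidate grids below are assumptions.

```python
from itertools import product

# (name, years_active, refund_amount, ip_accounts, label)
SAMPLE = [
    ("James", 4.0, 80, 2, 1),
    ("Denver", 0.01, 500, 7, -1),
    ("Neal", 5.0, 45, 1, 1),
    ("Peter", 1.0, 200, 2, 1),
]


def predict(params, years, refund, ip):
    """Apply the three conditions using the given boundary values."""
    min_years, max_refund, max_ip = params
    return 1 if (years > min_years and refund < max_refund and ip < max_ip) else -1


def accuracy(params):
    """Fraction of sample users predicted correctly with these parameters."""
    hits = sum(predict(params, y, r, ip) == label for _, y, r, ip, label in SAMPLE)
    return hits / len(SAMPLE)


# Hypothetical candidate values for each boundary.
YEARS = [1.0, 0.9, 0.5]
REFUND = [200, 201, 250]
IPS = [3, 4]

best, best_acc = None, -1.0
for params in product(YEARS, REFUND, IPS):   # adjust
    acc = accuracy(params)                   # predict and measure
    if acc > best_acc:                       # keep the first, best-scoring setting
        best, best_acc = params, acc

print(best, best_acc)  # (0.9, 201, 3) 1.0
```

Several settings in this grid reach 4 out of 4; the loop simply keeps the first one it finds, which is another sign that a four-user sample constrains the parameters only loosely.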
