In the scientific world, randomness is a term with a slightly negative connotation. Doing things “randomly,” in everyday language, suggests improvisation, patching things together, not following a plan or logical reasoning. Imagine an analyst or scientist telling you they solved a problem using a random approach: well, you probably wouldn’t feel entirely reassured.
But in the world of statistics, things are quite different. Let’s be clear right away: statistics can, in a way, be defined as the science of randomness—especially its most famous branch: probability theory. However, this article isn’t about that thorny topic (raise your hand if you flinched reading “probability theory”), but rather something slightly more intriguing: machine learning, the branch of statistics that analyzes large datasets to make predictions about an uncertain outcome (the dependent variable) based on a set of predictors (also called independent variables).
What kind of tree is that?
One of the most useful, and elegantly simple, models in machine learning is the whimsically named Decision Tree. Let’s walk through a simple example. Suppose we want to predict someone’s salary (our dependent variable) based on some collected information: work experience, education level, and place of residence (our predictors).
A Decision Tree works like this: starting from a dataset, the tree identifies the predictor that most influences the dependent variable. It then splits the dataset into two branches based on a simple Yes/No question. For example:
“Does the person have more than 10 years of experience?”
If Yes, you go to the right branch of the tree; if No, to the left.
Now, among those with more than 10 years of experience, the tree moves to the second most impactful variable and asks another question:
“Does the person have a college degree?”
Again, if Yes, go right; if No, go left.
And so on, until reaching the last branches, whose endpoints—called leaves—contain the predicted value of the dependent variable, i.e., the predicted salary.
So, in its simplest form, a Decision Tree is a sequence of nested questions, each with only two answer choices: Yes or No. Based on how a person responds to those questions, the Decision Tree model predicts the outcome.
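The salary tree above can be written out as exactly this kind of nested Yes/No logic. A minimal sketch in Python—the thresholds and salary figures below are hypothetical, chosen purely for illustration:

```python
def predict_salary(years_experience: float, has_degree: bool) -> int:
    """Toy decision tree: two nested Yes/No questions, four leaves.

    All thresholds and salary values are made up for illustration.
    """
    if years_experience > 10:      # first question: experience
        if has_degree:             # second question: education
            return 90_000          # leaf: experienced, with degree
        return 70_000              # leaf: experienced, no degree
    if has_degree:
        return 55_000              # leaf: junior, with degree
    return 40_000                  # leaf: junior, no degree
```

A real tree learns these questions and leaf values from data, but the structure it produces is just this: a cascade of binary splits ending in predictions.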

A “forest” of data
Naturally, a single Decision Tree gives only a single prediction based on a specific set of initial conditions. But here’s the fascinating part: by introducing a bit of controlled randomness, we can actually produce more robust predictions. How? By varying the initial conditions randomly, the model is trained to perform well not just on one situation, but on a range of diverse (and random) scenarios.
Welcome to the magical world of Random Forests—ensembles of Decision Trees, each trained on slightly different data created through random sampling. For example:
- Instead of using the entire dataset, you train each tree on a random sample of the data.
- Instead of using all predictors, you use a random subset of them for each tree.
You can repeat this process to generate a second dataset, a third, a fourth, and so on, each built by drawing random subsets from the original data. For each of these datasets, a new tree is trained. The final estimate from the Random Forest is then the average of all the individual tree predictions (or, for classification problems, a majority vote among the trees).
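The recipe above—bootstrap the rows, pick a random subset of features, train a small tree, then average—can be sketched in a few dozen lines. This is a deliberately tiny version: each “tree” is a one-split stump, and all function names and the dataset are hypothetical, not a real Random Forest implementation.

```python
import random
import statistics

def train_stump(rows, targets, feature):
    """One-split 'tree': split at the feature's median, predict each side's mean."""
    threshold = statistics.median(r[feature] for r in rows)
    left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
    right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
    fallback = statistics.mean(targets)  # guard against an empty side
    left_mean = statistics.mean(left) if left else fallback
    right_mean = statistics.mean(right) if right else fallback
    return (feature, threshold, left_mean, right_mean)

def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean

def train_forest(rows, targets, n_trees=50, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample rows at random, with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_rows = [rows[i] for i in idx]
        boot_targets = [targets[i] for i in idx]
        # Random feature subset: here, a single random feature per stump.
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump(boot_rows, boot_targets, feature))
    return forest

def predict_forest(forest, row):
    # Final estimate: the plain average of all individual tree predictions.
    return statistics.mean(predict_stump(s, row) for s in forest)
```

For example, with rows of `[years_experience, has_degree]` and salaries (in thousands) as targets, `predict_forest` returns an average over fifty randomized stumps—each one wrong in its own way, but jointly more stable than any single split.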
The benefits
But what’s the real advantage here, you ask? Well, in the vast majority of cases, Random Forests outperform single trees in predictive accuracy. This is because introducing controlled randomness, and effectively creating multiple datasets, helps the model generalize better. It avoids over-relying on any single dataset, thus reducing what’s known as “overfitting”.
Randomness has many other advantages in statistics. For example, by setting initial conditions and simulating thousands (or millions) of random samples, one can estimate the risk associated with a given choice or action. This approach is used in estimating financial investment returns or modeling particle interactions in physics, and it is known as Monte Carlo Simulation.
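A Monte Carlo simulation of this kind fits in a few lines: fix the initial conditions, draw many random scenarios, and count how often the outcome of interest occurs. In this sketch, annual investment returns are drawn from a normal distribution; the mean, volatility, and return model are illustrative assumptions, not financial advice.

```python
import random

def monte_carlo_loss_risk(mean_return=0.07, volatility=0.15,
                          n_simulations=100_000, seed=42):
    """Estimate the probability of a losing year under an assumed return model.

    Assumption (illustrative only): annual returns are normally distributed
    with the given mean and standard deviation.
    """
    rng = random.Random(seed)
    losing_years = sum(
        1 for _ in range(n_simulations)
        if rng.gauss(mean_return, volatility) < 0  # one simulated year
    )
    return losing_years / n_simulations
```

With these example parameters, roughly a third of the simulated years end in a loss; the more scenarios you draw, the tighter the estimate becomes.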
The key takeaway here is that in predictive statistics, the concept of randomness can be harnessed to our advantage. It enables the creation of a potentially infinite number of subsamples and simulations that strengthen statistical estimates. Without randomness, statistical models would be less accurate, and we wouldn’t be able to predict future events within an acceptable margin of error.
So, maybe that analyst who said they solved a problem with a random approach… wasn’t completely wrong after all.