[P32] Beta distribution

An introduction to the distribution and its applications.

Jun 26, 2026

Context: I was preparing a post on the synthetic data generation prior of TabICLv2 and came across this distribution. I was going to add it to the appendix section of that post, but it needs more work, so I decided to put this section out as a separate post.

The Beta distribution is a family of continuous probability distributions defined on the bounded interval \([0,1]\). It is parameterized by two positive shape parameters, \(\alpha\) and \(\beta\), which control the shape, spread, and skewness of the curve. Its probability density function is given by

\(f(x;\alpha, \beta) = \dfrac{x^{\alpha -1}(1-x)^{\beta-1}}{B(\alpha, \beta)} \quad x\in[0,1],\)

where \(B(\alpha, \beta)\) is the Beta function, a normalization constant ensuring the total area under the curve equals 1. It is computed using the Gamma function:

\(B(\alpha, \beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}.\)

The expected value (mean) and variance (spread) of the Beta distribution are

\(\mu=\dfrac{\alpha}{\alpha + \beta}, \quad \sigma^2=\dfrac{\alpha \beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.\)
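As a quick sanity check on the formulas above, here is a minimal sketch that evaluates the density with Python's standard library and compares the closed-form mean and variance against a numerical integral. The helper names (`beta_pdf`, `integrate`, and so on) and the parameter values \(\alpha=2\), \(\beta=5\) are my own illustrative choices.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    # log B(a, b) via log-gamma, which avoids overflow for large a and b
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_B)

def beta_mean(a, b):
    return a / (a + b)

def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

def integrate(f, n=20000):
    """Midpoint rule on (0, 1); the midpoints never touch the endpoints."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

a, b = 2.0, 5.0  # illustrative values, not taken from any dataset

area = integrate(lambda x: beta_pdf(x, a, b))
mean_num = integrate(lambda x: x * beta_pdf(x, a, b))
var_num = integrate(lambda x: (x - mean_num) ** 2 * beta_pdf(x, a, b))

print("area under curve:", area)
print("mean (numeric vs formula):", mean_num, beta_mean(a, b))
print("variance (numeric vs formula):", var_num, beta_var(a, b))
```

The area should come out very close to 1, and the numeric mean and variance should agree with the closed-form expressions up to integration error.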

Intuition

The easiest way to understand the Beta distribution is through its relationship with the Binomial distribution. While a Binomial distribution gives the probability of observing a certain number of successes given a fixed success probability, the Beta distribution flips this around: it describes how plausible different values of the underlying success probability are, given the observed successes and failures.

If we are tracking an event and start with no prior knowledge (a uniform Beta(1,1) prior), we can think of the parameters as:

\(\alpha\) = Successes + 1,
\(\beta\) = Failures + 1

For example, if a biased coin is flipped 5 times and lands on heads (success) 3 times and tails (failure) 2 times, our belief about the coin's probability of landing on heads can be modeled as Beta(4,3).
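To make the coin example concrete, here is a small sketch that builds the Beta(4,3) parameters from the counts and computes the mean from the formula above. It also computes the mode, using the standard expression \((\alpha-1)/(\alpha+\beta-2)\), which is valid when both parameters exceed 1. The "+1" mapping assumes we start from a uniform Beta(1,1) belief. Variable names are illustrative.

```python
from fractions import Fraction

heads, tails = 3, 2

# Counts-to-parameters mapping, assuming a uniform Beta(1, 1) starting point
alpha, beta = heads + 1, tails + 1  # Beta(4, 3)

# Mean of Beta(alpha, beta)
mean = Fraction(alpha, alpha + beta)

# Mode (peak of the density), valid when alpha > 1 and beta > 1
mode = Fraction(alpha - 1, alpha + beta - 2)

print(mean, mode)  # 4/7 3/5
```

The peak of the curve sits at 3/5, exactly the observed fraction of heads, while the mean 4/7 is pulled slightly toward 0.5.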

Parameter visualization and shapes

The distribution is incredibly flexible because changing \(\alpha\) and \(\beta\) alters the geometry of the curve drastically:

\(\alpha = 1\), \(\beta=1\) (uniform): Completely flat line. Every probability between 0 and 1 is equally likely, representing total uncertainty.
\(\alpha > 1\), \(\beta>1\) (unimodal/bell-shaped): Forms a single peak inside the interval. If \(\alpha=\beta\), the curve is perfectly symmetrical around 0.5. As both parameters grow (e.g., \(\alpha=40\), \(\beta=40\)), the peak becomes narrower, signifying higher certainty.
\(\alpha < 1\), \(\beta<1\) (U-shaped): The probability mass concentrates heavily near the extreme boundaries of 0 and 1, meaning the outcome is highly polarized.
\(\alpha >\beta\) (right-leaning): The probability mass shifts toward 1 and the mean lies above 0.5, meaning higher probabilities are expected.
\(\alpha <\beta\) (left-leaning): The probability mass shifts toward 0 and the mean lies below 0.5, meaning lower probabilities are expected.
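The regimes in this list can be checked numerically without plotting. The sketch below evaluates the density near 0, at 0.5, and near 1 for one parameter pair per regime. The specific parameter values and evaluation points are my own picks for illustration, and `beta_pdf` is a small helper built on the standard library.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_B)

# Illustrative parameter choices, one per regime
shapes = {
    "uniform": (1, 1),
    "unimodal": (2, 2),
    "narrow unimodal": (40, 40),
    "U-shaped": (0.5, 0.5),
    "alpha > beta": (5, 2),
    "alpha < beta": (2, 5),
}
xs = (0.05, 0.5, 0.95)  # near 0, the center, near 1

table = {name: [beta_pdf(x, a, b) for x in xs] for name, (a, b) in shapes.items()}

print(f"{'shape':16s}", "  ".join(f"x={x:<6}" for x in xs))
for name, row in table.items():
    print(f"{name:16s}", "  ".join(f"{v:8.4f}" for v in row))
```

Reading across each row: the uniform case is flat at 1, the U-shaped case is higher at the edges than in the center, and the two asymmetric cases put most of their density on the side of the larger parameter.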

Common applications

Bayesian inference: In Bayesian statistics, it serves as the conjugate prior for Bernoulli and Binomial likelihoods. The posterior is then again a Beta distribution, so prior assumptions can be updated with new data in closed form.
Project management: Used in the Program Evaluation and Review Technique (PERT) to model task completion times. It suits quantities bounded by a minimum and a maximum, once they are rescaled to \([0,1]\).
A/B testing: Widely applied in tech and marketing data science to estimate click-through rates (CTR) and conversion rates when comparing two variants.
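As a rough illustration of the Bayesian and A/B testing use cases, the sketch below gives each variant a Beta posterior using the "successes + 1, failures + 1" mapping (which assumes a uniform prior), then estimates by Monte Carlo how often variant B's click-through rate exceeds variant A's. The function name, the click counts, and the number of draws are all hypothetical. Sampling uses `random.betavariate` from the standard library.

```python
import random

def prob_b_beats_a(succ_a, fail_a, succ_b, fail_b, draws=50_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent Beta posteriors."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Posterior for each variant, assuming a uniform Beta(1, 1) prior
        p_a = rng.betavariate(succ_a + 1, fail_a + 1)
        p_b = rng.betavariate(succ_b + 1, fail_b + 1)
        wins += p_b > p_a
    return wins / draws

# Hypothetical data: clicks and non-clicks out of 1000 impressions per variant
p = prob_b_beats_a(succ_a=40, fail_a=960, succ_b=60, fail_b=940)
print(f"Estimated P(B beats A): {p:.3f}")
```

With identical data for both variants the estimate should hover around 0.5, which is a handy check that the sampling is wired up correctly.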
