Package 'distributional'

Title: Vectorised Probability Distributions
Description: Vectorised distribution objects with tools for manipulating, visualising, and using probability distributions. Designed to allow model prediction outputs to return distributions rather than their parameters, allowing users to directly interact with predictive distributions in a data-oriented workflow. In addition to providing generic replacements for p/d/q/r functions, other useful statistics can be computed including means, variances, intervals, and highest density regions.
Authors: Mitchell O'Hara-Wild [aut, cre], Matthew Kay [aut], Alex Hayes [aut], Rob Hyndman [aut], Earo Wang [ctb], Vencislav Popov [ctb]
Maintainer: Mitchell O'Hara-Wild <[email protected]>
License: GPL-3
Version: 0.5.0.9000
Built: 2024-11-14 12:49:38 UTC
Source: https://github.com/mitchelloharawild/distributional


The cumulative distribution function

Description

[Stable]

Usage

cdf(x, q, ..., log = FALSE)

## S3 method for class 'distribution'
cdf(x, q, ...)

Arguments

x

The distribution(s).

q

The quantile at which the cdf is calculated.

...

Additional arguments passed to methods.

log

If TRUE, probabilities will be given as log probabilities.


Covariance

Description

[Stable]

A generic function for computing the covariance of an object.

Usage

covariance(x, ...)

Arguments

x

An object.

...

Additional arguments used by methods.

See Also

covariance.distribution(), variance()


Covariance of a probability distribution

Description

[Stable]

Returns the empirical covariance of the probability distribution. If the method does not exist, the covariance of a random sample will be returned.

Usage

## S3 method for class 'distribution'
covariance(x, ...)

Arguments

x

The distribution(s).

...

Additional arguments used by methods.


The probability density/mass function

Description

[Stable]

Computes the probability density function for a continuous distribution, or the probability mass function for a discrete distribution.

Usage

## S3 method for class 'distribution'
density(x, at, ..., log = FALSE)

Arguments

x

The distribution(s).

at

The point at which to compute the density/mass.

...

Additional arguments passed to methods.

log

If TRUE, probabilities will be given as log probabilities.


The Bernoulli distribution

Description

[Stable]

Bernoulli distributions are used to represent events like coin flips, where there is a single trial that is either successful or unsuccessful. The Bernoulli distribution is a special case of the Binomial() distribution with n = 1.

Usage

dist_bernoulli(prob)

Arguments

prob

The probability of success on each trial. prob can be any value in [0, 1].

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a Bernoulli random variable with parameter prob = p. Some textbooks also define q = 1 - p, or use \pi instead of p.

The Bernoulli probability distribution is widely used to model binary variables, such as 'failure' and 'success'. The most typical example is the flip of a coin, where p is thought of as the probability of flipping a head, and q = 1 - p is the probability of flipping a tail.

Support: \{0, 1\}

Mean: p

Variance: p \cdot (1 - p) = p \cdot q

Probability mass function (p.m.f):

P(X = x) = p^x (1 - p)^{1-x} = p^x q^{1-x}

Cumulative distribution function (c.d.f):

P(X \le x) = \left \{ \begin{array}{ll} 0 & x < 0 \\ 1 - p & 0 \leq x < 1 \\ 1 & x \geq 1 \end{array} \right.

Moment generating function (m.g.f):

E(e^{tX}) = (1 - p) + p e^t
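As a quick numerical check of the Bernoulli p.m.f., mean and variance, the following sketch uses only base R (stats::dbinom() with size = 1) rather than this package. The helper name bernoulli_pmf is illustrative and not part of distributional.

```r
# Bernoulli p.m.f. written out directly: p^x (1 - p)^(1 - x)
bernoulli_pmf <- function(x, p) p^x * (1 - p)^(1 - x)

p <- 0.3
x <- c(0, 1)

# Base R evaluates the same quantity as a Binomial with a single trial
pmf_formula <- bernoulli_pmf(x, p)
pmf_base <- dbinom(x, size = 1, prob = p)

# Mean and variance computed from the p.m.f. over the support {0, 1}
mean_x <- sum(x * pmf_formula)
var_x <- sum((x - mean_x)^2 * pmf_formula)
```

Here mean_x and var_x should agree with p and p(1 - p) up to floating point error.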

Examples

dist <- dist_bernoulli(prob = c(0.05, 0.5, 0.3, 0.9, 0.1))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Beta distribution

Description

[Stable]

Usage

dist_beta(shape1, shape2)

Arguments

shape1, shape2

The non-negative shape parameters of the Beta distribution.

See Also

stats::Beta

Examples

dist <- dist_beta(shape1 = c(0.5, 5, 1, 2, 2), shape2 = c(0.5, 1, 3, 2, 5))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Binomial distribution

Description

[Stable]

Binomial distributions are used to represent situations that can be thought of as the result of n Bernoulli experiments (here n is defined as the size of the experiment). The classical example is n independent coin flips, where each coin flip has probability p of success. In this case, the individual probability of flipping heads or tails is given by the Bernoulli(p) distribution, and the probability of having x equal results (x heads, for example) in n trials is given by the Binomial(n, p) distribution. The equation of the Binomial distribution is directly derived from the equation of the Bernoulli distribution.

Usage

dist_binomial(size, prob)

Arguments

size

The number of trials. Must be an integer greater than or equal to one. When size = 1L, the Binomial distribution reduces to the Bernoulli distribution. Often called n in textbooks.

prob

The probability of success on each trial. prob can be any value in [0, 1].

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

The Binomial distribution comes up when you are interested in the proportion of people who do a thing. The Binomial distribution also comes up in the sign test, sometimes called the Binomial test (see stats::binom.test()), where you may need the Binomial c.d.f. to compute p-values.

In the following, let X be a Binomial random variable with parameters size = n and prob = p. Some textbooks define q = 1 - p, or use \pi instead of p.

Support: \{0, 1, 2, ..., n\}

Mean: np

Variance: np \cdot (1 - p) = np \cdot q

Probability mass function (p.m.f):

P(X = k) = {n \choose k} p^k (1 - p)^{n-k}

Cumulative distribution function (c.d.f):

P(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n \choose i} p^i (1 - p)^{n-i}

Moment generating function (m.g.f):

E(e^{tX}) = (1 - p + p e^t)^n
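The Binomial c.d.f. is a finite sum of p.m.f. terms up to floor(k). A minimal base R sketch of that sum, compared against stats::pbinom(), is given here; binomial_cdf is a hypothetical helper name used only for illustration.

```r
# Binomial c.d.f. as a finite sum of p.m.f. terms up to floor(k)
binomial_cdf <- function(k, n, p) {
  if (k < 0) return(0)  # no mass below the support
  i <- 0:min(floor(k), n)
  sum(choose(n, i) * p^i * (1 - p)^(n - i))
}

n <- 5
p <- 0.3

# A non-integer k is rounded down by the floor in the summation limit
cdf_formula <- binomial_cdf(2.7, n, p)
cdf_base <- pbinom(2.7, size = n, prob = p)
```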

Examples

dist <- dist_binomial(size = 1:5, prob = c(0.05, 0.5, 0.3, 0.9, 0.1))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Burr distribution

Description

[Stable]

Usage

dist_burr(shape1, shape2, rate = 1, scale = 1/rate)

Arguments

shape1, shape2, scale

parameters. Must be strictly positive.

rate

an alternative way to specify the scale.

See Also

actuar::Burr

Examples

dist <- dist_burr(shape1 = c(1,1,1,2,3,0.5), shape2 = c(1,2,3,1,1,2))
dist


mean(dist)
variance(dist)
support(dist)
generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Categorical distribution

Description

[Stable]

Categorical distributions are used to represent events with multiple outcomes, such as what number appears on the roll of a die. This is also referred to as the 'generalised Bernoulli' or 'multinoulli' distribution. The Categorical distribution is a special case of the Multinomial() distribution with n = 1.

Usage

dist_categorical(prob, outcomes = NULL)

Arguments

prob

A list of probabilities of observing each outcome category.

outcomes

The values used to represent each outcome.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a Categorical random variable with probability parameters p = \{p_1, p_2, \ldots, p_k\}.

The Categorical probability distribution is widely used to model the occurrence of multiple events. A simple example is the roll of a die, where p = \{1/6, 1/6, 1/6, 1/6, 1/6, 1/6\}, giving an equal chance of observing each number on a 6-sided die.

Support: \{1, \ldots, k\}

Mean: p

Variance: p \cdot (1 - p) = p \cdot q

Probability mass function (p.m.f):

P(X = i) = p_i

Cumulative distribution function (c.d.f):

The cdf() of a categorical distribution is undefined as the outcome categories aren't ordered.

Examples

dist <- dist_categorical(prob = list(c(0.05, 0.5, 0.15, 0.2, 0.1), c(0.3, 0.1, 0.6)))

dist

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

# The outcomes aren't ordered, so many statistics are not applicable.
cdf(dist, 4)
quantile(dist, 0.7)
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

dist <- dist_categorical(
  prob = list(c(0.05, 0.5, 0.15, 0.2, 0.1), c(0.3, 0.1, 0.6)),
  outcomes = list(letters[1:5], letters[24:26])
)

generate(dist, 10)

density(dist, "a")
density(dist, "z", log = TRUE)

The Cauchy distribution

Description

[Stable]

The Cauchy distribution is the Student's t distribution with one degree of freedom. The Cauchy distribution does not have a well defined mean or variance. Cauchy distributions often appear as priors in Bayesian contexts due to their heavy tails.

Usage

dist_cauchy(location, scale)

Arguments

location, scale

location and scale parameters.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a Cauchy variable with location = x_0 and scale = \gamma.

Support: R, the set of all real numbers

Mean: Undefined.

Variance: Undefined.

Probability density function (p.d.f):

f(x) = \frac{1}{\pi \gamma \left[1 + \left(\frac{x - x_0}{\gamma} \right)^2 \right]}

Cumulative distribution function (c.d.f):

F(t) = \frac{1}{\pi} \arctan \left( \frac{t - x_0}{\gamma} \right) + \frac{1}{2}

Moment generating function (m.g.f):

Does not exist.
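The arctan form of the Cauchy c.d.f., and the link to the Student's t distribution with one degree of freedom, can be checked numerically. The sketch below uses only base R (stats::pcauchy() and stats::pt()); cauchy_cdf is an illustrative helper, not a function from this package.

```r
# Cauchy c.d.f. written out directly: arctan((q - x0) / scale) / pi + 1/2
cauchy_cdf <- function(q, x0, scale) atan((q - x0) / scale) / pi + 1 / 2

q <- c(-3, 0, 1.5)

# Standard Cauchy: location 0, scale 1
cdf_formula <- cauchy_cdf(q, x0 = 0, scale = 1)
cdf_base <- pcauchy(q, location = 0, scale = 1)

# The standard Cauchy coincides with Student's t on one degree of freedom
cdf_t1 <- pt(q, df = 1)
```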

See Also

stats::Cauchy

Examples

dist <- dist_cauchy(location = c(0, 0, 0, -2), scale = c(0.5, 1, 2, 1))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The (non-central) Chi-Squared Distribution

Description

[Stable]

Chi-square distributions show up often in frequentist settings as the sampling distribution of test statistics, especially in maximum likelihood estimation settings.

Usage

dist_chisq(df, ncp = 0)

Arguments

df

degrees of freedom (non-negative, but can be non-integer).

ncp

non-centrality parameter (non-negative).

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a \chi^2 random variable with df = k and ncp = 0 (the central case).

Support: R^+, the set of positive real numbers

Mean: k

Variance: 2k

Probability density function (p.d.f):

f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2}

Cumulative distribution function (c.d.f):

The cumulative distribution function has the form

F(t) = \frac{\gamma(k/2, t/2)}{\Gamma(k/2)}

where \gamma(s, x) is the lower incomplete gamma function. In general this is evaluated numerically.

Moment generating function (m.g.f):

E(e^{tX}) = (1 - 2t)^{-k/2}, \thinspace t < 1/2
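For the central chi-squared distribution (ncp = 0) with k degrees of freedom, the standard density is x^{k/2 - 1} e^{-x/2} / (2^{k/2} \Gamma(k/2)) and the c.d.f. is a regularised lower incomplete gamma function. The sketch below checks both against base R; chisq_pdf is an illustrative name, and the use of stats::pgamma() as the regularised incomplete gamma function is a property of base R, not of this package.

```r
# Central chi-squared density with k degrees of freedom
chisq_pdf <- function(x, k) {
  x^(k / 2 - 1) * exp(-x / 2) / (2^(k / 2) * gamma(k / 2))
}

x <- c(0.5, 1, 2, 4)
k <- 3

pdf_formula <- chisq_pdf(x, k)
pdf_base <- dchisq(x, df = k)

# pgamma(q, shape) with unit rate is the regularised lower incomplete
# gamma function, so the chi-squared c.d.f. is pgamma(t / 2, shape = k / 2)
cdf_formula <- pgamma(x / 2, shape = k / 2)
cdf_base <- pchisq(x, df = k)
```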

See Also

stats::Chisquare

Examples

dist <- dist_chisq(df = c(1,2,3,4,6,9))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The degenerate distribution

Description

[Stable]

The degenerate distribution takes a single value which is certain to be observed. It takes a single parameter, which is the value that is observed by the distribution.

Usage

dist_degenerate(x)

Arguments

x

The value of the distribution.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a degenerate random variable with value x = k_0.

Support: R, the set of all real numbers

Mean: k_0

Variance: 0

Probability density function (p.d.f):

f(x) = 1 for x = k_0

f(x) = 0 for x \neq k_0

Cumulative distribution function (c.d.f):

The cumulative distribution function has the form

F(x) = 0 for x < k_0

F(x) = 1 for x \ge k_0

Moment generating function (m.g.f):

E(e^{tX}) = e^{k_0 t}

Examples

dist_degenerate(x = 1:5)

The Exponential Distribution

Description

[Stable]

Usage

dist_exponential(rate)

Arguments

rate

vector of rates.

See Also

stats::Exponential

Examples

dist <- dist_exponential(rate = c(2, 1, 2/3))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The F Distribution

Description

[Stable]

Usage

dist_f(df1, df2, ncp = NULL)

Arguments

df1, df2

degrees of freedom. Inf is allowed.

ncp

non-centrality parameter. If omitted the central F is assumed.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be an F random variable with df1 = d_1 and df2 = d_2, in the central case (ncp omitted).

Support: x \in (0, \infty)

Mean: \frac{d_2}{d_2 - 2}, for d_2 > 2

Variance: \frac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}, for d_2 > 4

Probability density function (p.d.f):

f(x) = \frac{\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}}}{x B(d_1/2, d_2/2)}

where B(a, b) is the Beta function.

Cumulative distribution function (c.d.f):

F(x) = I_{d_1 x / (d_1 x + d_2)}(d_1/2, d_2/2)

where I_z(a, b) is the regularised incomplete Beta function.

Moment generating function (m.g.f):

Does not exist.

See Also

stats::FDist

Examples

dist <- dist_f(df1 = c(1,2,5,10,100), df2 = c(1,1,2,1,100))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Gamma distribution

Description

[Stable]

Several important distributions are special cases of the Gamma distribution. When the shape parameter is 1, the Gamma is an exponential distribution with rate \beta (mean 1/\beta). When shape = n/2 and rate = 1/2, the Gamma is equivalent to a chi-squared distribution with n degrees of freedom. Moreover, if X_1 is Gamma(\alpha_1, \beta) and X_2 is Gamma(\alpha_2, \beta), independently, then \frac{X_1}{X_1 + X_2} follows a Beta(\alpha_1, \alpha_2) distribution. This last property frequently appears in other distributions, and it has been used extensively in multivariate methods. More about the Gamma distribution will be added soon.
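The exponential and chi-squared special cases can be checked numerically. The sketch below uses base R's stats::pgamma(), stats::pexp() and stats::pchisq() rather than this package, and the variable names are illustrative only.

```r
x <- c(0.1, 0.5, 1, 2, 5)

# Shape 1: the Gamma c.d.f. coincides with the exponential c.d.f.
rate_beta <- 2
cdf_gamma_shape1 <- pgamma(x, shape = 1, rate = rate_beta)
cdf_exponential <- pexp(x, rate = rate_beta)

# Shape n/2 and rate 1/2: the Gamma c.d.f. coincides with the
# chi-squared c.d.f. on n degrees of freedom
n <- 4
cdf_gamma_half <- pgamma(x, shape = n / 2, rate = 1 / 2)
cdf_chisq <- pchisq(x, df = n)
```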

Usage

dist_gamma(shape, rate, scale = 1/rate)

Arguments

shape, scale

shape and scale parameters. Must be positive, scale strictly.

rate

an alternative way to specify the scale.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a Gamma random variable with parameters shape = \alpha and rate = \beta.

Support: x \in (0, \infty)

Mean: \frac{\alpha}{\beta}

Variance: \frac{\alpha}{\beta^2}

Probability density function (p.d.f):

f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}

Cumulative distribution function (c.d.f):

F(x) = \frac{\gamma(\alpha, \beta x)}{\Gamma(\alpha)}

where \gamma(\alpha, \beta x) is the lower incomplete gamma function.

Moment generating function (m.g.f):

E(e^{tX}) = \Big(\frac{\beta}{ \beta - t}\Big)^{\alpha}, \thinspace t < \beta

See Also

stats::GammaDist

Examples

dist <- dist_gamma(shape = c(1,2,3,5,9,7.5,0.5), rate = c(0.5,0.5,0.5,1,2,1,1))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Geometric Distribution

Description

[Stable]

The Geometric distribution can be thought of as a generalization of the dist_bernoulli() distribution where we ask: "if I keep flipping a coin with probability p of heads, what is the probability I need k flips before I get my first heads?" The Geometric distribution is a special case of the Negative Binomial distribution.

Usage

dist_geometric(prob)

Arguments

prob

probability of success in each trial. 0 < prob <= 1.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a Geometric random variable with success probability prob = p. Note that there are multiple parameterizations of the Geometric distribution; here X counts the number of failures before the first success.

Support: 0 < p < 1, x = 0, 1, \dots

Mean: \frac{1-p}{p}

Variance: \frac{1-p}{p^2}

Probability mass function (p.m.f):

P(X = x) = p(1-p)^x

Cumulative distribution function (c.d.f):

P(X \le x) = 1 - (1-p)^{x+1}

Moment generating function (m.g.f):

E(e^{tX}) = \frac{p}{1 - (1-p)e^t}, \thinspace t < -\ln(1 - p)
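Because several parameterizations of the Geometric distribution are in use, it can help to verify numerically which one a formula belongs to. The sketch below assumes the "number of failures before the first success" convention used by base R's stats::dgeom() and stats::pgeom(). It evaluates the m.g.f. as a truncated series, which for support starting at 0 sums to p / (1 - (1 - p) e^t). All variable names are illustrative.

```r
p <- 0.3
x <- 0:10

# p.m.f. and c.d.f. for support x = 0, 1, 2, ...
pmf_formula <- p * (1 - p)^x
cdf_formula <- 1 - (1 - p)^(x + 1)
pmf_base <- dgeom(x, prob = p)
cdf_base <- pgeom(x, prob = p)

# Mean and m.g.f. approximated by truncating the infinite series;
# the truncation point 1000 is arbitrary but ample for p = 0.3
grid <- 0:1000
mean_series <- sum(grid * dgeom(grid, prob = p))
t_val <- 0.1  # must satisfy t < -log(1 - p)
mgf_series <- sum(exp(t_val * grid) * dgeom(grid, prob = p))
mgf_closed <- p / (1 - (1 - p) * exp(t_val))
```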

See Also

stats::Geometric

Examples

dist <- dist_geometric(prob = c(0.2, 0.5, 0.8))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Generalized Extreme Value Distribution

Description

The GEV distribution function with parameters location = a, scale = b and shape = s is given in Details.

Usage

dist_gev(location, scale, shape)

Arguments

location

the location parameter a of the GEV distribution.

scale

the scale parameter b of the GEV distribution.

shape

the shape parameter s of the GEV distribution.

Details

F(x) = \exp\left[-\{1+s(x-a)/b\}^{-1/s}\right]

for 1+s(x-a)/b > 0, where b > 0. If s = 0 the distribution is defined by continuity, giving

F(x) = \exp\left[-\exp\left(-\frac{x-a}{b}\right)\right]

The support of the distribution is the real line if s = 0, x \geq a - b/s if s > 0, and x \leq a - b/s if s < 0.

The parametric form of the GEV encompasses that of the Gumbel, Frechet and reverse Weibull distributions, which are obtained for s = 0, s > 0 and s < 0 respectively. It was first introduced by Jenkinson (1955).
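A minimal base R sketch of the GEV distribution function is given below, including the s = 0 (Gumbel) limit and the behaviour outside the support. The function gev_cdf is a hypothetical helper written for illustration. It is not the implementation used by dist_gev().

```r
# GEV distribution function with location a, scale b > 0 and shape s
gev_cdf <- function(x, a, b, s) {
  z <- (x - a) / b
  if (s == 0) {
    return(exp(-exp(-z)))  # Gumbel case, defined by continuity
  }
  u <- 1 + s * z
  # Outside the support the c.d.f. is 0 (below the lower end point when
  # s > 0) or 1 (above the upper end point when s < 0)
  outside <- if (s > 0) 0 else 1
  ifelse(u > 0, exp(-pmax(u, 0)^(-1 / s)), outside)
}

x <- c(-1, 0, 1, 2)
cdf_gumbel <- gev_cdf(x, a = 0, b = 1, s = 0)
cdf_near_zero <- gev_cdf(x, a = 0, b = 1, s = 1e-6)  # close to the s = 0 limit

# End points: a - b/s = -2 for s = 0.5, and a - b/s = 2 for s = -0.5
below_support <- gev_cdf(-3, a = 0, b = 1, s = 0.5)
above_support <- gev_cdf(3, a = 0, b = 1, s = -0.5)
```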

References

Jenkinson, A. F. (1955) The frequency distribution of the annual maximum (or minimum) of meteorological elements. Quart. J. R. Met. Soc., 81, 158–171.

See Also

gev

Examples

dist <- dist_gev(location = 0, scale = 1, shape = 0)

The generalised g-and-h Distribution

Description

[Stable]

The generalised g-and-h distribution is a flexible distribution used to model univariate data, similar to the g-k distribution. It is known for its ability to handle skewness and heavy-tailed behavior.

Usage

dist_gh(A, B, g, h, c = 0.8)

Arguments

A

Vector of A (location) parameters.

B

Vector of B (scale) parameters. Must be positive.

g

Vector of g parameters.

h

Vector of h parameters. Must be non-negative.

c

Vector of c parameters (used for generalised g-and-h). Often fixed at 0.8 which is the default.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a g-and-h random variable with parameters A, B, g, h, and c.

Support: (-\infty, \infty)

Mean: Not available in closed form.

Variance: Not available in closed form.

Probability density function (p.d.f):

The g-and-h distribution does not have a closed-form expression for its density. Instead, it is defined through its quantile function:

Q(u) = A + B \left( 1 + c \frac{1 - \exp(-gz(u))}{1 + \exp(-gz(u))} \right) \exp(h z(u)^2/2) z(u)

where z(u) = \Phi^{-1}(u), the standard normal quantile of u.

Cumulative distribution function (c.d.f):

The cumulative distribution function is typically evaluated numerically due to the lack of a closed-form expression.

See Also

gk::dgh, dist_gk

Examples

dist <- dist_gh(A = 0, B = 1, g = 0, h = 0.5)
dist


mean(dist)
variance(dist)
support(dist)
generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The g-and-k Distribution

Description

[Stable]

The g-and-k distribution is a flexible distribution often used to model univariate data. It is particularly known for its ability to handle skewness and heavy-tailed behavior.

Usage

dist_gk(A, B, g, k, c = 0.8)

Arguments

A

Vector of A (location) parameters.

B

Vector of B (scale) parameters. Must be positive.

g

Vector of g parameters.

k

Vector of k parameters. Must be at least -0.5.

c

Vector of c parameters. Often fixed at 0.8 which is the default.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a g-k random variable with parameters A, B, g, k, and c.

Support: (-\infty, \infty)

Mean: Not available in closed form.

Variance: Not available in closed form.

Probability density function (p.d.f):

The g-k distribution does not have a closed-form expression for its density. Instead, it is defined through its quantile function:

Q(u) = A + B \left( 1 + c \frac{1 - \exp(-gz(u))}{1 + \exp(-gz(u))} \right) (1 + z(u)^2)^k z(u)

where z(u) = \Phi^{-1}(u), the standard normal quantile of u.
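Since the g-k distribution is defined through its quantile function, Q(u) can be written directly in base R with stats::qnorm(). The sketch below uses a hypothetical helper gk_quantile (not a function from this package or from gk) and checks the special case g = 0, k = 0, where Q(u) reduces to the Normal quantile A + B z(u).

```r
# g-k quantile function, with z(u) the standard normal quantile of u
gk_quantile <- function(u, A, B, g, k, c = 0.8) {
  z <- qnorm(u)
  skew_term <- 1 + c * (1 - exp(-g * z)) / (1 + exp(-g * z))
  A + B * skew_term * (1 + z^2)^k * z
}

u <- c(0.1, 0.5, 0.9)

# With g = 0 and k = 0 the skewness and kurtosis terms both equal 1
q_gk <- gk_quantile(u, A = 2, B = 3, g = 0, k = 0)
q_normal <- qnorm(u, mean = 2, sd = 3)

# The median is always A, because z(0.5) = 0
median_gk <- gk_quantile(0.5, A = 2, B = 3, g = 0.5, k = 0.5)
```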

Cumulative distribution function (c.d.f):

The cumulative distribution function is typically evaluated numerically due to the lack of a closed-form expression.

See Also

gk::dgk, dist_gh

Examples

dist <- dist_gk(A = 0, B = 1, g = 0, k = 0.5)
dist


mean(dist)
variance(dist)
support(dist)
generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Generalized Pareto Distribution

Description

The GPD distribution function with parameters location = a, scale = b and shape = s is given in Details.

Usage

dist_gpd(location, scale, shape)

Arguments

location

the location parameter a of the GPD distribution.

scale

the scale parameter b of the GPD distribution.

shape

the shape parameter s of the GPD distribution.

Details

F(x) = 1 - \left(1+s(x-a)/b\right)^{-1/s}

for 1+s(x-a)/b > 0, where b > 0. If s = 0 the distribution is defined by continuity, giving

F(x) = 1 - \exp\left(-\frac{x-a}{b}\right)

The support of the distribution is x \geq a if s \geq 0, and a \leq x \leq a - b/s if s < 0.

The Pickands–Balkema–De Haan theorem states that for a large class of distributions, the tail (above some threshold) can be approximated by a GPD.

See Also

gpd

Examples

dist <- dist_gpd(location = 0, scale = 1, shape = 0)

The Gumbel distribution

Description

[Stable]

The Gumbel distribution is a special case of the Generalized Extreme Value distribution, obtained when the GEV shape parameter \xi is equal to 0. It may be referred to as a type I extreme value distribution.

Usage

dist_gumbel(alpha, scale)

Arguments

alpha

location parameter.

scale

parameter. Must be strictly positive.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a Gumbel random variable with location parameter alpha = \mu and scale parameter scale = \sigma.

Support: R, the set of all real numbers.

Mean: \mu + \sigma\gamma, where \gamma is Euler's constant, approximately equal to 0.57722.

Median: \mu - \sigma\ln(\ln 2).

Variance: \sigma^2 \pi^2 / 6.

Probability density function (p.d.f):

f(x) = \sigma^{-1} \exp[-(x - \mu) / \sigma] \exp\{-\exp[-(x - \mu) / \sigma] \}

for x in R, the set of all real numbers.

Cumulative distribution function (c.d.f):

In the \xi = 0 (Gumbel) special case

F(x) = \exp\{-\exp[-(x - \mu) / \sigma] \}

for x in R, the set of all real numbers.

See Also

actuar::Gumbel

Examples

dist <- dist_gumbel(alpha = c(0.5, 1, 1.5, 3), scale = c(2, 2, 3, 4))
dist


mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
support(dist)
generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Hypergeometric distribution

Description

[Stable]

To understand the HyperGeometric distribution, consider a set of r objects, of which m are of type I and n are of type II. A sample of size k (k < r) is randomly chosen without replacement. The number of type I elements observed in this sample is our random variable X.

Usage

dist_hypergeometric(m, n, k)

Arguments

m

The number of type I elements available.

n

The number of type II elements available.

k

The size of the sample taken.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a HyperGeometric random variable with success probability p = m/(m+n).

Support: x \in \{\max{(0, k-n)}, \dots, \min{(k,m)}\}

Mean: \frac{km}{n+m} = kp

Variance: \frac{kmn(n+m-k)}{(n+m)^2 (n+m-1)} = kp(1-p)\left(1 - \frac{k-1}{m+n-1}\right)

Probability mass function (p.m.f):

P(X = x) = \frac{{m \choose x}{n \choose k-x}}{{m+n \choose k}}

Cumulative distribution function (c.d.f):

P(X \le x) \approx \Phi\Big(\frac{x - kp}{\sqrt{kp(1-p)}}\Big)
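The HyperGeometric p.m.f., mean and variance can be checked against base R's stats::dhyper(), which takes its arguments in the same m, n, k order. This is a small sketch with illustrative values, not output from this package.

```r
m <- 10  # type I elements
n <- 7   # type II elements
k <- 8   # sample size

# Support runs from max(0, k - n) to min(k, m)
x <- max(0, k - n):min(k, m)

pmf_formula <- choose(m, x) * choose(n, k - x) / choose(m + n, k)
pmf_base <- dhyper(x, m, n, k)

# Moments computed from the p.m.f., compared with the closed forms
p <- m / (m + n)
mean_x <- sum(x * pmf_formula)
var_x <- sum((x - mean_x)^2 * pmf_formula)
mean_closed <- k * p
var_closed <- k * p * (1 - p) * (1 - (k - 1) / (m + n - 1))
```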

See Also

stats::Hypergeometric

Examples

dist <- dist_hypergeometric(m = rep(500, 3), n = c(50, 60, 70), k = c(100, 200, 300))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

Inflate a value of a probability distribution

Description

[Stable]

Usage

dist_inflated(dist, prob, x = 0)

Arguments

dist

The distribution(s) to inflate.

prob

The added probability of observing x.

x

The value to inflate. The default of x = 0 is for zero-inflation.


The Inverse Exponential distribution

Description

[Stable]

Usage

dist_inverse_exponential(rate)

Arguments

rate

an alternative way to specify the scale.

See Also

actuar::InverseExponential

Examples

dist <- dist_inverse_exponential(rate = 1:5)
dist


mean(dist)
variance(dist)
support(dist)
generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Inverse Gamma distribution

Description

[Stable]

Usage

dist_inverse_gamma(shape, rate = 1/scale, scale)

Arguments

shape, scale

parameters. Must be strictly positive.

rate

an alternative way to specify the scale.

See Also

actuar::InverseGamma

Examples

dist <- dist_inverse_gamma(shape = c(1,2,3,3), rate = c(1,1,1,2))
dist


mean(dist)
variance(dist)
support(dist)
generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Inverse Gaussian distribution

Description

[Stable]

Usage

dist_inverse_gaussian(mean, shape)

Arguments

mean, shape

parameters. Must be strictly positive. Infinite values are supported.

See Also

actuar::InverseGaussian

Examples

dist <- dist_inverse_gaussian(mean = c(1,1,1,3,3), shape = c(0.2, 1, 3, 0.2, 1))
dist


mean(dist)
variance(dist)
support(dist)
generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Logarithmic distribution

Description

[Stable]

Usage

dist_logarithmic(prob)

Arguments

prob

parameter. 0 <= prob < 1.

See Also

actuar::Logarithmic

Examples

dist <- dist_logarithmic(prob = c(0.33, 0.66, 0.99))
dist


mean(dist)
variance(dist)
support(dist)
generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The Logistic distribution

Description

[Stable]

A continuous distribution on the real line. For binary outcomes, the model given by P(Y = 1 | X) = F(X \beta), where F is the Logistic cdf(), is called logistic regression.

Usage

dist_logistic(location, scale)

Arguments

location, scale

location and scale parameters.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let X be a Logistic random variable with location = \mu and scale = s.

Support: R, the set of all real numbers

Mean: \mu

Variance: s^2 \pi^2 / 3

Probability density function (p.d.f):

f(x) = \frac{e^{-(\frac{x - \mu}{s})}}{s [1 + \exp(-(\frac{x - \mu}{s})) ]^2}

Cumulative distribution function (c.d.f):

F(t) = \frac{1}{1 + e^{-(\frac{t - \mu}{s})}}

Moment generating function (m.g.f):

E(e^{tX}) = e^{\mu t} \beta(1 - st, 1 + st)

where \beta(x, y) is the Beta function.

See Also

stats::Logistic

Examples

dist <- dist_logistic(location = c(5,9,9,6,2), scale = c(2,3,4,2,1))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

The log-normal distribution

Description

[Stable]

The log-normal distribution is a commonly used transformation of the Normal distribution. If X follows a log-normal distribution, then \ln{X} would be characterised by a Normal distribution.

Usage

dist_lognormal(mu = 0, sigma = 1)

Arguments

mu

The mean (location parameter) of the distribution, which is the mean of the associated Normal distribution. Can be any real number.

sigma

The standard deviation (scale parameter) of the distribution. Can be any positive number.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let Y be a Normal random variable with mean mu = \mu and standard deviation sigma = \sigma. The log-normal distribution X = \exp(Y) is characterised by:

Support: R^+, the set of all real numbers greater than 0.

Mean: e^{\mu + \sigma^2/2}

Variance: (e^{\sigma^2} - 1) e^{2\mu + \sigma^2}

Probability density function (p.d.f):

f(x) = \frac{1}{x\sqrt{2 \pi \sigma^2}} e^{-(\ln{x} - \mu)^2 / 2 \sigma^2}

Cumulative distribution function (c.d.f):

The cumulative distribution function has the form

F(x) = \Phi((\ln{x} - \mu)/\sigma)

where \Phi is the c.d.f. of a standard Normal distribution, N(0,1).
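The relationship between the log-normal c.d.f. and the standard Normal c.d.f., and the closed-form mean, can be checked with base R (stats::plnorm(), stats::pnorm(), stats::dlnorm() and stats::integrate()). The parameter values below are arbitrary illustrations.

```r
mu <- 0.5
sigma <- 0.4
x <- c(0.5, 1, 2, 5)

# c.d.f.: Phi((ln x - mu) / sigma)
cdf_formula <- pnorm((log(x) - mu) / sigma)
cdf_base <- plnorm(x, meanlog = mu, sdlog = sigma)

# Mean by numerical integration of x * f(x) over (0, Inf),
# compared with the closed form exp(mu + sigma^2 / 2)
mean_numeric <- integrate(
  function(v) v * dlnorm(v, meanlog = mu, sdlog = sigma),
  lower = 0, upper = Inf
)$value
mean_closed <- exp(mu + sigma^2 / 2)
```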

See Also

stats::Lognormal

Examples

dist <- dist_lognormal(mu = 1:5, sigma = 0.1)

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

# A log-normal distribution X is exp(Y), where Y is a Normal distribution of
# the same parameters. So log(X) will produce the Normal distribution Y.
log(dist)

Missing distribution

Description

[Maturing]

A placeholder distribution for handling missing values in a vector of distributions.

Usage

dist_missing(length = 1)

Arguments

length

The number of missing distributions

Examples

dist <- dist_missing(3L)

dist
mean(dist)
variance(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

Create a mixture of distributions

Description

[Maturing]

Usage

dist_mixture(..., weights = numeric())

Arguments

...

Distributions to be used in the mixture.

weights

The weight of each distribution passed to ....

Examples

dist_mixture(dist_normal(0, 1), dist_normal(5, 2), weights = c(0.3, 0.7))
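
Under the standard definition of a finite mixture, the density and c.d.f. are weighted sums of the component densities and c.d.f.s. The sketch below reproduces this in base R for the two Normal components used above. The helper names mix_density and mix_cdf are hypothetical, and this is not necessarily how dist_mixture() computes its results.

```r
# Components and weights matching the example above
weights <- c(0.3, 0.7)
means <- c(0, 5)
sds <- c(1, 2)

# Mixture density and c.d.f.: weighted sums over the components
mix_density <- function(x) {
  sapply(x, function(xi) sum(weights * dnorm(xi, means, sds)))
}
mix_cdf <- function(q) {
  sapply(q, function(qi) sum(weights * pnorm(qi, means, sds)))
}

# Mixture mean: weighted sum of the component means
mix_mean <- sum(weights * means)

# Mixture variance: weighted second moments minus the squared mixture mean
mix_var <- sum(weights * (sds^2 + means^2)) - mix_mean^2

mix_mean
mix_var
mix_cdf(c(0, 5))
```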

The Multinomial distribution

Description

[Stable]

The multinomial distribution is a generalization of the binomial distribution to multiple categories. It is perhaps easiest to think that we first extend a dist_bernoulli() distribution to include more than two categories, resulting in a dist_categorical() distribution. We then repeat the Categorical experiment several ($n$) times.

Usage

dist_multinomial(size, prob)

Arguments

size

The number of draws from the Categorical distribution.

prob

The probability of an event occurring from each draw.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let $X = (X_1, ..., X_k)$ be a Multinomial random variable with success probability prob = $p$. Note that $p$ is a vector with $k$ elements that sum to one. Assume that we repeat the Categorical experiment size = $n$ times.

Support: Each $X_i$ is in $\{0, 1, 2, ..., n\}$.

Mean: The mean of $X_i$ is $n p_i$.

Variance: The variance of $X_i$ is $n p_i (1 - p_i)$. For $i \neq j$, the covariance of $X_i$ and $X_j$ is $-n p_i p_j$.

Probability mass function (p.m.f):

$$P(X_1 = x_1, ..., X_k = x_k) = \frac{n!}{x_1! x_2! ... x_k!} p_1^{x_1} \cdot p_2^{x_2} \cdot ... \cdot p_k^{x_k}$$

Cumulative distribution function (c.d.f):

Omitted for multivariate random variables for the time being.

Moment generating function (m.g.f):

$$E(e^{tX}) = \left(\sum_{i=1}^k p_i e^{t_i}\right)^n$$

See Also

stats::Multinomial

Examples

dist <- dist_multinomial(size = c(4, 3), prob = list(c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4)))

dist
mean(dist)
variance(dist)

generate(dist, 10)

# TODO: Needs fixing to support multiple inputs
# density(dist, 2)
# density(dist, 2, log = TRUE)
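
The multinomial p.m.f., means, and covariances can be verified by enumerating the support. The sketch below uses base R's dmultinom() rather than distributional, with the first parameter set from the example above. All variable names are illustrative.

```r
n <- 4
p <- c(0.3, 0.5, 0.2)
x <- c(1, 2, 1)  # one possible outcome, with sum(x) == n

# p.m.f. written out from the multinomial formula
pmf_formula <- factorial(n) / prod(factorial(x)) * prod(p^x)
pmf_builtin <- dmultinom(x, size = n, prob = p)

# Enumerate every outcome whose counts sum to n
outcomes <- as.matrix(expand.grid(x1 = 0:n, x2 = 0:n, x3 = 0:n))
outcomes <- outcomes[rowSums(outcomes) == n, ]
prob_mass <- apply(outcomes, 1, dmultinom, size = n, prob = p)

# Exact means: each row is weighted by its probability mass
means <- colSums(outcomes * prob_mass)

# Exact covariance matrix: E(X X') - E(X) E(X)'
second_moment <- crossprod(outcomes, outcomes * prob_mass)
covmat <- second_moment - tcrossprod(means)

means    # compare with n * p
covmat   # diagonal: n p_i (1 - p_i); off-diagonal: -n p_i p_j
```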

The multivariate normal distribution

Description

[Stable]

Usage

dist_multivariate_normal(mu = 0, sigma = diag(1))

Arguments

mu

A list of numeric vectors for the distribution's mean.

sigma

A list of matrices for the distribution's variance-covariance matrix.

See Also

mvtnorm::dmvnorm, mvtnorm::qmvnorm

Examples

dist <- dist_multivariate_normal(mu = list(c(1,2)), sigma = list(matrix(c(4,2,2,3), ncol=2)))
dimnames(dist) <- c("x", "y")
dist


mean(dist)
variance(dist)
support(dist)
generate(dist, 10)

density(dist, cbind(2, 1))
density(dist, cbind(2, 1), log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)
quantile(dist, 0.7, type = "marginal")

The Negative Binomial distribution

Description

[Stable]

A generalization of the geometric distribution. It is the number of failures in a sequence of i.i.d. Bernoulli trials before a specified number of successes (size) occur. The probability of success in each trial is given by prob.

Usage

dist_negative_binomial(size, prob)

Arguments

size

target for number of successful trials, or dispersion parameter (the shape parameter of the gamma mixing distribution). Must be strictly positive, need not be integer.

prob

probability of success in each trial. 0 < prob <= 1.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let $X$ be a Negative Binomial random variable with success probability prob = $p$ and the number of successes size = $r$. The formulas below use the same parametrisation as stats::NegBinomial, where $X$ counts the failures before the $r$-th success.

Support: $\{0, 1, 2, 3, ...\}$

Mean: $\frac{r (1-p)}{p}$

Variance: $\frac{r (1-p)}{p^2}$

Probability mass function (p.m.f):

$$f(k) = {k + r - 1 \choose k} \cdot p^r (1-p)^k$$

Cumulative distribution function (c.d.f):

Too nasty, omitted.

Moment generating function (m.g.f):

$$\left(\frac{p}{1-(1-p)e^t}\right)^r, t < -\log (1-p)$$

See Also

stats::NegBinomial

Examples

dist <- dist_negative_binomial(size = 10, prob = 0.5)

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
support(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)
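
When the distribution counts failures before the size-th success and prob is the success probability, as in stats::dnbinom(), the p.m.f. is choose(k + r - 1, k) * p^r * (1 - p)^k with mean r (1 - p) / p and variance r (1 - p) / p^2. The base R sketch below checks these expressions numerically. The variable names are illustrative, and the sum over 0:1000 is an assumed truncation of the infinite support.

```r
r <- 10    # size: target number of successes
p <- 0.5   # prob: probability of success in each trial
k <- 0:5   # number of failures observed before the r-th success

# p.m.f. in the stats::dnbinom() parametrisation
pmf_formula <- choose(k + r - 1, k) * p^r * (1 - p)^k
pmf_builtin <- dnbinom(k, size = r, prob = p)

# Closed-form mean and variance in the same parametrisation
nb_mean <- r * (1 - p) / p
nb_var <- r * (1 - p) / p^2

# Moments summed over a long stretch of the support for comparison
support <- 0:1000
mass <- dnbinom(support, size = r, prob = p)
sum_mean <- sum(support * mass)
sum_var <- sum((support - sum_mean)^2 * mass)

all.equal(pmf_formula, pmf_builtin)
all.equal(c(nb_mean, nb_var), c(sum_mean, sum_var))
```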

The Normal distribution

Description

[Stable]

The Normal distribution is ubiquitous in statistics, partially because of the central limit theorem, which states that suitably standardised sums of i.i.d. random variables with finite variance converge to a Normal distribution. Linear transformations of Normal random variables result in new random variables that are also Normal. If you are taking an intro stats course, you'll likely use the Normal distribution for Z-tests and in simple linear regression. Under regularity conditions, maximum likelihood estimators are asymptotically Normal. The Normal distribution is also called the Gaussian distribution.

Usage

dist_normal(mu = 0, sigma = 1, mean = mu, sd = sigma)

Arguments

mu, mean

The mean (location parameter) of the distribution. Can be any real number.

sigma, sd

The standard deviation (scale parameter) of the distribution. Can be any positive number. If you would like a Normal distribution with variance $\sigma^2$, be sure to take the square root, as this is a common source of errors.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let $X$ be a Normal random variable with mean mu = $\mu$ and standard deviation sigma = $\sigma$.

Support: $R$, the set of all real numbers

Mean: $\mu$

Variance: $\sigma^2$

Probability density function (p.d.f):

$$f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-(x - \mu)^2 / (2 \sigma^2)}$$

Cumulative distribution function (c.d.f):

The cumulative distribution function has the form

$$F(t) = \int_{-\infty}^t \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-(x - \mu)^2 / (2 \sigma^2)} dx$$

but this integral does not have a closed form solution and must be approximated numerically. The c.d.f. of a standard Normal is closely related to the "error function" erf, via $\Phi(t) = \frac{1}{2}(1 + \mathrm{erf}(t / \sqrt{2}))$. The notation $\Phi(t)$ stands for the c.d.f. of a standard Normal evaluated at $t$. Z-tables list the value of $\Phi(t)$ for various $t$.

Moment generating function (m.g.f):

$$E(e^{tX}) = e^{\mu t + \sigma^2 t^2 / 2}$$

See Also

stats::Normal

Examples

dist <- dist_normal(mu = 1:5, sigma = 3)

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)
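
The Normal c.d.f. has no closed form, so it is evaluated numerically. The base R sketch below compares direct numerical integration of the density with pnorm(), shows that standardising reduces any Normal c.d.f. to the standard Normal c.d.f., and illustrates the variance-to-standard-deviation conversion. The values and variable names are illustrative.

```r
mu <- 2
sigma <- 3
t <- 4

# c.d.f. by numerical integration of the density up to t
cdf_integral <- integrate(dnorm, -Inf, t, mean = mu, sd = sigma)$value
cdf_builtin <- pnorm(t, mean = mu, sd = sigma)

# Standardising reduces any Normal c.d.f. to the standard Normal c.d.f.
cdf_standardised <- pnorm((t - mu) / sigma)

# A variance must be converted to a standard deviation before it is
# supplied as sigma / sd
variance_wanted <- 9
sd_from_variance <- sqrt(variance_wanted)

all.equal(cdf_integral, cdf_builtin, tolerance = 1e-4)
all.equal(cdf_standardised, cdf_builtin)
```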

The Pareto distribution

Description

[Stable]

Usage

dist_pareto(shape, scale)

Arguments

shape, scale

parameters. Must be strictly positive.

See Also

actuar::Pareto

Examples

dist <- dist_pareto(shape = c(10, 3, 2, 1), scale = rep(1, 4))
dist


mean(dist)
variance(dist)
support(dist)
generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

Percentile distribution

Description

[Stable]

Usage

dist_percentile(x, percentile)

Arguments

x

A list of values

percentile

A list of percentiles

Examples

dist <- dist_normal()
percentiles <- seq(0.01, 0.99, by = 0.01)
x <- vapply(percentiles, quantile, double(1L), x = dist)
dist_percentile(list(x), list(percentiles*100))

The Poisson Distribution

Description

[Stable]

Poisson distributions are frequently used to model counts.

Usage

dist_poisson(lambda)

Arguments

lambda

vector of (non-negative) means.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let $X$ be a Poisson random variable with parameter lambda = $\lambda$.

Support: $\{0, 1, 2, 3, ...\}$

Mean: $\lambda$

Variance: $\lambda$

Probability mass function (p.m.f):

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Cumulative distribution function (c.d.f):

$$P(X \le k) = e^{-\lambda} \sum_{i = 0}^{\lfloor k \rfloor} \frac{\lambda^i}{i!}$$

Moment generating function (m.g.f):

$$E(e^{tX}) = e^{\lambda (e^t - 1)}$$

See Also

stats::Poisson

Examples

dist <- dist_poisson(lambda = c(1, 4, 10))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)
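
The Poisson c.d.f. is a finite sum of p.m.f. terms up to the floor of its argument. The base R sketch below writes that sum out and compares it with ppois() and dpois(). The non-integer value of k is chosen only to show the effect of the floor.

```r
lambda <- 4
k <- 2.7  # non-integer, so the sum runs up to floor(k) = 2

# c.d.f. written out as a finite sum
i <- 0:floor(k)
cdf_formula <- exp(-lambda) * sum(lambda^i / factorial(i))
cdf_builtin <- ppois(k, lambda)

# p.m.f. at the integer point 2
pmf_formula <- lambda^2 * exp(-lambda) / factorial(2)
pmf_builtin <- dpois(2, lambda)

all.equal(cdf_formula, cdf_builtin)
all.equal(pmf_formula, pmf_builtin)
```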

The Poisson-Inverse Gaussian distribution

Description

[Stable]

Usage

dist_poisson_inverse_gaussian(mean, shape)

Arguments

mean, shape

parameters. Must be strictly positive. Infinite values are supported.

See Also

actuar::PoissonInverseGaussian

Examples

dist <- dist_poisson_inverse_gaussian(mean = rep(0.1, 3), shape = c(0.4, 0.8, 1))
dist


mean(dist)
variance(dist)
support(dist)
generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

Sampling distribution

Description

[Stable]

Usage

dist_sample(x)

Arguments

x

A list of sampled values.

Examples

# Univariate numeric samples
dist <- dist_sample(x = list(rnorm(100), rnorm(100, 10)))

dist
mean(dist)
variance(dist)
skewness(dist)
generate(dist, 10)

density(dist, 1)

# Multivariate numeric samples
dist <- dist_sample(x = list(cbind(rnorm(100), rnorm(100, 10))))
dimnames(dist) <- c("x", "y")

dist
mean(dist)
variance(dist)
generate(dist, 10)
quantile(dist, 0.4) # Returns the marginal quantiles
cdf(dist, matrix(c(0.3,9), nrow = 1))

The (non-central) location-scale Student t Distribution

Description

[Stable]

The Student's T distribution is closely related to the Normal distribution (see dist_normal()), but has heavier tails. As $\nu$ increases to $\infty$, the Student's T converges to a Normal. The T distribution appears repeatedly throughout classic frequentist hypothesis testing when comparing group means.

Usage

dist_student_t(df, mu = 0, sigma = 1, ncp = NULL)

Arguments

df

degrees of freedom ($> 0$, maybe non-integer). df = Inf is allowed.

mu

The location parameter of the distribution. If ncp == 0 (or NULL), this is the median.

sigma

The scale parameter of the distribution.

ncp

non-centrality parameter $\delta$; currently, except for rt(), only for abs(ncp) <= 37.62. If omitted, use the central t distribution.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let $X$ be a central Student's T random variable with df = $\nu$.

Support: $R$, the set of all real numbers

Mean: Undefined unless $\nu > 1$, in which case the mean is zero.

Variance:

$$\frac{\nu}{\nu - 2}$$

for $\nu > 2$. Undefined if $\nu \le 1$, infinite when $1 < \nu \le 2$.

Probability density function (p.d.f):

$$f(x) = \frac{\Gamma(\frac{\nu + 1}{2})}{\sqrt{\nu \pi} \Gamma(\frac{\nu}{2})} (1 + \frac{x^2}{\nu} )^{- \frac{\nu + 1}{2}}$$

See Also

stats::TDist

Examples

dist <- dist_student_t(df = c(1,2,5), mu = c(0,1,2), sigma = c(1,2,3))

dist
mean(dist)
variance(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)
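
For a location-scale variable of the form mu + sigma * T, where T is a central Student's T with df degrees of freedom, the mean is mu (for df > 1) and the variance is sigma^2 * df / (df - 2) (for df > 2). The base R sketch below checks the central density formula against dt() and the location-scale moments by numerical integration. The helper names t_pdf and ls_pdf are hypothetical, and the sketch assumes this location-scale form rather than reading it from the package source.

```r
nu <- 5
mu <- 2
sigma <- 3

# Density of the central t distribution, written out from its formula
t_pdf <- function(x, nu) {
  gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2)) *
    (1 + x^2 / nu)^(-(nu + 1) / 2)
}
x <- c(-2, 0, 1.5)
pdf_formula <- t_pdf(x, nu)
pdf_builtin <- dt(x, df = nu)

# Location-scale density: shift by mu and rescale by sigma
ls_pdf <- function(x) dt((x - mu) / sigma, df = nu) / sigma

# Closed-form moments (valid here because nu > 2)
ls_mean <- mu
ls_var <- sigma^2 * nu / (nu - 2)

# The same moments by numerical integration
num_mean <- integrate(function(x) x * ls_pdf(x), -Inf, Inf)$value
num_var <- integrate(function(x) (x - mu)^2 * ls_pdf(x), -Inf, Inf)$value
```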

The Studentized Range distribution

Description

[Stable]

Tukey's studentized range distribution, used for Tukey's honestly significant differences test in ANOVA.

Usage

dist_studentized_range(nmeans, df, nranges)

Arguments

nmeans

sample size for range (same for each group).

df

degrees of freedom for $s$ (see below).

nranges

number of groups whose maximum range is considered.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

Support: $R^+$, the set of positive real numbers.

Other properties of Tukey's Studentized Range Distribution are omitted, largely because the distribution is not fun to work with.

See Also

stats::Tukey

Examples

dist <- dist_studentized_range(nmeans = c(6, 2), df = c(5, 4), nranges = c(1, 1))

dist

cdf(dist, 4)

quantile(dist, 0.7)

Modify a distribution with a transformation

Description

[Maturing]

The density(), mean(), and variance() methods are approximate as they are based on numerical derivatives.

Usage

dist_transformed(dist, transform, inverse)

Arguments

dist

A univariate distribution vector.

transform

A function used to transform the distribution. This transformation should be monotonic over the appropriate domain.

inverse

The inverse of the transform function.

Examples

# Create a log normal distribution
dist <- dist_transformed(dist_normal(0, 0.5), exp, log)
density(dist, 1) # dlnorm(1, 0, 0.5)
cdf(dist, 4) # plnorm(4, 0, 0.5)
quantile(dist, 0.1) # qlnorm(0.1, 0, 0.5)
generate(dist, 10) # rlnorm(10, 0, 0.5)
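
The role of numerical derivatives can be seen in the change-of-variables formula for a monotonic transformation: the transformed density is the base density evaluated at the inverse, multiplied by the absolute derivative of the inverse. The base R sketch below applies this to the exp/log pair above with a central-difference derivative. It illustrates the general technique and is not necessarily how dist_transformed() is implemented. The step size h is an assumption.

```r
# Base distribution Y ~ Normal(0, 0.5), transformation X = exp(Y)
inverse <- log
x <- c(0.5, 1, 2)

# Numerical derivative of the inverse (central difference)
h <- 1e-6
d_inverse <- (inverse(x + h) - inverse(x - h)) / (2 * h)

# Change of variables: f_X(x) = f_Y(inverse(x)) * |d inverse(x) / dx|
dens_transformed <- dnorm(inverse(x), 0, 0.5) * abs(d_inverse)

# For an increasing transformation the c.d.f. and quantiles map directly
cdf_transformed <- pnorm(inverse(4), 0, 0.5)
quantile_transformed <- exp(qnorm(0.1, 0, 0.5))

all.equal(dens_transformed, dlnorm(x, 0, 0.5), tolerance = 1e-6)
all.equal(cdf_transformed, plnorm(4, 0, 0.5))
all.equal(quantile_transformed, qlnorm(0.1, 0, 0.5))
```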

Truncate a distribution

Description

[Stable]

Note that the samples are generated using inverse transform sampling, and the means and variances are estimated from samples.

Usage

dist_truncated(dist, lower = -Inf, upper = Inf)

Arguments

dist

The distribution(s) to truncate.

lower, upper

The range of values to keep from a distribution.

Examples

dist <- dist_truncated(dist_normal(2,1), lower = 0)

dist
mean(dist)
variance(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)

if(requireNamespace("ggdist")) {
library(ggplot2)
ggplot() +
  ggdist::stat_dist_halfeye(
    aes(y = c("Normal", "Truncated"),
        dist = c(dist_normal(2,1), dist_truncated(dist_normal(2,1), lower = 0)))
  )
}
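
Inverse transform sampling for a truncated distribution draws uniform values between the c.d.f. values of the two limits and maps them back through the quantile function. The base R sketch below does this for a Normal(2, 1) truncated below at 0 and estimates the mean from the samples. The sample size, seed, and helper name trunc_density are assumptions, and the package's own implementation may differ in detail.

```r
set.seed(1)
lower <- 0
upper <- Inf
mu <- 2
sigma <- 1

# c.d.f. of the untruncated Normal at the two limits
p_lower <- pnorm(lower, mu, sigma)
p_upper <- pnorm(upper, mu, sigma)

# Inverse transform sampling: uniforms between the limits' probabilities,
# mapped back through the quantile function
u <- runif(1e4, min = p_lower, max = p_upper)
samples <- qnorm(u, mu, sigma)

# Truncated density: the original density rescaled by the retained mass
trunc_density <- function(x) {
  ifelse(x >= lower & x <= upper,
         dnorm(x, mu, sigma) / (p_upper - p_lower), 0)
}

mean(samples)   # sample-based estimate of the truncated mean
range(samples)  # all samples fall inside [lower, upper]
```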

The Uniform distribution

Description

[Stable]

A distribution with constant density on an interval.

Usage

dist_uniform(min, max)

Arguments

min, max

lower and upper limits of the distribution. Must be finite.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let $X$ be a Uniform random variable with lower limit min = $a$ and upper limit max = $b$.

Support: $[a,b]$

Mean: $\frac{1}{2}(a+b)$

Variance: $\frac{1}{12}(b-a)^2$

Probability density function (p.d.f):

$$f(x) = \frac{1}{b-a} \text{ for } x \in [a,b]$$

$$f(x) = 0 \text{ otherwise}$$

Cumulative distribution function (c.d.f):

$$F(x) = 0 \text{ for } x < a$$

$$F(x) = \frac{x - a}{b-a} \text{ for } x \in [a,b]$$

$$F(x) = 1 \text{ for } x > b$$

Moment generating function (m.g.f):

$$E(e^{tX}) = \frac{e^{tb} - e^{ta}}{t(b-a)} \text{ for } t \neq 0$$

$$E(e^{tX}) = 1 \text{ for } t = 0$$

See Also

stats::Uniform

Examples

dist <- dist_uniform(min = c(3, -2), max = c(5, 4))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)
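
The Uniform moments, c.d.f., and moment generating function can be checked with a few lines of base R. The sketch below uses the first pair of limits from the example above and an arbitrary value of t. The variable names are illustrative.

```r
a <- 3
b <- 5

# Closed-form mean and variance
unif_mean <- (a + b) / 2
unif_var <- (b - a)^2 / 12

# Numerical variance by integrating against the density
num_var <- integrate(function(x) (x - unif_mean)^2 * dunif(x, a, b), a, b)$value

# m.g.f. for t != 0, compared with direct integration of exp(t * x) * f(x)
t <- 0.3
mgf_formula <- (exp(t * b) - exp(t * a)) / (t * (b - a))
mgf_integral <- integrate(function(x) exp(t * x) * dunif(x, a, b), a, b)$value

# c.d.f. at a point inside the support
x <- 4.2
cdf_formula <- (x - a) / (b - a)
cdf_builtin <- punif(x, min = a, max = b)
```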

The Weibull distribution

Description

[Stable]

Generalization of the exponential distribution. Often used in survival and time-to-event analyses.

Usage

dist_weibull(shape, scale)

Arguments

shape, scale

shape and scale parameters. Must be strictly positive.

Details

We recommend reading this documentation on https://pkg.mitchelloharawild.com/distributional/, where the math will render nicely.

In the following, let $X$ be a Weibull random variable with shape parameter shape = $k$ and scale parameter scale = $\lambda$.

Support: $R^+$ and zero.

Mean: $\lambda \Gamma(1+1/k)$, where $\Gamma$ is the gamma function.

Variance: $\lambda^2 [ \Gamma (1 + \frac{2}{k} ) - (\Gamma(1+ \frac{1}{k}))^2 ]$

Probability density function (p.d.f):

$$f(x) = \frac{k}{\lambda}(\frac{x}{\lambda})^{k-1}e^{-(x/\lambda)^k}, x \ge 0$$

Cumulative distribution function (c.d.f):

$$F(x) = 1 - e^{-(x/\lambda)^k}, x \ge 0$$

Moment generating function (m.g.f):

$$\sum_{n=0}^\infty \frac{t^n\lambda^n}{n!} \Gamma(1+n/k), k \ge 1$$

See Also

stats::Weibull

Examples

dist <- dist_weibull(shape = c(0.5, 1, 1.5, 5), scale = rep(1, 4))

dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

generate(dist, 10)

density(dist, 2)
density(dist, 2, log = TRUE)

cdf(dist, 4)

quantile(dist, 0.7)
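
The Weibull mean and variance are expressed through the gamma function, with the variance scaling with the square of the scale parameter. The base R sketch below checks these expressions and the c.d.f. against dweibull() and pweibull(). A scale other than 1 is chosen deliberately so that the squared scale factor is visible. The variable names are illustrative.

```r
k <- 1.5      # shape
lambda <- 2   # scale

# Closed-form mean and variance
wb_mean <- lambda * gamma(1 + 1 / k)
wb_var <- lambda^2 * (gamma(1 + 2 / k) - gamma(1 + 1 / k)^2)

# The same moments by numerical integration of the density
num_mean <- integrate(function(x) x * dweibull(x, k, lambda), 0, Inf)$value
num_var <- integrate(function(x) (x - wb_mean)^2 * dweibull(x, k, lambda), 0, Inf)$value

# c.d.f. written out from its closed form
x <- c(0.5, 1, 3)
cdf_formula <- 1 - exp(-(x / lambda)^k)
cdf_builtin <- pweibull(x, shape = k, scale = lambda)
```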

Create a distribution from p/d/q/r style functions

Description

[Maturing]

If a distribution is not yet supported, you can vectorise p/d/q/r functions using this function. dist_wrap() stores the distribution's parameters, and provides wrappers which call the appropriate p/d/q/r functions.

Using this function to wrap a distribution should only be done if the distribution is not yet available in this package. If you need a distribution which isn't in the package yet, consider making a request at https://github.com/mitchelloharawild/distributional/issues.

Usage

dist_wrap(dist, ..., package = NULL)

Arguments

dist

The name of the distribution used in the functions (name that is prefixed by p/d/q/r)

...

Named arguments used to parameterise the distribution.

package

The package from which the distribution is provided. If NULL, the calling environment's search path is used to find the distribution functions. Alternatively, an arbitrary environment can also be provided here.

Examples

dist <- dist_wrap("norm", mean = 1:3, sd = c(3, 9, 2))

density(dist, 1) # dnorm()
cdf(dist, 4) # pnorm()
quantile(dist, 0.975) # qnorm()
generate(dist, 10) # rnorm()

library(actuar)
dist <- dist_wrap("invparalogis", package = "actuar", shape = 2, rate = 2)
density(dist, 1) # actuar::dinvparalogis()
cdf(dist, 4) # actuar::pinvparalogis()
quantile(dist, 0.975) # actuar::qinvparalogis()
generate(dist, 10) # actuar::rinvparalogis()
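
The p/d/q/r naming convention means the right function can be found by pasting a one-letter prefix onto the distribution name. The sketch below shows that lookup in base R with a hypothetical helper, call_pdqr(), which is not part of distributional and is not the package's implementation. It assumes the functions live in the stats namespace.

```r
# Hypothetical helper (not part of distributional): find a p/d/q/r function
# by prefix and distribution name, then call it with the stored parameters.
call_pdqr <- function(prefix, dist, at, params, env = asNamespace("stats")) {
  fn <- get(paste0(prefix, dist), envir = env, mode = "function")
  do.call(fn, c(list(at), params))
}

params <- list(mean = 1, sd = 3)

dens <- call_pdqr("d", "norm", 1, params)      # same call as dnorm(1, mean = 1, sd = 3)
prob <- call_pdqr("p", "norm", 4, params)      # same call as pnorm(4, mean = 1, sd = 3)
quan <- call_pdqr("q", "norm", 0.975, params)  # same call as qnorm(0.975, mean = 1, sd = 3)
```

Keeping the parameters in a named list lets one helper serve all four prefixes, because only the first positional argument changes between them.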

Extract the name of the distribution family

Description

[Experimental]

Usage

## S3 method for class 'distribution'
family(object, ...)

Arguments

object

The distribution(s).

...

Additional arguments used by methods.

Examples

dist <- c(
  dist_normal(1:2),
  dist_poisson(3),
  dist_multinomial(
    size = c(4, 3),
    prob = list(c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4))
  )
)
family(dist)

Randomly sample values from a distribution

Description

[Stable]

Generate random samples from probability distributions.

Usage

## S3 method for class 'distribution'
generate(x, times, ...)

Arguments

x

The distribution(s).

times

The number of samples.

...

Additional arguments used by methods.


Compute highest density regions

Description

Used to extract the highest density region at a particular coverage level from a distribution.

Usage

hdr(x, ...)

Arguments

x

Object to create hdr from.

...

Additional arguments used by methods.


Highest density regions of probability distributions

Description

[Maturing]

This function is highly experimental and will change in the future. In particular, improved functionality for object classes and visualisation tools will be added in a future release.

Computes minimally sized probability intervals (highest density regions).

Usage

## S3 method for class 'distribution'
hdr(x, size = 95, n = 512, ...)

Arguments

x

The distribution(s).

size

The size of the interval (between 0 and 100).

n

The resolution used to estimate the distribution's density.

...

Additional arguments used by methods.
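
A highest density region can be approximated by evaluating the density on a grid and keeping the highest-density points until they hold the requested probability. The base R sketch below does this for a skewed Gamma(shape = 2, rate = 1) distribution and compares the result with the central interval. It is a grid-based illustration of the idea, not the package's algorithm. The grid limits and the reuse of n = 512 as the grid size are assumptions.

```r
size <- 95
n <- 512

# Evaluate the density on a grid covering almost all of the probability mass
grid <- seq(qgamma(1e-6, 2), qgamma(1 - 1e-6, 2), length.out = n)
dens <- dgamma(grid, 2)
step <- grid[2] - grid[1]

# Keep the highest-density grid points until their mass reaches the target
ord <- order(dens, decreasing = TRUE)
keep <- ord[cumsum(dens[ord] * step) <= size / 100]
hdr_bounds <- range(grid[keep])

# Central interval with the same coverage, for comparison
central_bounds <- qgamma(c(0.025, 0.975), 2)

diff(hdr_bounds)      # width of the approximate HDR
diff(central_bounds)  # width of the central interval
```

For a skewed, unimodal density the approximate region is a single interval that is narrower than the central interval with the same coverage.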


Compute intervals

Description

[Stable]

Used to extract a specified prediction interval at a particular confidence level from a distribution.

The numeric lower and upper bounds can be extracted from the interval using <hilo>$lower and <hilo>$upper as shown in the examples below.

Usage

hilo(x, ...)

Arguments

x

Object to create hilo from.

...

Additional arguments used by methods.

Examples

# 95% interval from a standard normal distribution
interval <- hilo(dist_normal(0, 1), 95)
interval

# Extract the individual quantities with `$lower`, `$upper`, and `$level`
interval$lower
interval$upper
interval$level
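
A central interval places equal probability in each tail, so its bounds are two quantiles. The base R sketch below computes them for a standard Normal distribution and a 95% interval. It illustrates the definition only, and the variable names are illustrative.

```r
size <- 95
alpha <- (100 - size) / 100

# Equal tail probability on each side of the interval
lower <- qnorm(alpha / 2)
upper <- qnorm(1 - alpha / 2)

# Probability covered by the interval
coverage <- pnorm(upper) - pnorm(lower)
```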

Probability intervals of a probability distribution

Description

[Stable]

Returns a hilo central probability interval with probability coverage of size. By default, the distribution's quantile() will be used to compute the lower and upper bounds of a centered interval.

Usage

## S3 method for class 'distribution'
hilo(x, size = 95, ...)

Arguments

x

The distribution(s).

size

The size of the interval (between 0 and 100).

...

Additional arguments used by methods.

See Also

hdr.distribution()


Test if the object is a distribution

Description

[Stable]

This function returns TRUE for distributions and FALSE for all other objects.

Usage

is_distribution(x)

Arguments

x

An object.

Value

TRUE if the object inherits from the distribution class.

Examples

dist <- dist_normal()
is_distribution(dist)
is_distribution("distributional")

Is the object a hdr

Description

Is the object a hdr

Usage

is_hdr(x)

Arguments

x

An object.


Is the object a hilo

Description

Is the object a hilo

Usage

is_hilo(x)

Arguments

x

An object.


Kurtosis of a probability distribution

Description

[Stable]

Usage

kurtosis(x, ...)

## S3 method for class 'distribution'
kurtosis(x, ...)

Arguments

x

The distribution(s).

...

Additional arguments used by methods.


The (log) likelihood of a sample matching a distribution

Description

[Stable]

Usage

likelihood(x, ...)

## S3 method for class 'distribution'
likelihood(x, sample, ..., log = FALSE)

log_likelihood(x, ...)

Arguments

x

The distribution(s).

...

Additional arguments used by methods.

sample

A list of sampled values to compare to distribution(s).

log

If TRUE, the log-likelihood will be computed.
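
For independent observations, the likelihood is the product of the density values at the sampled points and the log-likelihood is the sum of the log densities. The base R sketch below shows both for a Normal distribution. It assumes independence and uses illustrative parameter values and a fixed seed. It is not taken from the package source.

```r
set.seed(42)
sample_values <- rnorm(20, mean = 1, sd = 2)

# Likelihood: product of densities at the sampled values
likelihood_value <- prod(dnorm(sample_values, mean = 1, sd = 2))

# Log-likelihood: sum of log densities
log_likelihood_value <- sum(dnorm(sample_values, mean = 1, sd = 2, log = TRUE))

all.equal(log(likelihood_value), log_likelihood_value)
```

Summing log densities is numerically safer for large samples, because a product of many small density values can underflow to zero.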


Mean of a probability distribution

Description

[Stable]

Returns the empirical mean of the probability distribution. If the method does not exist, the mean of a random sample will be returned.

Usage

## S3 method for class 'distribution'
mean(x, ...)

Arguments

x

The distribution(s).

...

Additional arguments used by methods.


Median of a probability distribution

Description

[Stable]

Returns the median (50th percentile) of a probability distribution. This is equivalent to quantile(x, p=0.5).

Usage

## S3 method for class 'distribution'
median(x, na.rm = FALSE, ...)

Arguments

x

The distribution(s).

na.rm

Unused, included for consistency with the generic function.

...

Additional arguments used by methods.


Create a new distribution

Description

[Maturing]

Allows extension package developers to define a new distribution class compatible with the distributional package.

Usage

new_dist(..., class = NULL, dimnames = NULL)

Arguments

...

Parameters of the distribution (named).

class

The class of the distribution for S3 dispatch.

dimnames

The names of the variables in the distribution (optional).


Construct hdr intervals

Description

Construct hdr intervals

Usage

new_hdr(
  lower = list_of(.ptype = double()),
  upper = list_of(.ptype = double()),
  size = double()
)

Arguments

lower, upper

A list of numeric vectors specifying the region's lower and upper bounds.

size

A numeric vector specifying the coverage size of the region.

Value

A "hdr" vector

Author(s)

Mitchell O'Hara-Wild

Examples

new_hdr(lower = list(1, c(3,6)), upper = list(10, c(5, 8)), size = c(80, 95))

Construct hilo intervals

Description

[Stable]

Class constructor function to help with manually creating hilo interval objects.

Usage

new_hilo(lower = double(), upper = double(), size = double())

Arguments

lower, upper

A numeric vector of values for lower and upper limits.

size

Size of the interval, between 0 and 100.

Value

A "hilo" vector

Author(s)

Earo Wang & Mitchell O'Hara-Wild

Examples

new_hilo(lower = rnorm(10), upper = rnorm(10) + 5, size = 95)

Create a new support region vector

Description

Create a new support region vector

Usage

new_support_region(x = numeric(), limits = list(), closed = list())

Arguments

x

A list of prototype vectors defining the distribution type.

limits

A list of value limits for the distribution.

closed

A list of logical(2L) indicating whether the limits are closed.


Extract the parameters of a distribution

Description

[Experimental]

Usage

parameters(x, ...)

## S3 method for class 'distribution'
parameters(x, ...)

Arguments

x

The distribution(s).

...

Additional arguments used by methods.

Examples

dist <- c(
  dist_normal(1:2),
  dist_poisson(3),
  dist_multinomial(
    size = c(4, 3),
    prob = list(c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4))
  )
)
parameters(dist)

Distribution Quantiles

Description

[Stable]

Computes the quantiles of a distribution.

Usage

## S3 method for class 'distribution'
quantile(x, p, ..., log = FALSE)

Arguments

x

The distribution(s).

p

The probability of the quantile.

...

Additional arguments passed to methods.

log

If TRUE, probabilities will be given as log probabilities.


Skewness of a probability distribution

Description

[Stable]

Usage

skewness(x, ...)

## S3 method for class 'distribution'
skewness(x, ...)

Arguments

x

The distribution(s).

...

Additional arguments used by methods.


Region of support of a distribution

Description

[Experimental]

Usage

support(x, ...)

## S3 method for class 'distribution'
support(x, ...)

Arguments

x

The distribution(s).

...

Additional arguments used by methods.


Variance

Description

[Stable]

A generic function for computing the variance of an object.

Usage

variance(x, ...)

## S3 method for class 'numeric'
variance(x, ...)

## S3 method for class 'matrix'
variance(x, ...)

## S3 method for class 'numeric'
covariance(x, ...)

Arguments

x

An object.

...

Additional arguments used by methods.

Details

The implementation of variance() for numeric variables coerces the input to a vector then uses stats::var() to compute the variance. This means that, unlike stats::var(), if variance() is passed a matrix or a 2-dimensional array, it will still return the variance (stats::var() returns the covariance matrix in that case).
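
The difference described here can be seen with base R alone. The sketch below shows only the stats::var() behaviour the paragraph refers to, using an arbitrary example matrix. It does not call variance() itself.

```r
m <- matrix(c(1, 2, 3, 4, 5, 7), ncol = 2)

# stats::var() on a matrix returns the covariance matrix of its columns
cov_matrix <- var(m)

# Coercing to a vector first gives a single variance over all entries
pooled_variance <- var(as.vector(m))

dim(cov_matrix)
length(pooled_variance)
```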

See Also

variance.distribution(), covariance()


Variance of a probability distribution

Description

[Stable]

Returns the empirical variance of the probability distribution. If the method does not exist, the variance of a random sample will be returned.

Usage

## S3 method for class 'distribution'
variance(x, ...)

Arguments

x

The distribution(s).

...

Additional arguments used by methods.