1.1 Signatures of the d-p-q-r functions in libRmath
Users of R are familiar with the functions for evaluating properties of and for sampling from probability distributions, the so-called d-p-q-r functions. The designation d-p-q-r reflects the fact that, for each distribution, there are up to 4 such functions, each beginning with one of these letters and followed by an abbreviated name of the distribution. The prefixes indicate:
- d
- the density of a continuous random variable or the probability mass function of a discrete random variable
- p
- the cumulative probability function, also called the cumulative distribution function or c.d.f.
- q
- the quantile function, which is the inverse of the c.d.f. defined on the interval (0, 1)
- r
- random sampling from the distribution
Members of R-core, notably Martin Maechler, have devoted considerable
effort to ensuring that these functions are both reliable and
general. Because they are part of an Open Source system, they can be
used in other Open Source projects. In fact, there is the capability
to collect these functions in a stand-alone library. One of the R packages
for Debian Linux and distributions that are derived from it, such as
Ubuntu, is r-mathlib which contains this stand-alone library. The
header file for this library is $RHOME/include/Rmath.h
on most
systems (/usr/share/R/include/Rmath.h
on Debian and derived
distributions because of the rules for Debian packaging).
The names and arguments of these functions follow well-defined patterns. For example, those for the χ2 distribution have signatures
double dchisq(double, double, int); double pchisq(double, double, int, int); double qchisq(double, double, int, int); double rchisq(double);
The first argument of dchisq
is x
, the abscissa, of pchisq
is
q
, the quantile, and of qchisq
is p
, the probability. The next
argument of the d-p-q functions, and the only argument of rchisq
, is
the distribution parameter, df
, which is the degrees of freedom of
the χ2. The last argument in the d-p-q functions controls whether
the probability is on the logarithm scale. It is named give_log
for
the density function and log_p
for the probability and quantile
functions. (At first glance this argument may seem unnecessary as the
probability density, for example, could be calculated on the
probability scale then transformed to the logarithm scale. However,
operations that are mathematically equivalent do not always produce
the same result in floating point arithmetic. There are good reasons
for these cases.)
The second last argument in the p and q functions determines whether
probability is accumulated from the left, the lower tail, or from the
right, the upper tail. The default is the lower tail. Again, there
is an apparent redundancy because one can always subtract the
probability in the lower tail from 1 to get the probability in the
upper tail. However when x
is extremely small but positive, 1 - x
rounds to 1 and small upper-tail probabilities truncate too quickly.
(Remember Kernighan and Plauger's aphorism that "10 times 0.1 is
hardly ever 1" - that's what you live with in floating point
computations.)
1.2 Calling these functions from Julia
It is surprisingly easy from within Julia to call C functions like
these having simple signatures. First you call the Julia function
dlopen
to access the shared object file and save the handle
julia> _jl_libRmath = dlopen("libRmath")
Ptr{Void} @0x0000000003461050
I use an awkward, patterned name for this variable to avoid potential name conflicts. At present the Julia namespace is flat (but that is likely to change as namespaces are already being discussed).
Then a call to, say, pchisq
needs only
julia> ccall(dlsym(_jl_libRmath,:pchisq),Float64,(Float64,Float64,Int32,Int32), 1.96^2, 1, true, false) 0.950004209703559
The first argument to the ccall
function is the symbol extracted
with dlsym
, followed by the specific type of the function value, the
argument signatures and the actual arguments. The actual arguments
are converted to the signature shown before being passed to the C
function. Thus the true
is converted to 1 and the false
to 0.
The value of this function call, 0.95000, is what we would expect, because a χ2 on 1 degree of freedom is the square of a standard normal distribution and the interval [-1.960, 1.960] contains approximately 95% of the probability of the standard normal.
1.3 Creating a Julia function for scalar arguments
Obviously, calling such a C function from the Rmath
library is
simpler if we wrap the call in a Julia function. We can define such a
function as a one-liner (well, two lines but only because it would be
a very long single line)
pchisq(q::Number, df::Number, lower_tail::Bool, log_p::Bool) = ccall(dlsym(_jl_libRmath,:pchisq),Float64,(Float64,Float64,Int32,Int32), q, df, lower_tail, log_p)
This illustrates one of the methods of defining a function in Julia,
fname(arg1, arg2, ...) = expression
This form, which reads much like the mathematical description f(x) = value, is useful when the function body is a single expression.
This code defines a particular method, with a reasonably general
signature, for pchisq
. The C function pchisq
requires double
precision values (called Float64 in Julia) for the first two
arguments and integers for the last two. However, these integers are
treated as Boolean values, so we require the arguments passed to the Julia function to be Boolean.
The Number
type in Julia is an abstract type
incorporating all the integer and real number types (including complex and
rational numbers, as it turns out).
The Julia function, pchisq
, can have methods of many different
signatures. In a sense a function in Julia is just a name used to
index into a method table.
We could have allowed unrestricted arguments, as in
pchisq(q, df, lower_tail, log_p) = <something>
but then we would need to define checks on the argument types, to ensure that they are appropriate and to check for vector versus scalar arguments, in the function body. It is better to use the method signature to check for general forms of arguments and to distinguish the scalar and vector cases. (Vector methods are defined in the next section).
This is one of the organizing principles of Julia; it is built around functions but actually all functions are methods chosen via multiple dispatch. So thinking in terms of signatures right off the bat is a good idea.
1.4 Default argument values
In R we can match arguments by names in a function call and we can define default values for arguments that are not given a value. At present neither of these capabilities is available in Julia. However, we can provide defaults for trailing arguments by creating methods with reduced signatures
pchisq(q::Number, df::Number, lower_tail::Bool) = pchisq(q, df, lower_tail, false) pchisq(q::Number, df::Number) = pchisq(q, df, true, false)
The fact that the signature is recognized by position and not by using
named actual arguments means that whenever log_p
is true
we must
specify lower_tail
, even if its value is the default value, true
.
In the case of pchisq
the distribution parameter, df
, does not
have a default value. For many other distributions there are default
values of distribution parameters. For example, the distribution
parameters for the logistic distribution are location
and scale
with defaults 0 and 1. Having defined the 5-argument scalar version
of plogis
plogis(q::Number, l::Number, s::Number, lo::Bool, lg::Bool) = ccall(dlsym(_jl_libRmath,:plogis),Float64,(Float64,Float64,Float64,Int32,Int32), q, l, s, lo, lg)
(for brevity we have used the names l
, s
, lo
and lg
for the last four arguments)
we can define the methods with default values as
plogis(q::Number, l::Number, lo::Bool, lg::Bool) = plogis(q, l, 1, lo, lg) plogis(q::Number, lo::Bool, lg::Bool) = plogis(q, 0, 1, true, false) plogis(q::Number, l::Number, s::Number, lo::Bool) = plogis(q, l, s, lo, false) plogis(q::Number, l::Number, lo::Bool) = plogis(q, l, 1, lo, false) plogis(q::Number, lo::Bool) = plogis(q, 0, 1, lo, false) plogis(q::Number, l::Number, s::Number) = plogis(q, l, s, true, false) plogis(q::Number, l::Number) = plogis(q, l, 1, true, false) plogis(q::Number) = plogis(q, 0, 1, true, false)
Because a Bool
is not a Number
, a signature such as (Number, Bool)
can be distinguished from (Number, Number).
These method definitions allow for some combinations of default values but not all combinations. Having named arguments, possibly with default values, is still a more flexible system but these method signatures do handle the most important cases.
Defining all those methods can get tedious (not to mention providing
the possibility of many transcription errors) so it is worthwhile scripting the process using
the macro language for Julia. See the file extras/Rmath.jl
in the
Julia distribution for the macros that generate both these methods and
the vectorized methods described in the next section.
1.5 Vectorizing the function
In R there are no scalars, only vectors of length 1, so, in that
sense, all functions are defined for vector arguments. In addition, many
functions, including the d-p-q functions, are vectorized in the sense that they create a result of the
appropriate form from vector arguments whose length is greater
than 1. Thus providing a vector of quantiles to pchisq
will return
a vector of probabilities. We would want the same to be true for the Julia functions.
In Julia scalars are distinct from vectors and we use the method dispatch to distinguish scalar and vector cases. I think a general programming paradigm for Julia is first to define a function for scalar arguments and then build the vectorized version from that. However, I don't yet have enough experience to say if this really is a general paradigm.
We can write methods for a vector q
as
pchisq(q::Vector{Number}, df::Number, lower_tail::Bool, log_p::Bool) = [ pchisq(q[i], df, lower_tail, log_p) | i=1:length(q) ] pchisq(q::Vector{Number}, df::Number, lower_tail::Bool) = [ pchisq(q[i], df, lower_tail, false) | i=1:length(q) ] pchisq(q::Vector{Number}, df::Number) = [ pchisq(q[i], df, true, false) | i=1:length(q) ]
These methods use a "comprehension" to specify the loop over the individual elements of the array producing another array. They could, of course, be written in a more conventional looping notation but the comprehension notation is powerful and compact, similar to the "apply" family of functions in R, so we use it here. Like the one-line function definition, the comprehension reads much like a mathematical expression to create an array of values of a certain form for i = 1,…,n.
By the way, experienced R programmers, who know not to write
1:length(q)
because of unexpected results when length(q)
is zero,
need not worry. The sequence notation, a:b
doesn't count down in
Julia when a
is greater than b
so 1:0
has length zero.
That is, 1:length(q)
in Julia always produces the same answer as seq_along(q)
does in R.
We could also vectorize the df
argument which leads us to consider
the case of vector df
and scalar q
and the case of vector df
and
vector q
, etc. At this point we realize that we are seeing many
variations on a theme and decide that it is best to script this
vectorization. Fortunately the file base/operations.jl
has macros
_jl_vectorize_1arg
and _jl_vectorize_2arg
which we can adapt for
this purpose.
1.6 The result
As of this writing, loading the file extras/Rmath.jl
julia> load("../extras/Rmath.jl")
defines
julia> whos() R_pow Function _jl_libRmath Ptr{None} dbeta Function dbinom Function dcauchy Function dchisq Function dexp Function df Function dgamma Function dgeom Function dlnorm Function dlogis Function dnbinom Function dnchisq Function dnorm Function dpois Function dsignrank Function dt Function dunif Function dweibull Function dwilcox Function pbeta Function pbinom Function pcauchy Function pchisq Function pexp Function pf Function pgamma Function pgeom Function plnorm Function plogis Function pnbinom Function pnchisq Function pnorm Function ppois Function psignrank Function pt Function punif Function pweibull Function pwilcox Function qbeta Function qbinom Function qcauchy Function qchisq Function qexp Function qf Function qgamma Function qgeom Function qlnorm Function qlogis Function qnbinom Function qnchisq Function qnorm Function qpois Function qsignrank Function qt Function qunif Function qweibull Function qwilcox Function rchisq Function rgeom Function rpois Function rsignrank Function rt Function set_seed Function
and a function like plogis
provides many method signatures
julia> plogis Methods for generic function plogis plogis(Number,Number,Number,Bool,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:275 plogis(Number,Number,Bool,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:281 plogis(Number,Number,Number,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:277 plogis(Number,Bool,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:287 plogis(Number,Number,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:283 plogis(Number,Number,Number) at /home/bates/build/julia/extras/../extras/Rmath.jl:279 plogis(Number,Bool) at /home/bates/build/julia/extras/../extras/Rmath.jl:289 plogis(Number,Number) at /home/bates/build/julia/extras/../extras/Rmath.jl:285 plogis(Number,) at /home/bates/build/julia/extras/../extras/Rmath.jl:291 plogis{T1<:Number,T2<:Number}(T1<:Number,AbstractArray{T2<:Number,N}) at operators.jl:162 plogis{T1<:Number,T2<:Number}(AbstractArray{T1<:Number,N},T2<:Number) at operators.jl:165 plogis{T1<:Number,T2<:Number}(AbstractArray{T1<:Number,N},AbstractArray{T2<:Number,N}) at operators.jl:169 plogis{T1<:Number,T2<:Number,T3<:Number}(AbstractArray{T1<:Number,N},T2<:Number,T3<:Number) at /home/bates/build/julia/extras/../extras/Rmath.jl:7 plogis{T1<:Number,T2<:Number,T3<:Number}(T1<:Number,AbstractArray{T2<:Number,N},T3<:Number) at /home/bates/build/julia/extras/../extras/Rmath.jl:10 plogis{T1<:Number,T2<:Number,T3<:Number}(T1<:Number,T2<:Number,AbstractArray{T3<:Number,N}) at /home/bates/build/julia/extras/../extras/Rmath.jl:13 plogis{T1<:Number,T2<:Number,T3<:Number}(AbstractArray{T1<:Number,N},AbstractArray{T2<:Number,N},T3<:Number) at /home/bates/build/julia/extras/../extras/Rmath.jl:16 plogis{T1<:Number,T2<:Number,T3<:Number}(T1<:Number,AbstractArray{T2<:Number,N},AbstractArray{T3<:Number,N}) at /home/bates/build/julia/extras/../extras/Rmath.jl:20 plogis{T1<:Number,T2<:Number,T3<:Number}(AbstractArray{T1<:Number,N},T2<:Number,AbstractArray{T3<:Number,N}) at /home/bates/build/julia/extras/../extras/Rmath.jl:24
You will see that the vectorized methods have slightly more general signatures involving AbstractArrays of Numbers, which also handles cases such as matrix arguments.
The methods allow for calling plogis
in a way that is natural to R programmers
julia> plogis(-1:0.2:1, 0) [0.268941, 0.310026, 0.354344, 0.401312, 0.450166, 0.5, 0.549834, 0.598688, 0.645656, 0.689974, 0.731059] julia> plogis(-1:0.2:1, 0, 2) [0.377541, 0.401312, 0.425557, 0.450166, 0.475021, 0.5, 0.524979, 0.549834, 0.574443, 0.598688, 0.622459]
(In running that example I realized that I had omitted some cases from my macros. That will be repaired "in the fullness of time" as Bill Venables is wont to say.)