In preparation for a session at useR!2012 on "What other languages should R users know about?", Dirk, Chris Fonnesbeck and I have considered implementations of a simple Gibbs sampler in other languages. I describe a Julia implementation below. Full details on all of the implementations are available at Chris's GitHub repository.
The task is to create a Gibbs sampler for the unscaled density

f(x,y) = k x^2 \exp(-xy^2 - y^2 + 2y - 4x)

using the full conditional distributions

x|y \sim Gamma(3, y^2 + 4)   (the second argument being the rate)

y|x \sim Normal(\frac{1}{1+x}, \frac{1}{2(1+x)})   (the second argument being the variance)
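Both conditionals can be read off the joint density by holding the other variable fixed. As a function of x alone,

f(x|y) \propto x^2 \exp(-(y^2 + 4)x),

which is the kernel of the Gamma(3, y^2 + 4) density, and as a function of y alone,

f(y|x) \propto \exp(-(1+x)y^2 + 2y) \propto \exp(-(1+x)(y - \frac{1}{1+x})^2),

which is the kernel of the Normal(\frac{1}{1+x}, \frac{1}{2(1+x)}) density.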
Dirk's version of Darren's original R function is

Rgibbs <- function(N, thin) {
    mat <- matrix(0, ncol = 2, nrow = N)
    x <- 0
    y <- 0
    for (i in 1:N) {
        for (j in 1:thin) {
            x <- rgamma(1, 3, y*y + 4)
            y <- rnorm(1, 1/(x + 1), 1/sqrt(2*(x + 1)))
        }
        mat[i, ] <- c(x, y)
    }
    mat
}
A byte-compiled version is produced with the compiler package:

library(compiler)
RCgibbs <- cmpfun(Rgibbs)
In the examples directory of the Rcpp package, Dirk provides an R script using the inline, Rcpp and RcppGSL packages to implement this sampler in C++ code callable from R and to time the results. On my desktop computer, timing 10 replications of Rgibbs(20000, 200) and the other versions produces

                 test replications elapsed  relative user.self sys.self
4    GSLGibbs(N, thn)           10   8.338  1.000000     8.336    0.000
3   RcppGibbs(N, thn)           10  13.285  1.593308    13.285    0.000
2     RCgibbs(N, thn)           10 369.843 44.356320   369.327    0.032
1      Rgibbs(N, thn)           10 473.511 56.789518   472.754    0.044
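For reference, that table has the shape of output from the rbenchmark package; a minimal sketch of how such a comparison might be produced (assuming Dirk's script has already defined RcppGibbs and GSLGibbs) is

library(rbenchmark)
N <- 20000
thn <- 200
## Ten replications of each sampler, ordered fastest first
benchmark(Rgibbs(N, thn), RCgibbs(N, thn),
          RcppGibbs(N, thn), GSLGibbs(N, thn),
          replications = 10, order = "elapsed")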
The C code for R's d-p-q-r functions for probability densities, cumulative distributions, quantiles and random sampling can be compiled into a separate Rmath library. These sources are included with the Julia sources, and Julia functions with similar calling sequences become available after

load("extras/Rmath.jl")

A first Julia version of the sampler is
function JGibbs1(N::Int, thin::Int)
    mat = Array(Float64, (N, 2))
    x = 0.
    y = 0.
    for i = 1:N
        for j = 1:thin
            x = rgamma(1, 3, (y*y + 4))[1]
            y = rnorm(1, 1/(x + 1), 1/sqrt(2(x + 1)))[1]
        end
        mat[i,:] = [x, y]
    end
    mat
end
JGibbs1 is essentially the same code as Rgibbs, with minor adjustments for syntax. A similar timing on the same computer gives

julia> sum([@elapsed JGibbs1(20000, 200) for i=1:10])
27.748079776763916
julia> sum([@elapsed JGibbs1(20000, 200) for i=1:10])
27.782002687454224
That makes JGibbs1 about 17 times faster than Rgibbs and 13 times faster than RCgibbs. It's actually within a factor of 2 of the compiled code in RcppGibbs.

One of the big differences between this function and the compiled C++ function, RcppGibbs, is that the compiled function calls the underlying C code for the samplers directly, avoiding the overhead of creating a vector of length 1 and indexing to get its first element. As these operations are done in the inner loop of JGibbs1, their overhead mounts up.

Fortunately, Julia allows for calling a C function directly. You need the symbol from the dynamically loaded library, the signature of the function, and the arguments.
It looks like
function JGibbs2(N::Int, thin::Int)
    mat = Array(Float64, (N, 2))
    x = 0.
    y = 0.
    for i = 1:N
        for j = 1:thin
            ## the C-level rgamma takes shape and *scale*, so the
            ## rate y*y + 4 must be passed as 1/(y*y + 4)
            x = ccall(dlsym(_jl_libRmath, :rgamma),
                      Float64, (Float64, Float64), 3., 1/(y*y + 4))
            ## the C-level rnorm takes mean and standard deviation
            y = ccall(dlsym(_jl_libRmath, :rnorm),
                      Float64, (Float64, Float64), 1/(x + 1), 1/sqrt(2(x + 1)))
        end
        mat[i,:] = [x, y]
    end
    mat
end
This version is essentially as fast as RcppGibbs:
julia> sum([@elapsed JGibbs2(20000, 200) for i=1:10])
13.596416234970093
julia> sum([@elapsed JGibbs2(20000, 200) for i=1:10])
13.584651470184326
Julia also provides its own samplers, randg for the standard Gamma distribution and randn for the standard normal, which are faster than those in the Rmath library. A third version uses them directly:

function JGibbs3(N::Int, thin::Int)
    mat = Array(Float64, (N, 2))
    x = 0.
    y = 0.
    for i = 1:N
        for j = 1:thin
            ## randg(3) has unit scale; dividing by the rate y*y + 4
            ## yields the Gamma(3, y^2 + 4) conditional draw
            x = randg(3)/(y*y + 4)
            y = 1/(x + 1) + randn()/sqrt(2(x + 1))
        end
        mat[i,:] = [x, y]
    end
    mat
end
julia> sum([@elapsed JGibbs3(20000, 200) for i=1:10])
6.603794574737549
julia> sum([@elapsed JGibbs3(20000, 200) for i=1:10])
6.58268928527832
This is faster than both RcppGibbs (which is using slower samplers) and GSLGibbs (faster samplers, but not as fast as those in Julia), while writing code that looks very much like the original R function.

But wait, there's more!
This computer has a 4-core processor (AMD Athlon(tm) II X4 635 Processor @ 2.9 GHz) and Julia can take advantage of that. When starting Julia we specify the number of processes:

julia -p 4

We can then define two distributed versions of the sampler. The first, dJGibbs3a, leaves the result as a distributed array; the second, dJGibbs3b, converts the result to a single array in the parent process.

## Distributed version - keeping the results as a distributed array
dJGibbs3a(N::Int, thin::Int) = darray((T,d,da)->JGibbs3(d[1],thin), Float64, (N, 2), 1)
## Converting the results to an array controlled by the parent process
dJGibbs3b(N::Int, thin::Int) = convert(Array{Float64,2}, dJGibbs3a(N, thin))
The anonymous function passed as the first argument to darray in dJGibbs3a is declared with the right-pointing arrow construction, ->. Its arguments are the element type of the array, T, the dimensions of the local chunk, d, and the dimension on which the array is distributed, da. Here we use only the number of rows, d[1], of the chunk to be generated. The timings,
julia> sum([@elapsed dJGibbs3a(20000, 200) for i=1:10])
1.6914057731628418
julia> sum([@elapsed dJGibbs3a(20000, 200) for i=1:10])
1.6724529266357422
julia> sum([@elapsed dJGibbs3b(20000, 200) for i=1:10])
2.2329299449920654
julia> sum([@elapsed dJGibbs3b(20000, 200) for i=1:10])
2.267037868499756
show that with four processes dJGibbs3a runs almost exactly four times as fast as JGibbs3, and that converting the result to an ordinary array in the parent process (dJGibbs3b) adds only a modest copying overhead. If you haven't looked into Julia before now, you owe it to yourself to do so.
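For R users, a rough analogue of dJGibbs3b (my addition, not from the original post) can be written with the parallel package, splitting the rows across worker processes and stacking the chunks. The name pRgibbs is made up, and mclapply relies on forking, so this sketch assumes a Unix-alike system:

library(parallel)
pRgibbs <- function(N, thin, cores = 4) {
    ## Each worker runs an independent chain of N/cores rows; since the
    ## chains are independent, the chunks can simply be stacked
    chunks <- mclapply(rep(N %/% cores, cores),
                       function(n) Rgibbs(n, thin),
                       mc.cores = cores)
    do.call(rbind, chunks)
}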
One reader commented:

I changed your code slightly and the run-time is reduced from 51 seconds to 8 seconds, both under enableJIT(3) on my 4-year-old workstation:
Rgibbs <- function(N, thin) {
    mat <- matrix(0, ncol = 2, nrow = N)
    x <- 0
    y <- 0
    for (i in 1:N) {
        ## draw all 'thin' variates for this row in two vectorized
        ## calls, then rescale inside the inner loop
        rgamma.saved <- rgamma(thin, 3, 1)
        rnorm.saved <- rnorm(thin)
        for (j in 1:thin) {
            x <- rgamma.saved[j]/(y*y + 4)
            y <- 1/(x + 1) + rnorm.saved[j]/sqrt(2*(x + 1))
        }
        mat[i, ] <- c(x, y)
    }
    mat
}
library(compiler)
enableJIT(3)
print(system.time(Rgibbs(20000, 200)))
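The rescaling works because the Gamma is a scale family and the normal a location-scale family, so standard draws can be generated in bulk and transformed inside the loop. A quick, hypothetical sanity check (my addition; assume the original function has been bound to Rgibbs0 and the modified one above to Rgibbs1) is to compare summary statistics of their output:

## Distinct bindings for the two versions are assumed:
## Rgibbs0 - Darren's original sampler, Rgibbs1 - the modified one above
m0 <- Rgibbs0(5000, 50)
m1 <- Rgibbs1(5000, 50)
## The individual draws differ, but the distributions should agree
## up to Monte Carlo error
rbind(original = colMeans(m0), modified = colMeans(m1))
rbind(original = apply(m0, 2, sd), modified = apply(m1, 2, sd))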