In preparation for a session at useR!2012 on "What other languages should R users know about?", Dirk, Chris Fonnesbeck and I have considered implementations of this simple sampler in other languages. I describe a Julia implementation below. Full details on all of the implementations are available at Chris's github repository.

The task is to create a Gibbs sampler for the unscaled density

```
f(x,y) = x x^2 \exp(-xy^2 - y^2 + 2y - 4x)
```

using the conditional distributions```
x|y \sim Gamma(3, y^2 +4)
y|x \sim Normal(\frac{1}{1+x}, \frac{1}{2(1+x)})
```

Dirk's version of Darren's original R function is```
Rgibbs <- function(N,thin) {
mat <- matrix(0,ncol=2,nrow=N)
x <- 0
y <- 0
for (i in 1:N) {
for (j in 1:thin) {
x <- rgamma(1,3,y*y+4)
y <- rnorm(1,1/(x+1),1/sqrt(2*(x+1)))
}
mat[i,] <- c(x,y)
}
mat
}
```

```
RCgibbs <- cmpfun(Rgibbs)
```

`examples`

directory of the Rcpp package, Dirk provides an `R`

script using the inline, `Rcpp`

and RcppGSL packages to implement this sampler in `C++`

code callable from `R`

and time the results. On my desktop computer, timing 10 replications of `Rgibbs(20000, 200)`

and the other versions produces```
test replications elapsed relative user.self sys.self
4 GSLGibbs(N, thn) 10 8.338 1.000000 8.336 0.000
3 RcppGibbs(N, thn) 10 13.285 1.593308 13.285 0.000
2 RCgibbs(N, thn) 10 369.843 44.356320 369.327 0.032
1 Rgibbs(N, thn) 10 473.511 56.789518 472.754 0.044
```

`C`

code for R's d-p-q-r functions for probability densities, cumulative distribution, quantile and random sampling can be compiled into a separate Rmath library. These sources are included with the Julia sources and Julia functions with similar calling sequences are available as "extras/Rmath.jl"```
load("extras/Rmath.jl")
function JGibbs1(N::Int, thin::Int)
mat = Array(Float64, (N, 2))
x = 0.
y = 0.
for i = 1:N
for j = 1:thin
x = rgamma(1,3,(y*y + 4))[1]
y = rnorm(1, 1/(x+1),1/sqrt(2(x + 1)))[1]
end
mat[i,:] = [x,y]
end
mat
end
```

`JGibbs1`

is essentially the same code as `Rgibbs`

with minor adjustments for syntax. A similar timing on the same computer gives```
julia> sum([@elapsed JGibbs1(20000, 200) for i=1:10])
27.748079776763916
julia> sum([@elapsed JGibbs1(20000, 200) for i=1:10])
27.782002687454224
```

`Rgibbs`

and 13 times faster than `RCgibbs`

. It's actually within a factor of 2 of the compiled code in `RcppGibbs`

.One of the big differences between this function and the compiled C++ function,

`RcppGibbs`

, is that the compiled function calls the underlying `C`

code for the samplers directly, avoiding the overhead of creating a vector of length 1 and indexing to get the first element. As these operations are done in the inner loop of `JGibbs1`

their overhead mounts up.Fortunately, Julia allows for calling a C function directly. You need the symbol from the dynamically loaded library, the signature of the function and the arguments.

It looks like

```
function JGibbs2(N::Int, thin::Int)
mat = Array(Float64, (N, 2))
x = 0.
y = 0.
for i = 1:N
for j = 1:thin
x = ccall(dlsym(_jl_libRmath, :rgamma),
Float64, (Float64, Float64), 3., (y*y + 4))
y = ccall(dlsym(_jl_libRmath, :rnorm),
Float64, (Float64, Float64), 1/(x+1), 1/sqrt(2(x + 1)))
end
mat[i,:] = [x,y]
end
mat
end
```

`RcppGibbs`

```
julia> sum([@elapsed JGibbs2(20000, 200) for i=1:10])
13.596416234970093
julia> sum([@elapsed JGibbs2(20000, 200) for i=1:10])
13.584651470184326
```

```
function JGibbs3(N::Int, thin::Int)
mat = Array(Float64, (N, 2))
x = 0.
y = 0.
for i = 1:N
for j = 1:thin
x = randg(3) * (y*y + 4)
y = 1/(x + 1) + randn()/sqrt(2(x + 1))
end
mat[i,:] = [x,y]
end
mat
end
```

```
julia> sum([@elapsed JGibbs3(20000, 200) for i=1:10])
6.603794574737549
julia> sum([@elapsed JGibbs3(20000, 200) for i=1:10])
6.58268928527832
```

`RcppGibbs`

(which is using slower samplers) and `GSLGibbs`

(faster samplers but not as fast as those in Julia) while writing code that looks very much like the original R function.**But wait, there's more!**

This computer has a 4-core processor (AMD Athlon(tm) II X4 635 Processor @ 2.9 GHz) and Julia can take advantage of that. When starting Julia we specify the number of processes

```
julia -p 4
```

`dJGibbs3a`

, leaves the result as a distributed array and the second, `dJGibbs3b`

, converts the result to a single array in the parent process.```
## Distributed versions - keeping the results as a distributed array
dJGibbs3a(N::Int, thin::Int) = darray((T,d,da)->JGibbs3(d[1],thin), Float64, (N, 2), 1)
## Converting the results to an array controlled by the parent process
dJGibbs3b(N::Int, thin::Int) = convert(Array{Float64,2}, dJGibbs3a(N, thin))
```

`dJGibbs3a`

is declared with the right-pointing arrow construction `->`

. Its arguments are the type of the array, `T`

, the dimensions of the local chunk, `d`

, and the dimension on which the array is distributed, `da`

. Here we only use the number of rows, `d[1]`

, of the chunk to be generated.The timings,

```
julia> sum([@elapsed dJGibbs3a(20000, 200) for i=1:10])
1.6914057731628418
julia> sum([@elapsed dJGibbs3a(20000, 200) for i=1:10])
1.6724529266357422
julia> sum([@elapsed dJGibbs3b(20000, 200) for i=1:10])
2.2329299449920654
julia> sum([@elapsed dJGibbs3b(20000, 200) for i=1:10])
2.267037868499756
```

If you haven't looked into Julia before now, you owe it to yourself to do so.