For each row we take the cumulative sums and, for safety, normalize by dividing by the last element then compare these values to a random draw from a standard uniform distribution. The number of elements in the cumulative sums that are less than the uniform draw is the 0-based index of the result. We add 1 to convert to a 1-based index.
I have been experimenting a bit with a very interesting new language called Julia and decided to write a similar function in it. The version shown here has been updated according to suggestions from Jeff Bezanson, Stefan Karpinski and Viral Shah on the julia-dev list at Google Groups
function samp1(x::Array{Float64, 2},)
cs = cumsum(reshape(x, length(x)))
sum(cs/cs[end] < rand()) + 1
end
function samp(X::Array{Float64, 2},)
if any(X < 0)
error("Negative probabilities not allowed")
end
[samp1(X[i,:]) | i = 1:size(X,1)]
end
This version is about 10 times as fast as the
pure R version but about 9 times
slower than
the RcppEigen
version.Update:
In the thread on the julia-dev list about this example Stefan Karpinski showed that in Julia you can enhance performance by de-vectorizing your code and came up with the following version which is much faster than the RcppEigen version
function SKsamp(X::Matrix{Float64})
if any(X < 0)
error("Negative probabilities not allowed")
end
s = Array(Int, size(X,1))
for i = 1:size(X,1)
r = rand()
for j = 1:size(X,2)
r -= X[i,j]
if r <= 0.0
s[i] = j
break
end
end
end
return s
end
To a longtime R/S programmer the concept of de-vectorizing your code
seems heretical but I can understand that code created by
a JIT
will be happier with the looping and break style.In any case, I think this example shows that R programmers should take a look at Julia. Two immediate applications I can imagine are McMC methods and large scale iterative fits such as Generalized linear models
No comments:
Post a Comment