Sunday, March 11, 2012

An RcppEigen example

R is an Open Source project providing an interactive language and environment for statistical computing.  It has become the lingua franca for research in statistical methods.  Because R is an interpreted language it is comparatively easy to develop applications (called packages) for it.  However, the resulting code can sometimes be slow, which is a problem in compute-bound methods like Markov chain Monte Carlo

From its inception R, and its predecessor S, have allowed calls to compiled code written in C or Fortran but such programming is not for the faint-hearted.  There are many ways in which you can trip yourself up.  Over the past several years Dirk Eddelbuettel and Romain Francois (there should be a cedilla under the c but I don't know how to create one) developed a package called Rcpp that provides C++ classes and methods for encapsulating R objects.  I recently started using these in the lme4 package for mixed-effects models that I develop with Martin Maechler and Ben Bolker.

High-performance numerical linear algebra is particularly important in the mixed-effects models calculations which use both dense and sparse matrices.  I ran across a wonderful linear algebra system called Eigen that is a templated library of C++ classes and methods and, with Dirk and Romain, wrote the RcppEigen package to interface with R.

This posting is to show an example of code that can be made to run much faster using RcppEigen than in the original R code.

The example, from Dongjun Chung, requires sampling from a collection of multinomial random variables as part of an iterative estimation method for parameter estimation in his R package for the analysis of ChIP-sequencing data.  There are generally a small number of categories - 5 is typical - and a relatively large number (say 10,000) instances that are available as a 10000 by 5 matrix of non-negative elements whose rows sum to 1.  At each iteration a sample of size 10000 consisting of indices in the range 1 to 5 is to be generated from the current matrix of probabilities.  Dongjun wrote an R function for this

Rsamp <- function(X) {
  stopifnot(is.numeric(X <- as.matrix(X)),
            (nc <- ncol(X)) > 1L,
            all(X >= 0))
  apply(X, 1L, function(x) sample(nc, size=1L, replace=FALSE, prob=x+1e-10))
}
which is careful R code (e.g. using apply instead of running a loop) but, even so, this function is the bottleneck in the method.

A method using RcppEigen requires a similar R function

RcppSamp <- function(X) {
  stopifnot(is.numeric(X <- as.matrix(X)),
            (nc <- ncol(X)) > 1L,
            all(X >= 0))
  .Call(CppSamp, X)
}

and a C++ function

SEXP CppSamp(SEXP X_) {
  typedef Eigen::ArrayXd   Ar1;
  typedef Eigen::ArrayXXd  Ar2;
  typedef Eigen::Map<Ar2> MAr2;

  const MAr2 X(as<MAr2>(X_));
  int m(X.rows()), n(X.cols()), nm1(n - 1);
  Ar1     samp(m);
  RNGScope sc; // Handle GetRNGstate()/PutRNGstate()
  for (int i=0; i < m; ++i) {
    Ar1 ri(X.row(i));
    std::partial_sum(ri.data(), ri.data() + n, ri.data())
    ri /= ri[nm1];    // normalize to sum to 1
    samp[i] = (ri < ::unif_rand()).count() + 1;
  }
  return wrap(samp);
}

The general idea is that the Eigen::ArrayXd class is a one-dimensional array of doubles and the Eigen::ArrayXXd class is a two-dimensional array of doubles.  Operations on array classes are component-wise operations or reductions.  There are corresponding classes Eigen::VectorXd and Eigen::MatrixXd that provide linear algebra operations.  A Eigen::Map of another structure has the corresponding structure but takes a pointer to the storage instead of allocating its own storage.  One of the idioms of writing with RcppEigen is to create const Eigen::Map<klass> objects from the input arguments, thus avoiding unnecessary copying.  The as templated function and the wrap function are part of RcppEigen that generalizes the methods in Rcpp for converting back and forth between the R objects and the C++ classes.

Here the approach is to find the cumulative sums in each row, normalize these sums by dividing by the last element and comparing them to a draw from the standard uniform distribution.  The number of elements less than the uniform variate is the 0-based index for the result and we add 1 to get the 1-based index.

Creating the RcppEigen code is more work than the pure R code but not orders of magnitude more work.  You can operate on entire arrays as shown here and, best of all, you don't need to worry about protecting storage from the garbage collector.  And the RcppEigen code is much faster

> set.seed(1234321)
> X <- matrix(runif(100000 * 5), ncol=5)
> benchmark(Rsamp(X), RcppSamp(X), replications=5,
+           columns=c("test", "elapsed", "relative", "user.self"))
         test elapsed relative user.self
2 RcppSamp(X)   0.058        1      0.06
1    Rsamp(X)   5.162       89      5.12

Update: I have corrected the spelling of Dongjun Chung's name.  My apologies for mis-spelling it.


Update: The next posting discusses a Julia implementation of this function.  An initial version was somewhat slower than the RcppEigen version but then Stefan Karpinsky got to work and created an implementation that was over twice as fast as the RcppEigen version.  In what may seem like a bizarre approach to an R programmer, the trick is to de-vectorize the code.

7 comments:

  1. I don't get why the probabilities are `x+1e-10` and not just `x`, but I assume it's some floating-point roundoff thing. Could you say more about that? And it doesn't seem to apply the same technique in the Rcpp code, is that right?

    ReplyDelete
    Replies
    1. Indeed, I should have explained that. I was copying Dongjin's code and he believed that the sample function in R did not allow probabilities of zero. However, I just checked and apparently it does allow them

      > sample(5, 1, prob=c(0.1, 0, 0.3, 0.4, 0.2))
      [1] 4

      so I can simplify the R version of the code.

      Delete
  2. Douglas, could you post your full working example? I would love to play with it a little.

    ReplyDelete
  3. I posted a package at http://www.stat.wisc.edu/~bates/Chung_0.1-1.tar.gz

    You need to have RcppEigen-0.2.0, which has just been added to CRAN, installed.

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. Sorry, for the deleted post. I had an error in the original post. Here is a correct (and better) version:

    After playing around with the example I came up with a much faster pure R version (although still slower than with Rcpp). The speedup is mainly from the vectorized operations instead of individual function calls for each row. In other words: better loop over the "short" dimension and treat the other one in vector ops. On top of that, row-wise loop over a matrix that is stored in column-major fashion is suboptimal, too, but in this case probably negligible.

    Rsamp2<-function(X) {
       stopifnot(is.numeric(X <- as.matrix(X)),
          (nc <- ncol(X)) > 1L,
          all(X >= 0))
       R<- runif(nrow(X))
       I<-rep(1L, nrow(X))
       for(i in 1:(ncol(X)-1)) {
          R<-R-X[,i]
          I<-I+(R>=0)
       }
       I
    }

    set.seed(1234321)
    X <- matrix(runif(100000 * 5), ncol=5)
    benchmark(Rsamp(X), Rsamp2(X), RcppSamp(X), replications=10, columns=c("test", "elapsed", "relative", "user.self"))
    test elapsed relative user.self
    3 RcppSamp(X) 0.134 1.000000 0.132
    1 Rsamp(X) 8.315 62.052239 8.288
    2 Rsamp2(X) 0.262 1.955224 0.260

    ReplyDelete