Hamming distance in R using Rcpp

After all my posts on efficiently computing the Hamming distance in R, two questions remain for me. First, can we compute the Hamming distance even faster when implementing it in a low-level programming language such as C/C++, and then use the great Rcpp package to directly call this low-level C/C++ function from R? Second, how much would parallelization help? I will answer the second question in a separate post, and address the first question here.

I implemented two C/C++ functions for computing the Hamming distance between all pairs of columns of a binary matrix X:

  1. A straightforward approach that I called “hamming__C_simple”: this function simply iterates through all combinations of columns of a matrix X to compare all elements between each pair of columns.
  2. A more involved approach that I called “hamming__C_bitwise”: This function exploits the ability one has in C to directly manipulate individual bits in memory, which should typically be very fast. The function first interprets the R binary vectors as sequences of unsigned integers, to then perform bitwise operations on these integers.

Note that focusing on binary matrices does not limit the applicability of the C/C++ functions, because multi-value Hamming distances can be easily computed by adding multiple binary Hamming distances.

The code for both implementations is shown below.


#include "Rcpp.h"
#include "math.h"
#include "stdlib.h"
#include "stdio.h"
#include "math.h"
#include "algorithm"
#include "climits"
using namespace Rcpp;

unsigned long long binvec2int(IntegerMatrix x) {
unsigned long long u = 0;
int n = x.nrow();
for (int i = 0; i < n; i++) {
u += x(i, 0) * (unsigned long long) pow(2, n - 1 - i);
}
return u;
}

// [[Rcpp::export]]
NumericMatrix hamming__C_bitwise(IntegerMatrix X) {
int i, j, k, from, to, h;
int stepsize = sizeof(unsigned long long) * CHAR_BIT;
int n = ceil(((double) X.nrow()) / stepsize);
IntegerMatrix x;
NumericMatrix H(X.ncol(), X.ncol());
// convert X to unsigned long long
unsigned long long *xx =
(unsigned long long *) malloc(n * X.ncol() * sizeof(unsigned long long));
i = 0;
while ((from = i * stepsize) < X.nrow()) {
for (j = 0; j < X.ncol(); j++) {
to = std::min(X.nrow(), (i + 1) * stepsize);
x = X(Range(from, to - 1), Range(j, j));
xx[i + j * n] = binvec2int(x);
}
i++;
}

for (i = 0; i < X.ncol(); i++) {
for (j = i + 1; j < X.ncol(); j++) {
for (k = 0; k < n; k++) {
h = __builtin_popcountll(xx[k + i * n] ^ xx[k + j * n]);
H(i, j) += h;
H(j, i) += h;
}
}
}
free(xx);
return H;
}

// [[Rcpp::export]]
NumericMatrix hamming__C_simple(IntegerMatrix X) {
int i, j, k, h;
NumericMatrix H(X.ncol(), X.ncol());
for (i = 0; i < X.ncol(); i++) {
for (j = i + 1; j < X.ncol(); j++) {
for (k = 0; k < n; k++) {
h = (X(k, i) != X(k, j));
H(i, j) += h;
H(j, i) += h;
}
}
}
return H;
}

I compared the performance of the two C/C++ functions to a very fast pure R function based on matrix multiplication that I wrote about before:

performance_C
C++ vs. pure R in computing the Hamming distance.

Interestingly, we can see that the straightforward C++ implementation (C_simple) is not faster than the pure R function (pure_R). In other words, efficient use of pure R can easily be faster than fairly regular use of C++ within R. We can see that if we actually want to be faster than the pure R function, one way of doing that is exploiting C’s bitwise operations (C_bitwise).