# Nested loops with mapply

December 31, 2012

So as I sink deeper into the second level of R enlightenment, one thing has troubled me. “lapply” is fine for looping over a single vector of elements, but it doesn’t do a nested loop structure. These tend to be pretty ubiquitous for me. I’m forever doing the same thing to a set of two or three different variables. “apply” smells like a logical candidate, but it will really only allow you to do the same operation over a set of vectors (the rows or columns of a matrix). Meh. “tapply” is more of the same, but applies over a “ragged” array. But “mapply” fits the bill. As it turns out, using mapply is incredibly easy. I found that the trickiest thing to implement is the logic to create the set of all possible combinations over which I want to loop.
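To see why the combinations have to be built first, here is a minimal sketch (the vectors `x` and `y` are made up for illustration). mapply walks its arguments in parallel, pairing first with first and second with second, rather than crossing every element with every other.

```r
# Hypothetical inputs, purely for illustration
x = c("A", "B")
y = c("L", "M")

# mapply pairs x[1] with y[1], then x[2] with y[2].
# It does not cross every x with every y.
paired = mapply(function(p, q) paste0(p, q), x, y, USE.NAMES = FALSE)

print(paired)
print(length(paired))   # 2 results, not the 4 a nested loop would give
```

So to get nested-loop behavior, each argument needs to be a column of a table that already holds every combination.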

Let’s look at that first. Say that you have three variables. To keep things simple, each one is a two-element character vector as below.

    a = c("A", "B")
    b = c("L", "M")
    c = c("X", "Y")

I poked around for a function that would easily render the Cartesian product of those three vectors. “interaction” seemed like a natural choice, but it seems as though it wants to work with factors, and my first attempts to use it returned an error which had something to do with the number of elements. Diagnosing errors in R can be a Kafkaesque adventure and you have to choose your battles. I decided to look elsewhere. If you only have two vectors, an easy way to do that is to handle it manually. Just replicate each, order one of them and bind the results together, sort of like this:

    var1 = rep(a, length(b))
    var1 = var1[order(var1)]
    var2 = rep(b, length(a))
    df = data.frame(a = var1, b = var2)
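As an alternative sketch (not what the code above does), rep’s `each` argument can stand in for the replicate-then-sort step. It repeats every element in place, so it should also keep the original order of `a` if that vector isn’t already sorted.

```r
a = c("A", "B")
b = c("L", "M")

# each = repeats every element of a in place, so no sorting is needed
var1 = rep(a, each = length(b))
# times = cycles through all of b once per element of a
var2 = rep(b, times = length(a))

df2 = data.frame(a = var1, b = var2, stringsAsFactors = FALSE)
print(df2)
```

Every pairing of `a` and `b` shows up exactly once, one per row.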

The ordering step is necessary so that all combinations are represented. So, this is fine for two variables, but won’t work for three or more. Extension of the idea above is straightforward. After two variables, you have a matrix and you simply need to replicate it, just as you would a vector. I coded a function that would take two arguments. The first is a matrix (or a vector) and the second is the next vector we want to fold in.

    CartProduct = function(CurrentMatrix, NewElement) {
      # NewElement must be a plain vector, not a matrix or array
      if (length(dim(NewElement)) != 0) {
        warning("New vector has more than one dimension.")
        return(NULL)
      }

      # Promote a plain vector to a one-column matrix
      if (length(dim(CurrentMatrix)) == 0) {
        CurrentRows = length(CurrentMatrix)
        CurrentMatrix = matrix(CurrentMatrix, nrow = CurrentRows, ncol = 1)
      } else {
        CurrentRows = nrow(CurrentMatrix)
      }

      # Stack one copy of the current matrix per element of NewElement
      var1 = replicate(length(NewElement), CurrentMatrix, simplify = FALSE)
      var1 = do.call("rbind", var1)

      # Repeat the new elements, then sort so each stacked copy gets one value
      var2 = rep(NewElement, CurrentRows)
      var2 = matrix(var2[order(var2)], nrow = length(var2), ncol = 1)

      return(cbind(var1, var2))
    }

Note that using rep or replicate with a character matrix may not give you the results you intended. rep drops the dimensions of a matrix and returns a plain vector. So, I coerce results into matrices and replicate using a list structure, rather than the simplified result from replicate.
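A small sketch of that behavior, using a made-up 2 x 2 matrix `m`: rep flattens it, while replicating into a list and row-binding keeps the columns lined up.

```r
# Hypothetical 2 x 2 character matrix, purely for illustration
m = matrix(c("A", "B", "L", "M"), nrow = 2)

# rep() drops the dim attribute and hands back a plain vector
flat = rep(m, 2)
print(dim(flat))
print(length(flat))

# Replicating into a list and row-binding keeps the two columns intact
stacked = do.call("rbind", replicate(2, m, simplify = FALSE))
print(dim(stacked))
```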

So. Nested loops. At this point, it’s easy.

    someFunction = function(a, b, c) {
      aList = list(a = toupper(a), b = tolower(b), c = c)
      return (aList)
    }

    mojo = CartProduct(a, b)
    mojo = CartProduct(mojo, c)
    aList = mapply(someFunction, mojo[,1], mojo[,2], mojo[,3], SIMPLIFY = F)

Compare this with the following:

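    aList = list()
    for (i in seq_along(a)) {
      for (j in seq_along(b)) {
        for (k in seq_along(c)) {
          # Pass the elements, not the indices, and keep every result
          aList[[length(aList) + 1]] = someFunction(a[i], b[j], c[k])
        }
      }
    }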

Ugh. Note that with mapply you can’t do things like check for critical values partway through the loop or whatnot. But for execution over many categories this will spare me a bit of sanity.

I didn’t think it was all that unobvious, at least for “sapply”. I’ve written functions which call sapply(sapply(sapply(stuff,…),stuff…),stuff) pretty regularly.

Although I find that a novel use of sapply, I don’t think it would work in this case. The nesting of sapply which you illustrate presumes that the output of a function may also serve as the input. The specific problem which I had was rapid application of a function which pulls NFL results for a single season for a single team. Given a vector of seasons and a vector of teams, I wanted to get a dataframe which had results for all teams and all seasons. Having done that, I thought about how to apply it generally to problems where I was simply running the same function against combinations of many variables.
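A rough sketch of that pattern follows. Everything here is an assumption for illustration: `getResults` is a hypothetical stand-in that returns a placeholder row instead of pulling real results, and the season and team values are made up.

```r
# Hypothetical stand-in for the real lookup; it only echoes its inputs
getResults = function(season, team) {
  data.frame(season = season, team = team, stringsAsFactors = FALSE)
}

seasons = c(2010, 2011)         # illustrative values
teams = c("NYG", "GB", "NE")    # illustrative values

# Build every season/team pairing by hand
seasonCol = rep(seasons, each = length(teams))
teamCol = rep(teams, times = length(seasons))

# One data frame per pairing, then stack them into a single data frame
pieces = mapply(getResults, seasonCol, teamCol, SIMPLIFY = FALSE)
results = do.call("rbind", pieces)
print(dim(results))
```

The function is called once per pairing and never sees the output of another call, which is the difference from chaining sapply calls.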

FYI, an even easier way to form the cartesian product: expand.grid(a, b, c)

You should check out expand.grid. It is in base and works well for this.

One of the trickiest things about R is that it’s so hard to identify solutions that you presume must be out there. I had done some searching on the word “cartesian” but only turned up some ggplot2 functions. No way in the world would I have guessed that expand.grid was what I was looking for. Great tip!
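For reference, a minimal sketch of how expand.grid might slot into the mapply call from the post. The column names and the `stringsAsFactors = FALSE` setting are choices made here for illustration, not something from the original code.

```r
a = c("A", "B")
b = c("L", "M")
c = c("X", "Y")

# Same function as in the post
someFunction = function(a, b, c) {
  list(a = toupper(a), b = tolower(b), c = c)
}

# One row per combination; stringsAsFactors = FALSE keeps the columns as character
combos = expand.grid(a = a, b = b, c = c, stringsAsFactors = FALSE)

# Feed the columns to mapply so the function runs once per row
aList = mapply(someFunction, combos$a, combos$b, combos$c, SIMPLIFY = FALSE)

print(nrow(combos))
print(length(aList))
```

Note that expand.grid varies its first argument fastest, so the rows come out in a different order than the hand-rolled version, though the set of combinations is the same.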

How about something like:

whatever.summary <- ddply(.data=whatever, .variables=c("a", "b", "c"), some.function)

The ply functions are ones that I’m not yet familiar with. I expect I’ll be looking into them eventually. Thanks for the suggestion!