Arranging Arrays

We often have to deal with multidimensional data, which generally has to be squeezed into a 2-D format for tables and spreadsheets and then later reconstituted. Whenever I have to do that, I need to rediscover how to do it. So here's a tutorial for my future self which might be useful for others.

Here's a simple example: we have 3 sites, visited 4 times per year for 2 years. This is usually shoehorned into a table with 8 columns for the visits, like this:

site 1.1 1.2 1.3 1.4 2.1 2.2 2.3 2.4
   A   0   1   1   3   2   0   1   1
   B   1   0   1   0   3   0   2   2
   C   5   2   1   1   3   4   2   1

These look like counts, but the data could be detection/nondetection (0/1) data, wind speed at each visit, or the name of the observer.

An example to play with

Here's code to produce data like the table above; the counts are random draws from rpois(), so your values will differ:

colNames <- as.vector(t(outer(1:2, 1:4, paste, sep=".")))
siteNames <- c("A", "B", "C")
dat1 <- matrix(rpois(24, 1.5), 3)
dimnames(dat1) <- list(site = siteNames, occasion = colNames)

But instead of that, I want to play with a matrix of character strings, where each string tells us what it is. Here it is:

mat1 <- outer(siteNames, colNames, paste0)
dimnames(mat1) <- list(site = siteNames, occasion = colNames)
site 1.1    1.2    1.3    1.4    2.1    2.2    2.3    2.4   
   A "A1.1" "A1.2" "A1.3" "A1.4" "A2.1" "A2.2" "A2.3" "A2.4"
   B "B1.1" "B1.2" "B1.3" "B1.4" "B2.1" "B2.2" "B2.3" "B2.4"
   C "C1.1" "C1.2" "C1.3" "C1.4" "C2.1" "C2.2" "C2.3" "C2.4"

A comment on dimnames

I think it's really useful to name matrices and arrays. As you can see above, you can name the dimensions (here "site" and "occasion") as well as the rows and columns. The names generally survive when you summarise (with colSums for instance) or select subsets.

mat1[, 1:4]
site 1.1    1.2    1.3    1.4   
   A "A1.1" "A1.2" "A1.3" "A1.4"
   B "B1.1" "B1.2" "B1.3" "B1.4"
   C "C1.1" "C1.2" "C1.3" "C1.4"
mat1[2, ]
   1.1    1.2    1.3    1.4    2.1    2.2    2.3    2.4 
"B1.1" "B1.2" "B1.3" "B1.4" "B2.1" "B2.2" "B2.3" "B2.4" 

The last example has lost its row name. That's because it no longer has a row: it's not a matrix. By default, R drops any dimension that is left with only one level, so a single row or a single column becomes an ordinary vector. To prevent this happening, add the argument drop = FALSE to the call:

dim(mat1[2, ])    # no dimensions, it's not a matrix
NULL
dim(mat1[2, , drop = FALSE])    # now okay
[1] 1 8
mat1[2, , drop = FALSE]    # now okay
site 1.1    1.2    1.3    1.4    2.1    2.2    2.3    2.4   
   B "B1.1" "B1.2" "B1.3" "B1.4" "B2.1" "B2.2" "B2.3" "B2.4"

That's an important argument to add when extracting rows or columns in code that expects a matrix: getting a vector instead will trip you up.
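Here's a small sketch of how that can bite. The logical vector keep is a made-up selection of sites (it is not part of the example above) which happens to pick out just one site; everything else is rebuilt so the block runs on its own.

```r
# Rebuild the example matrix so this block runs on its own
siteNames <- c("A", "B", "C")
colNames <- as.vector(t(outer(1:2, 1:4, paste, sep = ".")))
mat1 <- outer(siteNames, colNames, paste0)
dimnames(mat1) <- list(site = siteNames, occasion = colNames)

# 'keep' is a hypothetical selection; here only one site is selected
keep <- siteNames == "B"

nrow(mat1[keep, ])                     # NULL: the result is a plain vector
nrow(mat1[keep, , drop = FALSE])       # 1: still a matrix, with one row
rownames(mat1[keep, , drop = FALSE])   # "B": the site name is retained
```

Code that loops over 1:nrow(...) or calls colSums() on the result will fail or misbehave in the first case, but works in the second.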

Thinking about multi-dimensional arrays

Visualising a 2-D array - a matrix or table - is easy; we do it all the time. I like to think of a 3-D array as a collection of pages, each with rows and columns; the pages form the 3rd dimension. You can bundle pages into books and put several books on a shelf: that's a 4-D array. Several shelves form a bookcase (5-D). A row of bookcases along a wall (6-D). Rooms with bookcases along a corridor (7-D)... on different floors (8-D)... in different wings of the library (9-D). That should be enough!

It's important to realise though that R stores even 9-D arrays as a single sequence of values, together with an attribute that indicates how the sequence should be cut up. The first few values belong to the 1st column on the 1st page of the 1st book...; next comes the 2nd column on the 1st page, and each of the columns in turn until the 1st page has been dealt with. Then comes the 1st column on the 2nd page, and so on.

It's very easy to change the dimensions attribute, a bit more difficult to rearrange the values in the sequence.
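A minimal sketch with plain integers shows the difference. The object names x and m are just for illustration: changing the dim attribute leaves the sequence untouched, while transposing really does rearrange the values.

```r
x <- 1:24                  # a plain sequence of 24 values
dim(x) <- c(3, 8)          # cut it up as a 3 x 8 matrix
m <- x                     # keep a copy of the matrix version

dim(x) <- c(3, 4, 2)       # same values, now 3 rows x 4 columns x 2 pages
identical(as.vector(x), as.vector(m))   # TRUE: the sequence has not changed

x[1, 2, 1]                 # 4: 1st row of the 2nd column on page 1
x[1, 1, 2]                 # 13: first value on page 2, after 3 x 4 = 12 values

as.vector(t(m))[1:4]       # 1 4 7 10: t() has rearranged the sequence
```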

Convert our simple matrix to a 3-D array

So mat1 is stored as a single sequence of values. Let's look at the structure:

str(mat1)
 chr [1:3, 1:8] "A1.1" "B1.1" "C1.1" "A1.2" "B1.2" "C1.2" "A1.3" "B1.3" ...
 - attr(*, "dimnames")=List of 2
  ..$ site    : chr [1:3] "A" "B" "C"
  ..$ occasion: chr [1:8] "1.1" "1.2" "1.3" "1.4" ...

We want an array with sites (rows) x visits (columns) x years (pages). So it should fill in all three rows of the first four columns, then move to a second page.

( arr1 <- array(mat1, c(3, 4, 2)) )
, , 1

     [,1]   [,2]   [,3]   [,4]  
[1,] "A1.1" "A1.2" "A1.3" "A1.4"
[2,] "B1.1" "B1.2" "B1.3" "B1.4"
[3,] "C1.1" "C1.2" "C1.3" "C1.4"

, , 2

     [,1]   [,2]   [,3]   [,4]  
[1,] "A2.1" "A2.2" "A2.3" "A2.4"
[2,] "B2.1" "B2.2" "B2.3" "B2.4"
[3,] "C2.1" "C2.2" "C2.3" "C2.4"

Notice that we don't need to tell R to "unwrap" mat1; it does that automatically. But we have lost the names, because our old dimnames attribute won't fit the new array. We'll add new dimnames, and then we'll check by pulling out one slice from each dimension.

dimnames(arr1) <- list(site = siteNames, visit=1:4, year=1:2)
, , year = 1

site 1      2      3      4     
   A "A1.1" "A1.2" "A1.3" "A1.4"
   B "B1.1" "B1.2" "B1.3" "B1.4"
   C "C1.1" "C1.2" "C1.3" "C1.4"

, , year = 2

site 1      2      3      4     
   A "A2.1" "A2.2" "A2.3" "A2.4"
   B "B2.1" "B2.2" "B2.3" "B2.4"
   C "C2.1" "C2.2" "C2.3" "C2.4"

arr1[3,,] # site 3
visit 1      2     
    1 "C1.1" "C2.1"
    2 "C1.2" "C2.2"
    3 "C1.3" "C2.3"
    4 "C1.4" "C2.4"
arr1[,2,] # visit 2
site 1      2     
   A "A1.2" "A2.2"
   B "B1.2" "B2.2"
   C "C1.2" "C2.2"
arr1[,,1] # year 1
site 1      2      3      4     
   A "A1.1" "A1.2" "A1.3" "A1.4"
   B "B1.1" "B1.2" "B1.3" "B1.4"
   C "C1.1" "C1.2" "C1.3" "C1.4"
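Because the array is named, we can also check individual cells by name rather than by position, and confirm that going back to the matrix is just another change of dimensions. This is only a sketch; mat1b is a name I've made up for the reconstituted matrix.

```r
# Rebuild mat1 and arr1 so this block runs on its own
siteNames <- c("A", "B", "C")
colNames <- as.vector(t(outer(1:2, 1:4, paste, sep = ".")))
mat1 <- outer(siteNames, colNames, paste0)
dimnames(mat1) <- list(site = siteNames, occasion = colNames)
arr1 <- array(mat1, c(3, 4, 2))
dimnames(arr1) <- list(site = siteNames, visit = 1:4, year = 1:2)

# Index by name: note that the names are character strings, even for 1:4
arr1["B", "3", "2"]                       # "B2.3": site B, visit 3, year 2
arr1["B", "3", "2"] == mat1["B", "2.3"]   # TRUE

# Back to a 3 x 8 matrix: the sequence of values is reused as it stands
mat1b <- matrix(arr1, nrow = 3, dimnames = dimnames(mat1))
all(mat1b == mat1)                        # TRUE
```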

A more complicated example

Now an example with four dimensions. Our data set has sites, visits, and years, as before, and now just 2 species ("a" and "b"). In the data file, there is a row for each site and each species, grouped by species.

rowNames <- as.vector(t(outer(c("a", "b"), siteNames, paste0)))
mat2 <- outer(rowNames, colNames, paste0)
dimnames(mat2) <- list(rowNames, colNames)
   1.1     1.2     1.3     1.4     2.1     2.2     2.3     2.4    
aA "aA1.1" "aA1.2" "aA1.3" "aA1.4" "aA2.1" "aA2.2" "aA2.3" "aA2.4"
aB "aB1.1" "aB1.2" "aB1.3" "aB1.4" "aB2.1" "aB2.2" "aB2.3" "aB2.4"
aC "aC1.1" "aC1.2" "aC1.3" "aC1.4" "aC2.1" "aC2.2" "aC2.3" "aC2.4"
bA "bA1.1" "bA1.2" "bA1.3" "bA1.4" "bA2.1" "bA2.2" "bA2.3" "bA2.4"
bB "bB1.1" "bB1.2" "bB1.3" "bB1.4" "bB2.1" "bB2.2" "bB2.3" "bB2.4"
bC "bC1.1" "bC1.2" "bC1.3" "bC1.4" "bC2.1" "bC2.2" "bC2.3" "bC2.4"
 [1] "aA1.1" "aB1.1" "aC1.1" "bA1.1" "bB1.1" "bC1.1" "aA1.2" "aB1.2" "aC1.2" "bA1.2" "bB1.2"
[12] "bC1.2" "aA1.3" "aB1.3" "aC1.3" "bA1.3" "bB1.3" "bC1.3" "aA1.4" "aB1.4" "aC1.4" "bA1.4"
[23] "bB1.4" "bC1.4" "aA2.1" "aB2.1" "aC2.1" "bA2.1" "bB2.1" "bC2.1" "aA2.2" "aB2.2" "aC2.2"
[34] "bA2.2" "bB2.2" "bC2.2" "aA2.3" "aB2.3" "aC2.3" "bA2.3" "bB2.3" "bC2.3" "aA2.4" "aB2.4"
[45] "aC2.4" "bA2.4" "bB2.4" "bC2.4"

I've also displayed the sequence of values that we will have to organise into an array. This won't work as neatly as last time: the order of the values in the sequence dictates the structure of the initial array, so we have no choice about it. We'll rearrange it later.

The first 3 values are the 3 sites with species "a" and the first visit in the first year; these will go in column 1, with a row for each site. The next 3 are the 3 sites with species "b", and these go into the 2nd column, so species are in columns. Now we come to the second visit in year 1, and this should go on the second page; visits will be pages. After 4 pages, we have included all the data for the first year, and we start a new book for the second year.

So the data will be read into a 4-D array with 3 rows (sites) x 2 columns (species) x 4 pages (visits) x 2 books (years).

As a rule of thumb, see which dimensions move fastest (compare with a clock with second, minute and hour hands): the sites "rotate" once for each species, the set of species rotates once for each visit, the set of visits rotates once for each year.

Or think of it as nesting: in the rows, sites are nested within species, and in the columns, visits are nested within years. As usual in R, rows come first, so again it's sites, then species, then visits, then years.
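One way to check a proposed ordering before building the array is with expand.grid(), which varies its first argument fastest, just as R does when storing an array. The sketch below rebuilds each label from its four indices and compares the result with the stored sequence; grid and labels are names chosen for this illustration.

```r
# Rebuild mat2 so this block runs on its own
siteNames <- c("A", "B", "C")
colNames <- as.vector(t(outer(1:2, 1:4, paste, sep = ".")))
rowNames <- as.vector(t(outer(c("a", "b"), siteNames, paste0)))
mat2 <- outer(rowNames, colNames, paste0)

# List the dimensions from fastest-moving to slowest-moving
grid <- expand.grid(site = siteNames, species = c("a", "b"),
                    visit = 1:4, year = 1:2,
                    stringsAsFactors = FALSE)
head(grid, 7)   # sites cycle first, then species, then visits

# Rebuild the labels in that order and compare with the stored sequence
labels <- with(grid, paste0(species, site, year, ".", visit))
identical(labels, as.vector(mat2))   # TRUE: the proposed order is right
```

If the comparison gives FALSE, the dimensions are listed in the wrong order.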

This is how we do it.

arr2 <- array(mat2, c(3, 2, 4, 2))
dimnames(arr2) <- list(site = siteNames,
                     species = c("a", "b"),
                     visit = paste0("v", 1:4),
                     year = paste0("y", 1:2))
, , visit = v1, year = y1

site a       b      
   A "aA1.1" "bA1.1"
   B "aB1.1" "bB1.1"
   C "aC1.1" "bC1.1"

, , visit = v2, year = y1

site a       b      
   A "aA1.2" "bA1.2"
   B "aB1.2" "bB1.2"
   C "aC1.2" "bC1.2"

, , visit = v3, year = y1

site a       b      
   A "aA1.3" "bA1.3"
   B "aB1.3" "bB1.3"
   C "aC1.3" "bC1.3"

, , visit = v4, year = y1

site a       b      
   A "aA1.4" "bA1.4"
   B "aB1.4" "bB1.4"
   C "aC1.4" "bC1.4"

, , visit = v1, year = y2

site a       b      
   A "aA2.1" "bA2.1"
   B "aB2.1" "bB2.1"
   C "aC2.1" "bC2.1"

, , visit = v2, year = y2

site a       b      
   A "aA2.2" "bA2.2"
   B "aB2.2" "bB2.2"
   C "aC2.2" "bC2.2"

, , visit = v3, year = y2

site a       b      
   A "aA2.3" "bA2.3"
   B "aB2.3" "bB2.3"
   C "aC2.3" "bC2.3"

, , visit = v4, year = y2

site a       b      
   A "aA2.4" "bA2.4"
   B "aB2.4" "bB2.4"
   C "aC2.4" "bC2.4"

Well, that's very nice, but it isn't the format we want: we'll use the aperm() function to permute the dimensions.

The order of the dimensions in the current array, arr2, is 1.sites, 2.species, 3.visits, 4.years, and we want sites (currently #1) x visits (#3) x years (#4) x species (#2). So we need to enter perm = c(1, 3, 4, 2):

( arr3 <- aperm(arr2, c(1,3,4,2)) )
, , year = y1, species = a

site v1      v2      v3      v4     
   A "aA1.1" "aA1.2" "aA1.3" "aA1.4"
   B "aB1.1" "aB1.2" "aB1.3" "aB1.4"
   C "aC1.1" "aC1.2" "aC1.3" "aC1.4"

, , year = y2, species = a

site v1      v2      v3      v4     
   A "aA2.1" "aA2.2" "aA2.3" "aA2.4"
   B "aB2.1" "aB2.2" "aB2.3" "aB2.4"
   C "aC2.1" "aC2.2" "aC2.3" "aC2.4"

, , year = y1, species = b

site v1      v2      v3      v4     
   A "bA1.1" "bA1.2" "bA1.3" "bA1.4"
   B "bB1.1" "bB1.2" "bB1.3" "bB1.4"
   C "bC1.1" "bC1.2" "bC1.3" "bC1.4"

, , year = y2, species = b

site v1      v2      v3      v4     
   A "bA2.1" "bA2.2" "bA2.3" "bA2.4"
   B "bB2.1" "bB2.2" "bB2.3" "bB2.4"
   C "bC2.1" "bC2.2" "bC2.3" "bC2.4"
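To convince yourself that aperm() has moved the values and not just relabelled the dimensions, you can look up the same cell in both arrays. This is a quick sketch using one arbitrarily chosen cell: the indices are the same, only their order changes.

```r
# Rebuild arr2 and arr3 so this block runs on its own
siteNames <- c("A", "B", "C")
colNames <- as.vector(t(outer(1:2, 1:4, paste, sep = ".")))
rowNames <- as.vector(t(outer(c("a", "b"), siteNames, paste0)))
mat2 <- outer(rowNames, colNames, paste0)
arr2 <- array(mat2, c(3, 2, 4, 2))
dimnames(arr2) <- list(site = siteNames,
                       species = c("a", "b"),
                       visit = paste0("v", 1:4),
                       year = paste0("y", 1:2))
arr3 <- aperm(arr2, c(1, 3, 4, 2))

dim(arr3)                      # 3 4 2 2
names(dimnames(arr3))          # "site" "visit" "year" "species"

# The same cell, with the indices in the old and the new order
arr2["B", "b", "v3", "y2"]     # "bB2.3"
arr3["B", "v3", "y2", "b"]     # "bB2.3"

# Unlike a change of dim, aperm() rearranges the stored sequence
identical(as.vector(arr2), as.vector(arr3))   # FALSE
```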

An example for you...

If you want to experiment more with this, try turning the arr3 array back into a 2-D matrix, but this time with the rows grouped by site (instead of by species) and the occasions grouped by visit (instead of by year). The result should look something like this:

"aA1.1" "aA2.1" "aA1.2" "aA2.2" "aA1.3" "aA2.3" "aA1.4" "aA2.4"
"bA1.1" "bA2.1" "bA1.2" "bA2.2" "bA1.3" "bA2.3" "bA1.4" "bA2.4"
"aB1.1" "aB2.1" "aB1.2" "aB2.2" "aB1.3" "aB2.3" "aB1.4" "aB2.4"
"bB1.1" "bB2.1" "bB1.2" "bB2.2" "bB1.3" "bB2.3" "bB1.4" "bB2.4"
"aC1.1" "aC2.1" "aC1.2" "aC2.2" "aC1.3" "aC2.3" "aC1.4" "aC2.4"
"bC1.1" "bC2.1" "bC1.2" "bC2.2" "bC1.3" "bC2.3" "bC1.4" "bC2.4"

Hint: Use aperm() to get an array with the values in the right rows/columns/pages/books, then change the dimensions with array() or matrix().
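If you'd like to check your attempt, here is one possible solution; skip it if you want to work it out yourself. It uses the "fastest-moving" rule: in the target rows species changes fastest, then site, and in the target columns year changes fastest, then visit. The names arr4 and mat3 are just ones I've picked for this sketch.

```r
# Rebuild arr3 so this block runs on its own
siteNames <- c("A", "B", "C")
colNames <- as.vector(t(outer(1:2, 1:4, paste, sep = ".")))
rowNames <- as.vector(t(outer(c("a", "b"), siteNames, paste0)))
mat2 <- outer(rowNames, colNames, paste0)
arr2 <- array(mat2, c(3, 2, 4, 2))
arr3 <- aperm(arr2, c(1, 3, 4, 2))   # site x visit x year x species

# Wanted order: species (#4 in arr3), site (#1), year (#3), visit (#2)
arr4 <- aperm(arr3, c(4, 1, 3, 2))

# 2 species x 3 sites give 6 rows; 2 years x 4 visits give 8 columns
mat3 <- matrix(arr4, nrow = 6)
mat3[1:2, 1:3]
```

The two rows printed should start "aA1.1" "aA2.1" "aA1.2" and "bA1.1" "bA2.1" "bA1.2".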

Updated 26 Feb 2017 by Mike Meredith