Archives for posts with tag: R

Recently, I’ve been playing with patterns of drought in paleo records of streamflow. One of the earliest and most helpful tools I’ve developed, identifies and characterizes droughts in extremely long time series using R. I’m still hacking my way through it, but this is what has been cooking thus far…

For this example I’ll be using the Meko et al. 2007 tree-ring reconstruction of the Colorado River’s annual natural flow at Lees Ferry, AZ (extending from A.D. 762-2005). You can check out the data yourself. I’ll be loading it in an R example below.

The heart of this relies on FindDrought, a function that returns a simple binary mask to determine whether a particular year is a “drought year” or not. Here, I’m defining drought as years of below-mean flow, broken by two or more years of above average flow.

FindDrought <- function(x, averageflow=mean(x)) {
  # Determines whether a given time period is in a drought, given a record.
  #
  # Args:
  #   x: A vector of annual stream flow observations.
  #   averageflow: If you want to specify a flow average for finding droughts.
  #
  # Returns:
  #   A vector indicating whether drought present (TRUE) or not present (FALSE).
  foo <- rep(TRUE, length(x))
  foo[x >= averageflow] <- FALSE
  runs <- rle(foo)
  runs$values[runs$lengths < 2 & runs$values == FALSE] <- TRUE
  return(inverse.rle(runs))
}

Having defined what years are and are not droughts, we can then use this information in the DefineDrought. This function loops through and collects basic characteristics about each drought event, returning each drought’s start year, stop year, duration, and “intensity” (the sum of anomalies).

DefineDrought <- function(x, y, averageflow=mean(y)) {
  # Determines basic characteristics of droughts in a given flow time series.
  #
  # Args:
  #   x: Year of time series observation, in a vector format.
  #   y: Time series or vector of river flow observations.
  #   averageflow: If you want to specify a flow average for finding droughts.
  #
  # Returns:
  #   A data frame with columns giving:
  #   1) The starting period of a drought - "start".
  #   2) The period a drought stops - "stop".
  #   3) The number of periods that the drought was present - "duration".
  #   4) The sum of anomalies for the drought period - "anom.sum".
  drought.present <- FindDrought(y, averageflow = averageflow)
  start <- c()
  stop <- c()
  duration <- c()
  anom.sum <- c()
  temp.start <- FALSE
  temp.duration <- 0
  temp.anom.sum <- 0
  # There is likely a more elegant way to handle this aside form loops.
  for (obs in 1:length(y)) {
    if (drought.present[obs] == FALSE) {
      temp.start <- FALSE
      temp.duration <- 0
      temp.anom.sum <- 0
    }
    else if (drought.present[obs] == TRUE) {
      if (!temp.start)
        temp.start <- x[obs]
      temp.duration <- temp.duration + 1
      temp.anom.sum <- temp.anom.sum + (y[obs] - averageflow)
      if ((drought.present[obs + 1] == FALSE) | (obs + 1 > length(y))) {
        start <- append(start, temp.start)
        stop <- append(stop, x[obs])
        duration <- append(duration, temp.duration)
        anom.sum <- append(anom.sum, temp.anom.sum)
      }
    }
  }
  return(data.frame(start, stop, duration, anom.sum))
}

If this tugs at your bobber, you can put this to use in a simple interactive session:

lees <- read.table('ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/reconstructions/northamerica/usa/upper-colorado-flow2007.txt',
  header = TRUE,
  skip = 158,  # To skip the meta data.
  na.strings = 'NULL')

# Drum-roll, please.
lees.droughts <- DefineDrought(x = lees$Year, y = lees$RecMAF)

Here, DefineDrought returns a simple dataframe with the structure:

'data.frame':	169 obs. of  4 variables:
 $ start   : int  762 770 774 779 786 790 797 805 812 822 ...
 $ stop    : int  767 771 774 781 786 794 799 809 819 830 ...
 $ duration: num  6 2 1 3 1 5 3 5 8 9 ...
 $ anom.sum: num  -2.51 -1.81 -2.33 1.14 -5.89 ...

You’ll note that this is very sensitive to the slightest decline in streamflow. You are able to tweak DefineDrought‘s sensitivity by manually specifying a mean flow value and passing this as an argument. However, especially in exploratory analysis, I’ve found it best to simply sort or filter from this default list so that we can see the entire range of values, even in single year “droughts”.

There you have it! Beats sifting through a time series and manually defining droughts by hand. My next post will (hopefully) be a continuation of this, showing how I’ve used this information to visualize drought onset and impact with ggplot2.

As always, I’m happy to hear questions, comments, corrections or concerns!

EDIT: Updated the FindDrought function based on Cameron Bracken‘s sagely wisdom. Many thanks for introducing me to rle()!

Advertisements

Someone at the lab asked how to ‘do something like running means, but with correlations’. I couldn’t find any existing code that would make a good example, so I just wrote some myself.

It would be nice to do this without looping. If anyone has a clever way to do this, please do let me know.

# 2011-03-04
# v0.01

MovingCor <- function(x, y, window.size=21, method="pearson") {
  # Computes moving correlations between two vectors with symmetrical windows.
  #
  # Args:
  #   x: One of the two vectors whose correlation is to be calculated.
  #   y: The other vector. Note that it must be of the same length as x.
  #   window.size: The size of windows to be used for each calculated
  #                correlation. Note that if even numbers are chosen, the
  #                window will not be skewed as there will be one extra value
  #                on the upper-side of the window. Default size is 21.
  #   method: The method of correlation. May be: "pearson", "kendall", or
  #           "spearman". Default is "pearson".
  #
  # Returns:
  #   A vector of the moving correlations.
  n <- length(x)
  # Setup a few catches for error handling.
  if (TRUE %in% is.na(y) || TRUE %in% is.na(x)) {
    stop("Arguments x and y cannot have missing values.")
  }
  if (n <= 1 || n != length(y)) {
    stop("Arguments x and y have different lengths: ",
         length(x), " and ", length(y), ".")
  }
  out <- rep(NA, round(window.size/2))  # Stuffing the returned vector.
  for (value in seq(from = 1, to = n - (window.size - 1))) {
    value.end <- value + (window.size - 1)
    out <- append(out, cor(x[value:value.end],
                           y[value:value.end],
                           method = method))
  }
  out <- append(out, rep(NA, n - length(out)))  # Finish stuffing.
  return(out)
}

EDIT:
There are more nimble functions out there for this, and other window-related tasks. See the caTools‘s runmean function. The package zoo also has a number of quick functions including rollmean and the more general rollapply.

%d bloggers like this: