Recently, I’ve been playing with patterns of drought in paleo records of streamflow. One of the earliest and most helpful tools I’ve developed, identifies and characterizes droughts in extremely long time series using R. I’m still hacking my way through it, but this is what has been cooking thus far…
For this example I’ll be using the Meko et al. 2007 tree-ring reconstruction of the Colorado River’s annual natural flow at Lees Ferry, AZ (extending from A.D. 762-2005). You can check out the data yourself. I’ll be loading it in an R example below.
The heart of this relies on FindDrought, a function that returns a simple binary mask to determine whether a particular year is a “drought year” or not. Here, I’m defining drought as years of below-mean flow, broken by two or more years of above average flow.
FindDrought <- function(x, averageflow=mean(x)) {
# Determines whether a given time period is in a drought, given a record.
#
# Args:
# x: A vector of annual stream flow observations.
# averageflow: If you want to specify a flow average for finding droughts.
#
# Returns:
# A vector indicating whether drought present (TRUE) or not present (FALSE).
foo <- rep(TRUE, length(x))
foo[x >= averageflow] <- FALSE
runs <- rle(foo)
runs$values[runs$lengths < 2 & runs$values == FALSE] <- TRUE
return(inverse.rle(runs))
}
Having defined what years are and are not droughts, we can then use this information in the DefineDrought. This function loops through and collects basic characteristics about each drought event, returning each drought’s start year, stop year, duration, and “intensity” (the sum of anomalies).
DefineDrought <- function(x, y, averageflow=mean(y)) {
# Determines basic characteristics of droughts in a given flow time series.
#
# Args:
# x: Year of time series observation, in a vector format.
# y: Time series or vector of river flow observations.
# averageflow: If you want to specify a flow average for finding droughts.
#
# Returns:
# A data frame with columns giving:
# 1) The starting period of a drought - "start".
# 2) The period a drought stops - "stop".
# 3) The number of periods that the drought was present - "duration".
# 4) The sum of anomalies for the drought period - "anom.sum".
drought.present <- FindDrought(y, averageflow = averageflow)
start <- c()
stop <- c()
duration <- c()
anom.sum <- c()
temp.start <- FALSE
temp.duration <- 0
temp.anom.sum <- 0
# There is likely a more elegant way to handle this aside form loops.
for (obs in 1:length(y)) {
if (drought.present[obs] == FALSE) {
temp.start <- FALSE
temp.duration <- 0
temp.anom.sum <- 0
}
else if (drought.present[obs] == TRUE) {
if (!temp.start)
temp.start <- x[obs]
temp.duration <- temp.duration + 1
temp.anom.sum <- temp.anom.sum + (y[obs] - averageflow)
if ((drought.present[obs + 1] == FALSE) | (obs + 1 > length(y))) {
start <- append(start, temp.start)
stop <- append(stop, x[obs])
duration <- append(duration, temp.duration)
anom.sum <- append(anom.sum, temp.anom.sum)
}
}
}
return(data.frame(start, stop, duration, anom.sum))
}
If this tugs at your bobber, you can put this to use in a simple interactive session:
lees <- read.table('ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/reconstructions/northamerica/usa/upper-colorado-flow2007.txt',
header = TRUE,
skip = 158, # To skip the meta data.
na.strings = 'NULL')
# Drum-roll, please.
lees.droughts <- DefineDrought(x = lees$Year, y = lees$RecMAF)
Here, DefineDrought returns a simple dataframe with the structure:
'data.frame': 169 obs. of 4 variables: $ start : int 762 770 774 779 786 790 797 805 812 822 ... $ stop : int 767 771 774 781 786 794 799 809 819 830 ... $ duration: num 6 2 1 3 1 5 3 5 8 9 ... $ anom.sum: num -2.51 -1.81 -2.33 1.14 -5.89 ...
You’ll note that this is very sensitive to the slightest decline in streamflow. You are able to tweak DefineDrought‘s sensitivity by manually specifying a mean flow value and passing this as an argument. However, especially in exploratory analysis, I’ve found it best to simply sort or filter from this default list so that we can see the entire range of values, even in single year “droughts”.
There you have it! Beats sifting through a time series and manually defining droughts by hand. My next post will (hopefully) be a continuation of this, showing how I’ve used this information to visualize drought onset and impact with ggplot2.
As always, I’m happy to hear questions, comments, corrections or concerns!
EDIT: Updated the FindDrought function based on Cameron Bracken‘s sagely wisdom. Many thanks for introducing me to rle()!

Nice post! The FindDrought function seems a bit generous in the classification of drought. Do you have a reference for this?
Glad you enjoyed the post.
Yeah, this a very liberal definition. It gives droughts with positive anomalies! As you are aware, defining “drought” seems to be one of the bigger pains in this kind of work. In this case, this analysis was part of a larger project with the USBR to look at what atmospheric circulations do when droughts turn ‘on’ and ‘off’. So, this is simply how everyone had defined drought, as to catch a wide variety of events, for this leg of the project.
The nice thing is that DefineDrought can easily work with any function that produces a vector binary mask.
btw, here is my version of the FindDrought function that appears to return the same results:
FindDrought2 <- function(x, limit = mean(x)){
bts = limit] <- 1L
runs <- rle(bts)
runs$values[runs$lengths < 2 & runs$values == 1] <- 0
as.factor(!as.logical(inverse.rle(runs)))
}
Nice work.
the package is up at CRAN. RghcnV3
more versions coming as I refactor old stuff into the new approach
I will be sure to check that out!
Looking forward to future versions.
cool. I’ve just up loaded a new version with a more complete manual and some added functionality. If your working with spatial data ( say drought data that has spatial coordinates) then you might look at “raster” I detail pretty well the process of getting time based data into a 3D raster ( Lat/Lon/Time) and you can always ping me with questions.
also have a look at xblocks in the zoo package