Archives for category: Science

Recently, I’ve been playing with patterns of drought in paleo records of streamflow. One of the earliest and most helpful tools I’ve developed identifies and characterizes droughts in extremely long time series using R. I’m still hacking my way through it, but this is what has been cooking thus far…

For this example I’ll be using the Meko et al. 2007 tree-ring reconstruction of the Colorado River’s annual natural flow at Lees Ferry, AZ (extending from A.D. 762-2005). You can check out the data yourself. I’ll be loading it in an R example below.

The heart of this relies on FindDrought, a function that returns a simple binary mask indicating whether a particular year is a “drought year” or not. Here, I’m defining a drought as a run of below-mean flow years, one that ends only when broken by two or more consecutive years of above-average flow.

FindDrought <- function(x, averageflow=mean(x)) {
  # Determines whether a given time period is in a drought, given a record.
  # Args:
  #   x: A vector of annual stream flow observations.
  #   averageflow: If you want to specify a flow average for finding droughts.
  # Returns:
  #   A vector indicating whether drought present (TRUE) or not present (FALSE).
  foo <- rep(TRUE, length(x))
  foo[x >= averageflow] <- FALSE
  runs <- rle(foo)
  # Absorb above-average runs shorter than two years into the drought.
  runs$values[runs$lengths < 2 & runs$values == FALSE] <- TRUE
  inverse.rle(runs)
}
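To see what the rle() trick is doing, here is a minimal sketch on a toy flow record (the numbers are made up): a single above-mean year (the 12) gets absorbed into the surrounding drought, while the two-year break (13, 14) ends it.

```r
# Toy record: below-mean years form the drought mask.
flow <- c(8, 9, 12, 7, 13, 14, 6)
mask <- flow < mean(flow)   # TRUE where flow is below average
runs <- rle(mask)
# Flip above-mean runs shorter than two years back to "drought".
runs$values[runs$lengths < 2 & runs$values == FALSE] <- TRUE
inverse.rle(runs)
# TRUE TRUE TRUE TRUE FALSE FALSE TRUE
```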

Having defined which years are and are not droughts, we can then use this information in DefineDrought. This function loops through the record and collects basic characteristics about each drought event, returning each drought’s start year, stop year, duration, and “intensity” (the sum of anomalies).

DefineDrought <- function(x, y, averageflow=mean(y)) {
  # Determines basic characteristics of droughts in a given flow time series.
  # Args:
  #   x: Year of time series observation, in a vector format.
  #   y: Time series or vector of river flow observations.
  #   averageflow: If you want to specify a flow average for finding droughts.
  # Returns:
  #   A data frame with columns giving:
  #   1) The starting period of a drought - "start".
  #   2) The period a drought stops - "stop".
  #   3) The number of periods that the drought was present - "duration".
  #   4) The sum of anomalies for the drought period - "anom.sum".
  drought.present <- FindDrought(y, averageflow = averageflow)
  start <- c()
  stop <- c()
  duration <- c()
  anom.sum <- c()
  temp.start <- FALSE
  temp.duration <- 0
  temp.anom.sum <- 0
  # There is likely a more elegant way to handle this aside from loops.
  for (obs in 1:length(y)) {
    if (drought.present[obs] == FALSE) {
      temp.start <- FALSE
      temp.duration <- 0
      temp.anom.sum <- 0
    } else if (drought.present[obs] == TRUE) {
      if (!temp.start) {
        temp.start <- x[obs]
      }
      temp.duration <- temp.duration + 1
      temp.anom.sum <- temp.anom.sum + (y[obs] - averageflow)
      # Close the event at the end of the record or when the drought breaks.
      if ((obs == length(y)) || (drought.present[obs + 1] == FALSE)) {
        start <- append(start, temp.start)
        stop <- append(stop, x[obs])
        duration <- append(duration, temp.duration)
        anom.sum <- append(anom.sum, temp.anom.sum)
      }
    }
  }
  return(data.frame(start, stop, duration, anom.sum))
}

If this tugs at your bobber, you can put this to use in a simple interactive session:

lees <- read.table('',  # Fill in the path or URL to the Meko et al. data.
  header = TRUE,
  skip = 158,  # To skip the meta data.
  na.strings = 'NULL')

# Drum-roll, please.
lees.droughts <- DefineDrought(x = lees$Year, y = lees$RecMAF)

Here, DefineDrought returns a simple data frame with the structure:

'data.frame':	169 obs. of  4 variables:
 $ start   : int  762 770 774 779 786 790 797 805 812 822 ...
 $ stop    : int  767 771 774 781 786 794 799 809 819 830 ...
 $ duration: num  6 2 1 3 1 5 3 5 8 9 ...
 $ anom.sum: num  -2.51 -1.81 -2.33 1.14 -5.89 ...

You’ll note that this is very sensitive to the slightest decline in streamflow. You can tweak DefineDrought’s sensitivity by passing a manually specified mean flow as the averageflow argument. However, especially in exploratory analysis, I’ve found it best to simply sort or filter this default list so that we can see the entire range of values, even single-year “droughts”.
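That sorting and filtering is a one-liner; here is a sketch on a made-up data frame with the same columns as DefineDrought’s output:

```r
# Hypothetical events mimicking DefineDrought's output structure.
droughts <- data.frame(
  start    = c(762, 770, 774, 779),
  stop     = c(767, 771, 774, 781),
  duration = c(6, 2, 1, 3),
  anom.sum = c(-2.51, -1.81, -2.33, -1.14)
)
subset(droughts, duration > 1)        # drop single-year "droughts"
droughts[order(droughts$anom.sum), ]  # most intense events first
```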

There you have it! Beats sifting through a time series and manually defining droughts by hand. My next post will (hopefully) be a continuation of this, showing how I’ve used this information to visualize drought onset and impact with ggplot2.

As always, I’m happy to hear questions, comments, corrections or concerns!

EDIT: Updated the FindDrought function based on Cameron Bracken‘s sagely wisdom. Many thanks for introducing me to rle()!


Someone at the lab asked how to ‘do something like running means, but with correlations’. I couldn’t find any existing code that would make a good example, so I just wrote some myself.

It would be nice to do this without looping. If anyone has a clever way to do this, please do let me know.

# 2011-03-04
# v0.01

MovingCor <- function(x, y, window.size=21, method="pearson") {
  # Computes moving correlations between two vectors with symmetrical windows.
  # Args:
  #   x: One of the two vectors whose correlation is to be calculated.
  #   y: The other vector. Note that it must be of the same length as x.
  #   window.size: The size of windows to be used for each calculated
  #                correlation. Note that if an even number is chosen, the
  #                window will be slightly skewed, as there will be one extra
  #                value on the upper side of the window. Default size is 21.
  #   method: The method of correlation. May be: "pearson", "kendall", or
  #           "spearman". Default is "pearson".
  # Returns:
  #   A vector of the moving correlations.
  n <- length(x)
  # Setup a few catches for error handling.
  if (any(is.na(x)) || any(is.na(y))) {
    stop("Arguments x and y cannot have missing values.")
  }
  if (n <= 1 || n != length(y)) {
    stop("Arguments x and y have different lengths: ",
         length(x), " and ", length(y), ".")
  }
  out <- rep(NA, round(window.size/2))  # Stuffing the returned vector.
  for (value in seq(from = 1, to = n - (window.size - 1))) {
    value.end <- value + (window.size - 1)
    out <- append(out, cor(x[value:value.end], y[value:value.end],
                           method = method))
  }
  out <- append(out, rep(NA, n - length(out)))  # Finish stuffing.
  return(out)
}

There are more nimble functions out there for this and other window-related tasks. See the runmean function in the caTools package. The zoo package also has a number of quick functions, including rollmean and the more general rollapply.
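For what it’s worth, the explicit loop can also be compacted with sapply. This is just a sketch (MovingCor2 is a name I’m using here, not an existing function); it assumes NA-free inputs and pads the ends of the result with NAs:

```r
MovingCor2 <- function(x, y, window.size = 21, method = "pearson") {
  # Same idea as MovingCor, but with sapply standing in for the loop.
  n <- length(x)
  half <- floor(window.size / 2)  # size of the leading NA pad
  core <- sapply(seq_len(n - window.size + 1), function(i) {
    idx <- i:(i + window.size - 1)
    cor(x[idx], y[idx], method = method)
  })
  c(rep(NA, half), core, rep(NA, n - half - length(core)))
}
```

Of course, sapply still iterates under the hood; it just keeps the bookkeeping out of sight.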

I’m in the process of preparing my first manuscript to submit to the American Meteorological Society (AMS) and their Journal of Applied Meteorology and Climatology. This is a write-up explaining how I go about doing this. Hopefully it will save you some time if you are in the same boat.

Tools used

The immediate tools I’ve been using at this end of the project are:

Ubuntu Linux
My operating system of choice. I’m using version 10.04 Lucid Lynx.

ArcGIS 9.2 (via VirtualBox)
A proprietary GIS system that is something of a standard in the field. It’s important here because I use this for all my cartography.

R
A must-have language for data analysis. One of the most powerful ways to plot and visualize data.

The GIMP
An excellent tool for image editing. Here, I’ve been using it to convert TIFF to EPS.

Zotero
A very handy Firefox extension that handles all my bibliographic data. An essential tool for any Modern Grad Student worth their salt.

LaTeX
The feared type-setting system.

Why go through all this trouble?

AMS lets authors submit papers either through a DOC template or a tarball of their LaTeX template. I’ve chosen to use LaTeX not only because I love pain, but also because Microsoft Office is evil. My paper runs to 20-30 pages and is loaded with massive, high-resolution graphics (1200 dpi) and loads of special characters and formatting. Managing this kind of work in a slow, bulky (and proprietary) DOC file would drive anyone crazy. I say this thinking several years down the line, when I may want to access the paper and this particular version of DOC may be obsolete. Having said that, I will be quick to note that writing in LaTeX can be a big pain in the ass. The results, however, are awesome (do not Google awesome LaTeX).

At this point in the project, the majority of my writing is done. When I need to collaborate with a bunch of people or pass around digital copies that people can annotate, I’ll keep a rough DOC, TXT, or ODT file. Many of my associates use MS Windows and MacOS and are generally not fluent in LaTeX. This is just something to keep in mind if you haven’t already started writing (which is not likely if you’re reading this).

Setting up your workspace

Go grab yourself a copy of the most recent AMS LaTeX template and their author’s guide (for God’s sake, read this!).

The template tarball has the file structure you’re going to want to be working with: mainly, a working directory with two sub-directories, figures and bibliography. I’m sure you can guess what these are for. Feel free to change the name of the parent directory to anything you like, but AMS_Latex does have a nice ring to it.

Now, let’s get you the software you’re going to need to build the documents. (Note that these directions are for Ubuntu Lucid and its software repositories. They will likely work on most Debian-based systems, but your mileage may vary.)

You’ll need two packages and their dependencies: texlive-publishers and texlive-latex-extra. You can install these using your favorite package-manager or with the command:
sudo apt-get install texlive-publishers texlive-latex-extra
This should give you everything you need to do some damage.

Bringing on the pain

The only files that you really should be worrying about are 1) the contents of bibliography, 2) the contents of figures, and 3) the file blank_template.tex.

Your .tex document

You’re going to want to open blank_template.tex and save it under a new name. This new file is going to be where you do all your typing. Use the original blank_template.tex and amspaper.tex as guides to format your document. You can see the final output of blank_template.tex and amspaper.tex in the files with the same name, but using the PDF extension.

These two .tex documents are full of helpful comments and guidance to get you started. Most of your text can just be cut and pasted from OpenOffice or whatever you use. I’ll briefly cover citations/bibliography and figures below. If you need help with more specific, advanced formatting or syntax, just Google it. Chances are there is a nice guide or forum post that will point you in the right direction.

The bibliography

Exporting a bibliography for LaTeX from Zotero is very easy. Open up Zotero and be sure you have the folder with your bibliography highlighted in the left-most pane. Under the Actions menu (the little gray gear), select Export Library… and choose BibTeX from the Format drop-down menu. Now, place this in your bibliography directory so that it replaces references.bib.

That should be all you need to do. Watch for error messages when you build your .tex document; you may need to go back and check references.bib for syntax shenanigans: unmatched parentheses, quotations, brackets and the like. But these kinds of problems are rare.

Citations in your .tex document are fairly simple. Here is a brief reference sheet to get you started. The key used to designate each citation in your .tex document is in the first line of each item in your references.bib file.
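For instance, if references.bib contains an entry whose first line reads @article{meko_2007, (a hypothetical key), the template’s natbib-style commands let you cite it parenthetically or in-text:

```latex
% Parenthetical citation: renders roughly as (Meko et al. 2007).
Flows at Lees Ferry have been reconstructed from tree rings \citep{meko_2007}.
% Textual citation: renders roughly as Meko et al. (2007).
\citet{meko_2007} extended the Lees Ferry record back to A.D. 762.
```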


The figures

Once again, read the author’s guide or you will end up redoing a lot of your work! There are specific format and size requirements for all of your figures.

One of the biggest problems I had was with image file formats. You’re going to want to use EPS for all of your images. This isn’t too big of a deal, because you can easily export high-resolution TIFFs from ArcGIS and R and then use the GIMP to convert them to EPS. Be sure that you export and convert them using the proper image size.

I wouldn’t be surprised if R could natively export EPS (in fact, the postscript() device can), but I export TIFF because it seems to give me a few more options.

If you are using ArcGIS, you’ll want to avoid exporting EPS directly from it. In my experience, the results have had poor scaling and used the wrong icons. Instead, export a high-resolution TIFF and then use the GIMP to convert it to an EPS.

Place all your .eps files into the figures folder and be careful how you name them, because you’ll have to reference these names in your .tex file to insert the images. The original AMS template .tex files are a pretty good example of how to embed your figures, so I won’t go into that here.
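As a rough sketch (the file name and placement options here are hypothetical; defer to the template’s own examples), embedding one of your EPS files looks something like:

```latex
\begin{figure}[t]
  \centering
  \includegraphics[width=\textwidth]{figures/lees_flow.eps}
  \caption{Reconstructed annual flow of the Colorado River at Lees Ferry.}
  \label{fig:lees-flow}
\end{figure}

% Refer to it in the text with:
Figure~\ref{fig:lees-flow} shows the full reconstruction.
```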

Creating your document

You need to run a series of commands to put everything together into a PDF. Open up a terminal in your working directory (the one with your .tex file) and line-by-line run:

latex yourfile.tex
bibtex yourfile # Note that there is no .tex file extension here.
latex yourfile.tex
latex yourfile.tex # This 2nd run of latex cleans-out reference dependencies.
dvips yourfile.dvi -o

Pay attention to the output of each of these commands and pick through it to look for errors or mistakes. Once you’re more comfortable with what’s going on, we can put all of this into a bash script to automate the process:

#! /usr/bin/env bash
#This is a quick script to help produce a PDF from .tex files for AMS publication.

if [ ! -e "$1" ]; then
    echo "The file or directory given does not seem to exist."
    exit 1
fi

filename=${1%.tex}  # Strip the .tex extension from the argument.

echo "########## $(date +%F) - RUNNING LATEX1 ##########"
latex $filename.tex
echo "########## $(date +%F) - RUNNING BIBTEX ##########"
bibtex $filename
echo "########## $(date +%F) - RUNNING LATEX2 ##########"
latex $filename.tex
echo "########## $(date +%F) - RUNNING LATEX FINAL ##########"
latex $filename.tex
echo "########## $(date +%F) - CONVERTING DVI TO PS ##########"
dvips $filename.dvi -o $filename.ps
echo "########## $(date +%F) - CONVERTING PS TO PDF ##########"
ps2pdf14 $filename.ps
echo "########## $(date +%F) - ALL DONE! ##########"

Run this script with the path to your .tex file (or simply the file name, if it’s in your CWD) as the only argument. It will build everything, giving you error messages along the way, and eventually produce your PDF.

That should be enough to get you started, and maybe do some damage. Good luck doing all the rest.
