Read Sidney Hemming’s first post to learn more about the goals of her two-month research cruise off southern Africa and its focus on the Agulhas Current and collecting climate records for the past 5 million years.
Our first day on the ocean was pretty rough. We left the harbor in Mauritius into high winds and choppy seas, and I don’t think I was alone in feeling pretty miserable. I woke up the next day to calm seas and a much better perspective.
We have been busy with meetings, training sessions, and planning for the core flow, and I think people are getting close to being ready for the 12-hour shifts. My shift is 3 p.m. to 3 a.m., and my co-chief scientist Ian Hall’s is the opposite. It works out pretty well relative to our home clocks (when I start my shift, it’s 8 a.m. back in New York), and we’ll have significant overlap. I plan to get started by noon, and Ian will hang around until 6 or so before going to bed. We have decided we’ll take a break for exercise—should be a good strategy.
The staff on the ship is wonderful. They feed me great meals, and there is even an espresso machine right outside the science office where I sit. Today Kevin Grieger, our operations manager, gave us a tour of the bridge, the drilling rig and the core shack, where we met Bubba Attryde, who has been the core specialist since Glomar Challenger days and continues to make innovations. We went down through the motors and pumps, past the moon pool, and out to the JOIDES Resolution’s helideck.
The helideck has a special role this cruise. On March 26, Ian Hall and Steve Barker will be running in the IAAF/Cardiff University World Half Marathon Championships. It requires 328 laps around the deck, which is noisy and hot. They are doing it to raise money for a small South African charity called the Goedgedacht Trust, which promotes education to help poor rural African children escape grinding poverty. Ian has learned that the money raised will help bring solar power to schools. When we reach Cape Town, some of the children plan to tour the ship.
It is now official that we will start with the Natal Valley site while we wait for clearance from Mozambique to work on what would have been our first site.
The Natal Valley is at the beginning of the Agulhas Current, where the waters flowing through the Mozambique Channel and the East Madagascar Current come together and flow along the southern Africa coast. A central goal of the expedition is to understand the history of the Agulhas Current and its role in climate variability, and this site could help us characterize how the microorganisms and the land-derived sediments it carries have changed over the last 5 million years.
Recently published evidence from the past 270,000 years from very close to the Natal Valley site also shows that there have been significant changes in rainfall in southern Africa on millennial time scales. We are very interested in getting a longer record of rainfall changes with this expedition. So in effect, we have the dual goals of understanding the nearby climate record from Africa and understanding the ocean currents below which the core is located—both the Agulhas Current and deep water circulation, which currently flows north along the western Natal Valley and is the reason for the sediment “contourite” accumulation that we are coring.
We will be getting to the Natal Valley site about 8 p.m. local time on Tuesday, so we should have cores coming in before daylight on Wednesday. You can feel the excitement start to build. Our staff scientist, Leah, has organized everybody well. The groups gave reports on their methods this morning and will turn in drafts of their methods before we get to the first site. It’s getting close!
Sidney Hemming is a geochemist and professor of Earth and Environmental Sciences at Lamont-Doherty Earth Observatory. She uses the records in sediments and sedimentary rocks to document aspects of Earth’s history.
By Frankie Pavia
How far is five kilometers, vertically? We leaned over the edge of the boat, staring into the water, watching the last glimmer of light from the in-situ pump disappear into the abyss. The furthest down we could see the pump was 50 meters from the surface—remarkably far to still see light anywhere in the ocean, courtesy of the life-devoid upper waters of the South Pacific.
That’s a comprehensible depth, 50 meters. It’s about the same as a 15-story building. But five kilometers? My German colleague and I could conceptualize five kilometers horizontally—the same as her bike ride to work, the same as the first ever race I ran. Neither of us could quite grasp what flipping 5 kilometers 90 degrees might mean, as our pump continued on its 3-hour vertical journey to that depth.
The spirit of exploration is embedded within all scientific research. It is a quest to probe and understand the unknown. But oceanographers and astronauts have something more than that—the work they do also involves the physical exploration of spaces that have yet to come under dominion of humanity. The ocean and space have not yet been rendered permanently habitable. No human lives at sea or in space without having to depend on land for survival.
I expected to conclude the cruise with a deeper connection to the ocean. I expected to feel like I had performed an act of exploration by sailing from one land mass to another, and as a result to have gained some fundamental understanding of the ocean’s spatial domain.
Yet a week after I stepped off the FS Sonne for good, I am left feeling like the ocean is further from my grasp than ever. Five kilometers of depth, and all I did was sail across a tiny fraction of the surface. Sure, I hauled back samples from the deep, and I will certainly learn an incredible amount about it from chemical measurements. But did I explore the deep ocean? Is it possible to explore a place without actually traveling there?
I wonder how astronauts feel when they return to Earth. Just like oceanographers experience only the top of the ocean, astronauts only scratch the surface of an incomprehensibly large volume of space. Does it make them feel like a part of something greater, or does experiencing its massive scale make them feel even smaller?
While the ocean is a vast nexus of life, space is seemingly devoid of it. The ocean certainly holds clues as to how life formed on our planet, and where it may exist on distant moons in our solar system. On Mars, it is the locations of long-desiccated oceans and running water where life is thought to have been possible in the distant past. In habitability, oceans are our pluperfect, Earth is our future perfect, space is our future.
The connection between oceans and space will certainly be a source of excitement for science in the coming years. Ice-covered moons in our solar system have liquid water oceans; surely there are planets and moons orbiting stars other than ours that have them as well. How will we ever understand them if we have only seen such a small portion of the ocean’s volume on Earth?
And so we plunge onward into the indomitable vastness of the oceans, of space. I came away feeling further than ever from the oceans after this cruise. To fix that, I must keep exploring.
Read Sidney Hemming’s first post to learn more about the goals of her two-month research cruise off Southern Africa and its focus on the Agulhas Current and collecting climate records for the past 5 million years.
It’s almost midnight here, and we’ll be setting sail around 7 a.m. The transit will take approximately six days to the first coring site. Right now, we have the uncertainty that we may not have Mozambique clearance in time for the first intended site, so we will have to make a decision when we get to the tip of Madagascar about whether to head toward the proposed first site, or instead go to the site that would be #4, the northernmost site in South African waters. Apparently this is a normal thing that the permissions are not granted until just as the ship leaves (we hope that happens here), and in our case we have rumors that the form has been signed but it is unclear where it is.
So Kevin Grieger, our operations manager, has been calculating times for alternative plans and considering plans we may have to drop if we cannot stick with the original schedule. We may have to skip some of the operations, and we may even have to forgo a site. Our highest priority site is the sixth out of six on our geographic path, so we have to be judicious in our planning in order to ensure we get there. And it is the closest to the port in Cape Town — word is we only have eight hours in the schedule between the coring site and the port — exciting but also scary because of all the work we have to get done before getting into port.
My husband, Gary, and I had fun in Mauritius before we came to meet up with the JOIDES Resolution. Ian Hall (the other co-chief), Leah LeVay (the IODP staff scientist) and I boarded the ship on Jan. 30, and we went into Port Louis for dinner that night to meet up with a few of the scientists, Allison Franzese, Steve Barker (former Lamont post-doc), and Sophie Hines. Sophie is a Caltech graduate student who is leading the pore water sampling program for her advisor Jess Adkins (also a former Lamont post-doc) who was unable to participate in the expedition.
So we have been living on the ship since the 30th and getting ready for the cruise. That involves a lot of meetings and training. Many of the science team did not know each other before we got here, and we also did not know about the others’ research plans. The plans will evolve as we discuss potential overlaps and collaborations. And they will also change as we find out what we really are going to encounter in the cores. We are all getting to know each other, learning what each other’s interests are, and trying to come up with a plan that will maximize what we can discover with the materials we will collect on this cruise. It is very different from anything I have done before, and it is exciting. I think it will be a really rewarding experience. The group seems to have already developed a good rapport, and we are all very optimistic.
While we have been in Mauritius, the BBC picked up on our work, and twitter has been atwitter with blurbs about the cruise and people on the cruise. We have also had quite a few tours through the ship. Dick Norris (from Scripps Institution of Oceanography) and I went to a girls’ school yesterday and discussed global change and encouraged them to think about science. A small group of the girls from that school came for a tour today, and they seemed really keen and engaged. Lisa Crowder, who oversees the ship’s technicians working with core processing protocols and lab equipment, gave a really awesome show that we all enjoyed! She used the straw-in-the-milkshake analogy for coring in the ocean. It was a great visual!
It is supposed to be quite windy tomorrow, so I’m nervous about being seasick and I’m going to take my Dramamine first thing in the morning. I sure hope I am going to get my sea legs quickly!
I am on my way to Mauritius to spend a few days with my husband, Gary, before boarding the JOIDES Resolution in Port Louis for a two-month research cruise, IODP Expedition 361, South African Climates. This is my first cruise as co-chief scientist, so I am both excited and nervous. The goal of the cruise is to obtain climate records for the past 5 million years at six sites around southern Africa. Each has its own special focus.
As with research cruises in general, this represents the culmination of a huge effort over many years, in this case led by Rainer Zahn and my co-chief scientist, Ian Hall. Those efforts included planning workshops, site survey cruises, proposal writing and re-writing, and a lot of development of stratigraphic records and proxies of climate-sensitive factors such as temperature and salinity of surface and intermediate waters, positions of the “Agulhas Retroflection,” and evidence for deep ocean circulation.
The cruise is going to an exciting place in the global climate system. Evidence for the vigor of North Atlantic Deep Water overturning circulation (a.k.a. the Great Ocean Conveyor) can be found in the same sediment cores taken from the floor of the southern Cape Basin, off the southwest coast of South Africa, where scientists have found evidence for Agulhas “leakage” of warm, salty water from the Indian Ocean into the Atlantic Ocean.
The Agulhas Current is the strongest western boundary current in the world’s oceans. It flows along the eastern side of southern Africa, and when it reaches the tip of Africa, it is “retroflected” to flow east, parallel to the Antarctic Circumpolar Current. Changes in the Agulhas Current are coincident with climate change in Africa, and thus it may even have been an important factor in the evolution of our species in Africa. Lamont-Doherty Earth Observatory’s Arnold Gordon has made the case that leakage of salt and heat from the Agulhas Current into the Atlantic Ocean is one of the ingredients that enhances North Atlantic Deep Water overturning circulation.
On this cruise, we’re studying the current to collectively try to uncover the story of southern African climates and their connections with global ocean circulation and climate variability for the past 5 million years.
My role is to use the layers of sediment on the ocean floor that either blew in or washed in from land to contribute to understanding of rainfall and runoff, weathering on Africa, and changes in the Agulhas. (I’m working on this in collaboration with fellow sailing scientists Allison Franzese of Lamont, Margit Simon of the University of Bergen, and Ian Hall of Cardiff University, and shore-based scientist Steve Goldstein at Lamont). Questions about rainfall and runoff and weathering will be tackled in sediment cores that are near major rivers. These efforts will also serve to characterize the composition of sediments being carried in the Agulhas Current.
Fortuitously, the sources of sediments along the eastern coast of South Africa have significantly different radiogenic isotopes than those on the western side. Radiogenic isotopes are isotope systems that change due to radioactive decay of a parent isotope and thus respond to the time-aspects of geologic history. We discovered during the Ph.D. thesis of former graduate student Randye Rutberg that in RC11-83, a pretty famous sediment core from the southern Cape Basin, the land-derived sediments have a higher ratio of Strontium-87 to Strontium-86 during warmer climate intervals than during colder intervals, and the values are so high as to require an external source—that is the eastern side of Africa and carried into the Atlantic via the Agulhas leakage.
Franzese did her Ph.D. thesis on land-derived sediment evidence of changes in the Agulhas between glacial times, about 20,000 years ago, and modern times. She documented the map pattern of variability of both the sources and sediment changes, and further confirmed the role of the Agulhas Current in depositing sediments in the South Atlantic. It will be super exciting to extend the observations back to 5 million years and explore how the sources, as well as the Agulhas Current itself, may have changed.
I’ve been making a lot of improvements to paprica, our program for conducting metabolic inference on 16S rRNA gene sequence libraries. The following is a complete analysis example with paprica to clarify the steps described in the manual, and to highlight some of the recent improvements to the method. I’ll continue to update this tutorial as the method evolves. This tutorial assumes that you have all the dependencies for paprica_run.sh installed and in your PATH. If you’re a Mac user you can follow the instructions here. If you’re a Linux user (including Windows users running Linux in a VirtualBox), installation is a bit simpler: just follow the instructions in the manual. The current list of dependencies is:
- Infernal (including Easel)
- pplacer (including Guppy)
- Python version 2.7 (I strongly recommend using Anaconda)
It also assumes that you have the following Python modules installed:
Finally, it assumes that you are using the provided database of metabolic pathways and genome data included in the ref_genome_database directory. At some point in the future I’ll publish a tutorial describing how to use paprica_build.sh to build a custom database.
All the dependencies are installed and tested? Before we start let’s get familiar with some terminology.
community structure: The taxonomic structure of a bacterial assemblage.
edge: An edge is a point of placement on a reference tree. Think of it as a branch of the reference tree. Edges take the place of OTUs in this workflow, and are ultimately far more interesting and informative than OTUs. Refer to the pplacer documentation for more.
metabolic structure: The abundance of different metabolic pathways within a bacterial assemblage.
reference tree: This is the tree of representative 16S rRNA gene sequences from all completed bacterial genomes in GenBank. The topology of the tree defines what pathways are predicted for internal branch points.
Now let’s get paprica. There are two options here. Option 1 is to use git to download the development version. The development version has the most recent bug fixes and features, but has not been fully tested. That means that it’s passed a cursory test of paprica_build.sh and paprica_run.sh on my development system (a Linux workstation), but I haven’t yet validated paprica_run.sh on an independent machine (a Linux VirtualBox running on my Windows laptop). The development version can be downloaded and made executable with:
    git clone https://github.com/bowmanjeffs/paprica.git
    cd paprica
    chmod a+x *sh
Option 2 is to download the last stable version. In this case stable means that paprica_build.sh and paprica_run.sh successfully built on the development system, paprica_run.sh successfully ran on a virtual box, and that paprica_build.sh and paprica_run.sh both successfully ran a second time from the home directory of the development machine. The database produced by this run is the one that can be found in the latest stable version. Downloading and making the latest stable version executable is the recommended way of getting paprica, but requires a couple of additional steps. For the current stable release (v0.22):
    wget https://github.com/bowmanjeffs/paprica/archive/paprica_v0.22.tar.gz
    tar -xzvf paprica_v0.22.tar.gz
    mv paprica-paprica_v0.22 paprica
    cd paprica
    chmod a+x *sh
Now using your text editor of choice (I recommend nano) you should open the file titled paprica_profile.txt. This file will look something like:
    ## Variables necessary for the scripts associated with paprica_run.sh and paprica_build.sh.
    # This is the location of the reference directory.
    ref_dir=~/paprica/ref_genome_database/
    # This is the location of the covariance model used by Infernal.
    cm=~/paprica/bacterial_ssu.cm
    ## Variables associated with paprica_build.sh only.
    # This is the number of cpus RAxML will use. See RAxML manual for guidance.
    cpus=8
    # This is the location where you want your pgdbs placed. This should match
    # what you told pathway-tools, or set to pathway-tools default location if
    # you didn't specify anything.
    pgdb_dir=~/ptools-local/pgdbs/user/
    ## Variables associated with paprica_run.sh only.
    # The fraction of terminal daughters that need to have a pathway for it
    # to be included in an internal node.
    cutoff=0.5
For the purposes of this tutorial, and a basic analysis with paprica, we are only concerned with the ref_dir, cm, and cutoff variables. The ref_dir variable is the location of the reference database. If you downloaded paprica to your home directory, and you only intend to use paprica_run.sh, you shouldn’t need to change it. Ditto for cm, which is the location of the covariance model used by Infernal. The cutoff variable specifies the fraction of genomes in a given clade in which a metabolic pathway must appear for it to be assigned to a read placed to that clade. In practice 50% works well, but you may wish to be more or less conservative depending on your objectives. If you want to change it, simply edit that value to be whatever you want.
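To make the cutoff idea concrete, here is a minimal Python sketch of the logic described above. It is not paprica's actual code: the function name, the toy genomes, and the choice of "greater than or equal to" for the comparison are all assumptions made for illustration.

```python
def pathways_for_clade(genome_pathways, cutoff=0.5):
    """Return the pathways present in at least `cutoff` of a clade's genomes.

    genome_pathways: dict mapping a genome name to the set of pathways it has.
    Hypothetical helper for illustration only; not part of paprica.
    """
    n_genomes = len(genome_pathways)
    counts = {}
    for pathways in genome_pathways.values():
        for pathway in pathways:
            counts[pathway] = counts.get(pathway, 0) + 1
    # Assumption: a pathway exactly at the cutoff fraction is kept.
    return {p for p, c in counts.items() if c / float(n_genomes) >= cutoff}


# Toy clade of four genomes (made-up names and pathways).
clade = {
    "genome_a": {"glycolysis", "denitrification"},
    "genome_b": {"glycolysis", "denitrification"},
    "genome_c": {"glycolysis"},
    "genome_d": {"glycolysis", "sulfate reduction"},
}

print(sorted(pathways_for_clade(clade, cutoff=0.5)))
print(sorted(pathways_for_clade(clade, cutoff=0.9)))
```

With the toy data, a cutoff of 0.5 keeps the pathway found in two of four genomes, while a more conservative 0.9 keeps only the pathway found in every genome.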
Now go ahead and make sure that things are working right by executing paprica_run.sh using the provided file test.fasta. From the paprica directory:
    ./paprica_run.sh test
This will produce a variety of output files in the paprica directory:
    ls test*
    test.clean.align.sto
    test.clean.fasta
    test.combined_16S.tax.clean.align.csv
    test.combined_16S.tax.clean.align.fasta
    test.combined_16S.tax.clean.align.jplace
    test.combined_16S.tax.clean.align.phyloxml
    test.combined_16S.tax.clean.align.sto
    test.edge_data.csv
    test.fasta
    test.pathways.csv
    test.sample_data.txt
    test.sum_pathways.csv
Each sample fasta file that you run will produce similar output, with the following being particularly useful to you:
test.combined_16S.tax.clean.align.jplace: This is a file produced by pplacer that contains the placement information for your sample. You can do all kinds of interesting things with sets of jplace files using Guppy. Refer to the Guppy documentation for more details.
test.combined_16S.tax.clean.align.phyloxml: This is a “fat” style tree showing the placements of your query on the reference tree. You can view this tree using Archaeopteryx.
test.edge_data.csv: This is a csv format file containing data on each edge location in the reference tree that received a placement, such as the number of reads that placed, predicted 16S rRNA gene copies, number of reads placed normalized to 16S rRNA gene copies, GC content, etc. This file describes the taxonomic structure of your sample.
test.pathways.csv: This is a csv file of all the metabolic pathways inferred for test.fasta, by placement. All possible metabolic pathways are listed; the number attributed to each edge is given in the column for that edge.
test.sample_data.txt: This file describes some basic information for the sample, such as the database version that was used to make the metabolic inference, the confidence score, total reads used, etc.
test.sum_pathways.csv: This csv format file describes the metabolic structure of the sample, i.e. pathway abundance across all edges.
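Two of the quantities in these files are simple arithmetic, sketched below in Python. This is only an illustration of the descriptions above: the function names, edge numbers, and counts are made up, and how paprica itself weights pathway counts is not specified here.

```python
def normalized_abundance(n_reads, n_16s_copies):
    """Reads placed on an edge divided by its predicted 16S rRNA gene copies."""
    return n_reads / float(n_16s_copies)


def sum_pathways(pathways_by_edge):
    """Sum each pathway's per-edge counts to get a sample-wide abundance.

    pathways_by_edge: dict mapping pathway name -> {edge id: count}.
    Hypothetical helper; not part of paprica.
    """
    return {pathway: sum(per_edge.values())
            for pathway, per_edge in pathways_by_edge.items()}


# Made-up edges: edge id -> (reads placed, predicted 16S rRNA gene copies).
edges = {101: (12, 4), 202: (6, 1)}
corrected = {edge: normalized_abundance(n, c) for edge, (n, c) in edges.items()}
print(corrected)

# Made-up per-edge pathway counts, as in a pathways.csv style table.
pathways = {
    "glycolysis": {101: 3.0, 202: 6.0},
    "denitrification": {101: 0.0, 202: 6.0},
}
print(sum_pathways(pathways))
```

The first result corresponds to the copy-number-normalized column in the edge data, and the second to the per-pathway totals that describe metabolic structure.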
Okay, that was all well and good for the test.fasta file, which has placements only for a single edge and is not particularly exciting. Let’s try something a bit more advanced. Create a directory for some new analysis in your home directory and copy the necessary paprica files to it:
    cd ~
    mkdir my_analysis
    cp paprica/paprica_run.sh my_analysis
    cp paprica/paprica_profile.txt my_analysis
    cp paprica/paprica_place_it.py my_analysis
    cp paprica/paprica_tally_pathways.py my_analysis
Now add some fasta files for this tutorial. The two fasta files are from Luria et al. 2014 and Bowman and Ducklow, 2015. They are a summer and winter surface sample from the same location along the West Antarctic Peninsula. I recommend wget for downloads of this sort; if you don’t have it, and don’t want to install it for some reason, use curl.
    cd my_analysis
    wget http://www.polarmicrobes.org/extras/summer.fasta
    wget http://www.polarmicrobes.org/extras/winter.fasta
We’re cheating a bit here, because these samples have already been QC’d. That means I’ve trimmed for quality and removed low quality reads, removed chimeras, and identified and removed mitochondria, chloroplasts, and anything that didn’t look like it belonged to the domain Bacteria. I used Mothur for all of these tasks, but you may wish to use other tools.
Run time may be a concern for you if you have many query files to run, and/or they are particularly large. The rate-limiting step in paprica is pplacer. We can speed pplacer up by telling paprica_place_it.py to split the query fasta into several pieces that pplacer will run in parallel. Be careful of memory usage! pplacer creates two threads automatically when it runs, and each thread uses about 4 GB of memory. So if your system has only 2 cpus and 8 GB of memory don’t use this option! If your system has 32 GB of RAM I’d recommend 3 splits, so that you don’t max things out.
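The memory arithmetic behind that recommendation can be sketched in a few lines of Python. The figures (two threads per pplacer run, roughly 4 GB per thread) come from the paragraph above; the function itself and the rule of staying strictly below total RAM are assumptions for illustration, so treat the result as a starting point rather than a guarantee.

```python
def max_safe_splits(ram_gb, threads_per_split=2, gb_per_thread=4):
    """Largest number of splits whose estimated memory stays below total RAM.

    Hypothetical helper: each split is assumed to launch one pplacer run with
    `threads_per_split` threads of about `gb_per_thread` GB each.
    Always returns at least 1, since paprica needs one pplacer run regardless.
    """
    gb_per_split = threads_per_split * gb_per_thread
    splits = 1
    # Add splits while the next one would still leave some memory free.
    while (splits + 1) * gb_per_split < ram_gb:
        splits += 1
    return splits


for ram in (8, 32, 64):
    print("%d GB RAM -> -splits %d" % (ram, max_safe_splits(ram)))
```

For 32 GB this gives 3 splits (about 24 GB), matching the suggestion above, and for 8 GB it stays at a single run.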
While we’re modifying run parameters let’s make one additional change. The two provided files have already been subsampled so that they have equal numbers of reads (1,977). We can check this with:
    grep -c '>' *fasta
    summer.fasta:1977
    winter.fasta:1977
But suppose this wasn’t the case? It’s generally a good idea to subsample your reads to the size of the smallest library so that you are viewing diversity evenly across samples. You can get paprica to do this for you by specifying the number of reads paprica_place_it.py should use.
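As a rough illustration of what subsampling to the smallest library means, here is a small self-contained Python sketch. It is not how paprica_place_it.py is implemented; the helper names and the toy sequences are hypothetical, and the FASTA parsing is deliberately minimal.

```python
import random


def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) tuples."""
    records = []
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def subsample(records, n, seed=None):
    """Randomly pick n records without replacement (all of them if n is larger)."""
    if n >= len(records):
        return list(records)
    return random.Random(seed).sample(records, n)


# Toy libraries of unequal size (made-up reads).
libraries = {
    "summer": read_fasta([">s1", "ACGT", ">s2", "GGCC", ">s3", "TTAA"]),
    "winter": read_fasta([">w1", "ACGT", ">w2", "CCGG"]),
}
smallest = min(len(records) for records in libraries.values())
even = {name: subsample(records, smallest, seed=1)
        for name, records in libraries.items()}
print({name: len(records) for name, records in even.items()})
```

Counting the records per library plays the same role as the grep check, and the minimum count is the value you would pass as the number of reads.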
To specify the number of splits and the number of reads edit the paprica_place_it.py flags in paprica_run.sh:
    ## default line
    #python paprica_place_it.py -query $query -ref combined_16S.tax -splits 1
    ## new line
    python paprica_place_it.py -query $query -ref combined_16S.tax -splits 3 -n 1000
This will cause paprica to subsample the query file (by random selection) to 1000 reads, and split the subsampled file into three query files that will be run in parallel. The parallel run is invisible to you; the output should be identical to a run with no splits (-splits 1). If you use subsampling you’ll also need to change the paprica_tally_pathways.py line, as the input file name will be slightly different (it gains a .sub component):
    ## default line
    #python paprica_tally_pathways.py -i $query.combined_16S.tax.clean.align.csv -o $query
    ## new line
    python paprica_tally_pathways.py -i $query.sub.combined_16S.tax.clean.align.csv -o $query
Here we are only analyzing two samples, so running them manually isn’t too much of a pain. But you might have tens or hundreds of samples, and need a way to automate that. We do this with a simple loop. I recommend generating a file with the prefixes of all your query files and using that in the loop. For example the file samples.txt might have:
    summer
    winter
This file can be inserted into a loop as:
    while read f; do
        ./paprica_run.sh $f
    done < samples.txt
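If you have many query files, the list of prefixes can be generated rather than typed. The Python sketch below is one hypothetical way to do it; the function name is made up, and the filter for intermediate files is an assumption based on the output names listed earlier in this tutorial (files ending in .clean.fasta and .clean.align.fasta), so check it against your own directory before relying on it.

```python
import os


def sample_prefixes(filenames, suffix=".fasta"):
    """Return sorted query-file prefixes, skipping paprica's own fasta outputs.

    Hypothetical helper: assumes query files are named <prefix>.fasta and that
    intermediate outputs end in .clean.fasta or .clean.align.fasta.
    """
    prefixes = []
    for name in sorted(filenames):
        base = os.path.basename(name)
        if not base.endswith(suffix):
            continue
        prefix = base[:-len(suffix)]
        if prefix.endswith(".clean") or prefix.endswith(".clean.align"):
            continue
        prefixes.append(prefix)
    return prefixes


files = ["summer.fasta", "winter.fasta", "paprica_run.sh", "samples.txt"]
print("\n".join(sample_prefixes(files)))
```

In a real working directory you would pass os.listdir(".") instead of the toy list and write the joined result to samples.txt.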
Note that we don’t run them in parallel using, say, GNU parallel, because Infernal is intrinsically parallelized, and we already forced pplacer to run in parallel using -splits.
Once you’ve executed the loop you’ll see all the normal paprica output, for both samples. It’s useful to concatenate some of this information for downstream analysis. The provided utility combine_edge_results.py can do this for you. Copy it to your working directory:
    cp ~/paprica/utilities/combine_edge_results.py combine_edge_results.py
This script will automatically aggregate everything with the suffix .edge_data.csv. You need to specify a prefix for the output files:
    python combine_edge_results.py my_analysis
This produces two files:
my_analysis.edge_data.csv: These are the mean genome parameters for each sample. Lots of good stuff in here, see the column labels.
my_analysis.edge_tally.csv: Edge abundance for each sample (corrected for 16S rRNA gene copy). This is your community structure, and is equivalent to an OTU table (but much better!).
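To picture what an edge tally looks like, here is a small Python sketch that lines up per-sample edge abundances into one table, with zeros where a sample has no placements on an edge. It only illustrates the shape of the result; it is not the code in combine_edge_results.py, and the function name, edge numbers, and abundances are made up.

```python
def build_edge_tally(sample_edges):
    """Align per-sample edge abundances into a samples-by-edges table.

    sample_edges: dict mapping sample name -> {edge id: corrected abundance}.
    Returns (sorted list of edge ids, dict of sample -> list of abundances).
    Hypothetical helper for illustration only.
    """
    edges = sorted({edge for counts in sample_edges.values() for edge in counts})
    rows = {sample: [counts.get(edge, 0.0) for edge in edges]
            for sample, counts in sample_edges.items()}
    return edges, rows


# Made-up copy-number-corrected abundances for two samples.
samples = {
    "summer": {101: 3.0, 202: 6.0},
    "winter": {202: 1.5, 303: 4.0},
}
edges, rows = build_edge_tally(samples)
print(edges)
for name in sorted(rows):
    print("%s %s" % (name, rows[name]))
```

Each row is one sample and each column one edge, which is why the table can stand in for an OTU table in downstream community analysis.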
To be continued…