News aggregator

Iron Fertilization Might Not Make Oceans Better Carbon Sinks - Eos

Featured News - Fri, 02/05/2016 - 12:00
New research from Lamont's Kassandra Costa suggests more iron during the last ice age did not mean more algae production in the equatorial Pacific, pointing to possible futility of a controversial geoengineering idea.

Setting Off for Two Months at Sea

When Oceans Leak - Wed, 02/03/2016 - 18:59

The scientists of Expedition 361, including Co-Chief Scientist Sidney Hemming, will be spending the next two months aboard the Joides Resolution.

Read Sidney Hemming’s first post to learn more about the goals of her two-month research cruise off Southern Africa and its focus on the Agulhas Current and collecting climate records for the past 5 million years.

It’s almost midnight here, and we’ll be setting sail around 7 a.m. The transit to the first coring site will take approximately six days. Right now it is uncertain whether we will have Mozambique clearance in time for the first intended site, so when we reach the tip of Madagascar we will have to decide whether to head toward the proposed first site or instead go to what would be site #4, the northernmost site in South African waters. Apparently it is normal for permissions not to be granted until just as the ship leaves (we hope that happens here); in our case, rumor has it that the form has been signed, but it is unclear where it is.

So Kevin Grieger, our operations manager, has been calculating times for alternative plans and considering plans we may have to drop if we cannot stick with the original schedule. We may have to skip some of the operations, and we may even have to forgo a site. Our highest priority site is the sixth out of six on our geographic path, so we have to be judicious in our planning in order to ensure we get there. And it is the closest to the port in Cape Town — word is we only have eight hours in the schedule between the coring site and the port — exciting but also scary because of all the work we have to get done before getting into port.

My husband, Gary, and I had fun in Mauritius before we came to meet up with the JOIDES Resolution. Ian Hall (the other co-chief), Leah LeVay (the IODP staff scientist) and I boarded the ship on Jan. 30, and we went into Port Louis for dinner that night to meet up with a few of the scientists: Allison Franzese, Steve Barker (former Lamont post-doc), and Sophie Hines. Sophie is a Caltech graduate student who is leading the pore water sampling program for her advisor Jess Adkins (also a former Lamont post-doc), who was unable to participate in the expedition.

So we have been living on the ship since the 30th and getting ready for the cruise. That involves a lot of meetings and training. Many of the science team did not know each other before we got here, and we also did not know about the others’ research plans. The plans will evolve as we discuss potential overlaps and collaborations. And they will also change as we find out what we really are going to encounter in the cores. We are all getting to know each other, learning what each other’s interests are, and trying to come up with a plan that will maximize what we can discover with the materials we will collect on this cruise. It is very different from anything I have done before, and it is exciting. I think it will be a really rewarding experience. The group seems to have already developed a good rapport, and we are all very optimistic.

Before the JOIDES Resolution leaves port, Lisa Crowder (left) and Rebecca Robinson (right) take students from a girls’ school in Mauritius on a tour. Photo: Tim Fulton/IODP

While we have been in Mauritius, the BBC picked up on our work, and Twitter has been atwitter with blurbs about the cruise and the people on it. We have also had quite a few tours through the ship. Dick Norris (from Scripps Institution of Oceanography) and I went to a girls’ school yesterday, where we discussed global change and encouraged the students to think about science. A small group of the girls from that school came for a tour today, and they seemed really keen and engaged. Lisa Crowder, who oversees the ship’s technicians working with core processing protocols and lab equipment, gave a really awesome show that we all enjoyed! She used the straw-in-the-milkshake analogy for coring in the ocean. It was a great visual!

It is supposed to be quite windy tomorrow, so I’m nervous about being seasick and I’m going to take my Dramamine first thing in the morning. I sure hope I am going to get my sea legs quickly!

Sidney Hemming is a geochemist and professor of Earth and Environmental Sciences at Lamont-Doherty Earth Observatory. She uses the records in sediments and sedimentary rocks to document aspects of Earth’s history.

New Columbia Center Aims to Tap Business for Climate Studies - Chronicle of Philanthropy

Featured News - Tue, 02/02/2016 - 13:51
With government funding for climate science stagnant, a new center at Columbia University is working to engage corporate donors to back research on environmental changes and how humans can adapt to them. "It’s a very new way of funding science," said Lamont's Peter deMenocal, director of the Center for Climate & Life.

Listen to Seismic Waves from Inside the Earth - The Creators Project

Featured News - Tue, 02/02/2016 - 12:00
Lamont's Ben Holtzman and the Seismic Sound Lab turn data from seismometers into a visual and auditory experience.

Center for Climate & Life: Changing the Way We Do & Fund Science - Nature

Featured News - Mon, 02/01/2016 - 15:23
Columbia's Center for Climate & Life is engaging corporate philanthropists to boost funding for research into the effects of projected environmental changes and how human systems can adapt.

Greenland's Glaciers & Climate Change - 60 Minutes

Featured News - Sun, 01/31/2016 - 12:00
60 Minutes reports from Greenland's Petermann Glacier, then visits with Lamont-Doherty's Peter deMenocal at the Core Lab to discuss some of the most significant efforts to study climate change happening today.

European Summers Are at Their Warmest in Two Millennia - Climate Progress

Featured News - Fri, 01/29/2016 - 10:52
We may have underestimated how hot European summers are today, compared to the region's past, according to a new study. Lamont's Jason Smerdon explains.

Shaking in U.S. Northeast Caused by Sonic Boom, Not Quake - Reuters

Featured News - Thu, 01/28/2016 - 17:10
Residents from New Jersey to Connecticut reported feeling earthquake-like shaking on Thursday afternoon. Lamont seismologist Won-Young Kim tells Reuters that instruments measured vibrations and low-frequency sound waves consistent with about eight sonic booms.

Unlocking Antarctica's Secrets: The Ross Ice Shelf - Minnesota Public Radio

Featured News - Thu, 01/28/2016 - 16:57
Underneath Antarctica's Ross Ice Shelf is the least-known piece of ocean floor on our planet. We know almost nothing about it, but it's the size of France, Lamont-Doherty's Robin Bell tells MPR. Bell's IcePod team has been mapping that ocean floor.

Geoengineering Would Not Work in All Oceans - Scientific American

Featured News - Thu, 01/28/2016 - 12:00
Sediment cores show that in the past, higher iron concentrations in the equatorial Pacific did not enhance growth of carbon-storing algae, according to a new study from Lamont's Kassandra Costa.

Uncovering the Stories of Southern Africa’s Climate Past

When Oceans Leak - Wed, 01/27/2016 - 10:19

The Agulhas Current runs along the southern coast of Africa and is influenced by other flows. Credit: Arnold L. Gordon.

I am on my way to Mauritius to spend a few days with my husband, Gary, before boarding the JOIDES Resolution in Port Louis for a two-month research cruise, IODP Expedition 361, South African Climates. This is my first cruise as co-chief scientist, so I am both excited and nervous. The goal of the cruise is to obtain climate records for the past 5 million years at six sites around southern Africa. Each has its own special focus.

As with research cruises in general, this represents the culmination of a huge effort over many years, in this case led by Rainer Zahn and my co-chief scientist, Ian Hall. Those efforts included planning workshops, site survey cruises, proposal writing and re-writing, and a lot of development of stratigraphic records and proxies of climate-sensitive factors such as temperature and salinity of surface and intermediate waters, positions of the “Agulhas Retroflection,” and evidence for deep ocean circulation.

The cruise is going to an exciting place in the global climate system. Evidence for the vigor of North Atlantic Deep Water overturning circulation (a.k.a. the Great Ocean Conveyor) can be found in the same sediment cores taken from the floor of the southern Cape Basin, off the southwest coast of South Africa, where scientists have found evidence for Agulhas “leakage” of warm, salty water from the Indian Ocean into the Atlantic Ocean.

The Agulhas Current is the strongest western boundary current in the world’s oceans. It flows along the eastern side of southern Africa, and when it reaches the tip of Africa, it is “retroflected” to flow east, parallel to the Antarctic Circumpolar Current. Changes in the Agulhas Current are coincident with climate change in Africa, and thus it may even have been an important factor in the evolution of our species in Africa. Lamont-Doherty Earth Observatory’s Arnold Gordon has made the case that leakage of salt and heat from the Agulhas Current into the Atlantic Ocean is one of the ingredients that enhances North Atlantic Deep Water overturning circulation.

On this cruise, we’re studying the current to collectively try to uncover the story of southern African climates and their connections with global ocean circulation and climate variability for the past 5 million years.

My role is to use the layers of sediment on the ocean floor that either blew in or washed in from land to contribute to understanding of rainfall and runoff, weathering on Africa, and changes in the Agulhas. (I’m working on this in collaboration with fellow sailing scientists Allison Franzese of Lamont, Margit Simon of the University of Bergen, and Ian Hall of Cardiff University, and shore-based scientist Steve Goldstein at Lamont). Questions about rainfall and runoff and weathering will be tackled in sediment cores that are near major rivers. These efforts will also serve to characterize the composition of sediments being carried in the Agulhas Current.

Fortuitously, the sources of sediments along the eastern coast of South Africa have significantly different radiogenic isotopes than those on the western side. Radiogenic isotopes are isotope systems that change due to radioactive decay of a parent isotope and thus respond to the time-aspects of geologic history. We discovered during the Ph.D. thesis of former graduate student Randye Rutberg that in RC11-83, a pretty famous sediment core from the southern Cape Basin, the land-derived sediments have a higher ratio of Strontium-87 to Strontium-86 during warmer climate intervals than during colder intervals, and the values are so high as to require an external source: sediment from the eastern side of Africa, carried into the Atlantic via the Agulhas leakage.

Franzese did her Ph.D. thesis on land-derived sediment evidence of changes in the Agulhas between glacial times, about 20,000 years ago, and modern times. She documented the map pattern of variability of both the sources and sediment changes, and further confirmed the role of the Agulhas Current in depositing sediments in the South Atlantic. It will be super exciting to extend the observations back to 5 million years and explore how the sources, as well as the Agulhas Current itself, may have changed.

Oil Seeps Bringing Up Nutrients for Phytoplankton - The Australian

Featured News - Mon, 01/25/2016 - 17:25
A new study by Ajit Subramaniam and Andy Juhl looks into the discovery of large phytoplankton populations over oil seeps in the Gulf of Mexico.

Analysis with paprica

Chasing Microbes in Antarctica - Mon, 01/25/2016 - 12:42

This tutorial is both a work in progress and a living document. If you see an error, or want something added, please let me know by leaving a comment.

I’ve been making a lot of improvements to paprica, our program for conducting metabolic inference on 16S rRNA gene sequence libraries. The following is a complete analysis example with paprica to clarify the steps described in the manual, and to highlight some of the recent improvements to the method. I’ll continue to update this tutorial as the method evolves. This tutorial assumes that you have all the dependencies for paprica_run.sh installed and in your PATH. If you’re a Mac user you can follow the instructions here. If you’re a Linux user (including Windows users running Linux in a VirtualBox) installation is a bit simpler; just follow the instructions in the manual. The current dependencies are:

  1.  Infernal (including Easel)
  2.  pplacer (including Guppy)
  3.  Python version 2.7 (I strongly recommend using Anaconda)
  4.  Seqmagick

It also assumes that you have the following Python modules installed:

  1.  Pandas
  2.  Joblib
  3.  Biopython

Finally, it assumes that you are using the provided database of metabolic pathways and genome data included in the ref_genome_database directory.  At some point in the future I’ll publish a tutorial describing how to use paprica_build.sh to build a custom database.  All the dependencies are installed and tested?  Before we start let’s get familiar with some terminology.

community structure: The taxonomic structure of a bacterial assemblage.

edge: An edge is a point of placement on a reference tree.  Think of it as a branch of the reference tree.  Edges take the place of OTUs in this workflow, and are ultimately far more interesting and informative than OTUs.  Refer to the pplacer documentation for more.

metabolic structure: The abundance of different metabolic pathways within a bacterial assemblage.

reference tree: This is the tree of representative 16S rRNA gene sequences from all completed Bacterial genomes in Genbank.  The topology of the tree defines what pathways are predicted for internal branch points.

Now let’s get paprica.  There are two options here.  Option 1 is to use git to download the development version.  The development version has the most recent bug fixes and features, but has not been fully tested.  That means that it’s passed a cursory test of paprica_build.sh and paprica_run.sh on my development system (a Linux workstation), but I haven’t yet validated paprica_run.sh on an independent machine (a Linux VirtualBox running on my Windows laptop).  The development version can be downloaded and made executable with:

git clone https://github.com/bowmanjeffs/paprica.git
cd paprica
chmod a+x *sh

Option 2 is to download the last stable version.  In this case stable means that paprica_build.sh and paprica_run.sh successfully built on the development system, paprica_run.sh successfully ran on a virtual box, and that paprica_build.sh and paprica_run.sh both successfully ran a second time from the home directory of the development machine.  The database produced by this run is the one that can be found in the latest stable version.  Downloading and making the latest stable version executable is the recommended way of getting paprica, but requires a couple of additional steps.  For the current stable release (v0.22):

wget https://github.com/bowmanjeffs/paprica/archive/paprica_v0.22.tar.gz
tar -xzvf paprica_v0.22.tar.gz
mv paprica-paprica_v0.22 paprica
cd paprica
chmod a+x *sh

Now using your text editor of choice (I recommend nano) you should open the file titled paprica_profile.txt.  This file will look something like:

## Variables necessary for the scripts associated with paprica_run.sh and paprica_build.sh.
# This is the location of the reference directory.
ref_dir=~/paprica/ref_genome_database/
# This is the location of the covariance model used by Infernal.
cm=~/paprica/bacterial_ssu.cm
## Variables associated with paprica_build.sh only.
# This is the number of cpus RAxML will use. See RAxML manual for guidance.
cpus=8
# This is the location where you want your pgdbs placed. This should match
# what you told pathway-tools, or set to pathway-tools default location if
# you didn't specify anything.
pgdb_dir=~/ptools-local/pgdbs/user/
## Variables associated with paprica_run.sh only.
# The fraction of terminal daughters that need to have a pathway for it
# to be included in an internal node.
cutoff=0.5

For the purposes of this tutorial, and a basic analysis with paprica, we are only concerned with the ref_dir, cm, and cutoff variables. The ref_dir variable is the location of the reference database. If you downloaded paprica to your home directory, and you only intend to use paprica_run.sh, you shouldn’t need to change it. Ditto for cm, which is the location of the covariance model used by Infernal. The cutoff variable specifies the fraction of genomes in a given clade in which a metabolic pathway must appear for it to be assigned to a read placed to that clade. In practice 50 % works well, but you may wish to be more or less conservative depending on your objectives. If you want to change it, simply edit that value.
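To make the cutoff rule concrete, here is a minimal Python sketch. It is not paprica's actual code: the function name pathways_for_clade, the representation of each genome as a set of pathway names, and the example pathways are all hypothetical, and treating a pathway that sits exactly at the cutoff as included is an assumption.

```python
from collections import Counter


def pathways_for_clade(daughter_pathways, cutoff=0.5):
    """Return pathways found in at least `cutoff` of the terminal daughters.

    daughter_pathways: list of sets, one set of pathway names per genome.
    Including pathways exactly at the cutoff (>=) is an assumption.
    """
    n = len(daughter_pathways)
    counts = Counter(p for pathways in daughter_pathways for p in pathways)
    # Compare counts against cutoff * n to avoid integer division issues.
    return {p for p, c in counts.items() if c >= cutoff * n}


# Hypothetical clade with four terminal daughters (genomes).
clade = [
    {"glycolysis", "TCA cycle"},
    {"glycolysis"},
    {"glycolysis", "nitrogen fixation"},
    {"TCA cycle"},
]

print(sorted(pathways_for_clade(clade, cutoff=0.5)))
print(sorted(pathways_for_clade(clade, cutoff=0.75)))
```

Raising the cutoff makes the inference more conservative: in this toy clade a pathway present in two of four genomes passes at 0.5 but not at 0.75.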

Now go ahead and make sure that things are working right by executing paprica_run.sh using the provided file test.fasta.  From the paprica directory:

./paprica_run.sh test

This will produce a variety of output files in the paprica directory:

ls test*
test.clean.align.sto
test.clean.fasta
test.combined_16S.tax.clean.align.csv
test.combined_16S.tax.clean.align.fasta
test.combined_16S.tax.clean.align.jplace
test.combined_16S.tax.clean.align.phyloxml
test.combined_16S.tax.clean.align.sto
test.edge_data.csv
test.fasta
test.pathways.csv
test.sample_data.txt
test.sum_pathways.csv

Each sample fasta file that you run will produce similar output, with the following being particularly useful to you:

test.combined_16S.tax.clean.align.jplace: This is a file produced by pplacer that contains the placement information for your sample. You can do all kinds of interesting things with sets of jplace files using Guppy. Refer to the Guppy documentation for more details.

test.combined_16S.tax.clean.align.phyloxml: This is a “fat” style tree showing the placements of your query on the reference tree. You can view this tree using Archaeopteryx.

test.edge_data.csv: This is a csv format file containing data on edge location in the reference tree that received a placement, such as the number of reads that placed, predicted 16S rRNA gene copies, number of reads placed normalized to 16S rRNA gene copies, GC content, etc.  This file describes the taxonomic structure of your sample.

test.pathways.csv: This is a csv file of all the metabolic pathways inferred for test.fasta, by placement. All possible metabolic pathways are listed, and the number attributed to each edge is given in the column for that edge.

test.sample_data.txt: This file describes some basic information for the sample, such as the database version that was used to make the metabolic inference, the confidence score, total reads used, etc.

test.sum_pathways.csv: This csv format file describes the metabolic structure of the sample, i.e. pathway abundance across all edges.
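The copy-number normalization mentioned for test.edge_data.csv can be illustrated with a small Python sketch. It assumes that normalization simply means dividing the reads placed on an edge by that edge's predicted 16S rRNA gene copies; the function name, edge names, and numbers are hypothetical and are not taken from paprica's output.

```python
def normalize_by_copy_number(reads_placed, gene_copies):
    """Reads placed on an edge divided by its predicted 16S rRNA gene copies."""
    return reads_placed / float(gene_copies)


# Hypothetical edges: (reads placed, predicted 16S rRNA gene copies).
edges = {"edge_a": (120, 6), "edge_b": (40, 1)}

normalized = {
    name: normalize_by_copy_number(reads, copies)
    for name, (reads, copies) in edges.items()
}

for name in sorted(normalized):
    print("%s %.1f" % (name, normalized[name]))
```

In this made-up example edge_a receives three times as many reads as edge_b, but after correcting for gene copies it is only half as abundant, which is why the normalized values are the better description of community structure.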

Okay, that was all well and good for the test.fasta file, which has placements only for a single edge and is not particularly exciting. Let’s try something a bit more advanced. Create a directory for some new analysis in your home directory and migrate the necessary paprica files to it:

cd ~
mkdir my_analysis
cp paprica/paprica_run.sh my_analysis
cp paprica/paprica_profile.txt my_analysis
cp paprica/paprica_place_it.py my_analysis
cp paprica/paprica_tally_pathways.py my_analysis

Now add some fasta files for this tutorial. The two fasta files are from Luria et al. 2014 and Bowman and Ducklow, 2015. They are a summer and winter surface sample from the same location along the West Antarctic Peninsula. I recommend wget for downloads of this sort; if you don’t have it, and don’t want to install it for some reason, use curl.

cd my_analysis
wget http://www.polarmicrobes.org/extras/summer.fasta
wget http://www.polarmicrobes.org/extras/winter.fasta

We’re cheating a bit here, because these samples have already been QC’d.  That means I’ve trimmed for quality and removed low quality reads, removed chimeras, and identified and removed mitochondria, chloroplasts, and anything that didn’t look like it belonged to the domain Bacteria.  I used Mothur for all of these tasks, but you may wish to use other tools.

Run time may be a concern for you if you have many query files to run, and/or they are particularly large. The rate limiting step in paprica is pplacer. We can speed pplacer up by telling paprica_place_it.py to split the query fasta into several pieces that pplacer will run in parallel. Be careful of memory usage! pplacer creates two threads automatically when it runs, and each thread uses about 4 Gb of memory. So if your system has only 2 cpus and 8 Gb of memory don’t use this option! If your system has 32 Gb of RAM I’d recommend 3 splits, so that you don’t max things out.
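The memory arithmetic above can be written out as a short Python sketch. The figures of two threads per pplacer run and about 4 Gb per thread come from the text; the function names and the 8 Gb of headroom left for everything else on the machine are assumptions, chosen so that 32 Gb of RAM gives the recommended 3 splits.

```python
def pplacer_memory_gb(splits, threads_per_run=2, gb_per_thread=4):
    """Rough memory needed when `splits` pplacer runs execute in parallel."""
    return splits * threads_per_run * gb_per_thread


def max_splits(ram_gb, headroom_gb=8, threads_per_run=2, gb_per_thread=4):
    """Largest number of splits that fits in RAM after reserving headroom.

    headroom_gb is an assumed safety margin, not a paprica setting.
    Never returns less than 1, since -splits 1 means no splitting at all.
    """
    per_run = threads_per_run * gb_per_thread
    return max(int((ram_gb - headroom_gb) // per_run), 1)


for ram in (8, 16, 32, 64):
    n = max_splits(ram)
    print("%d Gb RAM: %d split(s), roughly %d Gb used by pplacer"
          % (ram, n, pplacer_memory_gb(n)))
```

Treat the result as an upper bound and check actual memory use on your own system before committing to a large batch.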

While we’re modifying run parameters let’s make one additional change.  The two provided files have already been subsampled so that they have equal numbers of reads (1,977).  We can check this with:

grep -c '>' *fasta
summer.fasta:1977
winter.fasta:1977

But suppose this wasn’t the case?  It’s generally a good idea to subsample your reads to the size of the smallest library so that you are viewing diversity evenly across samples.  You can get paprica to do this for you by specifying the number of reads paprica_place_it.py should use.
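For readers who want to see what subsampling to the smallest library means in practice, here is a minimal Python sketch. It is an illustration only, not how paprica_place_it.py does it internally: the function names are hypothetical, and it uses plain Python rather than Biopython so that it stands alone.

```python
import random


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a fasta file."""
    records, header, seq = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(seq)))
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def subsample(records, n, seed=None):
    """Randomly pick n records without replacement (all if fewer than n)."""
    if n >= len(records):
        return list(records)
    return random.Random(seed).sample(records, n)


def subsample_to_smallest(libraries, seed=None):
    """Subsample every library to the size of the smallest one.

    libraries: dict mapping a sample name to its list of records.
    """
    smallest = min(len(records) for records in libraries.values())
    return {name: subsample(records, smallest, seed)
            for name, records in libraries.items()}
```

With the tutorial files this could be called as subsample_to_smallest({"summer": read_fasta("summer.fasta"), "winter": read_fasta("winter.fasta")}); because both files already hold 1,977 reads, nothing would be discarded.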

To specify the number of splits and the number of reads edit the paprica_place_it.py flags in paprica_run.sh:

## default line
#python paprica_place_it.py -query $query -ref combined_16S.tax -splits 1
## new line
python paprica_place_it.py -query $query -ref combined_16S.tax -splits 3 -n 1000

This will cause paprica to subsample the query file (by random selection) to 1000 reads, and split the subsampled file into three query files that will be run in parallel. The parallel run is invisible to you; the output should be identical to a run with no splits (-splits 1). If you use subsampling you’ll also need to change the paprica_tally_pathways.py line, as the input file name will be slightly different.

## default line
#python paprica_tally_pathways.py -i $query.combined_16S.tax.clean.align.csv -o $query
## new line
python paprica_tally_pathways.py -i $query.sub.combined_16S.tax.clean.align.csv -o $query

Here we are only analyzing two samples, so running them manually isn’t too much of a pain. But you might have tens or hundreds of samples, and need a way to automate that. We do this with a simple loop. I recommend generating a file with the prefixes of all your query files and using that in the loop. For example the file samples.txt might have:

summer
winter

This file can be inserted into a loop as:

while read -r f; do
    ./paprica_run.sh "$f"
done < samples.txt

Note that we don’t run them in parallel using, say, GNU parallel, because Infernal is intrinsically parallelized, and we already forced pplacer to run in parallel using -splits.

Once you’ve executed the loop you’ll see all the normal paprica output, for both samples.  It’s useful to concatenate some of this information for downstream analysis.   The provided utility combine_edge_results.py can do this for you.  Copy it to your working directory:

cp ~/paprica/utilities/combine_edge_results.py combine_edge_results.py

This script will automatically aggregate everything with the suffix .edge_data.csv.  You need to specify a prefix for the output files.

python combine_edge_results.py my_analysis

This produces two files:

my_analysis.edge_data.csv: These are the mean genome parameters for each sample. Lots of good stuff in here; see the column labels.

my_analysis.edge_tally.csv: Edge abundance for each sample (corrected for 16S rRNA gene copy).  This is your community structure, and is equivalent to an OTU table (but much better!).

To be continued…

Microbes Congregate above Natural Oil Seeps in the Gulf - UPI

Featured News - Mon, 01/25/2016 - 12:00
Scientists have discovered a new biological phenomenon in the Gulf of Mexico. Phytoplankton communities are thriving above natural oil seeps, according to a new study from Lamont's Ajit Subramaniam and Andy Juhl.

Why Are Hurricanes Forming in January? - The Conversation

Featured News - Thu, 01/21/2016 - 12:00
January hurricanes are extremely rare, but this year, two have already formed. Lamont's Adam Sobel takes a look at what's fueling the storms.

This Winter Storm Could Make It Into Coastal Flood Record Books - WXshift

Featured News - Wed, 01/20/2016 - 12:00
Coastal flooding is a major concern as a major winter storm heads for the East Coast this weekend. Lamont-Doherty's Adam Sobel discusses what goes into a storm surge and why the risk is high.

Mystery Beneath the Ice - PBS NOVA

Featured News - Wed, 01/20/2016 - 12:00
What’s behind the death of a tiny creature with an outsized role in the Antarctic? Lamont-Doherty's Hugh Ducklow and his team at Palmer Station take a PBS camera crew beneath the ice.

2015 Officially the Warmest Year on Record - Mashable

Featured News - Wed, 01/20/2016 - 12:00
NOAA and NASA confirm that 2015 was the warmest year on record. Lamont-Doherty's Jason Smerdon calls the record alarming but not surprising. "The trend has been predicted for decades, and all the consequences associated with it have been predicted, as well," he said.

Installing paprica on Mac OSX

Chasing Microbes in Antarctica - Wed, 01/20/2016 - 09:55

The following is a paprica installation tutorial for novice users on Mac OSX (installation on Linux is quite a bit simpler). If you’re comfortable editing your PATH and installing things using bash you probably don’t need to follow this tutorial; just follow the instructions in the manual. If command line operations stress you out, and you haven’t dealt with a lot of weird bioinformatics program installs, use this tutorial.

Please note that this tutorial is a work in progress.  If you notice errors, inconsistencies, or omissions please leave a comment and I’ll be sure to correct them.

paprica is 90 % an elaborate wrapper script (or set of scripts) for several core programs written by other groups. The scripts that execute the pipeline are bash scripts; the scripts that do the actual work are Python. Therefore you need to get Python up and running on your system. The version that came with your system won’t suffice without heavy modification. Best to use a free third-party distro like Anaconda (preferred) or Canopy. If you already have a mainstream v2.7 Python distro going just make sure that the biopython, joblib, and pandas modules are installed and you’re good to go.

If not please download the Anaconda distro and install it following the developer’s instructions. Allow the installer to modify your PATH variable. Once the installation is complete update it by executing:

conda update conda
conda update --all

Then you’ll need to install biopython, joblib, and pandas:

conda install biopython
conda install joblib
conda install pandas

In case you have conflicts with other Python installations, or some other mysterious problems, it’s a good idea to test things out at this point. Open a shell, type “python”, and you should get a welcome message that specifies Anaconda as your distro. Type:

import Bio
import joblib
import pandas

If you get any error messages something somewhere is wrong. Burn some incense and try again. If that doesn’t work try holy water.

One challenge with paprica on OSX has to do with the excellent program pplacer. The pplacer binary for Darwin needs the Gnu Scientific Library (GSL), specifically v1.6 (at the time of writing). You can try to compile this from source, but I’ve had trouble getting this to work on OSX. The easier option is to use a package manager. This means, however, that you have to marry one of the OSX package managers and never look back. Fink, Macports, and Homebrew will all get you a working version of GSL. I recommend using Homebrew.

To download Homebrew (assuming you don’t already have it) type:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Follow the on-screen instructions. Once it is downloaded type:

brew install GSL

This should install the Gnu Scientific Library v1.6.

Assuming all that went okay go ahead and download the software you need to execute just the paprica_run.sh portion of paprica. First, the excellent aligner Infernal. From your home directory:

curl -O http://selab.janelia.org/software/infernal/infernal-1.1.1-macosx-intel.tar.gz
tar -xzvf infernal-1.1.1-macosx-intel.tar.gz
mv infernal-1.1.1-macosx-intel infernal

Then pplacer, which also includes Guppy:

curl -LO https://github.com/matsen/pplacer/releases/download/v1.1.alpha17/pplacer-Darwin-v1.1.alpha17.zip
unzip pplacer-Darwin-v1.1.alpha17.zip
mv pplacer-Darwin-v1.1.alpha17 pplacer

Now comes the tricky bit, you need to add the locations of the executables for these programs to your PATH variable. Don’t screw this up. It isn’t hard to undo screw-ups, but it will freak you out. Before you continue please read the excellent summary of shell startup scripts as they pertain to OSX here:

http://hayne.net/MacDev/Notes/unixFAQ.html#shellStartup

Assuming that you are new to the command line, and did not have a .bash_profile or .profile file already, the Anaconda install would have created .profile and added its executables to your path. From your home directory type:

nano .profile

Navigate to the end of the file and type:

export PATH=/Users/your-user-name/infernal/binaries:/Users/your-user-name/pplacer:${PATH}

Don’t be the guy or gal who types your-user-name. Replace with your actual user name. Hit ctrl-o to write out the file, and ctrl-x to exit nano. Re-source .profile by typing:

source .profile

Confirm that you can execute the following programs by navigating to your home directory and executing each of the following commands:

cmalign
esl-alimerge
pplacer
guppy

You should get an error message that is clearly from the program, not a bash error like “command not found”.
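If you would rather check all four programs at once, the following Python sketch searches your PATH for each executable. It is a convenience check under stated assumptions, not part of paprica: the function names are hypothetical, and it only confirms that an executable file with that name exists on the PATH, not that the program runs correctly.

```python
import os


def find_on_path(program):
    """Return the full path of `program` if it is executable on PATH, else None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def check_programs(programs):
    """Map each program name to its location on PATH (None if missing)."""
    return {name: find_on_path(name) for name in programs}


if __name__ == "__main__":
    required = ["cmalign", "esl-alimerge", "pplacer", "guppy"]
    for name, path in sorted(check_programs(required).items()):
        print("%-14s %s" % (name, path if path else "NOT FOUND"))
```

Any program reported as NOT FOUND means the corresponding directory is missing from the export PATH line in .profile, or .profile has not been re-sourced.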

Now you need to install the final dependency, Seqmagick. Confirm the most current stable release by going to Github, then download it:

curl -LO https://github.com/fhcrc/seqmagick/archive/0.6.1.tar.gz
tar -xzvf 0.6.1.tar.gz
cd seqmagick-0.6.1
python setup.py install

Check the installation by typing:

seqmagick mogrify

You should get a sensible error that is clearly seqmagick yelling at you.

Okay, now you are ready to download paprica and do some analysis! Download the latest stable version of paprica (don’t just blindly download, please check Github for the latest stable release):

cd ~
curl -LO https://github.com/bowmanjeffs/paprica/archive/paprica_v0.23.tar.gz
tar -xzvf paprica_v0.23.tar.gz
mv paprica-paprica_v0.23 paprica

Now you need to make paprica_run.sh executable:

cd paprica
chmod a+x paprica_run.sh

At this point you should be ready to rock. Take a deep breath and type:

./paprica_run.sh test

You should see a lot of output flash by on the screen, and you should find the files test.pathways.csv, test.edge_data.csv, test.sample_data.txt, and test.sum_pathways.txt in your directory. These are the primary output files from paprica. The other files of interest are the Guppy output files test.combined_16S.tax.clean.align.phyloxml and test.combined_16S.tax.clean.align.jplace. Check out the Guppy documentation for the many things you can do with jplace files. The phyloxml file is an edge-fattened tree of the query placements on the reference tree. It can be viewed using Archaeopteryx or another phyloxml-capable tree viewer.

To run your own analysis, say on amazing_sample.fasta, simply type:

./paprica_run.sh amazing_sample

Please, please, please, read the manual (included in the paprica download) for further details, such as how to greatly decrease the run time on large fasta files, and how to sub-sample your input fasta. Remember that the fasta file you input should contain only reads you are reasonably sure come from bacteria (an archaeal version is a long term goal), and they should be properly QC’d (i.e. low quality ends and adapters and barcodes and such trimmed away).

Drones in a Cold Climate - Eos

Featured News - Tue, 01/19/2016 - 12:00
As climate change reshapes the Earth's polar regions, scientists turn to drone-mounted cameras to measure sea ice. Lamont-Doherty's Frank Nitsche and colleagues explain the challenges of flying drones near Antarctica. It's tougher than it looks.
