News aggregator

Exploring genome content and genomic character with paprica and R

Chasing Microbes in Antarctica - Thu, 07/07/2016 - 13:07

The paprica pipeline was designed to infer the genomic content and genomic characteristics of a set of 16S rRNA gene reads. To enable this, the paprica database organizes this information by phylogeny for many of the completed genomes in GenBank. In addition to metabolic inference, this provides an opportunity to explore how genome content and genomic characteristics are organized phylogenetically. The following is a brief analysis of some genomic features using the paprica database and R. If you aren’t familiar with the paprica database, this exercise will also familiarize you with some of its content and its organization.

The paprica pipeline and database can be obtained from GitHub here. In this post I’ll be using the database associated with version 0.3.1. The necessary files from the bacteria database (one could also conduct this analysis on the much smaller archaeal database) can be read into R as follows:

## Read in the pathways associated with the terminal nodes on the reference tree
path <- read.csv('paprica/ref_genome_database/bacteria/terminal_paths.csv', row.names = 1)
path[is.na(path)] <- 0

## Read in the data associated with all completed genomes in Genbank
data <- read.csv('paprica/ref_genome_database/bacteria/genome_data.final.csv', row.names = 1)

## During database creation genomes with duplicate 16S rRNA genes were removed,
## so limit to those that were retained
data <- data[row.names(data) %in% row.names(path),]

## "path" is ordered by clade, meaning it is in top to bottom order of the reference tree,
## however, "data" is not, so order it here
data <- data[order(data$clade),]

One fun thing to do at this point is to look at the distribution of metabolic pathways across the database.  To develop a sensible view it is best to cluster the pathways according to which genomes they are found in.

## The pathway file in the database is binary, so we use Jaccard for distance
library('vegan')
path.dist <- vegdist(t(path), method = 'jaccard') # distance between pathways (not samples!)
path.clust <- hclust(path.dist)
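To make the clustering step concrete, here is a minimal sketch on a toy presence/absence matrix. The genome and pathway names are hypothetical, and base R's dist(method = 'binary') is swapped in for vegdist so the example runs without extra packages; for 0/1 data it returns the Jaccard distance.

```r
## Toy presence/absence matrix: rows are genomes, columns are pathways
## (hypothetical names and values, not taken from the paprica database)
toy.path <- matrix(c(1, 1, 0, 0,
                     1, 1, 0, 1,
                     1, 0, 1, 0,
                     1, 0, 1, 0),
                   nrow = 4,
                   byrow = TRUE,
                   dimnames = list(paste0('genome', 1:4),
                                   paste0('pathway', LETTERS[1:4])))

## Transpose so that distances are calculated between pathways, not genomes.
## For binary data, method = 'binary' is 1 - (shared presences / total presences)
toy.dist <- dist(t(toy.path), method = 'binary')
toy.clust <- hclust(toy.dist)

## Inspect the pairwise distances and the column order used for plotting
print(as.matrix(toy.dist))
print(toy.clust$order)
```

In this toy case pathwayA (present everywhere) and pathwayB (present in two of the four genomes) share two presences out of four, giving a distance of 0.5, while pathwayB and pathwayC never co-occur and sit at the maximum distance of 1.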

The heatmap function is a bit cumbersome for this large matrix, so the visualization can be made using the image function.

## Set a binary color scheme
image.col <- colorRampPalette(c('white', 'blue'))(2)

## Image will order matrix in ascending order, which is not what we want here!
image(t(data.matrix(path))[rev(path.clust$order),length(row.names(path)):1],
      col = image.col,
      ylab = 'Genome',
      xlab = 'Pathway',
      xaxt = 'n',
      yaxt = 'n')
box()

The distribution of metabolic pathways across all 3,036 genomes in the v0.3.1 paprica database.

There are a couple of interesting things to note in this plot.  First, we can see the expected distribution of core pathways present in nearly all genomes, and the interesting clusters of pathways that are unique to a specific lineage.  For clarity row names have been omitted from the above plot, but from within R you can pull out the taxa or pathways that interest you easily enough.  Second, there are some genomes that have very few pathways.  There are a couple of possible reasons for this that can be explored with a little more effort.  One possibility is that these are poorly annotated genomes, or at least the annotation didn’t associate many or any coding sequences with either EC numbers or GO terms – the two pieces of information Pathway-Tools uses to predict pathways during database construction.  Another possibility is that these genomes belong to obligate symbionts (either parasites or beneficial symbionts).  Obligate symbionts often have highly streamlined genomes and few complete pathways.  We can compare the number of pathways in each genome to other genome characteristics for additional clues.

A reasonable assumption is that the number of pathways in each genome should scale with the size of the genome.  Large genomes with few predicted pathways might indicate places where the annotation isn’t compatible with the pathway prediction methods.

## Plot the number of pathways as a function of genome size
plot(rowSums(path) ~ data$genome_size,
     ylab = 'nPaths',
     xlab = 'Genome size')

## Plot P. ubique as a reference point
select <- grep('Pelagibacter ubique HTCC1062', data$organism_name)
points(rowSums(path)[select] ~ data$genome_size[select],
       pch = 19,
       col = 'red')

The number of metabolic pathways predicted as a function of genome size for the genomes in the paprica database.

That looks pretty good. For the most part more metabolic pathways were predicted for larger genomes; however, there are some exceptions. The red point gives the location of Pelagibacter ubique HTCC1062. This marine bacterium is optimized for life under energy-limited conditions. Among its adaptations is a highly efficient and streamlined genome. In fact it has the smallest genome of any known free-living bacterium. All the points to its left on the x-axis (i.e. with smaller genomes) are obligate symbionts; these are dependent on their host for some of their metabolic needs. There are a few larger genomes that have very few (or even no) pathways predicted. These are the genomes with bad, or at least incompatible, annotations (or particularly peculiar biochemistry).
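One way to pull out the large genomes with suspiciously few pathways is to fit pathway count against genome size and look for strongly negative residuals. The following is only a sketch on invented values; the column name npaths and the choice of a simple linear fit are assumptions made for illustration, not part of paprica.

```r
## Synthetic stand-in for the database tables (all values invented)
toy <- data.frame(genome_size = c(0.6e6, 1.3e6, 2.5e6, 4.0e6, 5.0e6, 6.0e6),
                  npaths = c(40, 120, 180, 250, 5, 320))

## Fit the number of pathways as a linear function of genome size
toy.lm <- lm(npaths ~ genome_size, data = toy)

## A strongly negative residual means far fewer pathways than the
## genome size would suggest
toy$resid <- residuals(toy.lm)
flagged <- toy[which.min(toy$resid),]
print(flagged)
```

With the real tables the same idea would apply to rowSums(path) and data$genome_size, after which the flagged rows can be checked by hand for annotation problems.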

The other genome parameters in paprica are the number of coding sequences identified (nCDS), the number of genetic elements (nge), the number of 16S rRNA gene copies (n16S), GC content (GC), and phi, a measure of genomic plasticity. We can make another plot to show the distribution of these parameters with respect to phylogeny.

## Grab only the data columns we want
data.select <- data[,c('n16S', 'nge', 'ncds', 'genome_size', 'phi', 'GC')]

## Make the units somewhat comparable on the same scale, a more
## careful approach would log-normalize some of the units first
data.select.norm <- decostand(data.select, method = 'standardize')
data.select.norm <- decostand(data.select.norm, method = 'range')

## Plot with a heatmap
heat.col <- colorRampPalette(c('blue', 'white', 'red'))(100)
heatmap(data.matrix(data.select.norm),
      margins = c(10, 20),
      col = heat.col,
      Rowv = NA,
      Colv = NA,
      scale = NULL,
      labRow = 'n',
      cexCol = 0.8)

Genomic parameters organized by phylogeny.
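The two-step normalization used for the heatmap (standardize each column, then rescale it to the 0 to 1 range) can be sketched in base R as follows. The values are invented and the helper range01 is hypothetical; the point is only to show what each step does to a column.

```r
## Toy table with two parameters on very different scales (values invented)
toy <- data.frame(GC = c(30, 45, 60, 70),
                  genome_size = c(1.3e6, 2.0e6, 4.5e6, 9.0e6))

## Hypothetical helper: rescale a vector to the range 0 to 1
range01 <- function(x) (x - min(x)) / (max(x) - min(x))

## Step 1: z-score each column (zero mean, unit variance)
toy.z <- as.data.frame(scale(toy))

## Step 2: rescale each standardized column to 0 to 1
toy.norm <- as.data.frame(lapply(toy.z, range01))

## Rescaling the raw columns directly gives the same result, because
## min-max scaling is unaffected by a prior shift and positive rescale
toy.direct <- as.data.frame(lapply(toy, range01))
print(all.equal(toy.norm, toy.direct))
```

Since the range step alone gives the same values, the standardize step mainly matters if a different second transform is used, for example the log-normalization suggested in the code comment.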

Squinting at this plot it looks like GC content and phi are potentially negatively correlated, which could be quite interesting.  These two parameters can be plotted to get a better view:

plot(data.select$phi ~ data.select$GC,
     xlab = 'GC',
     ylab = 'phi')

The phi parameter of genomic plasticity as a function of GC content.

Okay, not so much… but I think the pattern here is pretty interesting.  Above a GC content of 50 % there appears to be no relationship, but these parameters do seem correlated for low GC genomes.  This can be evaluated with linear models for genomes above and below 50 % GC.

gc.phi.above50 <- lm(data.select$phi[which(data.select$GC >= 50)] ~ data.select$GC[which(data.select$GC >= 50)])
gc.phi.below50 <- lm(data.select$phi[which(data.select$GC < 50)] ~ data.select$GC[which(data.select$GC < 50)])

summary(gc.phi.above50)
summary(gc.phi.below50)

plot(data.select$phi ~ data.select$GC,
     xlab = 'GC',
     ylab = 'phi',
     type = 'n')

points(data.select$phi[which(data.select$GC >= 50)] ~ data.select$GC[which(data.select$GC >= 50)],
       col = 'blue')

points(data.select$phi[which(data.select$GC < 50)] ~ data.select$GC[which(data.select$GC < 50)],
       col = 'red')

abline(gc.phi.above50,
       col = 'blue')

abline(gc.phi.below50,
       col = 'red')

legend('bottomleft',
       bg = 'white',
       legend = c('GC >= 50',
                  'GC < 50'),
       col = c('blue', 'red'),
       pch = 1)

Genomic plasticity (phi) as a function of GC content for all bacterial genomes in the paprica database.

As expected there is no correlation between genomic plasticity and GC content for the high GC genomes (R² = 0) and a highly significant correlation for the low GC genomes (albeit with weak predictive power; R² = 0.106, p ≈ 0). So what’s going on here? Low GC content is associated with particular microbial lineages but also with certain ecologies. The free-living low-energy specialist P. ubique HTCC1062 has a low GC content genome, for example, as do many obligate symbionts regardless of their taxonomy (I don’t recall if it is known why this is). Both groups are associated with a high degree of genomic modification, including genome streamlining and horizontal gene transfer.
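For anyone who wants the R² and p-values as numbers rather than reading them off the printed summaries, the two fits can be written more compactly with the data and subset arguments of lm. The sketch below runs on synthetic data (the relationship between phi and GC here is invented purely so the code has something to fit), and get.stats is a hypothetical helper.

```r
## Synthetic data: phi depends on GC only below 50 % (invented relationship)
set.seed(1)
toy <- data.frame(GC = runif(200, 25, 75))
toy$phi <- ifelse(toy$GC < 50, 2 - 0.02 * toy$GC, 1) + rnorm(200, sd = 0.2)

## Separate linear models for low and high GC genomes
fit.low <- lm(phi ~ GC, data = toy, subset = GC < 50)
fit.high <- lm(phi ~ GC, data = toy, subset = GC >= 50)

## Hypothetical helper: pull R-squared and the slope p-value from a fit
get.stats <- function(fit) {
  s <- summary(fit)
  c(r.squared = s$r.squared,
    p.value = s$coefficients['GC', 'Pr(>|t|)'])
}

fit.stats <- rbind(low = get.stats(fit.low),
                   high = get.stats(fit.high))
print(fit.stats)
```

Note that summary() prints p-values smaller than machine precision as "< 2.2e-16", so a p-value reported as 0 is best read as "too small to display" rather than exactly zero.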

As Glaciers Melt in Alaska, Landslides Follow - New York Times

Featured News - Tue, 07/05/2016 - 12:00
Cites work by Colin Stark and Göran Ekström.

What Triggered the Massive Glacier Bay Landslide? - CS Monitor

Featured News - Tue, 07/05/2016 - 12:00
Seismic recordings registered a massive landslide in Alaska's Glacier Bay National Park, and scientists are studying how the region's geology and environmental change are elevating the risk of mountain landslides. Cites work by Colin Stark and Göran Ekström.

Massive Landslide Crashes onto Glacier in Southeast Alaska - Alaska Dispatch

Featured News - Sat, 07/02/2016 - 12:00
More than 100 million tons of rock slid down a mountainside in Southeast Alaska on Tuesday morning, sending debris miles across a glacier below and a cloud of dust into the air. Lamont's Colin Stark and colleagues analyzed the landslide through its seismic waves.

Crippled Atlantic Conveyor Linked to Ice Age Climate Change - Science

Featured News - Thu, 06/30/2016 - 14:55
Slowdowns of the Atlantic meridional overturning circulation have long been suspected as a cause of the climate swings during the last ice age, but never definitively shown, until now. The new study “is the best demonstration that this indeed happened,” says Lamont's Jerry McManus.

Antarctic Sea Ice Affects Ocean Circulation - Europa Press

Featured News - Tue, 06/28/2016 - 12:00
A new study led by Lamont's Ryan Abernathey shows how sea ice migration around Antarctica may be more important for global ocean overturning circulation than previously thought. (In Spanish)

Predictions of More Blazing Heat, Drought and Fires in the West - Washington Post

Featured News - Thu, 06/23/2016 - 12:00
The burning sensation in the southwestern United States was diagnosed by climate scientists more than a year ago, the Washington Post writes. The Post cites research by Lamont-Doherty scientist Park Williams into connections between the California drought and climate change.

California Firefighters Wrangle With Dead Trees - KQED

Featured News - Wed, 06/22/2016 - 12:00
California's overworked firefighters are being forced to take on another task — clearing dead and dying trees. John Upton talks with Lamont's Park Williams about the role of drought and rising temperatures.

Greenland's Vast Melt and Its Influence on Atlantic Circulation - Washington Post

Featured News - Mon, 06/20/2016 - 12:00
High-resolution ocean models that can capture eddies are extremely important for understanding the fate of freshwater in the sea around Greenland, says Lamont's Marco Tedesco.

Water Vapor vs Carbon Dioxide: Which 'Wins' In Climate Warming? - Forbes

Featured News - Mon, 06/20/2016 - 11:33
The fact that water vapor is the dominant absorber in the Earth’s greenhouse effect can lead to a flawed narrative about the role of anthropogenic carbon dioxide (CO2) as driver of climate warming. Lamont's Adam Sobel helps explain.

The 6 cent speeding ticket

Chasing Microbes in Antarctica - Fri, 06/17/2016 - 11:51

I’m going to go way off the normal track here and do a bit of social commentary. I heard a radio piece on my drive home yesterday about the challenge of paying court and legal fees for low-income wage earners. This can trap those guilty of minor offenses (like a traffic infraction) in a cycle of jail and debt that is difficult to break out of. It never made sense to me that financial penalties – which by their nature are punitive – don’t scale with income, as is common in some European countries. I decided to try and visualize how the weight of a penalty for, say, a speeding ticket scales with income. It was tempting to try and scale up, i.e. what is the equivalent of $300 for someone earning $X? I decided, however, that it would be more informative to try and scale down.

Cost equivalent in minimum wage earner dollars of a $300 penalty for individuals at different income levels. Red text is the cost equivalent (y-axis value). X-axis (income) is on a log scale. See text for income data sources.

In this scenario the penalty is $300.  Someone earning the federal minimum wage and working 40 hours a week makes $15,080/year, so the $300 penalty is roughly 2 % of annual income.  So be it, perhaps that’s a fair penalty.  But what would be the equivalent if the same offender earned more?  A private/seaman/airman who has just joined the military earns roughly $18,561/year.  Paying the same ticket (and I know from experience that the military pays a lot of them) would equate to the minimum wage earner paying $243.72.  A graduate student fortunate enough to get a stipend (and own a car) might earn $25,000/year.  Paying the same ticket would be equivalent to the lowest wage earner paying $180.96, and down it goes along the income scale.  If LeBron James, who earned $77.2 million last year in salary and endorsements (according to Forbes), got the ticket, the penalty would be equivalent to the lowest income wage earner paying $0.06.  Salary data came from a variety of sources, including here and here.  Salaries marked with an asterisk in the plot above are medians from these sources.
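The scaling described here is just the penalty multiplied by the ratio of the minimum wage income to the offender's income. A minimal sketch, using only incomes quoted in the text; the $7.25 hourly rate is an assumption inferred from the $15,080/year figure at 40 hours a week for 52 weeks.

```r
## Penalty and the reference (minimum wage) annual income
penalty <- 300
min.wage.income <- 7.25 * 40 * 52   # 15080, assuming $7.25/hour

## Annual incomes quoted in the text
incomes <- c(minimum_wage = min.wage.income,
             grad_student = 25000,
             lebron_james = 77.2e6)

## Cost of the penalty expressed in minimum wage earner dollars
equivalent <- penalty * min.wage.income / incomes
print(round(equivalent, 2))
```

The graduate student and LeBron James entries reproduce the $180.96 and $0.06 figures given above.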

Sea Ice Retreat May Accelerate Greenland Melting - Science

Featured News - Fri, 06/17/2016 - 11:30
Last summer the northern parts of the Greenland Ice Sheet experienced record melting as summer temperatures rose as high as 66°F. Now, a group of scientists led by Lamont's Marco Tedesco has linked the melt pattern with a high-pressure vortex, known as a block, that loitered north of the island during June and July 2015, wreaking weather havoc. Some researchers say such atmospheric blocks are expected to result from melting sea ice.

The Weird Weather that Entrenched California's Drought - Climate Central

Featured News - Tue, 06/14/2016 - 12:28
Climate change has pushed up average temperatures by nearly 2°F worldwide. Most of California was warmer than that from March through May, with some patches of the state more than 4°F warmer than average. “This does not look like a typical El Niño year out West,” said Lamont's Ben Cook.

Globalized Economy More Susceptible to Weather Extremes, Scientists Warn - Reuters

Featured News - Fri, 06/10/2016 - 17:44
The globalization of the world's economy this century has made it far more vulnerable to the impacts of extreme weather, including heat stress on workers, according to a new study from Lamont's Anders Levermann.

The Carbon Vault

Geopoetry - Fri, 06/10/2016 - 14:41

Basaltic rock, Iceland. Photo: K. Allen, 2010

 

The skin of the Earth is the color of tar,

Ridged, freshly healed like the seams of a scar.

Through salt-spattered sky, a gray-winged gull sails;

Steam gently rises, the island exhales.

 

A power plant rests on porous basalt,

In spaces beneath, a dark final vault.

Carbon is cached with a strong crystal lock,

Ashes to ashes, rock back to rock.

 

______________________________________________________

Further reading:

In a First, Iceland Power Plant Turns Carbon Emissions to Stone, K. Krajick, Lamont-Doherty Earth Observatory

Rapid carbon mineralization for permanent disposal of anthropogenic carbon dioxide emissions, Matter et al., Science

Scientists Turn Carbon Dioxide Emissions into Stone, Magill, Climate Central

This is one in a series of posts by Katherine Allen, a researcher in geochemistry and paleoclimate at the School of Earth & Climate Sciences at the University of Maine.


Warmer Arctic, Melting Glaciers Accelerating Greenland Ice Loss - CBC

Featured News - Fri, 06/10/2016 - 12:00
2015 was a record year for high temperatures and melting glaciers in western Greenland, an effect that is amplifying itself and could lead to accelerated warming in the Arctic, new research from Lamont's Marco Tedesco explains.

A New Solution to Carbon Pollution? - Science

Featured News - Thu, 06/09/2016 - 18:03
Researchers working in Iceland, including Lamont's Martin Stute, say they have discovered a new way to trap the greenhouse gas carbon dioxide deep underground by changing it into rock.

Martin Stute: Putting CO2 Away for Good by Turning It to Stone - The Conversation

Featured News - Thu, 06/09/2016 - 17:00
Lamont's Martin Stute writes about the CarbFix project in Iceland, where he has been working with other scientists and engineers to capture CO2 emissions and create permanent storage by turning CO2 to stone.

Iceland Carbon Dioxide Storage Project Locks Away Gas, and Fast - New York Times

Featured News - Thu, 06/09/2016 - 16:20
Lamont scientists have come up with a way to store carbon dioxide that dissolves the gas with water and pumps the resulting mixture — soda water, essentially — down into certain kinds of rocks, where the CO2 reacts with the rock to form a mineral called calcite. By turning the gas into stone, scientists can lock it away permanently.
