Thanks, Station Obama: Scientists immortalize the former president in a way never seen before - Salon
I’m happy to report that a paper I wrote during my postdoc at the Lamont-Doherty Earth Observatory was published online today in the ISME Journal. The paper, Bacterial community segmentation facilitates the prediction of ecosystem function along the coast of the western Antarctic Peninsula, uses a novel technique to “segment” the microbial community present in many different samples into a few groups (“modes”) that have specific functional, ecological, and genomic attributes. The inspiration for this came when I stumbled across this blog entry on an approach used in marketing analytics. Imagine that a retailer has a large pool of customers that it would like to pester with ads tailored to purchasing habits. It’s too cumbersome to develop an individualized ad based on each customer’s habits, and it isn’t clear what combination of purchasing-habit parameters accurately describes meaningful customer groups. Machine learning techniques, in this case emergent self-organizing maps (ESOMs), can be used to sort the customers in a way that optimizes their similarity and limits the risk of overtraining the model (i.e., including parameters that don’t improve it).
In a 2D representation of an ESOM, the customers most like one another will be organized in geographically coherent regions of the map. Hierarchical or k-means clustering can be superimposed on the map to clarify the boundaries between these regions, which in this case might represent customers that will respond similarly to a targeted ad. But what’s really cool about this whole approach is that, unlike with NMDS or PCA or other multivariate techniques based on ordination, new customers can be efficiently classified into the existing groups. There’s no need to rebuild the model unless a new type of customer comes along, and it is easy to identify when this occurs.
Back to microbial ecology. Imagine that you have a lot of samples (in our case a five-year time series), and that you’ve described community structure for these samples with 16S rRNA gene amplicon sequencing. For each sample you have a table of OTUs, or in our case closest completed genomes and closest estimated genomes (CCGs and CEGs) determined with paprica. You know that variations in community structure have a big impact on an ecosystem function (e.g. respiration, or nitrogen fixation), but how do you test for that correlation? There are statistical methods in ecology that get at this, but they are often difficult to interpret. What if community structure could be represented as a simple value suitable for regression models?
Enter microbial community segmentation. Following the customer segmentation approach described above, the samples can be segmented into modes based on community structure with an Emergent Self Organizing Map and k-means clustering. Here’s what this looks like in practice:
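The clustering half of this idea can be sketched in a few lines. Note that this is a toy stand-in, not the pipeline from the paper: the paper trains an ESOM and then clusters the map nodes, while the sketch below runs plain k-means directly on a made-up abundance table, just to show how samples collapse into modes and how a new sample is classified without retraining.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy community-structure table: 12 samples x 6 taxa (relative abundances).
# Six samples dominated by taxon 0 and six by taxon 3 -- two synthetic "modes".
group_a = rng.dirichlet([10, 1, 1, 1, 1, 1], size=6)
group_b = rng.dirichlet([1, 1, 1, 10, 1, 1], size=6)
samples = np.vstack([group_a, group_b])

def kmeans(X, k, n_iter=50):
    """Plain k-means; assigns each sample (row of X) to one of k modes."""
    # Deterministic initialization: first and last rows as seed centers.
    centers = X[[0, len(X) - 1]] if k == 2 else X[:k].copy()
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        centers = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    return labels, centers

labels, centers = kmeans(samples, k=2)

# A new sample is classified by its nearest mode center -- no need to
# rebuild the model, which is the key advantage noted above.
new_sample = rng.dirichlet([10, 1, 1, 1, 1, 1])
new_mode = int(np.argmin(((new_sample - centers) ** 2).sum(axis=1)))
print(labels, new_mode)
```

In the real workflow the ESOM step comes first and the clustering is applied to the trained map rather than to the raw samples, but the assign-to-nearest-center logic for new data is the same.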
This segmentation reduces the data for each sample from many dimensions (the number of CCGs and CEGs present in each sample) to one. This remaining dimension is a categorical variable with real ecological meaning that can be used in linear models. For example, each mode has certain genomic characteristics:
In panel a above we see that samples belonging to modes 5 and 7 (dominated by the CEG Rhodobacteraceae and CCG Dokdonia MED134, see Fig. 2 above) have the greatest average number of 16S rRNA gene copies. Because this is a characteristic of fast growing, copiotrophic bacteria, we might also associate these modes with high levels of bacterial production.
Because the modes are categorical variables we can insert them right into linear models to predict ecosystem functions, such as bacterial production. Combined with bacterial abundance and a measure of high vs. low nucleic acid bacteria, mode accounted for 76% of the variance in bacterial production for our samples. That’s a strong correlation for environmental data. What this means in practice is that if you know the mode and have some flow cytometry data, you can make a pretty good estimate of carbon assimilation by the bacterial community.
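Using mode as a categorical predictor works the same way a factor does in R’s lm(): the category is one-hot encoded into dummy columns. Here is a minimal sketch with entirely synthetic data; the effect sizes, sample size, and variance explained below are invented for illustration and are not the values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60

# Hypothetical data: a mode label (0-3), bacterial abundance (cells/L),
# and a made-up "true" production that depends on both plus noise.
mode = rng.integers(0, 4, size=n)
abundance = rng.normal(1e9, 2e8, size=n)
mode_effect = np.array([0.5, 1.0, 2.0, 3.0])  # assumed per-mode baseline
production = mode_effect[mode] + 2e-9 * abundance + rng.normal(0, 0.1, n)

# One-hot encode the mode so it enters the linear model as a
# categorical predictor (the dummies also absorb the intercept).
X = np.column_stack([(mode == i).astype(float) for i in range(4)])
X = np.column_stack([X, abundance])

coef, *_ = np.linalg.lstsq(X, production, rcond=None)
pred = X @ coef
r2 = 1 - ((production - pred) ** 2).sum() / ((production - production.mean()) ** 2).sum()
print(f"R^2 = {r2:.2f}")
```

Swap in real mode assignments and flow cytometry counts and the same regression gives the kind of production estimate described above.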
For more on what you can do with modes (such as testing for community succession) check out the article! I’ll post a tutorial on how to segment microbial community structure data into modes using R in a separate post. It’s easier than you think…
I’m happy to announce the release of paprica v0.4.0. This release adds a number of new features to our pipeline for evaluating microbial community and metabolic structure. These include:
- NCBI taxonomy information for each point of placement on the reference tree, including internal nodes.
- Inclusion of the domain Eukarya. This was a bit tricky and requires some further explanation.
Eukaryotic genomes are a totally different beast than their archaeal and bacterial counterparts. First and foremost, they are massive. Because of this, there aren’t very many completed eukaryotic genomes out there, particularly for single-celled eukaryotes. While a single investigator can now sequence, assemble, and annotate a bacterial or archaeal genome in very little time, eukaryotic genomes still require major efforts by consortia and lots of $$.
One way to get around this scale problem is to focus on eukaryotic transcriptomes instead of genomes. Because much of the eukaryotic genome is noncoding, this greatly reduces sequencing volume. Since there is no such thing as a contiguous transcriptome, this approach also implies that no assembly (beyond open reading frames) will be attempted. The Moore Foundation-funded Marine Microbial Eukaryotic Transcriptome Sequencing Project (MMETSP) was an initial effort to use this approach to address the problem of unknown eukaryotic genetic diversity. The MMETSP sequenced transcriptomes from several hundred different strains. The taxonomic breadth of the strains sequenced is pretty good, even if (predictably) the taxonomic resolution is not. Thus, as for archaea, the phylogenetic tree and metabolic inferences should be treated with caution. For eukaryotes there are the additional caveats that 1) not all genes encoded in a genome will be represented in the transcriptome, 2) the database contains only strains from the marine environment, and 3) eukaryotic 18S trees are kind of messy. Considerable effort went into making a decent tree, but you’ve been warned.
Because the underlying data is in a different format, not all genome parameters are calculated for the eukaryotes. 18S gene copy number is not determined (and thus community and metabolic structure are not normalized); the phi parameter, GC content, etc. are also not calculated. However, eukaryotic community structure is evaluated and metabolic structure inferred in the same way as for the domains bacteria and archaea:

./paprica-run.sh test.eukarya eukarya
The R/V Langseth Is Helping Uncover Clues to Chile's Offshore Earthquakes - El Mercurio (in Spanish)
North Korea Nuclear Tests: 2010 ‘Explosion’ Was Just An Earthquake, Study Finds - International Business Times
When you rub your hands together to warm them, the friction creates heat. The same thing happens during earthquakes, only on a much larger scale: When a fault slips, the temperature can spike by hundreds of degrees, high enough to alter organic compounds in the rocks and leave a signature. A team of scientists at Columbia University’s Lamont-Doherty Earth Observatory has been developing methods to use those organic signatures to reconstruct past earthquakes and explore where those earthquakes started and stopped and how they moved through the fault zone. The information could eventually help scientists better understand what controls earthquakes.
Lamont geophysicist Heather Savage and geochemist Pratigya Polissar began developing the methods about eight years ago, building on techniques used by the oil industry. Their unique pairing of two fields – rock mechanics and organic geochemistry – made possible innovations that are changing how we look at earthquakes.
The process starts in the field, along a fault where scientists either chip off or drill samples from inside the fault zone. When sediments in a fault zone are heated by the friction of an earthquake, that short but powerful burst of heat alters the chemical composition of organic material inside the rock. (The same process over long periods of time creates oil and gas.) Scientists can examine the organic compounds in those samples and compare the ratio of stable molecules to unstable molecules to measure their thermal maturity and determine how hot each sample became.
“If even a tiny structure within a fault has had an earthquake, we can actually see the difference between how hot that piece of the fault got versus everything outside of it,” Savage said. “What we want to figure out is where the earthquakes in this big fault zone were actually happening. Do they all happen to one side? Are they distributed throughout? Are they all clustered on the weakest material within the fault zone?”
“What this does is give us a picture, almost like a heat map, of the fault itself, and the hottest places are where the earthquakes happened,” Savage said.
When temperatures are high enough, rock can melt, creating glass-like pseudotachylytes. Geologists have used these melted rock remnants for several years, but finding them is rare.
Savage, Polissar, and their team are looking closer, to the molecular level, where they can measure the thermal maturity of common organic compounds to determine how hot the sample became. They often test for methylphenanthrenes, organic molecules that are fairly common in faults within sedimentary rocks between 1 and 5 kilometers below ground. In deeper faults, some 10-14 kilometers down, the scientists can look for diamondoids, which are among the most thermally stable organic compounds.
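One commonly used ratio of this kind is the methylphenanthrene index (MPI-1, after Radke and Welte), which compares the thermally stable methyl isomers (2-MP, 3-MP) to the less stable ones (1-MP, 9-MP) and the parent phenanthrene (P). The sketch below uses made-up peak areas, not measurements from any of the faults discussed here, just to show how the ratio separates a heated slip surface from the surrounding rock.

```python
def mpi1(p, mp1, mp2, mp3, mp9):
    """Methylphenanthrene index: stable isomers over unstable ones.

    Higher values indicate greater thermal maturity, i.e. a hotter
    thermal history for the organic material in the sample.
    """
    return 1.5 * (mp2 + mp3) / (p + mp1 + mp9)

# Hypothetical peak areas for rock inside vs. outside a slip surface:
inside = mpi1(p=10.0, mp1=4.0, mp2=9.0, mp3=7.0, mp9=3.0)
outside = mpi1(p=10.0, mp1=6.0, mp2=4.0, mp3=3.0, mp9=5.0)
print(inside, outside)  # the hotter (more mature) sample scores higher
```

Diamondoid-based ratios for deeper faults follow the same stable-over-unstable logic, just with compounds that survive higher temperatures.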
To put their molecular data into context, the scientists also need to understand how rocks in the fault react to heat and pressure. In Lamont’s Rock and Ice Mechanics Lab, Savage’s team can test rock samples under a wide range of high pressures and temperatures. From their experiments, they can develop models that show how much shear stress and displacement are required to generate specific levels of heat in specific types of rock, and then how that heat will decay through diffusion.
Using these models, the scientists can then look at the geochemical analysis of their samples, determine the temperatures the compounds were exposed to in the past, and estimate the friction from the earthquake and how far the fault slipped.
For example, when the team tested samples from the Pasagshak Point megathrust on Alaska’s Kodiak Island, they measured the ratio of thermally stable diamondoids to thermally unstable alkanes and determined that the temperature during a past earthquake would have risen between 840°C and 1170°C above the normal temperature of the surrounding rock. From that temperature rise, they were able to estimate that the earthquake’s frictional energy would have been 105-227 megajoules per square meter, likely a magnitude 7 or 8 earthquake. Using their experimental friction measurements, they could then estimate that the fault must have slipped 1-8 meters.
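The arithmetic linking those three numbers can be sketched with a standard energy balance; the density, specific heat, slip-zone thickness, and shear stress below are assumed round values for illustration, not the parameters used in the Pasagshak Point study. Treating the heating as adiabatic in a thin slip zone gives E = rho * c * w * dT for the frictional energy per unit fault area, and the slip follows from d = E / tau.

```python
# Assumed (not measured) parameters:
rho = 2700.0  # rock density, kg/m^3
c = 1000.0    # specific heat capacity, J/(kg K)
w = 0.04      # slip-zone thickness, m
tau = 50e6    # average shear stress during slip, Pa

dT = 1000.0            # temperature rise, K (mid-range of 840-1170)
E = rho * c * w * dT   # frictional energy per unit fault area, J/m^2
d = E / tau            # slip needed to dissipate that energy, m
print(f"E = {E / 1e6:.0f} MJ/m^2, slip = {d:.1f} m")
```

With these assumed values the energy and slip land inside the 105-227 MJ/m^2 and 1-8 m ranges quoted above, which is the point: temperature rise plus laboratory friction data pin down energy and displacement.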
At the American Geophysical Union Fall Meeting today in San Francisco, Genevieve Coffey, a graduate student on Savage’s team at Lamont, presented early results from their highest-density testing yet, involving samples taken in transects along the Muddy Mountain thrust in Nevada. One surprise was that the places where one might expect to see high temperatures because of the local structures in the rock were not necessarily the locations where they found them, Coffey said. “Structural variability along a fault does not necessarily indicate that slip has occurred along that section,” she said.
Savage’s team is working on similar experiments at the San Andreas fault, and the Japan trench where the Tōhoku earthquake began, and they are working with colleagues on techniques to date the earthquakes.
“The important step for us is to determine how each of those compounds reacts to time and temperature,” Savage said. “That’s going to tell us about the physics of the earthquakes in that fault, which in the long run could lead to a better understanding of earthquake hazards.”
Learn more about Lamont-Doherty Earth Observatory, Columbia University’s home for Earth science research.
Off the coast of New Zealand, there is an area where earthquakes can happen in slow motion as two tectonic plates grind past one another. The Pacific plate is moving under New Zealand at about 5 centimeters per year there, pulling down the northern end of the island as it moves. Every 14 months or so, the interface slowly slips, releasing the stress, and the land comes back up.
Unlike typical earthquakes that rupture over seconds, these slow-slip events take more than a week, creating an ideal lab for studying fault behavior along the shallow portion of a subduction zone.
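The numbers above imply a simple slip budget: at roughly 5 cm/yr of convergence, each ~14-month slow-slip event has to release on the order of 6 cm of slip for the interface to keep pace with plate motion. A back-of-the-envelope check:

```python
# Slip-budget arithmetic for the figures quoted above.
convergence_cm_per_yr = 5.0  # Pacific plate motion under New Zealand
cycle_months = 14            # approximate slow-slip recurrence interval

# Slip each event must release to keep up with plate convergence:
slip_per_cycle_cm = convergence_cm_per_yr * cycle_months / 12
print(f"~{slip_per_cycle_cm:.1f} cm per event")
```

How much of that budget each patch of the interface actually releases, versus stores as locked strain, is exactly what the seafloor measurements described below are designed to resolve.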
In 2015, Spahr Webb, the Jerome M. Paros Lamont Research Professor of Observational Physics at Lamont-Doherty Earth Observatory, and an international team of colleagues became the first to capture these slow-slip earthquakes in progress using instruments deployed under the sea. The data they collected from the New Zealand site, published this year by lead author Laura Wallace of the University of Texas, will help scientists better understand earthquake risks, particularly at trenches, the seismically active interfaces between tectonic plates where one plate dives under another. Members of the team are discussing their work this week at the American Geophysical Union (AGU) Fall Meeting.
“We don’t yet understand the stickiness of the interface between the two plates, and that is partly what determines how big an earthquake you can have,” Webb said. “In particular, we care about the stickiness near the trench, because when you have a lot of motion near a trench, you can generate big tsunamis.”
Previously, scientists thought that the soft sediments piled up near trenches were usually not strong enough to support an earthquake and that they would dampen the slip, Webb said. “We’ve recently seen a lot of big tsunamis where there has been large slip right close to the trench,” he said.
One reason the 2011 Tōhoku earthquake in Japan was so devastating was that part of the interface very close to the trench moved a large distance, around 50 meters, pushing the water with it, Webb said. While the main part of the Tōhoku earthquake involved uplift of only a few meters, the part near the trench doubled the size of the tsunami, leading to waves almost 40 meters high at some points along the coast.
To be able to anticipate tsunami-producing earthquakes and more accurately assess regional risks, scientists are studying why some areas of trenches have these slow-slip events, why others continuously creep, and why still others lock up and build strain that eventually erupts as a tsunami-generating earthquake.
The Alaska Risk
Webb has his sights next on the Aleutian Trench, just off Kodiak Island, Alaska. It is one of the most seismically active parts of the world. A large tsunami-generating earthquake there could wreak havoc not only in Alaska but along the west coast of North America and as far as Hawaii and Japan, as the Good Friday earthquake did in 1964.
Lamont scientists, including Donna Shillington and Geoffrey Abers, who are also presenting their work this week at AGU, have spent years studying the structure of the Aleutian Trench and what happens as the Pacific plate dives beneath the North American plate. Webb and a large group of collaborators now want to find out where sections of the trench are sliding and where sections are locking to help understand what determines where it locks. Finding slow-slip earthquakes could help reveal some of those secrets.
To study the New Zealand slow-slip event, Webb and his colleagues installed an array of 24 absolute pressure gauges and 15 ocean-bottom seismometers directly above the Hikurangi Trough, where two plates converge. Absolute pressure gauges deployed on the seafloor continuously record changes in the pressure of the water above. If the seafloor rises, pressure decreases; if the seafloor moves downward, pressure increases due to the increasing water depth. When the slow-slip event began, the instruments recorded how the seafloor moved.
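The conversion from a pressure-gauge reading to vertical seafloor motion is just the hydrostatic relation dp = rho * g * dh; the seawater density and gravity values below are standard assumed constants, and the 200 Pa example is illustrative rather than a figure from the deployment.

```python
# Assumed physical constants:
rho_sw = 1025.0  # seawater density, kg/m^3
g = 9.81         # gravitational acceleration, m/s^2

def uplift_from_pressure_drop(dp_pa):
    """Seafloor uplift (m) implied by a drop of dp_pa in bottom pressure."""
    return dp_pa / (rho_sw * g)

# A ~200 Pa (2 mbar) pressure decrease corresponds to roughly
# 2 cm of seafloor uplift above the slipping interface:
print(f"{uplift_from_pressure_drop(200.0) * 100:.1f} cm")
```

The centimeter-scale sensitivity of this relation is what lets absolute pressure gauges track slow-slip deformation that GPS on land can only see at its edges.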
The scientists found that parts of the Hikurangi interface slipped and others didn’t during the slow-slip event. “It may be that much of the interface slips in these events but you have a few places that are locked, and those finally break and create earthquakes and tsunamis that cause damage,” Webb said.
Most of the instruments used in the New Zealand study were built at Lamont in the OBS (ocean-bottom seismometer) lab started by Webb.
In Alaska, Webb and his collaborators have proposed an experiment that would again use a large number of Lamont-built ocean-bottom seismometers and pressure gauges, this time to collect data near Kodiak Island. Alaska is a special challenge for seafloor measurements. The ocean is quite shallow south of Alaska before deepening near the Aleutian Trench, and seismic instruments on the seafloor can be moved by strong currents or damaged by bottom trawling. Webb and the team in the OBS lab at Lamont developed a solution: they built heavy metal shields that sink to the sea floor with the seismometers to protect them.
Once data from the instruments are collected, they will be made publicly available so seismologists across the country can begin to analyze the records in search of clues to the area’s earthquake behavior.
By detecting patterns of earthquakes, scientists can help regional engineers plan construction to better withstand worst-case earthquake scenarios, but predicting earthquakes remains elusive.
“If we start seeing precursors based on the off-shore data, then maybe we’ll also get some predictive ability,” Webb said. “The hope is if you have better off-shore measurements, you’ll start to understand things better, and maybe there is some sign of motion happening before the earthquake that will provide some warning.”
Learn more about the work underway at Lamont-Doherty Earth Observatory, Columbia University’s home for Earth science research.