Chasing Microbes in Antarctica

At the base of the polar food chain in the icy waters off Antarctica, phytoplankton are an essential food source for young krill, which in turn sustain many species of marine wildlife. Jeff Bowman is in Antarctica for the field season studying how phytoplankton and bacteria interact, with a particular focus on cooperative interactions. Toxic compounds produced by phytoplankton, for example, may be cleaned up by bacterial partners, allowing photosynthesis to proceed more efficiently and ultimately putting more food into the food web.


Posted By: Jeff on May 31, 2022

RAxML is one of the most popular programs around for phylogenetic inference via maximum likelihood. Similarly, hmmalign within HMMER 3 is a popular way to align amino acid sequences against HMMs from Pfam or created de novo. Combine the two and you have an excellent method for constructing phylogenetic trees. But gluing the two together isn’t exactly seamless and novice users might be deterred by a couple of unexpected hurdles. Recently, I helped a student develop a workflow which I’m posting here.

First, define some variables just to make the bash commands a bit cleaner. REF refers to the name of the Pfam hmm that we’re aligning against (Bac_rhodopsin.hmm in this case), while QUERY is the sequence file to be aligned (hop and bop gene products, plus a dinoflagellate rhodopsin as outgroup).
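For example (using the file names referenced later in this post; adjust for your own data):

```shell
# Pfam HMM to align against, minus the .hmm extension (Bac_rhodopsin.hmm)
REF=Bac_rhodopsin
# Query sequence file, minus the .fasta extension
QUERY=uniprot_hop_bop_reviewed
```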


Now, align and convert the alignment to fasta format (required by RAxML-ng).

hmmalign --amino -o $QUERY.sto $REF.hmm $QUERY.fasta
seqmagick convert $QUERY.sto $QUERY.align.fasta

Test which model is best for these data. Here we get LG+G4+F.

modeltest-ng -i $QUERY.align.fasta -d aa -p 8

Check your alignment!

raxml-ng --check --msa $QUERY.align.fasta --model LG+G4+F --prefix $QUERY

Oooh… I bet it failed. Exciting! In this case (using sequences from Uniprot) the long sequence descriptions are incompatible with RAxML-ng. Let’s do a little Python to clean that up.

from Bio import SeqIO

with open('uniprot_hop_bop_reviewed.align.clean.fasta', 'w') as clean_fasta:
    for record in SeqIO.parse('uniprot_hop_bop_reviewed.align.fasta', 'fasta'):
        record.description = ''  # strip the long Uniprot description, keep the id
        SeqIO.write(record, clean_fasta, 'fasta')

Check again…

raxml-ng --check --msa $QUERY.align.clean.fasta --model LG+G4+F --prefix $QUERY

If everything is kosher go ahead and fire up your phylogenetic inference. Here I’ve limited bootstrapping to 100 trees. If you have the time/resources do more.

raxml-ng --all --msa $QUERY.align.clean.fasta --model LG+G4+F --prefix $QUERY --bs-trees 100

Superimpose the bootstrap support values on the best ML tree.

raxml-ng --support --tree $QUERY.raxml.bestTree --bs-trees $QUERY.raxml.bootstraps

And here’s our creation as rendered by Archaeopteryx. Some day I’ll create a tree that is visually appealing, but today is not that day. But you get the point.

Posted By: Jeff on February 12, 2022

Congratulations to Avishek Dutta for his paper “Machine Learning Predicts Biogeochemistry from Microbial Community Structure in a Complex Model System” that was recently published in the journal Microbiology Spectrum. I’m really excited about this paper; the study it is based on inspired this perspective that I wrote for an mSystems early career special issue last year.

Summary of experimental design and analysis, from Dutta et al., 2022.

The figure above summarizes the experimental design and analysis. The experiment was designed to address the question of whether the microbial community contains sufficient information to predict a biogeochemical state in a dynamic system. The structure of a microbial community is highly sensitive to environmental change. Small changes in the chemical or physical environment will result in a shift in abundance of one or more taxa as mortality and growth rates respond. These shifts in structure are easily observed by amplicon sequencing of taxonomic marker genes. These relative abundance data can be combined with flow cytometry analysis of microbial abundance to yield absolute abundance data.
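As a minimal sketch of that last step (with made-up numbers, not data from the paper), the relative abundances from amplicon sequencing are simply scaled by the total cell count from flow cytometry:

```python
import numpy as np

# Hypothetical values for illustration only
relative_abundance = np.array([0.50, 0.30, 0.20])  # fraction of reads per taxon
total_cells_per_ml = 1.2e6                         # cells/mL from flow cytometry

# Absolute abundance of each taxon (cells/mL)
absolute_abundance = relative_abundance * total_cells_per_ml
print(absolute_abundance)
```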

The trick of course is relating an observed shift in community structure to a specific biogeochemical state. Machine learning provides a number of ways to do this, but all require large training datasets. Fortunately, gene sequencing is pretty cheap these days and DNA extractions are much more high-throughput than they were just a few years ago. Because of this it’s possible to generate community structure data for hundreds of samples in relatively short order. In this study Avishek used over 700 samples from sediment bioreactors and the random forest algorithm to predict the concentration of hydrogen sulfide with a reasonably high degree of accuracy.

Like any statistical model, a machine learning model requires careful attention to detail during development. Careful segregation of the data into training and validation sets, along with thoughtful engineering of the features used for prediction, yields the most honest models and the best performance on future predictions. Avishek’s paper is an excellent template for developing a predictive machine learning model from microbial community structure data.
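To illustrate the general approach, here’s a toy sketch using synthetic data and scikit-learn’s random forest (this is not Avishek’s code or data; the feature and target structure are entirely made up):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic "community structure": 700 samples x 50 taxa relative abundances
X = rng.dirichlet(np.ones(50), size=700)
# Synthetic target: sulfide driven by a few "sulfate reducer" taxa plus noise
y = 100 * X[:, :5].sum(axis=1) + rng.normal(0, 0.5, size=700)

# Hold out a validation set that the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(round(model.score(X_test, y_test), 2))  # R^2 on held-out data
```

The held-out test set is the key design choice: the R² reported above reflects performance on samples the model never saw, which is the honest measure for future predictions.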

Posted By: Jeff on January 15, 2022

We’re on the hunt for a lab manager/senior lab technician to take on a variety of key tasks in the Bowman Lab. The position is being advertised at the Staff Research Associate II level and the ideal applicant will have an MS in a relevant field, or a BS and equivalent experience. We are looking for someone with complementary skills to the rest of the lab; the ideal applicant would have a background in environmental or analytical chemistry to complement our core expertise in microbiology. However, a background in the life sciences also works fine. The formal job posting is pasted below (note that it deviates slightly from what’s described here due to limitations of the UC San Diego HR system).


Under supervision, independently perform a variety of standard laboratory and data analysis procedures (and some non-standard procedures) related to the function of coastal ocean environments. Coordinates and conducts instrument calibrations and data collection for long-term time-series of microbial community structure, microbial abundance, and dissolved gases. Responsible for the operation and maintenance of a membrane inlet mass spectrometer, flow cytometer, and in situ imaging flow cytometer (IFCB), DNA extraction, data entry, and light programming in Python and R. Travel to field stations as needed, which may involve driving University vehicles and operating small boats for diving and coastal field work. Scuba dive to clean and service underwater instrumentation. Coordinate and communicate with lab members about supplies, data and sampling techniques. Oversee and work-direct undergraduate research assistants. Process, analyze, and interpret results from data sets, evaluate quality of data, generate and update design and method documentation, and update web pages. Perform general office duties including but not limited to filing, photocopying, faxing and library searches for research articles. Manage laboratory space, computers, and equipment.

  • Must be able to lift 50 lbs.


  • B.S. in Chemistry, Marine Science, Oceanography, or equivalent combination of education and experience with a strong background in data analysis and computer operations.
  • Demonstrated experience with diving and ability to acquire or maintain AAUS and SIO scientific diving certification.
  • Demonstrated knowledge of mathematics, scientific, and programming principles.
  • Demonstrated experience with R, Matlab, or Python programming languages for data analysis and visualization.
  • Demonstrated laboratory experience. Demonstrated knowledge and experience with laboratory techniques and instrumentation, specifically flow cytometry and DNA extractions. Demonstrated experience with laboratory safety procedures and calibration techniques.
  • Proven ability to work effectively on multiple tasks in parallel, with each requiring a different focus and level of detail and attention. Proven ability to prioritize tasks and solve problems.
  • Demonstrated data entry and data analysis experience. Demonstrated experience with spreadsheets and/or databases for data entry, archival and basic data analysis using standard software (e.g., MS Excel, MS Access, Matlab, or other statistical software packages).
  • Experience communicating and interacting with a variety of people from the public to governmental agencies, students and volunteers. Ability to effectively communicate instructions and interact using tact and diplomacy with diverse personalities including academic, staff, student and volunteer employees and institutions/organizations.
  • Proven ability and experience using PCs, email, internet, general office tools and software.
  • Tolerance of repetitive tasks such as data entry and checking, or extended periods in laboratory filtering samples or analyzing seawater samples via flow cytometry.
  • Demonstrated ability to find and follow written and oral procedures from standard laboratory resources.
  • Must be organized and a self-motivator with the ability to work efficiently while unsupervised.
  • Proven ability to document significant results of data analysis in technical notes. Good writing skills. Ability to integrate data products and methodologies from laboratory and field instrumentation into research results for publication purposes.
  • Proven ability to communicate with technical and scientific personnel. Ability to instruct and aid research associates and students on the use of software packages and data procedures/protocols.
  • Ability to travel for days to weeks for field work and work extended hours as needed.
  • Ability to drive University vehicles to field stations. Valid driver’s license.
  • Proven ability to work with others under demanding conditions, sometimes for extended periods of time.


  • Ability to work at sea. Must have demonstrated experience with SCUBA diving and ability to acquire and maintain AAUS and SIO scientific diving certification.
  • Must have valid driver’s license and ability to drive University vehicles to field stations.
  • Ability to travel for days to weeks for field work and work extended hours as needed.
  • This position is subject to a DMV check for driving record.
  • Fluency in Spanish is preferred.

Posted By: Jeff on January 01, 2022

Congratulations to Luke Piszkin (now a PhD student in the Biophysics Department at the University of Notre Dame) for the first paper in the lab to be first-authored by an undergraduate! Luke’s paper is titled Extremophile enzyme optimization for low temperature and high salinity are fundamentally incompatible and appears in the journal Extremophiles. In the paper Luke explores the molecular basis underlying the intriguing observation that there appear to be very few (no?) extreme halophiles that are also extreme psychrophiles, despite the fact that there are many environments on Earth that are both cold and salty.

Deep Lake, Antarctica: cold and salty, but dominated by archaea with a surprisingly high optimal growth temperature. Image credit: Ricardo Cavicchioli.

One of these environments is Deep Lake, Antarctica, which supports a microbial community dominated by the mesophilic archaeon Halorubrum lacusprofundi (optimal growth temperature of 36 °C). That’s rather surprising given that your typical true psychrophile conks out at about 18 °C. Like all haloarchaea, what H. lacusprofundi can do is tolerate high levels of salt, up to 4.5 M NaCl (262 g/L). That level of salt tolerance is not seen among the documented true psychrophiles. Why not?

In the manuscript we posit that it comes down to the different amino acid substitutions needed to adapt a protein to high salt or low temperature conditions. High salt proteins typically have low isoelectric points, derived from more acidic amino acids. The practical implication of this is that they have a more negatively charged surface that requires a high concentration of salt for stability. This is a requirement for the “salt-in” strategists that dominate the most saline environments (such as salt crystallizer ponds). These microbes are primarily archaea but include a few bacteria, and deal with the high salinity of their environment by accumulating high intracellular concentrations of the salt KCl. This maintains their osmotic balance while excluding more harmful salts, but requires proteins that are compatible with high concentrations of KCl. By contrast most halotolerant bacteria (including psychrophiles that inhabit moderate salinity environments) are “salt-out” strategists that accumulate organic solutes to maintain osmotic balance. These solutes impose no particular requirements on intracellular proteins.

The trick is that amino acid substitutions that lead to a lower isoelectric point also decrease the flexibility of the protein. Increased flexibility is the key protein adaptation to low temperature. Thus the fundamental incompatibility between optimization to low temperature and high salinity. To test this idea Luke dusted off a model, the Protein Evolution Parameter Calculator (PEPC), that I developed many years ago in the waning days of my PhD. After updating the code from Python 2 to Python 3 and making some other improvements, Luke devised an experiment to “evolve” core haloarchaea orthologous group (tucHOG) proteins from H. lacusprofundi and the related mesophile Halorubrum salinarum. By telling the model to select for increased flexibility or decreased isoelectric point he could identify how improvements in one parameter impacted the other. As expected, likely amino acid substitutions (based on position in the protein and the BLOSUM80 substitution matrix) that increased flexibility also strongly favored an increased isoelectric point.
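To get a feel for why acidic substitutions lower the isoelectric point, here’s a toy pI calculation using textbook pKa values and bisection. This is a simple sketch for intuition only, not PEPC or the method used in the paper, and the peptides are invented examples:

```python
# Approximate side-chain pKa values (textbook values; exact numbers vary)
PKA_POS = {'K': 10.5, 'R': 12.5, 'H': 6.0}            # basic residues
PKA_NEG = {'D': 3.9, 'E': 4.1, 'C': 8.3, 'Y': 10.1}   # acidic residues
PKA_NTERM, PKA_CTERM = 9.0, 2.0

def net_charge(seq, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch model)."""
    charge = 1 / (1 + 10 ** (ph - PKA_NTERM))          # N-terminus
    charge -= 1 / (1 + 10 ** (PKA_CTERM - ph))         # C-terminus
    for aa in seq:
        if aa in PKA_POS:
            charge += 1 / (1 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            charge -= 1 / (1 + 10 ** (PKA_NEG[aa] - ph))
    return charge

def isoelectric_point(seq, lo=0.0, hi=14.0):
    """Bisect to find the pH at which net charge crosses zero."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Swapping basic lysines (K) for acidic glutamates (E) lowers the pI:
print(isoelectric_point("MKTAYKLVKG"))  # lysine-rich peptide
print(isoelectric_point("METAYELVEG"))  # acidic substitutions
```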

From Piszkin and Bowman, 2022. The directed evolution of tucHOG proteins from H. lacusprofundi and H. salinarum. The proteins were forced to evolve toward increasing flexibility while monitoring the resulting change in isoelectric point.

Posted By: Beth Connors on October 11, 2021

This summer I had the pleasure of teaching high school students as part of the Sally Ride Science Junior Academy. My class was called Polar Microbes, and we discussed adaptations to environments unique to the poles and the importance of microbes to the food webs of the Arctic and Antarctic. One of the things I most wanted to show students was how a simple ecological model could be changed to better fit the polar environment and explicitly include microorganisms. I was so impressed by how quickly my students were able to understand and change the code underlying the model we used. I wanted to write a quick tutorial to extend that learning to anyone who is intimidated by ecological modeling and wants an easy place to start.

It is valuable to start out with a basic definition: a model is a simple representation of a complex phenomenon. Models are useful because they explicitly describe important mechanisms, which can then be tested against observations. This testing will ultimately demonstrate whether your concept of a natural phenomenon is valid or needs to be refined. With very little modeling experience myself, I started with an existing model from the excellent textbook “A Practical Guide to Ecological Modeling” by Karline Soetaert and Peter Herman, published by Springer. If you use R as a coding language, it is a great book for getting started with modeling, as the authors pair many conceptual explanations with highly understandable code. All the examples from the book are in the R package ecolMod:
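Installing and loading the package looks something like the following (the demo name here is my guess at how the chapter 2 example is exposed; check the package documentation for the exact demo names):

```r
# Install and load the companion package for Soetaert & Herman
install.packages("ecolMod")
library(ecolMod)

# Browse the bundled examples, e.g. the chapter 2 demo
demo(chap2)
```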




Once you have the package loaded, you can click through the examples to see how to build a simple ecological model, in which a forcing function causes flow between state variables. It is easier to understand with the visual below (Fig 2.1 of Soetaert and Herman).

In oceanography, a common real-world application of this type of conceptual model is the NPZD model, which stands for Nutrient, Phytoplankton, Zooplankton, and Detritus. It is important for us to understand the flow of carbon and nitrogen (among other elements!) through both the macroscopic (zooplankton) and microscopic (detritus that is remineralized by bacteria) food web, and this is one of the simplest ways to model it mathematically.

Along with figures, the authors are kind enough to include the code for the model. In their code, each of the state variables of NPZ or D (the boxes) are mathematically equal to the flows in minus the flows out. Based on the figure above for instance, PHYTO = f1 – f2. In turn, each of the flows are their own mathematical equations with parameters (constants that are experimentally determined). The equation provided for f1 for instance is:

Nuptake <- maxUptake * PAR/(PAR+ksPAR) * din/(din+ksDIN)

This is because Nuptake depends on solar radiation (PAR) and the amount of nutrients available (din), as well as on the parameters maxUptake, ksPAR, and ksDIN, which are set to 1/day, 140 µEinst/m2/s, and 0.5 mmolN/m3, respectively, when we define our parameters later in the model. I encourage you to download the model code and follow how each of the state variable definitions, flows, and parameters are connected. Even in a model as simple as this it gets complicated!

Even more exciting are the model solutions, which show a sensible story over two years. As you know from above, the forcing function for the model is PAR (solar radiation), which varies over the season (the sine wave in panel A of the following figure). As PAR increases in the spring, there is a modeled increase in Chlorophyll and Zooplankton (what oceanographers call a “spring bloom”!) and a decrease in DIN.

As I was teaching a class called Polar Microbes, I wanted to change some parts of the model to better reflect a polar environment. Since the model’s forcing function is the seasonal light cycle, I knew it was the first thing that needed to change. The tilt of Earth’s rotation axis ensures that the poles have a much more extreme seasonal light cycle, with time spent in both full darkness and full light.

When you change the model to reflect this planetary fact (just change the PAR function to have a steeper slope and a period of darkness), the output variables change drastically (the Polar Model is in blue below):
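A sketch of what such a forcing change might look like (in Python rather than R for brevity; the functional form and constants are illustrative, not the book’s exact code):

```python
import numpy as np

def par_temperate(t):
    """Smooth seasonal sine-wave forcing, t in days (illustrative values)."""
    return 0.5 * (540 + 440 * np.sin(2 * np.pi * t / 365 - 1.4))

def par_polar(t, amplitude=1.8):
    """Steeper seasonal cycle, clipped at zero to create a polar night."""
    par = 0.5 * (540 + amplitude * 440 * np.sin(2 * np.pi * t / 365 - 1.4))
    return np.maximum(par, 0)

t = np.arange(0, 730)  # two years, daily resolution
print(par_polar(t).min())  # zero during the dark period
```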

Our class had long discussions about this model output. Is it sensible? What can you infer about the polar regions from this? How could it be improved? In our class, we ended up even adding another state variable, Bacteria, and altering the flows from it (viral lysis) to see what happens.

I encourage you to download the ecolMod package and see for yourself! If you are a high school student, consider joining us next summer at Sally Ride Science for my summer class on Polar Microbes as well.

Posted By: Jeff on October 03, 2021

Congrats to Avishek Dutta for his new paper “Detection of sulfate-reducing bacteria as an indicator for successful mitigation of sulfide production” currently available as an early view in Applied and Environmental Microbiology. This was intended to be the second of two papers on a complex experiment that we participated in with BP Biosciences, but the trials and tribulations of peer review led this to be the first. We’re pretty excited about it.

Here’s the quick background. When microbes run out of oxygen, the community turns to alternate electron acceptors through anaerobic respiration. One of these is sulfate, which anaerobic respiration reduces to hydrogen sulfide. In addition to smelling bad, hydrogen sulfide is quite reactive and forms sulfuric acid when dissolved in water. For industrial processes this is a problem. Sulfide can destroy products, inhibit desired reactions, and corrode pipes and equipment. To make matters worse, sulfate-reducing bacteria (SRBs: those microbes that are capable of using sulfate as an alternate electron acceptor) can form tough biofilms that are hard to dislodge.

One way of dealing with undesired SRBs is to fight biology with biology and add a more energetically favorable electron acceptor. Oxygen would of course work really well, but it typically isn’t feasible to implement oxygen injection on a really large scale. However, nitrate also works well. If nitrate is abundant, nitrate-reducing bacteria (NRBs) will outcompete SRBs for resources (e.g., labile carbon). Great! Now here’s the challenge… adding massive quantities of nitrate salts is expensive and likely has its own ecological and environmental consequences. So we’d like to do this judiciously, adding just enough nitrate to the system to offset sulfate reduction. But how do you know when you’ve added enough? In a really big system (like an oil field) the sulfide production can be happening very far from any possible sampling site, so simply measuring the concentration of hydrogen sulfide doesn’t help much. But we can learn some useful things by monitoring the microbial community in the effluent.

Schematic of biofilm dispersal, leading to a recognizable signal in the effluent. From Dutta et al., 2021.

The figure above is a schematic of the formation and decay of the biofilm before, during, and after mitigation. In our study the biofilm was presumed to be sulfidogenic and the mitigation strategy was addition of nitrate salts, but the concept applies equally well to any biofilm and any mitigation strategy. The trick – and this is one of those things that seems painfully obvious after the fact but not before – is that you’re looking for the thing you’re mitigating to appear in the effluent. Although this might seem to suggest increased abundance in the system, it actually represents decay of the biofilm and loss from the system. To take this a step further, we used paprica to predict genes in the effluent and then identified anomalies in the abundance of genes involved in sulfate reduction. These anomalies provide specific markers of successful mitigation and point toward a general strategy for monitoring the effectiveness of mitigation.
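As a generic illustration of anomaly detection on a gene-abundance time series (a simple rolling z-score sketch with made-up numbers, not the method used in the paper):

```python
import numpy as np

def rolling_anomalies(x, window=10, threshold=3.0):
    """Flag time points that deviate strongly from a trailing baseline.
    A generic z-score sketch, not the paper's specific approach."""
    x = np.asarray(x, dtype=float)
    flags = np.zeros(len(x), dtype=bool)
    for i in range(window, len(x)):
        baseline = x[i - window:i]
        sd = baseline.std()
        if sd > 0 and abs(x[i] - baseline.mean()) / sd > threshold:
            flags[i] = True
    return flags

# Hypothetical predicted abundance of a sulfate reduction gene over time:
# stable baseline, then a spike as the biofilm disperses during mitigation.
gene_abundance = [1.0, 1.2] * 10 + [8.0] + [1.0, 1.2] * 3
print(np.where(rolling_anomalies(gene_abundance))[0])  # index of the spike
```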

The detection of anomalies in the predicted abundance of relevant genes provides a way to detect the successful mitigation of SRBs (or any biofilm forming microbes). From Dutta et al., 2021.

Posted By: Jeff on September 16, 2021

Congratulations to Srishti Dasarathy for her first first-authored publication! Srishti’s paper “Multi-year Seasonal Trends in Sea Ice, Chlorophyll Concentration, and Marine Aerosol Optical Depth in the Bellingshausen Sea” is out in advance of print in JGR Atmospheres. This paper was a really long time in coming. For this study, Srishti made use of several different satellite products including measurements of marine aerosol optical depth (MAOD) derived from the CALIPSO satellite. We are not a remote sensing lab and Srishti doesn’t come from a remote sensing or physics background, so the learning curve was pretty steep. It took a couple of years, a lot of Matlab tutorials, and an internship with the CALIPSO team at NASA’s Langley Research Center just to crack the CALIPSO data and start testing hypotheses. Srishti’s main hypothesis was that MAOD would be positively correlated with ocean color and negatively correlated with sea ice, since phytoplankton are known to be a source of volatile organic compounds that can form aerosol particles. Confounding this is that sea spray – which, like phytoplankton, is associated with open water periods – is also a source of aerosols.

The CALIPSO satellite “curtain”.

One challenge that we faced was that CALIPSO represents data with high spatial resolution along a 2D path or “curtain”, as shown above. The orbital geometry is such that not every point on the globe gets covered; the same curtains get sampled every 16 days. Thus, while spatial resolution is high along the curtain, it is poor orthogonal to the curtain, and temporal resolution is limited to 16 days. This makes it a bit challenging to capture signals associated with relatively ephemeral events (such as phytoplankton blooms).

Basin-scale averages of MAOD, chlorophyll a, ice cover, and wind speed. From Dasarathy et al. 2021.

To work around these limitations Srishti took a basin-scale view of the CALIPSO data and looked for large-scale trends that would link MAOD with chlorophyll a or ice cover. This approach isn’t ideal and glosses over a lot of interesting details, but it is nonetheless sufficient to reveal some interesting relationships. Most notably, MAOD and chlorophyll a are weakly but significantly correlated in a time-lagged fashion, with a delay of approximately 1 month yielding the strongest correlation. This makes sense, as the volatile organic compounds that link phytoplankton (and ice algal) communities to MAOD are thought to be maximally produced near the end of the phytoplankton bloom as the biomass starts to decay. In the near future new satellite missions like PACE and improved land/sea observing campaigns will allow us to get into the details a bit more, including direct observations of specific blooms and the time- and space-lagged MAOD response!

The strength and sign of the correlation between MAOD and sea ice cover, wind speed, and chlorophyll a change as a function of the time-lag. For chlorophyll a, the strongest correlation with MAOD is observed with a 1-month lag. We hypothesize that this corresponds to the decay of a phytoplankton bloom when we expect the emissions of volatile organic carbon compounds to be maximal.

Posted By: Jeff on September 12, 2021

Last week PhD student Natalia Erazo and I were fortunate to get back into the field after a long pandemic hiatus.  Our mission was to collect mangrove propagules (essentially a detachable bud from which the mangrove seedling sprouts) from the Indian River Lagoon in Florida for an upcoming experiment on mangrove-microbe symbiosis.  Neither of us had worked in Florida before so we teamed up with Candy Feller, an emeritus scientist with the Smithsonian Marine Station in Ft. Pierce, FL.  Candy has been working on mangroves in Florida and around the world for decades and is extremely knowledgeable about the ecology of these systems.  She and husband Ray Feller allowed us to tag along as they checked on a few long-term experiments and study sites up and down the coast.

Natalia, Candy, and I standing in a mixed salt marsh-mangrove habitat near the northern limit of the mangrove range. Photo: Ray Feller.

For those not familiar with Florida’s Atlantic coast, the Indian River Lagoon is a network of estuaries and barrier islands that stretch from north of Cape Canaveral to south of Port St. Lucie.  The barrier islands form a protected waterway that provides habitat for mangroves, manatees, and a variety of other species.  The Indian River Lagoon is home to quite a few people as well, and there are some issues associated with water quality. Nutrients from septic and sewage systems are cited as a cause of high phytoplankton loads and increasingly murky water, leading to a reduction in aquatic vegetation and increased manatee mortality.  Key landscape features in the Lagoon are also the result of human habitation.  For example, much of the mangrove habitat in the Ft. Pierce region exists within engineered mosquito abatement areas.  To reduce the number of mosquitos (currently a nuisance, but previously some did carry disease) berms were created around vast tracts of mangrove habitat.  These areas were then flooded, reducing the breeding success of mosquitos because they lay eggs on wet but not flooded soil. 

Natalia samples propagules from mangroves of the genus Avicennia in a former mosquito abatement area.

Unfortunately, mosquito abatement also killed the mangrove trees which, while salt tolerant and adapted to life in saturated soils, require tidal action to oxygenate the water.  Modern mosquito abatement efforts (while still energy and labor intensive) take this into account and mangroves are thriving in areas that were formerly stagnant abatement ponds.  This is a Good Thing for anyone who likes fish, crabs, shoreline stabilization, and any of the other services that mangroves are well known for providing.

A particularly interesting feature of the Indian River Lagoon is that it is oriented north to south at nearly the northernmost known extent of mangroves on the US Atlantic Coast.  This provides an excellent opportunity to study how mangroves are responding to changing climate.  It’s known that mangroves are extending their range to the north, but climate change is anything but linear, and the rise in atmospheric and sea surface temperatures is accompanied by instabilities and severe perturbations.  The most notable may be freezing events caused by deep intrusions of the now infamous polar vortex.  Such perturbations can have a bigger impact on landscape ecology than the background climate.  Mangroves are very much tropical but somewhat resistant to transient freeze events (at least more so than your average Florida orange tree).  How they respond physiologically to these and other stressors that they encounter in their northward progression remains to be seen.

Mangrove trees near the southern end of the Indian River Lagoon. There are no salt marsh habitats in the region, mangrove forests dominate the estuaries.

Salt marsh (with pulp mill in the background) at Fernandina Beach, well north of the current known mangrove range in Florida. Eventually this salt marsh will convert to mangrove forest similar to the previous picture, but the timeline on which this will occur is anyone’s guess.

Posted By: Jeff on May 14, 2021

Congrats to postdoctoral researcher Jesse Wilson for his new paper in Environmental Microbiology, “Recurrent microbial community types driven by nearshore and seasonal processes in coastal Southern California”. Although considerable microbiology work has taken place at the Ellen Browning Scripps Pier, this is (surprisingly) the first study to comprehensively look at how bacterial and archaeal community structure change over time. This is also the first of what we hope will be many publications that are a product of the Scripps Ecological Observatory.

Jesse Wilson (left), Avishek Dutta (right), and I prep an in situ sampling pump for the Scripps Ecological Observatory.

As part of the Scripps Ecological Observatory effort we team up with the Southern California Coastal Ocean Observing System (SCCOOS) team for twice-weekly sampling of surface water for microbial community structure via 16S and 18S rRNA gene sequencing and microbial abundance via flow cytometry. As you can see from the SCCOOS and flow cytometry data below it’s a pretty dynamic system! This is why the site is so advantageous for ecological studies; more dynamic means more opportunities to identify co-variants in the environment that signal possible interactions.

From Wilson et al., 2021. Key ecological parameters and flow cytometry data for the Ellen Browning Scripps Pier for an ~18 month period.

At the core of Jesse’s paper is the 16S rRNA gene sequence dataset. What these data provide is a high resolution view of the taxonomy of the bacterial and archaeal community at each sample point. These data are so high resolution – after proper denoising and quality control they represent hundreds to thousands of unique taxa – that it’s often difficult to make inferences from them. Techniques are applied to reduce the complexity of the data and make it easier to see patterns.

From Wilson et al., 2021. Two different techniques were applied to the 16S rRNA gene dataset to reduce the complexity of the microbial community and allow patterns to emerge. The panel at the top shows the occurrence of taxonomic “modes” (our term for SOM-derived classes). The panel at the bottom shows the occurrence of subnetworks in a WGCNA analysis.

Jesse approached the problem from the perspectives of both the observations (sampling days) and the variables (microbial taxa). For microbial time-series data it is much more common to aggregate variables. A widely used approach involves a technique known as weighted gene correlation network analysis (WGCNA), originally developed for gene expression studies. WGCNA uses network analysis to combine taxa into subnetworks or modules that have similar co-occurrence patterns. One advantage to this approach is that the subnetworks are easily correlated to external variables that either drive the pattern (e.g., physical processes) or are influenced by it (e.g., ecophysiology). A disadvantage is that these correlations aren’t predictive. You can’t readily classify new data into the existing subnetworks, and the co-occurrence patterns of the subnetworks themselves contain additional information that isn’t readily captured by this approach.

In a 2017 paper we demonstrated how self-organizing maps (SOMs) can be used to more explicitly link environmental parameters with microbial community structure. SOMs are a form of neural network and collapse complex, multi-dimensional data into a 2D representation that retains the major relationships present in the original data. The end result of the SOM training process is a 2D model of the data that can be further subdivided into distinct classes. Applied to community structure data (i.e. in microbial community segmentation) the SOM flips the aggregation problem, aggregating samples instead of taxa. That means that each unique sample point can be described by the model as a single discrete variable that nonetheless captures much of the key information present. A major advantage to this approach is that the model is reusable: new data can be very efficiently assigned to existing classes, which is a key advantage for an ongoing ecological monitoring effort.
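Mechanically, segmentation looks something like the toy sketch below – a small hand-rolled SOM trained on simulated relative-abundance data, with the trained units grouped into discrete classes ("modes") and a reusable lookup for new samples. This is not the actual code behind the paper; the grid size, iteration count, decay schedules, and four-mode clustering are all arbitrary choices for illustration.

```python
## A toy SOM for "microbial community segmentation": train a 2D grid of
## units on community composition vectors, group the units into discrete
## classes, then classify any sample by its best-matching unit.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(30), size=200)   # 200 samples x 30 taxa

## Map geometry: a 5x5 grid of units, each with a weight vector in taxon space.
gx, gy, n_taxa = 5, 5, 30
weights = rng.random((gx * gy, n_taxa))
coords = np.array([(i, j) for i in range(gx) for j in range(gy)], dtype=float)

## Online SOM training: pull the best-matching unit and, via a shrinking
## Gaussian neighborhood, its grid neighbors toward each drawn sample.
n_iter = 4000
for t in range(n_iter):
    x = samples[rng.integers(len(samples))]
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
    sigma = 2.0 * (1 - t / n_iter) + 0.5         # neighborhood radius decays
    lr = 0.5 * (1 - t / n_iter) + 0.01           # learning rate decays
    grid_dist = ((coords - coords[bmu]) ** 2).sum(axis=1)
    h = np.exp(-grid_dist / (2 * sigma ** 2))    # Gaussian neighborhood weights
    weights += lr * h[:, None] * (x - weights)

## Group the trained units into discrete classes (the "modes"). Classifying
## a sample, new or old, is then just a best-matching-unit lookup, which is
## what makes the trained model reusable for ongoing monitoring.
classes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(weights)

def classify(sample):
    return classes[np.argmin(((weights - sample) ** 2).sum(axis=1))]

print('mode of first sample:', classify(samples[0]))
```

The key point is in the last few lines: once the map is trained, assigning a new sampling day to an existing mode is a cheap nearest-unit lookup rather than a re-analysis.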

Results of a “microbial community segmentation” using SOMs. A graphical representation of the model is shown in A. B-E show the association of the microbial modes with different ecological parameters.

This paper is an exciting but very early effort to track microbial processes at the Scripps Ecological Observatory. The time-series presented here ends in June of 2019 – the date our original (and terrible) flow cytometer terminally failed – but twice-weekly data collection has continued. We now have three years of 16S and 18S rRNA gene sequence and flow cytometry data and this collection will continue as long as we’re able to support it! Students and potential postdocs interested in microbial time-series analysis should take note…

Many thanks to the Simons Foundation Early Career Investigator in Marine Microbial Ecology and Evolution program for supporting this work, and to all the SCCOOS technicians and Bowman Lab personnel for bringing us water and processing samples!

Posted By: Jeff on March 12, 2021

Congrats to Benjamin Klempay for his first first-authored publication in the lab! (wow, didn’t I just write that??) Benjamin is part of the Oceans Across Space and Time (OAST) project and his paper, Microbial diversity and activity in Southern California salterns and bitterns: analogues for ancient ocean worlds, appears in a special issue of the journal Environmental Microbiology. In the paper Benjamin does a deep dive into the microbial diversity of the network of lakes that make up the South Bay Salt Works, a little-known industrial site/wildlife refuge on San Diego Bay that also happens to be the oldest continually operating solar salt harvesting facility in the US.

OAST team members Maggie Weng, Benjamin Klempay, and Peter Doran at the SBSW in 2020.

Our interest in hypersaline lakes – aside from the fact that they’re just really weird and fun environments to explore – is their value as analogues for evaporative environments on Mars and other ancient ocean worlds. Once upon a time Mars was wet, and may not have been so dissimilar to many environments on Earth today. As that water was lost the oceans, lakes, and wetlands were reduced by evaporation to saline lakes and ultimately salt pans. These end-state evaporative environments are key targets for Martian exploration today. Extremely salty lakes like those found at the Salt Works are a reasonable representation of the last potentially inhabited environments on the surface of Mars before it became too desiccated to support life. Thus the signatures of ancient Martian life might bear some similarities to contemporary life in these lakes.

Mars was once a wet world. As it dried, the remnant lakes and oceans would have become increasingly saline, eventually resembling hypersaline environments like the lakes of the South Bay Salt Works.

The microbial diversity of hypersaline lakes has been studied in depth – as I mentioned before they’re weird and fun places to study – but Benjamin’s work looks at a couple of unexplored elements. First, he didn’t restrict his analysis to sodium chloride lakes at the Salt Works (salterns) but also included magnesium chloride lakes (bitterns) that are thought to be too toxic for life (see a nice discussion of this in a recent OAST paper here). He found an interesting pattern of microbial diversity across these lakes, with diversity decreasing as salinity increases, then suddenly increasing in the magnesium chloride lakes. The reason for this is the absence of microbial growth in those lakes. Rather than hosting a specialized microbial community, they collect microbes from dust, sea spray, and other sources (infall), and preserve this DNA by inactivating the enzymes that would normally degrade it.

Microbial diversity in salterns and bitterns. Diversity increases below the known water activity limit for bacteria and archaea due to external inputs of new genetic material. From Klempay et al. 2021.

Co-authors Anne Dekas and Nestor Arandia-Gorostidi at Stanford also applied nano-SIMS to evaluate single-cell activity levels across the salinity (water activity) gradient. Biomass can be very high in these lakes – 100 fold or more higher than seawater – so we assumed that activity would be high too. The nice thing about nano-SIMS is that it evaluates activity on a per-cell basis. Looked at in this way, most bacteria and archaea had surprisingly low levels of activity. We’re still trying to understand exactly what this means and Anne and Nestor undertook an impressive array of experiments as part of our 2020 field effort to try to get to the bottom of it. We think that the extraordinarily low levels of predation are partially responsible; the eukaryotic protists that typically prey on bacteria and archaea can’t grow at the salinity of the saltiest lakes at South Bay Salt Works. Viruses, the other major source of mortality for bacteria and archaea, don’t generally propagate through low-activity populations. So the haloarchaea that dominate in these lakes may have hit upon a winning evolutionary strategy of slow growth under the protection of a particularly extreme environment.

Single-cell activities as measured by nano-SIMS. From Klempay et al. 2021.

Posted By: Jeff on March 04, 2021

Congrats to Natalia Erazo for her first first-authored publication in the lab! Her paper, Sensitivity of the mangrove-estuarine microbial community to aquaculture effluent, appears in a special issue of the journal iScience. The publication is the culmination of our 2017 field effort in the Cayapas-Mataje and Muisne regions of Ecuador.

Study sites in Cayapas-Mataje and Muisne, Ecuador. From Erazo and Bowman, 2021.

Ecuador is ground zero for mangrove deforestation for shrimp aquaculture. Most of Ecuador’s coastline is in fact completely stripped of mangroves. The biogeochemical consequences of this aren’t hard to imagine. Mangrove forests contain a significant amount of carbon in living biomass and in the sediment. Aquaculture ponds, by contrast, contain a large amount of nitrogen as a result of copious additions of nitrogen-rich shrimp feed. The balance of C to N is one of the fundamental stoichiometric relationships in aquatic chemistry. When it shifts all kinds of interesting things start to happen.

Shrimp aquaculture ponds in Muisne, Ecuador. Once there were mangroves…

The one place in Ecuador where you can find large areas of mangroves is the Cayapas-Mataje Ecological Reserve. CMER is in fact the largest contiguous mangrove forest on the Pacific coast of Latin America. Its status comes from an interesting combination of social and economic factors that left this part of Ecuador relatively undeveloped until recently. There is shrimp aquaculture in the reserve, but it’s nowhere near as expansive as in Muisne and other ex-mangrove sites in Ecuador.

Natalia leveraged the different levels of disturbance present in Cayapas-Mataje, and between Cayapas-Mataje and Muisne, to explore the impact of all this aquaculture activity on microbial community structure. After all, it’s really the microbial community that responds to and drives the biogeochemistry, so understanding the sensitivity of these communities to the changing conditions gives us insight into how the system is changing as a whole.

Patterns in biogeochemistry and genomic features across the disturbance gradient in this study. Erazo and Bowman, 2021.

By using our paprica pipeline Natalia was able to evaluate changes in microbial community structure, predicted genomic content, and key genome features across the disturbance gradient. A nitrogen excess (relative to phosphorus) was associated with bacteria with larger genomes and more 16S rRNA gene copies, indicative of a more copiotrophic or fast-growing population. This has implications for how carbon is turned over or retained at the higher levels of disturbance.

Distribution of predicted metabolic pathways related to nitrogen cycling across different levels of disturbance. Erazo and Bowman, 2021.

Different microbial metabolisms are also associated with the level of disturbance. The figure above shows the distribution of predicted metabolic pathways associated with nitrogen metabolism. Nitrogen fixation, a feature of microbial symbionts of many plants, is less abundant at high levels of disturbance, while pathways associated with denitrification are more abundant. The interesting thing about this is that these samples are restricted to the mangroves themselves – the high disturbance samples don’t reflect the actual aquaculture ponds – so these changes reflect altered processes in the remaining stands of mangroves. The loss of beneficial, symbiotic bacteria and the elevated abundance of putative shellfish pathogens suggest that the impacts of aquaculture are not limited to the physical removal of mangrove trees and the associated release of carbon.

Posted By: Luke Piszkin on January 20, 2021

This post comes from Luke Piszkin, an undergraduate researcher in the Bowman Lab. GNU Parallel is a must-have utility for anyone who spends a lot of time in Linux Land, and Luke recently had to gain some GNU Parallel fluency for his project. Enjoy!


GNU parallel is a Linux shell tool for executing jobs in parallel across multiple CPU cores. This is a quick tutorial for speeding up your workflow and getting the most out of your machine with parallel.

You can find the current distribution on the GNU parallel website. Once it’s installed, please try some basic commands to make sure it is working.

You will need some basic understanding of “piping” in the command line. I will describe command pipes briefly just for our purposes, but for a more detailed look please see

Piping data in the command line involves taking the output of one command and using it as the input for another. A basic example looks like this:

command_1 | command_2 | command_3 | …

Where the output of command_1 will be used as an input by command_2, the output of command_2 by command_3, and so on. For now, we will only need to use one pipe with parallel. Now let’s look at a basic command run in parallel.

Input: find -type f -name "*.txt" | parallel cat

The house stood on a slight rise just on the edge of the village.
It stood on its own and looked over a broad spread of West Country farmland.
Not a remarkable house by any means - it was about thirty years old, squattish, squarish, made of brick, and had four windows set in the front of a size and proportion which more or less exactly failed to please the eye
The only person for whom the house was in any way special was Arthur Dent, and that was only because it happened to be the one he lived in.
He had lived in it for about three years, ever since he had moved out of London because it made him nervous and irritable

This command makes use of find to list all the .txt files in my directory, then runs cat on them in parallel, which shows the contents of each file on a new line. We can already see how this is much easier than running each command separately, e.g.:

In: cat file1.txt

The house stood on a slight rise just on the edge of the village.

In: cat file2.txt

It stood on its own and looked over a broad spread of West Country farmland.


Also, notice how we do not need any placeholder for the files in the second command, because of the pipe.
Now let’s take a more complicated example:

find -type f -name "*beta_gal_vibrio_vulnificus_1_100000_0__H_flex=up_*.txt" ! -name "*tally*" | parallel -j 4 python3 {} flex log

0.001759374417007663, 0.00033497120199255527, 0.9969940359705531
0.0019773468515624356, 0.00022978867370935437, 0.9969940359705531
0.001332602651915014, 0.0005953339816183529, 0.9969940359705531
0.0015118302435556904, 0.0005040931537659636, 0.9969940359705531
0.001320879258211107, 0.0006907926578169569, 0.9969940359705531
0.0016753759966792244, 0.00041583739269117386, 0.9969940359705302
0.0017187095827331082, 0.00036931151058880094, 0.9969940359705531
0.0017045099726521733, 0.00031386214441070197, 0.9969940359705531
0.001399703145023273, 0.0005196629341168314, 0.9969940359705531
0.001436129272321403, 0.0004806654291442482, 0.9969940359705531

This is an example from my research: it takes in a .txt data file and spits out log-fit parameters. Like before, we use find to get a list of all the files we want the second command to process. We use ! -name "*tally*" to exclude any files that have "tally" anywhere in the name, because we don’t want to process those. In the second command, we have the option -j 4. This tells parallel to use 4 CPU cores, so it can run 4 commands at a time. You can check your computer specs to see how many cores you have available. If your machine has hyperthreading, then it can create virtual cores to run jobs on too. For instance, my dinky laptop only has 2 cores, but with hyperthreading I can use 4. This is another way to improve your efficiency. In the second command you also see a {} placeholder. This spot is filled by whatever the first command outputs. In this case, we need that placeholder because our input files go between other arguments.

You can also use parallel to run a number of identical commands at the same time. This is helpful if you have a program to run on the same data multiple times. For example:

seq 10 | parallel -N0 cat file1.txt

The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.

Here we use seq as a counting mechanism for how many times to run the second command. You can adjust the number of jobs by changing the seq argument. We include the -N0 flag, which tells parallel to ignore any piped inputs because we aren’t using the first command for inputs this time.
Often, I like to include both the time shell tool and the --progress parallel option to see current job status and time for completion:

time seq 10 | parallel -N0 --progress cat file1.txt

Computers / CPU cores / Max jobs to run
1:local / 4 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:4/0/100%/0.0s The house stood on a slight rise just on the edge of the village.
local:4/1/100%/1.0s The house stood on a slight rise just on the edge of the village.
local:4/2/100%/0.5s The house stood on a slight rise just on the edge of the village.
local:4/3/100%/0.3s The house stood on a slight rise just on the edge of the village.
local:4/4/100%/0.2s The house stood on a slight rise just on the edge of the village.
local:4/5/100%/0.2s The house stood on a slight rise just on the edge of the village.
local:4/6/100%/0.2s The house stood on a slight rise just on the edge of the village.
local:3/7/100%/0.1s The house stood on a slight rise just on the edge of the village.
local:2/8/100%/0.1s The house stood on a slight rise just on the edge of the village.
local:1/9/100%/0.1s The house stood on a slight rise just on the edge of the village.
0.21user 0.46system 0:00.63elapsed 108%CPU (0avgtext+0avgdata 15636maxresident)k
0inputs+0outputs (0major+12089minor)pagefaults 0swaps

And with that, you are well on your way to significantly increasing your job throughput and using the full potential of your machine. You should now have a sufficient understanding of parallel to construct a command for your own projects, and to explore more complicated applications of parallelization.
(Bonus points to whoever knows the book that I used for the text files.)

Posted By: Jeff on December 04, 2020

Congrats to Avishek Dutta for his first publication in the Bowman Lab: Understanding Microbial Community Dynamics in Up-Flow Bioreactors to Improve Mitigation Strategies for Oil Souring. Avishek did a remarkable job of resurrecting a stagnant dataset and turning it into a compelling story.

What is oil souring anyway? When oil is extracted in a production field the pressure of the field drops over time. To keep the pressure up the oil company (or more accurately, the subsidiary of the subsidiary tasked with such things) pumps in water, which is frequently seawater. When the water comes back out through wells there are two options. The most economical thing is to release it back into the environment, which has obvious negative consequences for environmental health. Alternatively, the same water can be reused by pumping it back into the ground. The downside to this is that recycled well water typically induces the production of hydrogen sulfide by sulfate reducing bacteria. The hydrogen sulfide reacts with the oil (“souring” it) and creates its own set of environmental and occupational hazards.

The oil industry has spent quite a bit of effort trying to figure out how to mitigate the process of oil souring. A leading method is to introduce nitrate salts into the system to boost the growth of nitrate reducing bacteria. The nitrate reducers outcompete the sulfate reducers for reduced carbon (such as volatile fatty acids) and induce other processes that further impede sulfate reduction.

Although the basic concept is pretty simple, the details of this competition between nitrate and sulfate reducers in oil field aquifers are not well understood. In this study Avishek leveraged samples from up-flow bioreactors, analogs of the oil field aquifer system, for 16S rRNA gene analysis of the bacterial and archaeal communities. The bioreactors are vessels filled with sand, seawater, and sources of bioavailable carbon (the oil itself is a source of carbon, but requires a specialized microbial community to degrade). Some of the bioreactors also contain oil. Water flows continually through the system and nitrate salts can be added at appropriate time-points. For this experiment the nitrate amendment (the mitigation or M phase) was halted (the rebound sulfidogenesis or RS phase) and then restarted (the rebound control or RC phase).

From Dutta et al., 2020. Relative abundance of top taxa during different phases: mitigation (M), rebound control (RC), rebound sulfidogenesis (RS).

Lots of interesting things emerged from this relatively small-scale experiment. For one thing the oil and seawater samples are not that different from one another during mitigation. However, when the nitrate addition is stopped those two treatments start to diverge, with different sulfate reducing taxa present in each. This divergence (but not necessarily the microbial community) persists after the treatment ends. But not all microbial taxa responded to the rather extreme perturbation caused by the nitrate addition. Desulfobacula toluolica, for example, which should have been out-competed during mitigation, remained a significant member of the community.

We’re currently analyzing the results of a much larger bioreactor study that we expect will shed some new light on these processes, so stay tuned!

Posted By: Jeff on October 12, 2020

Postdoctoral researcher Jesse Wilson has a new paper titled Using empirical dynamic modeling to assess relationships between atmospheric trace gases and eukaryotic phytoplankton populations in coastal Southern California in press in the journal Marine Chemistry. This paper is the culmination of a nearly two-year effort to bring together two long-term datasets collected at the Ellen Browning Scripps Memorial Pier: the Southern California Coastal Ocean Observing System (SCCOOS) phytoplankton count and the Advanced Global Atmospheric Gases Experiment (AGAGE). Both of these programs encompass many more sites than the Scripps Pier, but that’s the happy point of overlap. The SCCOOS phytoplankton count (augmented by the McGowan chlorophyll time-series) is part of an effort to track potential HAB-forming phytoplankton in Southern California. Twice weekly microscope counts are made of key phytoplankton taxa, and weekly measurements are made for chlorophyll a and nutrients. AGAGE is, as the name suggests, a global effort to monitor changes in atmospheric trace gases. They do this using high frequency measurements of key gases with GC-MS and a cryo-concentration system known as Medusa.

Our study was motivated by the need to better understand the contribution of different phytoplankton taxa to atmospheric trace gases. Many phytoplankton (and macroalgae) produce volatile organic compounds (e.g., DMS and isoprene) and other trace gases (e.g., carbonyl sulfide). Some of these gases have interesting functions in the atmosphere, such as the formation of secondary aerosols. Although there are many laboratory studies looking at trace gas production by phytoplankton in culture, environmental studies on this topic are usually limited in space and time by the duration of a single cruise or field campaign.

From Wilson et al., 2020. Temporal patterns for various trace gases. Note the strong seasonality for carbonyl sulfide, iodomethane, and chloromethane. Bromoform and dibromomethane are also seasonal, but exhibited a provocative spike in late 2014.

The temporal and spatial limitation of field campaigns is what makes long-term time-series efforts so valuable. For this study we had 9 years of overlapping data between SCCOOS and AGAGE. To analyze these data Jesse designed an approach based on Empirical Dynamic Modeling (EDM) and Convergent Cross Mapping (CCM). For good measure he also aggregated the available meteorological data using a self-organizing map (SOM). EDM and CCM are emerging techniques that can identify causal relationships between variables. The basic idea behind EDM is that a time-series can be described by its own time-lagged components. Given two time-series (say a trace gas and phytoplankton taxa), if the time-lagged components of one describe the other this is evidence of a causal relationship between the two. For a more in-depth treatment of EDM and CCM see this excellent tutorial on Hao Ye’s website.
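That logic can be made concrete with a toy numpy sketch of cross mapping – not the EDM tooling used in the paper, and with arbitrary embedding parameters chosen only for illustration. We simulate a pair of coupled logistic maps in which x forces y but not vice versa; because x leaves a signature in y's history, x should be recoverable from y's time-lagged ("shadow") manifold.

```python
## A toy illustration of convergent cross mapping: if X forces Y, nearest
## neighbors on Y's time-delay embedding should reconstruct X, and the skill
## (rho) is the correlation between X and its cross-mapped estimate.
import numpy as np

def embed(series, E=3, tau=1):
    """Time-delay embedding: rows are [y_t, y_{t-tau}, ..., y_{t-(E-1)tau}]."""
    n = len(series) - (E - 1) * tau
    return np.column_stack([series[(E - 1 - k) * tau : (E - 1 - k) * tau + n]
                            for k in range(E)])

def cross_map_skill(x, y, E=3, tau=1):
    """rho for cross-mapping x from the shadow manifold of y."""
    My = embed(y, E, tau)
    xs = x[(E - 1) * tau:]                     # align x with the embedding
    est = np.empty(len(xs))
    for i in range(len(My)):
        d = np.sqrt(((My - My[i]) ** 2).sum(axis=1))
        d[i] = np.inf                          # exclude the point itself
        nn = np.argsort(d)[:E + 1]             # E+1 nearest neighbors
        w = np.exp(-d[nn] / max(d[nn][0], 1e-12))
        est[i] = (w * xs[nn]).sum() / w.sum()  # weighted-average estimate
    return np.corrcoef(xs, est)[0, 1]

## Coupled logistic maps: x is autonomous and forces y.
n = 400
x = np.empty(n); y = np.empty(n)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])

## Cross-mapping x from y's manifold should succeed (high rho), while the
## reverse direction should be weaker.
print('x from My:', round(cross_map_skill(x, y), 2))
print('y from Mx:', round(cross_map_skill(y, x), 2))
```

In a real CCM analysis the skill is also checked for convergence (rho rising with library length), which is what separates causation from mere correlation.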

Not surprisingly our all-vs-all approach to these datasets was a bit messy. A lot of this is due to the complexity of the natural environment and the spatial and temporal disconnect between the measurements. The phytoplankton counts are hyper-local, and reflect the very patchy nature of the marine environment, while the trace gas measurements are regional at best, as the atmosphere moves and mixes over great distances in only a few hours. Nonetheless we made the assumption that ecological observations at the pier are some reflection of conditions across a wider area, and that trace gas measurements do reflect some local influence. So there should be observable links between the two even if those links are muted.

From Wilson et al., 2020. Depth of color gives the value for rho, a measure of the cross-map ability when a parameter (row) affects a trace gas (column). Only significant values are shown; * indicates that there is evidence of causal interaction in both directions, which typically indicates interaction with a third, unmeasured variable.

I’m particularly excited about what we can do with these data in the future, when we have several years of molecular data. As part of the Scripps Ecological Observatory initiative we’re sequencing 16S and 18S rRNA genes from the same samples as the SCCOOS microscopy counts. We have around 2.5 years of data so far. Just a few more years until we have a molecular dataset as extensive as the count data analyzed here! The key difference will be in the breadth of that data, which will allow us to identify an order of magnitude more phytoplankton taxa than are counted.

Posted By: Jeff on September 16, 2020

It’s been a long time since I’ve had the bandwidth to write up a code snippet here. This morning I had not quite enough time between Zoom meetings to tackle something more involved, so here goes!

In this case I needed to find ~200 sequence (fasta) files for a student in my lab. They were split across several sequencing runs, and for various logistical reasons it was getting a bit tedious to find the location of each sequence file. To solve the problem I wrote a short Python script to wrap the Linux locate command and copy all the files to a new directory where they could be exported.

First, I created a text file “files2find.txt” with text uniquely matching each file that I needed to find. One of the great things about locate is that it doesn’t need to match the full file name.

head files2find.txt

Then the wrapper:

import subprocess
import shutil

## Destination for the found files, and a set to track what has already
## been copied.

path = '/path/to/where/you/want/files/'
found = set()

with open('files2find.txt') as file_in:
    for line in file_in:
        line = line.rstrip()

        ## Here we use the subprocess module to run the locate command,
        ## capturing standard out.

        temp = subprocess.Popen('locate ' + line,
                                shell = True,
                                executable = '/bin/bash',
                                stdout = subprocess.PIPE)

        ## The communicate method for object temp returns a tuple. First
        ## object in the tuple is standard out.

        locations = temp.communicate()[0]
        locations = locations.decode().split('\n')

        ## Thank you internet for this one-liner, Python one-liners always
        ## throw me for a loop (no pun intended). Here we search all items
        ## in the locations list for a specific suffix that identifies files
        ## that we actually want. In this case our final analysis files
        ## contain "exp.fasta". Of course if you're certain of the full file
        ## name you could just use locate on that and omit this step.

        fastas = [i for i in locations if 'exp.fasta' in i]

        ## Use the shutil library to copy found files to the new directory
        ## "path". Copied files are added to the set "found" to avoid being
        ## copied more than once, if they exist in multiple locations on
        ## your computer.

        for fasta in fastas:
            file_name = fasta.split('/')[-1]
            if file_name not in found:
                shutil.copyfile(fasta, path + file_name)
                found.add(file_name)

        ## In the event that no files are found report that here.

        if len(fastas) == 0:
            print(line, 'not found')

Posted By: Emelia Chamberlain on August 11, 2020

PhD student Emelia Chamberlain sends the following dispatch from Polarstern.

The MOSAiC floe, just prior to break-up.

After 64 days at sea, the RV Polarstern and icy surroundings have officially started to feel like home. I can’t believe how quickly the time has passed, but here we are at the end of MOSAiC Leg 4! A truly special cruise, we witnessed the re-building and complete break-down of an ice camp, the peak and end of the spring under-ice bloom, and were the last to sample from the original MOSAiC floe before that singular, well characterized piece of ice (chosen all the way back in last October!) finally reached the marginal ice zone, and the end of its life as a contiguous floe. It’s really quite incredible to have had the opportunity to contribute to this astounding time series. As part of this expedition, over 200 individual scientists have worked on this one piece of ice, following its drift across the Arctic. Even if we cannot determine the full scope of this project or see all the results just now – it is clear that these data will have an impact for generations to come and, especially as a student, I feel so lucky to have contributed to this legacy.

Polarstern on a sunny day in the Arctic

Our home-base, the RV Polarstern is at the heart of this expedition. Summer weather near the sea-ice edge is extremely variable. Our evening weather reports predicted >95% cloud cover almost every day, but we still lucked out with some gorgeous sunny days. But no matter the weather (or distance from which it finally appears out of the fog), the Polarstern after a long day on the ice is always a welcome sight.

When we first arrived at the MOSAiC floe, we were very excited to find enough of it intact to re-establish the Central Observatory ice camp. Very soon, versions 2.0 of all the ice “cities” began popping up across the floe and the science began in earnest. However, every day we had new reminders of the fact that we were decidedly in the Arctic melt season. Melt-ponds became a dominant feature, and often new roads and pathways had to be forged on the fly to get to sampling locations. To give a sense of the melt: our first year ice cores started at around 1.5 m the first week, but our last ones were only 90 cm long.

Looking down from the bridge, this was the logistics area of the ice floe on June 19th, the day after our arrival. The research camp has yet to be set up and the area is still fairly dry and melt-pond free.

Here is the same view from the bridge on Jul 25th. The logistics area has become ponded and brownish due to sediment melting out of the ice, darkening the surface and enhancing melt. It was not uncommon to find rocks, shells, etc. on our floe – remnants from its origins off the coast of the New Siberian Islands. In the foreground is the remote sensing site, followed by the Balloon Town (measuring atmospheric profiles). MET City and Ocean City are behind, hidden in the fog.

And now, the same view on Jul 31st, after we hit the marginal ice zone and swell increased to the point of breaking the floe apart.

These melt-dynamics not only provided physical challenges to working on the ice, but scientific ones as well. How could we capture this fresh-water lens and study the impacts of such surface stratification on the biomass blooming beneath the ice? This stratification was seen most clearly in the lead systems surrounding the ship. After several surveys, we were able to characterize 3 clear layers – surface freshwater, a green algal layer (brackish salinities), and the underlying seawater. Over time, the living layer shoaled and went from a happy photosynthetic green to a clearly dying, particulate organic matter greenish/brown. Capturing this bloom transition was quite exciting for us and I look forward to analyzing how the microbial community in these layers evolved.

Ale taking go pro footage from a lead near ROV Oasis on July 10th. At this point, the biomass layer was around 1m deep and only visible with go-pro on stick or ROV footage.

Using the classical “tubing duct-taped to stick” technique, here I am sampling from a lead on the far-side of the floe on July 22nd. At this point, the biomass layer is clearly visible near the surface, reaching a max of 30 cm deep, with lots of dead Melosira (ice-algae) floating in the current.

In addition to these opportunistic events, Alessandra D’Angelo (PhD, URI) and I were happy to continue progress on the core MethOx project work, started by Jessie Creamean (PhD, CSU) on Leg 1, and Jeff on Leg 3. When conditions allowed, we were lucky enough to have nearly daily CTD casts from the Polarstern rosette (thank you Team Ocean!) and were therefore never in want of water. With both of us on board we were able to maximize sampling and analysis, collecting almost 278 unique samples for the core project work alone! We measured weekly seawater profiles for microbial community structure in conjunction with ambient methane concentrations/isotopes and ran experimental samples to study potential oxidation/production rates of methane using elevated methane in select incubations.

The MethOx team after a successful deployment of the BGC team’s gas-flux chamber at Remote Sensing.

Filling BOD bottles from the CTD for discrete O2/Ar ratios to run on the AWI MIMS.

Some of the most promising incubation samples, however, came from the bottom sections of sea-ice cores. In addition to our weekly water column work, I also took part in the interdisciplinary MOSAiC ice-coring Mondays to support the sampling effort for these cores. This weekly event included the collection of approximately 25 cores per site (First Year and Second Year ice) for a host of parameters: salinity, net primary production, gypsum content, etc. It was great to be able to start every week out on the ice and witness the changes occurring at our floe first-hand.

A heavily ponded First Year Ice Coring Site and the bridge we had to build (and re-build… and re-build) to get there.

Alison Fong (PhD, AWI) and I preparing to section cores. The tent is used to keep temperatures cool and protect samples from direct sunlight during processing. With air-temperatures hovering between -1 and 1 C for most of the leg, this was key to prevent premature melting of the samples.

Ale “fishing” for methane from a hole used for sediment trap deployments near the First Year Ice site. The syringe is used to carefully sample seawater into gas tight bags without incorporating air bubbles – that way we measure the true seawater signal without any atmospheric interference.

BUT – it’s not over yet. Even as I now take this time to reflect on Leg 4, we are quite busy with preparations for Leg 5 where we will head north and witness the re-freeze of the Arctic fall! And although it will be bittersweet to part from our Leg 4 colleagues… the Akademik Tryoshnikov has arrived and the handover must begin. I look forward to continuing on as the Bowman Lab/MethOx Project representative on MOSAiC Leg 5 and can’t wait to see where the RV Polarstern takes us next!

Another bittersweet farewell, the last weather balloon at the MOSAiC floe site on July 31, after the final break-up of the MOSAiC floe.

Happy scientists on the ice… even at 1:00 in the morning! This was the midpoint of a 24 hr sampling cycle and I think summarizes the energy brought by the Leg 4 team very well. Pictured left to right: Ale, UiT Post-doc Jessie Gardner, myself, and our fearless bear-guard Tereza Svecova.

Posted By: Jeff on July 07, 2020

I was fortunate to have the chance to talk about some of our experiences on MOSAiC Leg 3 with Paul Vogelzang, host of the Smithsonian's Not Old - Better Show podcast. The full interview can be found here:

Posted By: Emelia Chamberlain on July 04, 2020

PhD student Emelia Chamberlain sends the following dispatch from Polarstern.

Operating an international expedition in the remote central Arctic would always be a logistically taxing endeavor. Operating an international expedition in the remote central Arctic during a global pandemic is that much more challenging. But through the incredible perseverance of a delayed Leg 3 team and hard work from the dedicated logistical teams at the Alfred Wegener Institute in Germany, MOSAiC Leg 4 is underway!

Our NSF-funded project work on MOSAiC will be carried out on this leg by URI post-doc Alessandra D'Angelo and myself. In a reshuffling of planned operations, I found myself headed north almost a month earlier than expected (and she almost a month later). On May 1, 2020, approximately one hundred scientists and crew began two weeks of quarantine in a local hotel in Bremerhaven. For the first week we were in total isolation within individual hotel rooms. Meals were brought to our door by the incredible hotel staff. We took two coronavirus tests, the first on arrival and the second seven days later, and were to remain in individual quarantine until the results of the second test came back. Thankfully, everyone tested negative, and after seven long days we entered Phase 2: group quarantine. Still weeks and 3,000 km away from meeting the Polarstern, MOSAiC Leg 4 had finally begun. Tentatively at first, we emerged from our rooms to gather for meals, planning, and, perhaps most importantly, safety briefings. While we were able to be around each other, we still practiced strict social distancing precautions.

Here I play the hypothermia victim during an Arctic Safety training. During melt season, falling into the freezing Arctic waters poses one of the greatest dangers while working out on the ice.

On this leg, I will serve as a member of the BioGeoChemistry team – we are principally interested in studying how climate relevant gasses cycle through the central Arctic’s ocean-ice-atmosphere system. Pictured here is myself, URI post-doc and Leg 4 BGC team lead Alessandra D’Angelo, and U. Groningen post-doc Deborah Bozzato. The fourth and final member of our team, Falk Pätzold, was already onboard Polarstern having also participated in Leg 3.

After another 10 days, and negative test results for all, we began our journey Northward on the R/V Maria S. Merian and R/V Sonne to meet with the MOSAiC workhorse, R/V Polarstern. While overall lovely research vessels, the MS Merian and Sonne are not icebreakers, meaning the Polarstern would need to travel south of the ice edge to refuel and make the personnel exchange. Therefore, all ships coordinated to meet in Adventfjorden, Svalbard. It required a lot of time, but it was a beautiful journey. Due to some unexpected ice-pressure preventing southerly travel we ended up reaching the designated meeting point (near Longyearbyen, Norway) almost 2 weeks earlier than Polarstern! I spent most of the time planning, catching up on some coding, and working out in the many group sporting activities being held on board, all in prep for the labor-intensive ice activities. (Zumba and yoga are of course perfect analogs for pulling sledges of equipment across slushy sea ice…)

Turns out, scientists have a lot of luggage – most of it being heavy scientific equipment. After several iterations of moving all of this gear on and off multiple vessels (busses, ships, etc.), we have the assembly line down pat.

“C is for container!” The container behind this massive pile of luggage was my home for most of my time on the Merian. Between the two ships there were not enough cabins for the entire Leg 4 team so a few of us were tucked into these cozy make-shift apartments. To commemorate the experience, we took a “container crew” photo during move out.

Polarstern arriving in Svalbard.

Once the Polarstern finally arrived in Adventfjorden, a flurry of handover activities commenced. We met with the Leg 3 team, explored the labs, reviewed protocols, and heard all about their experience on the ice. Prior to departing the MOSAiC ice floe, there was a dynamic shift in the ice leading to some sampling sites splitting from the rest. This, however, is not surprising given the floe's fast trajectory south and the onset of the summer melt-season. The ice drift (which can be tracked here in real time) has brought the MOSAiC floe to approximately 82º latitude, air temperatures mostly remain above freezing, and surface water is currently measured at -1.7ºC (warm for under-ice). With polar summer in full swing and such exciting ice dynamics, I look forward to getting back to the floe and tracking the ecological changes through this transition! But first, we must get there. I’m hoping to utilize this next phase of transit to explore the R/V Polarstern. Since it will be my new home for the next ~4.5 months, I suppose I should learn how to find my way from the lab to the mess hall without getting lost…

Before the ships pulled side by side and deployed the gangplank, people were ferried from ship to ship by small boats in order to make the most out of the few days we had for knowledge transfer and handover activities.

While checking out our lab onboard Polarstern, I ran into Igor, the lab’s totem. Thus far he has done a pretty good job of keeping the instruments in check and running, but I hope that he will be pleased by my bringing him a friend.

Leg 4 waves goodbye from a departing Polarstern to those Leg 3 participants traveling home onboard the Merian. Bye Svalbard, to the North!

Posted By: Jeff on June 11, 2020

Many thanks to Michelle Babcock, communications specialist for the OAST project, for producing this great video of our 2019 field effort at the South Bay Salt Works in San Diego. This field effort was supposed to be a low-key opportunity for the OAST team to practice working together in the field, but the ideal nature of the site and some of our preliminary findings have elevated it to a primary field site. It’s a good thing too… with COVID-19 making a hash of international travel plans it’s the one place we know we can reach!

Posted By: Jeff on May 16, 2020

A common exercise in environmental microbiology is counting bacterial cells with an epifluorescent microscope. During my PhD I spent many hours hunched over a microscope in a darkened room, contemplating which points of light were bacteria (and should thus be counted) and which were not. With a large cup of coffee from Seattle’s U District Bean and Bagel and an episode of “This American Life” playing in the background it’s not a bad way to spend the day. But it definitely felt like a procedure that needed some technological enhancement. My biggest concerns were always objectivity and reproducibility; it’s really hard to determine consistently which “regions of interest” (ROIs) to count. There are of course commercial software packages for identifying and counting ROIs in a microscope image. But why pay big money for a software subscription when you can write your own script? I had some free time during our slow transit to Polarstern during MOSAiC Leg 3 and thought I’d give it a try. The following tutorial borrows heavily from the image segmentation example in Wehrens and Kruisselbrink, 2018.

We start with a png image from a camera attached to a microscope. The green features are bacteria and phytoplankton that have been stained with Sybr Green. These are the ROIs that we’d like to identify and count. The image quality is actually pretty bad here; this was made with the epifluorescent scope at Palmer Station back in 2015, and the scope and camera needed some TLC. It turns out that I don’t actually have many such images on my laptop, and I can’t simply go and make a new one because we aren’t allowed in the lab right now! Nonetheless the quality is sufficient for our purposes.

First, we need to convert the image into an RGB matrix that we can work with. I’m sure there’s a way to do this in R, but I couldn’t find an expedient method. Python made it easy.

## convert image to two matrices: a 3 column RGB matrix and
## 2 column xy matrix

import matplotlib.image as mpimg

name = '15170245.png'
name = name[:-len('.png')] # strip the extension (rstrip('.png') would also eat any trailing p, n, or g characters)

img = mpimg.imread(name + '.png')

with open(name + '.rgb4r.csv', 'w') as rgb_out, open(name + '.xy.csv', 'w') as xy_out:
    for i in range(0, img.shape[1]):
        for j in range(0, img.shape[0]):
            print(img[j, i, 0], img[j, i, 1], img[j, i, 2], sep = ',', file = rgb_out)
            print(i + 1, j + 1, sep = ',', file = xy_out)
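For larger images the nested per-pixel loop can get slow. As a sketch of a vectorized alternative (not part of the original workflow; the `flatten_image` helper is hypothetical), the same two matrices can be built with numpy and then written out with `np.savetxt(..., delimiter=',')`:

```python
import numpy as np

def flatten_image(img):
    """Flatten an (h, w, channels) image array into the same column-major
    RGB matrix and 1-based XY matrix that the per-pixel loop writes out."""
    h, w = img.shape[:2]
    rgb = img[:, :, :3].transpose(1, 0, 2).reshape(-1, 3)  # rows are img[j, i] in i-major order
    xs, ys = np.meshgrid(np.arange(1, w + 1), np.arange(1, h + 1), indexing='ij')
    xy = np.column_stack([xs.ravel(), ys.ravel()])
    return rgb, xy
```

The row order matches the loop above, so the downstream R code should see identical CSVs.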

Next we break out R to do the actual analysis (which yes, could be done in Python…). The basic strategy is to use a self organizing map (SOM) with 2 layers. One layer will be color, the other will be position. We’ll use this information to identify distinct classes corresponding to diagnostic features of ROIs. Last, we’ll iterate across all the pixels that appear to belong to ROIs and attempt to draw an ellipse around each group of pixels that makes up a ROI. First, we read in the data produced by the Python script:

scope.rgb <- read.csv('15170245.rgb4r.csv', header = F)
scope.xy <- read.csv('15170245.xy.csv', header = F)

colnames(scope.rgb) <- c('red', 'green', 'blue')
colnames(scope.xy) <- c('X', 'Y')

Then we define a function to render the image described by these matrices:

plotImage <- function(scope.xy, scope.rgb){
  image.col <- rgb(scope.rgb[,"red"], scope.rgb[,"green"], scope.rgb[,"blue"], maxColorValue = 1)
  x.dim <- max(scope.xy$X)
  y.dim <- max(scope.xy$Y)

  temp.image <- 1:dim(scope.rgb)[1]
  dim(temp.image) <- c(y.dim, x.dim)

  image(temp.image,
        col = image.col,
        ylab = paste0('Y:', y.dim),
        xlab = paste0('X:', x.dim))
}

## give it a test

plotImage(scope.xy, scope.rgb)

You'll note that the function flips the image. While annoying, this doesn't matter at all for identifying ROIs. If it bothers you go ahead and tweak the function :). Now we need to train our SOM. The SOM is what does the heavy lifting of identifying different types of pixels in the image.

#### train som ####

library(kohonen)

som.grid <- somgrid(10, 10, 'hexagonal')
som.model <- supersom(list('rgb' = data.matrix(scope.rgb), 'coords' = data.matrix(scope.xy)), whatmap = c('rgb', 'coords'), user.weights = c(1, 9), grid = som.grid)

Now partition the SOM into k classes with k-means clustering. The value for k has to be determined experimentally but should be consistent for all the images in a set (i.e. a given type of microscopy image).
 <- som.model$codes[[1]] <- kmeans(, centers = 6)

## Get mean colors for clusters (classes)

class.col <- c()

for(k in 1:max($cluster)){ <-[which($cluster == k),]
  temp.col <- colMeans(
  temp.col <- rgb(temp.col[1], temp.col[2], temp.col[3], maxColorValue = 1)
  class.col <- append(class.col, temp.col)
}

## Make a plot of the som with the map units colored according to mean color
## of owning class.

plot(som.model,
     type = 'mapping',
     bg = class.col[$cluster],
     keepMargins = F,
     col = NA)

text(som.model$grid$pts, labels =$cluster)

Here's where we have to be a bit subjective. We need to make an informed decision about which classes constitute ROIs. Looking at this map I'm gonna guess 3 and 6. The classes and structure of your map will of course be totally different, even if you start with the same training image. To make use of this information we first predict which classes our original pixels belong to.

## predict for RGB only

image.predict <- predict(som.model, newdata = list('rgb' = data.matrix(scope.rgb)), whatmap = 'rgb')

Then we identify those pixels that belong to the classes we think describe ROIs.

## select units that correspond to ROIs

target.units = which($cluster %in% c(3, 6))
target.pixels <- scope.xy[which(image.predict$unit.classif %in% target.units), c('X', 'Y')]

Now comes the tricky bit. Knowing which pixels belong to ROIs isn't actually that useful, as each ROI is composed of many different pixels. So we need to aggregate the pixels into ROIs. Again, this requires a little experimentation, but once you figure it out for a given sample type it should work consistently. The key parameter here is "resolution" which we define as how far apart two pixels of the same class need to be to constitute different ROIs. The value 20 seems to work reasonably well for this image.

## loop through all pixels. if there's a pixel within n distance of it, check to
## see if that pixel belongs to an ROI. If so, add the new pixel to that area. If not,
## create a new ROI. Currently a pixel can be "grabbed" by an ROI produced later.

findROI <- function(resolution = 20){
  roi <- 1
  roi.pixels <- = dim(target.pixels)[1], ncol = 3))
  colnames(roi.pixels) <- c('X', 'Y', 'roi')

  for(i in 1:dim(target.pixels)[1]){

    if([i, 'roi'])){
      pixel.x <- target.pixels[i, 'X']
      pixel.y <- target.pixels[i, 'Y']
      nns <- which(abs(target.pixels[, 'X'] - pixel.x) < resolution & abs(target.pixels[, 'Y'] - pixel.y) < resolution)

      roi.pixels[nns, c('X', 'Y')] <- target.pixels[nns, c('X', 'Y')]
      roi.pixels[nns, 'roi'] <- roi
      roi <- roi + 1
    }
  }
  return(roi.pixels)
}

roi.pixels <- findROI()
roi.table <- table(roi.pixels$roi)
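The ROI count is then just the number of entries in roi.table. For illustration, the same greedy grouping logic can be sketched in Python (a hypothetical `find_roi` re-implementation, assuming an (n, 2) array of target pixel coordinates):

```python
import numpy as np

def find_roi(pixels, resolution=20):
    """Greedy grouping of ROI pixels: each unassigned pixel seeds a new
    ROI and grabs every pixel within `resolution` in both X and Y (so,
    as in the R version, a pixel can be re-grabbed by a later ROI)."""
    pixels = np.asarray(pixels, dtype=float)
    roi = np.full(len(pixels), -1)
    next_roi = 0
    for i in range(len(pixels)):
        if roi[i] == -1:
            near = (np.abs(pixels[:, 0] - pixels[i, 0]) < resolution) & \
                   (np.abs(pixels[:, 1] - pixels[i, 1]) < resolution)
            roi[near] = next_roi
            next_roi += 1
    return roi
```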

To evaluate our discovery of ROIs we plot an ellipse around each ROI in the original image.

## approximate each roi as an ellipse. need x, y, a, b

library(plotrix) # provides draw.ellipse

plotROI <- function(roi.pixels){

  for(roi in unique(roi.pixels$roi)){
    temp.pixels <- roi.pixels[which(roi.pixels$roi == roi),]
    temp.a <- max(temp.pixels$X) - min(temp.pixels$X)
    temp.b <- max(temp.pixels$Y) - min(temp.pixels$Y)
    temp.x <- mean(temp.pixels$X)
    temp.y <- mean(temp.pixels$Y)

    draw.ellipse(temp.x, temp.y, temp.a, temp.b, border = 'red')
  }
}

plotImage(scope.xy, scope.rgb)
plotROI(roi.pixels)

It certainly isn't perfect; the two chained diatoms in particular threw off our count. We did, however, do a reasonable job of finding all the small ROIs that represent the smaller, harder-to-count cells. So how does the model perform for ROI identification on a new image? Here's a new image acquired with the same exposure settings on the same scope. We use the same Python code to convert it to RGB and XY matrices.

## convert image to two matrices: a 3 column RGB matrix and
## 2 column xy matrix

import matplotlib.image as mpimg

name = '14000740.png'
name = name[:-len('.png')] # strip the extension

img = mpimg.imread(name + '.png')

with open(name + '.rgb4r.csv', 'w') as rgb_out, open(name + '.xy.csv', 'w') as xy_out:
    for i in range(0, img.shape[1]):
        for j in range(0, img.shape[0]):
            print(img[j, i, 0], img[j, i, 1], img[j, i, 2], sep = ',', file = rgb_out)
            print(i + 1, j + 1, sep = ',', file = xy_out)

Then we predict and plot.

scope.rgb <- read.csv('14000740.rgb4r.csv', header = F)
scope.xy <- read.csv('14000740.xy.csv', header = F)

colnames(scope.rgb) <- c('red', 'green', 'blue')
colnames(scope.xy) <- c('X', 'Y')

plotImage(scope.xy, scope.rgb)
image.predict <- predict(som.model, newdata = list('rgb' = data.matrix(scope.rgb)), whatmap = 'rgb') # predict for rgb only

target.units = which($cluster %in% c(3,6))
target.pixels <- scope.xy[which(image.predict$unit.classif %in% target.units), c('X', 'Y')]

roi.pixels <- findROI()
roi.table <- table(roi.pixels$roi)

plotImage(scope.xy, scope.rgb)
plotROI(roi.pixels)

Not bad! Again, it isn't perfect; some ROIs are grouped together and some are missed (largely a function of the variable focal plane). These can be fixed by experimenting with the model parameters and resolution. Of course, if you had hundreds of such images, perhaps representing multiple randomly selected fields from many samples, you could process in a few minutes what would take many hours to count.

At this point I can feel the collective judgement of every environmental microbiologist since van Leeuwenhoek for promoting a method that might reduce the serendipitous discovery that comes with spending hours staring through a microscope. So here's a reminder to spend time getting familiar with your samples under the microscope, regardless of how you identify ROIs!

Posted By: Jeff on May 10, 2020

Self-organizing maps (SOMs) are a form of neural network and a wonderful way to partition complex data.  In our lab they’re a routine part of our flow cytometry and sequence analysis workflows, but we use them for all kinds of environmental data (like this).  All of the mainstream data analysis languages (R, Python, Matlab) have packages for training and working with SOMs.  My favorite is the R package Kohonen, which is simple to use but can support some fairly complex analysis through SOMs with multiple data layers and supervised learning (superSOMs).  The Kohonen package has a nice, very accessible paper that describes its key features, and some excellent examples.  This tutorial applies our basic workflow for a single-layer SOM to RGB color data.  RGB color space segmentation is a popular way to evaluate machine learning algorithms, as it is intrinsically multi-variate and inherently meaningful.  Get like colors grouping together and you know that you’ve set things up correctly!

This application of SOMs has two steps.  Each of these steps can be thought of as an independent data reduction step.  It’s important to remember that you’re not reducing dimensions per se, as you would in a PCA, you’re aggregating like data so that you can describe them as convenient units (instead of n individual observations).  The final outcome, however, represents a reduction in dimensionality to a single parameter for all observations (e.g., the color blue instead of (0, 0, 255) in RGB colorspace).  The first step – training the SOM – assigns your observations to map units.  The second step – clustering the map units into classes – finds the structure underlying the values associated with the map units after training.  At the end of this procedure each observation belongs to a map unit, and each map unit belongs to a class.  Thus each observation inherits the class of its associated map unit.  If that’s not clear don’t sweat it.  It will become clear as you go through the procedure.
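To make the two steps concrete before diving into R, here is a minimal numpy sketch of the whole procedure: train a tiny SOM on random RGB data, cluster the resulting codebook vectors with a crude k-means, and let each observation inherit the class of its winning map unit. All sizes and decay schedules here are illustrative assumptions; the R workflow below uses the kohonen package instead.

```python
import numpy as np

rng = np.random.default_rng(42)

# training data: random RGB vectors (stand-in for the R example)
data = rng.integers(0, 256, size=(1000, 3)).astype(float)

# a 5 x 5 map: each unit has a grid position and a codebook vector
gx, gy = np.meshgrid(np.arange(5), np.arange(5))
grid = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
codebook = rng.uniform(0, 255, size=(len(grid), 3))

# step 1: train the SOM. The best matching unit (BMU) and its neighbors
# are pulled toward each randomly drawn observation; the neighborhood
# radius and learning rate decay over time.
n_iter = 2000
for t in range(n_iter):
    x = data[rng.integers(len(data))]
    bmu = ((codebook - x) ** 2).sum(axis=1).argmin()
    frac = t / n_iter
    sigma = 2.0 * (1 - frac) + 0.5 * frac   # neighborhood radius decays
    lr = 0.5 * (1 - frac) + 0.01 * frac     # learning rate decays
    d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
    h = np.exp(-d2 / (2 * sigma ** 2))      # neighborhood function on the grid
    codebook += lr * h[:, None] * (x - codebook)

# step 2: cluster the trained codebook vectors into classes (crude k-means)
k = 4
centers = codebook[rng.choice(len(codebook), k, replace=False)]
for _ in range(25):
    labels = ((codebook[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
    for j in range(k):
        if (labels == j).any():
            centers[j] = codebook[labels == j].mean(axis=0)

# each observation inherits the class of its winning map unit
obs_bmu = ((data[:, None] - codebook[None]) ** 2).sum(axis=2).argmin(axis=1)
obs_class = labels[obs_bmu]
```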

First, let’s generate some random RGB data.  This takes the form of a three column matrix where each row is a pixel (i.e. an observation).

#### generate some RGB data ####

## select the number of random RGB vectors for training data

sample.size <- 10000

## generate dataframe of random RGB vectors

sample.rgb <- = sample.size, ncol = 3))
colnames(sample.rgb) <- c('R', 'G', 'B')

sample.rgb$R <- sample(0:255, sample.size, replace = T)
sample.rgb$G <- sample(0:255, sample.size, replace = T)
sample.rgb$B <- sample(0:255, sample.size, replace = T)

Next, we define a map space for the SOM and train the model.  Picking the right grid size for the map space is non-trivial; you want about 5 elements from the training data per map unit, though you’ll likely find that they’re not uniformly distributed.  Unless you have a very small training dataset, it’s best to use a symmetrical map, hexagonal map units, and a toroidal shape.  The latter is important to avoid edge effects (a toroid has no edges).
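For the 10,000-observation training set generated above, the grid-size heuristic works out as follows (plain arithmetic, shown in Python for concreteness):

```python
import math

sample_size = 10000
grid_size = math.ceil(sample_size ** (1 / 2.5))  # 40 units per side
units = grid_size ** 2                           # 1600 map units total
per_unit = sample_size / units                   # 6.25 observations per unit
```

That lands close to the ~5 observations per map unit target.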

One important caveat for the RGB data is that we’re not going to bother with any scaling or normalizing.  The parameters are all on the same scale and evenly distributed between 0 and the max value of 255.  Likely your data are not so nicely formed! 

#### train the SOM ####

library(kohonen)

## define a grid for the SOM and train

grid.size <- ceiling(sample.size ^ (1/2.5))
som.grid <- somgrid(xdim = grid.size, ydim = grid.size, topo = 'hexagonal', toroidal = T)
som.model <- som(data.matrix(sample.rgb), grid = som.grid)

Once you’ve trained the SOM it’s a good idea to explore the output of the `som` function to get a feel for the different items in there.  The output takes the form of nested lists.  Here we extract a couple of items that we’ll need later, and also create a distance matrix of the map units.  We can do this because the fundamental purpose of map units is to have a codebook vector that mimics the structure of the training data.  During training each codebook vector is iteratively updated along with its neighbors to match the training data.  After sufficient iterations the codebook vectors reflect the underlying structure of the data.
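As a quick illustration of the distance-matrix step, this is the numpy equivalent of R's `as.matrix(dist(codebook))` (the codebook here is just a random stand-in):

```python
import numpy as np

# random stand-in for trained codebook vectors (16 map units x RGB)
codes = np.random.default_rng(2).uniform(0, 255, size=(16, 3))

# pairwise Euclidean distances between the map units' codebook vectors
diff = codes[:, None, :] - codes[None, :, :]
som_dist = np.sqrt((diff ** 2).sum(axis=2))
```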

## extract some data to make it easier to use <- som.model$codes[[1]] <- rgb([,1],[,2],[,3], maxColorValue = 255)
som.dist <- as.matrix(dist(

Now that we have a trained SOM let’s generate a descriptive plot.  Since the data are RGB colors, if we color the plot accordingly it should be sensible.  For comparison, we first create a plot with randomized codebook vectors.  This represents the SOM at the start of training.

## generate a plot of the untrained data. this isn't really the configuration at first iteration, but
## serves as an example

plot(som.model,
     type = 'mapping',
     bg = sample(, size = length(,
     keepMargins = F,
     col = NA,
     main = '')

And now the trained SOM:

## generate a plot after training.

plot(som.model,
     type = 'mapping',
     bg =,
     keepMargins = F,
     col = NA,
     main = '')

So pretty! The next step is to cluster the map units into classes.  As with all clustering analysis, a key question is how many clusters (k) should we define?  One way to inform our decision is to evaluate the distance between all items assigned to each cluster for many different values of k.  Ideally, creating a scree plot of mean within-cluster distance vs. k will yield an inflection point that suggests a meaningful value of k.  In practice this inflection point is extremely sensitive to the size of the underlying data (in this case, the number of map units); however, it can be a useful starting place.  Consider that the RGB data were generated randomly, meaning that there is no underlying cluster structure!  Nonetheless we still get an inflection point.
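The same scree evaluation can be sketched in Python with a minimal hand-rolled k-means (all names here are illustrative, and the codebook is a random stand-in for trained map units):

```python
import numpy as np

def kmeans(data, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm; no special handling for empty clusters."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(n_iter):
        labels = ((data[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = data[labels == j].mean(axis=0)
    return labels

def mean_within_dist(data, labels):
    """Mean pairwise distance within each cluster, averaged over clusters."""
    vals = []
    for c in np.unique(labels):
        members = data[labels == c]
        if len(members) > 1:
            d = np.sqrt(((members[:, None] - members[None]) ** 2).sum(axis=2))
            vals.append(d.mean())
        else:
            vals.append(0.0)
    return float(np.mean(vals))

codes = np.random.default_rng(1).uniform(0, 255, size=(100, 3))  # stand-in codebook
scree = [mean_within_dist(codes, kmeans(codes, k)) for k in range(2, 11)]
# mean within-cluster distance generally falls as k grows; look for the elbow
```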

#### look for a reasonable number of clusters ####

## Evaluate within cluster distances for different values of k. This is
## more dependent on the number of map units in the SOM than the structure
## of the underlying data, but until we have a better way...

## Define a function to calculate mean distance within each cluster. This
## is roughly analogous to the within clusters ss approach

clusterMeanDist <- function(clusters){
  cluster.means = c()

  for(c in unique(clusters)){
    temp.members <- which(clusters == c)

    if(length(temp.members) > 1){
      temp.dist <- som.dist[temp.members,]
      temp.dist <- temp.dist[,temp.members]
      cluster.means <- append(cluster.means, mean(temp.dist))
    }else{
      cluster.means <- append(cluster.means, 0)
    }
  }
  return(mean(cluster.means))
}

library(vegan) # for vegdist

try.k <- 2:100
cluster.dist.eval <- = 3, nrow = (length(try.k))))
colnames(cluster.dist.eval) <- c('k', 'kmeans', 'hclust')

for(i in 1:length(try.k)) {
  cluster.dist.eval[i, 'k'] <- try.k[i]
  cluster.dist.eval[i, 'kmeans'] <- clusterMeanDist(kmeans(, centers = try.k[i], iter.max = 20)$cluster)
  cluster.dist.eval[i, 'hclust'] <- clusterMeanDist(cutree(hclust(vegdist(, k = try.k[i]))
}

plot(cluster.dist.eval[, 'kmeans'] ~ try.k,
     type = 'l')

lines(cluster.dist.eval[, 'hclust'] ~ try.k,
      col = 'red')

legend('topright',
       legend = c('k-means', 'hierarchical'),
       col = c('black', 'red'),
       lty = c(1, 1))

Having picked a reasonable value for k (let’s say k = 20) we can evaluate different clustering algorithms.  For our data k-means almost always performs best, but you should choose what works best for your data.  Here we’ll evaluate k-means, hierarchical clustering, and model-based clustering.  What we’re looking for in the plots is a clustering method that produces contiguous classes.  If classes are spread all across the map, then the clustering algorithm isn’t capturing the structure of the SOM well.

#### evaluate clustering algorithms ####

## Having selected a reasonable value for k, evaluate different clustering algorithms.


## Define a function to make a simple plot of clustering output.
## This is the same as the previous plotting, but we define the function
## here as we wanted to play with the color earlier.

plotSOM <- function(clusters){
  plot(som.model,
       type = 'mapping',
       bg =,
       keepMargins = F,
       col = NA)

  add.cluster.boundaries(som.model, clusters)
}

## Try several different clustering algorithms, and, if desired, different values for k

library(pmclust) # provides the model-based clustering algorithms

cluster.tries <- list()

for(k in c(20)){

  ## model based clustering using pmclust <- pmclust(, K = k, algorithm = 'em')$class <- pmclust(, K = k, algorithm = 'aecm')$class <- pmclust(, K = k, algorithm = 'apecm')$class <- pmclust(, K = k, algorithm = 'apecma')$class <- pmclust(, K = k, algorithm = 'kmeans')$class

  ## k-means clustering

  som.cluster.k <- kmeans(, centers = k, iter.max = 100, nstart = 10)$cluster

  ## hierarchical clustering

  som.dist <- dist( # hierarchical, step 1
  som.cluster.h <- cutree(hclust(som.dist), k = k) # hierarchical, step 2

  ## capture outputs

  cluster.tries[[paste0('', k)]] <-
  cluster.tries[[paste0('', k)]] <-
  cluster.tries[[paste0('', k)]] <-
  cluster.tries[[paste0('', k)]] <-
  cluster.tries[[paste0('', k)]] <-
  cluster.tries[[paste0('som.cluster.k.', k)]] <- som.cluster.k
  cluster.tries[[paste0('som.cluster.h.', k)]] <- som.cluster.h
}

## Take a look at the various clusters. You're looking for the algorithm that produces the
## least fragmented clusters.

for(cluster.try in names(cluster.tries)){
  plotSOM(cluster.tries[[cluster.try]])
}


For brevity I’m not showing the plots produced for all the different clustering algorithms. For these data the k-means and hierarchical clustering algorithms both look pretty good; I have a slight preference for the k-means version:

The SOM and final map unit clustering represent a classification model that can be saved for use with later data.  One huge advantage to using SOMs over other analysis methods (e.g., ordination techniques) is their usefulness for organizing newly collected data.  New data, scaled and normalized in the same way as the training data if necessary, can be classified by finding the map unit with the minimum distance to the new observation.  To demonstrate this, we’ll generate and classify a small new RGB dataset (in reality classifying in this way is very efficient, and could accommodate a huge number of new observations).  First, we save the SOM and final clustering.
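Classification of new data is just a nearest-codebook-vector lookup. A minimal numpy sketch, where the codebook and per-unit cluster labels are random stand-ins for the saved model:

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-ins for the saved model: trained codebook vectors and the
# class (cluster) assigned to each of 100 map units
codebook = rng.uniform(0, 255, size=(100, 3))
unit_class = rng.integers(0, 20, size=100)

# classify new observations: the winning unit is the codebook vector at
# minimum Euclidean distance, and each observation inherits its class
new_data = rng.uniform(0, 255, size=(5, 3))
d = ((new_data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
winners = d.argmin(axis=1)
classes = unit_class[winners]
```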

## The SOM and map unit clustering represent a classification model. These can be saved for
## later use.

som.cluster <- som.cluster.k
som.notes <- c('Clustering based on k-means')

save(file = 'som_model_demo.Rdata', list = c('som.cluster', 'som.notes', 'som.model', ''))

Then we generate new RGB data, classify it, and make a plot to compare the original data, the color of the winning map unit, and the color of the cluster that map unit belongs to.

#### classification ####

## make a new dataset to classify ## <- 20 <- =, ncol = 3))
colnames( <- c('R', 'G', 'B')$R <- sample(0:255,, replace = T)$G <- sample(0:255,, replace = T)$B <- sample(0:255,, replace = T)

## get the closest map unit to each point <- map(som.model, newdata = data.matrix(

## get the classification for closest map units <- som.cluster[$unit.classif]

## compare colors of the new data, unit, and class, first define a function
## to calculate the mean colors for each cluster

clusterMeanColor <- function(clusters){
  cluster.means = c() <- som.model$codes[[1]]

  for(c in sort(unique(clusters))){
    temp.members <- which(clusters == c)

    if(length(temp.members) > 1){ <-[temp.members,]
      temp.means <- colMeans(
      temp.col <- rgb(temp.means[1], temp.means[2], temp.means[3], maxColorValue = 255)
      cluster.means <- append(cluster.means, temp.col)
    }else{ <-[temp.members,]
      temp.col <- rgb([1],[2],[3], maxColorValue = 255)
      cluster.means <- append(cluster.means, temp.col)
    }
  }
  return(cluster.means)
}

class.colors <- clusterMeanColor(som.cluster)

plot(1:length($R), rep(1, length($R)),
col = rgb($R,$G,$B, maxColorValue = 255),
ylim = c(0,4),
pch = 19,
cex = 3,
xlab = 'New data',
yaxt = 'n',
ylab = 'Level')

axis(2, at = c(1, 2, 3), labels = c('New data', 'Unit', 'Class'))

points(1:length($unit.classif), rep(2, length($unit.classif)),
col =[$unit.classif],
pch = 19,
cex = 3)

points(1:length(, rep(3, length(,
col = class.colors[],
pch = 19,
cex = 3)

Looks pretty good! Despite defining only 20 classes, the class level seems to be a reasonable representation of the original data. Only slight differences in color can be observed between the data, winning map unit, and class.

Posted By: Jeff on May 02, 2020

Where to begin… I started writing this post several days ago on a nearly empty plane flying from Charlotte, N.C. to San Diego. That flight marked the end of MOSAiC Leg 3 for me, though Leg 3 will continue for several more weeks. We were supposed to be done on April 4, however, the COVID-19 pandemic and dynamic sea ice conditions pretty well hashed that plan. I took advantage of a single opportunity to leave the Polarstern early (meaning only a couple of weeks late) to help with the childcare situation at home. The remaining brave and dedicated crew and scientists are expected to return by ship to Europe in late May.

Tromsø, Norway. High on my list of nice places.

Our journey began in Tromsø, Norway in the innocent days of January when COVID-19 seemed like a regional rather than global problem. We had several Chinese expedition members on Leg 3, but they all made it out just fine and – though they were concerned about the situation back home – we figured things would improve in due time. After a week of safety training we were transported to the Russian icebreaker Kapitan Dranitsyn for what we thought would be a 2-3 week voyage to the Polarstern.

Leg 3 aboard the Kapitan Dranitsyn in Tromsø on January 28.

A lot of thought and effort went into determining how the MOSAiC scientists and crew would reach Polarstern throughout the drift. During the winter a conventional icebreaker of Dranitsyn‘s class is not the best way to reach a location in the central Arctic, however, it was the best among the affordable and available options. And in the end it did the job. I’m not sure of the history, but I’m fairly certain that its feat of reaching Polarstern in the dead of winter is unprecedented.

The Dranitsyn plows through “mild” winter weather in the Barents Sea on February 5.

We spent a week in a pleasant fjord just outside of Tromsø waiting for a weather break to cross the Barents Sea to the pack ice. The Barents Sea is notoriously stormy, and we needed wave heights below 4 m to attempt the crossing. Icebreakers don’t ride well in heavy seas, and there were refrigerated shipping containers carrying (our) food for Polarstern stored on deck in the bow. We couldn’t take big waves on the bow, nor could we tolerate much ice formation on the chilling units. Both happened anyway. But at least the crossing was relatively short, and within 72 hours we were in the pack ice.

Dranitsyn on February 6, shortly after entering the pack ice.

The Dranitsyn ended up being a bigger part of Leg 3 than any of us imagined at the time. Had our departure from Polarstern gone as expected, we would have been in transit on Dranitsyn as long as we were doing science on Polarstern. It was an interesting experience. Scientists are pretty good at keeping themselves busy; most of us have a long backlog of analyses to complete and papers to write, in addition to fielding emails from students and colleagues, and handling administrative matters. Absent the internet or email, however, a lot of these responsibilities disappeared. I worked on a proposal and finished a (3 year overdue) paper. Much of the rest of the 5 weeks we were on Dranitsyn I spent in discussion with my Leg 3 colleagues. It was something like an extended MOSAiC workshop. We covered everything from synthesizing across different science themes, to how time and on-ice resources would be shared between groups in a typical week.

The Dranitsyn on February 23, deep in the Arctic ice.

The roughly 2-week delay in reaching the Polarstern was due in large part to ice conditions. Despite previous expeditions onboard icebreakers I hadn’t really thought about it this way before, but the ice compresses and relaxes in response to tides, winds, and other external forces. When the ice is compressed it’s incredibly difficult for an icebreaker to break; there’s simply no place for the ice to be displaced to. In a relaxed state more leads are open, and there’s more space within the pack ice to accommodate the displaced ice as the ship moves through. Temperature also plays a role. Icebreakers are more typically used during the spring, summer, and fall, when maritime shipping in the Arctic is active and more research activities are taking place. Spring, summer, and fall sea ice is naturally much warmer than winter sea ice. Warmer sea ice is softer and breaks more easily, and the surface of the ice has less friction. This means an icebreaker can more easily ride up and over the ice to break it. With temperatures low and the ice in a “compressed” state it was a tough grind, as this video from our embedded videographers from the UFA production company shows:

Despite the conditions and some weeks of uncertainty we did eventually make it to Polarstern. Seeing that little point of light appear on the horizon after all those weeks of travel made me appreciate how alone we were up there at that time of year.

Dranitsyn approaching Polarstern on February 28. It was another week before we finally said good-bye to the Dranitsyn and settled into our new home in the Arctic.

After we reached Polarstern it took nearly a week to transfer cargo, exchange the scientists and crew, and become familiar enough with our tasks and the surroundings to take over. Many of the MOSAiC observations take place in what’s called the central observatory. This is an aggregation of relatively stable floes around Polarstern that house a number of on-ice installations. These include the colloquially named Met City, Balloon Town, Ocean City, and Droneville sites, among others. Beyond the central observatory are various “pristine” sites for snow and ice sampling, and beyond those lie the nodes of the less frequently visited distributed observatory. The distributed observatory is critical because it provides some spatial context to the intensive observations of the central observatory.

Preparing for a CTD cast on March 6, shortly after the departure of Dranitsyn. In between CTD casts the hole is covered by the Weatherhaven, which has to be picked up and moved out of the way for each deployment. Note the location of the gangway relative to the CTD hole (it’s going to change!). Immediately behind the CTD hole is the logistics zone for staging equipment. In the background, just behind the green flag, you can make out the Ocean City and Balloon Town sites.

We had about a week of “normal” operations before the central Arctic started throwing plot twists at us. The ice was surprisingly dynamic. The reasons behind this will, I think, be an important science outcome of MOSAiC. Thinner, rougher sea ice? More wind stress? Whatever the cause the ice cracked a lot. By chance we were located in what ice dynamicists call a “shear zone”, an area of enhanced kinetic energy within the pack. Here’s the first emergence of a crack, on March 11. You can see the logistics team scrambling to move the snow machines to a safer location. Over the next few weeks this crack grew into a major and ever-evolving lead.

Leg 3’s first encounter with a major crack in the central observatory, on March 11.

By April everyone was pretty used to cracks, leads, and ridges in the central observatory. Overall I was impressed with the resilience of the various teams as different installations were threatened, and in some cases destroyed. For the ATMOS (atmospheric sciences) team in particular, Leg 3 should have been relatively easy, low temps notwithstanding. Everyone expected consolidated ice and stable, well-established instruments and protocols. Instead, for a period of time there was a near-daily scramble to maintain power and infrastructure at the Met City site. The adjacent Remote Sensing site was enveloped by a large ridge system and had to be relocated to near the logistics area. Setting up these sites took specialists on Leg 1 many days; on Leg 3 systems had to be dismantled and re-established on the fly. Because spring is a particularly interesting time for atmospheric chemistry in the Arctic, the clock was ticking every time a site or instrument went down. The dedication and ingenuity of the scientists at the Met City and Remote Sensing sites was great to observe. The rest of us helped where we could, but we had our hands pretty full with other problems.

Polarstern crew wrangle power cables that have become trapped between the ship and the floe on March 23. Maintaining the power supply from the ship to the various on-ice installations was a huge challenge given the dynamic ice conditions.

At the top of the problems list was the loss of the hole for the main CTD/rosette system. The CTD is an instrument package that measures conductivity, temperature, and pressure (depth) along with other parameters. It is the fundamental tool in ship-based oceanography. The CTD is attached to a long conductive wire, and embedded within a rosette of sample bottles. The sample bottles are fired at specific depths to collect water for a number of analyses and experiments. Lots of parameters and projects were dependent on this sampling system, and a huge amount of effort had been expended to construct and maintain a hole in the ice for deploying it. On March 15, however, the ice shifted, pushing Polarstern forward. This caused superficial damage to the Weatherhaven covering the hole, but more significantly placed the hole out of reach of the crane that operates the CTD/rosette. Just like that we lost all of our capacity to sample below 1000 m (the central Arctic Ocean is deeper than 4000 m in most places) and to collect large volumes of water from any depth. All sampling had to shift to a much smaller system at the Ocean City site.

On March 15 shifting ice moves the Polarstern forward, rendering the main CTD hole useless.

Why couldn’t we simply make a new hole? It’s worth remembering that the ice in March is near its maximum thickness. It was roughly 160 cm thick when access to the main CTD hole was lost. This discounts any rafting of multiple ice floes, which was probable, and could easily double or triple the thickness. Assuming only a single layer of ice, the way to make a hole big enough for the CTD is with a chainsaw. The thickness of the ice that you can cut is limited by the size of the chainsaw bar. Maybe commercial logging operations have a 2 m bar, we certainly did not! You can cut the ice out in layers – I’ve had to do this in the Antarctic before – but the problem is that you create a bathtub as soon as you start cutting the final layer. To finish the job you’d need a snorkel for yourself and the chainsaw!

Sampling at Ocean City on March 10.

All our water column sampling had to shift to Ocean City, and focus on the upper 1000 m of the water column. Ocean City is the main site for the physical oceanography team. It was designed to accommodate a small team taking ultra-high resolution measurements of the surface ocean. The physical oceanographers went above and beyond sharing their space and resources, and I ended up thoroughly enjoying the time that I spent out at Ocean City. The below video was made on April 20 by lowering a GoPro through the CTD hole at Ocean City while one of the physical oceanographers was conducting high resolution profiling of temperature and salinity. You can see the microstructure profiler used for this near the end of the sequence.

In addition to the water column sampling we carried out sea ice sampling, when conditions allowed. To minimize the impact of light pollution from the vessel on the growth of sea ice algae our preferred ice coring sites were located some distance from the ship. Through the spring and summer, most of the photosynthesis taking place in the central Arctic occurs in the ice itself, rather than the water column. The ice algae have more consistent access to light than their planktonic counterparts, and are famously sensitive to even the lowest levels of light. Ambient light from the ship is more than enough to induce growth in the vicinity during the long polar night. Distance from the ship combined with the dynamic ice conditions created some access challenges.

Delicate maneuvering with a full load of ice cores on April 6. Photo: Eric Brossier.

Despite the access challenges we got some great ice core samples. We fielded two ice coring teams, one for first year ice and one for second year ice. I had the pleasure of working with the second year ice coring team. It was a great US-German-Russian collaborative effort, and we had some good times out there!

Laura Wischnewski and I section sea ice cores. Photo: Eric Brossier.

The combined Leg 3 first year and second year ice coring teams.

The original plan for exchanging Leg 3 with Leg 4 involved flying us all out on ski-equipped Antonov An-74 aircraft. This would have been a slick and expedient way to carry out the exchange. It also requires a pretty long runway and permissive global travel. By mid-March it was clear that both of these things were going to be an issue. I’ll be honest, there were some tense weeks where it wasn’t clear when and how Leg 3 would end, and what the future of MOSAiC would be. Kudos to cruise and expedition leadership for navigating us through the ups and downs. In particular AWI logistics had the difficult task of designing and scoping the possible solutions. They did an amazing job of iteratively working through a huge range of options to come up with the one that maximized science and minimized impacts on individual lives. But of course it was a compromise.

The current plan involves the Polarstern leaving the central observatory in mid-May for a rendezvous with one or more ships near Svalbard. The Leg 4 personnel (including Bowman Lab member Emelia Chamberlain) are already under strict quarantine in Bremerhaven, Germany. They’ll remain under quarantine until they depart for Svalbard at roughly the same time Polarstern leaves the central observatory. Once the crew has been exchanged, Leg 3 will sail for Germany and Leg 4 will begin the difficult task of re-establishing observations at the central observatory. An advantage of this plan is that it doesn’t require a complete breakdown of the central observatory. It will require, however, that many of the installations be partially disassembled for safety while Polarstern is away from the floe.

This is how you get a Twin Otter to the central Arctic.

There was one opportunity to leave Polarstern before the official Leg 3-4 exchange. After agonizing over it for a couple of days I decided I needed to take advantage of the opportunity. After an epic few weeks our project was in decent shape, and with two young kids and no school or daycare, attention needed to shift to the home front. On April 22 I stepped onto a Twin Otter operated by Kenn Borek Air Ltd. to begin the long flight home with six other expedition members. We flew to Station Nord in Greenland, then across the Canadian Arctic via Eureka-Resolute-Arctic Bay-Churchill-Toronto, and finally to the US.

Immigration: “You’re coming from where?”

Me: “Resolute”

Immigration: “What were you doing in Resolute?”

Me: “Just passing through, we were only there for a few minutes”

Immigration: “So where were you before that?”

Me: “Greenland, but again not very long. See there’s this ship…”

Immigration: “Uh, nevermind. Here’s your passport.”

Spectacular view of the northern Greenland coastline on the approach to Station Nord. Note the obvious interface between the landfast sea ice and the drifting pack ice. This feature is part of the circumpolar flaw-lead system and extended as far as we could see in either direction.

Posted By: Jeff on April 29, 2020

In a sign of the times here’s Ana Barral’s virtual presentation for the 2020 American Society for Biochemistry and Molecular Biology meeting (held remotely). It’s a nice summary of the initial results of our course-based undergraduate research experience (CURE) on microbes on ocean plastics.

Posted By: Emelia Chamberlain on March 20, 2020

Check out the lectures posted by our team as part of the MOSAiC affiliated Massive Open Online Course. This course, available via both Coursera and YouTube, was produced by the University of Colorado Boulder in partnership with the Alfred Wegener Institute and with funding from the National Science Foundation. It explores current research surrounding the Arctic ocean-ice-atmosphere system, as well as the questions driving the MOSAiC International Arctic Drift Expedition.


Module 5 Lectures 3-4: Learn from Jeff about the microbial communities living in sea ice.

Module 5 Lecture 7: Learn from Dr. Jessie Creamean (Leg 1 MOSAiC participant) about atmospheric aerosols in the Arctic.

Module 5 Lecture 8: Learn from Dr. Brice Loose (Co-PI of our NSF project) about why it is important to study the Arctic in the first place.

Check it out!! -> Coursera & YouTube

Posted By: Beth Connors on January 20, 2020

In the weeks leading up to starting at Scripps Institution of Oceanography, where I am now a first year PhD student, I often found myself – for better or worse – turning to Google for advice on navigating the next five years. I would google any anxiety-fueled question that popped into my brain, from “What is the hardest year of a PhD?” to “How competitive is a PhD cohort?” I found the best answers online were from older students, who used the space to reflect on their recent experiences and offer up what they had learned along the ride. In that spirit, I hope some recently accepted graduate student stumbles onto this as they furiously google, and that it offers comfort and (hopefully) wisdom for a fun and tumultuous transition.

Below is a list of five things I really learned – and relearned – as a first quarter grad student. I am only one quarter done with a potentially six-year degree (which, if we do some quick GRE math, means I am one quarter of one sixth done, which is a little over 4 percent), so this is by no means an exhaustive list. I’m actually really interested to see which of the five becomes more important as the years go on, and please comment if, as a graduate student, you think I missed something.

Five lessons from the first quarter of graduate school:

Find help. This I think was the most often repeated piece of advice I saw going into my first year, and it really holds up. Finding people who I could be myself around – people I could ask stupid questions of and lament failed experiments with – was essential to my first quarter survival. My best memories from a packed schedule of lab and class are from study groups and group lab coding sessions. I also got involved on campus once I felt settled, which was a great way to meet and work alongside older students and mentors.

Planning ahead saves money. I worked in a lab as an undergrad and as a tech once I graduated, but I was never in charge of purchasing lab equipment or supplies. This quarter marked my transition from being an ignorant consumer of lab supplies to a conscientious one, now that I know exactly how expensive it all is and how much money you can waste on a poorly designed experiment (extra hint: include controls!). More generally, the application process for PhDs includes applying for grants, which is just the beginning of learning to apply for and manage money as a researcher. I’ve realized I have a lot to learn about budgeting and management in my journey to become a successful scientist.

Grades aren’t everything anymore.  It was a hard habit to break, but all of your time shouldn’t be spent studying for your general first year classes. I learned to diversify how I obtained knowledge. Reading scientific papers and attending seminars from visiting professors were places where I learned the most this quarter. An afternoon spent reading a paper closely related to your research or an hour attending an interesting seminar often meant more to me than studying for a midterm in my more general oceanography classes.

Say yes. I am writing this from a ship off of the coast of Antarctica, where I am conducting field work, all because my advisor asked if I wanted to go and I said yes. Saying yes to collecting samples in the field is one example, but even to something simpler but still scary – like a surfing class or going to a social event where you don’t know anyone – just say yes.  

You will make mistakes. I think the biggest lesson I learned from the first few months of grad school was how often you make mistakes. It is a daily (sometimes hourly) part of life, in both lab and in class. I am still working on how to learn from and move past mistakes, both large and small.

I think since I am still in my first year I have yet to really experience burnout or writer’s block, which I know happen often to older PhD students. I feel so fortunate to be able to study and do science as a Bowman lab member at Scripps, and I hope my insight from a great first quarter helps put any prospective students that are reading this at ease.

Posted By: Jeff on January 09, 2020

Many thanks to atmospheric chemist extraordinaire Jessie Creamean for participating in our NSF project on MOSAiC Leg 1. Jessie’s participation allowed us to have a physical presence during the critical setup phase and freeze in. Her participation was a double win; she has her own DOE funded project with MOSAiC that didn’t include ship time, while our project needed a capable hand for Leg 1. I’ll be picking up where she left off when Leg 3 starts in just a couple of weeks. Jessie shared a few photos from Leg 1.

Jessie Creamean with the Akademik Fedorov in the background.

Checking out a crack in the ice. The ice has been much more dynamic than expected, creating some problems for installing the various observational instruments.

A beautiful view of Polarstern at the onset of the polar night.

Frost flowers! Still a special place in my heart after all these years…

The Russian icebreaker Kapitan Dranitsyn, which will be ferrying the Leg 2 and Leg 3 personnel to Polarstern.

Posted By: Jeff on December 20, 2019

Over the last few months volunteer diver Caitlyn Webster has been putting together a quick outreach video on our CURE-ing Microbes on Ocean Plastics project with National University. In addition to highlighting the project, Caitlyn provides a nice overview of the issue of plastics in the ocean, and some common misconceptions.

Posted By: Jeff on November 06, 2019

The San Diego Union Tribune published an article on our work at the South Bay Salt Works, part of the larger NASA-funded OAST project. Check it out here:

Salt flat research
Measuring salinity at the South Bay Salt Works (John Gibbins/The San Diego Union-Tribune)

Posted By: Jeff on October 23, 2019

We have a new paper published this week in Frontiers in Environmental Science on estimating ecosystem services along the western Antarctic Peninsula (WAP). This was one of the most challenging academic efforts I’ve been involved in, and is the culmination of nearly 5 years of effort since co-author Barbara Neumann and I conceived the idea during a serendipitous meeting at a Columbia-Kiel University workshop on marine science back when we were both postdocs.

Ecosystem services, the direct and indirect contributions of ecosystems to human well-being, is a concept that’s received a lot of attention as a critical abstraction at the interface between science and policy. Scientists have gotten very good at understanding ecosystem processes and relating them to other ecosystem processes. Economists and social scientists are getting better at quantifying the social and economic costs of environmental change. What’s frequently missing, however, is a framework for linking specific ecosystem processes to social or economic outcomes. This becomes really important if you want to effectively manage resource use; ecosystems perceived as being more socially and economically valuable (i.e. providing more ecosystem services), for example, might warrant more nuanced management.

Ecosystem services are most useful when we can consider their distribution in space and time. However, linking ecosystem services to specific places and times is methodologically challenging. One way to do this is to use expert elicitations via the matrix method. In this approach a collection of experts is formally interviewed in a consistent, scripted fashion to identify “consensus” estimates of service supply from specific ecological units. This approach is typically applied to landscapes, where the ecological units are geographically fixed (think about a mosaic of forest and grassland, each providing different services, but fixed in space).
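The mechanics of the matrix method are straightforward once consensus scores are in hand: each ecological unit gets one expert-derived supply score per service, and the scores are painted onto a map of the units. Here's a minimal Python sketch of that lookup step, with entirely hypothetical unit names, services, and scores:

```python
import numpy as np

# Hypothetical consensus scores (0-5) from an expert elicitation:
# each ecological unit receives one supply score per ecosystem service.
service_matrix = {
    "forest":    {"carbon_storage": 5, "recreation": 3},
    "grassland": {"carbon_storage": 2, "recreation": 4},
}

# A toy landscape map: each cell holds an ecological unit label.
landscape = np.array([
    ["forest", "forest", "grassland"],
    ["forest", "grassland", "grassland"],
])

def service_supply_map(landscape, service_matrix, service):
    """Translate a map of ecological units into a map of service supply."""
    lookup = np.vectorize(lambda unit: service_matrix[unit][service])
    return lookup(landscape)

print(service_supply_map(landscape, service_matrix, "carbon_storage"))
```

The same logic applies whether the units are fixed landscape patches or, as in our seascape approach, classifications that move in time and space; only the map being scored changes.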

From Jacobs et al., 2015. Expert-based estimates of ecosystem service supply can be mapped to ecological units with a known spatial distribution, yielding a spatial map of ecosystem service supply.

But what about the marine environment? Certain ecological features, such as a shoal, gyre, or recurrent eddy can be geographically fixed, but away from such features the marine environment is a fluid mosaic that is not fixed in time or space. We decided to try an approach that was agnostic to location, and instead elicited expert opinions of service supply from the seascape units derived from an objective analysis of macronutrients, chlorophyll, temperature, and salinity in Bowman et al., 2018.

From Neumann et al., 2019. The distribution of objectively defined seascape units at different depths along the central west coast of the Antarctic Peninsula. Bowman et al. 2018 identified a total of 8 seascape units that varied in time and space, though most exhibited a tendency toward a certain depth range or location along the onshore-offshore gradient.

For our group of experts we tapped the investigators of the Palmer Long Term Ecological Research (LTER) project (many thanks to all of you!). It was quite a challenge to reconcile the divergent methods – when we conducted the interviews we hadn’t worked out all the details of the seascape unit classification system – but we got there in the end. The approach could use some further refinement before it’s ready to produce a data product for resource managers, however, we hope the proof-of-concept will stimulate further effort at LTERs and elsewhere in the marine environment!

From Neumann et al., 2019. Service supply categorizations for traditional, “landscape”-based service providing units and objectively defined seascape units, derived from expert elicitations from the Palmer LTER investigators.

Posted By: Jeff on September 21, 2019

From the tropics to the Arctic… I spent last week in Tromsø, Norway helping prepare the German icebreaker Polarstern for the MOSAiC year-long polar drift expedition. As I’ve written in past posts, I’ve been waiting for this moment since 2012 and it’s hard to believe it’s finally here. MOSAiC is a true coupled ocean-ice-atmosphere study, and the first such study of its scope or scale. There have been modern overwintering expeditions in the Arctic before – most notably the SHEBA expedition of the late 1990s – but none have approached the breadth or scale of MOSAiC.

The start of the MOSAiC expedition in Tromsø, Norway.

The basic idea behind MOSAiC is to drive Polarstern into the Laptev Sea and tether the ship to an (increasingly rare) large floe of multiyear sea ice. As we move toward winter, the floe and Polarstern will become encased in newly forming sea ice. The ship will drift with this ice through the full cycle of seasons, allowing a rare opportunity to study the physical, chemical, and biological characteristics of sea ice through its full progression of growth and decay.

The German icebreaker Polarstern tethered to an ice floe in the Arctic.

But MOSAiC is about more than sea ice. Sea ice is – for now – a dominant ecological feature of the central Arctic, and it exerts a strong influence on both the atmosphere and the upper ocean. Better predicting the consequences of reduced sea ice cover on these environments is a major goal of the expedition.

With support from the National Science Foundation, for our own little piece of MOSAiC PhD student Emelia Chamberlain and I are collaborating with Brice Loose and postdoctoral researcher Alessandra D’Angelo at the University of Rhode Island, along with colleagues from the Alfred Wegener Institute in Germany. We’ll be looking at how the structure of prokaryotic and eukaryotic communities in sea ice and the upper ocean influence the oxidation of methane (a potent greenhouse gas), and the production and uptake of CO2. I’m looking forward to joining Polarstern in late January for a long, cold stint at the end of the polar night!

Our lab on Polarstern.

We searched in Tromsø for a totem for the lab, but ran a bit short on time and settled for Igor. Trolls are troublesome creatures and not, I think, particularly emblematic of our project team. Cavity ring-down spectrometers and mass specs, however, can be a bit trollish at times. So the totem is for them. Igor will be in charge of our little group of instruments. We can direct our frustrations at him, and hopefully by placating him with offerings we can keep things running smoothly.

The Akademik Fedorov, a Russian research icebreaker that will sail with Polarstern and help establish the drifting observatory. Fedorov will return in a few weeks.

Dancing on ice floes. The MOSAiC launch was quite an event with lectures, a party, and a hi-tech light show. The show included an interactive ice floe field – step on the floes and they crack to become open water, slowly freezing after you pass. It was well done.

It’s the Polarstern projected on the Polarstern. So meta.

And they’re off! Waving good-bye to the Polarstern.

Posted By: Jeff on September 08, 2019

Natalia Erazo and I are on our way back from an amazing week of sampling in the Cayapas-Mataje Ecological Reserve in northern Ecuador. We first visited the reserve in 2017 and have been anxious to return ever since. Our objectives on this trip were to collect water column and sediment samples to test hypotheses about how shrimp aquaculture impacts mangrove forest health and biogeochemical cycling in mangrove-dominated estuaries. Cayapas-Mataje is an ideal place for this study. The reserve is the largest of its kind along the Pacific Coast of Latin America. The presence of the reserve has prevented the large-scale conversion of mangrove forest to shrimp aquaculture (as has happened further south in Muisne and other parts of the country), however, there are a number of facilities – some quite large – that existed prior to the establishment of the reserve. Thus relatively “pristine” forest can be found immediately adjacent to highly impacted forest.

Congratulations to Natalia for receiving a National Geographic Young Explorer award to make this trip a reality! Here are a few choice pictures from the week.

Making a plan. Jesse (blue hat) was our guide in 2017. This trip we were lucky to be joined by Santos, a local fisherman with deep knowledge of the area.

Tambillo. Best village in Cayapas-Mataje.

Bringing the coconuts to market.

You don’t see many dugout canoes in Cayapas-Mataje, though I understand that they’re more common among the indigenous villagers up-river.

Shrimp farm in Cayapas-Mataje. So much nitrogen…

Measuring tree height. Some of the mangroves in Cayapas-Mataje are so high that you wouldn’t believe it if I told you how high (64 meters).

Crabs. Ecologically important. Very camera shy.

Natalia with Jesse’s boat “Los Reyes del Manglar” (Kings of the Mangroves)

Santos with cockles. Cayapas-Mataje supports a major artisanal cockle fishery (see here).

It’s a jungle…

Where’s Natalia?

After a hard days work.

Borbón, our home for the week. Town motto: We make every night an all-night dance party because we can!

Sampling in the mangroves.


Posted By: Emelia Chamberlain on August 18, 2019

These guys are a bit bigger than the microbial organisms we usually study in the Bowman Lab, but are absolute models under a standard light microscope. Here you can see two rotifers (far left with egg sac, and top center), a type of microscopic invertebrate commonly found in freshwater. (PC: E.J. Chamberlain)

Hello! It’s Emelia again – to learn more about me and my research in the Bowman Lab check out this post. I have recently returned from 2 weeks in the Canadian Arctic where I attended an absolutely incredible summer field course entitled “Arctic Microbiomes: From molecules and microbes to ecosystems and health” through the Sentinel North International PhD School at Universite Laval in Quebec, Canada. This course emphasized an interdisciplinary approach to asking (and answering) questions about the role of microbiomes in the Arctic. A microbiome represents the complex interactions of microscopic life (bacteria, archaea, phytoplankton, fungi, viruses, etc.) within a specific habitat. And just as the community that makes up a human gut microbiome can give insights into the health of a person, the diversity of Arctic – soil, pond, sea-ice etc. – microbiomes can give insights into the health of Arctic ecosystems. The Arctic is one of the most rapidly changing places on Earth, with warmer temperatures and less ice each year. To understand the broader ecosystem (including human) impacts of this rapid change, we must first understand the dynamics of these microbial worlds and how they might buffer, accelerate, or shift in response to the changing Arctic climatic state.

(PC: Charles W. Greer) Great learning can happen anytime, anywhere. From the classroom…

…to the Great Whale River! (PC: E.J. Chamberlain)

The course was based out of the Center for Northern Studies in Whapmagoostui-Kuujjuarapik. Not entirely remote, there are about 1,400 inhabitants between the Cree First Nation and Inuit communities living in the adjacent villages of Whapmagoostui and Kuujjuarapik. The research complex is located at 55º N along the coast of Hudson Bay and is one of 10 stations in the Canadian Network of Northern Research Operators. This field school was fun and informative for many reasons, but here I will briefly recount the top of the list.

Locations of some of the CEN stations, including Whapmagoostui-Kuujjuarapik. PC: CEN

Full Research Complex, taken in the evening (~9 PM) with kitchen (center) and dorm/lab buildings (right). (PC: E.J. Chamberlain)
Main CEN building, run in collaboration with the Cree First Nation of Whapmagoostui as a community center. (PC: E.J. Chamberlain)

1. It really was an International PhD school

(PC: © Pierre Coupel/Sentinelle Nord – Universite Laval)

18 students from all around the globe came together to study the microbiota of the Arctic. Every continent was accounted for (and we’ll include Antarctica, considering that many of these polar researchers have spent quite a bit of time there) and there was the possibility that ~5 languages were being spoken simultaneously at any given time. The diversity of this group also extended to scientific expertise – between students and mentors there was a spectrum of research experience, from medical studies of the human gut microbiome to soil microbial ecology and astrobiology. However, while scientific interest may have brought us together, after 10 days of dorm life, sharing meals, and surviving long days in the field, the personal connections and budding cross-continental friendships are what made this school truly unique.  

2. Collaborations with the Cree First Nation

Learning about the native plants of the area and their traditional medicinal & household uses by the Cree community. (PC: E.J. Chamberlain)

Speaking of a cross-cultural experience – as the research complex is on Cree land, it is run in collaboration with the Cree First Nation of Whapmagoostui. Upon our arrival at the station, we were addressed by the Chief – who also happened to be the first female Chief elected in Cree history! She emphasized the importance of learning from the land and provided a human perspective on how we think about research in the North and the challenges facing their community. This type of knowledge exchange continued throughout the school, from a science & microscopes workshop held at the local grocery store to traditional tipi building at the research complex. Led by locals, we chopped and prepared the trees ourselves, finally constructing the tipi on our last day at the base. The school also coincided with a yearly heritage festival and we were honored to be included in the local gathering. I learned a lot from the Cree elders, particularly about the many changes that they’ve seen in the environment during their lifetimes – an important reminder that climate change is just as much a human issue as an environmental one (in fact, more so).

The finished product! The next group of base-bound researchers will be in charge of adding canvas for the walls. (PC: E.J. Chamberlain)

Sunny, our leader through the tree-cutting process, helps students place their trim poles into the right position. (PC: E.J. Chamberlain)

3. Fieldwork

While sailing to our sample sites we are able to test equipment and ensure that collections will run smoothly. Here I help test out the depth finder while we make our way through the mist. (PC: Flora Amill)

I am a sucker for field work – to me it is the best (and most fun) way to explore the natural world. Even with a rigorous and scientific sampling scheme, there is always the chance to see something new. And this school provided a TON of it in an absolutely GORGEOUS environment – mosquitos and all. One of my favorite days was when we sailed out onto the Great Whale River to take water samples and measure the river’s chemical properties using a hand-held CTD. The water and mist warded off the worst of the mosquitos and I had the opportunity to try out new, state-of-the-art sampling equipment! (Plus I always enjoy a good day on the water.) Some of the other highlights were sampling the local ponds and lakes for cyanobacteria – a type of photosynthetic bacteria (formerly known as blue-green algae) that, in these regions, grow in thick filamentous mats. It was especially neat because nearby there were some stromatolites – ancient fossilized cyanobacteria from early Earth. These ancient cyanobacteria are responsible for filling the atmosphere with oxygen and making Earth habitable for life like us. In one day we touched the past and collected samples from the present to ask scientific questions about the future.

This sedimentary rock is actually a stromatolite formed from layers of ancient cyanobacteria growth. Cyanobacteria secrete a sticky mucus that binds sediment grains into fine mineral layers, which fossilize into the rings seen here. (PC: E.J. Chamberlain)

Sampling microbial mats is all about having the right tools – from bug nets to your good ‘ole Canadian Tire spatula… It’s all in the wrist. (PC: E.J. Chamberlain)

While the weather didn’t cooperate enough for us to actually sample there, we were able to get a helicopter tour of some of the local permafrost sites! Permafrost encompasses any ground (soil, rock, etc.) that remains completely frozen (<0ºC) for at least two consecutive years. However, most permafrost has been frozen for much, much longer than that. The soils are held together by ice and, historically, have been so solidly frozen in some areas that builders considered them more stable to construct on than concrete. In the northern hemisphere, about a quarter of the land area is underlain by permafrost, and it is currently thawing at unprecedented rates. This not only poses a threat to shorelines and infrastructure but is rapidly and unpredictably changing the microbial communities that live in this unique environment.

Permafrost mounds seen from the helicopter. As the permafrost thaws, organic carbon (frozen ancient plant biomass) is released into the adjacent meltwater ponds where it is consumed by hungry bacteria and archaea. The activity rates of this Arctic ~microbiome~ determine how much of this carbon is released into the atmosphere as carbon dioxide or methane – both greenhouse gases. (PC: E.J. Chamberlain)

4. Scientific Expertise & Laboratory Work

Running a qPCR (quantitative polymerase chain reaction). PCR is a technique to make copies of, or amplify, targeted genetic material. qPCR quantifies that material. Here we looked to quantify the amount of toxin-producing genes in our cyanobacteria samples. (PC: E.J. Chamberlain)

As this was a microbiology field school, a good portion of our time was spent analyzing samples in the lab. Many of the techniques we used were similar to the ones we employ here in the Bowman Lab, but there was still a lot for me to learn. The first step in most microbiome studies is to simply see who is there. To do this, we extracted genetic material from our samples for DNA sequencing. This first requires breaking apart the cells from your environmental samples, releasing their genetic material. Then, through a series of chemical reactions and washing steps, this material is extracted from the sample, ready to be amplified and sequenced. Even with field kits and portable sequencing devices, this process can be long and arduous, but thankfully we had many hands in the lab and an excellent cell-phone DJ. By the end of the week we were able to sequence metagenomes from several of our sampled sites. Then, even without internet (the horror), through the incredible expertise of our mentors, we were able to analyze the diversity of the microbial communities. By pairing who is there with environmental parameters and rate measurements like gas fluxes, we are able to paint a picture of the current functionality and ecosystem services that microscopic life provides.

Measuring the oxygen profile of a microbial mat. (PC: E.J. Chamberlain)

Based on the rotational schedule of this field school, I spent most of my days in the lab following those cyanobacteria mats through their subsequent analyses. First we measured the amount of oxygen in each layer of the mats using a microsensor. This probe allows us to measure O2 gas on the micrometer scale, giving us an in-depth profile for each mat. The tops of the mats are photosynthetic, with the highest concentration of chlorophyll just below the surface layer. Towards the bottom of the mats, however, respiration becomes the dominant process, and some of the mats even had anoxic bottom layers. This distinct layering would indicate a change in community composition with depth (both in cyanobacteria species and in the other bacteria & viruses that call this mat structure home). To test this, we dissected the mats vertically, separating out layers based on the depths where we saw a distinct change in the oxygen profile. These layers could roughly be distinguished by color, which made the dissection boundaries easy to see. The layers were then placed in tubes and analyzed separately in all further analyses.
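
As a toy illustration of how dissection horizons might be chosen from a profile like this, the Python sketch below (hypothetical depths and O2 values, not our course data) finds the subsurface oxygen peak and the first anoxic depth:

```python
# Toy sketch with made-up numbers: pick dissection horizons from an O2
# microprofile by locating the subsurface O2 peak (net photosynthesis
# dominates above it, respiration below) and the first anoxic depth.

depth_mm = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]  # depth into the mat
o2_umol_l = [250, 380, 520, 460, 300, 150, 60, 10, 0]      # dissolved O2

# Depth of the O2 maximum marks the base of the photosynthetic layer.
peak_depth = depth_mm[o2_umol_l.index(max(o2_umol_l))]

# First depth at which the mat is effectively anoxic (None if fully oxic).
anoxic_depth = next((d for d, o2 in zip(depth_mm, o2_umol_l) if o2 <= 0), None)

print(peak_depth)    # 1.0 -> horizon between photosynthetic and respiring layers
print(anoxic_depth)  # 4.0 -> anoxic bottom layer starts here
```

In practice we chose the horizons by eye from the plotted profiles (helped along by the color banding), but the logic is the same.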

The dissection station and colorful results (right).
(PC: E.J. Chamberlain).

At the end of the course, we worked on synthesizing all of our results to draw some conclusions about the microbial ecosystems we had been studying for the past week and a half. Each presentation turned into an exciting scientific discussion relying heavily on the diverse expertise and research experience of the mentors and students. I feel incredibly lucky to have been able to learn from these experts and practice the full scientific process in such a unique place.

5. Exploring the North

The North is a fascinating place to do research. We know so little about its environmental processes and there are many scientific questions still begging to be asked. More than that however, the stunning and surprisingly diverse environment, rapidly shifting weather conditions, and richly unique flora and fauna make it a true adventure to explore. Here are some of the pictures I took which I think best capture the north’s wild beauty and ecological diversity.

Rapidly shifting and unpredictable weather makes planning for the field difficult and often delays flights south. (PC: E.J. Chamberlain)

This photo was only taken an hour before the one to the right. The fog rolled in and out constantly most days. (PC: E.J. Chamberlain)

Even in mid July, Hudson Bay was still thick with melting sea ice. It was otherworldly to see the rotted ice washed up on the beach, particularly in contrast to the lush fields & forests nearby. (PC: E.J. Chamberlain)

(PC: E.J. Chamberlain)

(PC: E.J. Chamberlain)

A birds-eye (helicopter) view of the Great Whale River. (PC: E.J. Chamberlain)

Fauna: An adolescent black bear eyes us from the riverbank. (PC: E.J. Chamberlain)
Flora: Cladonia stellaris, or my new favorite lichen. While it looks plant-like, lichen is actually made of two types of microbes – algae and fungi – living in symbiosis. This lichen is an important food source for caribou and reindeer, giving it the common name “reindeer lichen”.
(PC: E.J. Chamberlain)

That’s all for now, folks! To learn more about what I and the rest of this year’s students were up to during the Sentinelle Nord IPS you can check out the group’s field blog here, or follow me on twitter @Antarctic_Emma (see #SNAM19).

(PC: Ligia F. Coelho)

Posted By: Jeff on August 15, 2019

A couple of months ago I was fortunate to have the opportunity to give a lecture at the Birch Aquarium at Scripps in the Perspectives on Ocean Science lecture series. The lecture covered some emerging topics in Arctic Oceanography and provided a brief intro to the upcoming MOSAiC expedition. The lecture was broadcast by UCTV and can be found here. Matthias Wietz – sorry for botching your introduction on the title slide! (Matthias was a PhD student at the Technical University of Denmark when the picture was taken. The record has been set straight.)

Posted By: Jeff on August 11, 2019

Last week we were busy hosting the inaugural Oceans Across Space and Time (OAST, @Space Oceans OAST on Facebook) combined first year meeting and field effort. It was a crazy week but a huge success. The goal of OAST is to improve life detection efforts on future NASA planetary science missions by better understanding how biomass and activity are distributed in habitats that mimic past or present “ocean worlds”. Ocean worlds is a concept that has gained a lot of traction in the last few years (see our Roadmap to Ocean Worlds synthesis paper here). We have a lot of past or present ocean worlds in our solar system (Earth obviously, but also Mars, Europa, Enceladus, and a whole host of other ice-covered moons), and oceans are seen as a natural feature of planetary bodies that are more likely to host life. Our first year effort focused on some open-ocean training for the Icefin robot, designed for exploring the protected spaces below floating ice shelves, and a multi-pronged investigation of the South Bay Salt Works.

The South Bay Salt Works in Chula Vista, CA. A truly amazing site for exploring how microbial activity and biomass are distributed across environmental gradients.

The Salt Works are an amazing environment that my lab has visited previously (see here and here). Our previous work in this environment has raised more questions than answers, so it was great to hit a few of our favorite spots with a top-notch team of limnologists, microbiologists, geochemists, and engineers.

Part of the OAST team setting up next to some very high salinity NaCl-dominated lakes. The pink color of the lakes is the true color, and is common to high salinity lakes. The color comes from carotenoid pigments in the halophilic archaea that dominate these lakes.

This is what I love about NASA – it’s an agency that develops the most sophisticated technology in the history of human civilization, but isn’t afraid to use a rock when the situation calls for it. Spanning several millennia of technological advancement is Maddie Myers (LSU), with Natalia Erazo (SIO) and Carly Novak (Georgia Tech) in the background.

Carly Novak (Georgia Tech) sampling salts with Peter Doran (LSU) and his “surfboard of science” in the background.

Doug Bartlett (SIO), a little out of his element at only 1 atm.

Posted By: Jeff on July 01, 2019

I’m thrilled to learn that my CAREER proposal was just funded by NSF-OPP, though I’m slightly disappointed that they made me change the title from IM-HAPPIER: Investigating Marine Heterotrophic Antarctic Processes, Paradigms, and Inferences through Research and Education to Understanding Microbial Heterotrophic Processes in Coastal Antarctic Waters. Apparently NSF is the only federal agency that doesn’t like a good acronym. This project will address open questions regarding the diversity and ecological function of heterotrophic bacteria and protists in coastal Antarctica. In particular there will be an emphasis on better understanding the mechanisms of bacterial mortality (i.e. protist bacterivory and viral lysis) and the implications for carbon flow through Antarctic marine ecosystems.

We’re coming for you! An unidentified protist (likely mixotrophic member of the genus Teleaulax) captured by microscope at Palmer Station in 2015. Heterotrophic bacteria and protists are ubiquitous in Antarctic waters, but we know surprisingly little about their genetic makeup or ecology.

This project means that after heading north for MOSAiC in 2020, the lab will be heading south for two field seasons in Antarctica. That work will be spearheaded by incoming PhD student Beth Connors. Although several lab members have or will soon be participating in the Palmer LTER cruise along the western Antarctic Peninsula, I haven’t been to the WAP since 2015. Looking forward to going back!

Palmer Station and the ARSV Laurence M. Gould.

CAREER proposals emphasize both education and research, so in addition to field, laboratory, and modeling work we will be developing a new summer Junior Academy course for Sally Ride Science on polar ecology and oceanography.

Posted By: Jeff on June 18, 2019

Thanks to Jesse and Natalia for their help yesterday with the Discover America program run by the US State Department; SIO successfully hosted 35 foreign ambassadors and their spouses for an educational tour of the Scripps Pier. It was quite an experience. Jesse and I successfully dodged the photographers, but here’s a photo of Natalia talking science (presumably) with the ambassador of Cabo Verde and his wife. Downside: not a single diplomat or spouse wanted to go swimming, despite dolphins and balmy water temps!

Natalia talks science with his Excellency Carlos Alberto Wahnon De Carvalho Veiga and Ms. Maria Epifania Cruz Almeida of Cabo Verde. I presume they’re discussing biogeochemical cycling in mangrove forests. Credit: Scripps Communications

Posted By: Jeff on June 12, 2019

As a quick followup to Emelia’s post on training for MOSAiC, there is a nice piece out today in the Washington Post on the US-based training for MOSAiC. It’s alarming to realize that Polarstern will depart from Tromsø, Norway on September 20 – just 100 days from now!

Posted By: Emelia Chamberlain on April 15, 2019

A photo of me with the famous Utqiagvik whale-bone arch, and behind, the Chukchi Sea.

Hello! My name is Emelia Chamberlain and I am a first year PhD student here in the Bowman Lab working on the MOSAiC project. I just got back from a very exciting week in Utqiagvik, Alaska for MOSAiC snow and ice training. But first, an overview… As mentioned in an earlier post, the Multidisciplinary drifting Observatory for the Study of Arctic Climate (MOSAiC) project is an international effort to study the Arctic ocean-ice-atmosphere system with the goal of clarifying key climatic and ecological processes as they function in a changing Arctic. Within the larger scope of this project, our lab and collaborators from the University of Rhode Island (URI) will be studying how microbial community structure and ecophysiology control fluxes of oxygen and methane in the central Arctic Ocean.

MOSAiC begins in Sept of 2019, when the German icebreaker RV Polarstern will sail into the Laptev Sea and be tethered to an ice floe. Once trapped in the ice, both ship & scientists will spend the next year drifting through the Arctic. The goal is to set up a central observatory and collect time-series observations across the complete seasonal cycle. This year-long time series will be both exciting and critical for the future of Arctic research, but it is logistically difficult to carry out. The cruise is split up into 6 “legs”, with scientists taking two month shifts collecting observations and living the Arctic life. Resupply will be carried out by other icebreakers and aircraft. I myself will be taking part in the last two legs of this project from June – October 2020, with Jeff, Co-PI Brice Loose (URI), and his post-doc Alessandra D’Angelo (URI) representing our project on the rest of the voyage.

A representation of the central observatory taken from the MOSAiC website

Laboratory training in Bremerhaven, Germany

As one would imagine, with over 600 scientists involved and continuous measurements broken up between multiple teams, this project requires a LOT of advanced planning. However, this is the fun part, as it means we get to travel a lot in preparation! In March, Jeff and I traveled to Potsdam, Germany to participate in a MOSAiC implementation workshop. Shortly after, we took a train up to the Alfred Wegener Institute facilities in Bremerhaven with Brice, Alessandra, and other MOSAiC participants to train on some of the instrumentation we will be operating on the Polarstern. We spent a full week training on instruments like a gas chromatograph, gas-flux measurement chambers, and a membrane inlet mass spectrometer (MIMS). While many of us had operated these types of instruments before, each machine is different and several were engineered or re-designed by participating scientists specifically for MOSAiC.

The AWI engineered MIMS that will be onboard Polarstern. The bubbling chamber ensures precise, daily calibrations (and looks really cool).
A specially designed gas-flux chamber for measuring metabolic gas fluxes in both snow and ice. Photo courtesy of Brice Loose (URI)

The bulk of the training was focused on the MIMS, which will be used to take continuous underway ∆O2/Ar measurements from surface waters during MOSAiC. Water is piped from below the Polarstern and run through the mass spectrometer where dissolved gas concentrations are measured. Argon (Ar), a biologically inert gas, is incorporated into the ocean’s mixed layer at the same rate as oxygen (O2). However, while argon concentrations are evenly distributed, oxygen concentrations are affected by biogeochemical processes (photosynthesis and respiration by biota). We can therefore compare oxygen and argon measurements in the water column to determine how much oxygen has deviated from what we would expect through physical air-sea exchange processes (i.e. deviations from biologic activity). From these oxygen fluxes, we can estimate Net Community Production (NCP), which is defined as the total amount of chemical energy produced by photosynthesis minus that which is used in respiration. This is an important balance to quantify, as it is representative of the amount of carbon removed biologically from the atmosphere (CO2) and sequestered into the ocean pool. The goal is to use these continuous MOSAiC measurements to quantify these biogeochemical budgets through time and get a better understanding of whether the Arctic is net phototrophic or heterotrophic – whether photosynthesis or respiration is the dominant process.  
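
For the curious, the core ∆O2/Ar arithmetic is simple enough to sketch. The Python below is an illustrative toy, not the MOSAiC processing code: the measurement values, the gas transfer velocity k, and the saturation concentrations are made-up numbers, and the NCP expression is the standard steady-state approximation (NCP ≈ k · [O2]sat · ∆(O2/Ar)), which neglects vertical mixing.

```python
# Illustrative sketch of the Delta(O2/Ar) method (hypothetical numbers).

def delta_o2ar(o2_meas, ar_meas, o2_sat, ar_sat):
    """Biological oxygen supersaturation: the measured O2/Ar ratio
    relative to the ratio expected at saturation, minus one."""
    return (o2_meas / ar_meas) / (o2_sat / ar_sat) - 1.0

def ncp_mmol_m2_d(delta, k_m_d, o2_sat_mmol_m3):
    """NCP ~ k * [O2]sat * Delta(O2/Ar), in mmol O2 m^-2 d^-1,
    assuming steady state and negligible vertical mixing."""
    return k_m_d * o2_sat_mmol_m3 * delta

# Made-up surface-water values for illustration only:
delta = delta_o2ar(o2_meas=352.0, ar_meas=15.0, o2_sat=340.0, ar_sat=15.0)
print(round(delta, 4))                                       # -> 0.0353
print(round(ncp_mmol_m2_d(delta, k_m_d=3.0,
                          o2_sat_mmol_m3=340.0), 1))         # -> 36.0
```

A positive ∆(O2/Ar) means more oxygen than air-sea exchange alone can explain, i.e. net photosynthesis in the mixed layer; a negative value indicates net respiration.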

A behind-the-scenes view of operating the MIMS – photo courtesy of Brice Loose (URI).

Learning how to remove and clean the equilibration tubes. These tubes bubble gases into the water for calibration.
PC: Brice Loose (URI)

We will be partially responsible for operating this instrument during our respective legs, and therefore spent a lot of time thinking about what might possibly go wrong during a year on an ice-locked vessel… and how to fix it. PC: Brice Loose (URI)

Field training in Utqiagvik, Alaska

Utqiagvik, Alaska (formerly Barrow) is located at the northern tip of Alaska, situated between the Chukchi and Beaufort seas. It boasts the northernmost point in the United States.

After a productive week in Bremerhaven, this past week we stepped outside the laboratory with a snow and ice field training session in Utqiagvik, Alaska. One of the challenges of Arctic fieldwork is, of course, that it takes place in the frigid Arctic environment. To help scientists prepare for life on the ice and to help standardize/optimize sampling methods for MOSAiC, there were 3 snow and ice field training sessions organized (the two others took place earlier this year in Finland). This trip was particularly exciting for me, as it was my first time in the Arctic! Not only did I learn a lot about sampling sea ice but I was struck by the dynamic beauty of the polar landscape. No wonder researchers continue to be fascinated with the unanswered questions of this wild ecosystem.

Up close and personal with a large pressure ridge. Pressure ridges in sea ice are formed when two ice floes collide with each other. You can tell that this ridge was formed from multi-year ice by the thickness of the blocks and their deep blue color. Ice is classified as multi-year when it has survived multiple melt seasons.

Post-doc J.P. Balmonte from Uppsala University meanders his way along the pressure ridge.

The three trainings that everyone had to complete consisted of snow sampling, ice sampling, and snowmobile training. Aside from that, people were able to learn or compare more advanced methods for their sampling specialities and test out gear, both scientific and personal weather protection. I was lucky in that the average -18ºC weather we experienced in Utqiagvik will most likely be representative of the type of weather I will be facing in the summer months of MOSAiC. The winter teams will have to contend with considerably colder conditions.

Some days are windier than others and it’s very important to bundle up. However, on this trip I also learned that layers are very important. Working on the ice, especially coring, can be hard work and you don’t want to overheat. Should I need to remove it, beneath my big parka I’ve got on a light puffy jacket, a fleece, and a wool thermal under-layer.

Digging snow-pits is an important aspect for sampling parameters like snow thickness and density. The goal is to get a clear vertical transect of snow to examine depth horizons and sample from. If you look closely, you can see 2 cm thick squares of snow which have been removed from the pit’s wall and weighed before discarding. The wall is built from the snow removed from the working pit and is intended to block researchers from the wind.

Note the meter-stick for snow thickness.
This is a work view I could get used to.

Coring practice! The extension pole between the corer and drill indicates that this is some pretty thick ice. PC: Jeff Bowman

One of the most exciting trainings we had was on how to operate the snowmobiles. These are a critical form of transport on the ice. They often have sleds attached with which to transport gear and samples to and from the ship. As such, we researchers are expected to be able to drive them properly (plus it was pretty fun and allowed us to reach more remote ice locations over our short week in Utqiagvik).

Once out on the ice we practiced tipping the machines over… and how to right them again.
Learning the basics! Note the sled behind ready to be attached to the machine.

While in Utqiagvik, we here at the Bowman Lab decided to make the most of this trip by also collecting some of our own sea-ice cores to sample and experiment with. The goal of our experiment is to determine the best method for melting these cores (necessary for sampling them) while causing the least amount of stress to the resident microbial communities that we are interested in sampling. I will write up a post covering the methods and ideas behind this experiment soon – but in the meantime, please enjoy this excellent GoPro footage from beneath the ice captured by Jeff during our fieldwork. The brown gunk coating the bottom of the ice is sea-ice algae, mostly made up of diatoms. The ice here is only 68 cm thick, allowing for a lot of light penetration and an abundant photosynthetic community. At the end, you can also note the elusive Scientists in their natural sampling habitat.

What’s next?

Jeff looks to the horizon.

As Sept 2019 gets closer, preparations are likely to ramp up even more. Even though I won’t be in the field for another year, it is exciting to think that the start of MOSAiC is rapidly approaching and after these two weeks of training I am feeling much more prepared for the scientific logistics and field challenges that will accompany this research. However, there is still much more to come. In a few weeks I will be jetting off again, but this time to URI to meet up with our collaborators for more instrument training. And thus the preparations continue…

Posted By: Jeff on March 11, 2019

The output from our paprica pipeline for microbial community structure analysis and metabolic inference has changed quite a lot over the last few months. In response to some recent requests here’s a tutorial that walks through an ordination and a few basic plots with the paprica output. The tutorial assumes that you’ve run dada2 on your samples (see starter script here), then paprica (follow this tutorial if needed). I’ll be using data from our recent seagrass paper; grab these data using NCBI prefetch (SRX4496910-SRX4496954) or follow along with your own. Once you’ve run paprica it’s all the same!

First, because paprica operates independently on each sample we need to aggregate the output. This is easily accomplished with the script that you can find in paprica/utilities. Copy this script to your working directory, then execute like this:

./ -edge_in bacteria.edge_data.csv -path_in bacteria.sum_pathways.csv -ec_in bacteria.sum_ec.csv -o 2017.07.03_seagrass_bacteria -unique_in bacteria.unique_seqs.csv

I’m not going to describe the output files in detail, but basically that command provides 1) an edge table (2017.07.03_seagrass_bacteria.edge_tally.csv) that is analogous to an OTU abundance table, 2) a unique read table (2017.07.03_seagrass_bacteria.unique_tally.csv), 3) a file mapping each read to a taxonomic lineage, and 4) a file of additional information on edges, provided as a mean for each sample. For our analysis let’s bring the pertinent files into R and do some pre-processing:

## read in the edge and unique abundance tables

tally <- read.csv('2017.07.03_seagrass_bacteria.edge_tally.csv', header = T, row.names = 1)
unique <- read.csv('2017.07.03_seagrass_bacteria.unique_tally.csv', header = T, row.names = 1)

## read in edge_data and taxon_map

data <- read.csv('2017.07.03_seagrass_bacteria.edge_data.csv', header = T, row.names = 1)
taxa <- read.csv('2017.07.03_seagrass_bacteria.taxon_map.txt', header = T, row.names = 1, sep = '\t', = T)

## convert all na's to 0, then check for low abundance samples

tally[] <- 0
unique[] <- 0

## remove any low abundance samples (i.e. bad library builds), and also
## low abundance reads. This latter step is optional, but I find it useful
## unless you have a particular interest in the rare biosphere. Note that
## even with subsampling your least abundant reads are noise, so at a minimum
## exclude everything that appears only once. <- tally[rowSums(tally) > 5000,] <-[,colSums( > 1000] <- unique[rowSums(unique) > 5000,] <-[,colSums( > 1000]

If your experiment is based on factors (i.e. you want to test for differences between categories of samples) you may want to use DESeq2, otherwise I suggest normalizing by sample abundance.

## normalize by sample abundance <- / rowSums( <- / rowSums(
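
If you are more comfortable in Python than R, the same pre-processing steps (zero-fill, sample and read filtering, relative abundance) can be sketched with pandas. The toy table below stands in for the real edge tally file; the thresholds mirror the R code above:

```python
# Python/pandas equivalent of the R pre-processing above (toy data).
import pandas as pd

# Stand-in for the edge tally table: rows are samples, columns are edges.
tally = pd.DataFrame(
    {'edge_1': [6000, 4000, 9000],
     'edge_2': [300, 200, 900],
     'edge_3': [1200, 100, 2500]},
    index=['sample_A', 'sample_B', 'sample_C'])

tally = tally.fillna(0)                            # NAs become zero counts
tally_select = tally[tally.sum(axis=1) > 5000]     # drop low-abundance samples
tally_select = tally_select.loc[:, tally_select.sum(axis=0) > 1000]  # drop rare edges
tally_norm = tally_select.div(tally_select.sum(axis=1), axis=0)      # relative abundance

print(tally_norm.round(3))
```

Each row of `tally_norm` sums to 1, the same result as the rowSums division in R.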

Now we're going to do something tricky. For both and, rows are observations and columns are variables (edges or unique reads). Those likely don't mean much to you unless you're intimately familiar with the reference tree. We can map the edge numbers to taxa using the "taxa" dataframe, but first we need to remove the "X" added by R to make the numbers legal column names. For the unique read labels, we need to split on "_", which divides the unique read identifier from the edge number.

## get edge numbers associated with columns, and map to taxa names

tally.lab.Row <- sapply(strsplit(colnames(, 'X'), '[', 2)
tally.lab.Row <- taxa[tally.lab.Row, 'taxon']

unique.lab.Row <- sapply(strsplit(colnames(, '_'), '[', 2)
unique.lab.Row <- taxa[unique.lab.Row, 'taxon']

In the above block of code I labeled the new variables as X.lab.Row, because we'll first use them to label the rows of a heatmap. Heatmaps are a great way to start getting familiar with your data.

## make a heatmap of edge abundance

heat.col <- colorRampPalette(c('white', 'lightgoldenrod1', 'darkgreen'))(100)

heatmap(t(data.matrix(,
scale = 'none',
col = heat.col,
labRow = tally.lab.Row,
margins = c(10, 10))

## and a heatmap of unique read abundance

heatmap(t(data.matrix(,
scale = 'none',
col = heat.col,
labRow = unique.lab.Row,
margins = c(10, 10))

Heatmaps are great for visualizing broad trends in the data, but they aren't a good entry point for quantitative analysis. A good next step is to carry out some kind of ordination (NMDS, PCoA, PCA, CA). Not all ordination methods will work well for all types of data. Here we'll use correspondence analysis (CA) on the relative abundance of the unique reads. CA will be carried out with the package "ca", while "factoextra" will be used to parse the CA output and calculate key additional information. You can find a nice in-depth tutorial on correspondence analysis in R here.

library(ca)
library(factoextra) <- ca( <- get_eigenvalue( <- get_ca_col(

species.x <-$colcoord[,1]
species.y <-$colcoord[,2]

samples.x <-$rowcoord[,1]
samples.y <-$rowcoord[,2]

dim.1.var <- round($variance.percent[1], 1)
dim.2.var <- round($variance.percent[2], 1)

plot(species.x, species.y,
ylab = paste0('Dim 2: ', dim.2.var, '%'),
xlab = paste0('Dim 1: ', dim.1.var, '%'),
pch = 3,
col = 'red')

points(samples.x, samples.y,
pch = 19)

legend('topleft',
legend = c('Samples', 'Unique reads'),
pch = c(19, 3),
col = c('black', 'red'))

At this point you're ready to crack open the CA results and start doing some hypothesis testing. There's one more visualization, however, that can help with initial interpretation; a heatmap of the top unique reads contributing to the first two dimensions (which account for nearly all of the variance between samples).

## "" stands in for the get_ca_col() output
species.contr <-$contrib[,1:2]
species.contr.ordered <- species.contr[order(rowSums(species.contr), decreasing = T),] <- species.contr.ordered[1:10,]

species.contr.lab <- unique.lab.Row[order(rowSums(abs(species.contr)), decreasing = T)]

heatmap(data.matrix(species.contr.ordered[1:10,]),
scale = 'none',
col = heat.col,
Colv = NA,
margins = c(10, 20),
labRow = species.contr.lab[1:10],
labCol = c('Dim 1', 'Dim 2'),
cexCol = 1.5)

From this plot we see that quite a few different taxa are contributing approximately equally to Dim 1 (which accounts for much of the variance between samples), including several different Pelagibacter and Rhodobacteraceae strains. That makes sense, as the dominant environmental gradient in the study was inside vs. outside of San Diego Bay, and we would expect these strains to be organized along such a gradient. Dim 2 is different, with unique reads associated with Tropheryma whipplei and Rhodoluna lacicola contributing most. These aren't typical marine strains, and if we look back at the original data we see that these taxa are very abundant in just two samples. These samples are the obvious outliers along Dim 2 in the CA plot.

In this tutorial we covered just the community structure output from paprica, but of course the real benefit of using paprica is its estimation of metabolic potential. These data are found in the *.ec_tally.csv and *.path_tally.csv files, and are organized in the same way as the edge and unique read abundance tables. Because of this they can be plotted and analyzed in the same way.

Posted By: Jeff on February 25, 2019

We have a new paper out today on the impacts of coastal seagrasses on the microbial community structure of San Diego Bay.  I’m excited about this paper as the first student-led study to come out of my lab.  The study was conceived by Tia Rabsatt, an undergraduate from UVI, during a SURF REU in 2017.  Tia carried out the sample collection, DNA extractions, and flow cytometry, then handed the project off to Sahra Webb.  Sahra carried out the remainder of the project as her Masters thesis.

Tia filters water just outside the mouth of San Diego Bay.  Coronado Island is in the background.

Why the interest in seagrass?  Unlike kelp, seagrasses are true flowering plants.  They’re found around the world from the tropics to the high latitudes and perform a number of important ecosystem functions.  Considerable attention has been given to their importance as nursery habitat for a number of marine organisms.  More recently we’ve come to appreciate the role they play in mediating sediment transport and pollution.  Recent work in Indonesia (which inspired Tia to carry out this study) even showed that the presence of seagrass meadows between inhabited beaches and coral reefs reduced the load of human and coral pathogens within the reefs.

Seagrass, barely visible on a murky collection day.  Confirming seagrass presence/absence was a considerable challenge during the field effort, and one we hadn’t anticipated.  There’s always something…

There are a number of good papers out on the seagrass microbiome – epibionts and other bacteria that are physically associated with the seagrass (see here and here) – but not so many on water column microbes in the vicinity of seagrass meadows.  In this study we took paired samples inside and outside of seagrass beds within and just outside of San Diego Bay.  I’ll be the first to admit that our experimental design was simple, with a limited sample set, and we look forward to a more comprehensive analysis at some point in the future.  Regardless, it worked well for a factor-type analysis using DESeq2; testing for differentially present microbial taxa while controlling for the different locations.

What we found was that (not surprisingly) the influence of seagrass is pretty minor compared to the influence of sample location (inside vs. outside of the bay).  There were, however, some taxa that were more abundant near seagrass even when we controlled for sample location.  These included some expected copiotrophs including members of the Rhodobacteraceae, Puniceispirillum, and Colwellia, as well as some unexpected genera including Synechococcus and Thioglobus (a sulfur-oxidizing gammaproteobacterium).  We spent the requisite amount of time puzzling over some abundant Rickettsiales within San Diego Bay.  We usually take these to mean SAR11 (though our analysis used paprica, which usually picks up Pelagibacter just fine), but they didn’t look like SAR11 in this case.  An unusual coastal SAR11 clade?  A parasite or endosymbiont with a wonky GC ratio?  TBD…

Posted By: Jeff on December 20, 2018

I’m happy to report that I have a new paper out this week in Frontiers in Microbiology titled Identification of Microbial Dark Matter in Antarctic Environments. I thought that it would be interesting to see how well different Antarctic environments are represented by the available completed genomes (not very was my initial guess), got a little bored at the ISME meeting this summer, and had a go at it.

My approach was to find as many Antarctic 16S rRNA gene sequence datasets as I could on the NCBI SRA (Illumina MiSeq only), reanalyze them using consistent QC and denoising (dada2), and apply our paprica pipeline to see how well the environmental 16S rRNA sequence reads match the full-length reads in a recent build of the paprica database.

First things first, however, it was interesting to see 1) how poorly distributed the available Illumina libraries were around the Antarctic continent, and 2) just how many bad, incomplete, and incorrect submissions exist in SRA. 90 % of the effort on this project was invested in culling my list of projects, tracking down incorrect or erroneous lat/longs, sequence files that weren’t demultiplexed, etc. The demultiplexing issue is particularly irritating as I suspect it results purely from laziness. Of course the errors extend to some of my own data and I was chagrined to see that the accession number in our 2017 paper on microbial transport in the McMurdo Sound region is incorrect. Clearly we can all do better.

The collection locations for 16S rRNA libraries available on the NCBI SRA. From Bowman, 2018. Note the concentration of samples near major research bases along the western Antarctic Peninsula, in Prydz Bay, and at McMurdo Sound.

In the end I ended up with 1,810 libraries that I felt good about, and that could be loosely grouped into the environments shown in the figure above. To get a rough idea of how well each library was represented by genomes in the paprica database I used the map ratio value calculated within paprica by Guppy. The map ratio is the fraction of bases in a query read that match the reference read within the region of alignment. This is a pretty unrefined way to assess sequence similarity, but it’s fast and easy to interpret. My analysis looked at the map ratio value for 1) individual unique reads, 2) samples, and 3) environments. One way to think about #1 is represented by the figure below:

Read map ratio as a function of read abundance for A) Bacteria and B) Archaea, calculated individually for all libraries. The orange lines define an arbitrary cutoff for reads that are reasonably abundant, but have very low map ratios (meaning we should probably think about getting those genomes).

What these plots tell us is that most unique reads were reasonably well represented by the 16S rRNA genes associated with complete genomes (> 80 % map ratio, which is still pretty distant genetically speaking!), however, there are quite a lot of reasonably abundant reads with much lower map ratios (looking at this now it seems painfully obvious that I should have used relative abundance. Oh well).
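To make the map ratio concrete, here's a toy Python sketch of the calculation as described above. This is my own illustration of the concept, not Guppy's implementation; the function name and example alignment are invented.

```python
def map_ratio(query_aln, ref_aln):
    """Fraction of query bases that match the reference within the
    aligned region. Both inputs are gapped alignment strings of equal
    length; gaps in the query contribute no query bases."""
    matches = aligned = 0
    for q, r in zip(query_aln, ref_aln):
        if q == '-':            # gap in the query: no query base here
            continue
        aligned += 1
        if q == r:
            matches += 1
    return matches / aligned if aligned else 0.0

# 6 of 7 aligned query bases match the reference here
print(map_ratio('ACGT-CGT', 'ACGAACGT'))   # ~0.857
```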

I didn’t make an effort to track down all the completed genomes associated with Antarctic strains – if that’s even possible – but there is a known deficit of psychrophile genomes. Given that Antarctica tends to be chilly I’ll hazard a guess that there aren’t many complete bacterial or archaeal genomes from Antarctic isolates or metagenomes. Given the novelty of many Antarctic environments, and the number of microbiologists that do work in Antarctica, I’m a little surprised by this. Also kind of excited, however, thinking about how we might solve this for the future…

Posted By: Jeff on December 14, 2018

Abstract submissions are open for AbSciCon 2019!  You can check out the full selection of sessions here, however, I’d like to draw your attention toward the session Salty Goodness: Understanding life, biosignature preservation, and brines in the Solar System.  This session targets planetary scientists and microbiologists (and everyone in between), and we welcome submissions on any aspect of brines and habitability.  Full text follows, help us out by sharing this post widely!

Pure liquid water is only stable in a small fraction of the Solar System; however, salty aqueous solutions (i.e., brines) are more broadly stable. These brine systems, however, prove to be some of the most challenging environments for microorganisms, where biology must overcome extreme osmotic stresses, low water activities, chemical toxicity, and, depending on the location of the environment, temperature extremes, UV radiation, and intense pressure. Despite these stressors, hypersaline environments on Earth host an astounding diversity of micro- and macroorganisms. With worlds like Mars, Ceres, and outer Solar System ocean worlds showing the potential for present-day brines, and with upcoming missions to Europa, it is timely to elucidate the potential for such aqueous systems to sustain and support life as well as the stability of these systems on host worlds.

This session is intended to encourage multidisciplinary and cross planetary discussions focused on the phase space of habitability within brines. We seek to discuss 1) the potential and stability of brines on host worlds through both laboratory and modeling experiments, 2) microbial ecology and adaptations to brines, 3) the effects of water activity and chaotropicity on habitability, 4) the ability of hypersaline systems to preserve biomolecules and 5) techniques and technology needed to detect biosignatures in these unique systems.

Posted By: Sabeel Mansuri on December 07, 2018

Hi! I’m Sabeel Mansuri, an Undergraduate Research Assistant for the Bowman Lab at the Scripps Institution of Oceanography, University of California San Diego. The following is a tutorial that demonstrates a pipeline used to assemble and annotate a bacterial genome from Oxford Nanopore MinION data.

This tutorial will require the following (brief installation instructions are included below):

  1. Canu Assembler
  2. Bandage
  3. Prokka
  4. DNAPlotter (alternatively circos)

Software Installation

Canu is a packaged correction, trimming, and assembly program that is forked from the Celera assembler codebase. Install the latest release by running the following:

git clone
cd canu/src
make -j 4   # adjust -j to your core count

Bandage is an assembly visualization software. Install it by visiting this link, and downloading the version appropriate for your device.


Prokka is a gene annotation program. Install it by visiting this link, and running the installation commands appropriate for your device.


Download the nanopore dataset located here. This is an isolate from a sample taken from a local saline lake at South Bay Salt Works near San Diego, California.

The download will provide a tarball. Extract it:

tar -xvf nanopore.tar.gz

This will create a runs_fastq folder containing 8 fastq files of raw sequence data.


Canu can be used directly on the data without any preprocessing. The only additional information needed is an estimate of the genome size of the sample. For the saline isolate, we estimate 3,000,000 base pairs. Then, use the following Canu command to assemble our data:

canu -p test_canu -d test_canu genomeSize=3000000 gnuplotTested=true -nanopore-raw runs_fastq/*.fastq

A quick description of all flags and parameters:

  • -nanopore-raw – specifies the data is Oxford Nanopore with no preprocessing
  • -p – specifies prefix for output files, use “test_canu” as default
  • -d – specifies directory to run test and output files in, use “test_canu” as default
  • genomeSize – estimated genome size of isolate
  • gnuplotTested – setting to true will skip gnuplot testing; gnuplot is not needed for this pipeline

Running this command will output various files into the test_canu directory. The assembled contigs are located in the test_canu.contigs.fasta file. These contigs can be better visualized using Bandage.

Assembly Visualization

Open Bandage and a GUI window should pop up. In the toolbar, click File > Load Graph, and select test_canu.contigs.gfa. You should see something like the following:

This graph reveals that one of our contigs appears to be a whole circular chromosome! A quick comparison with the test.contigs.fasta file reveals this is Contig 1. We extract only this sequence from the contigs file to examine further. Note that the first contig takes up the first 38,673 lines of the file, so use head:

head -n38673 test_canu/test_canu.contigs.fasta > test_canu/contig1.fasta
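Counting lines with head works, but it's brittle if the assembly changes. An alternative is to pull the record out by its FASTA header; the little Python helper below is a sketch of that approach (the contig ID in the commented call is a placeholder -- check your own test_canu.contigs.fasta headers for the real one).

```python
def extract_record(fasta_path, contig_id, out_path):
    """Copy the single FASTA record whose header ID matches contig_id
    (the first whitespace-delimited token after '>') to out_path."""
    keep = False
    with open(fasta_path) as fin, open(out_path, 'w') as fout:
        for line in fin:
            if line.startswith('>'):
                keep = line[1:].split()[0] == contig_id
            if keep:
                fout.write(line)

# extract_record('test_canu/test_canu.contigs.fasta', 'tig00000001',
#                'test_canu/contig1.fasta')
```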


We blast this Contig using NCBI’s nucleotide BLAST database (linked here) with all default options. The top hit is:

Hit: Halomonas sp. hl-4 genome assembly, chromosome: I
Organism: Halomonas sp. hl-4
Phylogeny: Bacteria/Proteobacteria/Gammaproteobacteria/Oceanospirillales/Halomonadaceae/Halomonas
Max score: 65370
Query cover: 72%
E value: 0.0
Identity: 87%

It appears this chromosome is the genome of an organism in the genus Halomonas. We may now be interested in the gene annotation of this genome.

Gene Annotation

Prokka will take care of gene annotation, the only required input is the contig1.fasta file.

prokka --outdir circular --prefix test_prokka test_canu/contig1.fasta

The newly created circular directory contains various files with data on the gene annotation. Take a look inside test_prokka.txt for a quick summary of the annotation. We can take a quick look at the annotation using the DNAPlotter GUI.  For a more customized circular plot use circos.


The analysis above has taken Oxford Nanopore sequenced data, assembled contigs, identified the closest matching organism, and annotated its genome.

Posted By: Jeff on November 16, 2018

This is a quick post of a few photos from our trip to the South Bay Saltworks earlier this week.  Thanks to PhD students Natalia, Emelia, and Srishti for getting up early to go play in the mud, and to Jesse Wilson and Melissa Hopkins for lab-side support!

Getting an early start at one of the lower salinity lakes.

A high salinity lake with the pink pigmentation clearly visible.  Biology is happening!

A high salinity MgCl2 dominated lake.  It isn’t clear whether anything is living in these lakes – the green pigmentation could be remnants of microbes that lived in a happier time.  Our new OAST project will be further investigating these and other lakes to improve life detection technologies, and better constrain the chemical conditions that are compatible with life.

Srishti and Emelia working very hard at filtering.

Hey Srishti, I think you forgot something!

It will be a long time before we’re done with our analysis for these lakes, but here are a couple of teaser microscope images that reflect the huge difference between an NaCl and MgCl2 dominated lake.

Big, happy bacteria from an NaCl lake at near-saturation.

Same prep applied to an MgCl2 lake.  No sign of large bacterial cells.  There could be life there but it isn’t obvious…

Posted By: Jeff on November 06, 2018

Sometimes working weekends can be a lot of fun.  Last Saturday morning we carried out the second Scripps Institution of Oceanography visit by undergraduate biology majors from National University for our NSF-funded project CURE-ing Microbes on Ocean Plastics.  We recovered a plastic colonization experiment that we started last month, installed the next iteration of the experiment, and finally replaced the pump intake for our continuous flow membrane inlet mass spec (MIMS).  Many thanks to PhD students Natalia Erazo, Srishti Dasarathy, and Emelia Chamberlain for taking the time to work with the National University undergraduates, and to Kasia Kenitz in the Barton Lab for the diving assist!  Here are a couple of photo/video highlights from the day.

A short video of the plastic colonization experiment after one month of incubation.  Though there has been some swell it hasn’t been a particularly stormy month.  Despite that the cages that hold our plastic wafers were hanging by a thread!  I need to come up with a better system before the winter storms hit…

Chasing a school of baitfish under the pier after installation. At the end of the video you can see the shiny new cage with the next set of plastic wafers, and to the right our newly installed pump intake for the MIMS.

Natalia and Srishti tell it like it is to National University students on the SIO pier.

Checking out microbes in the lab after field sampling on the pier.

Posted By: Jeff on November 02, 2018

This post has been a long time in coming!  I’m happy to announce that our Oceans Across Space and Time (OAST) proposal to the NASA Astrobiology Program has been funded, launching a 5 year, 8+ institution effort to improve life detection strategies for key extraterrestrial environments.  We submitted this proposal in response to the NASA Astrobiology Institute call (NAI; a key funding opportunity within the Astrobiology Program), however, OAST was ultimately funded under a new research coordination network titled Network for Life Detection (NfoLD).  Research coordination networks are a relatively new construct for NASA and provide a better framework for exchanging information between teams than the old “node” based NAI model.  NfoLD will eventually encompass a number of NASA projects looking at various aspects of life detection and funded through a variety of different opportunities (Exobiology, PSTAR, Habitable Worlds, etc).

OAST is led by my colleague Britney Schmidt, a planetary scientist at Georgia Tech (click here for the GT press release).  Joining us from SIO is Doug Bartlett, a deep sea microbial ecologist.  Other institutions with a major role include Stanford, MIT, Louisiana State University, Kansas University, University of Texas at Austin, and Blue Marble Space Institute of Science.

The OAST science objectives are structured around the concept of contemporary, remnant, and relict ocean worlds, and predicated on the idea that the distribution of biomass and biomarkers is controlled by different factors for each of these ocean “states”.  Understanding the distribution of biomass, and the persistence of biomarkers, will lead us to better strategies for detecting life on Mars, Europa, Enceladus, and other past or present ocean worlds.

Earth is unique among the planets in our solar system for having contemporary, remnant, and relict ocean environments.  This is convenient for us as it provides an opportunity to study these environments without all the cost and bother of traveling to other planets just to try unproven techniques.  For OAST, we’ve identified a suite of ocean environments that we think best represent these ocean states.  For contemporary ocean worlds (such as Europa and Enceladus) we’re studying deep hypersaline anoxic basins (DHABs – I might have hit the acronym limit for a single post…), which may be one of the most bizarre microbial habitats on Earth.  These highly stratified ecosystems are energetically very limited and impose extreme environmental stress through pressure and high salinity.  These are very much like the conditions we’d expect on a place like Europa.  The below video from the BBC’s latest Blue Planet series provides some idea of what these environments are like.

For remnant ocean worlds we will study a number of hypersaline lake systems, such as were likely present on Mars as it transitioned from a wet to a dry world.  Unlike the contemporary Europan ocean, the remnant Martian ocean would have had lots of energy to support life, a condition shared by many saline lake environments on Earth.  This is illustrated by this photo of me holding a biomass-rich filter at the Great Salt Lake in Utah, way back in my undergraduate days.

Sunlight provides an abundance of energy to many hypersaline lake environments.  Despite the challenging conditions imposed by salt, these systems often have high rates of activity and abundant biomass.  Photo: Julian Sachs.

Relict ocean worlds are a smaller component of OAST.  This isn’t for lack of relevance – Mars is a relict ocean world after all – but you can’t do everything, even in five years and with an awesome team.  Nonetheless we will carry out work on what’s known as the Permian mid-continental deposits, to search for unambiguous biomarkers persisting from when the region was a remnant ocean.  OAST will be a big part of what we’re up to for the next five years, so stay tuned!

Posted By: Jesse Wilson on October 22, 2018

Weighted gene correlation network analysis (WGCNA) is a powerful network analysis tool that can be used to identify groups of highly correlated genes that co-occur across your samples. Genes are thus sorted into modules, and these modules can then be correlated with other traits (which must be continuous variables).

WGCNA was originally created to assess gene expression data in human patients, and the authors of the method and R package have a thorough tutorial with in-depth explanations (. More recently, the method has been applied to microbial communities (Duran-Pinedo et al., 2011; Aylward et al., 2015; Guidi et al., 2016; Wilson et al., 2018)–the following is a walk-through using microbial sequence abundances and environmental data from my 2018 work (

Background: WGCNA finds how clusters of genes (or in our case abundances of operational taxonomic units–OTUs) correlate with traits (or in our case environmental variables or biochemical rates) using hierarchical clustering, novel applications of weighted adjacency functions and topological overlap measures, and a dynamic tree cutting method.

Very simply, each OTU is going to be represented by a node in a vast network and the adjacency (a score between 0 and 1) between each set of nodes will be calculated. Many networks use hard-thresholding (where a connection score [e.g. a Pearson Correlation Coefficient] between any two nodes is noted as 1 if it is above a certain threshold and noted as 0 if it is below it). This ignores the actual strength of the connection so WGCNA constructs a weighted gene (or OTU) co-occurrence adjacency matrix in lieu of ‘hard’ thresholding. Because our original matrix has abundance data the abundance of each OTU is also factored in.

For this method to work you also have to select a soft thresholding power (sft) to which each co-expression similarity is raised in order to make these scores “connection strengths”. I used a signed adjacency function:

  • Adjacency = 0.5*(1+Pearson correlation)^sft

because it preserves the sign of the connection (whether nodes are positively or negatively correlated), which is the recommendation of the WGCNA authors.
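The signed adjacency formula above fits in a couple of lines; here's a minimal numpy sketch (my own illustration for intuition, not WGCNA's adjacency() implementation):

```python
import numpy as np

def signed_adjacency(X, sft=10):
    """Signed weighted adjacency, a_ij = (0.5 * (1 + cor_ij))**sft,
    computed from a samples-by-OTUs abundance matrix X. Perfectly
    correlated OTU pairs score 1, anti-correlated pairs score 0."""
    cor = np.corrcoef(X, rowvar=False)   # OTU-by-OTU Pearson correlations
    return (0.5 * (1.0 + cor)) ** sft
```

Raising to the power sft keeps the scores in [0, 1] while pushing weak correlations toward zero, which is what makes the thresholding "soft".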

You pick your soft thresholding value by using a scale-free topology. This is based on the idea that the probability that a node is connected with k other nodes decays as a power law:

  • p(k)~ k^(-γ)

This idea is linked to the growth of networks–new nodes are more likely to be attached to already established nodes. In general, scale-free networks display a high degree of tolerance against errors (Zhang & Horvath, 2005).
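In practice you check this by asking how well log p(k) regresses on log k across the network's connectivity distribution. The numpy sketch below is a simplified version of that idea; WGCNA's scaleFreeFitIndex bins connectivities similarly but reports a signed R^2 with additional corrections.

```python
import numpy as np

def scale_free_fit(adjacency, nbins=10):
    """R^2 of the regression of log10 p(k) on log10 k over binned node
    connectivities -- a rough check of how scale-free the network is."""
    k = adjacency.sum(axis=0) - 1        # connectivity, excluding self-adjacency
    counts, edges = np.histogram(k, bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0                    # only bins that contain nodes
    x = np.log10(centers[keep])
    y = np.log10(counts[keep] / counts.sum())
    r = np.corrcoef(x, y)[0, 1]
    return r * r
```

This is the quantity behind the "we shoot for an r2 higher than 0.8" rule of thumb discussed below.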

You then turn your adjacency matrix into a Topological Overlap Measure (TOM) to minimize the effects of noise and spurious associations. A topological overlap of two nodes also factors in all of their shared neighbors (their relative interrelatedness)–so you are basically taking a simple co-occurrence between two nodes and placing it in the framework of the entire network by factoring in all the other nodes each is connected to. For more information regarding adjacency matrices and TOMs please see Zhang & Horvath (2005) and Langfelder & Horvath (2007 & 2008).
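The topological overlap calculation itself is compact. Here's a numpy sketch of the Zhang & Horvath (2005) formula, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij sums the adjacency each pair shares with third-party nodes; this is an illustration, not WGCNA's TOMsimilarity().

```python
import numpy as np

def tom_similarity(adj):
    """Topological overlap matrix from a symmetric weighted adjacency
    matrix with entries in [0, 1]."""
    A = np.asarray(adj, dtype=float).copy()
    np.fill_diagonal(A, 0.0)            # ignore self-connections
    L = A @ A                           # shared-neighbor strength l_ij
    k = A.sum(axis=0)                   # node connectivities
    kmin = np.minimum.outer(k, k)
    tom = (L + A) / (kmin + 1.0 - A)
    np.fill_diagonal(tom, 1.0)          # a node overlaps fully with itself
    return tom
```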

Start: Obtain an OTU abundance matrix (MB.0.03.subsample.fn.txt) and environmental data (OxygenMatrixMonterey.csv).

The OTU abundance matrix simply has all the different OTUs that were observed in a bunch of different samples (denoted in the Group column; e.g. M1401, M1402, etc.). These OTUs represent 16S rRNA sequences that were assessed with the universal primers 515F-Y (5′-GTGYCAGCMGCCGCGGTAA) and 926R (5′-CCGYCAATTYMTTTRAGTTT) and were created using a 97% similarity cutoff. These populations were previously subsampled to the smallest library size and all processing took place in mothur ( See Wilson et al. (2018) for more details.

The environmental data matrix tells you a little bit more about the different Samples, like the Date of collection, which of two site Locations it was collected from, the Depth or Zone of collection. You also see a bunch of different environmental variables like several different Upwelling indices (for different stations and different time spans), community respiration rate (CR), Oxygen Concentration, and Temperature. Again, see Wilson et al. (2018) for more details.

Code–Initial Stuff:

Read data in (the exact read.table arguments were lost here, so these are assumed):

data <- read.table("MB.0.03.subsample.fn.txt", header = TRUE, sep = "\t")
For this particular file we have to get rid of first three columns since the OTUs don’t actually start until the 4th column:

data1 = data[,-c(1:3)]  ## drop the first three non-OTU columns

You should turn your raw abundance values into a relative abundance matrix and potentially transform it. I recommend a Hellinger transformation (the square root of the relative abundance)–this effectively gives low weight to variables with low counts and many zeros. If you wanted you could use the logarithmic transformation of Anderson et al. (2006) here instead.

library("vegan", lib.loc="~/R/win-library/3.3")
HellingerData<-decostand(data1,method = "hellinger")
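For intuition, the transformation itself is just the square root of the row-wise relative abundance. Here's an equivalent numpy sketch (my own illustration; vegan's decostand is the authoritative version):

```python
import numpy as np

def hellinger(counts):
    """Square root of row-wise relative abundance (the Hellinger
    transformation): down-weights OTUs with low counts and many zeros."""
    counts = np.asarray(counts, dtype=float)
    return np.sqrt(counts / counts.sum(axis=1, keepdims=True))
```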

You have to limit the OTUs to the most frequent ones (ones that occur in multiple samples so that you can measure co-occurrence across samples). I just looked at my data file and noted where zeros became extremely common. This was easy because mothur sorts OTUs according to abundance. If you would like a more objective way of selecting the OTUs, or if your OTUs are not sorted, then this code may help:

lessdata <- data1[,colSums(data1) > 0.05]

(though you will have to decide what cutoff works best for your data).

Code–Making your relative abundance matrix:

You have to reattach the Group Name column:

RelAbun1 = data.frame(data[2],HellingerData[1:750])

Write file (this step isn’t absolutely necessary, but you may want this file later at some point):

write.table(RelAbun1, file = "MontereyRelAbun.txt", sep="\t")

Code–Visualizing your data at the sample level:

Now load the WGCNA package:

library("WGCNA", lib.loc="~/R/win-library/3.3")

Bring data in (the read.table arguments were lost here, so these are assumed):

OTUs <- read.table("MontereyRelAbun.txt", header = TRUE, sep = "\t")
Turn the first column (sample name) into row names (so that only OTUs make up actual columns):

datExpr0 =[,-c(1)]));
names(datExpr0) = names(OTUs)[-c(1)];
rownames(datExpr0) = OTUs$Group;

Check Data for excessive missingness:

gsg = goodSamplesGenes(datExpr0[-1], verbose = 3);
gsg$allOK

You should get TRUE for this dataset given the parameters above. TRUE means that all OTUs have passed the cut. This means that when you limited your OTUs to the most common ones above that you didn’t leave any in that had too many zeros. It is still possible that you were too choosy though. If you got FALSE for your data then you have to follow some other steps that I don’t go over here.

Cluster the samples to see if there are any obvious outliers:

sampleTree = hclust(dist(datExpr0), method = "average");


par(cex = 0.6);
par(mar = c(0,4,2,0))

plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5, cex.axis = 1.5, cex.main = 2)

The sample dendrogram doesn’t show any obvious outliers so I didn’t remove any samples. If you need to remove some samples then you have to follow some code I don’t go over here.

Now read in trait (Environmental) data and match with expression samples:

traitData = read.csv("OxygenMatrixMonterey.csv");

Form a data frame analogous to expression data (relative abundances of OTUs) that will hold the Environmental traits:

OTUSamples = rownames(datExpr0);
traitRows = match(OTUSamples, traitData$Sample);
datTraits = traitData[traitRows, -1];
rownames(datTraits) = traitData[traitRows, 1];

Outcome: Now your OTU expression (or abundance) data are stored in the variable datExpr0 and the corresponding environmental traits are in the variable datTraits. Now you can visualize how the environmental traits relate to clustered samples.

Re-cluster samples:

sampleTree2 = hclust(dist(datExpr0), method = "average")

Convert traits to a color representation: white means low, red means high, grey means missing entry:

traitColors = numbers2colors(datTraits[5:13], signed = FALSE);

Plot the sample dendrogram and the colors underneath:

plotDendroAndColors(sampleTree2, traitColors,
groupLabels = names(datTraits[5:13]),
main = "Sample dendrogram and trait heatmap")

Again: white means a low value, red means a high value, and gray means missing entry. This is just initial stuff… we haven’t looked at modules of OTUs that occur across samples yet.


save(datExpr0, datTraits, file = "Monterey-dataInput.RData")

Code–Network Analysis:

Allow multi-threading within WGCNA. This helps speed up certain calculations.
Any error here may be ignored but you may want to update WGCNA if you see one.

options(stringsAsFactors = FALSE);
enableWGCNAThreads()

Load the data saved in the first part:

lnames = load(file = "Monterey-dataInput.RData");

The variable lnames contains the names of loaded variables:

lnames
Note: You have a couple of options for how you create your weighted OTU co-expression network. I went with the step-by-step construction and module detection. Please see this document for information on the other methods (

Choose a set of soft-thresholding powers:

powers = c(c(1:10), seq(from = 11, to=30, by=1))

Call the network topology analysis function:
Note: I am using a signed network because it preserves the sign of the connection (whether nodes are positively or negatively correlated); this is recommendation by authors of WGCNA.

sft = pickSoftThreshold(datExpr0, powerVector = powers, verbose = 5, networkType = "signed")


pickSoftThreshold: will use block size 750.
pickSoftThreshold: calculating connectivity for given powers...
..working on genes 1 through 750 of 750
Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
1 1 0.0299 1.47 0.852 399.0000 400.0000 464.00
2 2 0.1300 -1.74 0.915 221.0000 221.0000 305.00
3 3 0.3480 -2.34 0.931 128.0000 125.0000 210.00
4 4 0.4640 -2.41 0.949 76.3000 73.1000 150.00
5 5 0.5990 -2.57 0.966 47.2000 44.0000 111.00
6 6 0.7010 -2.52 0.976 30.1000 27.1000 83.40
7 7 0.7660 -2.47 0.992 19.8000 17.2000 64.30
8 8 0.8130 -2.42 0.986 13.3000 11.0000 50.30
9 9 0.8390 -2.34 0.991 9.2200 7.1900 40.00
10 10 0.8610 -2.24 0.992 6.5200 4.8800 32.20
11 11 0.8670 -2.19 0.987 4.7000 3.3700 26.20
12 12 0.8550 -2.18 0.959 3.4600 2.3300 21.50

This is showing you the power (soft thresholding value), the r2 for the scale independence for each particular power (we shoot for an r2 higher than 0.8), the mean number of connections each node has at each power (mean.k), the median number of connections/node (median.k), and the maximum number of connections (max.k).

Plot the results:

sizeGrWindow(9, 5)
par(mfrow = c(1,2));
cex1 = 0.9;

Scale-free topology fit index (r2) as a function of the soft-thresholding power:

plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers, cex=cex1, col="red");

This line corresponds to using an R^2 cut-off of h:

abline(h=0.80, col="red")
Mean connectivity as a function of the soft-thresholding power:

plot(sft$fitIndices[,1], sft$fitIndices[,5],
xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")

I picked a soft thresholding value of 10 because it was well above an r2 of 0.8 (it is a local peak for the r2) and the mean connectivity is still above 0.

So now we just calculate the adjacencies, using the soft thresholding power of 10:

softPower = 10;
adjacency = adjacency(datExpr0, power = softPower, type = "signed");

Then we transform the adjacency matrix into a Topological Overlap Matrix (TOM) and calculate corresponding dissimilarity:

Remember: The TOM you calculate shows the topological similarity of nodes, factoring in the connection strength two nodes share with other “third party” nodes. This will minimize effects of noise and spurious associations:
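To make the topological overlap idea concrete, here is a toy Python sketch (illustration only; the made-up 3-node correlation matrix and the power stand in for the real data, and WGCNA's adjacency() and TOMsimilarity() do the actual work):

```python
# Toy correlation matrix for 3 nodes (made up for illustration).
cor = [
    [1.0, 0.8, -0.4],
    [0.8, 1.0, -0.2],
    [-0.4, -0.2, 1.0],
]
power = 10
n = len(cor)

# Signed adjacency: a_ij = ((1 + cor_ij) / 2)^power, so negative correlations
# map toward 0 instead of being folded onto the positive ones.
adj = [[((1 + cor[i][j]) / 2) ** power for j in range(n)] for i in range(n)]

def tom(adj):
    """Topological overlap: connection strength shared with third-party nodes
    boosts the similarity between i and j, damping noise and spurious links."""
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    out = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
                out[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return out

# Dissimilarity, as in dissTOM = 1 - TOM below.
dissTOM = [[1 - t for t in row] for row in tom(adj)]
```

Note how the strongly negatively correlated pair (nodes 1 and 3) ends up with an adjacency near zero in the signed network, exactly the behavior the note above is after.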

TOM = TOMsimilarity(adjacency, TOMType = "signed");
dissTOM = 1-TOM

Create a dendrogram by calling the hierarchical clustering function:

TaxaTree = hclust(as.dist(dissTOM), method = "average");

Plot the resulting clustering tree (dendrogram):

plot(TaxaTree, xlab="", sub="", main = "Taxa clustering on TOM-based dissimilarity",
labels = FALSE, hang = 0.04);

This image is showing us the clustering of all 750 OTUs based on the TOM dissimilarity index.

Now you have to decide the optimal module size for your system, and you should play around with this value a little. I wanted relatively large modules, so I set the minimum module size relatively high at 30:

minModuleSize = 30;

Module identification using dynamic tree cut (there are a couple of different ways to figure out your modules, so you should explore what works best for you in the tutorials by the authors):

dynamicMods = cutreeDynamic(dendro = TaxaTree, distM = dissTOM,
deepSplit = 2, pamRespectsDendro = FALSE,
minClusterSize = minModuleSize);

Convert numeric labels into colors:

dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)

black blue brown green red turquoise yellow
49 135 113 71 64 216 102

You can see that there are a total of 7 modules (you should have seen that above too) and that each module is now named a different color. The numbers under the colors tell you how many OTUs were sorted into that module. Each OTU is in exactly 1 module, and if you add up the numbers from the various modules you get 750 (the number of OTUs that we limited our analysis to above).

Plot the dendrogram with module colors underneath:

plotDendroAndColors(TaxaTree, dynamicColors, "Dynamic Tree Cut",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05,
main = "Taxa dendrogram and module colors")

Now we will quantify co-expression similarity of entire modules using eigengenes, and cluster the modules based on their correlation:
Note: An eigengene is the first principal component of a module expression matrix and represents a suitably defined average OTU community.
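For intuition, here is a minimal Python sketch (illustration only, with a made-up toy module; moduleEigengenes does the real work) showing that the eigengene is just the first principal component of the module's samples-by-OTUs matrix, i.e. one summary value per sample:

```python
# Made-up toy module: 4 samples x 3 OTUs, already roughly centered.
module = [
    [1.0, 0.9, 1.1],
    [-1.0, -1.1, -0.8],
    [0.5, 0.4, 0.6],
    [-0.5, -0.2, -0.9],
]

def first_pc(X, iters=200):
    """Power iteration on X^T X to find the leading loading vector, then
    project each sample onto it -> one 'eigengene' value per sample."""
    n_otus = len(X[0])
    v = [1.0] * n_otus
    for _ in range(iters):
        Xv = [sum(row[j] * v[j] for j in range(n_otus)) for row in X]   # X v
        w = [sum(X[i][j] * Xv[i] for i in range(len(X))) for j in range(n_otus)]  # X^T (X v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(row[j] * v[j] for j in range(n_otus)) for row in X]

eigengene = first_pc(module)  # one value per sample, summarizing the module
```

Samples where the module's OTUs are jointly abundant get one sign, samples where they are jointly scarce get the other, which is exactly what makes the eigengene a usable "average community" profile.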

Calculate eigengenes:

MEList = moduleEigengenes(datExpr0, colors = dynamicColors)
MEs = MEList$eigengenes

Calculate dissimilarity of module eigengenes:

MEDiss = 1-cor(MEs);

Cluster module eigengenes:

METree = hclust(as.dist(MEDiss), method = "average");

Plot the result:

sizeGrWindow(7, 6)
plot(METree, main = "Clustering of module eigengenes",
xlab = "", sub = "")

Now we will see if any of the modules should be merged. I chose a height cut of 0.30, corresponding to a similarity of 0.70, to merge:

MEDissThres = 0.30

Plot the cut line into the dendrogram:

abline(h=MEDissThres, col = "red")

You can see that, according to our cutoff, none of the modules should be merged.
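The merge rule itself is simple enough to sketch in a few lines of Python (illustration only, with made-up eigengenes): compute 1 - cor between two module eigengenes and merge when that dissimilarity falls below the height cut:

```python
def pearson(x, y):
    """Plain Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def should_merge(me1, me2, cut=0.30):
    """Merge two modules when 1 - cor(eigengenes) falls below the height cut."""
    return 1 - pearson(me1, me2) < cut

# Made-up eigengenes: a highly correlated pair merges, an anticorrelated pair doesn't.
print(should_merge([1, 2, 3, 4], [1.1, 2.2, 2.9, 4.3]))  # True
print(should_merge([1, 2, 3, 4], [4, 3, 2, 1]))          # False
```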

If there were some modules that needed to be merged you can call an automatic merging function:

merge = mergeCloseModules(datExpr0, dynamicColors, cutHeight = MEDissThres, verbose = 3)

The merged module colors:

mergedColors = merge$colors;

Eigengenes of the new merged modules:

mergedMEs = merge$newMEs;

If you had combined different modules then that would show in this plot:

sizeGrWindow(12, 9)

plotDendroAndColors(TaxaTree, cbind(dynamicColors, mergedColors),
c("Dynamic Tree Cut", "Merged dynamic"),
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)

If we had merged some of the modules that would show up in the Merged dynamic color scheme.

Rename the mergedColors to moduleColors:

moduleColors = mergedColors

Construct numerical labels corresponding to the colors:

colorOrder = c("grey", standardColors(50));
moduleLabels = match(moduleColors, colorOrder)-1;
MEs = mergedMEs;
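The match() trick above is easy to miss; here is the same idea as a short Python sketch (the color order mirrors c("grey", standardColors(...)) — the first few standard WGCNA colors are turquoise, blue, brown, yellow, green, red, black — and the module assignments are made up):

```python
# "grey" (unassigned) comes first, so it always receives numeric label 0.
color_order = ["grey", "turquoise", "blue", "brown", "yellow", "green", "red", "black"]
module_colors = ["blue", "grey", "turquoise", "blue", "red"]  # made-up assignments

# Equivalent of R's match(moduleColors, colorOrder) - 1 (R is 1-indexed).
labels = [color_order.index(c) for c in module_colors]
# labels -> [2, 0, 1, 2, 6]
```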

Save module colors and labels for use in subsequent parts:

save(MEs, moduleLabels, moduleColors, TaxaTree, file = "Monterey-networkConstruction-stepByStep.RData")

Code–Relating modules to external information and IDing important taxa:

Here you are going to identify modules that are significantly associated with environmental traits/biogeochemical rates. You already have summary profiles for each module (eigengenes–remember that an eigengene is the first principal component of a module expression matrix and represents a suitably defined average OTU community), so we just have to correlate these eigengenes with environmental traits and look for significant associations.

Defining numbers of OTUs and samples:

nTaxa = ncol(datExpr0);
nSamples = nrow(datExpr0);

Recalculate MEs (module eigengenes):

MEs0 = moduleEigengenes(datExpr0, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(MEs, datTraits, use = "p");
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples);

Now we will visualize it:


textMatrix = paste(signif(moduleTraitCor, 2), "\n(",
signif(moduleTraitPvalue, 1), ")", sep = "");
dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3));

labeledHeatmap(Matrix = moduleTraitCor,
xLabels = names(datTraits),
yLabels = names(MEs),
ySymbols = names(MEs),
colorLabels = FALSE,
colors = greenWhiteRed(50),
textMatrix = textMatrix,
setStdMargins = FALSE,
cex.text = 0.5,
zlim = c(-1,1),
main = paste("Module-trait relationships"))

Each row corresponds to a module eigengene and each column corresponds to an environmental trait or biogeochemical rate (as long as it is continuous–notice that the categorical variables are gray and say NA). Each cell contains the corresponding Pearson correlation coefficient (top number) and a p-value (in parentheses). The table is color-coded by correlation according to the color legend.
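The p-values in the parentheses come from corPvalueStudent, which converts each Pearson r into a t statistic with n - 2 degrees of freedom. A minimal Python sketch of that conversion (the r = 0.7, n = 20 example is made up; the two-sided p then comes from the t distribution, which I don't reproduce here):

```python
import math

def cor_t_stat(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2); compare against the t distribution
    with n - 2 degrees of freedom to get the p-value shown in the heatmap."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t = cor_t_stat(0.7, 20)  # about 4.16 -> a small two-sided p for 20 samples
```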

You can see that the Brown module is positively correlated with many indices of upwelling while the Black module is negatively correlated with many indices of upwelling. For this work I was particularly interested in CR and so I focused on modules the positively or negatively correlated with CR. The Red module was negatively associated with CR while the Blue module was positively associated with CR.

Let’s look more at the Red module by quantifying the associations of individual taxa with CR:

First define the variable we are interested in from datTrait:

CR =$CR);
names(CR) = "CR"

modNames = substring(names(MEs), 3)
TaxaModuleMembership =, MEs, use = "p"));
MMPvalue =, nSamples));
names(TaxaModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");
TaxaTraitSignificance =, CR, use = "p"));
GSPvalue =, nSamples));
names(TaxaTraitSignificance) = paste("GS.", names(CR), sep="");
names(GSPvalue) = paste("p.GS.", names(CR), sep="");

module = "red"
column = match(module, modNames);
moduleTaxa = moduleColors==module;
sizeGrWindow(7, 7);
par(mfrow = c(1,1));
verboseScatterplot(abs(TaxaModuleMembership[moduleTaxa, column]),
abs(TaxaTraitSignificance[moduleTaxa, 1]),
xlab = paste("Module Membership in", module, "module"),
ylab = "Taxa significance for CR",
main = paste("Module membership vs. Taxa significance\n"),
cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)

This graph shows how each taxon (each red dot is an OTU that belongs to the Red module) correlates with 1) the environmental trait of interest and 2) how important it is to the module. The taxa/OTUs that have high module membership tend to occur whenever the module is represented in the environment, and are therefore often connected throughout the samples with other red taxa/OTUs. In this module, these hubs (Red OTUs that occur with other Red OTUs) are also the most important OTUs for predicting CR.

Now let's get more info about the taxa that make up the Red module:

First, merge the statistical info from the previous section (modules with high association with the trait of interest–e.g. CR or Temp) with taxa annotation, and write a file that summarizes these results:


You will have to feed in an annotation file–a file listing which Bacteria/Archaea go with each OTU (I am not providing this file, but it just had a column with OTUs and a column with the Taxonomy).

annot = read.table("MB.subsample.fn.0.03.cons.taxonomy",header=T,sep="\t");
probes = names(datExpr0)
probes2annot = match(probes, annot$OTU)

Check for the number of probes without annotation (it should return a 0):

Create the starting data frame:

TaxaInfo0 = data.frame(Taxon = probes,
TaxaSymbol = annot$OTU[probes2annot],
LinkID = annot$Taxonomy[probes2annot],
moduleColor = moduleColors,
TaxaTraitSignificance,
GSPvalue)

Order modules by their significance for CR:

modOrder = order(-abs(cor(MEs, CR, use = "p")));

Add module membership information in the chosen order:

for (mod in 1:ncol(TaxaModuleMembership))
{
oldNames = names(TaxaInfo0)
TaxaInfo0 = data.frame(TaxaInfo0, TaxaModuleMembership[, modOrder[mod]],
MMPvalue[, modOrder[mod]]);
names(TaxaInfo0) = c(oldNames, paste("MM.", modNames[modOrder[mod]], sep=""),
paste("p.MM.", modNames[modOrder[mod]], sep=""))
}

Order the OTUs in the TaxaInfo0 variable first by module color, then by taxa–trait significance:

TaxaOrder = order(TaxaInfo0$moduleColor, -abs(TaxaInfo0$GS.CR));
TaxaInfo = TaxaInfo0[TaxaOrder, ]

Write file:

write.csv(TaxaInfo, file = "TaxaInfo.csv")

Here is a bit of the output file I got:

NOTES on output:

moduleColor is the module that the OTU was ultimately put into

GS stands for Gene Significance (for us it means taxon significance), while MM stands for module membership.
GS.Environmentaltrait = Pearson correlation coefficient for that OTU with the trait
p.GS.Environmentaltrait = P value for the preceding relationship

The MM column gives the module membership correlation (it correlates OTU abundance with the module eigengene of a given module). If close to 0, the taxon is not part of that color module (since each OTU has to be put in a module, you may get some OTUs that are close to 0; they just aren't important to that module). If close to 1 or -1, it is highly connected to that color module's genes/taxa.
MM.color = Pearson correlation coefficient for module membership–i.e. how well that OTU correlates with that particular color module (each OTU has a value for each module but only belongs to one module)
p.MM.color = P value for the preceding relationship

GS allows incorporation of external info into the co-expression network by showing gene significance. The higher the absolute value of GS, the more biologically significant the gene (or, in our case, the taxon).
Modules will be ordered by their significance for the chosen trait, with the most significant ones to the left.
Each of the modules (with each OTU assigned to exactly one module) will be represented for the environmental trait you selected.
You will have to rerun this for each environmental trait you are interested in.


Posted By: Jeff on October 11, 2018

A new NASA Postdoctoral Program (NPP) opportunity was posted today for our Oceans Across Space and Time (OAST) initiative.  What’s OAST?  I can’t tell you yet, because the relevant press releases have been stuck in purgatory for several weeks.  Hopefully I can write that post next week.  Nonetheless you can figure out what we’re all about based on the NPP opportunity here, and pasted below.  If you’re interested act fast, NPP proposals are due November 1!

Past and present oceans are widely distributed in our solar system and are among the best environments to search for extant life. New tools, techniques, and strategies are required to support future life detection missions to ocean worlds. Oceans Across Space and Time (OAST) is a new project under the Network for Life Detection (NfoLD) research coordination network. OAST seeks to understand the distribution, diversity, and limits of life on contemporary, remnant, and relict ocean worlds to facilitate the search for extant life. Central to this effort is the development of an Ocean Habitability Index that characterizes life in the context of the physicochemical stressors and metabolic opportunities of an ocean environment. OAST will be developing the OHI based on field efforts in surficial hypersaline environments and deep hypersaline anoxic basins, and from laboratory-based experimental studies.

Postdoctoral fellows are sought to support OAST activities across three themes: characterizing ocean habitability, detecting and measuring biological activity, and understanding diversity and the limits of evolution. Postdoctoral fellows are expected to interact across two or more institutions within OAST. Participating institutions are Georgia Institute of Technology (Schmidt, Glass, Ingall, Reinhardt, Stewart, Stockton), Scripps Institution of Oceanography at UC San Diego (Bowman, Bartlett), Massachusetts Institute of Technology (Pontrefact and Carr), Louisiana State University (Doran), University of Kansas (Olcott), Stanford University (Dekas), Blue Marble Space Institute of Science (Som), and the University of Texas at Austin (Soderlund). Candidates should contact the PI (Schmidt) and two potential mentors early in the proposal process to scope possible projects in line with the main directions of OAST and NFoLD.

Posted By: Jeff on September 30, 2018

We have an opening for a postdoctoral scholar in the area of bioinformatics and predictive analytics.  The ideal candidate will have demonstrated expertise in one of these two areas and a strong interest in the other.  Possible areas of expertise within predictive analytics include the development and application of neural networks and other machine learning techniques, or nonlinear statistical modeling techniques such as empirical dynamical modeling.  Possible areas of expertise within bioinformatics include genome assembly and annotation, metagenomic or metatranscriptomic analysis, or advanced community structure analysis.  Candidates should be proficient with R, Python, or Matlab, familiar with scientific computing in Linux, and fluent in English.

The successful candidate will take the lead in an exciting new industry collaboration to predict process thresholds from changes in microbial community structure.  The goal of this collaboration is not only to predict thresholds, but to fully understand the underlying ecosystem dynamics through genomics and metabolic modeling.  The candidate will be in residence at Scripps Institution of Oceanography at UC San Diego, but will have the opportunity to work directly with industry scientists in the San Diego area.  The candidate is expected to take on a leadership role in the Bowman Lab, participate in training graduate students and undergraduates, represent the lab at national and international meetings, and publish their work in scholarly journals.  The initial opportunity is for two years with an option for a third year contingent on progress made.  The position starts January of 2019.

Interested candidates should send a CV and a brief, 1-page statement of interest to Jeff Bowman at jsbowman at no later than November 1, 2018.

Posted By: Jeff on September 07, 2018

One of the most popular primer sets for 16S rRNA gene amplicon analysis right now is the 515F/806R set. One of the advantages of this pair is that it amplifies broadly across the domains Archaea and Bacteria. This reduces by half the amount of work required to characterize prokaryotic community structure, and allows a comparison of the relative (or absolute, if you have counts) abundance of bacteria and archaea.  However, paprica and many other analysis tools aren’t designed to simultaneously analyze reads from both domains.  Different reference alignments or covariance models, for example, might be required.  Thus it’s useful to split an input fasta file into separate bacterial and archaeal files.

We like to use the Infernal tool cmscan for this purpose.  First, you’ll need to acquire covariance models for the 16S/18S rRNA genes from all three domains of life.  You can find those on the Rfam website, they are also included in paprica/models if you’ve downloaded paprica.  Copy the models to new subdirectory in your working directory while combining them into a single file:

mkdir cms
cat ../paprica/models/*.cm > cms/all_domains.cm
cd cms

Now you need to compress and index the covariance models using the cmpress utility provided by Infernal.  This takes a while.

cmpress all_domains.cm
Pretty simple.  Now you're ready to do some work.  The whole Infernal suite of tools has pretty awesome built-in parallelization, but with only three covariance models in the file you won't get much out of it.  Best to minimize cmscan's use of cores and instead push lots of files through it at once.  This is easily done with GNU Parallel:

ls *.fasta | parallel -u cmscan --cpu 1 --tblout {}.txt cms/all_domains.cm {} > /dev/null

Next comes the secret sauce.  The command above produces an easy-to-parse, easy-to-read table with classification stats for each of the covariance models that we searched against.  Paprica contains a utility in paprica/utilities/ to parse the table and figure out which model scored best for each read, then make three new fasta files for each of domains Bacteria, Archaea, and Eukarya (the primers will typically pick up a few euks).  We’ll parallelize the script just as we did for cmscan.

ls *.fasta | parallel -u python -prefix {} -out {}
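The best-hit logic in that kind of table parser is simple; here is a hedged Python sketch (not paprica's actual code; the table fragment and model/read names are made up, and real cmscan --tblout rows have many more columns — the bit score sits well to the right):

```python
from collections import defaultdict

# Made-up, simplified tblout fragment: model name, read name, bit score.
tblout = """\
16S_bacteria read1 512.3
16S_archaea read1 88.0
16S_archaea read2 433.7
"""

# For each read, keep only the covariance model with the highest bit score.
best = {}
for line in tblout.splitlines():
    if line.startswith('#'):  # real tblout files carry comment lines
        continue
    model, read, score = line.split()
    score = float(score)
    if read not in best or score > best[read][1]:
        best[read] = (model, score)

# Bin reads by their best-scoring model; in practice each bin becomes a
# domain-specific fasta file.
bins = defaultdict(list)
for read, (model, _) in best.items():
    bins[model].append(read)
# bins -> {'16S_bacteria': ['read1'], '16S_archaea': ['read2']}
```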

Now you have domain-specific files that you can analyze in paprica or your amplicon analysis tool of choice!