Chasing Microbes in Antarctica

At the base of the polar food chain in the icy waters off Antarctica, phytoplankton are an essential food source for young krill, which in turn sustain many species of marine wildlife. Jeff Bowman is in Antarctica for the field season studying how phytoplankton and bacteria interact, with a focus on their cooperative relationships. Toxic compounds produced by phytoplankton, for example, may be cleaned up by bacterial partners, allowing photosynthesis to proceed more efficiently and ultimately putting more food into the food web.

 ********

Posted By: Emelia Chamberlain on August 18, 2019

These guys are a bit bigger than the microbial organisms we usually study in the Bowman Lab, but they are absolute models under a standard light microscope. Here you can see two rotifers (far left with egg sac, and top center), a type of microscopic invertebrate commonly found in freshwater. (PC: E.J. Chamberlain)

Hello! It’s Emelia again – to learn more about me and my research in the Bowman Lab check out this post. I have recently returned from 2 weeks in the Canadian Arctic, where I attended an absolutely incredible summer field course entitled “Arctic Microbiomes: From molecules and microbes to ecosystems and health” through the Sentinel North International PhD School at Université Laval in Quebec, Canada. This course emphasized an interdisciplinary approach to asking (and answering) questions about the role of microbiomes in the Arctic. A microbiome represents the complex interactions of microscopic life (bacteria, archaea, phytoplankton, fungi, viruses, etc.) within a specific habitat. And just as the community that makes up a human gut microbiome can give insights into the health of a person, the diversity of Arctic microbiomes – soil, pond, sea ice, etc. – can give insights into the health of Arctic ecosystems. The Arctic is one of the most rapidly changing places on Earth, with warmer temperatures and less ice each year. To understand the broader ecosystem (including human) impacts of this rapid change, we must first understand the dynamics of these microbial worlds and how they might buffer, accelerate, or shift in response to the changing Arctic climate.

(PC: Charles W. Greer) Great learning can happen anytime, anywhere. From the classroom…

…to the Great Whale River! (PC: E.J. Chamberlain)

The course was based out of the Center for Northern Studies in Whapmagoostui-Kuujjuarapik. It is not entirely remote: about 1,400 people live in the adjacent Cree First Nation and Inuit villages of Whapmagoostui and Kuujjuarapik. The research complex is located at 55º N along the coast of Hudson Bay and is one of 10 stations in the Canadian Network of Northern Research Operators. This field school was fun and informative for many reasons, but here I will briefly run through the top of the list.

Locations of some of the CEN stations, including Whapmagoostui-Kuujjuarapik. PC: CEN

Full Research Complex, taken in the evening (~9 PM) with kitchen (center) and dorm/lab buildings (right). (PC: E.J. Chamberlain)
Main CEN building, run in collaboration with the Cree First Nation of Whapmagoostui as a community center. (PC: E.J. Chamberlain)

1. It really was an International PhD school

(PC: © Pierre Coupel/Sentinelle Nord – Université Laval)

18 students from all around the globe came together to study the microbiota of the Arctic. Every continent was accounted for (we’ll include Antarctica, considering that many of these polar researchers have spent quite a bit of time there), and there was the possibility that ~5 languages were being spoken simultaneously at any given time. The diversity of this group also extended to scientific expertise – between students and mentors there was a spectrum of research experience, from medical studies of the human gut microbiome to soil microbial ecology and astrobiology. However, while scientific interest may have brought us together, after 10 days of dorm life, sharing meals, and surviving long days in the field, the personal connections and budding cross-continental friendships are what made this school truly unique.

2. Collaborations with the Cree First Nation

Learning about the native plants of the area and their traditional medicinal & household uses by the Cree community. (PC: E.J. Chamberlain)

Speaking of a cross-cultural experience – as the research complex is on Cree land, it is run in collaboration with the Cree First Nation of Whapmagoostui. Upon our arrival at the station, we were addressed by the Chief – who also happened to be the first female Chief elected in Cree history! She emphasized the importance of learning from the land and provided a human perspective on how we think about research in the North and the challenges facing their community. This type of knowledge exchange continued throughout the school, from a science & microscopes workshop held at the local grocery store to traditional tipi building at the research complex. Led by locals, we chopped and prepared the trees ourselves, finally constructing the tipi on our last day at the base. The school also coincided with a yearly heritage festival, and we were honored to be included in the local gathering. I learned a lot from the Cree elders, particularly about the many changes they’ve seen in the environment during their lifetimes – an important reminder that climate change is just as much a human issue as an environmental one (in fact, more so).

The finished product! The next group of base-bound researchers will be in charge of adding canvas for the walls. (PC: E.J. Chamberlain)

Sunny – our leader through the tree-cutting process – helps students place their trim poles into the right position. (PC: E.J. Chamberlain)

3. Fieldwork

While sailing to our sample sites we are able to test equipment and ensure that collections will run smoothly. Here I help test out the depth finder while we make our way through the mist. (PC: Flora Amill)

I am a sucker for field work; to me it is the best (and most fun) way to explore the natural world. Even with a rigorous and scientific sampling scheme, there is always the chance to see something new. And this school provided a TON of it in an absolutely GORGEOUS environment – mosquitos and all. One of my favorite days was when we sailed out onto the Great Whale River to take water samples and measure the river’s chemical properties using a hand-held CTD. The water and mist warded off the worst of the mosquitos, and I had the opportunity to try out new, state-of-the-art sampling equipment! (Plus I always enjoy a good day on the water.) Another highlight was sampling the local ponds and lakes for cyanobacteria – a type of photosynthetic bacteria (formerly known as blue-green algae) that, in these regions, grow in thick filamentous mats. It was especially neat because nearby there were some stromatolites – fossilized layers of ancient cyanobacterial growth from early Earth. These ancient cyanobacteria are responsible for filling the atmosphere with oxygen and making Earth habitable for life like us. In one day we touched the past and collected samples from the present to ask scientific questions about the future.

This sedimentary rock is actually a stromatolite formed from layers of ancient cyanobacterial growth. Cyanobacteria secrete a sticky mucus that binds sediment grains into fine mineral layers, which fossilize into the rings seen here. (PC: E.J. Chamberlain)

Sampling microbial mats is all about having the right tools – from bug nets to your good ‘ole Canadian Tire spatula… It’s all in the wrist. (PC: E.J. Chamberlain)

While the weather didn’t cooperate enough for us to actually sample there, we were also able to get a helicopter tour of some of the local permafrost sites! Permafrost encompasses any ground (soil, rock, etc.) that remains completely frozen (<0ºC) for at least two consecutive years. However, most permafrost has been frozen for much, much longer than that. The soils are held together by ice and, historically, have been so solidly frozen in some areas that builders considered them more stable to construct on than concrete. In the northern hemisphere, about 1/4 of the land area is underlain by permafrost, and it is currently thawing at unprecedented rates. This not only poses a threat to shorelines and infrastructure but is rapidly and unpredictably changing the microbial communities that live in this unique environment.

Permafrost mounds seen from the helicopter. As the permafrost thaws, organic carbon (frozen ancient plant biomass) is released into the adjacent meltwater ponds, where it is consumed by hungry bacteria and archaea. The activity rates of this Arctic ~microbiome~ determine how much of this carbon is released into the atmosphere as carbon dioxide or methane – both greenhouse gases. (PC: E.J. Chamberlain)

4. Scientific Expertise & Laboratory Work

Running a qPCR (quantitative polymerase chain reaction). PCR is a technique to make copies of, or amplify, targeted genetic material; qPCR quantifies that material. Here we looked to quantify the abundance of toxin-producing genes in our cyanobacteria samples. (PC: E.J. Chamberlain)

As this was a microbiology field school, a good portion of our time was spent analyzing samples in the lab. Many of the techniques we used were similar to the ones we employ here in the Bowman Lab, but there was still a lot for me to learn. The first step in most microbiome studies is simply to see who is there. To do this, we extracted genetic material from our samples for DNA sequencing. This begins with breaking apart the cells in your environmental samples, releasing their genetic material. Then, through a series of chemical reactions and washing steps, this material is purified and made ready to be amplified and sequenced. Even with field kits and portable sequencing devices, the process can be long and arduous, but thankfully we had many hands in the lab and an excellent cell-phone DJ. By the end of the week we were able to sequence metagenomes from several of our sampled sites. Then, even without internet (the horror), and through the incredible expertise of our mentors, we were able to analyze the diversity of the microbial communities. By pairing who is there with environmental parameters and rate measurements like gas fluxes, we can paint a picture of the current functionality and ecosystem services that microscopic life provides.

Measuring the oxygen profile of a microbial mat. (PC: E.J. Chamberlain)

Based on the rotational schedule of this field school, I spent most of my days in the lab following those cyanobacteria mats through their subsequent analyses. First we measured the amount of oxygen in each layer of the mats using a microsensor. This probe allows us to measure O2 on the micrometer scale, giving us an in-depth profile for each mat. The tops of the mats are photosynthetic, with the highest concentration of chlorophyll just below the surface layer. Toward the bottom of the mats, however, respiration becomes the dominant process, and some of the mats even had anoxic bottom layers. This distinct layering suggests a change in community composition with depth (both in cyanobacteria species and in the other bacteria & viruses that call this mat structure home). To test this, we dissected the mats vertically, separating out layers based on the depths where we saw a distinct change in the oxygen profile. These layers could be roughly characterized by color, which made for an easily visible distinction during dissection. The layers were then placed in tubes and analyzed separately in all further analyses.

The dissection station and colorful results (right).
(PC: E.J. Chamberlain).

At the end of the course, we worked on synthesizing all of our results to draw some conclusions about the microbial ecosystems we had been studying for the past week and a half. Each presentation turned into an exciting scientific discussion relying heavily on the diverse expertise and research experience of the mentors and students. I feel incredibly lucky to have been able to learn from these experts and practice the full scientific process in such a unique place.

5. Exploring the North

The North is a fascinating place to do research. We know so little about its environmental processes, and there are many scientific questions still begging to be asked. More than that, however, the stunning and surprisingly diverse environment, rapidly shifting weather conditions, and richly unique flora and fauna make it a true adventure to explore. Here are some of the pictures I took that I think best capture the North’s wild beauty and ecological diversity.

Rapidly shifting and unpredictable weather makes planning for the field difficult and often delays flights south. (PC: E.J. Chamberlain)

This photo was only taken an hour before the one to the right. The fog rolled in and out constantly most days. (PC: E.J. Chamberlain)

Even in mid July, Hudson Bay was still thick with melting sea ice. It was otherworldly to see the rotted ice washed up on the beach, particularly in contrast to the lush fields & forests nearby. (PC: E.J. Chamberlain)

(PC: E.J. Chamberlain)

(PC: E.J. Chamberlain)

A birds-eye (helicopter) view of the Great Whale River. (PC: E.J. Chamberlain)

Fauna: An adolescent black bear eyes us from the riverbank. (PC: E.J. Chamberlain)
Flora: Cladonia stellaris, or my new favorite lichen. While it looks plant-like, lichen is actually made of two types of microbes – algae and fungi – living in symbiosis. This lichen is an important food source for caribou and reindeer, giving it the common name “reindeer lichen”.
(PC: E.J. Chamberlain)

That’s all for now, folks! To learn more about what I and the rest of this year’s students were up to during the Sentinelle Nord IPS you can check out the group’s field blog here, or follow me on twitter @Antarctic_Emma (see #SNAM19).

(PC: Ligia F. Coelho)

Posted By: Jeff on August 15, 2019

A couple of months ago I was fortunate to have the opportunity to give a lecture at the Birch Aquarium at Scripps in the Perspectives on Ocean Science lecture series. The lecture covered some emerging topics in Arctic Oceanography and provided a brief intro to the upcoming MOSAiC expedition. The lecture was broadcast by UCTV and can be found here. Matthias Wietz – sorry for botching your introduction on the title slide! (Matthias was a PhD student at the Technical University of Denmark when the picture was taken. The record has been set straight.)

Posted By: Jeff on August 11, 2019

Last week we were busy hosting the inaugural Oceans Across Space and Time (OAST, @Space Oceans OAST on Facebook) combined first-year meeting and field effort. It was a crazy week but a huge success. The goal of OAST is to improve life detection efforts on future NASA planetary science missions by better understanding how biomass and activity are distributed in habitats that mimic past or present “ocean worlds”. Ocean worlds is a concept that has gained a lot of traction in the last few years (see our Roadmap to Ocean Worlds synthesis paper here). We have a lot of past or present ocean worlds in our solar system (Earth obviously, but also Mars, Europa, Enceladus, and a whole host of other ice-covered moons), and oceans are seen as a natural feature of planetary bodies that are more likely to host life. Our first-year effort focused on some open-ocean training for the Icefin robot, designed for exploring the protected spaces below floating ice shelves, and a multi-pronged investigation of the South Bay Salt Works.

The South Bay Salt Works in Chula Vista, CA. A truly amazing site for exploring how microbial activity and biomass are distributed across environmental gradients.

The Salt Works are an amazing environment that my lab has visited previously (see here and here). Our previous work in this environment has raised more questions than answers, so it was great to hit a few of our favorite spots with a top-notch team of limnologists, microbiologists, geochemists, and engineers.

Part of the OAST team setting up next to some very high salinity NaCl-dominated lakes. The pink color of the lakes is the true color, and is common to high salinity lakes. The color comes from carotenoid pigments in the halophilic archaea that dominate these lakes.

This is what I love about NASA – it’s an agency that develops the most sophisticated technology in the history of human civilization, but isn’t afraid to use a rock when the situation calls for it. Spanning several millennia of technological advancement is Maddie Myers (LSU), with Natalia Erazo (SIO) and Carly Novak (Georgia Tech) in the background.

Carly Novak (Georgia Tech) sampling salts with Peter Doran (LSU) and his “surfboard of science” in the background.

Doug Bartlett (SIO), a little out of his element at only 1 atm.

Posted By: Jeff on July 01, 2019

I’m thrilled to learn that my CAREER proposal was just funded by NSF-OPP, though I’m slightly disappointed that they made me change the title from IM-HAPPIER: Investigating Marine Heterotrophic Antarctic Processes, Paradigms, and Inferences through Research and Education to Understanding Microbial Heterotrophic Processes in Coastal Antarctic Waters. Apparently NSF is the only federal agency that doesn’t like a good acronym. This project will address open questions regarding the diversity and ecological function of heterotrophic bacteria and protists in coastal Antarctica. In particular there will be an emphasis on better understanding the mechanisms of bacterial mortality (i.e. protist bacterivory and viral lysis) and the implications for carbon flow through Antarctic marine ecosystems.

We’re coming for you! An unidentified protist (likely mixotrophic member of the genus Teleaulax) captured by microscope at Palmer Station in 2015. Heterotrophic bacteria and protists are ubiquitous in Antarctic waters, but we know surprisingly little about their genetic makeup or ecology.

This project means that after heading north for MOSAiC in 2020, the lab will be heading south for two field seasons in Antarctica. That work will be spearheaded by incoming PhD student Beth Connors. Although several lab members have participated or will soon participate in the Palmer LTER cruise along the western Antarctic Peninsula, I haven’t been to the WAP since 2015. Looking forward to going back!

Palmer Station and the ARSV Laurence M. Gould.

CAREER proposals emphasize both education and research, so in addition to field, laboratory, and modeling work we will be developing a new summer Junior Academy course for Sally Ride Science on polar ecology and oceanography.

Posted By: Jeff on June 18, 2019

Thanks to Jesse and Natalia for their help yesterday with the Discover America program run by the US State Department; SIO successfully hosted 35 foreign ambassadors and their spouses for an educational tour of the Scripps Pier. It was quite an experience. Jesse and I successfully dodged the photographers, but here’s a photo of Natalia talking science (presumably) with the ambassador of Cabo Verde and his wife. Downside: not a single diplomat or spouse wanted to go swimming, despite dolphins and balmy water temps!

Natalia talks science with his Excellency Carlos Alberto Wahnon De Carvalho Veiga and Ms. Maria Epifania Cruz Almeida of Cabo Verde. I presume they’re discussing biogeochemical cycling in mangrove forests. Credit: Scripps Communications

Posted By: Jeff on June 12, 2019

As a quick followup to Emelia’s post (https://www.polarmicrobes.org/training-for-mosaic-bremerhaven-utqiagvik/) on training for MOSAiC, there is a nice piece out today in the Washington Post on the US-based training for MOSAiC here: https://www.washingtonpost.com/graphics/2019/national/science/arctic-sea-ice-expedition-to-study-climate-change/?utm_term=.2552b79d5a32. It’s alarming to realize that Polarstern will depart from Tromsø, Norway on September 20 – just 100 days from now!

Posted By: Emelia Chamberlain on April 15, 2019

A photo of me with the famous Utqiagvik whale-bone arch, and behind, the Chukchi Sea.

Hello! My name is Emelia Chamberlain and I am a first-year PhD student here in the Bowman Lab working on the MOSAiC project. I just got back from a very exciting week in Utqiagvik, Alaska for MOSAiC snow and ice training. But first, an overview… As mentioned in an earlier post, the Multidisciplinary drifting Observatory for the Study of Arctic Climate (MOSAiC) project is an international effort to study the Arctic ocean-ice-atmosphere system with the goal of clarifying key climatic and ecological processes as they function in a changing Arctic. Within the larger scope of this project, our lab and collaborators from the University of Rhode Island (URI) will be studying how microbial community structure and ecophysiology control fluxes of oxygen and methane in the central Arctic Ocean.

MOSAiC begins in Sept of 2019, when the German icebreaker RV Polarstern will sail into the Laptev Sea and be tethered to an ice floe. Once trapped in the ice, both ship & scientists will spend the next year drifting through the Arctic. The goal is to set up a central observatory and collect time-series observations across the complete seasonal cycle. This year-long time series will be both exciting and critical for the future of Arctic research, but it is logistically difficult to carry out. The cruise is split up into 6 “legs”, with scientists taking two-month shifts collecting observations and living the Arctic life. Resupply will be carried out by other icebreakers and aircraft. I myself will be taking part in the last two legs of this project from June – October 2020, with Jeff, Co-PI Brice Loose (URI), and his post-doc Alessandra D’Angelo (URI) representing our project on the rest of the voyage.

A representation of the central observatory taken from the MOSAiC website

Laboratory training in Bremerhaven, Germany

As one would imagine, with over 600 scientists involved and continuous measurements broken up between multiple teams, this project requires a LOT of advanced planning. However, this is the fun part, as it means we get to travel a lot in preparation! In March, Jeff and I traveled to Potsdam, Germany to participate in a MOSAiC implementation workshop. Shortly after, we took a train up to the Alfred Wegener Institute facilities in Bremerhaven with Brice, Alessandra, and other MOSAiC participants to train on some of the instrumentation we will be operating on the Polarstern. We spent a full week training on instruments like a gas chromatograph, gas-flux measurement chambers, and a membrane inlet mass spectrometer (MIMS). While many of us had operated these types of instruments before, each machine is different and several were engineered or re-designed by participating scientists specifically for MOSAiC.

The AWI engineered MIMS that will be onboard Polarstern. The bubbling chamber ensures precise, daily calibrations (and looks really cool).
A specially designed gas-flux chamber for measuring metabolic gas fluxes in both snow and ice. Photo courtesy of Brice Loose (URI)

The bulk of the training was focused on the MIMS, which will be used to take continuous underway ∆O2/Ar measurements from surface waters during MOSAiC. Water is piped from below the Polarstern and run through the mass spectrometer, where dissolved gas concentrations are measured. Argon (Ar), a biologically inert gas, is incorporated into the ocean’s mixed layer at the same rate as oxygen (O2). However, while argon concentrations are set by physical processes alone, oxygen concentrations are also affected by biological processes (photosynthesis and respiration by biota). We can therefore compare oxygen and argon measurements in the water column to determine how much oxygen has deviated from what we would expect from physical air-sea exchange alone – that is, the deviation due to biological activity. From these oxygen fluxes, we can estimate Net Community Production (NCP), which is defined as the total amount of chemical energy produced by photosynthesis minus that which is used in respiration. This is an important balance to quantify, as it is representative of the amount of carbon (CO2) removed biologically from the atmosphere and sequestered into the ocean pool. The goal is to use these continuous MOSAiC measurements to quantify these biogeochemical budgets through time and get a better understanding of whether the Arctic is net autotrophic or heterotrophic – whether photosynthesis or respiration is the dominant process.
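As a rough, simplified illustration (not the actual MOSAiC processing, and with made-up numbers): the biological oxygen supersaturation is ∆(O2/Ar) = (O2/Ar)measured / (O2/Ar)equilibrium – 1, and at steady state NCP ≈ kw × [O2]sat × ∆(O2/Ar), where kw is the gas transfer velocity. With kw = 3 m d⁻¹, an equilibrium O2 concentration of 350 mmol m⁻³, and a 2% biological supersaturation, NCP would be roughly 3 × 350 × 0.02 ≈ 21 mmol O2 m⁻² d⁻¹ – a net autotrophic signal.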

A behind-the-scenes view of operating the MIMS – photo courtesy of Brice Loose (URI).

Learning how to remove and clean the equilibration tubes. These tubes bubble gases into the water for calibration.
PC: Brice Loose (URI)

We will be partially responsible for operating this instrument during our respective legs, and therefore spent a lot of time thinking about what might possibly go wrong during a year on an ice-locked vessel… and how to fix it. (PC: Brice Loose, URI)

Field training in Utqiagvik, Alaska

Utqiagvik, Alaska (formerly Barrow) is located at the northern tip of Alaska, situated between the Chukchi and Beaufort seas. It boasts the northernmost point in the United States.

After a productive week in Bremerhaven, this past week we stepped outside the laboratory with a snow and ice field training session in Utqiagvik, Alaska. One of the challenges of Arctic fieldwork is, of course, that it takes place in the frigid Arctic environment. To help scientists prepare for life on the ice and to help standardize/optimize sampling methods for MOSAiC, three snow and ice field training sessions were organized (the two others took place earlier this year in Finland). This trip was particularly exciting for me, as it was my first time in the Arctic! Not only did I learn a lot about sampling sea ice, but I was struck by the dynamic beauty of the polar landscape. No wonder researchers continue to be fascinated with the unanswered questions of this wild ecosystem.

Up close and personal with a large pressure ridge. Pressure ridges in sea ice are formed when two ice floes collide with each other. You can tell that this ridge was formed from multi-year ice by the thickness of the blocks and their deep blue color. Ice is classified as multi-year when it has survived multiple melt seasons.

Post-doc J.P. Balmonte from Uppsala University meanders his way along the pressure ridge.

The three trainings that everyone had to complete consisted of snow sampling, ice sampling, and snowmobile training. Aside from that, people were able to learn or compare more advanced methods for their sampling specialties and test out gear, both scientific instruments and personal weather protection. I was lucky in that the average -18ºC weather we experienced in Utqiagvik will most likely be representative of the conditions I will be facing during the summer months of MOSAiC. The winter teams will have to contend with considerably colder conditions.

Some days are windier than others and it’s very important to bundle up. However, on this trip I also learned that layers are very important. Working on the ice, especially coring, can be hard work and you don’t want to overheat. Should I need to remove it, beneath my big parka I’ve got on a light puffy jacket, a fleece, and a wool thermal under-layer.

Digging snow pits is an important part of sampling parameters like snow thickness and density. The goal is to get a clean vertical face of snow to examine depth horizons and sample from. If you look closely, you can see where 2 cm thick squares of snow have been removed from the pit’s wall and weighed before discarding. The wall is built from the snow removed from the working pit and is intended to shield researchers from the wind.

Note the meter-stick for snow thickness.
This is a work view I could get used to.

Coring practice! The extension pole between the corer and the drill indicates that this is some pretty thick ice. PC: Jeff Bowman

One of the most exciting trainings we had was on how to operate the snowmobiles. These are a critical form of transport on the ice, and they often have sleds attached for hauling gear and samples to and from the ship. As such, we researchers are expected to be able to drive them properly (plus it was pretty fun and allowed us to reach more remote ice locations over our short week in Utqiagvik).

Once out on the ice we practiced tipping the machines over… and how to right them again.
Learning the basics! Note the sled behind ready to be attached to the machine.

While in Utqiagvik, we here at the Bowman Lab decided to make the most of this trip by also collecting some of our own sea-ice cores to sample and experiment with. The goal of our experiment is to determine the best method for melting these cores (necessary for sampling them) while imposing the least amount of stress on the resident microbial communities that we are interested in sampling. I will write up a post covering the methods and ideas behind this experiment soon – but in the meantime, please enjoy this excellent GoPro footage from beneath the ice captured by Jeff during our fieldwork. The brown gunk coating the bottom of the ice is sea-ice algae, mostly made up of diatoms. The ice here is only 68 cm thick, allowing for a lot of light penetration and an abundant photosynthetic community. At the end, you can also note the elusive scientists in their natural sampling habitat.

What’s next?

Jeff looks to the horizon.

As Sept 2019 gets closer, preparations are likely to ramp up even more. Even though I won’t be in the field for another year, it is exciting to think that the start of MOSAiC is rapidly approaching and after these two weeks of training I am feeling much more prepared for the scientific logistics and field challenges that will accompany this research. However, there is still much more to come. In a few weeks I will be jetting off again, but this time to URI to meet up with our collaborators for more instrument training. And thus the preparations continue…

Posted By: Jeff on March 11, 2019

The output from our paprica pipeline for microbial community structure analysis and metabolic inference has changed quite a lot over the last few months. In response to some recent requests here’s a tutorial that walks through an ordination and a few basic plots with the paprica output. The tutorial assumes that you’ve run dada2 on your samples (see starter script here), then paprica (follow this tutorial if needed). I’ll be using data from our recent seagrass paper, grab these data using NCBI prefetch (SRX4496910-SRX4496954) or follow along with your own. Once you’ve run paprica it’s all the same!

First, because paprica operates independently on each sample we need to aggregate the output. This is easily accomplished with the combine_edge_results.py script that you can find in paprica/utilities. Copy this script to your working directory, then execute like this:

./combine_edge_results.py -edge_in bacteria.edge_data.csv -path_in bacteria.sum_pathways.csv -ec_in bacteria.sum_ec.csv -o 2017.07.03_seagrass_bacteria -unique_in bacteria.unique_seqs.csv

I’m not going to describe the output files in detail, but basically that command provides 1) an edge table (2017.07.03_seagrass_bacteria.edge_tally.csv) that is analogous to an OTU abundance table, 2) a unique read table (2017.07.03_seagrass_bacteria.unique_tally.csv), 3) a file mapping each read to a taxonomic lineage, and 4) a file of additional information on edges, provided as a mean for each sample. For our analysis let’s bring the pertinent files into R and do some pre-processing:

## read in the edge and unique abundance tables

tally <- read.csv('2017.07.03_seagrass_bacteria.edge_tally.csv', header = T, row.names = 1)
unique <- read.csv('2017.07.03_seagrass_bacteria.unique_tally.csv', header = T, row.names = 1)

## read in edge_data and taxon_map

data <- read.csv('2017.07.03_seagrass_bacteria.edge_data.csv', header = T, row.names = 1)
taxa <- read.csv('2017.07.03_seagrass_bacteria.taxon_map.txt', header = T, row.names = 1, sep = '\t', as.is = T)

## convert all na's to 0, then check for low abundance samples

tally[is.na(tally)] <- 0
unique[is.na(unique)] <- 0
rowSums(tally)

## remove any low abundance samples (i.e. bad library builds), and also
## low abundance reads. This latter step is optional, but I find it useful
## unless you have a particular interest in the rare biosphere. Note that
## even with subsampling your least abundant reads are noise, so at a minimum
## exclude everything that appears only once.

tally.select <- tally[rowSums(tally) > 5000,]
tally.select <- tally.select[,colSums(tally.select) > 1000]

unique.select <- unique[rowSums(unique) > 5000,]
unique.select <- unique.select[,colSums(unique.select) > 1000]

If your experiment is based on factors (i.e. you want to test for differences between categories of samples) you may want to use DESeq2, otherwise I suggest normalizing by sample abundance.
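If you do go the DESeq2 route, the call looks something like the sketch below (this isn't part of the original analysis; "sample.data" is a hypothetical metadata data frame with one row per sample and a factor column "location"). DESeq2 wants raw integer counts with taxa as rows, so transpose the tally and skip the normalization step that follows.

library(DESeq2)

## DESeq2 expects unnormalized integer counts with taxa as rows and samples as
## columns, and a metadata table whose rows match those samples
dds <- DESeqDataSetFromMatrix(countData = t(tally.select),
                              colData = sample.data,
                              design = ~ location)

dds <- DESeq(dds)
res <- results(dds)

## edges most strongly associated with the hypothetical "location" factor
head(res[order(res$padj),])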

## normalize

tally.select <- tally.select/rowSums(tally.select)
unique.select <- unique.select/rowSums(unique.select)

Now we're going to do something tricky. For both unique.select and tally.select, rows are observations and columns are variables (edges or unique reads). Those likely don't mean much to you unless you're intimately familiar with the reference tree. We can map the edge numbers to taxa using the "taxa" dataframe, but first we need to remove the "X" added by R to make the numbers legal column names. For the unique read labels, we need to split on "_", which separates the unique read identifier from the edge number.

## get edge numbers associated with columns, and map to taxa names

tally.lab.Row <- sapply(strsplit(colnames(tally.select), 'X'), '[', 2)
tally.lab.Row <- taxa[tally.lab.Row, 'taxon']

unique.lab.Row <- sapply(strsplit(colnames(unique.select), '_'), '[', 2)
unique.lab.Row <- taxa[unique.lab.Row, 'taxon']

In the above block of code I labeled the new variables as X.lab.Row, because we'll first use them to label the rows of a heatmap. Heatmaps are a great way to start getting familiar with your data.

## make a heatmap of edge abundance

heat.col <- colorRampPalette(c('white', 'lightgoldenrod1', 'darkgreen'))(100)

heatmap(t(data.matrix(tally.select)),
        scale = 'none',
        col = heat.col,
        labRow = tally.lab.Row,
        margins = c(10, 10))


Heatmaps are great for visualizing broad trends in the data, but they aren't a good entry point for quantitative analysis. A good next step is to carry out some kind of ordination (NMDS, PCoA, PCA, CA). Not all ordination methods will work well for all types of data. Here we'll use correspondence analysis (CA) on the relative abundance of the unique reads. CA will be carried out with the package "ca", while "factoextra" will be used to parse the CA output and calculate key additional information. You can find a nice in-depth tutorial on correspondence analysis in R here.

library(ca)
library(factoextra)

unique.select.ca <- ca(unique.select)
unique.select.ca.var <- get_eigenvalue(unique.select.ca)
unique.select.ca.res <- get_ca_col(unique.select.ca)

species.x <- unique.select.ca$colcoord[,1]
species.y <- unique.select.ca$colcoord[,2]

samples.x <- unique.select.ca$rowcoord[,1]
samples.y <- unique.select.ca$rowcoord[,2]

dim.1.var <- round(unique.select.ca.var$variance.percent[1], 1)
dim.2.var <- round(unique.select.ca.var$variance.percent[2], 2)

plot(species.x, species.y,
     ylab = paste0('Dim 2: ', dim.2.var, '%'),
     xlab = paste0('Dim 1: ', dim.1.var, '%'),
     pch = 3,
     col = 'red')

points(samples.x, samples.y,
       pch = 19)

legend('topleft',
       legend = c('Samples', 'Unique reads'),
       pch = c(19, 3),
       col = c('black', 'red'))

At this point you're ready to crack open the unique.select.ca object and start doing some hypothesis testing. There's one more visualization, however, that can help with initial interpretation: a heatmap of the top unique reads contributing to the first two dimensions (which account for nearly all of the variance between samples).

species.contr <- unique.select.ca.res$contrib[,1:2]
species.contr.ordered <- species.contr[order(rowSums(species.contr), decreasing = T),]
species.contr.top <- species.contr.ordered[1:10,]

species.contr.lab <- unique.lab.Row[order(rowSums(abs(species.contr)), decreasing = T)]

heatmap(species.contr.top,
        scale = 'none',
        col = heat.col,
        Colv = NA,
        margins = c(10, 20),
        labRow = species.contr.lab[1:10],
        labCol = c('Dim 1', 'Dim 2'),
        cexCol = 1.5)

From this plot we see that quite a few different taxa are contributing approximately equally to Dim 1 (which accounts for much of the variance between samples), including several different Pelagibacter and Rhodobacteraceae strains. That makes sense, as the dominant environmental gradient in the study was inside vs. outside of San Diego Bay, and we would expect these strains to be organized along such a gradient. Dim 2 is different, with unique reads associated with Tropheryma whipplei and Rhodoluna lacicola contributing most. These aren't typical marine strains, and if we look back at the original data we see that these taxa are very abundant in just two samples. These samples are the obvious outliers along Dim 2 in the CA plot.

In this tutorial we covered just the community structure output from paprica, but of course the real benefit to using paprica is its estimation of metabolic potential. These data are found in the *.ec_tally.csv and *.path_tally.csv files, and are organized in the same way as the edge and unique read abundance tables. Because of this they can be plotted and analyzed in the same way.
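For example (a minimal sketch, assuming the combined pathway file carries the same prefix used above), the pathway tallies can be read in, normalized, and visualized with the same heatmap approach:

path.tally <- read.csv('2017.07.03_seagrass_bacteria.path_tally.csv', header = T, row.names = 1)
path.tally[is.na(path.tally)] <- 0

## restrict to the samples retained earlier, then convert to relative abundance
path.select <- path.tally[row.names(tally.select),]
path.select <- path.select/rowSums(path.select)

## heat.col as defined earlier in the tutorial
heatmap(t(data.matrix(path.select)),
        scale = 'none',
        col = heat.col,
        margins = c(10, 20))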


Posted By: Jeff on February 25, 2019

We have a new paper out today on the impacts of coastal seagrasses on the microbial community structure of San Diego Bay.  I’m excited about this paper as the first student-led study to come out of my lab.  The study was conceived by Tia Rabsatt, an undergraduate from UVI, during a SURF REU in 2017.  Tia carried out the sample collection, DNA extractions, and flow cytometry, then handed the project off to Sahra Webb.  Sahra carried out the remainder of the project as her Masters thesis.


Tia filters water just outside the mouth of San Diego Bay.  Coronado Island is in the background.

Why the interest in seagrass?  Unlike kelp, seagrasses are true flowering plants.  They’re found around the world from the tropics to the high latitudes and perform a number of important ecosystem functions.  Considerable attention has been given to their importance as nursery habitat for a number of marine organisms.  More recently we’ve come to appreciate the role they play in mediating sediment transport and pollution.  Recent work in Indonesia (which inspired Tia to carry out this study) even showed that the presence of seagrass meadows between inhabited beaches and coral reefs reduced the load of human and coral pathogens within the reefs.


Seagrass, barely visible on a murky collection day.  Confirming seagrass presence/absence was a considerable challenge during the field effort, and one we hadn’t anticipated.  There’s always something…

There are a number of good papers out on the seagrass microbiome – epibionts and other bacteria that are physically associated with the seagrass (see here and here) – but not so many on water column microbes in the vicinity of seagrass meadows.  In this study we took paired samples inside and outside of seagrass beds within and just outside of San Diego Bay.  I’ll be the first to admit that our experimental design was simple, with a limited sample set, and we look forward to a more comprehensive analysis at some point in the future.  Regardless, it worked well for a factor-type analysis using DESeq2; testing for differentially present microbial taxa while controlling for the different locations.

What we found was that (not surprisingly) the influence of seagrass is pretty minor compared to the influence of sample location (inside vs. outside of the bay).  There were, however, some taxa that were more abundant near seagrass even when we controlled for sample location.  These included some expected copiotrophs, including members of the Rhodobacteraceae, Puniceispirillum, and Colwellia, as well as some unexpected genera including Synechococcus and Thioglobus (a sulfur-oxidizing gammaproteobacterium).  We spent the requisite amount of time puzzling over some abundant Rickettsiales within San Diego Bay.  We usually take these to mean SAR11 (though our analysis used paprica, which usually picks up Pelagibacter just fine), but they didn’t look like SAR11 in this case.  An unusual coastal SAR11 clade?  A parasite or endosymbiont with a wonky GC ratio?  TBD…

Posted By: Jeff on December 20, 2018

I’m happy to report that I have a new paper out this week in Frontiers in Microbiology titled Identification of Microbial Dark Matter in Antarctic Environments. I thought that it would be interesting to see how well different Antarctic environments are represented by the available completed genomes (not very was my initial guess), got a little bored at the ISME meeting this summer, and had a go at it.

My approach was to find as many Antarctic 16S rRNA gene sequence datasets as I could on the NCBI SRA (Illumina MiSeq only), reanalyze them using consistent QC and denoising (dada2), and apply our paprica pipeline to see how well the environmental 16S rRNA sequence reads match the full-length reads in a recent build of the paprica database.

First things first, however, it was interesting to see 1) how poorly distributed the available Illumina libraries were around the Antarctic continent, and 2) just how many bad, incomplete, and incorrect submissions exist in SRA. 90 % of the effort on this project was invested in culling my list of projects: tracking down erroneous lat/longs, sequence files that weren’t demultiplexed, etc. The demultiplexing issue is particularly irritating, as I suspect it results purely from laziness. Of course the errors extend to some of my own data, and I was chagrined to see that the accession number in our 2017 paper on microbial transport in the McMurdo Sound region is incorrect. Clearly we can all do better.

The collection locations for 16S rRNA libraries available on the NCBI SRA. From Bowman, 2018. Note the concentration of samples near major research bases along the western Antarctic Peninsula, in Prydz Bay, and at McMurdo Sound.

In the end I had 1,810 libraries that I felt good about, and that could be loosely grouped into the environments shown in the figure above. To get a rough idea of how well each library was represented by genomes in the paprica database, I used the map ratio value calculated within paprica by Guppy. The map ratio is the fraction of bases in a query read that match the reference read within the region of alignment. This is a pretty unrefined way to assess sequence similarity, but it’s fast and easy to interpret. My analysis looked at the map ratio value for 1) individual unique reads, 2) samples, and 3) environments. One way to think about #1 is represented by the figure below:

Read map ratio as a function of read abundance for A) Bacteria and B) Archaea, calculated individually for all libraries. The orange lines define an arbitrary cutoff for reads that are reasonably abundant, but have very low map ratios (meaning we should probably think about getting those genomes).

What these plots tell us is that most unique reads were reasonably well represented by the 16S rRNA genes associated with complete genomes (> 80 % map ratio, which is still pretty distant genetically speaking!); however, there are quite a lot of reasonably abundant reads with much lower map ratios. (Looking at this now, it seems painfully obvious that I should have used relative abundance. Oh well.)

I didn’t make an effort to track down all the completed genomes associated with Antarctic strains – if that’s even possible – but there is a known deficit of psychrophile genomes. Given that Antarctica tends to be chilly, I’ll hazard a guess that there aren’t many complete bacterial or archaeal genomes from Antarctic isolates or metagenomes. Given the novelty of many Antarctic environments, and the number of microbiologists that do work in Antarctica, I’m a little surprised by this. I'm also kind of excited, however, thinking about how we might solve this for the future…

Posted By: Jeff on December 14, 2018

Abstract submissions are open for AbSciCon 2019!  You can check out the full selection of sessions here, however, I’d like to draw your attention toward the session Salty Goodness: Understanding life, biosignature preservation, and brines in the Solar System.  This session targets planetary scientists and microbiologists (and everyone in between), and we welcome submissions on any aspect of brines and habitability.  Full text follows, help us out by sharing this post widely!

Pure liquid water is only stable in a small fraction of the Solar System; however, salty aqueous solutions (i.e., brines) are stable over a much broader range of conditions. These brine systems, however, prove to be some of the most challenging environments for microorganisms, where biology must overcome extreme osmotic stresses, low water activities, chemical toxicity, and, depending on the location of the environment, temperature extremes, UV radiation, and intense pressure. Despite these stressors, hypersaline environments on Earth host an astounding diversity of micro- and macroorganisms. With worlds like Mars, Ceres, and the outer Solar System ocean worlds showing the potential for present-day brines, and with upcoming missions to Europa, it is timely to elucidate the potential for such aqueous systems to sustain and support life, as well as the stability of these systems on host worlds.

This session is intended to encourage multidisciplinary and cross-planetary discussions focused on the phase space of habitability within brines. We seek to discuss 1) the potential and stability of brines on host worlds through both laboratory and modeling experiments, 2) microbial ecology and adaptations to brines, 3) the effects of water activity and chaotropicity on habitability, 4) the ability of hypersaline systems to preserve biomolecules, and 5) techniques and technology needed to detect biosignatures in these unique systems.

Posted By: Sabeel Mansuri on December 07, 2018
Introduction

Hi! I’m Sabeel Mansuri, an Undergraduate Research Assistant for the Bowman Lab at the Scripps Institution of Oceanography, University of California San Diego. The following is a tutorial that demonstrates a pipeline used to assemble and annotate a bacterial genome from Oxford Nanopore MinION data.

This tutorial will require the following (brief installation instructions are included below):

  1. Canu Assembler
  2. Bandage
  3. Prokka
  4. DNAPlotter (alternatively circos)

Software Installation
Canu

Canu is a packaged correction, trimming, and assembly program that is forked from the Celera assembler codebase. Install the latest release by running the following:

git clone https://github.com/marbl/canu.git
cd canu/src
make

Bandage

Bandage is an assembly visualization software. Install it by visiting this link, and downloading the version appropriate for your device.

Prokka

Prokka is a gene annotation program. Install it by visiting this link, and running the installation commands appropriate for your device.

Dataset

Download the nanopore dataset located here. This is an isolate from a sample taken from a local saline lake at South Bay Salt Works near San Diego, California.

The download will provide a tarball. Extract it:

tar -xvf nanopore.tar.gz

This will create a runs_fastq folder containing 8 fastq files of genetic data.

Assembly

Canu can be used directly on the data without any preprocessing. The only additional information needed is an estimate of the genome size of the sample. For the saline isolate, we estimate 3,000,000 base pairs. Then, use the following Canu command to assemble the data:

canu -nanopore_raw -p test_canu -d test_canu runs_fastq/*.fastq genomeSize=3000000 gnuplotTested=true

A quick description of all flags and parameters:

  • -nanopore_raw – specifies data is Oxford Nanopore with no data preprocessing
  • -p – specifies prefix for output files, use “test_canu” as default
  • -d – specifies directory to run test and output files in, use “test_canu” as default
  • genomeSize – estimated genome size of isolate
  • gnuplotTested – setting to true will skip gnuplot testing; gnuplot is not needed for this pipeline

Running this command will output various files into the test_canu directory. The assembled contigs are located in the test_canu.contigs.fasta file. These contigs can be better visualized using Bandage.

Assembly Visualization

Open Bandage and a GUI window should pop up. In the toolbar, click File > Load Graph, and select test_canu.contigs.gfa. You should see something like the following:

This graph reveals that one of our contigs appears to be a whole circular chromosome! A quick comparison with the test_canu.contigs.fasta file reveals this is Contig 1. We extract only this sequence from the contigs file to examine it further. Note that the first contig takes up the first 38,673 lines of the file, so use head:

head -n38673 test_canu/test_canu.contigs.fasta >> test_canu/contig1.fasta

NCBI BLAST

We blast this Contig using NCBI’s nucleotide BLAST database (linked here) with all default options. The top hit is:

Hit: Halomonas sp. hl-4 genome assembly, chromosome: I
Organism: Halomonas sp. hl-4
Phylogeny: Bacteria/Proteobacteria/Gammaproteobacteria/Oceanospirillales/Halomonadaceae/Halomonas
Max score: 65370
Query cover: 72%
E value: 0.0
Ident: 87%

It appears this chromosome is the genome of an organism in the genus Halomonas. We may now be interested in the gene annotation of this genome.

Gene Annotation

Prokka will take care of gene annotation; the only required input is the contig1.fasta file.

prokka --outdir circular --prefix test_prokka test_canu/contig1.fasta

The newly created circular directory contains various files with data on the gene annotation. Take a look inside test_prokka.txt for a quick summary of the annotation. We can take a quick look at the annotation using the DNAPlotter GUI.  For a more customized circular plot use circos.

Summary

The analysis above has taken Oxford Nanopore sequenced data, assembled contigs, identified the closest matching organism, and annotated its genome.

Posted By: Jeff on November 16, 2018

This is a quick post of a few photos from our trip to the South Bay Saltworks earlier this week.  Thanks to PhD students Natalia, Emelia, and Srishti for getting up early to go play in the mud, and to Jesse Wilson and Melissa Hopkins for lab-side support!


Getting an early start at one of the lower salinity lakes.


A high salinity lake with the pink pigmentation clearly visible.  Biology is happening!


A high salinity MgCl2 dominated lake.  It isn’t clear whether anything is living in these lakes – the green pigmentation could be remnants of microbes that lived in a happier time.  Our new OAST project will be further investigating these and other lakes to improve life detection technologies, and better constrain the chemical conditions that are compatible with life.


Srishti and Emelia working very hard at filtering.


Hey Srishti, I think you forgot something!

It will be a long time before we’re done with our analysis for these lakes, but here are a couple of teaser microscope images that reflect the huge difference between an NaCl and MgCl2 dominated lake.


Big, happy bacteria from an NaCl lake at near-saturation.


Same prep applied to an MgCl2 lake.  No sign of large bacterial cells.  There could be life there but it isn’t obvious…

Posted By: Jeff on November 06, 2018

Sometimes working weekends can be a lot of fun.  Last Saturday morning we carried out the second Scripps Institution of Oceanography visit by undergraduate biology majors from National University for our NSF-funded project CURE-ing Microbes on Ocean Plastics.  We recovered a plastic colonization experiment that we started last month, installed the next iteration of the experiment, and finally replaced the pump intake for our continuous flow membrane inlet mass spec (MIMS).  Many thanks to PhD students Natalia Erazo, Srishti Dasarathy, and Emelia Chamberlain for taking the time to work with the National University undergraduates, and to Kasia Kenitz in the Barton Lab for the diving assist!  Here are a couple of photo/video highlights from the day.

A short video of the plastic colonization experiment after one month of incubation.  Though there has been some swell it hasn’t been a particularly stormy month.  Despite that the cages that hold our plastic wafers were hanging by a thread!  I need to come up with a better system before the winter storms hit…

Chasing a school of baitfish under the pier after installation. At the end of the video you can see the shiny new cage with the next set of plastic wafers, and to the right our newly installed pump intake for the MIMS.


Natalia and Srishti tell it like it is to National University students on the SIO pier.


Checking out microbes in the lab after field sampling on the pier.

Posted By: Jeff on November 02, 2018

This post has been a long time in coming!  I’m happy to announce that our Oceans Across Space and Time (OAST) proposal to the NASA Astrobiology Program has been funded, launching a 5 year, 8+ institution effort to improve life detection strategies for key extraterrestrial environments.  We submitted this proposal in response to the NASA Astrobiology Institute call (NAI; a key funding opportunity within the Astrobiology Program), however, OAST was ultimately funded under a new research coordination network titled Network for Life Detection (NfoLD).  Research coordination networks are a relatively new construct for NASA and provide a better framework for exchanging information between teams than the old “node” based NAI model.  NfoLD will eventually encompass a number of NASA projects looking at various aspects of life detection and funded through a variety of different opportunities (Exobiology, PSTAR, Habitable Worlds, etc).

OAST is led by my colleague Britney Schmidt, a planetary scientist at Georgia Tech (click here for the GT press release).  Joining us from SIO is Doug Bartlett, a deep sea microbial ecologist.  Other institutions with a major role include Stanford, MIT, Louisiana State University, Kansas University, University of Texas at Austin, and Blue Marble Space Institute of Science.

The OAST science objectives are structured around the concept of contemporary, remnant, and relict ocean worlds, and predicated on the idea that the distribution of biomass and biomarkers is controlled by different factors for each of these ocean “states”.  Understanding the distribution of biomass, and the persistence of biomarkers, will lead us to better strategies for detecting life on Mars, Europa, Enceladus, and other past or present ocean worlds.

Earth is unique among the planets in our solar system for having contemporary, remnant, and relict ocean environments.  This is convenient for us, as it provides an opportunity to study these environments without all the cost and bother of traveling to other planets just to try unproven techniques.  For OAST, we’ve identified a suite of ocean environments that we think best represent these ocean states.  For contemporary ocean worlds (such as Europa and Enceladus) we’re studying deep hypersaline anoxic basins (DHABs – I might have hit the acronym limit for a single post…), which may be some of the most bizarre microbial habitats on Earth.  These highly stratified ecosystems are energetically very limited and impose extreme environmental stress through pressure and high salinity.  These are very much like the conditions we’d expect on a place like Europa.  The video below, from the BBC’s latest Blue Planet series, provides some idea of what these environments are like.

For remnant ocean worlds we will study a number of hypersaline lake systems, such as were likely present on Mars as it transitioned from a wet to a dry world.  Unlike the contemporary Europan ocean, the remnant Martian ocean would have had lots of energy to support life, a condition shared by many saline lake environments on Earth.  This is illustrated by this photo of me holding a biomass-rich filter at the Great Salt Lake in Utah, way back in my undergraduate days.


Sunlight provides an abundance of energy to many hypersaline lake environments.  Despite the challenging conditions imposed by salt, these systems often have high rates of activity and abundant biomass.  Photo: Julian Sachs.

Relict ocean worlds are a smaller component of OAST.  This isn’t for lack of relevance – Mars is a relict ocean world after all – but you can’t do everything, even in five years and with an awesome team.  Nonetheless we will carry out work on what’s known as the Permian mid-continental deposits, to search for unambiguous biomarkers persisting from when the region was a remnant ocean.  OAST will be a big part of what we’re up to for the next five years, so stay tuned!

Posted By: Jesse Wilson on October 22, 2018

Weighted gene correlation network analysis (WGCNA) is a powerful network analysis tool that can be used to identify groups of highly correlated genes that co-occur across your samples. Thus genes are sorted into modules and these modules can then be correlated with other traits (that must be continuous variables).

WGCNA was originally created to assess gene expression data in human patients, and the authors of the method and R package have a thorough tutorial with in-depth explanations (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/). More recently, the method has been applied to microbial communities (Duran-Pinedo et al., 2011; Aylward et al., 2015; Guidi et al., 2016; Wilson et al., 2018)–the following is a walk-through using microbial sequence abundances and environmental data from my 2018 work (https://www.ncbi.nlm.nih.gov/m/pubmed/29488352/).

Background: WGCNA finds how clusters of genes (or in our case abundances of operational taxonomic units–OTUs) correlate with traits (or in our case environmental variables or biochemical rates) using hierarchical clustering, novel applications of weighted adjacency functions and topological overlap measures, and a dynamic tree cutting method.

Very simply, each OTU is going to be represented by a node in a vast network and the adjacency (a score between 0 and 1) between each pair of nodes will be calculated. Many networks use hard-thresholding (where a connection score [e.g. a Pearson Correlation Coefficient] between any two nodes is noted as 1 if it is above a certain threshold and noted as 0 if it is below it). This ignores the actual strength of the connection, so WGCNA constructs a weighted gene (or OTU) co-occurrence adjacency matrix in lieu of ‘hard’ thresholding. Because our original matrix has abundance data, the abundance of each OTU is also factored in.

For this method to work you also have to select a soft thresholding power (sft) to which each co-expression similarity is raised in order to make these scores “connection strengths”. I used a signed adjacency function:

  • Adjacency = (0.5*(1 + Pearson correlation))^sft

because it preserves the sign of the connection (whether nodes are positively or negatively correlated), which is the recommendation of the WGCNA authors.
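
To make the arithmetic concrete, here is a minimal sketch of that calculation in base R. It assumes a samples-by-OTUs table like the datExpr0 object built further down and an illustrative power of 10; in practice the WGCNA adjacency() call shown later does this for you:

sft.power <- 10                                  # illustrative soft thresholding power
otu.cor <- cor(datExpr0, method = "pearson")     # OTU-by-OTU Pearson correlations
adj.manual <- (0.5 * (1 + otu.cor))^sft.power    # signed adjacency, values between 0 and 1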

You pick your soft thresholding value using the scale-free topology criterion. This is based on the idea that the probability that a node is connected with k other nodes decays as a power law:

  • p(k)~ k^(-γ)

This idea is linked to the growth of networks–new nodes are more likely to be attached to already established nodes. In general, scale-free networks display a high degree of tolerance against errors (Zhang & Horvath, 2005).

You then turn your adjacency matrix into a Topological Overlap Measure (TOM) to minimize the effects of noise and spurious associations. A topological overlap of two nodes also factors in all of their shared neighbors (their relative interrelatedness)–so you are basically taking a simple co-occurrence between two nodes and placing it in the framework of the entire network by factoring in all the other nodes each is connected to. For more information regarding adjacency matrices and TOMs please see Zhang & Horvath (2005) and Langfelder & Horvath (2007 & 2008).

Start: Obtain an OTU abundance matrix (MB.0.03.subsample.fn.txt) and environmental data (OxygenMatrixMonterey.csv).

The OTU abundance matrix simply has all the different OTUs that were observed in a bunch of different samples (denoted in the Group column; e.g. M1401, M1402, etc.). These OTUs represent 16S rRNA sequences that were assessed with the universal primers 515F-Y (5′-GTGYCAGCMGCCGCGGTAA) and 926R (5′-CCGYCAATTYMTTTRAGTTT) and were created using a 97% similarity cutoff. These populations were previously subsampled to the smallest library size and all processing took place in mothur (https://www.mothur.org/). See Wilson et al. (2018) for more details.

The environmental data matrix tells you a little bit more about the different Samples, like the Date of collection, which of two site Locations it was collected from, and the Depth or Zone of collection. You also see a bunch of different environmental variables, like several different Upwelling indices (for different stations and different time spans), community respiration rate (CR), Oxygen Concentration, and Temperature. Again, see Wilson et al. (2018) for more details.

Code–Initial Stuff:

Read data in:

data<-read.table("MB.0.03.subsample.fn.txt",header=T,na.strings="NA")

For this particular file we have to get rid of the first three columns, since the OTUs don’t actually start until the 4th column:

data1 = data[-1][-1][-1]

You should turn your raw abundance values into a relative abundance matrix and potentially transform it. I recommend a Hellinger Transformation (a square root of the relative abundance)–this effectively gives low weight to variables with low counts and many zeros. If you wanted, you could do the Logarithmic transformation of Anderson et al. (2006) here instead.

library("vegan", lib.loc="~/R/win-library/3.3")
HellingerData<-decostand(data1,method = "hellinger")

You have to limit the OTUs to the most frequent ones (ones that occur in multiple samples so that you can measure co-occurrence across samples). I just looked at my data file and looked for where zeros became extremely common. This was easy because mothur sorts OTUs according to abundance. If you would like a more objective way of selecting the OTUs, or if your OTUs are not sorted, then this code may help:

lessdata <- data1[,colSums(data1) > 0.05]

(though you will have to decide what cutoff works best for your data).

Code–Making your relative abundance matrix:

You have to reattach the Group Name column:

RelAbun1 = data.frame(data[2],HellingerData[1:750])

Write file (this step isn’t absolutely necessary, but you may want this file later at some point):

write.table(RelAbun1, file = "MontereyRelAbun.txt", sep="\t")

Code–Visualizing your data at the sample level:

Now load the WGCNA package:

library("WGCNA", lib.loc="~/R/win-library/3.3")

Bring data in:

OTUs<-read.table("MontereyRelAbun.txt",header=T,sep="\t")
dim(OTUs);
names(OTUs);

Turn the first column (sample name) into row names (so that only OTUs make up actual columns):

datExpr0 = as.data.frame((OTUs[,-c(1)]));
names(datExpr0) = names(OTUs)[-c(1)];
rownames(datExpr0) = OTUs$Group;

Check Data for excessive missingness:

gsg = goodSamplesGenes(datExpr0[-1], verbose = 3);
gsg$allOK

You should get TRUE for this dataset given the parameters above. TRUE means that all OTUs have passed the cut. This means that when you limited your OTUs to the most common ones above, you didn’t leave any in that had too many zeros. It is still possible that you were too choosy, though. If you got FALSE for your data then you have to follow some other steps that I don’t go over here.
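
If you do get FALSE, the usual fix (this is just the general pattern from the WGCNA tutorials, not something specific to this dataset) is to drop the flagged OTUs and samples before continuing:

if (!gsg$allOK)
{
  # Report which OTUs and samples were flagged (indices here assume gsg was computed on the full datExpr0)
  if (sum(!gsg$goodGenes) > 0)
    printFlush(paste("Removing OTUs:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", ")));
  if (sum(!gsg$goodSamples) > 0)
    printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));
  # Keep only the OTUs and samples that passed the check
  datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}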

Cluster the samples to see if there are any obvious outliers:

sampleTree = hclust(dist(datExpr0), method = "average");

sizeGrWindow(12,9)

par(cex = 0.6);
par(mar = c(0,4,2,0))

plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5, cex.axis = 1.5, cex.main = 2)

The sample dendrogram doesn’t show any obvious outliers so I didn’t remove any samples. If you need to remove some samples then you have to follow some code I don’t go over here.
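
For reference, the general recipe from the WGCNA tutorials for dropping outlier samples looks something like the sketch below; the cut height of 15 is purely a placeholder that you would read off your own dendrogram:

# The cut height of 15 is a placeholder; choose it just below the outlier branch on your dendrogram
abline(h = 15, col = "red")
# Cut the tree and keep only the main cluster of samples
clust = cutreeStatic(sampleTree, cutHeight = 15, minSize = 10)
table(clust)
keepSamples = (clust == 1)   # cluster 1 contains the samples to keep
datExpr0 = datExpr0[keepSamples, ]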

Now read in trait (Environmental) data and match with expression samples:

traitData = read.csv("OxygenMatrixMonterey.csv");
dim(traitData)
names(traitData)

Form a data frame analogous to expression data (relative abundances of OTUs) that will hold the Environmental traits:

OTUSamples = rownames(datExpr0);
traitRows = match(OTUSamples, traitData$Sample);
datTraits = traitData[traitRows, -1];
rownames(datTraits) = traitData[traitRows, 1];
collectGarbage()

Outcome: Now your OTU expression (or abundance) data are stored in the variable datExpr0 and the corresponding environmental traits are in the variable datTraits. Now you can visualize how the environmental traits relate to clustered samples.

Re-cluster samples:

sampleTree2 = hclust(dist(datExpr0), method = "average")

Convert traits to a color representation: white means low, red means high, grey means missing entry:

traitColors = numbers2colors(datTraits[5:13], signed = FALSE);

Plot the sample dendrogram and the colors underneath:

plotDendroAndColors(sampleTree2, traitColors,
groupLabels = names(datTraits[5:13]),
main = "Sample dendrogram and trait heatmap")

Again: white means a low value, red means a high value, and gray means missing entry. This is just initial stuff… we haven’t looked at modules of OTUs that occur across samples yet.

Save:

save(datExpr0, datTraits, file = "Monterey-dataInput.RData")

Code–Network Analysis:

Allow multi-threading within WGCNA. This helps speed up certain calculations.
Any error here may be ignored but you may want to update WGCNA if you see one.

options(stringsAsFactors = FALSE);
enableWGCNAThreads()

Load the data saved in the first part:

lnames = load(file = "Monterey-dataInput.RData");

The variable lnames contains the names of loaded variables:

lnames

Note: You have a couple of options for how you create your weighted OTU co-expression network. I went with the step-by-step construction and module detection. Please see this document for information on the other methods (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/Simulated-05-NetworkConstruction.pdf).

Choose a set of soft-thresholding powers:

powers = c(c(1:10), seq(from = 11, to=30, by=1))

Call the network topology analysis function:
Note: I am using a signed network because it preserves the sign of the connection (whether nodes are positively or negatively correlated); this is the recommendation of the WGCNA authors.

sft = pickSoftThreshold(datExpr0, powerVector = powers, verbose = 5, networkType = "signed")

Output:

pickSoftThreshold: will use block size 750.
pickSoftThreshold: calculating connectivity for given powers...
..working on genes 1 through 750 of 750
Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
1 1 0.0299 1.47 0.852 399.0000 400.0000 464.00
2 2 0.1300 -1.74 0.915 221.0000 221.0000 305.00
3 3 0.3480 -2.34 0.931 128.0000 125.0000 210.00
4 4 0.4640 -2.41 0.949 76.3000 73.1000 150.00
5 5 0.5990 -2.57 0.966 47.2000 44.0000 111.00
6 6 0.7010 -2.52 0.976 30.1000 27.1000 83.40
7 7 0.7660 -2.47 0.992 19.8000 17.2000 64.30
8 8 0.8130 -2.42 0.986 13.3000 11.0000 50.30
9 9 0.8390 -2.34 0.991 9.2200 7.1900 40.00
10 10 0.8610 -2.24 0.992 6.5200 4.8800 32.20
11 11 0.8670 -2.19 0.987 4.7000 3.3700 26.20
12 12 0.8550 -2.18 0.959 3.4600 2.3300 21.50

This is showing you the power (soft thresholding value), the r2 for the scale independence for each particular power (we shoot for an r2 higher than 0.8), the mean number of connections each node has at each power (mean.k), the median number of connections/node (median.k), and the maximum number of connections (max.k).

Plot the results:

sizeGrWindow(9, 5)
par(mfrow = c(1,2));
cex1 = 0.9;

Scale-free topology fit index (r2) as a function of the soft-thresholding power:

plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers,cex=cex1,col="red");

This line corresponds to using an R^2 cut-off of h:

abline(h=0.8,col="red")

Mean connectivity as a function of the soft-thresholding power:

plot(sft$fitIndices[,1], sft$fitIndices[,5],
xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")

I picked a soft thresholding value of 10 because it was well above an r2 of 0.8 (it is a local peak for the r2) and the mean connectivity is still above 0.

So now we just calculate the adjacencies, using the soft thresholding power of 10:

softPower = 10;
adjacency = adjacency(datExpr0, power = softPower, type = "signed");

Then we transform the adjacency matrix into a Topological Overlap Matrix (TOM) and calculate corresponding dissimilarity:

Remember: The TOM you calculate shows the topological similarity of nodes, factoring in the connection strength two nodes share with other “third party” nodes. This will minimize effects of noise and spurious associations:

TOM = TOMsimilarity(adjacency, TOMType = "signed");
dissTOM = 1-TOM

Call the hierarchical clustering function to create a dendrogram:

TaxaTree = hclust(as.dist(dissTOM), method = "average");

Plot the resulting clustering tree (dendrogram):

sizeGrWindow(12,9)
plot(TaxaTree, xlab="", sub="", main = "Taxa clustering on TOM-based dissimilarity",
labels = FALSE, hang = 0.04);

This image is showing us the clustering of all 750 OTUs based on the TOM dissimilarity index.

Now you have to decide the optimal module size for your system, and you should play around with this value a little. I wanted relatively large modules, so I set the minimum module size relatively high at 30:

minModuleSize = 30;

Module identification using dynamic tree cut (there are a couple of different ways to figure out your modules, so you should explore what works best for you in the tutorials by the authors):

dynamicMods = cutreeDynamic(dendro = TaxaTree, distM = dissTOM,
deepSplit = 2, pamRespectsDendro = FALSE,
minClusterSize = minModuleSize);
table(dynamicMods)

Convert numeric labels into colors:

dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)

dynamicColors
black blue brown green red turquoise yellow
49 135 113 71 64 216 102

You can see that there are a total of 7 modules (you should have seen that above too) and that now each module is named a different color. The numbers under the colors tells you how many OTUs were sorted into that module. Each OTU is in exactly 1 module, and you can see that if you add up all of the numbers from the various modules you get 750 (the number of OTUs that we limited our analysis to above).

Plot the dendrogram with module colors underneath:

sizeGrWindow(8,6)
plotDendroAndColors(TaxaTree, dynamicColors, "Dynamic Tree Cut",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05,
main = "Taxa dendrogram and module colors")

Now we will quantify co-expression similarity of the entire modules using eigengenes and cluster them based on their correlation:
Note: An eigengene is the 1st principal component of a module expression matrix and represents a suitably defined average OTU community.

Calculate eigengenes:

MEList = moduleEigengenes(datExpr0, colors = dynamicColors)
MEs = MEList$eigengenes

Calculate dissimilarity of module eigengenes:

MEDiss = 1-cor(MEs);

Cluster module eigengenes:

METree = hclust(as.dist(MEDiss), method = "average");

Plot the result:

sizeGrWindow(7, 6)
plot(METree, main = "Clustering of module eigengenes",
xlab = "", sub = "")

Now we will see if any of the modules should be merged. I chose a height cut of 0.30, corresponding to a similarity of 0.70 to merge:

MEDissThres = 0.30

Plot the cut line into the dendrogram:

abline(h=MEDissThres, col = "red")

You can see that, according to our cutoff, none of the modules should be merged.

If there were some modules that needed to be merged you can call an automatic merging function:

merge = mergeCloseModules(datExpr0, dynamicColors, cutHeight = MEDissThres, verbose = 3)

The merged module colors:

mergedColors = merge$colors;

Eigengenes of the new merged modules:

mergedMEs = merge$newMEs;

If you had combined different modules then that would show in this plot:

sizeGrWindow(12, 9)

plotDendroAndColors(TaxaTree, cbind(dynamicColors, mergedColors),
c("Dynamic Tree Cut", "Merged dynamic"),
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)

If we had merged some of the modules that would show up in the Merged dynamic color scheme.

Rename the mergedColors to moduleColors:

moduleColors = mergedColors

Construct numerical labels corresponding to the colors:

colorOrder = c("grey", standardColors(50));
moduleLabels = match(moduleColors, colorOrder)-1;
MEs = mergedMEs;

Save module colors and labels for use in subsequent parts:

save(MEs, moduleLabels, moduleColors, TaxaTree, file = "Monterey-networkConstruction-stepByStep.RData")

Code–Relating modules to external information and IDing important taxa:

Here you are going to identify modules that are significantly associated with environmental traits/biogeochemical rates. You already have summary profiles for each module (eigengenes–remember that an eigengene is the 1st principal component of a module expression matrix and represents a suitably defined average OTU community), so we just have to correlate these eigengenes with environmental traits and look for significant associations.

Defining numbers of OTUs and samples:

nTaxa = ncol(datExpr0);
nSamples = nrow(datExpr0);

Recalculate MEs (module eigengenes):

MEs0 = moduleEigengenes(datExpr0, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(MEs, datTraits, use = "p");
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples);

Now we will visualize it:

sizeGrWindow(10,6)

textMatrix = paste(signif(moduleTraitCor, 2), "\n(",
signif(moduleTraitPvalue, 1), ")", sep = "");
dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3));

labeledHeatmap(Matrix = moduleTraitCor,
xLabels = names(datTraits),
yLabels = names(MEs),
ySymbols = names(MEs),
colorLabels = FALSE,
colors = greenWhiteRed(50),
textMatrix = textMatrix,
setStdMargins = FALSE,
cex.text = 0.5,
zlim = c(-1,1),
main = paste("Module-trait relationships"))

Each row corresponds to a module eigengene and each column corresponds to an environmental trait or biogeochemical rate (as long as it is continuous–notice that the categorical variables are gray and say NA). Each cell contains the corresponding Pearson correlation coefficient (top number) and a p-value (in parentheses). The table is color-coded by correlation according to the color legend.

You can see that the Brown module is positively correlated with many indices of upwelling while the Black module is negatively correlated with many indices of upwelling. For this work I was particularly interested in CR, and so I focused on modules that correlated positively or negatively with CR. The Red module was negatively associated with CR while the Blue module was positively associated with CR.

Let’s look more at the Red module by quantifying the associations of individual taxa with CR:

First define the variable we are interested in from datTrait:

CR = as.data.frame(datTraits$CR);
names(CR) = "CR"

modNames = substring(names(MEs), 3)
TaxaModuleMembership = as.data.frame(cor(datExpr0, MEs, use = "p"));
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(TaxaModuleMembership), nSamples));
names(TaxaModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");
TaxaTraitSignificance = as.data.frame(cor(datExpr0, CR, use = "p"));
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(TaxaTraitSignificance), nSamples));
names(TaxaTraitSignificance) = paste("GS.", names(CR), sep="");
names(GSPvalue) = paste("p.GS.", names(CR), sep="");

module = "red"
column = match(module, modNames);
moduleTaxa = moduleColors==module;
sizeGrWindow(7, 7);
par(mfrow = c(1,1));
verboseScatterplot(abs(TaxaModuleMembership[moduleTaxa, column]),
abs(TaxaTraitSignificance[moduleTaxa, 1]),
xlab = paste("Module Membership in", module, "module"),
ylab = "Taxa significance for CR",
main = paste("Module membership vs. Taxa significance\n"),
cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)

This graph shows how each taxon (each red dot is an OTU that belongs in the Red module) relates to 1) the environmental trait of interest and 2) how important it is to the module. The taxa/OTUs that have high module membership tend to occur whenever the module is represented in the environment and are therefore often connected throughout the samples with other red taxa/OTUs. In this module, these hubs (Red OTUs that occur with other Red OTUs) are also the most important OTUs for predicting CR.

Now let’s get more info about the taxa that make up the Red module:

First, merge the statistical info from the previous section (modules with high association with the trait of interest–e.g. CR or Temp) with taxa annotation and write a file that summarizes these results:

names(datExpr0)
names(datExpr0)[moduleColors=="red"]

You will have to feed in an annotation file–a file listing what Bacteria/Archaea go with each OTU (I am not providing you with this file, but it just had a column with OTUs and a column with the Taxonomy).

annot = read.table("MB.subsample.fn.0.03.cons.taxonomy",header=T,sep="\t");
dim(annot)
names(annot)
probes = names(datExpr0)
probes2annot = match(probes, annot$OTU)

Check the number of probes without annotation (it should return a 0):

sum(is.na(probes2annot))

Create the starting data frame:

TaxaInfo0 = data.frame(Taxon = probes,
TaxaSymbol = annot$OTU[probes2annot],
LinkID = annot$Taxonomy[probes2annot],
moduleColor = moduleColors,
TaxaTraitSignificance,
GSPvalue)

Order modules by their significance for the trait of interest (here CR):

modOrder = order(-abs(cor(MEs, CR, use = "p")));

Add module membership information in the chosen order:

for (mod in 1:ncol(TaxaModuleMembership))
{
oldNames = names(TaxaInfo0)
TaxaInfo0 = data.frame(TaxaInfo0, TaxaModuleMembership[, modOrder[mod]],
MMPvalue[, modOrder[mod]]);
names(TaxaInfo0) = c(oldNames, paste("MM.", modNames[modOrder[mod]], sep=""),
paste("p.MM.", modNames[modOrder[mod]], sep=""))
}

Order the OTUs in the TaxaInfo0 variable first by module color, then by taxon significance for CR:

TaxaOrder = order(TaxaInfo0$moduleColor, -abs(TaxaInfo0$GS.CR));
TaxaInfo = TaxaInfo0[TaxaOrder, ]

Write file:

write.csv(TaxaInfo, file = "TaxaInfo.csv")

Here is a bit of the output file I got:

,Taxon,TaxaSymbol,LinkID,moduleColor,GS.TotalRate,p.GS.TotalRate,MM.red,p.MM.red,MM.blue,p.MM.blue,MM.green,p.MM.green,MM.brown,p.MM.brown,MM.turquoise,p.MM.turquoise,MM.black,p.MM.black,MM.yellow,p.MM.yellow
Otu00711,Otu00711,Otu00711,Bacteria(100);Proteobacteria(100);Alphaproteobacteria(100);SAR11_clade(100);Surface_4(100);,black,0.461005,0.00111,0.005028,0.973244,0.243888,0.098526,-0.07719,0.606075,-0.25274,0.086535,0.058878,0.694233,0.502027,0.000324,0.132412,0.374947
Otu00091,Otu00091,Otu00091,Bacteria(100);Bacteroidetes(100);Flavobacteriia(100);Flavobacteriales(100);Flavobacteriaceae(100);Formosa(100);,black,0.378126,0.008778,-0.17243,0.246462,0.446049,0.001676,0.34467,0.017667,-0.55057,6.08E-05,0.492517,0.000437,0.615168,4.20E-06,0.367211,0.011115
Otu00082,Otu00082,Otu00082,Bacteria(100);Bacteroidetes(100);Flavobacteriia(100);Flavobacteriales(100);Flavobacteriaceae(100);NS5_marine_group(100);,black,-0.35649,0.013911,0.222515,0.132755,-0.06428,0.667734,0.175654,0.237601,-0.45502,0.001312,0.421756,0.003151,0.750195,1.28E-09,0.126362,0.397349
Otu00341,Otu00341,Otu00341,Bacteria(100);Bacteroidetes(100);Cytophagia(100);Cytophagales(100);Flammeovirgaceae(100);Marinoscillum(100);,black,-0.28242,0.054435,0.023927,0.873162,-0.07555,0.613762,0.144688,0.331879,-0.03144,0.833838,0.035147,0.814565,0.209255,0.158058,-0.06565,0.661083
Otu00537,Otu00537,Otu00537,Bacteria(100);Verrucomicrobia(100);Verrucomicrobiae(100);Verrucomicrobiales(100);Verrucomicrobiaceae(100);Persicirhabdus(100);,black,-0.23668,0.109211,0.123673,0.40755,-0.17362,0.243171,-0.05738,0.701628,-0.26399,0.072961,0.264411,0.072493,0.425082,0.002897,0.040045,0.789278
Otu00262,Otu00262,Otu00262,Bacteria(100);Proteobacteria(100);Alphaproteobacteria(100);SAR11_clade(100);Surface_1(100);Candidatus_Pelagibacter(90);,black,-0.23615,0.110023,0.327396,0.02468,-0.22748,0.12411,-0.13779,0.355699,-0.23708,0.108594,0.271968,0.064409,0.554592,5.23E-05,0.036113,0.809563
Otu00293,Otu00293,Otu00293,Bacteria(100);Proteobacteria(100);Alphaproteobacteria(100);SAR11_clade(100);SAR11_clade_unclassified(100);,black,0.223427,0.131133,0.142016,0.34098,0.209327,0.157912,0.234713,0.112274,-0.53032,0.000126,0.529907,0.000128,0.627937,2.30E-06,0.390714,0.006621
Otu00524,Otu00524,Otu00524,Bacteria(100);Actinobacteria(100);Acidimicrobiia(100);Acidimicrobiales(100);OM1_clade(100);Candidatus_Actinomarina(100);,black,-0.20559,0.165629,0.28312,0.053809,0.016758,0.910982,0.148756,0.318316,-0.34758,0.016669,0.376043,0.009188,0.494903,0.000406,0.377597,0.00888
Otu00370,Otu00370,Otu00370,Bacteria(100);Verrucomicrobia(100);Opitutae(100);MB11C04_marine_group(100);MB11C04_marine_group_unclassified(100);,black,-0.20397,0.169074,0.303984,0.037771,-0.06655,0.656707,0.009401,0.949991,-0.25451,0.084275,0.097595,0.514,0.642409,1.13E-06,0.224711,0.128875


NOTES on output:

moduleColor is the module that the OTU was ultimately put into

GS stands for Gene Significance (for us it means taxon significance) while MM stands for module membership
GS.Environmentaltrait = Pearson Correlation Coefficient for that OTU with the trait
p.GS.Environmentaltrait = P value for the preceding relationship

The MM column gives the module membership correlation (correlates expression with the module eigengene of a given module). If it is close to 0 then the taxon is not part of that color module (since each OTU has to be put in a module you may get some OTUs that are close to 0, but they aren’t important to that module). If it is close to 1 or -1 then it is highly connected to the genes/taxa of that color module.
MM.color = Pearson Correlation Coefficient for Module Membership–i.e. how well that OTU correlates with that particular color module (each OTU has a value for each module but only belongs to one module)
p.MM.color = P value for the preceding relationship

GS allows incorporation of external info into the co-expression network by showing gene significance. The higher the absolute value of GS, the more biologically significant the gene (or in our case taxon).
Modules will be ordered by their significance for the chosen trait, with the most significant ones to the left.
Each of the modules (with each OTU assigned to exactly one module) will be represented for the environmental trait you selected.
You will have to rerun this for each environmental trait you are interested in; a minimal sketch of that swap follows below.
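
For example, to rerun the taxon significance step for temperature instead of CR you would only need to swap out the trait variable before repeating the correlation and output steps above (this sketch assumes the temperature column in datTraits is called Temperature; adjust the name to match your file):

Temp = as.data.frame(datTraits$Temperature);   # hypothetical column name; match it to your file
names(Temp) = "Temp"
TaxaTraitSignificance = as.data.frame(cor(datExpr0, Temp, use = "p"));
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(TaxaTraitSignificance), nSamples));
names(TaxaTraitSignificance) = paste("GS.", names(Temp), sep="");
names(GSPvalue) = paste("p.GS.", names(Temp), sep="");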

 

Posted By: Jeff on October 11, 2018

A new NASA Postdoctoral Program (NPP) opportunity was posted today for our Oceans Across Space and Time (OAST) initiative.  What’s OAST?  I can’t tell you yet, because the relevant press releases have been stuck in purgatory for several weeks.  Hopefully I can write that post next week.  Nonetheless you can figure out what we’re all about based on the NPP opportunity here, and pasted below.  If you’re interested act fast; NPP proposals are due November 1!

Past and present oceans are widely distributed in our solar system and are among the best environments to search for extant life. New tools, techniques, and strategies are required to support future life detection missions to ocean worlds. Oceans Across Space and Time (OAST) is a new project under the Network for Life Detection (NfoLD) research coordination network. OAST seeks to understand the distribution, diversity, and limits of life on contemporary, remnant, and relict ocean worlds to facilitate the search for extant life. Central to this effort is the development of an Ocean Habitability Index that characterizes life in the context of the physicochemical stressors and metabolic opportunities of an ocean environment. OAST will be developing the OHI based on field efforts in surficial hypersaline environments and deep hypersaline anoxic basins, and from laboratory-based experimental studies.

Postdoctoral fellows are sought to support OAST activities across three themes: characterizing ocean habitability, detecting and measuring biological activity, and understanding diversity and the limits of evolution. Postdoctoral fellows are expected to interact across two or more institutions within OAST. Participating institutions are Georgia Institute of Technology (Schmidt, Glass, Ingall, Reinhardt, Stewart, Stockton), Scripps Institution of Oceanography at UC San Diego (Bowman, Bartlett), Massachusetts Institute of Technology (Pontrefact and Carr), Louisiana State University (Doran), University of Kansas (Olcott), Stanford University (Dekas), Blue Marble Space Institute of Science (Som), and the University of Texas at Austin (Soderlund). Candidates should contact the PI (Schmidt) and two potential mentors early in the proposal process to scope possible projects in line with the main directions of OAST and NFoLD.

Posted By: Jeff on September 30, 2018

We have an opening for a postdoctoral scholar in the area of bioinformatics and predictive analytics.  The ideal candidate will have demonstrated expertise in one of these two areas and a strong interest in the other.  Possible areas of expertise within predictive analytics include the development and application of neural networks and other machine learning techniques, or nonlinear statistical modeling techniques such as empirical dynamical modeling.  Possible areas of expertise within bioinformatics include genome assembly and annotation, metagenomic or metatranscriptomic analysis, or advanced community structure analysis.  Candidates should be proficient with R, Python, or Matlab, familiar with scientific computing in Linux, and fluent in English.

The successful candidate will take the lead in an exciting new industry collaboration to predict process thresholds from changes in microbial community structure.  The goal of this collaboration is not only to predict thresholds, but to fully understand the underlying ecosystem dynamics through genomics and metabolic modeling.  The candidate will be in residence at Scripps Institution of Oceanography at UC San Diego, but will have the opportunity to work directly with industry scientists in the San Diego area.  The candidate is expected to take on a leadership role in the Bowman Lab, participate in training graduate students and undergraduates, represent the lab at national and international meetings, and publish their work in scholarly journals.  The initial opportunity is for two years with an option for a third year contingent on progress made.  The position starts January of 2019.

Interested candidates should send a CV and a brief, 1-page statement of interest to Jeff Bowman at jsbowman at ucsd.edu no later than November 1, 2018.

Posted By: Jeff on September 07, 2018

One of the most popular primer sets for 16S rRNA gene amplicon analysis right now is the 515F/806R set. One of the advantages of this pair is that it amplifies broadly across the domains Archaea and Bacteria. This reduces by half the amount of work required to characterize prokaryotic community structure, and allows a comparison of the relative (or absolute, if you have counts) abundance of bacteria and archaea.  However, paprica and many other analysis tools aren’t designed to simultaneously analyze reads from both domains.  Different reference alignments or covariance models, for example, might be required.  Thus it’s useful to split an input fasta file into separate bacterial and archaeal files.

We like to use the Infernal tool cmscan for this purpose.  First, you’ll need to acquire covariance models for the 16S/18S rRNA genes from all three domains of life.  You can find those on the Rfam website; they are also included in paprica/models if you’ve downloaded paprica.  Copy the models to a new subdirectory in your working directory while combining them into a single file:

mkdir cms
cat ../paprica/models/*.cm > cms/all_domains.cm
cd cms

Now you need to compress and index the covariance models using the cmpress utility provided by Infernal.  This takes a while.

cmpress all_domains.cm

Pretty simple.  Now you’re ready to do some work.  The whole Infernal suite of tools has pretty awesome built-in parallelization, but with only three covariance models in the file you won’t get much out of it.  Best to minimize cmscan’s use of cores and instead push lots of files through it at once.  This is easily done with the GNU Parallel command:

ls *.fasta | parallel -u cmscan --cpu 1 --tblout {}.txt cms/all_domains.cm {} > /dev/null

Next comes the secret sauce.  The command above produces an easy-to-parse, easy-to-read table with classification stats for each of the covariance models that we searched against.  Paprica contains a utility in paprica/utilities/pick_16S_domain.py to parse the table and figure out which model scored best for each read, then make three new fasta files for each of domains Bacteria, Archaea, and Eukarya (the primers will typically pick up a few euks).  We’ll parallelize the script just as we did for cmscan.

ls *.fasta | parallel -u python pick_16S_domain_2.py -prefix {} -out {}

Now you have domain-specific files that you can analyze in paprica or your amplicon analysis tool of choice!

 

 

Posted By: Jeff on August 11, 2018


The predicted drift track of Polarstern during MOSAiC, taken from the MOSAiC website.

This has been a (rare) good week for funding decisions.  A loooooong time ago when I was a third year PhD student I wrote a blog post that mentioned the Multidisciplinary drifting Observatory for the Study of Arctic Climate (MOSAiC) initiative.  The Arctic has changed a lot over the last couple of decades, as evidenced by the shift from perennial (year long) to seasonal sea ice cover.  This is a problem for climate modelers whose parameterizations and assumptions are based on observational records largely developed in the “old” Arctic.  MOSAiC was conceived as a coupled ocean-ice-atmosphere study to understand key climatic and ecological processes as they function in the “new” Arctic.  The basic idea is to drive the German icebreaker Polarstern into the Laptev Sea in the Fall of 2019, tether it to an ice floe, and allow it to follow the circumpolar drift (Fram style) for a complete year.  This will provide for continuous time-series observations across the complete seasonal cycle, and an opportunity to carry out a number of key experiments.

I first attended a MOSAiC workshop in 2012 (when I wrote that first blog post).  It only took six years and two tries, but we’ve officially received NSF support to join the MOSAiC expedition.  Co-PI Brice Loose at URI and I will be carrying out a series of experiments to better understand how microbial community structure and ecophysiology control fluxes of O2 (as a proxy for carbon) and methane in the central Arctic Ocean.  The central Arctic Ocean is a weird place and we know very little about how these processes play out there. Like the subtropical gyres it’s extremely low in nutrients, but the low temperatures, extreme seasonality, (seasonal) sea ice cover, and continental shelf influences make it very different from the lower-latitude oceans.


The German icebreaker Polarstern, which will host the MOSAiC field effort.

Posted By: Jeff on August 09, 2018

I was excited to learn today that our proposal CURE-ing microbes on ocean plastic, led by collaborators Ana Barral and Rachel Simmons at National University, was just funded by the National Science Foundation through the Hispanic Serving Institutions (HSI) program.  The Bowman Lab will play a small but important role in this project and it should be quite a bit of fun.  As a private, non-profit college, National University is a somewhat unusual organization that serves a very different demographic than UC San Diego.  A large number of the students at National University are non-traditional, meaning that they aren’t going straight to college from high school.  A significant number are veterans, and National University is classified as an HSI.  So working with National University is a great opportunity to interact with a very different student body.


A microbial colonization pilot experiment attached to our mounting bracket on the SIO Pier in May.  If you look closely you can see the biofilm developing on the plastic wafers inside the cage.  The composition of the biofilm and its potential to degrade plastics will be explored by students at National University.

For this project Ana and Rachel developed a course-based undergraduate research experience (CURE) around the topic of ocean plastics.  This is an issue that is getting quite a bit of attention in the popular press, but has largely fallen through the cracks as far as research goes, possibly because it’s too applied for NSF and too basic (and expensive) for NOAA.  A lot of somewhat dubious work is going on in the fringe around the theme of cleaning up marine plastics, but we lack a basic understanding of the processes controlling the decomposition of plastics, and their entry into marine foodwebs (topics that do less to stoke the donor fires than clean up).  This CURE is a way to get students (and us) thinking about the science of marine plastics, specifically the colonization and degradation of plastics by marine microbes, while learning basic microbiology, molecular biology, and bioinformatic techniques.  The basic idea is to have the students deploy plastic colonization experiments in the coastal marine environment, isolate potential plastic degraders, sequence genomes, and carry out microbial community structure analysis.  Different courses will target different stages in the project, and students taking the right sequence of courses could be hands-on from start to finish.

Our role in the project is to provide the marine microbiology expertise.  Lab members will have an opportunity to give lectures and provide mentoring to National University students, and we’ll handle the deployment and recovery of plastic-colonization experiments on the SIO Pier (yay, more diving!).  We’ll also play a role in analyzing and publishing the data from these experiments.  Many thanks to lab alumnus Emelia DeForce (formerly with MoBio and now with Fisher Scientific) for bringing us all together on this project!


Ana Barral’s (center) B100A: Survey of Biosciences Lab at National University after recovery of the microbial colonization pilot experiment in June.  Thanks Allison Lee (at left) for coming in to dive on a Saturday morning!

Posted By: Jeff on July 28, 2018

This week Alyssa Demko from the Jenkins Lab and I dove to make repairs on our sample pump intake on the SIO pier.  Very soon the pump will supply water to our membrane inlet mass spectrometer and biological sampling manifold, so we’re eager to keep things in good working condition.  Our pump intake is secured to the pier by a very heavy stainless steel bracket.  When we first installed the bracket we opted for silicon bronze hardware; silicon bronze is pricey but among the most corrosion resistant alloys available.  When we last dove I noticed the hardware was corroding very rapidly, to the point that a good storm would have ripped the whole contraption off the pier.  Fortunately it’s summer!  Here’s some of the hardware that we recovered from our dive:



When installed these were 5 x 0.5 inch bolts with 3/4 inch heads.  This is some serious corrosion for a 4 month deployment!  Silicon bronze is supposed to be corrosion resistant, so what happened?

The problem is that when two or more metals are in contact in the presence of an electrolyte (like seawater) they interact.  Specifically, some metals like to donate electrons to other metals.  The metal doing the donating (called the anode) corrodes more quickly than the metal that receives them (called the cathode).  Because this transfer replaces electrons that the cathode loses to seawater, the presence of the anode actually slows the corrosion of the cathode.  This is a well known process that we planned for when we designed the system, and we included a zinc plate called a sacrificial anode that serves no purpose other than to donate electrons to the stainless steel bracket.  Enter silicon bronze.

How readily one metal donates electrons to another metal is indicated by their locations on the galvanic series.  The further apart two metals are, the more readily electrons flow from the anodic to the cathodic metal.  Silicon bronze is more anodic than stainless steel, particularly the 316 alloy we are using, but I figured it was close enough.  Apparently not, however, and we didn’t account for other factors that can influence the rate of electron transfer.  An important one is the surface area of the cathode relative to the anode.  Remember that the cathode is losing electrons to seawater; this is what is driving the flow of electrons from the anode to the cathode in the first place (if you put them in contact in, say, air, mineral oil, or some other non-conductive medium nothing will happen).  So the more surface of the cathode is in contact with seawater, the more electrons will flow from the cathode to seawater, and from the anode to the cathode.  In our system the relatively small bronze bolts were attached to a very large stainless steel bracket, and I think this accounts for the rapid corrosion.

There is one thing that I still don’t understand, which is why the zinc anode in the system didn’t protect the bronze and stainless steel.  Bronze will sacrifice itself for stainless steel (as we’ve clearly demonstrated), but zinc should sacrifice to bronze and stainless steel.  However, our sacrificial zinc anode looks almost as good as it did when I installed it.  In the meantime here’s a video of the impressive lobster party taking place on the piling where our sampling system is located (don’t even think about it, it’s a no take zone!). At the end of the video you can see our shiny new 316 stainless steel bolts. Hopefully these last!

Posted By: Jeff on May 10, 2018

The Simons Foundation Early Career Investigator in Marine Microbial Ecology and Evolution (ECIMMEE) awards were announced today and I’m thrilled that the Bowman Lab was selected.  Our project is centered on using the SIO Pier as a unique platform for collecting ecological data at a high temporal resolution.  Consider that marine heterotrophic bacteria often have cell division times on the order of hours – even less under optimal conditions – and that entire populations can be decimated by grazing or viral attack on similarly short timescales.  A typical long term study might sample weekly; the resulting time-series is blind to the dynamics that occur on shorter time-scales.  This makes it challenging to model key ecological processes, such as the biogeochemical consequences of certain microbial taxa being active in the system.

Over the past year we’ve been slowly developing the capability to conduct high(er)-resolution time-series sampling at the SIO Pier.  This award will allow us to take these efforts to the next level and really have some fun, from an ecological observation and modeling point of view.  Our goal is to develop a sampling system that is agnostic to time, but instead observes microbial community structure and other parameters along key ecological gradients like temperature and salinity.  Following the methods in our 2017 ISME paper, we can model key processes like bacterial production from community structure and physiology data, allowing us to predict those processes for stochastic events that would be impossible to sample in person.


The SIO Pier provides a unique opportunity to sample ocean water within minutes of the lab. 

Sometimes it’s useful to visualize all that goes into scientific observations as a pyramid.  At the tip of this pyramid is a nice model describing our ecological processes of interest.  Way down at the base is a whole lot of head-scratching, knuckle-scraping labor to figure out how to make the initial observations to inform that model.  One of our key challenges – which seems so simple – is just getting water up to the pier deck where we can sample it in an automated fashion.  The pier deck is about 30 feet above the water, and most pumps designed to raise water to that height deliver much too much for our purposes.  We identified a pneumatic pumping system that does the job nicely, but the pump intake requires fairly intensive maintenance and a lot of effort to keep the biology off it.  Here’s a short video of me attempting to clean and reposition the (kelp covered) pump intake on Monday, shot by Gabriel Castro, a graduate student in the Marine Natural Products program at SIO (thanks Gabriel for the assist!).  Note the intense phytoplankton bloom and moderate swell, not an easy day!

Posted By: Jeff on April 18, 2018

We have a paper “Recurrent seascape units identify key ecological processes along the western Antarctic Peninsula” that is now available via advance-online through the journal Global Change Biology.  I place full blame for this paper on my former postdoctoral advisor and co-author Hugh Ducklow.  Shortly after I arrived for my postdoc at the Lamont-Doherty Earth Observatory, Hugh suggested using all the core Palmer LTER parameters since the start of that program, and “some kind of multivariate method” to identify different years.  The presumption was that different years would map to some kind of recognizable ecological or climatic phenomenon.

At the time I knew nothing about seascapes or geospatial analysis.  However, I had been playing around with self organizing maps (SOMs) to segment microbial community structure data.  I thought that similarly segmenting geospatial data would yield an interesting result, so we gave it a go.  This involved carefully QC’ing all the core Palmer LTER data since 1993 (sadly discarding several years with erroneous or missing parameters), interpolating the data for each year to build 3 dimensional maps of each parameter (you can find these data here), then classifying each point in these maps with a SOM trained on the original data.  After a lot of back and forth with co-authors Maria Kavanaugh and Scott Doney, we elected to use the term “seascape unit” for different regions of the SOM.  Our classification scheme ultimately maps these seascape units to the original sampling grid.  By quantifying the extent of each seascape unit in each year we can attempt to identify similar years, and also identify climatic phenomena that exert controls on seascape unit abundance.
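
For readers curious what that last classification step looks like in practice, here is a heavily simplified sketch using the kohonen package in R; the object names and grid size are hypothetical and this is not the exact code used in the paper:

library(kohonen)

# Hypothetical inputs: training.data is a matrix of scaled core parameters
# (temperature, salinity, nitrate, silicate, chlorophyll, ...) for the training points,
# and interpolated.data holds the interpolated grid points to be classified.
som.grid <- somgrid(xdim = 4, ydim = 4, topo = "hexagonal")
som.model <- som(as.matrix(training.data), grid = som.grid)

# Assign each interpolated point to its best-matching unit, i.e. its seascape unit
seascape.units <- map(som.model, newdata = as.matrix(interpolated.data))$unit.classif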

If you’re scratching your head at why it’s necessary to resort to seascape units for such an analysis it’s helpful to take a look at the training data in the standard T-S plot space.


Fig. 2 from Bowman et al., 2018.  Distribution of the training data in a) silicate-nitrate space and b) T-S space.  The color of the points gives the seascape unit.

The “V” distribution of points is characteristic of the western Antarctic Peninsula (WAP), and highlights the strong, dual relationship between temperature and salinity.  The warmest, saltiest water is associated with upper circumpolar deepwater (UCDW) and originates from offshore in the Antarctic Circumpolar Current (ACC).  The coldest water is winter water (WW), which is formed from brine rejection and heat loss during the winter.  Warm, fresh water is associated with summer surface water (SW).  Note, however, that multiple seascape units are associated with each water mass.  The reason for this is that nutrient concentrations can vary quite a bit within each water mass.  My favorite example is WW, which we usually think of as rich in nutrients (nutrient replete); it is, but WW associated with SU 2 is a lot less nutrient rich than that associated with SU 3.  Both will support a bloom, but the strength, duration, and composition of the bloom is likely to differ.

To evaluate how different climatic phenomena might influence the distribution of seascape units in different years we applied elastic-net regression as described here.  This is where things got a bit frustrating.  It was really difficult to build models that described a reasonable amount of the variance in seascape unit abundance.  Where we could, the usual suspects popped out as good predictors; October and January ice conditions play a major role in determining the ecological state of the WAP.  But it’s clear that lots of other things do as well, or that the tested predictors are interacting in non-linear ways that make it very difficult to predict the occurrence of a given ecosystem state.
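
For anyone who wants to try something similar, the basic mechanics of an elastic-net model are straightforward with the glmnet package in R; the predictor matrix and response vector below are hypothetical placeholders, not the data from the paper:

library(glmnet)

# Hypothetical inputs: predictors is a matrix of time-lagged climate variables (rows are years),
# response is the relative abundance of one seascape unit in each year.
fit <- cv.glmnet(predictors, response, alpha = 0.5)  # alpha = 0.5 mixes the ridge and lasso penalties
coef(fit, s = "lambda.min")                          # predictors retained by the best model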

We did get some interesting results predicting clusters of years.  Based on hierarchical clustering, the relative abundance of seascape units in a core sampling area suggests two very distinct types of years.  We tested models based on combinations of time-lagged variables (monthly sea ice extent, fraction of open water, ENSO, SAM, etc.) to predict year-type, with June and October within-pack ice open water extent best predicting year-type.  This fits well with our understanding of the system; fall and spring storm conditions are known to exert a control on bloom conditions the following year.  In our analysis, when the areal extent of fall and spring within-pack ice open water is high (think cold but windy), chlorophyll the following summer is associated with a specific seascape (SU 1 below) that is found most frequently offshore.  When the opposite conditions prevail, chlorophyll the following summer is associated with a specific seascape (SU 8) that is found most frequently inshore.  Interestingly, the chlorophyll inventory isn’t that different between year-types, but the density and distribution of chlorophyll is, which presumably matters for the higher trophic levels (that analysis is somewhere on my to-do list).


Fig. 3 from Bowman et al., 2018.  Clustering of the available years into different year-types.  The extent of within-pack ice open water in June and October are reasonable predictors of year-type.  Panel A shows the relative abundance of seascape unit for each year-type.  Panel B shows the fraction of chlorophyll for each year that is associated with each seascape unit.

One of our most intriguing findings was a steady increase in the relative abundance of SU 3 over time.  SU 3 is one of those seascapes associated with winter water; it’s the low nutrient winter water variant.  That steady increase means that there has been a steady decrease in the bulk silicate inventory in the study area with time.  I’m not sure what that means, though my guess is there’s been an increase in early season primary production which could draw down nutrients while winter water is still forming.


Fig. 5 from Bowman et al., 2018.  SU 3, a low-nutrient flavor of winter water, has been increasing in relative abundance.  This has driven a significant decrease in silicate concentrations in the study area during the Palmer LTER cruise.

Posted By: Natalia Erazo on January 29, 2018

Hi! I’m Natalia Erazo, currently working on the Ecuador project aimed at examining biogeochemical processes in mangrove forests. In this tutorial, we’ll learn the basics of (free) QGIS, how to import vector data, and how to make a map using data obtained from our recent field trip to the Ecological Reserve Cayapas Mataje in Ecuador!  We’ll also learn standard map elements and the QGIS Print Composer function for generating a map.

Objectives:

I. Install QGIS

II. Learn how to upload raster data using the Plugin OpenLayers and QuickMap services.

III. Learn how to import vector data: import latitude, longitude data and additional data. Learn how to select attributes from the data e.g., salinity values and plot them.

IV. Make a map using Print Composer in QGIS.

I. QGIS- Installation

QGIS is a very powerful and user-friendly open source geographic information system that runs on Linux, Unix, Mac, and Windows. QGIS can be downloaded here. You should follow the instructions and install gdal complete.pkg, numpy.pkg, matplotlib.pkg, and qgis.pkg.

II.Install QGIS Plug-in and Upload a base map.

  1. Install QGIS Plug-in

Go to Plugins and select Manage and Install Plugins. This will open the plugins dialogue box; type OpenLayers Plugin and click on Install plugin.

This plugin will give you access to Google Maps, OpenStreetMap layers, and others, and it is very useful for making quick maps from Google satellite, physical, and street layers. However, the OpenLayers plugin can generate zoom errors in your maps. There is another plugin, QuickMap Services, which uses tile servers rather than the direct API for getting Google layers and others. This is a very useful plugin which offers more options for base maps and fewer zoom errors. To install it you should follow the same steps as you did for the OpenLayers plugin, except this time you’ll type QuickMap Services and install the plugin.

Also, if you want to experiment with QuickMap Services you can expand the plugin: go to Web -> Quick Map Services -> Settings -> More services and click on Get contributed pack. This will generate more options for mapping.

2. Add the base layer Map:

I recommend playing with the various options in either OpenLayers like the Google satellite, physical, and other maps layers, or QuickMap Service.

For this map, we will use the ESRI library from QuickMap Services. Go to Web -> QuickMapServices -> Esri -> ESRI Satellite.

You should see your satellite map.

You can click on the zoom in icon to adjust the zoom, as shown in the map below where I zoomed in on the Galapagos Islands. You’ll also notice that on the left side you have a Layers panel box; this box shows all the layers you add to your map. Layers can be raster data or vector data; in this case we see the layer ESRI Satellite. At the far left you’ll see a list of icons that are used to import your layers. It is important to know what kind of data you are importing to QGIS so that you use the correct function.

III. Adding our vector data.

We will now add our data file, which contains the latitude and longitude of all the sites where we collected samples, in addition to values for salinity, temperature, and turbidity. You can do this with your own data by creating a file in Excel with columns for longitude and latitude values and columns for other variables, and saving it as a csv file. To input the data you’ll go to the icons on the far left and click on “Add Delimited Text Layer”. Or you can click on Layer -> Add Layer -> Add Delimited Text Layer.

You’ll browse to the file with your data. Make sure that csv is selected for File format. Additionally, make sure that the X field represents the column for your longitude points and the Y field for latitude. QGIS is smart enough to recognize longitude and latitude columns, but double check! You can also see an overview of the data with columns for latitude, longitude, Barometer mmHg, conductivity, Salinity psu, and other variables. You can leave everything else as default and click ok.

You’ll be prompted to select a coordinate reference system, and this is very important because if you do not select the right one you’ll get your points in the wrong location. For GPS coordinates, like the data we are using here, you need to select WGS 84, EPSG 4326.

Now we can see all the points where we collected data!

As we saw earlier, the data contain environmental measurements such as salinity, turbidity, temperature, and others. We can style the layer with our sampling points based on the variables in our data. In this example we will create a layer representing salinity values.

You’ll right click on the layer with our data in the Layers panel, in this case our layer 2017_ecuador_ysi_dat.., and select Properties.

There are many styles you can choose for the layer, and the styling options are located in the Style tab of the Properties dialogue. Clicking on the drop-down button in the Style dialogue, you’ll see there are five options available: Single Symbol, Categorized, Graduated, Rule Based, and Point Displacement. We’ll use Graduated, which allows you to break down the data into unique classes. Here we will use the salinity values and classify them into 3 classes: low, medium, and high salinity. There are 5 modes available in the Graduated style to do this: Equal Interval, Quantile, Natural Breaks, Standard Deviation, and Pretty Breaks. You can read more about these options in the QGIS documentation.

In this tutorial, for simplicity, we'll use the Quantile option. This method chooses the class breaks so that the number of values in each class is the same; for example, if there are 100 values and we want 4 classes, the Quantile method will choose the breaks so that each class contains 25 values.
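If you're curious what the Quantile mode is doing behind the scenes, here's a minimal sketch of the same classification in R. The file name and the salinity column name are assumptions based on the data described above.

ysi <- read.csv('2017_ecuador_ysi_data.csv') # file name is an assumption
salinity <- ysi$Salinity.psu                 # column name is an assumption

## quantile break points for 3 equally populated classes
breaks <- quantile(salinity, probs = seq(0, 1, length.out = 4), na.rm = T)

## assign each sampling point to a class
sal.class <- cut(salinity, breaks = breaks, include.lowest = T,
                 labels = c('low', 'medium', 'high'))

table(sal.class) # roughly the same number of points in each class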

In the Style section select Graduated, in Column select salinity psu, and for the color ramp we'll use colors ranging from yellow to red.

In the Classes box enter 3 and select Mode -> Quantile. Click on Classify, and QGIS will classify your values into the different ranges.

Now we have all the data points colored by the 3 different ranges: low, medium, and high salinity.

However, we have a lot of points and they are hard to see. We can edit the points by right clicking on the marker symbols and selecting Edit symbol.

Now I am going to get rid of the black outline to make the points easier to see. Select the point by clicking on Simple Marker and, under Outline style, select No Pen. Do the same for the remaining two point classes.

Nice, now we can better see variations in our points based on salinity!

IV. Print Composer: making a final map

We can now start to assemble the final version of our map. QGIS includes a Print Composer where you can lay out and edit your map. Go to Project -> New Print Composer.

You will be prompted to enter a title for the composer; enter a title and hit OK. You will be taken to the Composer window.

In the Print Composer window, we want to bring in the map view that we see in the QGIS canvas. Go to Layout -> Add Map. Once the Add Map tool is active, hold the left mouse button and drag a rectangle where you want to insert the map. The rectangle will be rendered with the map from the main QGIS canvas.

On the far right you can see the Items box, which shows the map you just added. If you want to make changes, select the map and edit it under Item properties. It is sometimes useful to adjust the scale until you are happy with the map.

We can also add a second map showing the location of Cayapas-Mataje in South America as a geographic reference. Go to the main QGIS canvas and zoom out until you can see where in South America the reserve is located.

Now go back to the Print Composer and add the map of the entire region, just as you did with the first map: go to Layout -> Add Map and drag a rectangle where you want to insert the map. The rectangle will again be rendered with the map from the main QGIS canvas. In the Items box you will now have Map 0 and Map 1. Select Map 1 and add a frame under Item properties: click on Frame to activate it and adjust the thickness to 0.40 mm.

We can add a north arrow to the map. The Print Composer comes with a collection of map-related images, including many north arrows. Click Layout -> Add Image.

Hold down the left mouse button and draw a rectangle in the top-right corner of the map canvas.

In the right-hand panel, click on the Item Properties tab, expand Search directories, and select the north arrow image you like the most. Once you've selected your image, you can always edit the arrow under SVG parameters.

Now we'll add a scale bar. Click on Layout -> Add Scalebar, then click on the layout where you want the scale bar to appear. Choose the style and units that fit your needs. In the Segments panel you can adjust the number of segments and their size. Make sure Map 0 is selected under Main properties.

I'll add a legend to the map. Go to Layout -> Add Legend, hold down the left mouse button, and draw a rectangle over the area where you want the legend to appear. Under Item properties you can make changes such as adding a title, changing fonts, and renaming your legend entries by clicking on them and typing the text you want.

It's time to label our map. Click on Layout -> Add Label, then click on the map and draw a box where the label should be. In the Item Properties tab, expand the Label section and enter the text for your label. You can also change the font and size by editing the label under Appearance.

Once you have your final version, you can export it as an image, PDF, or SVG. For this tutorial, let's export it as an image: click Composer -> Export as Image.

Here is our final map!

Now you can try the tutorial with your own data. Making maps is always a bit challenging, but put your imagination to work!

Here is a list of links that could help with QGIS:

-QGIS blog with various tutorials and new info on functions to use: here.

-If you want more information on how QGIS handles symbol and vector data styling: here  is a good tutorial.

-If you need data, a good place to start is Natural Earth: Free vector and raster basemap data used for almost any cartographic endeavor.

If you have specific questions please don’t hesitate to ask.

 

Posted By: Melissa Hopkins on December 16, 2017

Hello! My name is Melissa Hopkins. I just finished my first quarter as an undergraduate researcher in the Bowman Lab. The project I am working on involves the diversity of halophiles from the South Bay Saltworks lakes in San Diego. The South Bay Saltworks is an active solar salt harvesting facility that is part of the San Diego Bay National Wildlife Refuge. The goal of this project is to use 16S/18S community structure to identify microbial taxa that are currently poorly represented in existing genomes. We want to contribute new halophile genomes to learn more about halophiles, and the community structure data help us decide which new genomes to sequence.

Aharon Oren's open-access review "Microbial life at high salt concentrations: phylogenetic and metabolic diversity" explains the different classes and orders that contain halophiles, as well as the similar strategies used by these halophiles. Here, he uses a definition of halophiles as microbes that are able to tolerate 100 g/L salt and grow optimally at 50 g/L salt or above (for reference, seawater contains about 35 g/L salt). Extreme halophiles (including the haloarchaea) are defined as growing best at salt concentrations of 2.5-5.2 M, and moderate halophiles as growing optimally at salt concentrations of 1.5-4.0 M. This assumes that the salt is sodium chloride; different salts, such as magnesium chloride, can present additional challenges to life.

Halophiles are able to survive these high salt environments in two different ways: by pumping salt ions into their cells from the surrounding environment, or by synthesizing organic solutes to match the concentration of their surroundings. Synthesizing organic solutes is more energetically expensive, because it takes energy to make the high concentration of organic solutes needed to keep salt ions out of the cytoplasm, yet this strategy is found widely across halophile species. Pumping in specific salt ions that don't interfere with biological processes is less energetically expensive, but it requires the cell's proteins to be specially adapted to high salt conditions; because of this, that strategy is not seen as often across the different species of halophiles. Different families and orders of halophiles use variations of these strategies to survive. Some halophile species, such as Salinibacter, are outliers, in that they use a different survival strategy than the halophiles they are most closely related to. Sequencing many more halophile genomes will give us new information on how these adaptive strategies are distributed across different halophiles.

For this project, Jeff, Natalia, and I spent the day sampling lakes at South Bay Saltworks on October 6. Our goal was to sample 3 different points in each of several lakes spanning a range of salt concentrations, and to sample as many lakes as possible.

We started out at the lower salinity lakes to see how the equipment would function and get used to the sampling process. At each point, we took unfiltered samples using a peristaltic pump for respiration tests, bacterial abundance, measurements of photosynthetic efficiency and chlorophyll concentration, turbidity, and salinity.

We then placed a GF/F filter on the housing of the pump to collect a coarsely filtered sample for ion composition analysis, FDOM, and dissolved inorganic nutrients. Finally, we placed a 0.2 micron filter on the housing to collect bacteria, archaea, and phytoplankton for DNA analysis.

In all, we sampled 7 lakes: one low salt concentration lake, one medium salt concentration lake, 2 high salt concentration lakes, and 3 magnesium chloride lakes. Unfortunately, due to time constraints and the high viscosity of the magnesium chloride lakes (it's like trying to filter maple syrup), we were only able to sample 1 point in each of the magnesium chloride lakes.

Natalia and I setting up for sampling on one of the lower salinity lakes.  You can see the large piles of harvested salt in the background.

One of the magnesium chloride lakes we sampled. Due to the high salt concentration and the extremely strong attraction between water and magnesium chloride, these lakes have an oily texture and high viscosity, making them difficult to sample.

One of the saltier sodium chloride lakes that we sampled from.  The pink color comes from pigments in the different microorganisms that we’re studying.

Posted By: Jeff on November 26, 2017

Today we took the last sample of our Ecuador field effort, though we have a few days left in-country. Right now we are in the town of Mompiche, just down the coast from our second field site near Muisne. Tomorrow we'll be sorting out gear and getting ready for a few days of meetings in Guayaquil. Then it's time to fly home and start working up some data! I'm too tired to write a coherent post on the last two (intensive!) weeks of sampling, but here's a photo summary of our work in the Cayapas-Mataje Ecological Reserve, where we spent the bulk of our time. Enjoy!


Pelicans congregating along a front in the Cayapas-Mataje estuary.


The town of Valdez, near the mouth of the Cayapas-Mataje estuary.  The Reserve is right on the border with Colombia, and up until a few years ago Valdez had a reputation as a trafficking hub.  Drug trafficking is still a problem throughout the Reserve, but with the conflict with FARC more or less over, I understand that tension in the local communities has gone way down.  Valdez seems okay, and the people we met there were friendly.


Another view of Valdez.


Shrimp farm in the Cayapas-Mataje Ecological Reserve.  You can’t build a new farm in the Reserve, but old shrimp farms were grandfathered in when the Reserve was created.


The R/V Luz Yamira, at its home port of Tambillo.  Tambillo was a vibrant, friendly little town where we spent a bit of time.  The town is struggling to hold onto its subsistence fishing lifestyle in the face of declining fish stocks.


ADCP, easy as 1-2-3


Birds of a feather…


Morning commute from the city of San Lorenzo.


The Cayapas-Mataje Ecological Reserve has the tallest mangrove trees in the world.


I took this picture just for the good people at UCSD risk management.


Team Cayapas-Mataje.  From left: Jessie, Natalia, and Jeff.  We are standing in front of Jessie's house in Tambillo.  Many thanks to Jessie and his wife for letting us stay a night and get away from the craziness of San Lorenzo!


A very full car ready to head to Muisne.  It's a good thing Natalia and I are both fairly short.


A large shrimp farm near Muisne.  The area around Muisne has been almost entirely deforested for shrimp aquaculture.  By comparing this area with the more pristine Cayapas-Mataje, we hope to better understand the biogeochemical consequences of coastal land-use change.

Posted By: Jeff on November 14, 2017

As any reader of this blog will know, most of the research in the Bowman Lab is focused on polar microbial ecology. Although focusing a research program on a set of geographically linked environments does have advantages, primarily the ability to spend more time thinking in depth about them, there is, I think, something lost with this approach. Insights are often gained by bringing a fresh perspective to a new study area, or by applying lessons learned in such an area to places that one has studied for years. With this in mind, lab member Natalia Erazo and I are launching a new field effort in coastal Ecuador. Natalia is an Ecuadorean native and gets credit for developing the idea and sorting out the considerable logistics for this initial effort. Our first trip is very much a scouting effort, but we will carry out some sampling in the Cayapas-Mataje Ecological Reserve and near the town of Muisne. Depending on funding, we hope to return during the rainy season in January-February for a more intensive effort.


Dense coastal forest in the Cayapas-Mataje Ecological Reserve.  Photo Natalia Erazo.

Our primary objective is to understand the role of mangrove forests in coastal biogeochemical cycling. Mangroves are salt-tolerant trees that grow in tropical and sub-tropical coastal areas around the world. They are known to provide a range of positive ecosystem functions: serving as fish habitat, stabilizing shorelines, and providing carbon and nutrient subsidies to the coastal marine environment. Globally, mangroves are under threat. The population density of many tropical coastal areas is increasing, and that inevitably leads to land-use changes (such as deforestation) and a loss of ecosystem services (the social and economic benefits an ecosystem provides) as economic activity ramps up. The trick to long-term sustainability is to maintain ecosystem services during economic development, allowing standards of living to increase in the short term without a long-term economic loss resulting from ecological failure (such as the collapse of a fishery or catastrophic coastal flooding). This is not easily done, and it requires a much better understanding than we often have of exactly what functions specific at-risk components of an ecosystem provide.

One particular land-use threat to the mangrove ecosystem is shrimp aquaculture. Mangrove forests in Ecuador and in other parts of the world have been deforested on a massive scale to make room for shrimp aquaculture ponds. In addition to scaling back any ecosystem functions provided by the mangrove forest, the shrimp aquaculture facilities are a source of nutrients in the form of excrement and excess feed. On this trip we will try to locate estuaries more and less perturbed by aquaculture. By comparing nutrient and carbon concentrations, sediment load, and microbial community structure between these areas, we will gain a preliminary understanding of what happens to the coastal ecosystem when mangroves are removed and aquaculture facilities are installed in their place.

Our first stop on this search will be San Lorenzo, a small city in the Cayapas-Mataje Ecological Reserve near the border with Colombia. I'm extremely excited to visit the Reserve, which has the distinction of hosting the tallest mangrove trees anywhere on Earth. We may have some meager internet access in San Lorenzo, so I'll try to update this blog as I'm able. Because of the remote nature of some of our proposed field sites, we'll have a Garmin inReach satellite messenger with us. We plan to leave the device on during our field outings, so you can track our location in real time on the map below! The Garmin map interface is a bit kludgey; ignore the other Scripps Institution of Oceanography users listed on the side panel, as I can't seem to make them disappear.

Posted By: Jeff on October 03, 2017


Davos, Switzerland, site of the POLAR2018 conference.  Image from https://www.inghams.co.uk

With colleagues Maria Corsaro, Eric Collins, Maria Tutino, Jody Deming, and Julie Dinasquet I'm convening a session on polar microbial ecology and evolution at the upcoming POLAR2018 conference in Davos, Switzerland.  POLAR2018 is shaping up to be a unique and excellent conference; for the first time (I think) the major international Arctic science organization (IASC) is joining forces with the major international Antarctic science organization (SCAR) for a joint meeting.  Because many polar scientists specialize in one region, and thus have few opportunities to learn from the other, a joint meeting will be a great opportunity to share ideas.

Here's the full-text description for our session.  If your work is related to microbial evolution, adaptation, or ecological function in the polar regions, I hope you'll consider applying!


Posted By: Jeff on September 21, 2017


The eastern oyster.  Denitrification never tasted so good!  Photo from http://dnr.sc.gov.

I'm happy to be a co-author on a study that was just published by Ann Arfken, a PhD student at the Virginia Institute of Marine Science (VIMS).  The study evaluated the composition of the microbial community associated with the eastern oyster Crassostrea virginica to determine whether oysters play a role in denitrification.  Denitrification is an ecologically significant anaerobic microbial metabolism.  In the absence of oxygen, certain microbes use other oxidized compounds as electron acceptors.  Nitrate (NO3–) is a good alternative electron acceptor, and the process of reducing nitrate to nitrite (NO2–), and ultimately to elemental nitrogen (N2), is called denitrification.  Unfortunately, nitrate is also needed by photosynthetic organisms like plants and phytoplankton, so the removal of nitrate can be detrimental to ecosystem health.  Oxygen is easily depleted in the guts of higher organisms by high rates of metabolic activity, creating a niche for denitrification and other anaerobic processes.


Predicted relative abundance (paprica) as a function of measured (qPCR) relative abundance of nosZI genes.  From Arfken et al. 2017.

To evaluate denitrification in C. virginica, Arfken et al. coupled actual measurements of denitrification in sediments and oysters with an analysis of microbial community structure in oyster shells and digestive glands.  We then used paprica with a customized database to predict the presence of denitrification genes, and validated the predictions with qPCR.

I was particularly happy to see that the qPCR results agreed well with the paprica predictions for the nosZ gene, which codes for the enzyme responsible for reducing nitrous oxide (N2O) to N2.  I believe this is the first example of qPCR being used to validate metabolic inference, which currently lacks a good method for validation.  Surprisingly, however (at least to me), denitrification in C. virginica was largely associated with the oyster shell rather than the digestive gland.  We don't really know why this is.  Arfken et al. suggest rapid colonization of the oyster shell by denitrifying bacteria that also produce antibiotic compounds to exclude predators, but further work is needed to demonstrate this!
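If you're curious what that kind of validation looks like in practice, here's a minimal sketch in R. The vectors paprica.nosZ and qpcr.nosZ are hypothetical stand-ins for per-sample relative abundances of the nosZ gene, not objects from the actual analysis in Arfken et al. 2017; the comparison itself is just a regression of predicted against measured values.

## assumed inputs: two numeric vectors of equal length, one value per sample
## paprica.nosZ = relative abundance of nosZ predicted by paprica
## qpcr.nosZ    = relative abundance of nosZ measured by qPCR

fit <- lm(paprica.nosZ ~ qpcr.nosZ) # regress predicted on measured
summary(fit)$r.squared              # how much of the prediction tracks the measurement

plot(qpcr.nosZ, paprica.nosZ,
     xlab = 'measured (qPCR) relative abundance of nosZ',
     ylab = 'predicted (paprica) relative abundance of nosZ')
abline(fit) # regression line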

Posted By: Jeff on August 11, 2017

We recently got our CyFlow Space flow cytometer in the lab and have been working out the kinks.  From a flow cytometry perspective the California coastal environment is pretty different from the western Antarctic Peninsula, where I've done most of my flow cytometry work.  Getting my eyes calibrated to a new flow cytometer and the coastal California environment has been an experience.  Helping me on this task is Tia Rabsatt, a SURF REU student from the US Virgin Islands.  Tia will be heading home in a couple of weeks, which presents a challenge: once she leaves she won't have access to the proprietary software that came with the flow cytometer.  To continue analyzing the data she collected over the summer as part of her project she'll need a different solution.

To give her a way to work with the FCS files I put together a quick R script that reads in the file, sets some event limits, and produces a nice plot.  With a little modification one could “gate” and count different regions.  The script uses the flowCore package to read in the FCS format files, and the hist2d command in gplots to make a reasonably informative plot.

library('flowCore')
library('gplots')

#### parameters ####

f.name <- 'file.name.goes.here' # name of the file you want to analyze, file must have extension ".FCS"
sample.size <- 1e5 # number of events to plot, use "max" for all points
fsc.ll <- 1 # FSC lower limit
ssc.ll <- 1 # SSC lower limit
fl1.ll <- 1 # FL1 lower limit (ex488/em536)

#### functions ####

## plotting function

plot.events <- function(fcm, x.param, y.param){
  hist2d(log10(fcm[,x.param]),
         log10(fcm[,y.param]),
         col = c('grey', colorRampPalette(c('white', 'lightgoldenrod1', 'darkgreen'))(100)),
         nbins = 200,
         bg = 'grey',
         ylab = paste0('log10(', y.param, ')'),
         xlab = paste0('log10(', x.param, ')'))

  box()
}

#### read in file ####

fcm <- read.FCS(paste0(f.name, '.FCS'))
fcm <- as.data.frame((exprs(fcm)))

#### analyze file and make plot ####

## eliminate values that are below or equal to thresholds you
## defined above

fcm$SSC[fcm$SSC <= ssc.ll | fcm$FSC <= fsc.ll | fcm$FL1 <= fl1.ll] <- NA
fcm <- na.omit(fcm)

## subsample the events for plotting, unless sample.size is set to 'max'

fcm.sample <- fcm

if(sample.size != 'max'){
  try({fcm.sample <- fcm[sample(nrow(fcm), sample.size),]},
      silent = T)
}

## plot the subsampled events in a couple of different ways

plot.events(fcm.sample, 'FSC', 'SSC')
plot.events(fcm.sample, 'FSC', 'FL1')

## make a presentation quality figure

png(paste0(f.name, '_FSC', '_FL1', '.png'),
    width = 2000,
    height = 2000,
    pointsize = 50)

plot.events(fcm.sample, 'FSC', 'FL1')

dev.off()

And here’s the final plot:
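As mentioned above, with a little modification one could also gate and count different regions of events. Here's a minimal sketch of a simple rectangular gate applied to the thresholded data frame from the script; the gate limits are hypothetical placeholders and would need to be tuned to the instrument and the population of interest.

## a rectangular "gate" on log10(FSC) vs log10(FL1); limits are assumed values

fl1.gate <- c(2, 4) # log10(FL1) limits of the gate (assumed)
fsc.gate <- c(1, 3) # log10(FSC) limits of the gate (assumed)

in.gate <- log10(fcm$FL1) >= fl1.gate[1] & log10(fcm$FL1) <= fl1.gate[2] &
  log10(fcm$FSC) >= fsc.gate[1] & log10(fcm$FSC) <= fsc.gate[2]

sum(in.gate)             # number of events falling in the gate
sum(in.gate) / nrow(fcm) # fraction of all events falling in the gate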