The next day we went out again for resistivity and augering. Céline picked out two alternative sites that might be drier. We drove through the abandoned valley to the site. We took the direct route and found the local road to be in a terrible state of disrepair. The vans could barely make it through. Then we hit a spot where slumping off each side of the road narrowed it too much. The villagers helped make a temporary road with bricks and wood, but it was still too narrow. Then they filled a sandbag and together with the bricks, wood and other
handy items we got across. It turned out that since the Upazila (county) voted for the opposition party, they have not had their roads repaired for over a decade. This level of politicization of everything in Bangladesh really hurts the country. When we reached the location of the line, we found that ponds between the road and the fields limited our access. We walked around and found a site next to a brick factory. The line was along an irrigation ditch. Fine to walk on either side, but submerged to mid-shin if you
stepped in the middle. The data looked very good after processing. We may have found the top of the Pleistocene at relatively shallow depths, consistent with the site being the top of a buried anticline (folded hill).
The delays from the bad road, site searching, and a longer distance to lug the equipment meant that we couldn’t do augering. We came to the conclusion that we have to alternate days of resistivity and drilling. Not enough time in a day to do both properly. That meant
the next day was for augering. We went back to the soccer field site, officially BNGTi1, and started augering with all six of us. We hurried past the section we had already described. To minimize hole collapse, we switched between two augers and tried to work quickly on the descriptions. It took all of us all morning to make it to 4.8 meters. The mud was too hard. We needed to go to plan B. We would drill tube wells and sample inside the wells. Alamgir and Basu went off to the village to find a driller. The rest of us
cleaned off the equipment and ourselves at a nearby pond and well and had lunch. After several attempts, they found a driller, but he couldn’t come until 3 p.m. I like to use all the available time I have here, but we now had a few hours break.
The three-person drill team arrived right at 3, unusual in this part of the world. I have seen the drilling technique before, but never the initial set up. In 20 minutes they set two vertical bamboo poles in the ground, tied on the cross piece to make a large H, attached a lever arm and the drill pipe, dug a mud pit for water and a
channel to the actual well location. Then they started drilling. It was so much faster and easier than augering! In 10-20 minutes they were past the depth we reached. We don’t get continuous samples described every 10 cm (4 in.), but the lithology averaged every 5 ft. Muds come up as solid cylinders that we collect, sands as a slurry that we decant. We subdivide the 5 ft. sections if there is a lithology change. The driller caught on quickly to what we wanted and kept us informed of all changes in sediment type, which he could easily feel. Céline and Basu, an experienced logger of tube wells, did most of the sediment work,
with some help from the rest of us. As expected, the section was primarily mud with some silt. We reached the sands from the abandoned channel at 42 ft., a little deeper than I expected but reasonable. It was still early enough for us to do another. Alamgir and I scouted a second location as they finished and packed up the equipment. We completed that one, with the sands at only 20 ft. North of our transect, it looks like there was an island splitting the channel in two. This spot would have been downstream of the island, so we
expected it to be shallow. Finally, things were going well. Using tubewells, we should have plenty of time to drill several stratigraphic wells and then pick one for sampling. We celebrated with dinner at the local Chinese restaurant.
A couple of months ago I published paprica v0.11, a set of scripts for conducting a metabolic inference from a collection of 16S rRNA gene reads. This approach allows you to estimate the functional capabilities of a microbial community if you don’t have access to a metagenome or metatranscriptome. Paprica started as a method for a paper I was writing but eventually became complex enough to warrant its own publication. Paprica v0.11 reflected this origin – it produced nice results but was kludgy and cumbersome.
Over the last couple of weeks I’ve given paprica a complete overhaul and am happy to introduce v0.20. There are a number of major differences between v0.11 and v0.20, but the most significant is a clearer division between construction of the database, for those who want full control (and access to the PGDBs), and sample analysis, which can proceed with only the provided lightweight database (although you will not have access to the PGDBs). Executing paprica v0.20 is as easy as this (from your home directory, for the provided file test.fasta):

    git clone https://github.com/bowmanjeffs/genome_finder.git
    cd genome_finder
    chmod a+x paprica_run.sh
    ./paprica_run.sh test
One really important distinction between this version and v0.11 is that metabolic pathways are NOT predicted directly on internal nodes. This was done for reasons of organization and efficiency, but I’m not sure that it made much sense to do this anyway. Instead the pathways likely to be found for an internal node are inferred from their appearance in terminal daughter nodes (that is, the completed genomes that belong to the clade defined by the internal node). If a given pathway is present in some specified fraction (0.90 by default) of the terminal daughters it is included in the internal node. You can change this value by modifying the appropriate variable in pathway_profile.txt. Some (including myself) might like to have a PGDB for an internal node for purposes of visualization or modeling. In the near future I’ll release a utility to create a PGDB for an internal node on demand.
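To make the inference rule concrete, here is a minimal Python sketch of the idea, not the actual paprica code. The function name `infer_internal_pathways`, the dictionary layout, and the genome and pathway names are all made up for illustration, and I am assuming "present in some specified fraction" means "at least that fraction".

```python
from collections import Counter

def infer_internal_pathways(daughter_pathways, cutoff=0.90):
    """Return the pathways found in at least `cutoff` of the terminal daughters.

    daughter_pathways maps a (hypothetical) genome name to the set of
    pathways predicted for that completed genome.
    """
    if not daughter_pathways:
        return set()
    counts = Counter()
    for pathways in daughter_pathways.values():
        counts.update(set(pathways))  # count each pathway once per genome
    n_daughters = len(daughter_pathways)
    return {pw for pw, c in counts.items() if c / n_daughters >= cutoff}

# Toy clade of three completed genomes (illustrative names only).
clade = {
    'genome_A': {'glycolysis', 'TCA cycle', 'denitrification'},
    'genome_B': {'glycolysis', 'TCA cycle'},
    'genome_C': {'glycolysis', 'TCA cycle'},
}
internal = infer_internal_pathways(clade)
relaxed = infer_internal_pathways(clade, cutoff=0.30)
```

With the default cutoff only the pathways shared by all three genomes survive; lowering the cutoff lets the rarer pathway through, which is the trade-off the fraction controls.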
Some other major improvements…
- Fewer dependencies. For the scripts called in paprica_run.sh you need pplacer, seqmagick, infernal, and some Python modules that you should probably have anyway.
- Improved reference tree. I’m still working on this, but the current method uses RAxML for phylogenetic inference and Infernal for alignment, which seems to work much better than the previous (albeit much faster) combo of Fasttree and Mothur. Thanks to Eric Matsen for helpful suggestions in this regard.
- More genome parameters. I have a particular interest in how genome parameters (e.g. length, coding density, etc.) are distributed in the environment. Paprica gives you a whole list of interesting metrics for the terminal and internal nodes.
Paprica is still in heavy development and I have a lot of improvements planned for future versions. If you try v0.20 I’d love to know what you think – good, bad, or otherwise! You can create an issue on Github or email me.
Along with colleagues from New Zealand, Argentina, and Malaysia I’m convening a session on microbial ecology and evolution at the upcoming biennial SCAR meeting in Kuala Lumpur (because there’s no better place to talk about ice than the tropics). If this sounds like your sort of thing check it out!

S23. Microbes, diversity, and ecological roles
Walter MacCormack, Argentina; Charles Lee, New Zealand; Chun Wie Chong, Malaysia; Jeff Bowman, USA
The ecology of Antarctica is largely shaped by microbes, with microbial life, including prokaryotes and unicellular eukaryotes, serving as the main drivers of ecosystem function. Given this, it is perhaps surprising that our current understanding of Antarctic biota has been derived primarily from studies of metazoans. Despite major advances in the field of Antarctic microbiology in recent years there remains a knowledge gap in our understanding of the distribution, functions, and adaptations of Antarctic microbes. There is a general consensus that Antarctic microorganisms are highly diverse, and in many cases encompass endemic gene pools with unique physiological and genetic adaptations to the extreme conditions of their environment. Relatively recently, the advent of ‘omics platforms has allowed researchers to observe these processes in great detail. This session welcomes submissions on all aspects of microbial ecology and evolution in Antarctica and the Southern Ocean. This includes ‘omics-based approaches to understanding prokaryotic and unicellular eukaryotic diversity, function, and adaptation, as well as laboratory and field-based studies of microbial and ecological physiology. Special consideration will be given to abstracts addressing the following issues: (1) Microbial biogeography, functional redundancy, and ecosystem services; (2) Trophic connectivity between prokaryotes and eukaryotes; (3) Cold adaptation strategy and evolution; and (4) Multiple ‘omics integration addressing systems biology of Antarctic ecosystems.
Six of us headed out on Oct. 8 for Brahmanbaria, northeast of Dhaka. Our target is a large winding abandoned river valley that we believe used to be the course of the Meghna River. Currently, the much smaller Titas River flows northward in the channel. Why would a river in the world’s largest delta flow the wrong way? We think that an earthquake uplifted the Comilla District area to the south. That caused the Meghna River to shift westward to its present channel and the Titas to flow up the old channel. A well drilled in the channel in 2012 shows a layer of muds overlying coarser sands.
We think the sands represent sediments from the old Meghna and the muds are sediments filling up the channel. We will be using resistivity to image the channel and an auger to first sample and describe the sediments and then to collect samples for dating.
Finding organic matter to date by carbon 14 is rare, so we plan to use a technique called OSL dating. OSL stands for Optically Stimulated Luminescence. Electrons from the radioactivity of all rocks get trapped in defects in quartz grains. However, they
are so weakly trapped that sunlight can release them. While the grains travel down the river, sunlight releases the electrons; once the grains are buried, the electrons start accumulating again. By measuring the light released by the sample when optically stimulated, we can calculate the time since the sample was last exposed to light. By sampling the top of the sands and the bottom of the muds, we can date the time the river switched, or avulsed. The details of the procedure to get an OSL age are pretty complicated, but if this works, we
will date the earthquake that caused the river avulsion.
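At its simplest, the final step of an OSL age is a division: the radiation dose the grains absorbed since burial (the equivalent dose, estimated from the light they release) divided by the rate at which the surrounding sediment delivers dose. The Python sketch below shows only that arithmetic and how two samples would bracket the avulsion. The function names and every number in it are invented for illustration; they are not measurements from this project, and the real procedure involves many corrections not shown here.

```python
def osl_age_ka(equivalent_dose_gy, dose_rate_gy_per_ka):
    """Burial age in thousands of years: absorbed dose / dose rate."""
    if dose_rate_gy_per_ka <= 0:
        raise ValueError("dose rate must be positive")
    return equivalent_dose_gy / dose_rate_gy_per_ka

def avulsion_bracket_ka(top_of_sand_age_ka, base_of_mud_age_ka):
    """The river switched after the last sand and before the first mud."""
    if base_of_mud_age_ka > top_of_sand_age_ka:
        raise ValueError("the mud should be younger than the sand beneath it")
    return base_of_mud_age_ka, top_of_sand_age_ka  # (youngest, oldest)

# Made-up doses and dose rates, purely to show the calculation.
sand_age = osl_age_ka(equivalent_dose_gy=6.0, dose_rate_gy_per_ka=3.0)
mud_age = osl_age_ka(equivalent_dose_gy=4.5, dose_rate_gy_per_ka=3.0)
youngest, oldest = avulsion_bracket_ka(sand_age, mud_age)
```

The closer together the two samples sit in the core, the tighter the bracket on when the channel was abandoned.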
This technique is new to me. I helped with some sampling the last time I was here, but I have not been in charge of doing it. I am also more comfortable with the quantitative data from the resistivity than the qualitative geologic descriptions we will make of the sediments. Luckily I have a good team with me: Céline, my postdoc; Matt, my former teaching assistant; and Alamgir, Atik and Basu from Dhaka University. I have spent time in the field with Alamgir and
Atik before. Alamgir has conducted his own resistivity surveys. Basu was recommended to me as someone with a lot of experience in describing sediments.
We set out early in the morning for the four-hour drive. However, when we reached the river valley, we found it was almost completely flooded. We walked out on an elevated road and there was pani—the Bangla word for water—everywhere. The abandoned valley is still slightly lower in elevation than the surrounding land. Even that land has the rice fields flooded with shallow water, although the
boundaries between the fields are above water. But our main target is submerged! In the winter this will be dry land, but we are a month and a half too early. A number of scheduling issues required me to come now, although I knew it was too soon after the monsoon, but I didn’t expect so much of the land to still be flooded. Time to come up with an alternative plan.
For the resistivity, we need long straight stretches of dry land. We decided to
do it west of the valley to try to image the thickness of the entire Holocene (last 10,000 years) section. It should vary because of the folding of the strata from the tectonics. Mapping the thickness will help us to map the position of the buried fold. For augering, we only need a small patch of land to stand on. To find it we headed south towards where the valley was uplifted more and might be drier. Not as ideal as the original location, but possible. The next morning we headed farther south and crossed the river valley. It was drier and we noted some potential augering sites. We continued to a location for resistivity. The six of us set up the >350 m long resistivity line, then Céline, Basu and I headed back to try augering while the resistivity data was collected. The augering proved very difficult. We were very slow describing the core that the auger brought up, and while we were doing it the hole would start to collapse. The muddy sediment was very stiff, and we had to hammer the auger in. We only got to 2.7 m when we stopped, nowhere near the depth we needed. Things were pretty discouraging.
I’m really excited (and relieved) to report that my review on the taxonomy and function of sea ice microbial communities was recently published in the journal Elementa. The review is part of a series on biological exchange processes at the sea ice interface, by the SCOR working group of the same name (BEPSII). I’m deeply appreciative of Nadja Steiner, Lisa Miller*, Jaqueline Stefels, and the other senior members of BEPSII for letting (very) junior scientists take such an active role in the working group. I conceived the review in a foggy haze last year while writing my dissertation, when I assumed that there would be “plenty of time” for that kind of project before starting my postdoc. Considering that I didn’t even start aggregating the necessary data until I got to Lamont I’m also deeply appreciative of my postdoctoral advisor for supporting this effort…
The review is really half review, half meta-analysis of existing sea ice data. The first bit, which draws heavily on the introduction to my dissertation, describes some of the history of sea ice microbial ecology (which goes back to at least 1918 for prokaryotes). From there the review moves into an analysis of the taxonomic composition of the sea ice microbial community, based on existing 16S rRNA gene sequence data, takes a look at patterns of bacterial and primary production in sea ice, and then uses PAPRICA to infer metabolic function for the observed microbial taxa (after 97 years we still don’t have any metagenomes for sea ice – let alone metatranscriptomes – and precious few isolates).
There is a lot of info in this paper but I hope a few big points make it across. First, we have a massive geographical bias in our sea ice samples. This is to be expected, but I don’t think we should just accept it as what has to be. More disconcerting, there has been very little effort to integrate physiological measures in sea ice (such as bacterial production) with analyses of microbial community structure. A major exception is the work of the Kaartokallio group at the Finnish Environment Institute, but their work has primarily taken place in the Baltic Sea (an excellent system, but very different from the high Arctic and coastal Antarctic). This all translates into work that needs to be done, however, which is a good thing… we are just barely at the point where we can make reasonable hypotheses regarding the functions of these communities.
*This image of Lisa pops up a lot. If you can identify what, exactly, is going on in this picture I’ll buy you a beer.
I am heading back to Bangladesh, but this time I am stopping in New Delhi before heading to Bengal (West Bengal and Bangladesh). It is the first time that I will be in a part of India that is not adjacent to Bangladesh. Several of us are meeting there to plan for a new project that will span Bangladesh to India to Myanmar. I arrived a few hours before Nano Seeber and Paul Betka and used the time to get a new Indian SIM for my phone. After meeting up, we headed to the guesthouse of the Ministry of Earth Sciences, where we will be staying. If only the U.S. had a cabinet level department for earth sciences. It was difficult to find at night without a Hindi speaker, but we managed.
Over the next few days we had meetings about the project, but also some time for sightseeing, while
discussing the project in the car. Most of our meals were vegetarian, and Gandhi’s birthday, which occurred while we were there, is celebrated by eating vegetarian. When two more scientists arrived from Singapore, we started the day by visiting the Qutub Minar, dating back to the 1200s and the arrival of the Muslim Delhi Sultanate, followed by the Mughal Empire in the 1500s. In the Quwwat-ul-Islam mosque, there is the famous Iron Pillar originally erected by Chandragupta in the 4th century, probably at Patna, and brought here much later. Near the beginning of the inscription it says: “in battle with the Vanga countries, he kneaded (and turned) back with (his) breast the enemies who, uniting together came against (him).” Vanga is Bengal, now split into West Bengal in India and Bangladesh.
After mostly finishing discussions, the others decided to take a day trip to Agra to see the Taj Mahal. I was able to change my flight to Kolkata to the following morning and joined them, continuing to talk science on the 4-hour drive. We had to buy the expensive tickets at 750 rupees rather than the 10 rupees the Indians were paying. However, the premium ticket lets us bypass the long lines. The Taj Mahal is the tomb of Mumtaz Mahal,
the beloved wife of Shah Jahan, the Mughal Emperor. It was built over 17 years from 1631-1648. She died in childbirth of her 14th child. He was buried there as well when he died in 1666, after being overthrown by his son. I have seen many pictures but was not expecting how enormous the structure is. The entire place is beautiful and enormous with flanking buildings, gardens and gateways. I kept wondering about the cost of building it and how many man-years of India’s peasants financed it. Perhaps this excess was why this was the peak of the Mughal Empire. Within 100 years, the British were
taking over. Afterwards we went to Agra Fort, which is similarly gigantic, and another seat of the Mughals. There are palaces and a throne inside the red fort with views of the Taj. There are 30 buildings left, the rest having been leveled by the British to erect barracks for their troops. We didn’t get back to our hotel until 11.
I left early the next morning for Kolkata, the British Indian capital until 1911, when they moved it to Delhi. It was done to punish the Bengalis for opposing the
splitting of the Bengal Presidency into a more manageable size, which would have cut Bengal in two. I spent the day at Calcutta University then headed back to the airport to fly to Dhaka. At my usual hotel, I met up with Jenn Pickering, a student at Vanderbilt University, and Céline Grall, my postdoc. They were teaching a short course at Dhaka University. I spent the next few days in multiple meetings and making arrangements for a week of fieldwork. It will be good to get out into the countryside.
Completing an ‘Ice Station’ means collecting samples over a wide range of Arctic water and ice conditions. Each station means a major orchestration of people and resources. The teams gather, equipment is assembled, and the trek off the ship begins. After the first off ship exodus the sample teams are well practiced in moving equipment and setting up work areas so as not to interfere with the other stations. There is no shortage of space so spreading out is not a challenge!
Collecting a wide range of samples at multiple Arctic locations allows GEOTRACES to get an integrated look at the trace elements moving through the Arctic ocean ecosystem, and to better understand how these elements connect to the larger global ocean. Each is carefully collected. Whether the elements are ‘contaminants’ or essential nutrients there is a specific protocol in order to quantify the inputs without ‘dirtying’ the sample. It may seem odd to think of ‘dirtying’ something we label a contaminant, but in order to fully understand the concentrations and methods of transport for each element, every sample is handled with the same amount of care.
The following photo essay showcases the various ice/water sampling stations and reviews what is being collected at each.
Snow Samples: The snow collected at this station is being used in part to determine the presence/absence of contamination related to the March 11, 2011 Fukushima event.
Both the snow samples and the ice core sections will be analyzed and examined along with the information collected from seawater, suspended particulates, and bottom sediments, in order to better understand the influence of processes specific to the Arctic on the transport and distribution of several anthropogenic radionuclides.
Ice core samples: The ice cores are sections of sea ice, and again are being collected to determine the presence/absence of contamination related to Fukushima. In general the samplers were able to obtain 1.5 – 2 meters of ice in the cores.
Melt Ponds: Surface melt ponds form on the sea ice in the long days of the Arctic summer. The warmth of the sun creates ponds that sit on top of the ice. The water collected in these ponds carries different properties than either the sea ice from which it melted, or the ocean water from which the sea ice formed. Most often these ponds have a frozen surface layer that needs to be drilled through before water is pumped out for collection.
Beryllium-7 (7Be) Samples: Produced in the atmosphere when cosmic rays collide with nitrogen atoms, 7Be is constantly being added to the surface of the water, and therefore is a great surface water tracer. With its very short half-life, ~ 53 days, 7Be can be used to track water parcel circulation as it moves between surface and deep water (which has no significant source of the 7Be isotope). The surface water pulls the 7Be with it as it moves down deeper into the ocean, allowing us to track and time the mixing process.
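The timing idea rests on ordinary radioactive decay. A minimal Python sketch, assuming the ~53-day half-life quoted above and a water parcel that receives no new 7Be and does not mix with other water once it leaves the surface (a big simplification); the function names are made up for illustration:

```python
import math

HALF_LIFE_DAYS = 53.0  # approximate 7Be half-life quoted in the text

def fraction_remaining(days, half_life=HALF_LIFE_DAYS):
    """Fraction of the original 7Be activity left after `days`."""
    return 0.5 ** (days / half_life)

def days_since_surface(activity_ratio, half_life=HALF_LIFE_DAYS):
    """Time since isolation from the surface, given deep/surface activity ratio."""
    if not 0 < activity_ratio <= 1:
        raise ValueError("activity ratio must be in (0, 1]")
    return -half_life * math.log2(activity_ratio)

# A parcel carrying a quarter of the surface activity has been
# isolated for two half-lives under these assumptions.
elapsed = days_since_surface(0.25)
```

Because the activity halves every 53 days or so, the tracer is only useful for mixing that happens over weeks to months; after a year almost nothing measurable is left.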
Dirty Ice Samples: The dirty ice work is more opportunistic, and therefore is not part of each ice station. If dirty ice is spotted it will be sampled, and while it may not be part of each ice station, it is part of the overall GEOTRACES protocol. While most of the stations sample for quantification, i.e. grams of sediment/ml ice, the dirty ice samples are used more for characterization, i.e. composition or mineralogy. For Tim’s work the collection of dirty ice is used to look at sediments originating from continental shelves bordering the Arctic, with the goal of evaluating or characterizing dirty ice as a transport vector for anthropogenic radionuclides.
Minimal processing of the samples collected at the stations will occur on the Healy. The snow and ice are melted and the seawater acidified. The focus of the trip is to collect as much material as possible. There will be plenty of time for processing when the researchers are back at their home institutions.
Margie Turrin is blogging for Tim Kenna, who is reporting from the field as part of the Arctic GEOTRACES, a National Science Foundation-funded project.
For more on the GEOTRACES program, visit the website here.
A quick post on an excellent review published last week by Antje Boetius and co-authors (including Jody Deming, my PhD advisor) in Nature Reviews Microbiology, titled Microbial ecology of the cryosphere: sea ice and glacial habitats. The review, focused on viral, bacterial, and archaeal microbes, provides an excellent overview of the major habitats within the cryosphere (broadly glacial ice, sea ice, and snow), the challenges and opportunities for microbial life, and the observed distribution of taxa and genes (to the extent that we know it). Like most Nature Reviews it is written for a broad audience and assumes no deep knowledge of microbial ecology or the cryosphere.
Plenty of reviews have been written on microbial life at low temperature; what makes this one stand out to me is the ecological focus. Although discussions of biogeography (i.e. what taxa are where) and metabolism are woven throughout the review, the emphasis is on habitats, including newly recognized habitats like frost flowers and saline snow. Check it out!
In preparation for their Arctic work, GEOTRACES linked with ‘Float Your Boat’, an education program with a unique concept. ‘Float Your Boat’ blends the themes of historic Arctic drift studies, modern GPS technology and hands-on science to engage local communities with work in remote science locations. Scientists currently onboard the Research Vessel Healy spent time last spring recruiting and meeting with school groups to share information about the Arctic and their upcoming science cruise, and collecting small student-decorated wooden boats that would become part of the project.
For over a month the science team has been anticipating the deployment of these small wooden vessels since this builds a direct connection to their families and communities back home.
The student boats are deployed in a 100% biodegradable box lowered carefully onto an iceberg along with an iridium satellite tracking buoy. The tracker is activated ‘calling home’ so that it can be used to track the circulation of the ice. Over time the ice is expected to melt and the box will biodegrade sending these small floating wooden boats into the high seas of the Arctic Ocean.
Once the box degrades the boats will be separated from the tracker, but each boat has been identified by the students with their school and their own name and stamped with the project contact information. If any of the boats wash up onshore there is enough information for the locator to contact ‘Float Your Boat’ with a date and location. Through online tracking of the iridium satellite this project provides opportunities for students to learn about Arctic change, marine circulation, marine debris transit and maritime careers.
The ‘Float Your Boat’ project concept comes from early Arctic science, when drifting ice floes were used to track Arctic circulation. In the International Geophysical Year (1957-58) Lamont scientist Ken Hunkins resided for two 6 month stints on Ice Station Alpha, a station built on top of the Arctic sea ice. Science teams were flown in by plane and dropped, along with their equipment, about 500 miles north of Alaska. There they studied a range of ocean parameters, including tracking their own progress as they moved along with the ice drift. The 18 months of operations tracked the ice floe movement as it shifted ~2000 miles around the Arctic in a clockwise manner until it was just north of Ellesmere Island, Canada. (map below)
Somehow the rigid presence of the Healy seems infinitely more secure than a few tents and rigs set directly on the mile long by half-mile wide section of sea ice under station Alpha.
But even earlier than the science drift experiments were the expeditions of early Arctic explorers, like Fridtjof Nansen, who froze his ship the “Fram” into the northern icepack during his voyage of 1893-1896 in hopes of drifting to the North Pole. He did not succeed; however, he did learn about Arctic drift and spurred additional research on this topic, perhaps leading to these young Arctic researchers and their ‘vessels’.
…for something completely different. My wife and I are expecting our first child in a few months, which is wonderful and all, but means that we are faced with the daunting task of coming up with a name. Being data analysis types (she much more than me), and subscribing to the philosophy that there is no problem that Python can’t solve, we decided to write competing scripts to select a good subset of names. This is my first crack at a script (which I’ve titled BAMBI for BAby naMe BIas); I’ve also posted the code to Github. That will stay up to date as I refine my method (in case you too would like Python to name your child).
My general approach was to take the list of baby names used in 2014 and published by the Social Security Administration here, bias against the very rare and very common names (personal preference), then somehow use a combination of our birth dates and a random number generator to create a list of names for further consideration. Okay, let’s give it a go…
First, define some variables. Their use will be apparent later. Obviously replace 999999 with the real values.get = 100 # how many names do you want returned? wife_bday = 999999 my_bday = 999999 due_date = 999999 aatc = 999999 # address at time of conception size = (wife_bday + my_bday) / (due_date / aatc) start_letters = ['V','M'] # restrict names to those that start with these letters, can leave as empty list if no restriction desired sex = 'F' # F or M
Then import the necessary modules.import matplotlib import numpy as np import matplotlib.pyplot as py import math import scipy.stats as sps
Define a couple of variables to hold the names and abundance data, then read the file from the SSA.p =  # this will hold abundance names =  # this will hold the names with open('yob2014.txt', 'r') as names_in: for line in names_in: line = line.rstrip() line = line.split(',') if line == sex: if len(start_letters) > 0: if line in start_letters: n = float(line) p.append(float(n)) names.append(line) else: n = float(line) p.append(float(n)) names.append(line)
Excellent. Now the key feature of my method is that it biases against both very rare and very common names. To take a look at the abundance distribution run:py.hist(p, bins = 100)
Ignore the ugly X-axis. Baby name abundance follows a logarithmic distribution; a few names are given to a large number of babies, with a long “tail” of rare baby names. In 2014 Emma led the pack with 20,799 new Emmas welcomed into the world. My approach – I have no idea if it’s at all valid, so use on your own baby with caution – was to fit a normal distribution to the sorted list of names. I got the parameters for the distribution from the geometric mean and standard deviation (as the arithmetic mean and SD have no meaning for a log distribution). The geometric mean can be calculated with the gmean function, I could not find a ready-made function for the geometric standard deviation:geo_mean = sps.mstats.gmean(p) print 'mean name abundance is', geo_mean def calc_geo_sd(geo_mean, p): p2 =  for i in p: p2.append(math.log(i / geo_mean) ** 2) sum_p2 = sum(p2) geo_sd = math.exp(math.sqrt(sum_p2 / len(p))) return(geo_sd) geo_sd = calc_geo_sd(geo_mean, p) print 'the standard deviation of name abundance is', geo_sd ## get a gaussian distribution of mean = geo_mean and sd = geo_sd ## of length len(p) dist_param = sps.norm(loc = geo_mean, scale = geo_sd) dist = dist_param.rvs(size = sum(p)) ## now get the probability of these values print 'wait for it, generating name probabilities...' temp_hist = py.hist(dist, bins = len(p)) probs = temp_hist probs = probs / sum(probs) # potentially max(probs)
At this point we have a list of probabilities the same length as our list of names and preferencing names of middle abundance. The next and final step is to generate two pools of possible names. The first pool is derived from a biased-random selection that takes into account the probabilities, birth dates, due date, and address at time of conception. The second, truly random pool is a subset of the first with the desired size (here 100 names).possible_names = np.random.choice(names, size = size, p = probs, replace = True) final_names = np.random.choice(possible_names, size = get, replace = False)
And finally, print your list of names! I recommend roulette or darts to narrow this list further.

    with open('pick_your_kids_name.txt', 'w') as output:
        for name in final_names:
            print name
            print >> output, name
We are closing in on a week of intense focus and excitement for GEOTRACES and for the United States around the Arctic. It was barely a week ago (Aug. 31) that President Obama became the first sitting president to visit Alaska, refocusing the other 49 states on the fact that we are indeed an Arctic Nation. This historic first was followed closely by another, the Sept. 5 arrival of the U.S. Coast Guard Cutter Healy with the U.S. GEOTRACES scientists on board at the North Pole, completing the first U.S. surface vessel transit to the pole unaccompanied by another icebreaker. Combined with this, U.S. GEOTRACES became the first group ever to collect trace metals at the North Pole. You might assume these three items are unrelated, but they are in fact tightly linked.
In convening the GLACIER Conference (Global Leadership in the Arctic: Cooperation, Innovation, Engagement & Resilience) in Alaska, President Obama focused on a region that is fast changing due to its fragility and vulnerability to climate change. The meeting timing aligned nicely with the U.S. assuming chairmanship of the Arctic Council, and was a perfect platform for the president to address climate change, an issue that he has tackled aggressively. Conference sessions on the global impacts of Arctic change, how to prepare and adapt to a changing climate, and on improved coordination on Arctic issues all align with the work of Arctic GEOTRACES, although tackled from a different angle.
It was while he was in Alaska that President Obama announced a commitment to push ahead the schedule for adding to the U.S. icebreaker fleet. The “fleet” has dwindled to just 3 U.S. vessels at present, and limits our ability to work in the Arctic. The goal of adding another icebreaker by 2020 will help to address this. “Working” in the Arctic for this Coast Guard cutter includes supporting the research that is critical to our being able to develop a baseline understanding of conditions and more accurately predict the future changes.
Evidence for change in the Arctic is found in the ability of the U.S. Coast Guard Cutter Healy to cross the Arctic Ocean along its longest axis (the Bering Strait route) and penetrate deep into the sea ice to make it to the North Pole unaccompanied. The ice has been thinner than expected and is experiencing a much higher degree of melt. Ice stations, where the science team gets out onto the ice to sample, have been postponed because of safety concerns from the thin ice conditions. Everyone, including the captain, has been surprised by the conditions. The thin ice has increased the speed of travel. Although some thick (up to 10 feet) and solid ice has been encountered, much of the cruise has been spent traveling at up to 6 knots, and much less fuel has been used than expected because of this.
The last week has been action packed for all 145 people on the Healy. First, a “superstation” was run, a 57-hour sampling stop with a large number of samples collected in the ~4,000-meter-deep water. A superstation includes additional hydrocasts and pump sampling for groups, like Tim Kenna’s, that require large volumes of sample water. This was also a crossover station with the German GEOTRACES cruise on the Polarstern. Crossover means some of the extra samples collected can be used to do intercalibration (check to see that the results compare) between the science teams on the two ships. The German ship will collect at the exact same location. With large sampling projects using multiple labs and sampling teams, intercalibration becomes extremely important for interpreting the results.
After our long superstation, the team went almost immediately into a dirty-ice station (ice that entrains sediment as it freezes). This ice can form in several ways: during the spring thaw when ice dams in Arctic streams force sedimented water out onto the ice, where it refreezes; during cold storms that churn up sediments in the shallow shelf regions to refreeze on the surface ice; and when shallow areas freeze solid, collecting sediment at the base, and later break away. Once the ice is formed, it moves into the Arctic circulation pattern, so identifying the source of the sediment can help us better understand the temporal and spatial nature of Arctic circulation. This type of ice has high value for Tim’s research, since short-lived radioactive isotopes are frozen into the ice with the sediments, providing a timer for the formation of the ice.
The dirty-ice station was followed by an ice-algae station. Both of these entail stopping the ship and craning over two people in a “man-basket” where they can get out and sample (see image). This was followed closely by two full ice stations, where many groups went out on the ice to do their sampling, some for over 12 hours (brr). The second ice station had wind chills of -14 °C.
Field time, especially in the polar regions, is expensive and limited, so while in the field it is critical to complete as much science as possible. Sleep happens later when the team is back home.
Lamont Note: The Healy’s standard instrument package includes a CO2 instrument from Lamont’s Taro Takahashi. This was onboard when the Healy reached the North Pole (89.997 °N). The partial pressure of CO2 (pCO2) in seawater was found to be 343.3 micro-atmospheres at the water temperature of -1.438 °C. This is about 50 micro-atmospheres below the atmospheric pCO2 of 392.7 micro-atmospheres, and indicates that the Arctic Ocean water is rapidly absorbing CO2 from the air. The measurements confirm that the Arctic Ocean is helping to slow down the accumulation of the greenhouse gas in the air, and hence climate warming.
Margie Turrin is blogging for Tim Kenna, who is reporting from the field as part of the Arctic GEOTRACES, a National Science Foundation-funded project.
For more on the GEOTRACES program, visit the website here.
People are sometimes startled
By falcons perched on balconies, raccoons slinking through the park,
Bluefish blitzing herring up the river, coyotes tracing train tracks.
Isn’t it amazing, or isn’t it disturbing, we say,
A creature’s daring foray into our hard-paved empire.
I prefer the long view – that of Manhattan Schist, let’s say,
Having been buried in mile-thick ice,
Thoroughly sculpted and scoured,
Recolonized by green things and red-blooded things
Over and over again, with each ephemeral ice age.
From that vantage, it is we who are the curious invaders, an encrusting colony
Of organisms with a stunning talent for creating habitat for ourselves.
Diggers of ditches, un-earthers of bones, surveyors of history
All tell a tale of an earlier island of Eden,
Teeming with silver-backed, feather-tipped, vibrant-green life
Not so long ago.
The Schist, sparkling darkly in the park, is not surprised
By ‘coons and hawks, toothed and clawed neighbors,
Nor by the eels, pipers, moths, terrapins, raptors, seals, spiders,
By great trees ripping upwards through pavement.
You might think that I am about to lament all the changes we have wreaked
On this landscape, but I refuse to despise my own species.
I refuse to accept the conservationist’s guilt,
To draw boxes around wildness and around civilization,
And ignore the reality that these two can never truly be separated.
Instead, I am in awe of the spectacular forces that shape my world,
From grinding ice sheet to pulverizing jackhammer,
From rising skyscraper to ascending oak.
I live my animal life deliberately,
Knowing that we can never extract ourselves from Nature,
And that the boundaries we draw are not real.
This is one in a series of posts by Katherine Allen, a researcher in geochemistry and paleoclimate at the Lamont-Doherty Earth Observatory and the School of Earth & Climate Sciences at the University of Maine.
Coring sediment from the bottom of the world’s oceans is something that Lamont knows a lot about. Since 1947 Lamont has been actively collecting and archiving sediment from around the world. Currently our Core Repository contains sediment cores from every major ocean and sea in the world, some 18,700 cores. This is in large part due to Lamont’s first director, Maurice Ewing, who instilled a philosophy of “a core a day” for all ocean research vessels. Ewing firmly believed that if we had the sediment, we would be able to piece together patterns and stories about our planet, so every day at noon, or thereabouts, the ship would collect a core.
Scientists from around the world have requested slivers of mud from the cores in the repository to unlock Earth’s mysteries and secrets. The cores in Lamont’s Core Repository are no strangers to revealing stories of Earth systems, including those of climate cycles. Almost 40 years have passed since the groundbreaking work of the CLIMAP group that used the cores to connect the start of Earth’s glacial cycles to changes in eccentricity, precession and tilt (Hays, Imbrie and Shackleton, 1976). Collecting sediment on this Arctic GEOTRACES cruise will help us understand more of the stories locked in the oceans.
The length of a core is dictated by the goal of the collection. Early Lamont cores were more about collecting just to gather the material because the ship was there. These early cores were generally 6 to 9 meters long, although one incredibly long 28.2 m core was collected from the Central Pacific. Locally, cores closer to 1 or 2 meters in length have been collected from the Hudson River and nearby marshes.
For the sampling GEOTRACES is doing in the Arctic, there is a specific goal of collecting just the top few dozen centimeters of sediment and the water just above it, yet at a depth of ~2,200 meters. This will require a much different technique than what was used for the Central Pacific core.
The sediment in this region is soft, so the plan was to drop a small, general-purpose device called a mono-corer over the side of the ship with a few small weights on top to help drive the core tube in straight. The corer would hang below the bottom of the rosette of water samplers, far enough below that the rosette would remain mud-free but still able to collect near-bottom water samples. The mud in the mono-corer would be held in place by a spring-loaded door that snapped closed once the mud was inside and the tube began its return trip to the ship. All sounded good.
Although the plan was good, things don’t always go perfectly. Making sure the corer actually penetrated the sediment without tipping over or over-penetrating and compressing the top sediments proved challenging, as did ensuring the sample made it back to the ship intact. After several attempts a special “cone-of-silence” (any Get Smart fans out there?) was rigged up by the two Lamonters, Tim and Marty Fleischer, to avoid interference with the communications that were connecting with the rosette altimeter, controlling the lowering of the device. The cone was installed and the speed of the core lowering was slowed. Success! ‘Houston we have mud!’
Now to unpack its secrets.
Sounds like the basis for a great scifi thriller… “scientists scour Arctic, hunting for traces of nuclear fallout and ejections from cosmic ray impacts.” In reality this thriller theme is the actual core of the GEOTRACES mission. Let’s break it apart a bit to better understand it.
Fukushima and Other Nuclear Fallout
The project Tim is focused on concerns the human-introduced (anthropogenic) radionuclides that are released into the environment as a result of nuclear industrial activities, things like weapons production and testing, as well as nuclear power generation and fuel reprocessing. This includes isotopes of plutonium, neptunium, cesium, strontium, iodine and uranium that are not normally found in the environment. The major sources of these nuclides include fallout from atmospheric weapons testing and liquid releases from European nuclear fuel reprocessing.
One goal of our project is to determine the budgets (overall input and export) of these contaminants. Samples collected along our cruise track combined with those collected on the European GEOTRACES cruise taking place on the Polarstern will allow us to do this.
We are also collecting samples to evaluate for the presence and distribution of contamination related to Fukushima. Two cesium isotopes were released into the environment as a result of Fukushima: cesium-137, with a half-life of 30 years, and cesium-134, with a much shorter half-life of two years. Because cesium-134 decays so quickly, little of it is left from past nuclear testing. Fallout from Fukushima is an excellent tracer to help us learn more about ocean circulation and transport models.
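To make the half-life comparison concrete, here is a small sketch of the standard decay relation, fraction left = 0.5 ** (elapsed time / half-life). The half-lives are the ones quoted above; the 50-year elapsed time is only an assumed round figure for fallout from past weapons testing, and `fraction_remaining` is a name invented for this example.

```python
def fraction_remaining(elapsed_years, half_life_years):
    """Fraction of a radionuclide left after elapsed_years."""
    return 0.5 ** (elapsed_years / half_life_years)


# Half-lives as quoted in the text (years).
CS137_HALF_LIFE = 30.0
CS134_HALF_LIFE = 2.0

# Assumed elapsed time since weapons-testing fallout (illustrative only).
elapsed = 50.0
print('Cs-137 left:', fraction_remaining(elapsed, CS137_HALF_LIFE))
print('Cs-134 left:', fraction_remaining(elapsed, CS134_HALF_LIFE))
```

Under that assumption roughly a third of the cesium-137 would remain, while the cesium-134 would be down to a few parts in a hundred million, which is consistent with the point that little cesium-134 is left from past testing.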
Cosmic Ray Interactions
Another part of the GEOTRACES team is measuring Beryllium-7 (Be-7), a cosmogenic nuclide that is created when a cosmic ray breaks apart heavier atoms into smaller atoms. Be-7 is a short-lived isotope with a half-life of 53 days. We can use this short half-life to tell us something about water circulation and exchange rates under the ice. Currently the team is measuring Be-7 in the marginal ice zone. Once the ship reaches a section of ice that is large and thick enough for the scientists to work on, we will drill through to measure under the ice as well.
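As a rough sketch of how a short half-life can act as a clock, the decay relation can be turned around to give an elapsed time from the fraction of Be-7 that remains. This assumes a parcel of water that received no new Be-7 after it was cut off from the surface, which is a simplification and not the team's actual method; `days_since_isolation` and the example ratio are hypothetical.

```python
import math

BE7_HALF_LIFE_DAYS = 53.0  # half-life quoted in the text


def days_since_isolation(fraction_left, half_life_days=BE7_HALF_LIFE_DAYS):
    """Elapsed days implied by the fraction of the initial Be-7 left."""
    if not 0.0 < fraction_left <= 1.0:
        raise ValueError('fraction_left must be in (0, 1]')
    return -half_life_days * math.log2(fraction_left)


# Hypothetical ratio: under-ice activity is a quarter of the surface value.
print(days_since_isolation(0.25))
```

A quarter of the starting activity corresponds to two half-lives, so the example works out to 106 days.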
Yes We Have a Bubble Room!
When we said “trace” elements we weren’t kidding! Jess and Sara are part of the team working on contamination-prone trace elements. Their work is done in an inflatable bubble to keep it ultra clean. The bubble is inflated using high-efficiency particulate arresting (HEPA) filtered blowers. Trying to measure very small amounts of trace elements without contamination is extremely difficult, and it is a testament to their skills that they can measure elements such as zinc and iron, which occur at extremely low concentrations in seawater but are very common on the ship (rust never sleeps!). Getting an accurate measure means not picking up any of that ship input.
In order to run all these great experiments, we need samples, so we are collecting and filtering water at as many stations as we can. Sampling in the ice pack is very different than sampling in an open ocean. Station locations must be very carefully selected to reduce the risks of the equipment getting entangled in the ice and ending up either crushed or ripped away. Even in less dense ice, we caught the hydrowire on an ice floe (above).
Everything is supersized on a ship like the Healy, from the large metal A-frame support that is used to lower collection equipment (yellow/buff colored) to the circular metal rosette which is filled with Niskin collection bottles for gathering water samples. The deployment of a rosette for sampling is called a “hydrocast.” This allows scientists to collect water at a variety of depths. The images below are from a few days ago, before we hit denser pack ice.
The rosettes can hold up to 36 bottles. Each bottle can be programmed to snap closed at a specific depth, so in one deployment, water can be collected at up to 36 different depths. This is extremely valuable for teasing apart circulation through tracking small particles entrained in the water column at different depths. The water collected in these sampling bottles will be used for a range of studies.
The retrieval of this hydrocast involves four people to collect and stabilize the rosette, as well as the personnel up above operating the winch to lower the equipment, and several people at a console monitor verifying both the depth of the rosette and that the cable on the equipment is sending up the necessary data. Operating the equipment on a ship is labor intensive, but each deployment retrieves enough sample material not only for the team on board the Healy, but also for colleagues and partners waiting back at their home institutions for samples.
The Healy has now moved off of the shallow continental shelf that extends around the Arctic land border (shown in white in the map below) into the deeper center of the Arctic Ocean. In our last blog we noted that some of the questions Arctic GEOTRACES is addressing include quantifying the fluxes of trace elements and isotopes into and out of the Arctic Basin from the two oceans through choke points like the Bering Strait, as well as characterizing how much comes from rivers. Arctic GEOTRACES is also studying what regulates the Arctic shelf to deep basin exchange, and the role of sea ice in the transport of trace elements and isotopes. (Follow the expedition here.)
The oval shaped blue area in the map above is the basin of the Arctic Ocean, ranging from ~3,500 meters to ~5,000 meters at its deepest. The Healy is currently over a ridgeline named the Mendeleev Ridge, after a Russian chemist and inventor, Dmitri Mendeleev, long dead when the ridge was first discovered by fellow Soviets in 1948. Mendeleev Ridge is about 1,000 meters shallower than the deep Arctic, bottoming out at ~2,500 meters in depth. The Russians maintain that the ridge, with its long reach into the Arctic basin, gives them claim to large sections of the ocean stretching out to the North Pole. The claim remains unresolved, in part because there are so many questions that still remain about the Arctic. As we move into the basin, we will be sampling to try and better constrain what happens at the shelf/basin interface.
When we venture into the Arctic for research, for most of us there is the lingering hope that a polar bear will appear on our watch; at least as long as we are safely outside of its reach. Several polar bear have been spotted by the watchful eyes of the crew as we have moved into the more tightly packed heavy ice away from the marginal ice zone. However, today a very large bear (yes the alert text says “huge”!) was spotted, and it seemed to have us under thoughtful consideration. The following is a string of images that relay the majesty of this incredible creature in its natural environment, moving with great agility over the sea ice.
Polar bear live only in the Arctic and rely almost entirely on the marine sea ice environment for their survival. They use the ice in every part of their daily life, for travel, for hunting ringed seal, their favorite food, for breeding and in some cases for locating a birthing den. Their wide paws, which you might be able to see in these photos, distribute their weight when they walk on the sea ice, which late in the season can be quite thin in the annual ice region, melting down to only a thin crust over the water. Their large size, clearly visible in these photos, belies the fact that they are excellent swimmers, helped by their hollow fur, which traps air to keep them buoyant, as well as the stiff hair and webbing on their feet. For all their cuddly appearance, they are strong hunters. Currently polar bear range in conservation status from Vulnerable internationally, to Threatened in the U.S., primarily the result of a warming climate that is melting their habitat…sea ice.
The Arctic is approaching the annual low for sea ice extent, which occurs each year in September. An image of sea ice extent for today (shown in white) against an average of the last thirty years (outlined in yellow) shows how our annual sea ice cover has dropped. Today’s cover is 2.24 million square miles (5.79 million square kms), which is 521,200 sq. miles (1.35 million square kms) below the average for the last 30 years. Aside from being of concern to the polar bear, this is part of why Arctic GEOTRACES is so important. We need to understand the role of sea ice in current circulation patterns and delivery of trace elements and isotopes in the Arctic, and then bring this more complete understanding forward to our careful examination of the changing Arctic.
roaming the hallways and the parking lot was full of SUV’s washed in clay, sand and
mud. When most of the second phase of the SUGAR project had come to a halt, there
was still work to be completed by the Seismic Source Team (SST). In order to
understand why, let me take you through the work schedule of the SST.
Dr. Harder and I drove to Atlanta on July 1st after completion of the ENAM
project in North Carolina and began scouting the shot-holes we would need to drill, load
and stem (i.e., fill) before the shot dates, which were scheduled for August 7th and 8th for
Line 2 and August 14th for Line 3. When scouting, you want to ensure that the shot-hole
locations selected have good, accessible roads and enough space for the drillers as well as
work crew to move in and out of easily. Beforehand, however, you want to ensure that
you have the permits to access different properties and have the correct keys for the
property entrance/exit gates, which Donna took care of. Scouting holes took 4 days,
and drilling then ran from July 7th until July 29th.
An example of a good, accessible road for the drillers and SST to use.
Pick a lock, any lock. One of the entrance/exit gates to a shot location. Thankfully, we
had the key. I just had to test it on each lock to open the gate.
A typical workday would consist of waking up at 6:30 am, eating breakfast at 7
am and leaving to work at 7:30/8 am. We would arrive on site about an hour later and the
drillers would set up and begin drilling. This would take about 2-3 hours at some holes
and 3-4 hours at others. The last hole composed of hard rock took about 14 hours to
complete. That does not include the time it took for us to stem the hole. We would
prepare the charges to load into the hole when the drillers had ~20 ft left to drill. They
drilled up to ~80 ft at the 2 shot-holes on the ends of Line 2 and ~70 ft for the remaining
13 shot-holes. For Line 3, they drilled all 11 holes to ~60 ft. After drilling and loading
the charges into the ground, Dr. Harder would lead the drillers to the next shot-hole while
Galen, Yogi and I would stay behind to stem the hole with gravel, sand and plug it with
bentonite. We would also check the detonators to make sure they worked before heading
off to the next shot-hole to repeat the process. On average, we would drive anywhere
from 100 – 200 miles per day depending on what we were doing and where we needed to
Yogi (Victor Avila, left) and Galen preparing 2 charges to be lowered into the shot-hole.
Each charge contains 2 detonators attached to 2 boosters indicated by the sets of wires.
The drillers lowering the charge into the hole with Yogi carefully holding the detonator (orange wire) cords.
On the left is the water truck and to the right is the drill rig.
"The Beast" with a 1.1 Explosives placard after transporting the source materials to the shot location.
Galen taking a GPS waypoint of the loaded shot-hole while Ashley tests the detonators to ensure that they are working.
Dr. Harder (left) and Kent splicing the wires at one of the shot-holes to connect the detonators in order to shoot.
The routine changed once drilling was complete. We made our way to Vidalia
where we met with Donna, Dan and everyone at the instruments center and began
preparing our equipment for the nights we were going to shoot. Shots would start at 11
pm and last until as late/early as sunrise depending on the weather conditions as well as if
the detonators would connect. The days that the deployment team members were
flagging and deploying instruments, we were busy driving to shot-holes and cleaning the
ones that blew out. The idea is that you make the shot-hole location look the way it did
before the shot took place.
Shot-hole 7 on Line 3. It looks like a regular hole, but it is actually about 5 ft deep and has a 5 ft diameter cavity.
Using the backhoe to clean up the above shot-hole.
After clean up!!
I can honestly say there was never a dull moment while working on the SST. I
remember Donna saying at our farewell dinner something along the lines of, “We do all this
work for just a disk of data, but it’s all worth it.” She could not have summed it up any
better than that.
Here’s to another successful project….salud!
Ashley Nauer - UTEP
Dutch Harbor, Alaska, is located on that long spit of land that forms the Aleutian Islands of Western Alaska. Research vessels launch from this location and head northeast into the Bering Sea on their way to the Bering Strait, the gateway to the Arctic.
Our research cruise is part of the international Arctic GEOTRACES program, which this summer has three separate ships in the Arctic Ocean. The Canadian vessel headed north in early July, and the German vessel will follow a week behind the Healy. Each will be following a different transect in the Arctic Ocean to collect samples. The U.S. vessel has 51 scientists on board, each with a specific sampling program. We will focus our time in the western Arctic, entering at the Chukchi Sea. (Follow the expedition here.)
What is GEOTRACES studying? The program goal is to improve our understanding of ocean chemistry through sampling different trace elements in the ocean waters. Trace elements can be an asset or a liability in the marine system, providing either essential nutrients for biologic productivity, or toxic inputs to a rapidly warming system. This part of the larger program is focused on the Arctic Ocean, the smallest and shallowest of the world’s oceans and the most under siege from climate change. Results from this cruise will contribute to our understanding of the processes at work in the Arctic Ocean, providing both a baseline of contaminants for future comparisons as well as insights into what might be in store for our future.
The land surrounding the Arctic Ocean is like a set of cradling arms, holding the ocean and the sea ice in a circular grasp. Within that cradle is a unique mix of waters, including freshwater from melting glacial ice and large rivers, and a salty mix of relatively warm Atlantic water and cooler Pacific water. Our first sample station lasts over 24 hours and focuses on characterizing the chemistry of the water flowing into the Arctic from the Pacific Ocean. This is critical for locking down the fluxes and totals of numerous elements in the Arctic.
In the past the “embrace” of the Arctic land has served as a barrier, holding in the sea ice, which is an important feature in the Arctic ecosystem. In 2007, however, winds drove large blocks of sea ice down the Fram Strait and out of the Arctic. In recent years the Arctic sea ice has suffered additional decline, focusing new attention on the resource potential of this ocean.
Unexpectedly this year, the sea ice is projected to be thick along the proposed cruise track, thick enough that it might cause the ship to adjust her sampling plan.
The walrus in the above image are taking advantage of the Arctic sea ice. Walrus use the ice to haul out of the water, rest and float to new locations for foraging. Walrus food of preference is mollusks, and they need a lot of them to keep themselves satisfied, eating up to 5,000 a day, using the sea ice as a diving platform. As the ship moves further from shore, we will lose their company.
I’m very excited to report that our latest paper, “Microbial communities can be described by metabolic structure: A general framework and application to a seasonally variable, depth-stratified microbial community from the coastal West Antarctic Peninsula”, was just published in the journal PLoS ONE. The paper builds on two very distinct bodies of work: a growing literature on microbial community structure and function along the climatically sensitive West Antarctic Peninsula, and a family of new techniques to predict community metabolic function from 16S rRNA gene libraries, which we are calling metabolic inference.
The motivation for metabolic inference is in the large amount of time that it takes to manually curate a likely set of functions for even a small collection of 16S rRNA genes. In today’s world, where most analyses of microbial community structure consist of many thousands of reads representing hundreds of taxa, it is simply impossible to dig through the literature on each strain to see what metabolic role each is likely to be playing. Ideally a researcher would use metagenomics or metatranscriptomics to get at this information directly, but it is not advisable or desirable in most cases to sequence hundreds of metagenomes or metatranscriptomes (necessary for the kind of temporal or spatial resolution many of us want these days). Metabolic inference provides a convenient alternative.
The basic concept behind all metabolic inference techniques (e.g. PICRUSt, tax4fun, PAPRICA) is hidden state prediction (HSP) (you can find a nice paper on HSP here). In 16S rRNA gene analysis metabolic potential is a hidden state. The metabolic inference techniques propose different ways to predict this hidden state based on the information available.
Our small contribution to this effort was to develop a method (PAPRICA – PAthway PRediction by phylogenetIC plAcement) that uses phylogenetic placement to conduct the metabolic inference instead of an OTU (operational taxonomic unit) based approach. Our approach provides a more intuitive connection between the 16S rRNA analysis and the HSP (or at least it does in my mind) and can increase the accuracy of the inference for taxa that have a lot of sequenced genomes.
Most analyses of large 16S rRNA datasets rely on an OTU-based approach. In a typical OTU analysis an investigator aligns 16S rRNA reads, constructs a distance matrix of the alignments, and clusters the reads at some predetermined distance. By tradition the default distance has become a dissimilarity of 0.03. This approach has some advantages. By clustering reads into discrete units it is easy to quantify the presence or absence of different OTUs, and it allows microbial ecologists to avoid problems with defining prokaryotic species (which defy most of the criteria used to define species in more complex organisms). To conduct a metabolic inference on an OTU-based analysis it is possible to simply reconstruct the likely metabolism for a predefined set of OTUs based on the OTU assignments of published genomes. This works great, but it limits the resolution of the inference to the selected OTU definition (i.e. 0.03). For some taxa, such as Escherichia coli (and plenty of more interesting environmental bugs), there are many sequenced genomes that have very similar 16S rRNA gene sequences. PAPRICA provides a way to improve the resolution of the metabolic inference for these taxa.
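For readers who have not run an OTU analysis, the distance-then-cluster idea can be sketched in a few lines of Python. This is a toy, not what any real pipeline does: it assumes the reads are already aligned to equal length, uses the fraction of mismatched positions as the dissimilarity, and joins reads by single linkage, which is a simplification chosen for brevity. The function names and the made-up "reads" are all hypothetical.

```python
from itertools import combinations


def dissimilarity(a, b):
    """Fraction of mismatched positions between two aligned reads."""
    if len(a) != len(b):
        raise ValueError('reads must be aligned to the same length')
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, cutoff=0.03):
    """Single-linkage clustering of aligned reads at a distance cutoff."""
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Join any pair of reads whose dissimilarity is within the cutoff.
    for i, j in combinations(range(len(reads)), 2):
        if dissimilarity(reads[i], reads[j]) <= cutoff:
            parent[find(i)] = find(j)

    otus = {}
    for i in range(len(reads)):
        otus.setdefault(find(i), []).append(i)
    return sorted(otus.values())


# Toy aligned "reads" of length 100 (hypothetical, not real 16S data).
base = 'ACGT' * 25
reads = [base, 'T' + base[1:], 'TT' + base[2:], 'G' * 100]
print(cluster_otus(reads))
```

The first three toy reads differ from each other at only one or two positions in a hundred, so they fall into one OTU at the 0.03 cutoff, and whatever differences exist among them are invisible to anything downstream. That lost resolution is the limitation described above.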
Our approach was to build a phylogenetic tree of the 16S rRNA genes from each completed genome. For each internal node on the reference tree we determine a “consensus genome”, defined as all genes shared by all members of the clade originating from the node, and predict the metabolic pathways present in the consensus and complete genomes using Pathway-Tools. To conduct the actual analysis we use pplacer to place our query reads on the reference tree and assign the metabolic pathways for each point of placement to the query reads. One advantage to this approach is that the resolution changes depending on the genome sequence coverage of the reference tree. For families, genera, and even species for which lots of genomes have been sequenced, resolution is high. For regions of the tree where there are not many sequenced genomes resolution is poor; however, the method will give you the best of what’s available.
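The consensus idea is easy to picture as a set intersection walked up a tree. The sketch below is not PAPRICA's code and does not call pplacer or Pathway-Tools; it assumes a reference tree written as nested tuples with genome names at the tips, and the genome names, gene names and `consensus_genomes` function are all invented for the illustration.

```python
def consensus_genomes(tree, genome_genes, out=None):
    """Map every node of a nested-tuple tree to its consensus gene set.

    Leaves are genome names; internal nodes are tuples of child nodes.
    """
    if out is None:
        out = {}
    if isinstance(tree, str):
        out[tree] = set(genome_genes[tree])
    else:
        child_sets = []
        for child in tree:
            consensus_genomes(child, genome_genes, out)
            child_sets.append(out[child])
        # Genes shared by every member of the clade below this node.
        out[tree] = set.intersection(*child_sets)
    return out


# Hypothetical genomes and gene content, for illustration only.
genome_genes = {
    'genome_A': {'gene1', 'gene2', 'gene3'},
    'genome_B': {'gene1', 'gene2', 'gene4'},
    'genome_C': {'gene1', 'gene5'},
}
tree = (('genome_A', 'genome_B'), 'genome_C')
consensus = consensus_genomes(tree, genome_genes)
# A read placed on the (A, B) clade inherits that clade's consensus.
print(sorted(consensus[('genome_A', 'genome_B')]))
```

A read placed near the tips inherits a large, specific gene set, while a read placed at a deep node inherits only what the whole clade has in common. In this toy the root keeps a single gene, which is the shrinking-resolution behavior described above.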
PAPRICA provides some additional helpful pieces of information. We built in a confidence scoring metric that takes into account both predicted genomic plasticity and the size of the consensus genome relative to the mean size for the clade (deeper branching clades will have a bigger difference). PAPRICA also predicts the size of the genome and number of 16S rRNA gene copies associated with each 16S rRNA gene, both of which have a strong connection to the ecological role of a bacterium.
For our initial application of PAPRICA we selected a previously published 16S rRNA gene sequence dataset from the West Antarctic Peninsula (our primary region of interest). One thing that we were very interested in looking at was whether we could describe differences between microbial communities organized along ecological gradients (e.g. inshore vs. offshore, or surface vs. deep water) in terms of metabolic structure in place of the more traditional 16S rRNA gene (i.e. taxonomic) structure. Using PAPRICA to convert the 16S rRNA gene sequences into collections of metabolic pathways we found that we could reconstruct the same inter-sample relationships identified by an analysis of taxonomic structure. This means that a microbial ecologist can, if they choose, disregard the messy and sometimes uninformative taxonomic structure data and go directly to metabolic structure without losing information. Applying common multivariate statistical approaches (PCA, MDS, etc.) to metabolic structure data yields information like which pathways are driving the variance between sites, and which are correlated with what environmental parameters. This information is much more relevant to most research questions than the distribution of different microbial taxa. It is worth noting that while inter-sample relationships are well preserved in metabolic structure, the absolute distance between samples is much less than for taxonomic structure. This might have some implications for the functional resilience of microbial communities, which we get into a little bit in the paper.
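A toy example shows why distances between samples can shrink when taxonomic structure is converted to metabolic structure. Everything here is an assumption made for illustration: the taxon and pathway names, the counts, and the use of Bray-Curtis dissimilarity as the distance measure. None of it comes from the dataset discussed above.

```python
from collections import Counter


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles stored as dicts."""
    keys = set(u) | set(v)
    numerator = sum(abs(u.get(k, 0) - v.get(k, 0)) for k in keys)
    denominator = sum(u.get(k, 0) + v.get(k, 0) for k in keys)
    return numerator / denominator


def metabolic_structure(taxon_counts, pathways_by_taxon):
    """Sum each taxon's abundance into every pathway assigned to that taxon."""
    profile = Counter()
    for taxon, count in taxon_counts.items():
        for pathway in pathways_by_taxon[taxon]:
            profile[pathway] += count
    return profile


# Hypothetical pathway assignments and sample counts (toy data).
pathways_by_taxon = {
    "taxon_1": {"glycolysis", "TCA cycle"},
    "taxon_2": {"glycolysis", "TCA cycle", "nitrate reduction"},
    "taxon_3": {"glycolysis", "sulfate reduction"},
}
inshore = {"taxon_1": 10, "taxon_3": 5}
offshore = {"taxon_2": 10, "taxon_3": 5}

taxonomic_distance = bray_curtis(inshore, offshore)
metabolic_distance = bray_curtis(
    metabolic_structure(inshore, pathways_by_taxon),
    metabolic_structure(offshore, pathways_by_taxon),
)
print(round(taxonomic_distance, 3), round(metabolic_distance, 3))
```

The two samples share only one taxon, but the taxa that differ carry mostly the same pathways, so the metabolic distance comes out smaller than the taxonomic one. The same pathway-by-sample table is what you would hand to PCA or MDS to ask which pathways drive the variance between sites.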
PAPRICA was an outgrowth of a couple of other papers that I’m working on. At some point the bioinformatic methods reached a point where separate publication was justified. As a result, and reflecting the fact that I’m much more an ecologist than a computational biologist, PAPRICA is not nearly as streamlined as PICRUSt (which is even available through an online interface). I’ve spent quite a bit of time, however, trying to make the scripts user-friendly and transportable. Anyone should be able to get them to work without too much difficulty. If you decide to give PAPRICA a try and run into any hitches please let me know, either by posting an issue on GitHub or emailing me directly! Suggestions for improvement are also welcome.
HUGE THANKS to all the volunteers who worked so hard to make this project such a great success. It was a pleasure working with you and getting to know you all. Also mega thanks to all the landowners who were kind enough, and trusting enough, to let us put a source on their property. None of this could have happened without your generosity and spirit of curiosity. Thanks so much.