I guess I should keep this going and write a little story for everybody about each of my manuscripts that some nice editors decide to accept.
This one is about gene expression in whitefish embryos. Following the blog post I wrote about the Aquatic Science paper, it could be read as a continuation: a different perspective on the same study system. Here we collected sperm and unfertilized eggs from whitefish found in Lake Geneva (Coregonus palaea). We fertilized the eggs in vitro following a full-factorial breeding design. That means that we took the eggs of 4 females and the sperm of 4 males and crossed them in all possible combinations. This gave us 16 little Tupperware buckets with freshly fertilized eggs. We brought these to the university, where we have climate chambers at a constant temperature of 6.5 °C. Each egg was placed in an individual well with 2 ml of standardized water. In their little microcosms they grew and developed into embryos. At the late-eyed stage, when the blood circulation system is fully functional, we infected 13 replicates of each family with Pseudomonas fluorescens, a nasty bacterial pathogen found in many Swiss lakes and rivers. We also added some nutrients that these bacteria like. As a control, we added only the nutrients, without the pathogen, to another 13 replicates of the same families. We already knew from previous studies that at this stage of development the identity of the father matters for the mortality and performance of the embryos under stress. Since the father provided only sperm and no paternal care, father effects can be interpreted as genetic effects: he contributes nothing but genes to the embryos. Once the embryo is old enough, it starts expressing these genes, and they affect how it performs under stress. What we did not know yet was which genes might be involved. So my goal for this study was to give these significant genetic effects a name.
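For readers who like to see the numbers, the full-factorial cross and the two treatments can be laid out in a few lines of Python. This is only a sketch of the bookkeeping: the parent labels (F1 to F4, M1 to M4) and the treatment names are hypothetical, and it assumes each family got exactly the 13 infected and 13 control wells described above.

```python
from itertools import product

# Hypothetical parent labels: the post only states that 4 females and 4 males were used.
females = ["F1", "F2", "F3", "F4"]
males = ["M1", "M2", "M3", "M4"]
treatments = ["pathogen_plus_nutrients", "nutrients_only_control"]
REPLICATES = 13  # wells per family and treatment, as described in the post

# Full-factorial design: every female crossed with every male.
families = list(product(females, males))

# One well per embryo: (dam, sire, treatment, replicate number).
wells = [
    (dam, sire, treatment, rep)
    for dam, sire in families
    for treatment in treatments
    for rep in range(1, REPLICATES + 1)
]

print(len(families))  # 16
print(len(wells))     # 416 = 16 families x 2 treatments x 13 replicates
```

Because every mother is crossed with every father, effects of the mother and of the father can be separated statistically, which is what makes father effects interpretable as genetic effects.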
Accordingly, two days after treatment we collected 3 embryos from each family and treatment and extracted RNA, the products of gene expression, from their whole bodies. In all animals and plants, these gene products (more precisely, the messenger RNAs) are translated into proteins, which make up what we are. Using next-generation sequencing, we digitized all this information. In practice, the messenger RNAs are filtered out, tagged according to the individual they belong to, pooled, and then read out as sequences of letters. For this process we collaborated with Fasteris, a Swiss company in Geneva. Months later, I received a huuuuuge text file that I could then use to satisfy my hunger for learning bioinformatics tools. With the help of a very friendly guy at Fasteris, I found the overlapping sequences of text, assembled them into longer text segments (contigs), and aligned all reads to these contigs. These were then compared to online reference databases. The amount of data we produced was enormous. Since I was only interested in genes that are differentially expressed between embryos in our treatment and control groups, I decided to quantify the reads first and compare to a reference only those that differ between the two groups. Many of them did not find a match; however, 1096 did, and those could be characterized further. They told us which functional pathways are already active at this early stage of fish development and gave us some insight into the defense mechanisms these embryos already have.
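The "quantify first, annotate later" strategy can be illustrated with a small Python sketch. This is not the pipeline used for the paper: the function names, the made-up counts, the counts-per-million normalization and the fixed log2 fold-change cutoff are all assumptions chosen to keep the idea visible. Dedicated tools such as edgeR or DESeq2 instead fit negative binomial models to the counts and correct for multiple testing.

```python
import math

def normalize_cpm(counts):
    """Scale raw read counts to counts per million (CPM) per sample,
    so samples sequenced to different depths become comparable."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        contig: [c / size * 1e6 for c, size in zip(row, lib_sizes)]
        for contig, row in counts.items()
    }

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount avoids division by zero."""
    mean_t = sum(treated) / len(treated)
    mean_c = sum(control) / len(control)
    return math.log2((mean_t + pseudocount) / (mean_c + pseudocount))

def contigs_to_annotate(counts, treated_cols, control_cols, min_abs_lfc=1.0):
    """Return {contig: log2FC} for contigs whose absolute log2 fold change
    reaches the cutoff; only these would go on to the reference search."""
    normalized = normalize_cpm(counts)
    selected = {}
    for contig, row in normalized.items():
        lfc = log2_fold_change([row[j] for j in treated_cols],
                               [row[j] for j in control_cols])
        if abs(lfc) >= min_abs_lfc:
            selected[contig] = lfc
    return selected

# Made-up counts: columns 0-1 are infected embryos, columns 2-3 are controls.
counts = {
    "contig_a": [200, 220, 20, 25],
    "contig_b": [100, 110, 100, 105],
    "contig_c": [5000, 5200, 5100, 5000],
}
selected = contigs_to_annotate(counts, treated_cols=[0, 1], control_cols=[2, 3])
print(sorted(selected))  # ['contig_a']
```

Filtering before annotation keeps the slow database search small; the trade-off is that anything dropped by the cutoff is never annotated at all.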
There is some background to this paper. First, I must admit that I convinced my PhD supervisor to conduct this experiment because I wanted the chance to learn more about bioinformatics. I was responsible for the experimental treatment and the RNA extraction in the lab, and I did all the bioinformatics and data analyses. It was also the first manuscript that I wrote mostly on my own. It is not a very ground-breaking story; I would call it a lesson.
This project was always a side project during my PhD. We started the fieldwork in 2010, when I had just begun my doctoral studies in Lausanne. The lab work was done at the end of 2011. I had serious trouble getting good-quality RNA from these embryos. We did not extract the RNA directly but first froze the embryos for a couple of months at -80 °C. I would not recommend that to anybody. I was very busy with other projects, some of which I considered my main projects at the time, so this side project had to wait several times.
Of the 16 families, I managed to get a fairly large amount of good-quality RNA from 4. I made sure that these 4 families shared the same mother, crossed with 4 different fathers. I would say that this is the heart of this study. This way we could reduce variation in gene expression due to maternal effects: the genotypes of the different fathers were investigated against the constant background of the same mother. The external breeding system of whitefish allowed us to control for host genotypes and to contrast environment-induced changes in gene expression. I discuss this strength of the study in the paper and encourage other scientists to use fish with an external breeding system as hosts to investigate gene expression under different treatments. This approach not only offers a way to reduce variation in expression due to maternal and environmental effects, but also makes it possible to study gene expression in natural populations and in their ecologically relevant context.
I also would like to add that it was very pleasant to work with Laurent Farinelli at Fasteris. He is one of the founders of the Illumina sequencing methodology and he had the balls to start his own company. Now he is collaborating on many very exciting projects and he delivers high quality data and an exceptional service. I enjoyed my few meetings in Geneva at his company.
To end the story of this paper, I have to mention that I finally sent the extracted RNA for sequencing in summer 2012. I received the data during my maternity leave and already started playing with it. In 2013 I managed to assemble all the reads. I had to digest quite a bit of theory about assembling gene expression reads into transcriptomes, and I learned how to use a high-performance computing system (the clusters at the SIB). In 2014 I did the differential gene expression analysis and compared the reads of interest to different online reference databases. Here I could rely on the theory of differential gene expression in lung cancer datasets that I had been exposed to during my internship at Novartis. This project would not have been possible without the help of several co-workers at UNIL, such as Oksana Riba, Kate Ridout, Paris Veltsos and my co-author Emily Clark. I would like to thank them again here for their help and advice.
At the moment I am supervising a Master's student who is applying the same experimental set-up to grayling (Thymallus thymallus). He is looking at sex-specific gene expression in grayling embryos under estrogen stress. In this project, too, we did all the steps from fieldwork to bioinformatics ourselves. However, he can rely on several collaborators who specialize in the different aspects of the project, and as a Master's student he concentrates on only one project at a time. I am very excited to see him advancing so fast.