Bioinformatics
Author: Dr. Keenan Amundsen
Added: 01/07/2016
Plays: 179
Description
Bioinformatics presented by Dr. Keenan Amundsen
Searchable Transcript
- [00:00:01.073]Today I'm gonna frame my talk
- [00:00:03.778]using buffalograss as a model,
- [00:00:06.077]and so that's the primary species that
- [00:00:07.759]our lab works with, so basically I'm gonna
- [00:00:10.372]chat a little about that.
- [00:00:11.533]Before I do, though, I think it's important
- [00:00:14.028]to revisit why buffalo,
- [00:00:16.420]or bioinformatics, there's too many "b" words in this talk,
- [00:00:19.218]so I'm gonna get confused.
- [00:00:20.785]But why bioinformatics is kind of emerging
- [00:00:23.606]as an important field.
- [00:00:25.487]And I'm gonna do that by looking at some
- [00:00:27.808]publicly available data.
- [00:00:29.666]I pulled some information that's available
- [00:00:33.427]from the National Center for Biotechnology Information,
- [00:00:36.225]which is a public repository.
- [00:00:37.827]Stores a lot of DNA sequence information,
- [00:00:40.660]along with lots of other information,
- [00:00:42.540]but for this talk right now I'm just gonna focus on
- [00:00:45.512]some of the expressed sequence tag information
- [00:00:48.447]that they have.
- [00:00:49.795]So basically, what I'm showing here
- [00:00:51.433]is the amount of expressed sequences
- [00:00:55.113]that have been sequenced
- [00:00:56.541]and deposited in the database for wheat.
- [00:00:59.640]And so you can see that there's about
- [00:01:01.103]one and a half million, I think that's right.
- [00:01:03.541]Yeah, so about one and a half million
- [00:01:05.155]EST sequences for wheat.
- [00:01:07.012]That's a lot of data, right?
- [00:01:08.823]And so that's part of what we're talking about
- [00:01:10.545]for bioinformatics.
- [00:01:11.657]It's a recurrent theme that
- [00:01:12.875]came up with some of the talks this morning.
- [00:01:14.802]It's really a lot of data.
- [00:01:16.938]And also, to look at some of these other species
- [00:01:19.956]like soybean, it's about the same
- [00:01:22.174]as for wheat, and then corn's up over two million.
- [00:01:25.540]But I don't have the luxury of working with those species,
- [00:01:28.814]unfortunately, so I work primarily on buffalograss.
- [00:01:31.808]As of last week when I pulled this data
- [00:01:34.885]there's zero ESTs published for buffalograss.
- [00:01:39.529]What I'm gonna do now is talk about some of the
- [00:01:41.579]challenges of using some of these tools
- [00:01:44.950]to explore, basically specialty crops
- [00:01:48.218]that don't have a lot of the same types
- [00:01:49.756]of resources as some of these bigger agronomic crops.
- [00:01:54.446]But again, to do that I think it's nice to
- [00:01:55.888]set the stage and give everybody a little background
- [00:01:58.871]on where, or how some of these tools have evolved.
- [00:02:02.563]And I know some of that discussion came out
- [00:02:04.246]this morning.
- [00:02:06.110]But this graph here I completely stole from the internet,
- [00:02:10.284]but basically it shows growth in
- [00:02:16.764]sequencing technology over the past 30 or so years.
- [00:02:20.458]And you'll notice,
- [00:02:22.258]on the horizontal axis this is showing years
- [00:02:24.719]since 1980 to the future, a future date.
- [00:02:28.097]And the vertical axis there is showing
- [00:02:32.137]the amount of sequence data that one of these
- [00:02:35.584]sequencers can generate, and so we're talking
- [00:02:37.918]really about analyzing and working with large
- [00:02:41.423]scale sequencing data sets.
- [00:02:43.583]And the other thing that it's kind of
- [00:02:45.557]important to point out is
- [00:02:47.321]you'll notice that the graph on the,
- [00:02:50.398]or the vertical axis is basically on a logarithmic scale.
- [00:02:53.857]So we've seen this huge growth
- [00:02:55.842]over recent years in particular,
- [00:02:57.340]with the amount of data that these
- [00:02:59.800]sequencers can generate.
- [00:03:01.333]So that's an important concept that we'll talk about.
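As a small illustration of the logarithmic vertical axis described above, here is a minimal Python sketch. The three throughput values are made up for illustration and are not read from the speaker's graph; the point is only that equal vertical steps on a log axis correspond to equal multiplicative jumps in the raw numbers.

```python
import math

# Illustrative throughput values in base pairs (assumed, not from the slide).
throughput_bp = [1e3, 1e6, 1e9]

# On a log10 axis, each value is plotted at its base-10 logarithm.
positions = [math.log10(v) for v in throughput_bp]

# Vertical distance between consecutive points on the log axis.
steps = [b - a for a, b in zip(positions, positions[1:])]

# Multiplicative jump between consecutive raw values.
ratios = [b / a for a, b in zip(throughput_bp, throughput_bp[1:])]

print("log-axis steps:", steps)
print("raw-data ratios:", ratios)
```

Each step of three units on the log axis here is a thousand-fold increase in the underlying value, which is why a line that looks steady on such a graph represents very rapid growth.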
- [00:03:05.242]And then, they're developing these sequencers
- [00:03:08.055]basically as fast as they can put one of these
- [00:03:10.334]graphs together,
- [00:03:11.375]so they're really coming in the market really quick.
- [00:03:13.662]One of my favorites is this little guy here.
- [00:03:15.938]It's called the MinION.
- [00:03:18.143]And it's no bigger than my cellphone,
- [00:03:20.768]but it has the ability to generate tons and tons of data
- [00:03:24.191]and it's basically just a USB key that you plug in,
- [00:03:26.722]you add your sample to it.
- [00:03:28.104]You're able to generate some data.
- [00:03:29.903]And since my kids are kind of like-minded,
- [00:03:32.504]they also do a lot of Google searching for
- [00:03:34.790]MinION, but of course they pronounce it
- [00:03:37.043]a little differently.
- [00:03:41.310]When we're talking about Big Data, I know Harkamal
- [00:03:43.230]touched on it this morning, as did Daniel
- [00:03:45.413]and some other folks, right.
- [00:03:46.978]This is just an example of some of the
- [00:03:49.453]sequence data that our lab generated.
- [00:03:51.357]Just to kinda give you an idea of how much
- [00:03:53.575]data we're really talkin' about.
- [00:03:55.683]We work with these horrible, non-graphical
- [00:03:59.053]terminal emulators when we're analyzing this data.
- [00:04:02.014]And so, on this particular slide here,
- [00:04:05.079]that screen capture is just showing
- [00:04:08.562]four sequences, which isn't much, right?
- [00:04:11.974]But this particular project, on this particular
- [00:04:16.293]data file had more than 12 million sequences,
- [00:04:19.172]just in one data file, right,
- [00:04:20.820]and so that'd be the equivalent of,
- [00:04:22.841]I think my math's about right.
- [00:04:24.013]So about three million screens of data.
- [00:04:26.810]That's a lot of information,
- [00:04:28.958]and it's just one text file,
- [00:04:30.560]and the text file itself is several gigabytes in size.
- [00:04:33.938]So, you can't even open these
- [00:04:35.818]in a lot of, you know, Microsoft Word,
- [00:04:38.094]or something like that.
- [00:04:39.232]So there's tons and tons of data.
- [00:04:42.238]And then, for this particular project
- [00:04:45.837]we had 36 such data files.
- [00:04:47.718]Harkamal talked about one that had, I think he said
- [00:04:49.715]800 or so.
- [00:04:50.609]Some of these projects get ridiculously big, and they're
- [00:04:54.010]generating ridiculous amounts of data, okay.
- [00:04:57.458]And so those are definitely some of the challenges.
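The back-of-the-envelope numbers above can be checked with a short Python sketch. The figures (12 million sequences per file as a lower bound, four sequences per screen, 36 files) come from the talk; the `count_fastq_records` helper is hypothetical and assumes the files use the common four-lines-per-record FASTQ layout, which the talk does not state.

```python
import io

SEQUENCES_PER_FILE = 12_000_000   # "more than 12 million" in the talk, so a lower bound
SEQUENCES_PER_SCREEN = 4          # sequences visible in the screen capture
FILES_IN_PROJECT = 36

# Screens needed to page through one file, and total sequences in the project.
screens_per_file = SEQUENCES_PER_FILE // SEQUENCES_PER_SCREEN
sequences_in_project = SEQUENCES_PER_FILE * FILES_IN_PROJECT


def count_fastq_records(handle):
    """Count records in a text stream, assuming 4 lines per record (FASTQ layout).

    The stream is read one line at a time, so a multi-gigabyte file
    never has to fit in memory.
    """
    line_count = sum(1 for _ in handle)
    return line_count // 4


# Tiny in-memory stand-in for a real file (two made-up records).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
example_count = count_fastq_records(example)

print(screens_per_file, sequences_in_project, example_count)
```

For a real file one would pass an open file handle, for example `open(path)` with a path of your own, instead of the in-memory example.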
- [00:05:00.197]A lot of the early drivers of this increase in technology,
- [00:05:03.913]so we're really talking about increase in technology,
- [00:05:06.153]we're able to generate a lot more sequence data
- [00:05:08.754]than we could five years ago,
- [00:05:11.029]10 years ago.
- [00:05:12.248]These are just a couple of projects that are driving
- [00:05:15.011]some of this innovation.
- [00:05:18.565]And honestly, there's hundreds or thousands more
- [00:05:21.454]of these types of projects.
- [00:05:23.079]You might imagine that we've seen
- [00:05:26.341]a lot of growth in the technology to generate this data,
- [00:05:29.139]but really the holdup is in bioinformatics
- [00:05:32.053]and computational biology.
- [00:05:33.353]There's a huge bottleneck in being able to
- [00:05:36.199]analyze the data,
- [00:05:37.858]and so that's becoming the problem.
- [00:05:39.401]And so now we're at a place where we can
- [00:05:41.178]generate as you saw on that previous slide,
- [00:05:43.674]or a couple of slides ago,
- [00:05:45.681]billions and billions of base pairs of sequence data
- [00:05:48.781]just in one sequencing run.
- [00:05:50.396]But the problem is, having the expertise,
- [00:05:52.834]the bioinformatics knowledge and
- [00:05:54.134]computational biology knowledge to be able to
- [00:05:56.060]analyze that data, and so that's really the challenge.
- [00:06:00.856]I kind of have a background in bioinformatics
- [00:06:02.503]and computational biology, so that's why
- [00:06:04.001]Roch asked me to talk about this today.
- [00:06:09.034]I do want to make one distinction, though.
- [00:06:10.723]We're kind of, I don't know that we have
- [00:06:12.371]any true bioinformatics people in the department.
- [00:06:15.911]So most of us that are on this side of things
- [00:06:18.582]would argue that we're computational biologists.
- [00:06:20.532]So, kind of a true definition
- [00:06:23.023]is that bioinformatics folks are the ones that are
- [00:06:26.499]developing new algorithms and software tools
- [00:06:29.448]to analyze some of the data, whereas
- [00:06:31.131]computational biologists are actually using the tools
- [00:06:33.534]to analyze the data sets, and so
- [00:06:35.450]I definitely fall more into that latter category.
- [00:06:39.655]As I mentioned at the beginning, one of the things
- [00:06:40.980]I wanted to do is just talk about
- [00:06:42.909]some of our research in buffalograss
- [00:06:44.289]and how we can take advantage of some of these
- [00:06:46.552]modern sequencing tools to try and study
- [00:06:49.792]a plant system for which there's really no
- [00:06:53.612]up front genomic information, okay,
- [00:06:55.851]and so that's what I'm going to spend
- [00:06:57.071]a little time on.
- [00:07:00.138]And what I really want to do is just focus in on
- [00:07:01.912]just one project, we have several of these going,
- [00:07:04.268]but this is probably the one that's furthest along
- [00:07:06.974]in the process and I think we have some
- [00:07:09.156]interesting data from it.
- [00:07:10.747]Leaf spot disease, it's caused by a disease complex.
- [00:07:14.774]I work with buffalograss, it's a low input, sustainable
- [00:07:17.432]turfgrass species.
- [00:07:18.989]It's native to the Great Plains region.
- [00:07:20.904]Leaf spot disease is, again,
- [00:07:23.005]caused by several pathogens,
- [00:07:24.527]but it's one of the diseases that's important
- [00:07:26.558]for buffalograss.
- [00:07:30.215]Under the right conditions or, as a manager,
- [00:07:32.663]the bad conditions,
- [00:07:34.301]so under the right conditions you can get disease that's
- [00:07:37.203]severe enough to cause stand loss, and so that's a problem.
- [00:07:39.979]One of the things we did,
- [00:07:41.859]I also am a turfgrass breeder, and so
- [00:07:44.575]we want to identify new sources of host resistance
- [00:07:47.883]to leaf spot disease.
- [00:07:49.380]And so I had a postdoc, Sah-Jee Bah-Rom-Rah-Dah-Sah,
- [00:07:52.237]that screened 84 different buffalograss genotypes
- [00:07:56.393]in the greenhouse, challenged 'em with
- [00:07:58.760]one of the pathogens that cause leaf spot disease
- [00:08:01.419]and as you can see on the right, we found
- [00:08:02.800]some that showed good resistance
- [00:08:05.158]to leaf spot disease in the greenhouse
- [00:08:06.794]and others that were sensitive to it.
- [00:08:09.151]And so we picked some of those accessions
- [00:08:11.841]that were either resistant to the disease
- [00:08:14.383]or highly susceptible to the disease,
- [00:08:16.867]and in fact we picked two of each
- [00:08:18.620]and then we did one of these high-throughput
- [00:08:20.711]sequencing experiments.
- [00:08:22.150]We grew those four plants, the two
- [00:08:24.134]resistant and two susceptible genotypes,
- [00:08:26.293]either under control conditions
- [00:08:27.629]or challenged with the pathogen, and then we
- [00:08:29.614]extracted RNA, we converted that to cDNA
- [00:08:33.492]and we sequenced 'em.
- [00:08:34.386]And so that way we can basically look at,
- [00:08:36.533]take a snapshot and look at
- [00:08:37.972]all of the genes that are turned on
- [00:08:41.130]in response to the pathogen
- [00:08:42.674]at that one point in time.
- [00:08:45.089]So that's kind of the idea.
- [00:08:46.424]So what we're doing here is we're looking at,
- [00:08:48.363]basically 190,000 transcripts at one shot
- [00:08:52.508]and we're looking at how they differ between
- [00:08:55.167]the susceptible lines and the resistant lines.
- [00:08:58.346]And so one of the ways that we think is kinda creative,
- [00:09:02.746]one of the ways that we looked at this, again,
- [00:09:05.091]buffalograss doesn't have any up front information
- [00:09:07.275]and so, it limits what we can do a little bit with it.
- [00:09:10.756]So we turned to a good friend, foxtail millet,
- [00:09:13.855]and we mapped all the differentially expressed genes
- [00:09:16.271]to the foxtail millet genome.
- [00:09:18.674]And I think in this graph here the little red bars
- [00:09:22.331]and the blue bars show
- [00:09:25.409]where those differentially expressed genes
- [00:09:26.913]map to foxtail millet, and this shows the
- [00:09:29.399]nine chromosomes of foxtail millet there.
- [00:09:32.949]The length of the bars represent
- [00:09:34.817]the magnitude of change in gene expression
- [00:09:37.405]when we compare the control to the inoculated conditions.
- [00:09:40.691]And, admittedly this is a busy graph
- [00:09:43.012]and I don't expect you to take anything from this,
- [00:09:44.847]but basically we had a good distribution
- [00:09:48.029]of differentially expressed genes throughout
- [00:09:50.066]the foxtail millet genome.
- [00:09:52.869]And so if we just take a closer look
- [00:09:54.023]and look at chromosome eight, you might notice in that
- [00:09:57.021]six o'clock position there,
- [00:10:00.359]there are a handful of genes
- [00:10:01.991]that are turned on
- [00:10:04.022]in the two resistant lines shown in the red,
- [00:10:06.761]and then there are no genes in the same
- [00:10:08.700]position that are differentially expressed
- [00:10:10.511]in the susceptible lines, right, that are shown in the blue.
- [00:10:15.401]And so, to us that suggests that that's potentially
- [00:10:17.350]a genomic region of importance.
- [00:10:19.032]So even though we're using a Setaria italica,
- [00:10:22.573]this foxtail millet as a reference,
- [00:10:24.801]not buffalograss, it gives us some insights
- [00:10:26.927]on genomic regions of potential importance.
- [00:10:29.644]So that's one of the ways that we're using
- [00:10:31.511]some of this data.
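The idea of spotting a region where only the resistant lines show differentially expressed genes can be sketched in Python. This is a minimal illustration under stated assumptions, not the lab's actual pipeline: the hit coordinates, the chromosome labels, the 1 Mb window size, and the `resistant_only_bins` helper are all hypothetical.

```python
from collections import defaultdict

BIN_SIZE = 1_000_000  # hypothetical window size in base pairs

# Hypothetical mapped hits on the reference genome:
# (chromosome, position, which lines the gene was differentially expressed in).
# These are made-up values, not data from the study.
hits = [
    ("chr8", 20_150_000, "resistant"),
    ("chr8", 20_400_000, "resistant"),
    ("chr8", 20_900_000, "resistant"),
    ("chr8", 5_300_000, "resistant"),
    ("chr8", 5_700_000, "susceptible"),
    ("chr3", 12_250_000, "susceptible"),
]


def resistant_only_bins(hits, bin_size=BIN_SIZE):
    """Return (chromosome, bin_start, n_hits) for windows that contain
    hits from resistant lines and none from susceptible lines."""
    counts = defaultdict(lambda: {"resistant": 0, "susceptible": 0})
    for chrom, pos, line_type in hits:
        counts[(chrom, pos // bin_size)][line_type] += 1
    return sorted(
        (chrom, index * bin_size, c["resistant"])
        for (chrom, index), c in counts.items()
        if c["resistant"] > 0 and c["susceptible"] == 0
    )


candidate_regions = resistant_only_bins(hits)
print(candidate_regions)
```

With these made-up hits, the only window that qualifies is the one on the hypothetical `chr8` starting at 20 Mb; the window at 5 Mb is excluded because a susceptible-line gene also maps there.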
- [00:10:32.963]Another way is to do a traditional differential
- [00:10:35.529]gene expression type study, and so again I mentioned
- [00:10:38.001]that there are a hundred and I don't remember
- [00:10:39.649]how many thousand genes that we're looking at
- [00:10:43.005]under these conditions.
- [00:10:44.770]And in this graph we're looking at
- [00:10:46.744]the changes in expression, or comparing
- [00:10:49.425]the resistant lines to the susceptible lines.
- [00:10:52.407]So, Prestige on the bottom is
- [00:10:56.522]sensitive to leaf spot disease, so that's the
- [00:10:58.620]horizontal axis, anything on the right
- [00:11:00.337]is considered up-regulated in response to the pathogen.
- [00:11:03.863]And then, anything on the left is down-regulated
- [00:11:06.405]in response to the pathogen.
- [00:11:08.243]And then, 95-55 on the vertical axis is our
- [00:11:11.865]resistant line, so anything above that midpoint line
- [00:11:15.882]is being induced in response to the pathogen.
- [00:11:19.527]And, so you can imagine
- [00:11:25.589]a diagonal line there,
- [00:11:26.968]that's essentially showing that most of the genes
- [00:11:30.753]are behaving the same in both
- [00:11:32.399]the resistant and susceptible lines.
- [00:11:33.852]But the ones that really caught our eye
- [00:11:35.489]and that were most interesting to us are these ones
- [00:11:37.300]that are shown in that red circle there that were
- [00:11:39.703]up-regulated in response to the pathogen
- [00:11:42.164]in the resistant line, 95-55, and down-regulated
- [00:11:46.007]in Prestige, and so, we developed some
- [00:11:48.351]genetic markers.
- [00:11:49.431]We took a closer look and looked at the expression
- [00:11:51.556]of those, did kind of a typical heat map,
- [00:11:54.110]and so this is representing those 536 genes
- [00:11:56.954]in that little red circle and looking at
- [00:11:59.230]how they're expressed.
- [00:12:01.029]The red indicates that they're down-regulated,
- [00:12:02.874]green indicates that they're up-regulated.
- [00:12:05.172]And you can see that a lot of these genes
- [00:12:06.705]are up-regulated in the resistant lines,
- [00:12:08.597]down-regulated in the sensitive lines.
- [00:12:10.536]So that's kind of what we saw.
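The selection of genes in the "red circle" quadrant, induced in the resistant line 95-55 and repressed in the susceptible line Prestige, can be sketched as a simple filter. The transcript names and log2 fold-change values below are hypothetical, the use of log2 fold change with a zero cutoff is an assumption (the talk does not give the scale or thresholds), and the helper name is made up for illustration.

```python
# Hypothetical log2 fold changes (inoculated vs. control); not values from the study.
# Each entry: transcript id -> (change in Prestige, change in 95-55)
log2_fold_change = {
    "transcript_A": (1.8, 2.1),    # up in both lines
    "transcript_B": (-2.4, 3.0),   # down in Prestige, up in 95-55
    "transcript_C": (-1.2, -0.9),  # down in both lines
    "transcript_D": (-3.1, 1.5),   # down in Prestige, up in 95-55
    "transcript_E": (0.7, -1.6),   # up in Prestige, down in 95-55
}


def up_in_resistant_down_in_susceptible(changes):
    """Keep transcripts in the upper-left quadrant of the scatter plot:
    positive change in the resistant line, negative in the susceptible line."""
    return sorted(
        name
        for name, (susceptible, resistant) in changes.items()
        if resistant > 0 and susceptible < 0
    )


candidates = up_in_resistant_down_in_susceptible(log2_fold_change)
print(candidates)
```

In a real analysis one would usually also require a minimum fold change and a statistical significance cutoff before calling a gene differentially expressed; the zero cutoff here only mirrors the quadrant logic described for the plot.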
- [00:12:12.297]Doing these kinds of studies
- [00:12:14.157]and taking this kind of approach,
- [00:12:15.435]it gives us a lot of information
- [00:12:17.048]in buffalograss, and again we had
- [00:12:19.881]no up front information on buffalograss,
- [00:12:22.226]so in our eyes, using some of these modern
- [00:12:25.627]sequencing tools, it's pretty powerful.
- [00:12:27.845]So, again, we can gain a lot of information.
- [00:12:32.829]Just kind of quickly here, we have several other
- [00:12:34.903]projects going on, looking at different traits
- [00:12:37.666]in buffalograss, we're looking at gender expression,
- [00:12:39.721]buffalograss is a dioecious species,
- [00:12:41.578]so we're looking at
- [00:12:43.145]comparing male buffalograss to female buffalograss
- [00:12:45.700]and finding genes important for
- [00:12:47.942]gender expression.
- [00:12:49.242]We're looking at leaf spot disease resistance,
- [00:12:51.586]like I just showed, chinch bug resistance.
- [00:12:53.620]Just initiating a new project now where we're
- [00:12:56.184]looking at mechanisms of seed dormancy
- [00:12:58.586]and using some of these technologies to do that.
- [00:13:02.035]And then, again, I'm managing a buffalograss
- [00:13:04.972]breeding program, and so
- [00:13:07.177]my primary interest is in using all these tools
- [00:13:10.556]to develop genetic markers or learn some type
- [00:13:14.259]of information that we can apply to
- [00:13:16.569]improve the efficiency of our breeding program
- [00:13:18.613]to select, basically, either germplasm
- [00:13:22.931]that has new sources of resistance
- [00:13:25.217]or develop markers that we can use to track
- [00:13:27.969]resistance through our breeding population.
- [00:13:29.792]So that's kind of the idea.
- [00:13:31.196]Here's an example where we found some markers
- [00:13:33.368]that could do that.
- [00:13:34.633]These are gene expression based markers
- [00:13:36.189]that we used.
- [00:13:37.895]And again, the idea is then to be able to track those
- [00:13:41.134]and ultimately develop new buffalograsses.
- [00:13:43.316]That's kind of the idea.
- [00:13:44.872]I promised Roch I would talk quick.
- [00:13:48.147]Try and get us back on track,
- [00:13:49.017]so that's all I have, so thanks.
- [00:13:50.294](applause)