How Can Transferable Biology and Breeding Contribute to Improving Food Systems and Climate Change?
Author: Edward Buckler - USDA-ARS at Cornell University
Added: 04/01/2021
Description
The demands of food production, fuels, nutrition, and climate change are going to require that thousands of species undergo genomic selection over the next two decades. We are using machine learning and statistical models of chromatin structure, regulatory grammar, cis-expression, protein stability, and deleterious mutations to improve transferable genome-wide predictions.
Searchable Transcript
- [00:00:00.810]The following presentation
- [00:00:02.260]is part of the Agronomy and Horticulture Seminar Series
- [00:00:05.850]at the University of Nebraska-Lincoln.
- [00:00:09.602]So welcome, everyone, to the UNL Department
- [00:00:12.460]of Agronomy and Horticulture seminar series.
- [00:00:15.170]I'm Marc Libault, a professor in the department.
- [00:00:19.260]So today, I'm very happy to introduce
- [00:00:21.140]Doctor Ed Buckler, who is a research geneticist at USDA-ARS
- [00:00:27.120]and also an adjunct professor
- [00:00:28.640]in the Department of Plant Breeding and Genetics
- [00:00:31.450]at Cornell University.
- [00:00:33.960]Ed has expertise in molecular evolution and archeology,
- [00:00:36.770]and his research group uses genomics, computational,
- [00:00:39.880]and field approaches to complex traits,
- [00:00:43.640]with a goal to accelerate breeding
- [00:00:46.030]of various crop species,
- [00:00:47.190]including maize, sorghum, and cassava.
- [00:00:50.940]With those kinds of technologies,
- [00:00:52.570]which have now been applied to over 2,000 species,
- [00:00:57.270]the Buckler group focuses on exploring new ways to
- [00:01:00.860]re-engineer global crop production systems
- [00:01:04.350]to maximize food production,
- [00:01:06.400]improve nutrition,
- [00:01:07.880]and also to enhance the response of plants to climate change.
- [00:01:12.140]With the USDA-ARS,
- [00:01:13.860]he is also leading an informatics and genomics platform
- [00:01:17.050]to help accelerate breeding
- [00:01:18.590]for specialty crops and animals.
- [00:01:22.160]His contribution to quantitative genetics and genomics
- [00:01:24.530]has been recognized
- [00:01:25.420]with his election to the US National Academy of Sciences,
- [00:01:28.860]and also as a recipient of the inaugural NAS
- [00:01:34.051]Food and Agriculture Award.
- [00:01:35.470]So today Doctor Buckler
- [00:01:36.310]is going to talk to us about his work
- [00:01:38.590]to maximize food security
- [00:01:40.520]and maximize plant response to climate change.
- [00:01:43.670]His talk is entitled
- [00:01:45.240]"How can transferable biology and breeding contribute
- [00:01:49.063]to improving food systems and climate change?"
- [00:01:51.770]Well, thank you, Marc.
- [00:01:54.400]So I want to talk a little bit about
- [00:01:57.160]this whole area of
- [00:01:59.430]where some of these areas
- [00:02:00.930]in transferable biology and breeding
- [00:02:02.250]can really contribute to improving the food system
- [00:02:04.500]and climate change.
- [00:02:06.280]And I'll first mention that the basic science
- [00:02:10.010]we'll be talking about
- [00:02:10.850]has been supported by USDA
- [00:02:12.087]and the National Science Foundation.
- [00:02:14.390]And then we've had larger collaborative projects,
- [00:02:17.120]funded by the Gates Foundation and USAID,
- [00:02:19.060]that have really crystallized some of the needs
- [00:02:22.130]on the breeding side.
- [00:02:24.620]So the way I think about it, breeding is
- [00:02:26.270]really about the design
- [00:02:27.670]and selection of organisms for human purposes.
- [00:02:30.570]And I think in the next decade,
- [00:02:33.030]or actually the next couple of decades,
- [00:02:34.630]this question about what we should be designing for
- [00:02:38.760]is really key.
- [00:02:40.250]And I want to start out
- [00:02:41.460]my talk with why I think we need to
- [00:02:44.210]think hard about what we're designing for.
- [00:02:47.230]And I think we have three really big drivers of change.
- [00:02:50.630]The first is I think there's a real revolution
- [00:02:53.170]in food technologies underway.
- [00:02:55.740]The second I think everybody realizes
- [00:02:58.050]there's climate change.
- [00:02:59.710]And then third,
- [00:03:00.830]we have a real nutrition problem.
- [00:03:02.930]And it's these three drivers that I think are likely
- [00:03:06.920]to fundamentally change
- [00:03:08.670]breeding in the next 20 years.
- [00:03:12.020]And when we think about
- [00:03:14.810]how fast these things can happen,
- [00:03:18.960]I want to remind people what happened with the uptake
- [00:03:21.600]of hybrid maize,
- [00:03:22.670]and a couple of these other really big changes.
- [00:03:25.400]How fast has change happened?
- [00:03:27.520]Well, the change from
- [00:03:30.315]open pollinated varieties to hybrids
- [00:03:32.490]in Iowa occurred in about six years.
- [00:03:36.457]And the technology
- [00:03:38.840]provided a 20% yield increase.
- [00:03:41.700]And so when change does happen,
- [00:03:43.750]it frequently happens really fast.
- [00:03:46.650]Now, sometimes in basic science,
- [00:03:49.210]what we're doing
- [00:03:51.530]may be out here in 1910,
- [00:03:53.907]and it took 20 years before it could really come to market.
- [00:03:57.640]But once it comes to market,
- [00:03:59.830]if it's a really big advance,
- [00:04:01.670]the shifts can be fast.
- [00:04:03.990]The same thing happened with
- [00:04:08.120]the car replacing the horse.
- [00:04:09.510]It happened in about a 10-year shift.
- [00:04:11.990]Took a couple of decades
- [00:04:13.130]to invent a lot of the technology behind the car,
- [00:04:15.810]but that shift happened incredibly quickly.
- [00:04:18.747]And the same thing with the smartphone.
- [00:04:21.650]Essentially, it totally changed markets, from, you know,
- [00:04:26.000]camera makers, taxis,
- [00:04:27.530]GPS, cell phone makers and everything,
- [00:04:30.980]with essentially one technology
- [00:04:33.240]that went to market dominance
- [00:04:35.890]in about nine years.
- [00:04:38.450]And so why do I raise these issues?
- [00:04:40.690]Because I think we could be seeing some really big changes
- [00:04:45.510]in the next 20 years.
- [00:04:47.620]Here's one example,
- [00:04:48.920]from a think tank called RethinkX.
- [00:04:52.240]They're arguing that alternative proteins,
- [00:04:54.970]and what they mean here are high-quality proteins
- [00:04:57.730]that can be produced by a fermenter,
- [00:05:00.168]they argue that
- [00:05:02.120]essentially a couple of years from now
- [00:05:06.220]they'll be down to $10 per kilogram.
- [00:05:08.910]And by 2035, they'll be $1 per kilogram.
- [00:05:11.720]And they see this as reducing the number of cows
- [00:05:15.280]in the US by 70%.
- [00:05:17.871]Is this realistic?
- [00:05:21.550]Are these things potentially going to happen?
- [00:05:24.380]If you were to do a linear extrapolation right now,
- [00:05:28.370]we would actually be predicting
- [00:05:29.730]an increased number of cows.
- [00:05:32.010]So, you know, for example,
- [00:05:33.310]in this area,
- [00:05:34.970]I asked the question
- [00:05:36.420]whether or not those basic numbers
- [00:05:38.450]they were talking about
- [00:05:39.640]in terms of the cost were realistic.
- [00:05:42.930]And I plotted out their essentially three numbers here
- [00:05:48.050]of where they think that
- [00:05:49.190]these high-quality alternative proteins
- [00:05:52.130]are going to go, from about $40
- [00:05:54.260]down to $1 over the next 15 years.
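As a rough check on how steep that projected price path is, here is a minimal sketch. It assumes the cost falls by a constant percentage every year, which is an assumption made for illustration and not something stated in the talk; the function name is hypothetical.

```python
def implied_annual_decline(start_cost, end_cost, years):
    """Constant yearly fractional decline that takes start_cost to end_cost."""
    yearly_factor = (end_cost / start_cost) ** (1.0 / years)
    return 1.0 - yearly_factor


# Figures quoted in the talk: about $40/kg falling to $1/kg over 15 years.
decline = implied_annual_decline(40.0, 1.0, 15)
print(f"Implied decline: {decline:.1%} per year")
```

Under that assumption the quoted numbers correspond to a price drop of a bit over a fifth every year, sustained for 15 years.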
- [00:05:57.928]And I plotted where other proteins lie
- [00:06:01.069]and you can see the reason why alternative proteins
- [00:06:05.580]aren't displacing animal production,
- [00:06:08.790]is because they're more expensive.
- [00:06:11.290]But that's likely to change very quickly.
- [00:06:14.120]And as you look across these numbers,
- [00:06:19.330]I've been able to kind of check out that
- [00:06:21.960]$1 per kilogram number
- [00:06:22.913]and ask the question:
- [00:06:24.230]is that a realistic possibility?
- [00:06:26.820]And things like yeast extract
- [00:06:29.340]and, of course, soybean meal
- [00:06:30.900]are already less expensive than this,
- [00:06:33.440]dried distillers grains are already less expensive,
- [00:06:35.980]and ammonia and starch,
- [00:06:37.180]the basic materials to make that precision protein,
- [00:06:41.500]are also already substantially less expensive.
- [00:06:45.110]And so I think it's totally plausible
- [00:06:47.510]you could get to a precision protein
- [00:06:49.690]costing $1 per kilogram.
- [00:06:52.600]If that happens, that's sevenfold cheaper
- [00:06:55.660]than the protein that's in milk,
- [00:06:58.600]and it's 30 fold cheaper
- [00:06:59.860]than the protein that's in ground beef.
- [00:07:03.620]So I don't expect
- [00:07:05.990]all of society to change.
- [00:07:07.850]And I'm a meat eater,
- [00:07:09.900]I like to have a steak,
- [00:07:11.760]but you can see in all sorts of other markets
- [00:07:15.400]that there's a shift when technology
- [00:07:19.271]is allowing
- [00:07:21.400]something that is this much cheaper
- [00:07:23.400]than the current technologies.
- [00:07:26.300]This could bring massive changes to ecosystems
- [00:07:29.050]and to what we're using our maize and soybean land for,
- [00:07:33.494]where essentially there's potentially a freeing up
- [00:07:37.780]of nearly a hundred million acres for other services.
- [00:07:41.950]And I know there are some really
- [00:07:44.340]great wheat breeders in the community here;
- [00:07:46.700]if you were to start putting your wheat
- [00:07:48.350]on the best maize-soybean land in the country,
- [00:07:53.760]those wheat varieties
- [00:07:54.640]could essentially produce the wheat
- [00:07:56.430]for 45% of the globe,
- [00:08:00.360]and essentially cause disruptions
- [00:08:02.440]around the rest of the globe.
- [00:08:05.530]The second big driver,
- [00:08:07.490]of course is climate change.
- [00:08:09.450]And a lot of dealing with climate change
- [00:08:11.990]has to do with of course changing
- [00:08:13.810]our energy production systems.
- [00:08:17.180]And that's conventional abatement,
- [00:08:18.940]but almost everybody realizes that we're also
- [00:08:22.600]going to need to have ways to do
- [00:08:25.440]negative greenhouse gas emissions,
- [00:08:27.930]in the amount of about 20 gigatons
- [00:08:32.540]of CO2 per year.
- [00:08:34.320]And plants are a pretty good candidate
- [00:08:37.420]for pulling out a lot of that CO2.
- [00:08:40.417]They're already 80% of the biomass of the planet.
- [00:08:46.670]And so when we put those two elements together,
- [00:08:50.640]you start asking, well, what's the future?
- [00:08:54.280]Our current maize-soy system looks
- [00:08:58.100]something like this.
- [00:08:58.980]We're growing on 180 million acres.
- [00:09:02.780]We're generating about $160 billion
- [00:09:05.650]of grain off of these production systems.
- [00:09:10.260]But at the end of the day,
- [00:09:12.840]all that grain and all that CO2 is being respired away.
- [00:09:16.380]And it's about 1.6 gigatons of CO2 per year.
- [00:09:23.660]Right now they estimate the societal cost
- [00:09:26.430]of CO2 is $51 per ton
- [00:09:29.220]and it's slightly rising.
- [00:09:33.260]So this is our current system.
- [00:09:34.730]And if you think about what 1.6 gigatons of CO2
- [00:09:38.090]just going back into the atmosphere is,
- [00:09:39.970]that's essentially $79 billion lost
- [00:09:44.170]to the atmosphere that we could be capturing.
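The arithmetic behind that figure can be sketched as follows. The two inputs are the rounded values spoken in the talk (1.6 gigatons of CO2 per year, $51 per ton); the function and constant names are hypothetical. With these rounded inputs the product comes out slightly above the roughly $79 billion quoted, presumably because the slide used less rounded numbers.

```python
TONS_PER_GIGATON = 1e9


def co2_value_billion_usd(gigatons_co2, usd_per_ton):
    """Dollar value, in billions, of a CO2 flux at a given price per ton."""
    return gigatons_co2 * TONS_PER_GIGATON * usd_per_ton / 1e9


# Rounded inputs as spoken: 1.6 Gt of CO2 per year at $51 per ton.
respired_value = co2_value_billion_usd(1.6, 51.0)
print(f"Value of respired CO2: about ${respired_value:.0f} billion per year")
```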
- [00:09:50.530]And if you were to ask the question,
- [00:09:52.040]where would we be with yields 25 years from now,
- [00:09:55.830]and what are those future products likely to be?
- [00:09:59.330]Let's imagine.
- [00:10:02.184]I think we will definitely see a lot more use
- [00:10:05.460]of that land for energy production,
- [00:10:08.120]hydrogen, precursors for plastics and everything else,
- [00:10:12.380]and maybe less use for animals.
- [00:10:14.920]But then the question also is how do you
- [00:10:18.090]capture as much of that CO2 as possible?
- [00:10:21.930]If we figure out clever ways
- [00:10:23.780]to reduce our greenhouse gas emissions on this land
- [00:10:26.600]and capture a substantial portion of that CO2,
- [00:10:29.590]let's say we can capture two thirds of it,
- [00:10:32.100]you're talking about an additional $127 billion of value
- [00:10:37.140]added to the system.
- [00:10:40.020]And the third area where I think we're going to have lots
- [00:10:42.620]of change is in this area of calories
- [00:10:45.080]and nutrition. The last 50 years were really focused,
- [00:10:49.830]in the case of Norman Borlaug, on getting enough calories
- [00:10:53.630]and protein into the hands of people.
- [00:10:55.860]And the number of people with insufficient calories
- [00:10:58.510]and protein has dramatically dropped over the last 50 years.
- [00:11:03.540]But if you look at where we are right now,
- [00:11:07.830]and as the population grows to nine or 10 billion people,
- [00:11:11.990]you notice that most of the people
- [00:11:13.820]on the planet have poor nutrition or unbalanced diets.
- [00:11:17.660]And I think that's the third real big driver
- [00:11:21.300]we'll be having: how do we shift that?
- [00:11:25.210]I think for that first group, on insufficient calories
- [00:11:28.650]and protein, a lot of this is about alternative proteins
- [00:11:31.390]and adapted, productive row crops for different environments.
- [00:11:36.580]But I think the other big wedge there is how do we drive
- [00:11:39.460]down unbalanced diets across the US and across the globe.
- [00:11:44.150]And I know you all do a lot of thinking
- [00:11:46.860]about this also, but I think it's breeding
- [00:11:50.220]and producing accessible quality fruits and vegetables
- [00:11:53.500]and microbial foods to really address a lot of those issues.
- [00:11:57.340]So I think we have those three big drivers.
- [00:12:00.140]And if you want some more information
- [00:12:02.240]on some of the logic behind some of the arguments here,
- [00:12:05.980]I'd point you to my
- [00:12:07.890]YouTube video on this
- [00:12:09.060]that provides some more information.
- [00:12:12.580]But the take-home from the breeding perspective is,
- [00:12:15.016]we've done a really good job of breeding on about 10 species
- [00:12:19.980]and are using advanced technologies on those 10 species.
- [00:12:23.950]But with those three big drivers out there,
- [00:12:26.730]I think we're shifting
- [00:12:27.563]to a world where we certainly need to do well
- [00:12:30.650]and continue doing well on those,
- [00:12:32.610]but we're going to be doing much more fermentation,
- [00:12:35.240]breeding intensively,
- [00:12:37.800]using advanced technologies for fruits and vegetables,
- [00:12:40.550]fuels and polymers,
- [00:12:41.750]carbon capture and assisted migration.
- [00:12:44.800]And so if you add up all the species
- [00:12:46.690]that you're starting to talk about there,
- [00:12:47.793]I'd just say we really need to
- [00:12:49.820]have accelerated breeding for thousands of species.
- [00:12:54.380]And so what a lot of the basic research
- [00:12:57.100]in my group that we're focused on right now,
- [00:12:59.540]is how do we build an infrastructure,
- [00:13:02.320]to have accelerated breeding,
- [00:13:04.360]for thousands of species?
- [00:13:07.110]The heart of modern breeding is essentially this cycle
- [00:13:11.710]of selection, crossing and evaluation,
- [00:13:14.650]and it works well.
- [00:13:17.960]In the case of maize,
- [00:13:18.793]you always see the plot: we've increased yield
- [00:13:20.670]eightfold.
- [00:13:22.000]Some of that is breeding,
- [00:13:24.080]some of it is agronomy,
- [00:13:25.770]but this basic cycle really works.
- [00:13:28.420]The problem is of course that time cycle
- [00:13:30.550]of five to 10 years.
- [00:13:32.700]And the key engine
- [00:13:34.800]I think we need to use to make these improvements
- [00:13:38.500]across hundreds, if not thousands of species,
- [00:13:41.770]is genomic selection.
- [00:13:43.390]That is powered by genomic prediction.
- [00:13:46.700]Where essentially as soon as you make that selection
- [00:13:49.260]or that cross, you evaluate the genome
- [00:13:51.850]and you predict which ones get advanced.
- [00:13:54.546]That doesn't mean you have a new product,
- [00:13:56.960]every four months or a year,
- [00:13:58.100]but you're able to have the central engine
- [00:14:00.380]of selection and improvement
- [00:14:02.230]making the population better on that type of scale.
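The "evaluate the genome and predict which ones get advanced" step can be sketched in a few lines. This is a minimal illustration, assuming a simple additive model in which a line's genomic estimated breeding value is the sum of allele dosage times marker effect; the marker effects, line names, and function names are all hypothetical toy values, not anything from the talk.

```python
def predict_breeding_values(genotypes, marker_effects):
    """Additive prediction: sum of allele dosage x marker effect per line."""
    return {
        name: sum(d * e for d, e in zip(dosages, marker_effects))
        for name, dosages in genotypes.items()
    }


def select_top(breeding_values, n_keep):
    """Names of the n_keep lines with the highest predicted values."""
    ranked = sorted(breeding_values, key=breeding_values.get, reverse=True)
    return ranked[:n_keep]


# Toy inputs (hypothetical): allele dosages (0, 1, 2) at four markers.
marker_effects = [0.5, -0.2, 0.1, 0.3]  # assumed to come from a trained model
genotypes = {
    "line_A": [2, 0, 1, 2],
    "line_B": [0, 2, 1, 0],
    "line_C": [1, 1, 2, 1],
    "line_D": [2, 1, 0, 1],
}

gebv = predict_breeding_values(genotypes, marker_effects)
advanced = select_top(gebv, n_keep=2)
print(advanced)
```

The point of the sketch is only that selection happens from genotype alone, as soon as the cross is made, without waiting for a field evaluation of each new line.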
- [00:14:05.552]And with new approaches
- [00:14:08.480]in speed breeding and other ones,
- [00:14:10.660]I think a lot of species, even perennials
- [00:14:13.960]and long-lived perennials,
- [00:14:16.320]can have their period of selection,
- [00:14:19.510]that cycle time, driven down to a year or two.
- [00:14:24.400]And so our challenge is,
- [00:14:26.240]genomic selection works well
- [00:14:27.440]for these species plus about a dozen more,
- [00:14:31.200]but how do we really make it work well
- [00:14:33.620]for all of the species?
- [00:14:39.010]So I think part of that is about just building
- [00:14:43.200]the technologies and getting them in the hands
- [00:14:46.090]of breeders who don't have large groups, you know,
- [00:14:48.560]who don't have 5,000 people
- [00:14:51.070]running a genomic selection effort
- [00:14:54.470]across the globe at one of the major seed companies.
- [00:14:57.820]And this is what we
- [00:14:59.530]in ARS have started with Breeding Insight.
- [00:15:02.060]And this is about essentially developing the genomic
- [00:15:05.050]and digital ecosystem for specialty species.
- [00:15:07.940]And so she is the director of this project
- [00:15:10.440]but essentially we're getting the genomic tools
- [00:15:13.090]and essentially informatics tools
- [00:15:15.410]into the hands of specialty breeders.
- [00:15:17.480]And this is an area where I think, you know, it's a big
- [00:15:20.744]and useful value add, because there's a lot of commonality
- [00:15:23.620]between all these systems, so that we can really
- [00:15:25.700]make some useful progress.
- [00:15:28.460]And so genomic prediction requires accurate
- [00:15:32.010]mathematical models
- [00:15:33.740]in order to make this work.
- [00:15:35.510]And we know we can do this really well already
- [00:15:38.120]if we have lots of observations.
- [00:15:41.150]The issue is, if we have few observations,
- [00:15:43.160]our accuracy can start dropping,
- [00:15:46.171]and the question then is, can we really learn
- [00:15:49.970]by bringing in all those other types of information,
- [00:15:53.340]learning across species,
- [00:15:54.720]so we get high accuracy?
- [00:15:57.170]To put this in more concrete terms,
- [00:16:00.311]if you look at genome-wide prediction,
- [00:16:02.680]you know, thousands of papers have been published
- [00:16:04.550]on this in the last two decades.
- [00:16:07.810]And if you're working on a major crop
- [00:16:09.850]or animal, like livestock
- [00:16:12.300]and so on,
- [00:16:13.133]if you're doing
- [00:16:14.680]within-population prediction,
- [00:16:16.640]accuracy can be quite high, 30 to 90%.
- [00:16:21.306]If you try to predict outside the population,
- [00:16:23.730]accuracy frequently drops,
- [00:16:25.516]you know, can drop to zero, or at least by 50%.
- [00:16:30.940]And if you're working with a smaller species,
- [00:16:32.730]you just have less data to work with,
- [00:16:34.317]and those prediction accuracies are lower.
- [00:16:37.200]And essentially there's no information that has been derived
- [00:16:40.090]from going from one species
- [00:16:41.470]into another species.
- [00:16:43.060]Because what genome-wide
- [00:16:44.770]prediction is modeling
- [00:16:45.960]is really inheritance.
- [00:16:49.448]That's the mechanism it's modeling;
- [00:16:51.943]it's really about modeling inheritance really accurately
- [00:16:55.490]across various genomic regions.
- [00:16:58.320]And so I think the question that we
- [00:17:00.100]and many others are asking is how do we bring
- [00:17:02.340]what we know about a lot of these mechanisms,
- [00:17:05.530]genetic mechanisms,
- [00:17:08.550]molecular mechanisms, into genome-wide
- [00:17:11.050]prediction models to really give us a useful bump.
- [00:17:14.260]And the type of increase in accuracy
- [00:17:17.210]I'm envisioning, if we did really well,
- [00:17:19.480]is we'd improve these models by 20%.
- [00:17:22.920]So essentially, for free computational effort,
- [00:17:27.550]you've improved your models by 20%.
- [00:17:31.710]So why don't we do better already with
- [00:17:34.724]genome-wide prediction and other ones?
- [00:17:37.000]Well, the problem is that if you look
- [00:17:39.660]at the number of genetic variants that are
- [00:17:41.530]in a genome or in a breeding population,
- [00:17:44.020]it can be on the order of tens of millions.
- [00:17:47.730]This is the example for maize:
- [00:17:51.824]in one of our maize populations,
- [00:17:53.170]we have about 80 million segregating genetic variants
- [00:17:56.310]and we only have about 5,000 observations to map with.
- [00:18:00.910]And so you have this massive n-versus-p problem.
- [00:18:03.400]We can do a good job of dealing with prediction,
- [00:18:05.890]but for dissecting each one of those
- [00:18:07.300]down to their individual causal nucleotides,
- [00:18:09.790]we just have too many parameters
- [00:18:11.970]for our number of observations.
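Why having more parameters than observations blocks the dissection of causal variants, even when prediction works, can be shown with a toy case. This is a minimal sketch with made-up genotypes and effects (two observed lines, three variants); only the 80 million and 5,000 figures come from the talk.

```python
def predict(genotype_rows, effects):
    """Additive phenotype prediction for each genotype row."""
    return [sum(g * e for g, e in zip(row, effects)) for row in genotype_rows]


# Scale of the real problem as quoted in the talk.
variants_per_observation = 80_000_000 / 5_000

# Toy case: 2 observed lines, 3 variants (more parameters than observations).
genotypes = [
    [1, 0, 1],
    [0, 1, 1],
]
phenotypes = [1.0, 1.0]

# Two very different hypotheses about which variants are causal.
effects_a = [1.0, 1.0, 0.0]  # variants 1 and 2 are causal
effects_b = [0.0, 0.0, 1.0]  # only variant 3 is causal

fit_a = predict(genotypes, effects_a)
fit_b = predict(genotypes, effects_b)
print(fit_a == phenotypes, fit_b == phenotypes)
print(f"{variants_per_observation:.0f} variants per observation")
```

Both effect vectors reproduce the observed phenotypes exactly, so the data alone cannot say which variants are causal. At roughly 16,000 variants per observation, extra information from outside the mapping population is needed to choose among such explanations.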
- [00:18:14.480]And so the question we're asking
- [00:18:16.840]in the group a lot is, can we learn more
- [00:18:19.200]by using tall data versus wide data, where
- [00:18:22.250]we have more observations than parameters?
- [00:18:25.070]And actually there are a lot of different situations
- [00:18:27.210]where we have far more observations than we have parameters.
- [00:18:32.140]For example, evolutionary conservation.
- [00:18:34.410]If you're asking about how a single base pair evolved
- [00:18:38.540]across evolutionary time, you can compare
- [00:18:40.480]that amino acid residue
- [00:18:41.760]to a thousand other species.
- [00:18:43.970]Or chromatin structure.
- [00:18:45.740]You can model, for a nucleosome-sized region
- [00:18:51.270]of sequence,
- [00:18:52.510]what happens to it in terms of chromatin structure
- [00:18:54.490]or RNA expression. Same thing with transcription factors.
- [00:18:58.420]In a lot of these areas we are able to
- [00:19:01.140]generate more data than we have parameters.
- [00:19:04.570]And so NSF sometimes talks about this
- [00:19:08.970]as learning the rules of life.
- [00:19:10.480]But what I would highlight is,
- [00:19:12.082]I think we actually know many of the rules of life.
- [00:19:15.730]I think it's really
- [00:19:16.563]about parameterizing the functions of life.
- [00:19:19.670]You know, how do we create mathematical models
- [00:19:22.200]that allow us to go through this whole process?
- [00:19:28.077]And you know, what I want to highlight here is
- [00:19:30.590]that what we are doing is really breaking
- [00:19:33.790]this down into essentially
- [00:19:38.030]a few different elements.
- [00:19:40.870]The central thing we're trying to aim
- [00:19:43.880]towards is estimating protein activity.
- [00:19:47.410]And I kind of mean this as a fairly abstract concept,
- [00:19:50.630]but being able to rank the different alleles
- [00:19:54.687]in different varieties by their relative activity level.
- [00:20:00.288]And you can imagine protein activity
- [00:20:04.570]is essentially the product of protein sequence
- [00:20:07.040]and the product of gene regulation.
- [00:20:09.090]And if you think about this problem,
- [00:20:11.910]you know, for a given protein, there are on average
- [00:20:14.750]only about 330 amino acids we need to concern ourselves with
- [00:20:19.050]on the protein sequence side,
- [00:20:20.950]and in gene regulation
- [00:20:22.040]there's probably a couple thousand base pairs
- [00:20:24.050]per gene that are important there.
- [00:20:26.320]So we're now not talking
- [00:20:28.130]about having to worry about 80 million things simultaneously;
- [00:20:31.590]we're at a smaller scale.
- [00:20:33.770]And then of course, if we can break it down
- [00:20:36.710]to 30,000 proteins,
- [00:20:38.580]then we can start working through relating it
- [00:20:41.102]from that protein activity
- [00:20:43.320]through the various levels of physiology to the whole plant.
- [00:20:49.730]I want to first talk
- [00:20:50.563]about the modeling that we've done in this area
- [00:20:53.260]of going from the protein sequence all the way
- [00:20:56.140]through physiology.
- [00:20:58.640]And the hypothesis we had here is
- [00:21:01.403]that essentially angiosperm evolutionarily conserved
- [00:21:05.770]sites should predict segregating deleterious mutations.
- [00:21:09.820]And so if something has been conserved across a wide range
- [00:21:12.570]of evolutionary time across flowering plants
- [00:21:15.340]and it gets broken in maize or sorghum,
- [00:21:20.260]then that's likely to be deleterious.
- [00:21:23.840]And so a few years ago, Eli Rodgers-Melnick in the group
- [00:21:26.840]aligned maize against all the other, at the time,
- [00:21:31.360]well-sequenced genomes and found that about 10%
- [00:21:34.540]of the maize genome was alignable and constrained.
- [00:21:37.710]And then he started asking, well,
- [00:21:39.710]what happens when base pair changes
- [00:21:41.850]and amino acid changes are occurring
- [00:21:43.800]at that portion of the genome?
- [00:21:45.810]Well, it turned out angiosperm constraint
- [00:21:49.010]did a good job of predicting
- [00:21:50.650]segregating deleterious mutations.
- [00:21:52.950]Some of the predictions that we tested there were
- [00:21:55.640]that they should be enriched in low recombination regions
- [00:21:58.810]of the genome because you aren't able to purge them.
- [00:22:01.350]That was supported.
- [00:22:02.890]An extreme version of low recombination is essentially
- [00:22:06.320]vegetative propagation, which will increase deleterious mutations.
- [00:22:09.940]And we saw that in cassava breeding clones,
- [00:22:12.040]which are substantially vegetatively propagated,
- [00:22:15.470]there are massive issues having to do with buildup
- [00:22:18.620]of deleterious mutations.
- [00:22:20.300]And we find that
- [00:22:23.260]in both maize hybrids
- [00:22:24.880]and sorghum, deleterious mutations do explain
- [00:22:28.060]about half of the variance in the field.
- [00:22:31.100]So
- [00:22:33.020]they are,
- [00:22:35.290]as a group, even though each
- [00:22:36.800]of these deleterious mutations is rare, together as a group
- [00:22:40.580]they probably explain about half
- [00:22:43.180]of the variance for yield.
- [00:22:46.700]But Guillaume Ramstein
- [00:22:47.810]in the group really wanted to ask the question:
- [00:22:50.170]yes, deleterious mutations together
- [00:22:52.220]explain a big portion,
- [00:22:53.770]but can I use them to start increasing prediction accuracy,
- [00:22:58.180]to make it a much more actionable item?
- [00:23:01.800]And so we have large
- [00:23:03.780]biparental cross sets of hybrids and diverse hybrids.
- [00:23:08.450]And he did training and validation in different directions here,
- [00:23:13.040]and he created a series of models to test this out.
- [00:23:17.230]And so for traits like height,
- [00:23:18.900]adding these scores gives some,
- [00:23:23.601]a small
- [00:23:26.150]increase in the accuracy.
- [00:23:29.160]And we've looked at both cassava and sorghum;
- [00:23:33.020]we generally see this effect, that yes,
- [00:23:35.450]we get an increase. In the case of maize, in Guillaume's work,
- [00:23:38.580]it was significant,
- [00:23:40.770]but it was anywhere
- [00:23:42.220]from a zero to 10% increase in accuracy, not
- [00:23:44.460]a really big gain of the kind we were hoping for
- [00:23:48.070]in some of these accuracies.
- [00:23:51.210]So what Guillaume did is,
- [00:23:54.820]he, again
- [00:23:55.740]hypothesizing that fitness effects are consistent
- [00:23:59.460]across land plants, took a bunch of genomic sequence,
- [00:24:04.230]and he essentially
- [00:24:05.253]started using different annotations,
- [00:24:07.220]whether protein ontology or
- [00:24:08.710]various ways to predict amino acid disruption, and some
- [00:24:12.340]of the bold ones were the ones that really mattered the most
- [00:24:14.660]in these models.
- [00:24:16.420]And what he did that was different is,
- [00:24:20.020]instead of essentially using evolutionary constraint
- [00:24:23.850]directly in these models to say
- [00:24:27.160]this is a deleterious mutation,
- [00:24:28.610]he uses a machine learning model to
- [00:24:30.550]predict evolutionary constraint,
- [00:24:33.860]so using aspects of just sequence analysis
- [00:24:38.640]and training that model against evolutionary constraint.
- [00:24:43.010]And so
- [00:24:45.260]we have a couple of predictions that come from this.
- [00:24:47.810]So let me walk through this plot here.
- [00:24:50.600]The observed evolutionary constraint is the level
- [00:24:53.460]of constraint we
- [00:24:55.706]observe when we align a bunch
- [00:24:57.740]of genomes against one another.
- [00:25:00.910]The predicted is the one that he's predicted based
- [00:25:03.930]on this model that was trained
- [00:25:05.320]on some of that data.
- [00:25:08.630]None of this model had ever seen any of
- [00:25:11.080]the variation within maize, but when you apply it,
- [00:25:13.050]what you see here is this:
- [00:25:15.460]we expect deleterious mutations to be very rare,
- [00:25:19.500]and you see the predicted evolutionary constraint
- [00:25:23.820]does a better job of predicting rare
- [00:25:26.570]and deleterious mutations than the observed one does.
- [00:25:29.420]So this suggests that
- [00:25:31.260]these machine learning models are actually cleaning up a lot
- [00:25:35.060]of the mess of reading out what's going on in evolution.
- [00:25:40.160]And so then he took these types of models
- [00:25:43.100]and, with the same type of training and testing sets,
- [00:25:47.200]started looking at grain yield.
- [00:25:49.320]And he started looking at the most highly selected loci,
- [00:25:52.570]the ones that look most deleterious.
- [00:25:54.250]And what he sees is that if you look
- [00:25:56.440]at the 0.1% or 1% most severe deleterious mutations,
- [00:26:00.870]so now we're talking about a few hundred
- [00:26:03.900]deleterious mutations in this 0.1% tail, or a few thousand
- [00:26:07.630]in the 1% tail, we're starting to see some really
- [00:26:09.910]substantial increases in prediction accuracy.
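Picking out the 0.1% or 1% tail of most severe mutations amounts to ranking by a predicted score and keeping the top fraction. A minimal sketch follows, assuming each mutation already has a predicted deleteriousness score where higher means more severe; the scores, identifiers, and function name are hypothetical, and how the selected tail is then fed into the yield prediction model is not specified in this part of the talk.

```python
def most_deleterious(scores, fraction):
    """IDs of the top `fraction` of mutations by predicted deleteriousness."""
    n_keep = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


# Toy scores (hypothetical): 1,000 mutations, higher score = more severe.
scores = {f"mut_{i}": i / 1000 for i in range(1000)}

tail_01 = most_deleterious(scores, 0.001)  # 0.1% tail
tail_1 = most_deleterious(scores, 0.01)    # 1% tail
print(len(tail_01), len(tail_1))
```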
- [00:26:14.120]So I think this is a route where we can start going
- [00:26:17.200]and making this even better by starting to pick these out.
- [00:26:23.000]The next area that I want to mention
- [00:26:25.010]that we've been doing a lot of work on is gene regulation.
- [00:26:28.500]And here the hypothesis
- [00:26:30.490]that we've frequently been testing
- [00:26:32.880]started out
- [00:26:33.850]with observations that Jim Birchler made
- [00:26:37.210]over the last several decades
- [00:26:38.930]about how important balanced gene expression is
- [00:26:43.890]for producing a healthy plant.
- [00:26:45.250]And if you have gene expression going all over the board
- [00:26:48.720]you essentially have all sorts of dosage issues.
- [00:26:53.200]And this can come from all sorts
- [00:26:55.830]of stoichiometric things having to do with
- [00:26:57.580]putting together protein complexes
- [00:26:59.810]and metabolism and so on.
- [00:27:01.820]But essentially the plant will be healthy
- [00:27:06.510]if it has consistent middle
- [00:27:08.200]of the road expression everywhere.
- [00:27:11.500]And so a couple of years ago
- [00:27:12.900]we tested this by doing expression profiling
- [00:27:16.360]in 300 maize lines across seven different tissues.
- [00:27:20.150]We first asked about the extremes of expression:
- [00:27:24.400]what were they associated with?
- [00:27:26.170]Well, it turned out
- [00:27:27.390]that they were associated with upstream, rare alleles
- [00:27:31.770]and these rare alleles were likely deleterious alleles
- [00:27:35.590]based on population genetics.
- [00:27:38.310]Karl, though, then tried to take that type of data
- [00:27:42.390]on just overall dysregulation and relate that to yield.
- [00:27:46.190]And if you don't have any way to weight
- [00:27:48.570]that ahead of time, that correlation exists,
- [00:27:52.140]but it's not particularly strong
- [00:27:53.730]and not really an actionable thing we can work on.
- [00:27:58.940]So the second way we looked at it was we looked
- [00:28:02.390]at those deviations in expression and asked, can we fit them
- [00:28:07.040]in a mixed model and use it to predict seed weight?
- [00:28:09.950]And again, here we found that essentially those deviations
- [00:28:12.990]in expression explained about 20 to 30%
- [00:28:17.530]of seed weight.
- [00:28:18.870]So that suggests that we can
- [00:28:21.020]at a quantitative genetics level really model these
- [00:28:24.360]and that dosage model is likely to be really key
- [00:28:28.480]for these yield-like traits.
- [00:28:31.690]Baoxing Song in the group
- [00:28:33.470]has recently done an analysis using some
- [00:28:35.920]of the same data
- [00:28:37.200]but then combining it with really careful alignment
- [00:28:40.100]across a range of different Andropogoneae species.
- [00:28:43.310]And he's finding that the conserved non-coding sequences,
- [00:28:46.360]essentially these cis
- [00:28:47.350]regulatory regions, are highly explainable.
- [00:28:50.270]Whether they're transcription factor
- [00:28:51.700]binding sites or chromatin loops
- [00:28:53.227]and so on they were highly explainable.
- [00:28:56.090]So what he did is he was able to then use
- [00:29:01.050]these well-characterized highly conserved regions to
- [00:29:05.970]ask, is that what's producing those extreme phenotype
- [00:29:09.230]differences? And that is what we saw: essentially,
- [00:29:12.920]low expression was being driven by disruptions
- [00:29:16.940]in these conserved non-coding sequences.
- [00:29:19.630]So we have now moved out from the effects
- [00:29:24.330]of deleterious mutations in proteins
- [00:29:26.490]to seeing the effects of deleterious mutations
- [00:29:29.540]in these regulatory regions.
- [00:29:32.080]And so Anju Giri in the group then said, okay,
- [00:29:35.130]I want to try to take these expression deviations
- [00:29:38.310]and make them more actionable for breeding.
- [00:29:41.240]And so the way she approached the problem is thinking
- [00:29:44.570]about it in terms of, well, we have measured expression;
- [00:29:47.350]what is that a product of? Well, it's a product
- [00:29:49.810]of cis effects,
- [00:29:54.400]trans regulators affecting it,
- [00:29:56.600]and of course,
- [00:29:58.190]all of the error and environmental impacts on it.
- [00:30:02.640]In this experiment, we have treated environment
- [00:30:05.900]as essentially error.
- [00:30:08.020]So obviously there's a lot more you could do in this area.
- [00:30:12.400]And she then partitioned using haplotypes.
- [00:30:16.810]She essentially partitioned out the genetic variance
- [00:30:19.410]from the measured expression.
- [00:30:21.190]She used the genome-wide haplotype relationship matrix to
- [00:30:23.610]get the trans effects.
- [00:30:25.380]And then from that was able to estimate the cis effect
- [00:30:30.140]for each gene at the haplotype level.
- [00:30:32.770]And so when we're talking haplotype here
- [00:30:34.557]we're talking about gene-scale haplotypes.
- [00:30:38.782]And the first thing she saw was
- [00:30:40.290]that essentially the cis component was explaining
- [00:30:43.120]about a third of the variance
- [00:30:44.357]and trans was explaining the other two thirds.
- [00:30:48.620]Most of the genes had some heritable variation
- [00:30:51.530]but it wasn't necessarily a large amount,
- [00:30:53.500]but she was able to partition out that cis.
- [00:30:56.597]And that was pretty reasonable; of course,
- [00:30:58.560]transcription factors and everything else are very
- [00:31:02.050]big players in regulating the expression
- [00:31:04.550]of the given gene in a given time.
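
The partition described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the model used in the work being presented: the function name `partition_expression`, the shrinkage parameter `lam`, the simulated marker matrix, and the per-haplotype averaging step are all hypothetical choices made for the example. The trans part is estimated by smoothing expression through a genome-wide relationship matrix, and the cis part is the average of what remains among lines that share a gene-scale haplotype.

```python
import numpy as np

def partition_expression(y, K, local_hap, lam=1.0):
    """Split one gene's measured expression into trans and cis parts (toy version)."""
    y = np.asarray(y, dtype=float)
    centered = y - y.mean()
    n = len(y)
    # Trans: BLUP-like smoothing of expression through the genome-wide relationship matrix K.
    trans = K @ np.linalg.solve(K + lam * np.eye(n), centered)
    residual = centered - trans
    # Cis: average leftover expression among lines carrying the same gene-scale haplotype.
    cis_by_hap = {}
    for hap in set(local_hap):
        idx = [i for i, h in enumerate(local_hap) if h == hap]
        cis_by_hap[hap] = float(residual[idx].mean())
    cis = np.array([cis_by_hap[h] for h in local_hap])
    return trans, cis, cis_by_hap

# Simulated data: 60 lines, 200 random markers, three haplotypes at the gene.
rng = np.random.default_rng(0)
n = 60
markers = rng.choice([0.0, 1.0], size=(n, 200))
Z = markers - markers.mean(axis=0)
K = Z @ Z.T / Z.shape[1]  # simple genome-wide relationship matrix
local_hap = [int(h) for h in rng.integers(0, 3, size=n)]
true_cis = {0: -1.0, 1: 0.0, 2: 1.0}  # assumed cis effects used to simulate expression
y = np.array([true_cis[h] for h in local_hap]) + rng.normal(0, 0.1, n)

trans, cis, cis_by_hap = partition_expression(y, K, local_hap)
```

In this toy setup the estimated cis values keep the ordering of the simulated haplotype effects, although they are shrunk because some signal is absorbed by the trans term.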
- [00:31:07.514]But then she asked the question whether or not
- [00:31:10.110]the cis was more transferable than measured expression.
- [00:31:13.840]So this is measured expression between tissues
- [00:31:17.170]and you see that there's actually a whole bunch
- [00:31:18.890]of them that are pretty near zero.
- [00:31:21.300]There's absolutely expression that is correlated
- [00:31:23.980]across tissues, but there's a big group
- [00:31:27.560]of it that is pretty close to zero.
- [00:31:31.120]However, when you start breaking it
- [00:31:32.490]down into just the cis haplotype effect
- [00:31:36.000]and you've removed the trans effect, then you see
- [00:31:38.200]that for a much larger set of genes,
- [00:31:43.480]that cis component is much more consistent between tissues.
- [00:31:47.310]And I think this gave us hope that, Oh, well
- [00:31:51.210]if this is consistent
- [00:31:52.550]then maybe we can use this for genome-wide prediction.
- [00:31:57.740]And so that's what she did.
- [00:32:00.530]She set up, I think, a careful testing approach to see
- [00:32:06.250]whether or not we were doing better than baseline
- [00:32:08.930]haplotype structure,
- [00:32:10.080]because the haplotype structure overall already
- [00:32:12.170]gives you decent prediction accuracies, even
- [00:32:14.780]with random expression values layered
- [00:32:17.400]on top of it.
- [00:32:18.720]And so we do everything over this haplotype baseline.
- [00:32:22.190]And what she was seeing was, when she trained on a small set
- [00:32:25.370]and tested on a large set, she was getting
- [00:32:28.357]3, 4, 5% increases in accuracy.
- [00:32:31.550]And then when she trained on a large set
- [00:32:33.710]like the NAM population, with less diversity,
- [00:32:36.020]and tested there, she was getting all the way
- [00:32:38.620]up to over 10% improvements in accuracy.
- [00:32:41.980]And this was from a wide range of different traits.
- [00:32:44.050]I think this was about 25 different traits.
- [00:32:46.130]She was testing these in.
- [00:32:47.670]So we're starting to see that
- [00:32:50.900]if you can partition out these and break it down
- [00:32:53.690]you can get some substantial improvements.
- [00:32:56.980]So we're at this stage where we're really trying to model
- [00:32:59.893]from the bottom up to meet with where we are elsewhere.
- [00:33:04.798]And let's
- [00:33:07.750]finish up here
- [00:33:10.450]on how we're doing some of this modeling from the bottom up.
- [00:33:14.560]When you get down to the gene level
- [00:33:16.170]we've done a lot of machine learning
- [00:33:19.240]at the level of proteins
- [00:33:20.600]and so on. Early on, we were certainly making lots of errors
- [00:33:25.390]and creating models that were not transferable at all.
- [00:33:29.290]Hai Wang and Jacob Washburn
- [00:33:30.800]in the group figured out why that was.
- [00:33:32.583]And that's because
- [00:33:33.550]we were essentially memorizing evolution.
- [00:33:36.120]And so they came up with a couple
- [00:33:38.010]of approaches to set up training and test sets.
- [00:33:41.770]so the models don't just memorize evolution
- [00:33:44.130]and you can actually start learning mechanism.
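
One way to picture a training and test setup that avoids memorizing evolution is to keep all members of a gene family on the same side of the split, so a model is never tested on a close relative of something it trained on. The sketch below is an illustration under that assumption, not a reproduction of the approaches developed in the group; the function `split_by_family` and the gene and family identifiers are hypothetical.

```python
import random

def split_by_family(gene_to_family, test_fraction=0.2, seed=0):
    """Assign whole gene families to train or test so relatives never straddle the split."""
    families = sorted(set(gene_to_family.values()))
    rng = random.Random(seed)
    rng.shuffle(families)
    n_test = max(1, round(len(families) * test_fraction))
    test_families = set(families[:n_test])
    train = [g for g, f in gene_to_family.items() if f not in test_families]
    test = [g for g, f in gene_to_family.items() if f in test_families]
    return train, test

# Hypothetical gene and family identifiers, for illustration only.
gene_to_family = {
    "geneA1": "fam1", "geneA2": "fam1",
    "geneB1": "fam2", "geneB2": "fam2", "geneB3": "fam2",
    "geneC1": "fam3",
    "geneD1": "fam4", "geneD2": "fam4",
    "geneE1": "fam5",
}
train_genes, test_genes = split_by_family(gene_to_family)
```

A random split over individual genes would often place paralogs on both sides, which lets a model score well by recognizing sequence similarity rather than learning mechanism.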
- [00:33:47.550]And so how do we go about that?
- [00:33:49.920]Well, one project in the group has been looking
- [00:33:53.040]at protein sequence, trying to go from there
- [00:33:55.960]to some estimates of activity.
- [00:33:59.310]Sara Jensen is a graduate student
- [00:34:01.680]in the group, and her basic thought was, well, the biophysics
- [00:34:06.560]of how a protein is stable should be consistent in bacteria
- [00:34:11.340]in archaea all the way up to eukaryotes.
- [00:34:16.030]Can she determine whether
- [00:34:18.350]or not a protein will be stable essentially
- [00:34:19.910]across 3 billion years of evolutionary time?
- [00:34:23.290]And so the first thing she needed to build was
- [00:34:25.920]essentially a way to estimate the temperature
- [00:34:28.780]that a bacterium was adapted to.
- [00:34:31.850]She and Emre Cimen developed a machine learning model
- [00:34:35.580]that worked quite well across a wide range of bacteria
- [00:34:40.277]and archaea. Essentially, just by looking at the tRNA
- [00:34:43.964]and some aspects of likely tRNA stability
- [00:34:48.364]and modeling that, they're able to essentially
- [00:34:52.010]predict what any archaea
- [00:34:54.300]or bacteria is adapted to in terms of temperature.
- [00:34:58.380]Once she has those numbers
- [00:35:00.040]of what a given bacterium was adapted to
- [00:35:03.210]in terms of temperature
- [00:35:04.110]she can then take all the bacterial Pfam domains,
- [00:35:07.930]line them up and ask residue by residue, is that correlated
- [00:35:13.950]and associated with temperature adaptation.
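
The residue-by-residue question can be illustrated with a toy alignment. This is a minimal sketch, assuming a simple correlation between the presence of each residue in a column and the growth temperature of the species; the sequences, temperatures, and function names are hypothetical, and a real analysis would also need to control for phylogeny and multiple testing, which this sketch ignores.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)

def site_temperature_association(aligned_seqs, temps):
    """For each alignment column, return the residue whose presence best tracks temperature."""
    results = []
    for col in range(len(aligned_seqs[0])):
        column = [s[col] for s in aligned_seqs]
        best = ("-", 0.0)
        for residue in sorted(set(column) - {"-"}):
            indicator = [1.0 if c == residue else 0.0 for c in column]
            r = pearson(indicator, temps)
            if abs(r) > abs(best[1]):
                best = (residue, r)
        results.append(best)
    return results

# Toy aligned domain from six hypothetical species with predicted growth temperatures (deg C).
aligned = ["MKAL", "MKAL", "MRAL", "MRPL", "MRPL", "MRPI"]
temps = [20.0, 25.0, 37.0, 60.0, 70.0, 85.0]
assoc = site_temperature_association(aligned, temps)
```

The first column is invariant, so it carries no signal, while the third column separates the cooler from the hotter species and gets a strong correlation.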
- [00:35:17.550]And what she was finding was about 15%
- [00:35:19.670]of residues were associated
- [00:35:21.390]with temperature in both archaea and bacteria.
- [00:35:23.890]And there's a lot
- [00:35:24.770]of significant convergence at shared sites there.
- [00:35:27.960]And a lot of the basic properties
- [00:35:30.860]of what was being shared made a lot of sense.
- [00:35:34.570]And essentially for each PFAM domain
- [00:35:37.090]you have a whole series of sites that you now
- [00:35:39.320]know are temperature adapted under various conditions.
- [00:35:45.070]And so the question she's been working
- [00:35:46.946]on right now is whether
- [00:35:48.250]or not these conclusions can be extended to plants.
- [00:35:51.970]She's done a little bit of work
- [00:35:55.177]with groups,
- [00:35:56.010]working on memories
- [00:35:58.280]of Reuben as group and founder,
- [00:36:00.838]some of the predictions she saw
- [00:36:02.670]lined up really well with maize.
- [00:36:04.420]And she's continuing to see some
- [00:36:05.860]of the same types of effects right now in maize and Arabidopsis.
- [00:36:10.670]So I think we have at least some evidence
- [00:36:13.680]that we can start making some of those connections.
- [00:36:17.050]The next area we've been working
- [00:36:18.360]on working from the bottom up is,
- [00:36:21.750]can we predict gene regulation directly
- [00:36:24.340]from sequence, without having to
- [00:36:27.840]measure the expression directly.
- [00:36:31.310]And so with all the ATAC-seq
- [00:36:34.410]and MNase-seq data that's been available
- [00:36:37.350]out there, a couple of years ago Katherine Mejia
- [00:36:39.897]in the group developed some models to test whether
- [00:36:42.260]or not she could predict open chromatin accessibility.
- [00:36:45.550]And she was essentially getting some models that were
- [00:36:48.740]over 95% accurate.
- [00:36:53.610]Then working with Silin Zhong's group,
- [00:36:56.250]we got a dataset with ChIP-seq binding data
- [00:36:57.810]for over a hundred transcription factors.
- [00:37:01.970]And again,
- [00:37:03.680]they were able to make some very accurate models
- [00:37:06.013]from 80 to 95% accurate, of where
- [00:37:08.700]the various TFs they were modeling
- [00:37:11.400]were binding.
- [00:37:13.270]But I think the most interesting thing
- [00:37:14.950]for me in the long term out
- [00:37:16.450]of this paper was this phylogeny you see here
- [00:37:21.040]and I want to spend a little bit
- [00:37:22.080]of time explaining what's going on here.
- [00:37:25.650]Each node on here is essentially one transcription factor.
- [00:37:31.290]The data from one transcription
- [00:37:33.480]factor is ChIP-seq data.
- [00:37:37.230]Katherine then trained a machine learning model for it.
- [00:37:40.180]And essentially what she's done here
- [00:37:43.290]is then clustered all the machine learning models
- [00:37:45.710]against one another.
- [00:37:47.430]So this is really a phylogeny of machine learning models
- [00:37:53.970]and how similar they are to one another.
- [00:37:56.900]And the way it's colored, though,
- [00:38:00.770]is the transcription factor families that would be
- [00:38:05.550]recognized in Arabidopsis
- [00:38:08.170]And I want to highlight
- [00:38:09.120]that these models were trained in maize,
- [00:38:11.630]and they essentially are recapitulating
- [00:38:14.700]the known transcription factor families that are known
- [00:38:18.100]from Arabidopsis.
- [00:38:19.940]So we've been able to go
- [00:38:21.300]from machine learning in maize
- [00:38:23.410]and essentially recapitulate knowledge in Arabidopsis.
- [00:38:27.050]And so I think this shows that over the 150 million years
- [00:38:30.650]that these two species have diverged,
- [00:38:32.820]much of the basic mechanics of this gene regulation
- [00:38:35.840]is really shared at a large scale. It isn't
- [00:38:38.050]just one transcription factor here.
- [00:38:40.210]There really are many of them involved, and they're very similar.
- [00:38:44.820]And so Travis Wrightsman is taking this type of work
- [00:38:48.340]and then started to ask the question, can he
- [00:38:51.990]train models in one species and apply them to another?
- [00:38:55.280]And in the case of taking a model trained in Arabidopsis
- [00:38:57.650]and applying it to maize, it doesn't do as well
- [00:39:00.190]as going from Arabidopsis to Arabidopsis,
- [00:39:03.010]but he found that
- [00:39:04.980]training on both species
- [00:39:06.560]and applying to either one works just fine.
- [00:39:09.060]So I think that's what we're starting to see is
- [00:39:11.930]that we can start training models on all those species
- [00:39:14.760]and then applying them to everything else.
- [00:39:17.180]Similarly, he took some predictions
- [00:39:19.760]of overall expression that were trained in maize
- [00:39:22.910]applied it to Arabidopsis,
- [00:39:24.340]and the models dropped in accuracy by about 50%,
- [00:39:28.210]but this is an area where we are pretty sure
- [00:39:31.210]that if we now train these on multiple species,
- [00:39:33.660]we can substantially increase that
- [00:39:35.160]and restore a lot of that accuracy.
- [00:39:40.250]So I think when it comes
- [00:39:41.680]to chromatin, expression, translation and so on,
- [00:39:44.540]we're pretty sure we can transfer a lot
- [00:39:46.640]of information across all angiosperms. Animals and fungi,
- [00:39:50.810]I'm not sure yet. We've tried a little bit, and so far
- [00:39:53.300]it hasn't been very promising.
- [00:39:56.530]So what are our hypotheses now on transferability?
- [00:40:01.910]So in terms of leveraging transferability,
- [00:40:05.470]when it comes to protein sequence and protein,
- [00:40:08.410]I think AlphaFold 2
- [00:40:10.240]and the Google DeepMind group have demonstrated
- [00:40:15.110]the rules are pretty much the same over 3 billion years.
- [00:40:18.330]And Sarah Jensen's work has also shown that, yeah
- [00:40:22.260]we're seeing a lot of transferability about proteins just
- [00:40:25.930]over all of life.
- [00:40:28.680]When it comes to gene regulation,
- [00:40:29.940]I think we've got lots
- [00:40:30.920]of good evidence that we have transferability
- [00:40:33.470]over about 150 million year range.
- [00:40:38.302]When we think about deleterious mutations going from changes
- [00:40:41.317]in the amino acids all the way through physiology
- [00:40:44.190]because here we're predicting whole plant phenotype.
- [00:40:47.180]We're seeing some useful information and ranking
- [00:40:49.810]and transferability across about a 70 million year period.
- [00:40:54.780]When we go from the expression levels and some estimates
- [00:40:58.680]of protein activity to whole plant phenotype
- [00:41:01.700]right now we're seeing evidence
- [00:41:03.440]that it works essentially within species,
- [00:41:06.380]which means really about thousands of years.
- [00:41:10.700]But I think the big question
- [00:41:12.440]that I want to know is how much can we leverage to say
- [00:41:15.850]across all the grasses?
- [00:41:17.730]Is there
- [00:41:18.890]a lot of useful,
- [00:41:21.620]constraint of these networks that we can move
- [00:41:24.040]across all of the grasses to make useful predictions?
- [00:41:28.580]So where are we right now?
- [00:41:30.600]Well, I think we're at a stage in genomics where
- [00:41:33.100]we're going to be able to sequence a genome
- [00:41:34.710]for about $2,500 a genome, and Earth BioGenome
- [00:41:38.980]wants to actually do this
- [00:41:40.330]for nearly a million species, but functionally
- [00:41:42.970]characterizing a single species costs anywhere
- [00:41:46.440]from five to a hundred billion dollars.
- [00:41:48.950]And so I think this whole question of transferability
- [00:41:51.390]and building models that are transferable is really key
- [00:41:55.090]for all of plant breeding and breeding in general.
- [00:42:00.880]So when people start thinking about these questions
- [00:42:04.880]I talked in the second half of this talk a lot
- [00:42:07.610]about advanced ways we could do this a lot better
- [00:42:10.360]but I want to highlight and just remind people, if you're
- [00:42:13.450]driving a breeding program today,
- [00:42:16.030]just do genomic selection.
- [00:42:17.840]It's relatively easy to get it off the ground.
- [00:42:22.770]And as long as you have high quality phenotypic data,
- [00:42:25.010]in a few months, or six or nine months,
- [00:42:30.640]you can make progress.
- [00:42:32.350]And then we can think about adding in these other components
- [00:42:36.714]to really leverage a lot more information, to hopefully
- [00:42:39.510]get some improvements on those prediction accuracies.
- [00:42:43.300]So I think we went through the 20th century where
- [00:42:46.950]success in breeding looked a lot like this careful
- [00:42:50.660]note-taking high quality experimental design
- [00:42:53.990]and thoughtful breeding approaches.
- [00:42:56.870]But I think, of course, our 21st century success, where
- [00:42:59.270]we have even more incredible challenges, will really
- [00:43:03.580]require the integration of lots of different communities
- [00:43:07.370]working together to pull this all off.
- [00:43:10.380]But I think we can do it.
- [00:43:11.570]So I'll end there and just highlight, I've mentioned a lot
- [00:43:15.440]of the postdocs and graduate students who did the work
- [00:43:17.350]I'm discussing today, but I want to mention my group leaders,
- [00:43:21.260]Sarah Miller, Cinta Romay and Peter Bradbury,
- [00:43:23.970]who are really critical for managing the group, leading,
- [00:43:27.570]and collaborating with all the great collaborators we have.
- [00:43:30.740]So I'd be happy to take some questions.
- [00:43:33.266]I already have one posted question,
- [00:43:34.290]from our own Alison: how would you address
- [00:43:37.180]the production of food crops such as bananas
- [00:43:39.690]and/or tomatoes, which are extremely dependent
- [00:43:43.190]on vegetative cloning, especially if they are incapable
- [00:43:45.700]of reproducing sexually?
- [00:43:47.960]Second question are best included
- [00:43:50.150]within these fitness models.
- [00:43:54.180]So, I mean, I think
- [00:43:56.420]if you can't go through meiosis, then yes.
- [00:44:01.173]I think you have to do it
- [00:44:02.590]through editing and other technologies.
- [00:44:09.120]I think obviously there's a lot that
- [00:44:12.230]editing technologies
- [00:44:15.860]can allow us to deal
- [00:44:18.010]with things like disease and some
- [00:44:20.240]some specialty traits fairly quickly.
- [00:44:22.570]For getting rid of deleterious mutations,
- [00:44:25.740]when a lot of these genomes have hundreds
- [00:44:28.930]if not thousands of deleterious mutations that build
- [00:44:31.060]up through vegetative propagation over time
- [00:44:33.660]maybe editing gets good;
- [00:44:35.849]editing will get a lot better.
- [00:44:37.460]If it gets more high-throughput
- [00:44:38.610]then you can start doing that
- [00:44:39.590]but I think you can also,
- [00:44:41.480]in some of these species, it may be worthwhile going back
- [00:44:45.230]to where you had a good diploid
- [00:44:49.630]and essentially just breeding very quickly
- [00:44:52.410]on that diploid and essentially making a new crop.
- [00:44:55.250]So I'm not, I don't know enough
- [00:44:57.590]about banana to make real good recommendations on that
- [00:45:02.840]but I've definitely seen in cassava,
- [00:45:06.000]with genomic selection, we're dealing with these issues.
- [00:45:09.920]It doesn't produce seed very well,
- [00:45:16.393]so it's vegetatively propagated, but we're making progress.
- [00:45:19.360]It's just something you select for,
- [00:45:22.748]and you can make progress on that too.
- [00:45:24.100]So I think
- [00:45:27.550]there hasn't been a system I've seen where
- [00:45:29.500]we can't use some genomic selection well.
- [00:45:33.228]Another question, from Suzanne:
- [00:45:35.250]what are your thoughts
- [00:45:36.700]on the derivation of heterosis given the work
- [00:45:40.305]on deleterious mutation identification?
- [00:45:43.930]Yeah, so, you know, I, I think, you know, everything
- [00:45:48.890]we kind of see,
- [00:45:51.395]the vast majority
- [00:45:55.236]of heterosis is coming from deleterious mutations.
- [00:46:00.530]And now that doesn't mean it's necessarily
- [00:46:02.700]exactly a dominance or pseudo-overdominance model.
- [00:46:09.210]I think there's a wide range of
- [00:46:12.307]natures of dosage that are really critical.
- [00:46:16.360]And so I think in the models we see
- [00:46:18.290]on these, each protein, each protein complex,
- [00:46:23.890]and each metabolite requires a different balance there.
- [00:46:27.690]And so some of those,
- [00:46:29.480]from a high level look like dominance
- [00:46:31.530]some of them look like other types
- [00:46:33.870]of dosage relationships.
- [00:46:38.230]But mostly it's deleterious mutations.
- [00:46:43.300]I think heterosis is pretty much all, well, not all,
- [00:46:47.970]let's say 85%, if I had to guess, deleterious mutations.
- [00:46:55.566]So we have a quick question from Vicus,
- [00:46:58.370]with some introduction, but the biggest question is:
- [00:47:01.380]can you please share your thoughts on having initiatives
- [00:47:04.310]like Breeding Insight within each university? Is this needed,
- [00:47:09.270]and is it possible given all the practical constraints?
- [00:47:14.290]So I don't know that we
- [00:47:16.413]I think it's really helpful to have
- [00:47:23.320]so the way Breeding Insight is kind of structured,
- [00:47:26.710]there's a software development team in the
- [00:47:32.290]but they are collaborating also with
- [00:47:37.031]the Cassavabase team and other open source platform
- [00:47:41.260]teams to create those resources.
- [00:47:45.102]And then I do think, on the genomics
- [00:47:48.193]and science side, we essentially are hiring one coordinator
- [00:47:53.660]for about every four species.
- [00:47:56.370]And that type of thing, I think,
- [00:48:01.110]is something where we certainly don't
- [00:48:03.190]need to replicate the bioinformatics everywhere.
- [00:48:06.699]And I think it is probably useful to have one
- [00:48:09.140]person kind of helping out a team all
- [00:48:13.270]across the country for each university or so on.
- [00:48:16.280]And then it probably is something like
- [00:48:18.840]one coordinator for every four or five specialty species
- [00:48:24.370]is probably about the right level of support
- [00:48:28.380]necessary to get a lot of the smaller programs
- [00:48:32.986]running and supported,
- [00:48:34.110]because specialty crop breeders are doing just
- [00:48:36.650]a tremendous range of things
- [00:48:38.570]from figuring out the market,
- [00:48:43.300]the economics
- [00:48:44.660]and having to know all the biology
- [00:48:46.350]of their species and everything else.
- [00:48:49.750]And I think if we can
- [00:48:51.780]help them offload a little bit of the genomics
- [00:48:54.850]and the mechanics of doing genomic selection,
- [00:48:58.150]they can focus their efforts on what they do best,
- [00:49:03.170]because I think it's, I'm always
- [00:49:05.610]in awe of what these breeders are doing in specialty crops,
- [00:49:09.950]just an amazing range of things.
- [00:49:11.490]And so how do we,
- [00:49:14.710]do what can be common across a lot of species
- [00:49:18.010]get that into kind of like a core facility type of thing
- [00:49:21.190]and let them focus on it.
- [00:49:23.530]They don't need to go and sequence DNA.
- [00:49:28.290]We have a question from Phil McLean.
- [00:49:30.500]Do you have any thoughts
- [00:49:31.500]on competition for substrate among enzymes
- [00:49:34.910]in a pathway and how that will affect phenotype?
- [00:49:38.650]How long might the transferability be stable?
- [00:49:41.820]Right. So
- [00:49:45.800]So Phil I'll say one thing.
- [00:49:50.070]So the answer is I have not thought a lot about this.
- [00:49:54.230]One way we're trying to think about these problems
- [00:49:59.380]at a higher level is, rather than just thinking
- [00:50:02.660]about proteins or dominance or dosage,
- [00:50:05.840]we think about all of those really now
- [00:50:08.090]at the level of the orthogroup.
- [00:50:10.950]And I think
- [00:50:12.720]one of the key things
- [00:50:14.310]that's really important is, once things
- [00:50:16.470]like AlphaFold 2 come out, we'll be able to tell whether
- [00:50:20.550]or not two orthologs or paralogs are likely to
- [00:50:26.270]have similar substrate specificity.
- [00:50:31.110]And I think that will then allow us to start dividing
- [00:50:34.280]up proteins into those proteins that are pretty
- [00:50:37.500]much just differentiated,
- [00:50:39.200]where the biochemistry they do is the same
- [00:50:42.150]but it's mostly just differential expression,
- [00:50:44.450]versus those that are actually
- [00:50:46.707]catalyzing different biochemistry.
- [00:50:50.276]But I think this is where things like AlphaFold 2
- [00:50:55.214]just, you know, and that general area,
- [00:50:56.400]whether or not that's the right tool, or
- [00:50:59.300]eventually will be the winning tool,
- [00:51:00.530]I don't know.
- [00:51:01.363]But I think the fact that that's possible is, yeah,
- [00:51:05.370]I think that's the single most important science, other
- [00:51:08.100]than coming up with the COVID vaccine, that happened last year:
- [00:51:11.060]that was AlphaFold 2.
- [00:51:16.495]Lots of questions.
- [00:51:17.510]So when you examine the interface
- [00:51:21.660]of cell biochemistry and plant anatomy
- [00:51:23.900]over evolutionary timeframes, do you find similar
- [00:51:26.930]genomic prediction alignment?
- [00:51:31.580]Cell biochemistry.
- [00:51:33.980]And plant anatomy.
- [00:51:40.520]And physiology.
- [00:51:43.810]So I
- [00:51:46.687]I think, if we break these things
- [00:51:49.810]down correctly,
- [00:51:50.760]I think we can find lots of transferability
- [00:51:54.350]but I think
- [00:51:55.810]we need to get specialists who really understand that level
- [00:52:00.920]of biology to help us create the models.
- [00:52:04.900]Okay. So I think when I look at,
- [00:52:06.900]some of the crop growth models,
- [00:52:13.750]the basic model for maize and sorghum
- [00:52:16.430]is pretty much the same.
- [00:52:18.660]It's just a different set of parameters on it.
- [00:52:22.290]And I think, you know, how do we figure
- [00:52:25.390]out the key elements of these areas and,
- [00:52:28.740]so they can be generalized
- [00:52:33.387]I think that's, and, and as we, I
- [00:52:36.580]I just don't know
- [00:52:39.150]at that level of scale
- [00:52:40.150]I'm not as good a physiologist
- [00:52:42.130]as I am some of these other things.
- [00:52:43.310]So I think that's where other people really need to
- [00:52:45.840]take the lead in it and have much more insight than I do.
- [00:52:51.007]Well, did you mentioned to me that we would like to focus
- [00:52:54.045]on the C4 versus C3 photosynthesis processes when referring
- [00:52:57.260]to his question.
- [00:52:59.610]Okay.
- [00:53:01.140]Yeah, no, I think, well, I think as we also learn more
- [00:53:04.630]about C4 plants, we see that there is a lot more variation
- [00:53:08.700]even within a single individual between being C3 or C4.
- [00:53:12.710]So I think we should try to generalize some
- [00:53:16.810]of our models so that they do fit together.
- [00:53:20.524]And one of the things I've been really impressed
- [00:53:23.630]with in the last couple of years is,
- [00:53:27.420]these deep learning networks, but combining reasoning
- [00:53:30.930]with them and reasoning is really
- [00:53:33.030]about adding mechanism to deep learning models.
- [00:53:37.130]And I think we really need to start thinking
- [00:53:39.460]about what's in some of these neat crop growth models
- [00:53:44.650]and other types of models
- [00:53:46.900]and adding reasoning to the deep learning frameworks.
- [00:53:53.320]They're much more efficient.
- [00:53:55.410]You know, deep learning is much more efficient
- [00:53:57.000]if you've added reasoning to it,
- [00:53:59.600]like a hundred- or a thousand-fold more efficient.
- [00:54:03.940]A few more questions, couple more.
- [00:54:06.190]Do you think that the rare expression variants
- [00:54:11.030]in trans are also deleterious to crop fitness?
- [00:54:16.010]I mean
- [00:54:16.843]essentially, I think the reason why
- [00:54:19.110]this is working at the level of cis right now is
- [00:54:21.400]that all heritable trans was cis somewhere.
- [00:54:25.526]Okay.
- [00:54:27.700]So if you follow the trans network back, at some point
- [00:54:33.090]it was either caused by a different amino acid change
- [00:54:38.160]or a difference in regulation.
- [00:54:41.344]And then one more question is, will we use
- [00:54:42.910]structure models to help define
- [00:54:45.690]the heterotic pools?
- [00:54:47.730]That was, I mean
- [00:54:49.030]essentially the maize heterotic pools were
- [00:54:55.714]done in such a way that their deleterious mutations
- [00:54:59.840]got distributed
- [00:55:01.380]in a way that they fully complemented.
- [00:55:04.640]So what would happen there was that you would
- [00:55:08.480]frequently get one set of deleterious mutations fixing
- [00:55:13.300]in one heterotic group, and then you had a range
- [00:55:15.940]of other haplotypes in the other ones
- [00:55:18.000]with deleterious mutations, and as long as they weren't the same,
- [00:55:21.500]you essentially had general combining ability.
- [00:55:24.760]And
- [00:55:27.190]I think we can now start thinking about these models
- [00:55:30.620]in terms of really, I've got a set of deleterious mutations.
- [00:55:35.810]Can I set up a breeding structure that will put more
- [00:55:40.810]of those on two opposite sides?
- [00:55:43.280]And I think you can probably make some progress on that,
- [00:55:47.790]but again,
- [00:55:53.560]you can also just drive this pretty quickly
- [00:55:55.690]by just randomly choosing two groups,
- [00:55:57.730]making them, and moving fast with genomic selection.
- [00:56:00.630]So in certain regards
- [00:56:04.950]depending on the resources that you have in your group,
- [00:56:10.830]I would go GS. If it's small, I would go GS first.
- [00:56:15.370]If you're a Corteva, then yeah,
- [00:56:17.500]that's where you should worry
- [00:56:18.333]about individual deleterious mutations
- [00:56:21.460]and how to purge them.
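The complementation argument in this answer can be illustrated with a minimal sketch. It assumes fully recessive deleterious alleles and fully inbred parents, so a hybrid is only affected at loci where both parents carry the deleterious allele. The function name, pool names, and locus labels are hypothetical and are not taken from the talk.

```python
def homozygous_load(parent_a, parent_b):
    """Count loci where both inbred parents carry the deleterious allele.

    Assumption: deleterious alleles are fully recessive, so only these
    shared loci reduce the fitness of the hybrid.
    """
    return len(set(parent_a) & set(parent_b))


# Hypothetical sets of deleterious loci that happened to fix in each pool.
pool_1 = {"locus_01", "locus_02", "locus_03"}
pool_2 = {"locus_04", "locus_05"}

# A cross inside one pool exposes every fixed deleterious locus.
within = homozygous_load(pool_1, pool_1)

# A cross between pools that share no fixed loci fully complements.
between = homozygous_load(pool_1, pool_2)

print(within, between)
```

Under these assumptions, a breeding structure that pushes deleterious mutations onto opposite sides is one that keeps the between-pool count as close to zero as possible.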
- [00:56:25.230]I would like to ask you one more, maybe general, question.
- [00:56:27.680]So you provide information
- [00:56:28.740]on gene expression and chromatin accessibility,
- [00:56:30.760]protein sequence and function,
- [00:56:33.460]but how do you integrate all this information to
- [00:56:35.790]create a model which can allow
- [00:56:37.999]for an unseen genomic prediction?
- [00:56:41.690]So essentially when we do, like, expression levels,
- [00:56:45.420]for each haplotype we have a value,
- [00:56:48.010]you know, a relative value against all of the other alleles,
- [00:56:52.580]and that matrix is then used for genomic prediction.
- [00:56:58.100]So if we have 30,000 genes
- [00:57:00.360]we may have modeled,
- [00:57:01.193]we essentially have 30,000 values,
- [00:57:05.870]one value at each gene,
- [00:57:09.210]for each individual,
- [00:57:11.940]and we use that to connect
- [00:57:14.529]to whole plant phenotype.
- [00:57:15.570]And deleterious mutations,
- [00:57:17.280]it's kind of the same type of thing.
- [00:57:20.150]You have a hundred or a thousand or ten thousand
- [00:57:23.010]of these deleterious mutations across the genome
- [00:57:26.030]that you're keeping track of in each of the individuals,
- [00:57:28.550]and you're adding those together.
- [00:57:31.140]What we have not done is of course
- [00:57:33.565]fused those deleterious mutations
- [00:57:36.520]at a given protein necessarily with the expression.
- [00:57:41.210]Okay. We certainly can, we can put them
- [00:57:44.000]in the same matrix if we want, but I think
- [00:57:47.657]we'd like to be a little bit more
- [00:57:51.910]sophisticated than that.
- [00:57:54.860]Although being unsophisticated, I think, is fine to do also.
- [00:57:58.952]So, if it gets you something that works for right now, do it.
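The "same matrix" idea from this answer can be sketched as follows. This is a minimal illustration, not the group's actual pipeline: it assumes one relative expression value per gene per individual and a simple additive count of deleterious alleles, appended as one extra column. All names and numbers are hypothetical.

```python
from typing import Dict, List


def build_feature_row(expression: Dict[str, float],
                      deleterious_dosage: Dict[str, int],
                      genes: List[str]) -> List[float]:
    """Return one row: a value per gene, then the summed deleterious burden."""
    # One relative expression value per gene, in a fixed gene order.
    row = [expression[gene] for gene in genes]
    # Assumption: burden is the plain sum of deleterious allele counts.
    burden = float(sum(deleterious_dosage.values()))
    return row + [burden]


genes = ["gene_a", "gene_b", "gene_c"]  # stands in for ~30,000 genes

# Hypothetical individuals with made-up values, for illustration only.
individuals = {
    "line_1": {"expression": {"gene_a": 0.2, "gene_b": -1.0, "gene_c": 0.5},
               "dosage": {"site_1": 2, "site_2": 0}},
    "line_2": {"expression": {"gene_a": -0.4, "gene_b": 0.3, "gene_c": 0.0},
               "dosage": {"site_1": 0, "site_2": 1}},
}

# Rows are individuals; columns are genes plus one burden column.
matrix = [
    build_feature_row(data["expression"], data["dosage"], genes)
    for data in individuals.values()
]

for name, row in zip(individuals, matrix):
    print(name, row)
```

A matrix like this could then be passed to whatever genomic prediction model is already in use. A more sophisticated version would combine expression and deleterious mutations per protein rather than concatenating them.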
- [00:58:03.336]Thank you very much.
- [00:58:04.830]Ed, I think it's time to end this seminar.
- [00:58:09.330]So thanks a lot for this, for your talk.
- [00:58:11.750]It was excellent.