Integrating Testing Optimization Frameworks into Soybean Breeding at UNL
Arthur Bernardeli, Soybean Breeding Research, PhD Candidate in Plant Breeding and Genetics, Department of Agronomy and Horticulture, University of Nebraska-Lincoln
Author
09/08/2025
Added
Description
This presentation highlights innovations to enhance soybean breeding efficiency at UNL. Topics include optimizing multi-environment trials through implementing genomic sparse testing designs, TPE and hub definitions, advancing drone-based phenotyping for maturity, and software development to streamline analytical pipelines. Together, these approaches strengthen decision-making, improve resource allocation, and accelerate genetic gain.
Searchable Transcript
- [00:00:00.720]The following presentation is
- [00:00:02.090]part of the Agronomy and Horticulture
- [00:00:04.650]Seminar Series
- [00:00:05.760]at the University of Nebraska-Lincoln.
- [00:00:08.940]Welcome to the first seminar of
- [00:00:11.250]the fall 2025
- [00:00:13.040]seminar series for Agronomy and
- [00:00:15.040]Horticulture. Thank you guys
- [00:00:17.080]for joining us. This semester,
- [00:00:19.600]the committee has put together
- [00:00:21.330]an excellent lineup of speakers.
- [00:00:23.280]We really tried to focus on
- [00:00:25.840]speakers that we thought would
- [00:00:27.350]fit the theme of science as a
- [00:00:28.670]public value, understanding
- [00:00:30.190]that, you know, we are at a
- [00:00:32.000]public university with a prerogative
- [00:00:34.490]to serve the public and trying
- [00:00:36.450]to pick speakers that we
- [00:00:38.050]thought embodied this and
- [00:00:39.760]hopefully will reflect on this
- [00:00:41.780]theme and attitude through the
- [00:00:43.810]semester.
- [00:00:44.960]So the first one is a graduate
- [00:00:47.280]student from George Graef's
- [00:00:49.600]breeding program, Arthur Bernardeli.
- [00:00:52.920]And Arthur was previously from
- [00:00:56.080]Brazil.
- [00:00:57.220]He's from Brazil, not
- [00:00:58.180]previously from Brazil.
- [00:00:59.400]He's from Brazil, where he did
- [00:01:02.180]his bachelor's and master's at
- [00:01:04.960]the Federal University of
- [00:01:07.530]Viçosa.
- [00:01:08.620]And he joined here in 2022,
- [00:01:11.910]first as a visiting scholar and
- [00:01:14.660]then stayed on to do a PhD with
- [00:01:17.100]Dr. Graef's program, where he's
- [00:01:20.350]been spending time really
- [00:01:22.680]optimizing the program for
- [00:01:25.120]getting it set up for data
- [00:01:27.360]optimization.
- [00:01:29.140]And so today he's going to
- [00:01:31.030]present on that work. And
- [00:01:33.180]please help me in welcoming
- [00:01:35.200]Arthur.
- [00:01:36.120]Thanks.
- [00:01:38.620]Thanks Dr. Rice for introducing
- [00:01:41.160]me and thanks for inviting me
- [00:01:43.260]to present part of my PhD
- [00:01:45.000]research and part of what I do
- [00:01:47.080]in our program here in the
- [00:01:48.940]first seminar of the semester.
- [00:01:51.620]And thanks to the audience here,
- [00:01:53.600]I see a pretty diverse group in
- [00:01:55.380]terms of scientific background.
- [00:01:57.620]So I really hope that you find
- [00:01:59.360]this seminar interesting and
- [00:02:01.190]that you have a lot of
- [00:02:02.420]questions to ask at the end of
- [00:02:04.160]the presentation.
- [00:02:05.620]So, I'm Arthur Bernardeli, PhD
- [00:02:07.650]candidate and research analyst
- [00:02:09.660]in the soybean breeding program
- [00:02:11.670]led by Dr. George Graef.
- [00:02:13.500]And today I'm going to present
- [00:02:15.300]a seminar entitled Integrating
- [00:02:17.260]Testing Optimization Frameworks
- [00:02:19.360]into Soybean Breeding at UNL.
- [00:02:21.460]So, I'm going to briefly cover
- [00:02:23.580]the outline of our soybean
- [00:02:25.400]breeding program and then go
- [00:02:27.420]through the specifics of the
- [00:02:29.360]enviromics, genomics, and phenomics
- [00:02:32.150]in our program.
- [00:02:34.020]Enviromics is the use of high
- [00:02:35.660]throughput environmental data
- [00:02:37.650]in the context of plant
- [00:02:39.040]breeding. Genomics is the
- [00:02:40.730]deployment of genome-wide markers,
- [00:02:42.690]also in the context of plant
- [00:02:44.100]breeding. And phenomics
- [00:02:46.220]is the use of high throughput
- [00:02:47.960]phenotyping data from drone
- [00:02:49.720]photos also to be used in our
- [00:02:51.430]soybean breeding program.
- [00:02:54.580]And my objective here is that
- [00:02:56.310]at the end of the presentation,
- [00:02:58.210]you can picture a frame of how
- [00:03:00.080]those modern tools can be
- [00:03:01.840]integrated into not just a
- [00:03:03.660]soybean breeding pipeline, but
- [00:03:05.990]any crop species' breeding
- [00:03:07.850]pipeline.
- [00:03:08.860]So talking a little bit about
- [00:03:10.660]soybean breeding, but breeding
- [00:03:12.780]in general, by definition,
- [00:03:14.670]breeding is like an endless
- [00:03:16.410]cycle of creating variability
- [00:03:18.380]and selecting variability.
- [00:03:20.460]And eventually, what is
- [00:03:22.460]selected is released as a
- [00:03:24.500]cultivar or returned to the
- [00:03:26.690]crossing block to begin a new
- [00:03:29.440]cycle of breeding, of recombination.
- [00:03:32.500]And plant breeding is a pretty
- [00:03:35.150]diverse area in terms of
- [00:03:37.270]discipline.
- [00:03:38.720]It integrates statistics, agronomy,
- [00:03:41.150]plant pathology, genetics, and
- [00:03:43.190]all sorts of other areas of
- [00:03:44.760]science to help us to achieve
- [00:03:46.460]our breeding objectives.
- [00:03:49.320]In talking about breeding
- [00:03:50.860]objectives, there are common
- [00:03:52.690]breeding objectives across many
- [00:03:54.660]different crop species.
- [00:03:56.360]Let's take, for
- [00:03:57.830]example, wheat and soybeans.
- [00:03:59.480]For both of those crops, from the
- [00:04:01.180]breeding perspective, the
- [00:04:02.700]objective is to improve yield,
- [00:04:04.440]seed composition traits, and
- [00:04:06.120]resilience to stresses.
- [00:04:07.700]Here at the soybean breeding
- [00:04:09.620]program at UNL, our main
- [00:04:11.300]objective is to improve and to
- [00:04:13.300]develop high-yielding varieties
- [00:04:15.620]for farmers and soybean
- [00:04:17.220]industry,
- [00:04:18.140]and also enhance our germplasm
- [00:04:20.400]for compositional quality for
- [00:04:22.660]high protein, high oil, and
- [00:04:24.560]carbohydrates,
- [00:04:25.520]and the balance between those
- [00:04:26.950]three, and protect our improved
- [00:04:29.120]germplasm and enhanced quality
- [00:04:31.260]germplasm
- [00:04:32.040]against soybean cyst nematode,
- [00:04:34.180]Phytophthora, frogeye leaf
- [00:04:36.100]spot, soybean mosaic virus,
- [00:04:38.200]gall midge, iron deficiency
- [00:04:40.020]chlorosis, and drought stress.
- [00:04:44.200]So just a quick outline of our
- [00:04:46.240]breeding program.
- [00:04:47.820]The first half of our breeding
- [00:04:49.880]program is what we call the
- [00:04:51.690]nursery stages,
- [00:04:52.840]where we create variability and
- [00:04:54.050]where we sample variability.
- [00:04:55.300]Those are real numbers from
- [00:04:57.030]2024, where we created 262 new
- [00:05:00.040]populations.
- [00:05:01.180]We advanced 230 populations
- [00:05:03.610]from F3 through F4.
- [00:05:05.300]And we planted more than 26,000
- [00:05:08.610]progeny rows across Nebraska
- [00:05:10.940]and South America.
- [00:05:13.380]And those progeny rows, they're
- [00:05:15.820]selected and they go to those
- [00:05:17.910]yield trials where we plant
- [00:05:19.940]every year and select every
- [00:05:22.280]year more than 30,000 plots. We
- [00:05:25.420]plant and we harvest more than
- [00:05:26.980]30,000 plots in a multi-location
- [00:05:29.410]and multi-year testing
- [00:05:30.790]framework.
- [00:05:31.540]And we must have that multi-location
- [00:05:34.160]testing framework because we
- [00:05:36.300]want to replicate the growing
- [00:05:38.430]conditions where we want to
- [00:05:40.530]target our recommendations and
- [00:05:42.680]our selections.
- [00:05:44.380]And that second part of our
- [00:05:46.080]pipeline, the second half of
- [00:05:47.980]our pipeline, is where most or
- [00:05:49.920]all of my work here is focused
- [00:05:51.730]on optimizing those tests from
- [00:05:53.710]preliminary trials that are
- [00:05:55.530]selected to the three years of
- [00:05:57.490]advanced trials and also to the
- [00:05:59.400]regional trials within the last
- [00:06:01.700]two years of the advanced
- [00:06:03.120]trials.
- [00:06:05.620]So back to talking about the
- [00:06:07.690]nurseries, the winter nurseries,
- [00:06:10.480]we have nurseries here in the
- [00:06:12.640]Caribbean and in South America.
- [00:06:15.400]And having those in Caribbean
- [00:06:17.100]and South America allows us to
- [00:06:18.790]do year-round breeding where
- [00:06:20.480]during summer we have crosses F1,
- [00:06:22.560]F3, and F4 stages here in
- [00:06:24.390]Lincoln.
- [00:06:25.160]And during winter, we have
- [00:06:27.150]crossing blocks, the F1,
- [00:06:29.670]F2, and F3 generations in
- [00:06:32.320]Puerto Rico, and progeny row
- [00:06:34.580]stages in Chile.
- [00:06:36.140]So we can advance up to five
- [00:06:37.690]different breeding stages in
- [00:06:39.470]those winter nurseries.
- [00:06:41.180]And that really speeds up our
- [00:06:43.160]breeding process by a lot,
- [00:06:45.140]actually.
- [00:06:46.160]So talking about the second
- [00:06:48.110]half, the multi-location field
- [00:06:50.460]testing.
- [00:06:52.520]We run our field tests across
- [00:06:54.590]the whole maturity group 2 and
- [00:06:56.830]maturity group 3 area because we
- [00:06:59.290]really want to target our
- [00:07:01.510]recommendations for the whole
- [00:07:03.330]maturity group 2 and 3 zones for
- [00:07:05.830]soybeans.
- [00:07:06.960]And to do so, we have an
- [00:07:08.400]expanded trial network inside
- [00:07:10.400]Nebraska and also outside
- [00:07:12.100]Nebraska.
- [00:07:13.020]Inside Nebraska, we carry out
- [00:07:14.770]planting, harvesting, and
- [00:07:16.450]evaluating all those plots.
- [00:07:18.420]Outside Nebraska, we hire
- [00:07:20.560]private testing services and we
- [00:07:23.090]are also part of a coordinated
- [00:07:25.410]cross-institutional trial
- [00:07:27.650]network between universities
- [00:07:30.100]and USDA. And since we have
- [00:07:32.370]like 20 environments that we've
- [00:07:35.050]been sampling since 2020, it's
- [00:07:38.060]more than likely that we have
- [00:07:40.110]lots of different climatic and
- [00:07:42.340]weather variations across the
- [00:07:44.600]whole maturity group 2 and maturity
- [00:07:47.270]group 3
- [00:07:48.420]region. So there's
- [00:07:49.940]a lot of difference in
- [00:07:50.920]temperature that correlates
- [00:07:52.300]with the maturity
- [00:07:53.220]group zones and also with the
- [00:07:55.160]latitude. But more interesting
- [00:07:57.510]than that, we have lots of
- [00:07:59.310]variations
- [00:08:00.220]in rainfall in the whole
- [00:08:01.420]maturity group two and three
- [00:08:02.860]area. And we have more
- [00:08:04.060]differences in
- [00:08:04.920]precipitation inside Nebraska
- [00:08:06.160]than across the whole maturity
- [00:08:07.350]group two and three area.
- [00:08:08.460]And another interesting thing
- [00:08:10.750]that I like to mention is that
- [00:08:12.840]between Lincoln and Scottsbluff,
- [00:08:15.560]the difference in elevation is
- [00:08:17.260]more remarkable than between
- [00:08:18.860]New York and Lincoln.
- [00:08:20.200]So that also contributes and
- [00:08:22.140]plays a big role in those
- [00:08:23.800]differences in weather
- [00:08:25.390]conditions
- [00:08:26.280]that we're exposed to when
- [00:08:27.980]testing our genotypes.
- [00:08:29.740]So investigating those
- [00:08:31.730]locations and characterizing
- [00:08:34.300]those locations in terms of
- [00:08:36.590]weather
- [00:08:37.340]and seeing if we can cluster them
- [00:08:39.040]is relevant to see how our
- [00:08:40.510]testing pipeline can leverage
- [00:08:42.360]that type of information.
- [00:08:45.560]And to do so, I'm going to
- [00:08:47.420]answer three specific questions
- [00:08:49.900]here, in what marks the
- [00:08:52.040]first part of my PhD research:
- [00:08:54.560]how can locations be
- [00:08:56.290]grouped? Which sites can serve
- [00:08:58.190]as testing hubs? And how does
- [00:08:59.930]that impact selection?
- [00:09:01.560]And I'm going to answer that
- [00:09:02.710]mostly with the enviromics, but
- [00:09:04.110]not only with the enviromics.
- [00:09:05.560]I'm also going to integrate
- [00:09:07.670]that with phenotypes, with genotype
- [00:09:10.610]by environment interaction
- [00:09:12.800]information, to make very
- [00:09:14.920]strong outcomes that can
- [00:09:16.850]contribute to the longevity of
- [00:09:19.230]our program.
- [00:09:20.560]So the first question: how can
- [00:09:22.700]locations be grouped?
- [00:09:24.560]We propose a hybrid
- [00:09:25.710]methodology. When I say hybrid,
- [00:09:27.610]it's because we merged two
- [00:09:29.010]different things into just one
- [00:09:30.660]framework.
- [00:09:32.560]So location clustering has been
- [00:09:35.320]a topic of research for several
- [00:09:37.980]years, but researchers, they
- [00:09:40.700]either cluster those locations
- [00:09:42.620]based on phenotypes only or
- [00:09:44.230]based on environmental data
- [00:09:45.930]only.
- [00:09:46.520]So we merged those two methodologies
- [00:09:48.770]because we think that there
- [00:09:50.440]might still be strong genotype
- [00:09:52.310]by environment interaction,
- [00:09:54.210]even in environments that are
- [00:09:56.030]correlated.
- [00:09:57.140]So we really think it's
- [00:09:58.450]relevant to merge those two
- [00:09:59.910]methodologies.
- [00:10:01.260]And to do so, we're using the
- [00:10:02.850]high-yield elite germplasm from
- [00:10:04.720]the soybean breeding program of
- [00:10:06.530]the University of Nebraska that's
- [00:10:08.530]been tested from 2020 through
- [00:10:10.340]2024, until last year,
- [00:10:12.100]in all those 20 locations, and
- [00:10:14.880]that totals more than 22,000
- [00:10:18.130]total plots that were tested
- [00:10:20.350]for more than 3,000 genotypes
- [00:10:23.510]across maturity groups 2 and 3,
- [00:10:27.500]in 37 unique environments. When
- [00:10:29.460]I say environments, it's the
- [00:10:30.980]combinations of locations and
- [00:10:32.570]years. And that totals more
- [00:10:34.360]than 15,000 unique genotype by
- [00:10:37.050]environment interaction values.
- [00:10:39.380]And that's the phenotype part.
- [00:10:41.410]For the environmental part, we
- [00:10:43.420]retrieve weather variables
- [00:10:45.180]from NASA, and
- [00:10:46.990]those variables, they are geotagged
- [00:10:49.300]and they represent the whole
- [00:10:51.060]growing season on a daily basis.
- [00:10:55.440]So we're using a platform
- [00:10:57.530]called EnvRtype to retrieve data
- [00:10:59.860]from NASA by providing the correct
- [00:11:02.040]coordinates (longitude and
- [00:11:03.530]latitude),
- [00:11:04.200]planting and harvest dates, and
- [00:11:06.220]location names. And that
- [00:11:08.080]platform can retrieve a pretty
- [00:11:10.190]big data set of 29
- [00:11:11.640]environmental variables for
- [00:11:13.690]each combination of location
- [00:11:15.780]and day and year. So it's a
- [00:11:17.560]pretty big
- [00:11:19.180]data set. But since we have 24
- [00:11:22.200]variables, and those 24
- [00:11:24.530]variables come from
- [00:11:26.940]primarily four unique variables
- [00:11:30.260]that relate to weather
- [00:11:33.090]conditions (precipitation,
- [00:11:36.220]radiation, wind, and
- [00:11:37.630]temperature), there's strong
- [00:11:39.330]collinearity and correlation
- [00:11:41.050]between
- [00:11:42.060]variables that we have
- [00:11:44.330]to handle. If we use the data
- [00:11:46.500]as it is, like the raw data,
- [00:11:48.700]it might lead us to wrong
- [00:11:49.810]assumptions. So we have to
- [00:11:51.140]investigate collinearity and
- [00:11:52.580]try to circumvent
- [00:11:53.500]that. For example, here we have
- [00:11:56.270]some of
- [00:11:58.540]those climate features that are
- [00:12:01.180]not strongly correlated. Some
- [00:12:02.880]of those are very strongly
- [00:12:04.220]correlated. For example, here
- [00:12:06.010]we have
- [00:12:06.300]almost a perfect correlation
- [00:12:07.840]between total precipitation and
- [00:12:09.560]precipitation minus evapotranspiration
- [00:12:11.820]because total precipitation is
- [00:12:13.560]part of the formula that
- [00:12:14.860]estimates precipitation minus evapotranspiration.
- [00:12:17.900]And that's something that we
- [00:12:19.870]have to circumvent.
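The collinearity described above can be illustrated with a small sketch. The numbers are synthetic and the 0.95 cutoff is an assumption chosen only for illustration; neither comes from the talk. It shows that when precipitation is one of the terms used to compute precipitation minus evapotranspiration, the two covariates can be almost perfectly correlated.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic seasonal totals in mm for six hypothetical locations (not data from the talk).
precip = [310.0, 420.0, 515.0, 605.0, 480.0, 350.0]
etp = [560.0, 545.0, 530.0, 520.0, 540.0, 555.0]

# The derived covariate contains precipitation as one of its terms.
p_minus_etp = [p - e for p, e in zip(precip, etp)]

r = pearson(precip, p_minus_etp)
THRESHOLD = 0.95  # illustrative cutoff, an assumption
is_redundant = abs(r) > THRESHOLD
print(f"r = {r:.3f}, redundant = {is_redundant}")
```

A pair flagged this way would typically be represented by only one of its members in later steps.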
- [00:12:21.480]To handle collinearity,
- [00:12:24.440]I plotted all the variables in
- [00:12:27.450]a PCA biplot,
- [00:12:28.680]where I was able to group them
- [00:12:30.250]in five different groups.
- [00:12:31.880]And those groups are water
- [00:12:33.290]availability, climatic extremes,
- [00:12:35.420]thermal and radiation stress,
- [00:12:36.720]radiation and air dryness,
- [00:12:37.900]and physiological radiation.
- [00:12:39.340]And after that, I ranked those
- [00:12:40.750]variables
- [00:12:41.240]in terms of their importance
- [00:12:43.340]relative
- [00:12:45.420]to our response variable, which
- [00:12:46.870]is grain yield.
- [00:12:47.900]And the ones that are more
- [00:12:49.370]important to grain yield
- [00:12:53.570]are those highlighted in
- [00:12:55.530]red. They are long-wave
- [00:12:57.230]radiation, photosynthetically
- [00:12:59.240]active radiation, normal irradiance,
- [00:13:01.820]temperature-weighted photosynthetically
- [00:13:04.350]active radiation, minimum air
- [00:13:06.640]temperature, maximum air
- [00:13:07.780]temperature, number of clear
- [00:13:09.670]days, number of frost days, and
- [00:13:11.660]precipitation minus potential
- [00:13:13.520]evapotranspiration.
- [00:13:15.140]And I ranked those based on a
- [00:13:16.630]partial least squares
- [00:13:17.910]regression, where yield was our
- [00:13:19.710]response variable. So we went
- [00:13:21.910]from a pretty big raw data set
- [00:13:24.380]to a data set that was averaged
- [00:13:26.980]by location, with 20 rows, each
- [00:13:29.930]row representing one location,
- [00:13:32.650]and the 10 most relevant weather
- [00:13:35.900]variables.
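One way to picture the ranking step is the first weight vector of a single-response partial least squares fit, where each standardized covariate is weighted by its covariance with standardized yield. The sketch below makes several assumptions: the talk does not say whether the ranking used weights, VIP scores, or more than one component, and the covariate names and values are invented.

```python
import math

def standardize(col):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def pls1_first_weights(covariates, response):
    """First PLS1 weight vector: w proportional to X'y on standardized data."""
    y = standardize(response)
    raw = {
        name: sum(a * b for a, b in zip(standardize(col), y))
        for name, col in covariates.items()
    }
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {name: v / norm for name, v in raw.items()}

# Synthetic location-level averages for six hypothetical locations.
covariates = {
    "p_minus_etp": [-250.0, -200.0, -150.0, -100.0, -50.0, 0.0],
    "tmax": [31.0, 30.0, 30.0, 29.0, 28.0, 28.0],
    "frost_days": [2.0, 0.0, 3.0, 1.0, 2.0, 1.0],
}
grain_yield = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5]  # t/ha, made up

weights = pls1_first_weights(covariates, grain_yield)
ranked = sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
print(ranked)
```

With a single component this ranking coincides with ranking by absolute correlation with yield; additional components would refine it.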
- [00:13:38.840]The second part is how to
- [00:13:40.050]integrate genotype by
- [00:13:41.090]environment signals into that
- [00:13:42.320]matrix.
- [00:13:42.880]Genotype by environment signals
- [00:13:44.640]are of a totally
- [00:13:45.770]different nature compared
- [00:13:47.300]to those environmental
- [00:13:48.500]variables.
- [00:13:49.300]The genotype by environment
- [00:13:51.130]signals, we can't just average
- [00:13:53.190]them by location as we did with
- [00:13:55.110]environmental variables.
- [00:13:57.100]Because if we do that, we're
- [00:13:58.800]actually neglecting the
- [00:14:00.250]assumption of a nonlinear
- [00:14:01.730]relationship between genotypes
- [00:14:03.700]and environments.
- [00:14:06.380]And that assumption has been
- [00:14:08.010]valid for more than 100 years
- [00:14:10.050]and accepted by the breeding
- [00:14:11.680]community for more than 100
- [00:14:13.660]years.
- [00:14:14.400]So what I did, I integrated a
- [00:14:16.140]factor analytic effect in our
- [00:14:18.000]adjustment model where I was
- [00:14:19.740]able to extract genotype loadings
- [00:14:21.850]and genotype scores and
- [00:14:23.390]location loadings.
- [00:14:24.860]Those loadings, they represent
- [00:14:26.410]how much of the total genotype
- [00:14:27.780]by environment interaction that
- [00:14:29.330]is accounted for by each location.
- [00:14:30.980]So those loadings, I also call
- [00:14:32.550]them genotype by environment
- [00:14:34.070]location signals.
- [00:14:35.920]And they form
- [00:14:39.340]a vector of order 20 by 1:
- [00:14:40.920]20 is the number of locations,
- [00:14:45.560]and one is the column of genotype
- [00:14:48.100]by environment signals.
- [00:14:50.340]So now I can merge those two
- [00:14:52.920]matrices,
- [00:14:54.160]and the data is ready for clustering,
- [00:14:56.020]for understanding better how
- [00:14:57.020]those environments,
- [00:14:57.800]they group and explain that in
- [00:15:00.230]the context
- [00:15:01.140]of our testing network.
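As a rough sketch of the merge described above, the location-level covariate table and the single column of genotype by environment location signals can be joined by location name into one feature table. The location names appear in the talk, but every value below is invented, and the two-covariate table stands in for the ten selected variables.

```python
# Hypothetical standardized covariates per location (two columns instead of ten).
env_covariates = {
    "Lincoln": [0.8, -0.3],
    "Phillips": [0.1, 0.2],
    "Scottsbluff": [-1.4, 0.4],
}

# Hypothetical factor-analytic location loadings (one value per location).
gxe_signal = {"Lincoln": 0.52, "Phillips": 0.47, "Scottsbluff": -0.31}

def merge_features(covariates, signal):
    """Append the GxE signal as an extra column, matching rows by location."""
    mismatched = set(covariates) ^ set(signal)
    if mismatched:
        raise ValueError(f"locations missing from one table: {sorted(mismatched)}")
    return {loc: covariates[loc] + [signal[loc]] for loc in sorted(covariates)}

features = merge_features(env_covariates, gxe_signal)
for loc, row in features.items():
    print(loc, row)
```

Because the covariates and the loadings live on different scales, standardizing each column before computing distances is a common precaution.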
- [00:15:03.340]So before doing the formal clustering,
- [00:15:05.700]I ran this exploratory analysis
- [00:15:08.870]just to see how many clusters I
- [00:15:11.790]would expect per maturity group
- [00:15:15.080]by minimizing the sum of
- [00:15:16.080]squares of the Euclidean
- [00:15:17.130]distances between them.
- [00:15:18.340]So for maturity group 2, I
- [00:15:20.310]should expect three
- [00:15:21.850]clusters, and for maturity
- [00:15:23.610]group 3, four clusters.
- [00:15:25.580]And then I grouped them, and
- [00:15:27.330]after the hierarchical clustering,
- [00:15:29.680]I could see that for maturity
- [00:15:31.440]group 2 locations,
- [00:15:33.460]I had three clusters, and for
- [00:15:34.880]maturity group three locations,
- [00:15:36.550]I had four clusters.
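The two steps above, checking how the within-cluster sum of squares drops as clusters are added and then cutting a hierarchical tree, can be sketched as follows. The linkage rule used in the talk is not stated, so average linkage is an assumption here, and the six two-dimensional points are made up.

```python
import math

def euclid(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, k):
    """Average-linkage agglomerative clustering down to k clusters of indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    euclid(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

def within_ss(points, clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    dim = len(points[0])
    total = 0.0
    for members in clusters:
        centroid = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
        total += sum(euclid(points[i], centroid) ** 2 for i in members)
    return total

# Six hypothetical locations described by two standardized features.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (10.0, 0.0), (9.8, 0.3)]

wss = {k: within_ss(points, agglomerate(points, k)) for k in range(1, 5)}
clusters3 = sorted(sorted(c) for c in agglomerate(points, 3))
print(wss)
print(clusters3)
```

The number of clusters is then read off where adding another cluster stops reducing the within-cluster sum of squares by much.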
- [00:15:37.700]And from there on, the results
- [00:15:39.430]started to make a lot of
- [00:15:40.960]sense.
- [00:15:41.580]So this is the map of the
- [00:15:42.990]target population of
- [00:15:44.270]environments.
- [00:15:45.440]For maturity group two
- [00:15:46.850]locations, I have this target
- [00:15:48.710]population of environment one,
- [00:15:50.820]target population of
- [00:15:51.770]environment two, and target
- [00:15:52.960]population of environment three.
- [00:15:54.580]For maturity group three, I had
- [00:15:56.500]four target populations of
- [00:15:58.070]environments.
- [00:15:59.200]So the assumption that we can
- [00:16:00.640]make from that is that within
- [00:16:02.080]each cluster,
- [00:16:02.880]the weather variability is
- [00:16:04.290]minimized, and also the genotype
- [00:16:06.060]by environment interaction is
- [00:16:07.660]minimized, but across clusters
- [00:16:09.500]quite the opposite: the genotype
- [00:16:11.300]by environment interaction and
- [00:16:12.920]also the variability in
- [00:16:14.420]the environmental features are
- [00:16:15.980]maximized. And
- [00:16:16.820]the next question is:
- [00:16:19.660]So now that we understand
- [00:16:20.810]those 20 locations better,
- [00:16:22.350]should we test in those 20
- [00:16:23.740]locations, or do we have more
- [00:16:25.120]informative locations within
- [00:16:26.560]each cluster that represent
- [00:16:28.300]those
- [00:16:28.640]20 locations and represent each
- [00:16:30.390]of those target populations of
- [00:16:31.850]environments? And that's
- [00:16:33.200]answering the second question:
- [00:16:34.780]which sites can serve as
- [00:16:36.250]testing hubs?
- [00:16:37.360]So some researchers from CIMMYT,
- [00:16:39.610]working with a wheat data set,
- [00:16:41.560]they clustered environments
- [00:16:43.650]based on environmental data and
- [00:16:46.270]at the end
- [00:16:46.840]they suggested that as a
- [00:16:47.980]potential future research,
- [00:16:49.360]they should indicate
- [00:16:51.860]which sites would serve as
- [00:16:53.310]pivot locations. I call those
- [00:16:55.100]testing hubs. And to answer
- [00:16:56.900]that question,
- [00:16:57.940]I kind of adapted a
- [00:17:00.240]framework from
- [00:17:02.940]genomics. In
- [00:17:04.160]genomics, we construct genomic
- [00:17:06.170]relationship matrices, so right
- [00:17:08.290]here I
- [00:17:08.740]constructed an environmental or
- [00:17:10.600]weather relationship matrix, or
- [00:17:12.450]weather kinship
- [00:17:13.540]matrix, based on the
- [00:17:14.990]environmental covariates and
- [00:17:17.360]genotype by environment
- [00:17:19.320]signals. This is an example
- [00:17:22.000]for maturity group 3, where
- [00:17:24.900]I had 15
- [00:17:26.060]locations. So I had a
- [00:17:27.470]relationship matrix of 15 by 15,
- [00:17:30.180]15 representing the number of
- [00:17:32.340]locations.
- [00:17:33.340]And to choose which locations
- [00:17:35.100]would better represent those
- [00:17:36.940]locations,
- [00:17:37.700]I defined the criterion that I
- [00:17:40.840]really wanted to have an
- [00:17:43.510]average of one hub per target
- [00:17:46.690]population
- [00:17:48.000]of environments. So that says
- [00:17:50.180]that I really wanted to
- [00:17:51.720]identify four hubs in total for
- [00:17:53.820]maturity group
- [00:17:55.000]three. And I evaluated all possible kinship
- [00:17:57.640]matrices of non-hub environments.
- [00:18:00.040]And when the prediction
- [00:18:01.680]error variance was minimized,
- [00:18:03.660]that meant that the four
- [00:18:05.280]locations left out of
- [00:18:07.380]those 11 non-hub locations were the
- [00:18:09.310]ones chosen to represent
- [00:18:11.090]the hub environments. And after
- [00:18:13.820]I ran that, I was able to
- [00:18:15.230]identify those hub locations in
- [00:18:17.120]different target populations of
- [00:18:19.160]environments for maturity
- [00:18:20.420]group 2. So I was able to
- [00:18:21.650]identify one here in this
- [00:18:22.810]target population
- [00:18:23.740]of environment, another one
- [00:18:25.160]here, and another one here, and
- [00:18:26.750]for maturity group 3, I was able
- [00:18:28.440]to identify four hub locations.
- [00:18:31.940]One in this target population
- [00:18:33.730]of environment, two here, none
- [00:18:35.760]here, and one here at Phillips
- [00:18:37.700]in Nebraska, which is on the
- [00:18:39.400]borderline between groups 2 and
- [00:18:41.430]3 testing areas.
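The hub search described above can be sketched as an exhaustive enumeration over candidate hub sets. The scoring rule here is an assumption: each candidate set is scored by the mean conditional variance of the non-hub locations given the hubs, computed from the kinship matrix as K_ii minus k_i' K_hh^-1 k_i, and the set with the lowest score is kept. The exact prediction error variance formula used in the talk is not given, and the 5 by 5 kinship matrix is invented.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def mean_pev(K, hubs):
    """Mean conditional variance of non-hub locations given the hub locations."""
    hubs = list(hubs)
    others = [i for i in range(len(K)) if i not in hubs]
    K_hh = [[K[a][b] for b in hubs] for a in hubs]
    total = 0.0
    for i in others:
        k_ih = [K[i][a] for a in hubs]
        w = solve(K_hh, k_ih)
        total += K[i][i] - sum(x * y for x, y in zip(k_ih, w))
    return total / len(others)

def best_hubs(K, n_hubs):
    """Try every hub set of the requested size and keep the lowest-scoring one."""
    return min(combinations(range(len(K)), n_hubs), key=lambda h: mean_pev(K, h))

# Hypothetical weather kinship among five locations: 0-2 are alike, 3-4 are alike.
K = [
    [1.0, 0.8, 0.6, 0.1, 0.1],
    [0.8, 1.0, 0.8, 0.1, 0.1],
    [0.6, 0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.1, 0.7, 1.0],
]

hubs = best_hubs(K, 2)
print(hubs, round(mean_pev(K, hubs), 4))
```

In this toy matrix the search picks the central location of the first group plus one location from the second group, which mirrors the idea of one hub per target population of environments. Exhaustive enumeration is only practical for small networks such as the 15 locations mentioned above.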
- [00:18:42.780]So those hub locations, they're
- [00:18:45.600]actually anchor sites.
- [00:18:47.880]They have low redundancy
- [00:18:50.970]between them.
- [00:18:52.880]They represent each target
- [00:18:53.880]population of environments and,
- [00:18:55.020]actually, the whole testing
- [00:18:57.000]network.
- [00:18:57.880]The genotype by environment
- [00:18:59.580]interaction and weather
- [00:19:01.050]information captured across those hubs,
- [00:19:03.880]under that criterion of minimizing
- [00:19:05.820]the relationship
- [00:19:07.570]between them, is actually
- [00:19:09.190]maximum.
- [00:19:09.880]The third question: How does
- [00:19:12.560]defining those hubs impact our
- [00:19:15.470]selection?
- [00:19:16.880]We can see that from evaluating gains
- [00:19:18.770]and stability. And that's what
- [00:19:20.810]I did. I evaluated gains and
- [00:19:22.720]stability using the same data
- [00:19:24.970]set that we used to generate
- [00:19:26.840]those previous results for
- [00:19:29.000]stratifying those environments.
- [00:19:31.020]And we could see that, as
- [00:19:32.360]expected, evaluating all the
- [00:19:34.140]environments, the genotypes in
- [00:19:35.960]all the environments provided
- [00:19:37.650]the best response to selection
- [00:19:39.340]or
- [00:19:39.560]gains, and also the best
- [00:19:41.400]estimates for stability. The
- [00:19:43.600]recommendations were more
- [00:19:45.400]stable, and
- [00:19:46.340]it gave us more selection gains.
- [00:19:50.580]However, although the testing
- [00:19:52.930]hubs
- [00:19:53.480]were fewer than the
- [00:19:55.340]Nebraska locations,
- [00:19:56.800]the testing hubs provided
- [00:19:59.070]better response to selection
- [00:20:01.680]or gains and stability when we
- [00:20:03.680]compared them
- [00:20:04.660]to just evaluating our genotypes
- [00:20:06.190]in Nebraska locations.
- [00:20:07.460]And to validate our
- [00:20:09.110]methodology,
- [00:20:10.240]we used an external data set
- [00:20:11.880]from the Northern Uniform Soybean
- [00:20:13.690]Trials
- [00:20:14.100]that was not used to generate
- [00:20:15.470]those results,
- [00:20:16.320]where we could see the same
- [00:20:18.590]pattern.
- [00:20:19.660]Hub locations were more
- [00:20:21.500]effective for gains and
- [00:20:23.330]stability
- [00:20:24.200]than just evaluating in
- [00:20:25.150]Nebraska locations,
- [00:20:26.040]although there were fewer
- [00:20:27.020]hub locations than Nebraska
- [00:20:28.590]locations.
- [00:20:29.360]So the takeaway of this first
- [00:20:31.220]part of the presentation
- [00:20:32.920]is that fewer hub locations
- [00:20:36.140]resulted in greater gains
- [00:20:37.550]and more stable selections
- [00:20:38.660]than Nebraska locations.
- [00:20:39.820]And those hub locations are
- [00:20:41.440]an alternative for an optimized
- [00:20:43.600]and informative testing network.
- [00:20:46.640]So that's the end of the first
- [00:20:49.760]part of my research.
- [00:20:52.260]And the second part of my
- [00:20:53.470]research
- [00:20:54.000]is going to talk about genomic
- [00:20:56.320]sparse testing.
- [00:20:57.880]So we talked a lot about
- [00:20:58.690]environments,
- [00:20:59.360]about genotype by environment
- [00:21:00.580]interaction.
- [00:21:01.280]So right here in the genomic sparse
- [00:21:02.730]test,
- [00:21:03.060]we can actually play with genotypes,
- [00:21:05.950]environments,
- [00:21:07.280]and genotype by environment
- [00:21:08.690]interaction
- [00:21:09.280]in the context of genomics.
- [00:21:13.600]So before going straight to sparse
- [00:21:15.480]testing, to genomic sparse
- [00:21:17.030]testing, I would like to define
- [00:21:18.770]what genomic selection is.
- [00:21:20.440]So genomic selection leverages
- [00:21:23.450]genome-wide information at a
- [00:21:26.160]nucleotide level and creates
- [00:21:29.430]relationship matrices that are
- [00:21:31.880]leveraged in prediction models
- [00:21:34.850]that are designed to predict
- [00:21:37.820]untested genotypes.
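A relationship matrix of the kind mentioned above can be built from marker data in several ways. The sketch below uses the widely cited VanRaden construction, G = ZZ' / (2 Σ p(1 - p)), which is an assumption, since the talk does not name the method. The four genotypes and five markers are invented, with markers coded as 0, 1, or 2 copies of one allele.

```python
def genomic_relationship(markers):
    """VanRaden-style genomic relationship matrix from 0/1/2 marker codes."""
    n = len(markers)
    m = len(markers[0])
    # Allele frequency of the counted allele at each marker.
    p = [sum(g[j] for g in markers) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    # Center each marker column by twice its allele frequency.
    Z = [[g[j] - 2 * p[j] for j in range(m)] for g in markers]
    return [
        [sum(a * b for a, b in zip(Z[i], Z[k])) / scale for k in range(n)]
        for i in range(n)
    ]

# Four hypothetical genotypes scored at five hypothetical markers.
markers = [
    [0, 1, 2, 0, 1],
    [0, 1, 2, 1, 1],
    [2, 1, 0, 2, 0],
    [2, 0, 0, 1, 2],
]

G = genomic_relationship(markers)
for row in G:
    print([round(v, 3) for v in row])
```

Genotypes that share more marker alleles get larger off-diagonal values, which is what lets a prediction model borrow information from tested relatives when predicting an untested genotype.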
- [00:21:40.720]So the first concept of genomic
- [00:21:42.860]selection (and I consider these
- [00:21:45.170]the two seminal papers for genomic
- [00:21:47.580]selection) started back in
- [00:21:49.800]1994, when Rex Bernardo, a
- [00:21:52.060]quantitative geneticist at the
- [00:21:54.220]University of Minnesota,
- [00:21:56.090]said that if someday there
- [00:21:58.130]were covariates that could
- [00:21:59.730]describe the whole genome,
- [00:22:01.600]conventional prediction models
- [00:22:03.790]could be adapted into genomic
- [00:22:05.730]prediction models, which
- [00:22:07.730]today we call GBLUP.
- [00:22:10.720]In 2001, seven years later,
- [00:22:14.020]when genome-wide markers were
- [00:22:16.680]already becoming a reality,
- [00:22:19.160]Meuwissen simulated some
- [00:22:20.410]prediction models, and he
- [00:22:21.930]proved that predicting untested
- [00:22:23.640]genotypes
- [00:22:24.300]could be done with a certain
- [00:22:26.770]level of reliability.
- [00:22:29.120]That reliability, we call that
- [00:22:31.200]the prediction accuracy or
- [00:22:32.960]predictive ability.
- [00:22:34.540]And today, genomic selection is
- [00:22:36.850]a reality.
- [00:22:38.060]It's being deployed at a large
- [00:22:39.800]scale in the industry, like in
- [00:22:41.660]seed companies and in plant and
- [00:22:43.430]animal breeding, but not at a
- [00:22:45.350]very large scale in public
- [00:22:47.580]breeding programs.
- [00:22:49.300]But as I said, it's a reality
- [00:22:51.120]and it's being deployed.
- [00:22:52.960]And the results are remarkable.
- [00:22:57.660]And it's recommended mostly
- [00:22:59.910]for quantitative traits such as
- [00:23:02.120]grain yield, protein and oil
- [00:23:04.180]concentration, etc.
- [00:23:06.020]Just to summarize what it is,
- [00:23:08.270]we use phenotypes in our
- [00:23:10.060]prediction models. We integrate
- [00:23:12.740]those phenotypes with genomic
- [00:23:15.100]information to predict
- [00:23:17.600]untested genotypes.
- [00:23:20.520]So just for the sake of
- [00:23:21.720]comparison, here's the
- [00:23:23.130]conventional testing. Let's say
- [00:23:25.130]that in our breeding pipeline,
- [00:23:26.940]we have 1,000 genotypes to be
- [00:23:29.500]tested in four environments,
- [00:23:30.870]which totals 4,000 plots.
- [00:23:33.700]We're going to obtain 4,000
- [00:23:35.070]yield values at the end of the
- [00:23:36.130]season.
- [00:23:36.540]With genomic selection, we have
- [00:23:38.240]those same 1,000 genotypes.
- [00:23:40.380]All of them are going to have
- [00:23:42.710]their marker information,
- [00:23:45.200]their DNA marker information.
- [00:23:46.560]But just 75% of them are going
- [00:23:51.050]to be planted
- [00:23:52.880]or are going to have their phenotypic
- [00:23:54.290]information available.
- [00:23:55.400]Those that have just the DNA
- [00:23:57.300]marker information available,
- [00:23:59.660]we call the testing or
- [00:24:01.250]cross-validation set,
- [00:24:04.880]and the ones that have both
- [00:24:06.480]data sets available, we call
- [00:24:08.300]the training set. And so
- [00:24:10.630]the training set, which is
- [00:24:13.410]75% of the total
- [00:24:15.670]material that we have, is going
- [00:24:18.320]to predict the other 25%, leveraged
- [00:24:21.450]by the genomic information here.
- [00:24:24.540]Our lab in 2022 already showed
- [00:24:27.960]how effective
- [00:24:31.860]genomic selection can be
- [00:24:33.430]relative to conventional
- [00:24:35.090]selection. It is at least 20%
- [00:24:37.470]more effective in terms of
- [00:24:39.060]gains compared to
- [00:24:40.850]conventional phenotypic
- [00:24:42.340]selection.
- [00:24:43.320]This genomic selection right
- [00:24:44.900]here, when we compare it to the sparse
- [00:24:46.650]design, I like to say that this
- [00:24:48.270]is the conventional genomic
- [00:24:49.790]selection, where we're just
- [00:24:51.240]predicting 25% of the lines.
- [00:24:54.960]So we have not yet finished the
- [00:24:56.460]investigation of the sparse
- [00:24:57.970]designs in the context of the
- [00:24:59.340]genomic selection in our
- [00:25:00.600]program.
- [00:25:01.240]But just to define what
- [00:25:03.670]sparse designs integrated with
- [00:25:06.220]genomic selection are: sparse
- [00:25:08.390]designs are actually partial
- [00:25:10.230]replicate designs where a
- [00:25:11.610]subset of lines is replicated
- [00:25:13.450]across environments.
- [00:25:14.940]But not every line is
- [00:25:16.030]replicated across every single
- [00:25:17.780]environment.
- [00:25:18.720]It's very sparse.
- [00:25:21.080]And the training set here is
- [00:25:23.510]relatively small compared to
- [00:25:26.040]the training set of the
- [00:25:28.000]conventional genomic selection.
- [00:25:31.320]It's actually much smaller than
- [00:25:33.130]that other training set.
- [00:25:34.740]So how do we plan to investigate
- [00:25:36.540]that? We're going to do that in
- [00:25:38.480]terms of the training set
- [00:25:40.040]composition.
- [00:25:41.200]If we're able to answer which
- [00:25:43.140]genotypes we want to include in
- [00:25:45.240]the training set,
- [00:25:46.740]how many genotypes we want
- [00:25:47.730]to include in the training set,
- [00:25:48.940]and which genotypes we want
- [00:25:50.350]to overlap or replicate across
- [00:25:51.740]environments,
- [00:25:52.620]we're pretty much
- [00:25:54.180]checking this first criterion
- [00:25:55.680]here of training set
- [00:25:56.700]composition.
- [00:25:57.580]We're also integrating in our
- [00:26:00.130]training set data from
- [00:26:02.080]historical yield trials,
- [00:26:04.500]which has mostly information
- [00:26:06.370]from advanced pipelines.
- [00:26:08.220]And that is going to check
- [00:26:10.200]those two other criteria of
- [00:26:12.370]late-stage trials in the
- [00:26:14.350]training set
- [00:26:15.460]and increased test
- [00:26:16.560]environments. And since we're
- [00:26:18.100]working with genomic
- [00:26:19.090]relationship matrices,
- [00:26:20.420]we're also checking this
- [00:26:21.540]criterion of integrating marker
- [00:26:23.000]by environment interaction. And
- [00:26:24.680]we're going
- [00:26:25.140]to go beyond that because we're
- [00:26:26.840]also using environmental covariates
- [00:26:28.880]for that. So we're
- [00:26:30.020]actually doing marker by
- [00:26:31.240]environmental covariate
- [00:26:32.470]interaction, which is a
- [00:26:33.840]little bit more complex,
- [00:26:35.220]but I'll show you it has some
- [00:26:36.750]positive results. So the main
- [00:26:38.580]two questions that I'm going to
- [00:26:40.330]answer
- [00:26:40.900]here today in the context of
- [00:26:42.740]genomic sparse testing are: what
- [00:26:44.660]is an appropriate training set
- [00:26:46.820]size and overlapping rate in
- [00:26:48.740]that sparse design, and does the
- [00:26:50.860]inclusion of the environmental
- [00:26:52.990]matrix
- [00:26:53.540]benefit the prediction accuracy?
- [00:26:56.390]And to do so, I'm going to use
- [00:26:58.600]the same data set that I used
- [00:27:01.040]to
- [00:27:01.220]stratify those environments in
- [00:27:02.720]the first part of my
- [00:27:03.650]presentation, but not all of
- [00:27:04.860]it. Here I'm just
- [00:27:05.780]going to include maturity
- [00:27:07.060]group 2; otherwise it's going
- [00:27:08.400]to be a lot of information for
- [00:27:10.660]today. So it's almost 2,000 genotypes;
- [00:27:14.370]more than 1,400 of them are
- [00:27:17.230]in the
- [00:27:18.900]preliminary
- [00:27:19.260]sets and almost 400 of them are
- [00:27:21.370]in the advanced sets in our
- [00:27:22.860]testing pipeline. They were
- [00:27:24.710]tested in 19
- [00:27:25.820]environments in all those
- [00:27:27.610]locations here for maturity
- [00:27:29.560]group 2, and they were genotyped
- [00:27:31.780]using
- [00:27:32.240]the molecular inversion probes 1K
- [00:27:34.670]SNP set that was developed here
- [00:27:36.870]by Dr. David Hyten's lab. Here's
- [00:27:40.000]the publication.
- [00:27:41.220]If you're
- [00:27:42.480]interested in learning more
- [00:27:44.130]about that genotyping
- [00:27:45.040]methodology, it's available in
- [00:27:46.520]that paper, and there's another
- [00:27:48.050]interesting paper about that.
- [00:27:49.520]So that's the data set that we
- [00:27:51.220]used to deploy our genomic
- [00:27:52.890]sparse testing.
- [00:27:53.840]So before going straight to
- [00:27:56.850]analyzing the data, we saw
- [00:27:59.680]that for most of our data
- [00:28:02.830]set
- [00:28:02.960]there was no clear structure,
- [00:28:05.060]which allowed us to move
- [00:28:06.800]forward and integrate the phenotypes,
- [00:28:09.840]the genomic relationship
- [00:28:11.050]matrices, and the environmental
- [00:28:12.640]data into a very simple
- [00:28:13.760]prediction
- [00:28:14.240]model, which is: phenotype
- [00:28:15.680]equals genotype plus
- [00:28:16.880]environment plus genotype by
- [00:28:18.440]environment
- [00:28:19.120]interaction. Before showing the
- [00:28:23.170]results of the sparse testing
- [00:28:26.550]framework, I ran an exploratory
- [00:28:30.400]analysis just to guide us when
- [00:28:32.900]we actually run these sparse
- [00:28:35.320]designs. So this is a four-fold
- [00:28:38.240]phenotypic and genomic
- [00:28:39.720]selection strategy where I ran
- [00:28:41.670]20 independent runs or replications
- [00:28:44.300]of the
- [00:28:44.800]same analysis considering 75%
- [00:28:47.410]of the lines in the training
- [00:28:49.400]set and 25% of them in the
- [00:28:51.470]testing set.
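
The resampling scheme just described (four folds, 20 independent runs, 75% training and 25% testing) can be sketched as below. This is a minimal illustration, not the analysis code used in the study: the function names, the seed, and the 100 hypothetical line IDs are assumptions, and the model-fitting step is left out. Pearson correlation between predicted and observed values is included because predictive ability is commonly reported that way.

```python
import random


def repeated_kfold(line_ids, n_folds=4, n_reps=20, seed=0):
    """Yield (training, testing) lists of line IDs for repeated k-fold CV.

    With n_folds=4, each testing set holds about 25% of the lines and the
    training set holds the remaining 75%.
    """
    rng = random.Random(seed)
    for _ in range(n_reps):
        shuffled = list(line_ids)
        rng.shuffle(shuffled)
        folds = [shuffled[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            testing = folds[f]
            training = [x for g, fold in enumerate(folds) if g != f for x in fold]
            yield training, testing


def pearson(x, y):
    """Pearson correlation, used here as the predictive ability."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical line IDs; a real run would fit the prediction model on
# `training` and correlate predicted with observed values in `testing`.
splits = list(repeated_kfold(range(100)))
```
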
- [00:28:52.720]And when I did that without the
- [00:28:54.800]genomic information, we saw
- [00:28:56.870]that the correlations
- [00:28:58.560]were negative, around minus 0.30.
- [00:29:01.340]So if the breeder selects untested
- [00:29:03.840]genotypes just relying on phenotypes,
- [00:29:06.880]the selections are going to be
- [00:29:08.690]wrong, and that's going to mess things up
- [00:29:10.480]downstream in the next years
- [00:29:12.480]of the selection pipeline. But
- [00:29:14.540]that usually doesn't happen
- [00:29:16.310]because
- [00:29:16.880]programs that rely mostly on
- [00:29:18.850]phenotypic selection don't
- [00:29:21.290]usually
- [00:29:24.560]predict untested genotypes
- [00:29:26.810]based just on phenotypes. But
- [00:29:29.120]when we included the
- [00:29:30.640]genomic relationship matrix, we
- [00:29:33.130]saw that the average prediction
- [00:29:35.460]accuracy was 0.55.
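
One common way to write the model behind this comparison is given below. It is offered as an assumption, since the talk only names the terms (phenotype equals genotype plus environment plus genotype by environment interaction). The symbols $\mathbf{G}$ (genomic relationship matrix), $\boldsymbol{\Omega}$ (environmental relationship matrix), and the variance components are notation introduced here, not taken from the slides.

```latex
y_{ij} = \mu + g_i + e_j + (ge)_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma^2_\varepsilon)

% Without genomic information, lines are treated as unrelated:
\mathbf{g} \sim N(\mathbf{0}, \mathbf{I}\,\sigma^2_g)

% With genomic information, related lines share information:
\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\,\sigma^2_g)

% One way an environmental relationship matrix could enter:
\mathbf{e} \sim N(\mathbf{0}, \boldsymbol{\Omega}\,\sigma^2_e)
```

Under the first assumption an untested line has neither data nor relatives to borrow from, so the model has no line-specific information for it, which is in line with the poor phenotype-only result. Under the second, the off-diagonal elements of $\mathbf{G}$ let tested relatives inform the untested line.
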
- [00:29:37.760]And why did I do that? It's
- [00:29:39.570]because this 75% training and
- [00:29:42.120]25% testing split is like a
- [00:29:44.320]benchmark for
- [00:29:45.520]sparse testing, considering that
- [00:29:47.350]sparse testing has a much
- [00:29:48.710]smaller training set size. So
- [00:29:50.520]if our
- [00:29:50.880]prediction accuracy is
- [00:29:52.580]around 0.55 for sparse designs,
- [00:29:54.730]it shows us that we're going
- [00:29:56.380]in the
- [00:29:56.800]right direction in
- [00:29:58.460]investigating our sparse
- [00:30:00.470]testing. So those are the sparse
- [00:30:03.100]testing results
- [00:30:04.480]that I have to present here so
- [00:30:06.390]far, where I plotted the
- [00:30:08.080]predictive ability or the
- [00:30:09.780]prediction
- [00:30:10.640]accuracy or reliability in
- [00:30:12.300]predicting untested genotypes
- [00:30:14.240]against training set size
- [00:30:15.830]according to
- [00:30:17.360]increasing overlapping rates.
- [00:30:18.770]When I say overlapping, it's
- [00:30:19.840]the replication of lines
- [00:30:20.880]across environments.
- [00:30:22.730]And I did that
- [00:30:24.470]using just the genomic
- [00:30:25.810]relationship matrix,
- [00:30:27.200]and using both the genomic
- [00:30:28.060]relationship matrix and the
- [00:30:29.380]environmental relationship
- [00:30:30.800]matrix.
- [00:30:31.280]And we could see that
- [00:30:32.360]regardless of the scenario, the
- [00:30:34.140]prediction accuracy increased
- [00:30:35.870]as the training
- [00:30:36.800]set size increased and as the
- [00:30:38.330]overlapping rates increased.
- [00:30:40.130]But right here, I actually
- [00:30:41.710]divided
- [00:30:42.320]those two plots into three
- [00:30:44.020]sections: the first section
- [00:30:45.980]from an accuracy of 0.35
- [00:30:48.500]up to 0.45,
- [00:30:49.680]the second one from 0.45 to 0.50,
- [00:30:52.320]and the third one from 0.50
- [00:30:54.120]and above. And you can see that
- [00:30:55.680]using
- [00:30:56.000]environmental information here
- [00:30:58.430]gave us better prediction
- [00:31:00.540]accuracies overall. We had more
- [00:31:03.040]points that
- [00:31:05.640]fell between 0.45 and 0.50 of
- [00:31:09.360]prediction accuracy, which
- [00:31:11.840]is beneficial. But if you ask
- [00:31:13.780]me just to pick one point to
- [00:31:15.440]recommend for a breeder to run
- [00:31:17.290]their tests,
- [00:31:18.020]in terms of the number of
- [00:31:19.430]individuals in the training set
- [00:31:21.770]and the overlap, I would
- [00:31:24.040]say that it
- [00:31:24.920]really depends. It depends on
- [00:31:26.330]the context. It depends if the
- [00:31:27.690]program has any testing
- [00:31:28.620]bottlenecks
- [00:31:29.260]or any backlogs. So let's say
- [00:31:31.240]if I had to test huge
- [00:31:32.440]populations and a great number
- [00:31:34.420]of populations
- [00:31:35.520]and I really needed that type
- [00:31:36.760]of information,
- [00:31:37.580]I would go with a
- [00:31:40.010]training set size of 45%
- [00:31:41.480]in a sparse design to test all
- [00:31:43.050]those lines,
- [00:31:43.760]with 50% overlap.
- [00:31:46.640]So in other words,
- [00:31:47.540]if I had to test 100 genotypes
- [00:31:49.620]in two locations
- [00:31:50.620]in the traditional framework,
- [00:31:51.800]I would have 200
- [00:31:53.580]observations.
- [00:31:54.480]With the sparse design
- [00:31:55.700]providing the same type of
- [00:31:56.810]information,
- [00:31:57.560]we're actually testing 46 genotypes
- [00:32:00.120]in just one location
- [00:32:01.160]and 22 genotypes in two
- [00:32:02.570]locations,
- [00:32:03.440]a total of 90 observations, or 45% of
- [00:32:06.800]the total number of
- [00:32:08.440]observations.
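
The arithmetic in this example can be checked with a short sketch. The definitions used here are one reading that reproduces the numbers in the talk, not necessarily the definitions used in the study: the training set size is taken as the share of the full genotype-by-location grid that is planted, and the overlap rate as the share of planted plots given to genotypes grown at every location. The function and key names are hypothetical.

```python
def sparse_plot_counts(n_genotypes, n_locations, training_fraction, overlap_rate):
    """Split a plot budget into overlapping and single-location genotypes.

    training_fraction: share of the full genotype-by-location grid that is planted.
    overlap_rate: share of planted plots given to genotypes grown at every location.
    No check is made that the counts fit within n_genotypes.
    """
    full_grid = n_genotypes * n_locations
    planted = round(training_fraction * full_grid)
    # Whole genotypes that can be replicated at every location.
    overlapping = int(overlap_rate * planted // n_locations)
    # Remaining plots go to genotypes grown at a single location.
    single = planted - overlapping * n_locations
    return {
        "full_grid": full_grid,
        "planted": planted,
        "overlapping_genotypes": overlapping,
        "single_location_genotypes": single,
    }


counts = sparse_plot_counts(100, 2, training_fraction=0.45, overlap_rate=0.50)
```

With 100 genotypes and two locations this gives 90 planted plots out of 200, with 22 genotypes in both locations and 46 genotypes in one location, matching the figures quoted above.
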
- [00:32:12.080]So it's all about finding like
- [00:32:14.110]a sweet spot between
- [00:32:15.540]how much to test and where to
- [00:32:17.890]test.
- [00:32:18.660]It's really context dependent.
- [00:32:20.960]And we saw that integrating
- [00:32:22.740]weather information here
- [00:32:24.520]is beneficial, but it's a
- [00:32:26.480]trade-off:
- [00:32:26.900]many more parameters to
- [00:32:27.850]estimate,
- [00:32:28.260]and there is
- [00:32:30.520]more computational demand to do
- [00:32:31.930]so.
- [00:32:32.160]So there are some steps still to take
- [00:32:35.460]towards this
- [00:32:37.160]in terms of data analysis
- [00:32:39.160]pipeline optimization as well.
- [00:32:42.020]So for the third part of the
- [00:32:45.630]presentation,
- [00:32:47.960]I'm going to talk about phenomics,
- [00:32:51.000]about drone-based phenotyping
- [00:32:52.410]for plant maturity.
- [00:32:53.400]So those first two sections,
- [00:32:55.580]I talked mostly about grain
- [00:32:57.540]yield,
- [00:32:58.180]genomic selection and stratification
- [00:32:59.600]of environments
- [00:33:00.220]for grain yield, which is our
- [00:33:02.360]main trait.
- [00:33:03.460]It's what drives most of our
- [00:33:05.560]breeding objectives.
- [00:33:07.540]But plant maturity is also a
- [00:33:09.240]key trait
- [00:33:09.860]because yield is actually
- [00:33:12.660]meaningless
- [00:33:14.100]if it does not come with
- [00:33:16.040]information about maturity.
- [00:33:18.880]Because with maturity notes,
- [00:33:20.500]we can make proper selections
- [00:33:21.770]and recommendations,
- [00:33:22.840]and we can create yield tests
- [00:33:25.720]for the next years of testing.
- [00:33:27.080]We have to compare things that
- [00:33:28.370]mature relatively
- [00:33:30.180]close together like pretty much
- [00:33:31.640]on the same dates.
- [00:33:32.600]We cannot compare things that
- [00:33:35.370]mature,
- [00:33:36.200]let's say one week or 10 days
- [00:33:38.340]apart.
- [00:33:39.000]And how do we do that in our
- [00:33:40.450]program so far?
- [00:33:41.560]We do that with visual scores.
- [00:33:43.540]From when the
- [00:33:45.850]plants are around the R6 stage
- [00:33:48.540]until full maturity, we go
- [00:33:50.790]to the field around twice a week.
- [00:33:53.180]And we record the visual scores.
- [00:33:58.680]And we do that on between 15,000
- [00:34:01.550]and 20,000 plots per year in our
- [00:34:04.340]program.
- [00:34:05.280]But in recent years,
- [00:34:07.350]drone phenotyping has become a
- [00:34:09.690]pretty hot topic for research
- [00:34:11.940]in many breeding programs and
- [00:34:14.270]also in other areas of
- [00:34:15.810]scientific research,
- [00:34:17.760]which uses a specialized camera
- [00:34:20.580]attached to a drone and
- [00:34:22.560]collects thousands of
- [00:34:24.550]photos.
- [00:34:27.240]And those photos are translated
- [00:34:29.300]into relevant information for
- [00:34:31.280]whatever projects, in our case,
- [00:34:33.480]for plant breeding projects,
- [00:34:35.460]more specifically plant
- [00:34:37.060]maturity.
- [00:34:38.000]A lot of papers were already
- [00:34:40.470]published with open code
- [00:34:42.720]pipelines for various crops,
- [00:34:45.530]including soybeans.
- [00:34:47.780]And we can leverage that type
- [00:34:49.220]of information to create our
- [00:34:50.660]own drone phenotyping pipeline.
- [00:34:53.840]So, what is the reason for
- [00:34:55.400]adopting high-throughput phenotyping
- [00:34:57.770]in our program?
- [00:34:58.920]I would point out three reasons.
- [00:35:00.820]The first one is because it's
- [00:35:02.880]fast.
- [00:35:03.520]The second one is because it's
- [00:35:04.680]accurate.
- [00:35:05.220]And the third one is because it's
- [00:35:06.630]safe.
- [00:35:06.960]So, it's pretty fast.
- [00:35:08.660]We can fly over 3,500 plots in
- [00:35:11.010]10 minutes.
- [00:35:12.220]So, if I go to the field to
- [00:35:13.590]phenotype those plots for plant
- [00:35:15.300]maturity, it would take me at
- [00:35:16.920]least two days to go over those
- [00:35:18.530]3,500 plots.
- [00:35:22.000]And that project started here
- [00:35:23.860]at UNL in our breeding program
- [00:35:25.720]in 2023. We've been flying our
- [00:35:28.560]drones over five of our testing
- [00:35:30.990]locations in Nebraska. We have
- [00:35:33.130]been flying our drones over 60,000
- [00:35:36.520]plots over those two years.
- [00:35:38.720]And we have to fly the
- [00:35:40.480]drone from late August through
- [00:35:42.310]early October
- [00:35:43.240]with the same consistency that
- [00:35:45.260]we go to the field to take our
- [00:35:47.060]visual notes.
- [00:35:48.180]And we still have to take our
- [00:35:51.040]visual notes
- [00:35:52.440]because we have to validate our
- [00:35:54.340]drone pipeline
- [00:35:55.340]with the real manual or visual
- [00:35:57.590]note-taking.
- [00:35:58.820]And here's just a summary of
- [00:36:01.070]the pipeline.
- [00:36:02.480]We collect thousands of photos
- [00:36:04.180]from the drone.
- [00:36:05.040]We stitch them into just one big
- [00:36:06.960]photo per field, which we call an
- [00:36:09.080]orthomosaic or raster file.
- [00:36:11.200]And then we lay over each orthomosaic
- [00:36:13.990]very small rectangles,
- [00:36:16.240]which we call shapefiles,
- [00:36:18.490]that identify individual plots.
- [00:36:21.260]And then we create a time
- [00:36:23.440]series data set where we
- [00:36:25.630]extract the vegetation index.
- [00:36:29.000]Here we're using the normalized
- [00:36:30.740]green-red difference index,
- [00:36:32.410]which has been giving us the
- [00:36:33.890]best accuracy in estimating
- [00:36:35.440]maturity from the drone.
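
The normalized green-red difference index is computed per pixel as (green minus red) divided by (green plus red), and then summarized per plot. The sketch below shows that calculation on made-up pixel values; the function names, the handling of black pixels, and the use of a simple plot mean are assumptions rather than details of the program's pipeline.

```python
def ngrdi(green, red):
    """Normalized green-red difference index for one pixel."""
    total = green + red
    if total == 0:
        return 0.0  # Convention chosen here for black pixels; an assumption.
    return (green - red) / total


def plot_mean_ngrdi(pixels):
    """Average NGRDI over the (red, green, blue) pixels inside one plot polygon."""
    values = [ngrdi(g, r) for r, g, _b in pixels]
    return sum(values) / len(values)


# Hypothetical pixel values: a green canopy and a senescing, brownish canopy.
green_plot = [(60, 140, 50), (70, 150, 55)]
mature_plot = [(150, 120, 80), (160, 125, 85)]
```

A green canopy gives a positive index and a mature, brownish canopy gives a negative one, which is why the index falls over the flight dates as a plot matures.
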
- [00:36:37.060]And this
- [00:36:39.130]index serves as an input for
- [00:36:41.200]our four-parameter logistic
- [00:36:43.450]regression to estimate our
- [00:36:46.050]maturities.
- [00:36:47.160]So here's the time series raster
- [00:36:50.980]or orthomosaics.
- [00:36:53.400]We have the photo from when most of
- [00:36:56.170]the plants were before R6. Some
- [00:36:59.290]of them were already in R6.
- [00:37:00.460]And this is the very last photo.
- [00:37:02.540]In between those two, we have
- [00:37:04.400]probably eight to ten other
- [00:37:06.110]photos,
- [00:37:06.780]but here I just included two.
- [00:37:09.460]Right here is one field with
- [00:37:11.840]the shapefile laid over the
- [00:37:14.210]orthomosaic.
- [00:37:15.800]As you can see, we can
- [00:37:17.350]delineate every single
- [00:37:19.470]unique two-row plot and four-row
- [00:37:21.900]plot here
- [00:37:23.400]in those shapefiles, and we
- [00:37:24.880]extract information at a plot
- [00:37:26.570]level from each of those. So as
- [00:37:29.280]I said, we are able to detect
- [00:37:32.390]differences at a plot level.
- [00:37:34.030]Those two genotypes, they are
- [00:37:35.770]neighboring each other in the
- [00:37:37.450]field. And let's say for the
- [00:37:39.120]flight number six, this genotype
- [00:37:41.050]was pretty much mature while
- [00:37:42.660]this one was still kind of
- [00:37:44.160]yellowish. It was probably one
- [00:37:45.940]week behind in maturing.
- [00:37:47.810]So we're able to detect those
- [00:37:49.450]differences.
- [00:37:52.680]And with the vegetation index
- [00:37:54.770]run through a logistic
- [00:37:56.360]regression, we can estimate the
- [00:37:58.700]parameters that we want.
- [00:38:00.740]So with the parameters that we
- [00:38:02.670]want, we can identify this
- [00:38:04.440]upper plateau.
- [00:38:05.700]This means that the plants are
- [00:38:07.640]still green.
- [00:38:08.720]There's a decay here.
- [00:38:10.180]That means that the plants are
- [00:38:11.550]getting yellowish.
- [00:38:12.540]They're getting mature.
- [00:38:13.300]And when it reaches that third
- [00:38:15.220]parameter or lower plateau,
- [00:38:17.140]that means that the plants
- [00:38:18.830]reached full maturity.
- [00:38:20.880]And that's when the maturity
- [00:38:22.150]notes are recorded from the
- [00:38:23.370]drone, from the high-throughput
- [00:38:24.920]phenotyping.
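
A four-parameter logistic curve with an upper plateau, a decay, and a lower plateau can be sketched as follows. The parameterization, the parameter values, and the rule that maturity is called when 5% of the drop remains are assumptions for illustration; the talk does not give the exact rule. In practice the four parameters would first be fitted to each plot's time series with a nonlinear least-squares routine.

```python
import math


def four_param_logistic(day, upper, lower, slope, midpoint):
    """Vegetation index on a given day: upper plateau decaying to lower plateau."""
    return lower + (upper - lower) / (1.0 + math.exp(slope * (day - midpoint)))


def maturity_day(slope, midpoint, remaining=0.05):
    """Day when only `remaining` of the upper-to-lower drop is left.

    Solves 1 / (1 + exp(slope * (day - midpoint))) = remaining for day.
    """
    return midpoint + math.log(1.0 / remaining - 1.0) / slope


# Hypothetical fitted parameters for one plot (days counted from the first flight).
params = {"upper": 0.35, "lower": -0.10, "slope": 0.5, "midpoint": 20.0}
day = maturity_day(params["slope"], params["midpoint"])
index_at_maturity = four_param_logistic(day, **params)
```

Two neighboring plots with different midpoints would get different maturity days from the same set of flights, which is the plot-level difference described earlier.
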
- [00:38:25.820]And this is the data from 2024
- [00:38:28.380]from just three genotypes.
- [00:38:30.420]But we have thousands of genotypes.
- [00:38:32.800]But I just ran those for three
- [00:38:34.300]genotypes so you could
- [00:38:35.480]visualize that.
- [00:38:36.540]And just for 2024 overall, we
- [00:38:38.970]have an accuracy between 65%
- [00:38:40.970]and 89% in getting those notes
- [00:38:43.070]from the high-throughput phenotyping,
- [00:38:45.620]which is not bad.
- [00:38:47.240]And we still have to validate
- [00:38:49.780]that for 2025.
- [00:38:52.160]So here's what we estimated
- [00:38:54.120]from the drone.
- [00:38:55.420]Here's what we estimated from
- [00:38:57.230]manual note taking.
- [00:38:58.660]You see that there is a
- [00:39:00.090]agreement here,
- [00:39:01.460]but there is some deviation
- [00:39:03.020]that's expected,
- [00:39:04.180]but overall the accuracies are
- [00:39:06.530]good, 75 to 89%.
- [00:39:08.760]That's a good accuracy.
- [00:39:11.620]So this is the very last slide
- [00:39:14.190]of the presentation.
- [00:39:16.400]And I would like to talk a
- [00:39:18.230]little bit about what is the
- [00:39:20.330]impact of everything that I
- [00:39:22.440]said on the University of
- [00:39:24.360]Nebraska soybean breeding
- [00:39:26.460]program.
- [00:39:27.500]And I'm going to answer that in
- [00:39:30.270]terms of response to selection
- [00:39:33.140]or gains.
- [00:39:35.180]So gains are a function of
- [00:39:36.980]prediction accuracy,
- [00:39:39.570]selection intensity, genetic
- [00:39:41.950]variance, and generation
- [00:39:44.170]interval, which is the time
- [00:39:45.870]from when genotypes are
- [00:39:47.840]tested in the field
- [00:39:49.510]until
- [00:39:51.500]they are back in
- [00:39:53.280]the crossing block to serve as
- [00:39:55.060]parents.
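
The relationship described here is commonly written as the breeder's equation per unit of time. The notation below is standard textbook notation rather than something shown in the transcript: $i$ is the selection intensity, $r$ the prediction (selection) accuracy, $\sigma_g$ the genetic standard deviation (the square root of the genetic variance), and $L$ the generation interval in years.

```latex
\Delta G_{\text{year}} = \frac{i \, r \, \sigma_g}{L}
```
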
- [00:39:57.300]I don't want to include
- [00:39:58.950]accuracy and selection
- [00:40:00.710]intensity here in the
- [00:40:02.280]discussion because those are
- [00:40:04.550]very population dependent and
- [00:40:06.660]that's going to complicate
- [00:40:08.580]things. And I'm also not going
- [00:40:10.650]to talk about the impacts on
- [00:40:12.480]gains from stratification of
- [00:40:14.710]environments and from drone
- [00:40:16.420]phenotyping. I could talk
- [00:40:18.160]about that for the whole day.
- [00:40:19.390]But here I'm going to focus on
- [00:40:20.520]sparse testing. So just
- [00:40:21.710]thinking about sparse
- [00:40:22.770]testing.
- [00:40:24.520]And those are real numbers from
- [00:40:26.650]our program.
- [00:40:27.760]Those are the numbers from the
- [00:40:29.260]lines that I have genotyped
- [00:40:30.640]from my project and that are
- [00:40:32.020]part of our cultivar
- [00:40:32.830]development pipeline.
- [00:40:33.960]So let's suppose those 3,000
- [00:40:36.480]lines are tested in a regular
- [00:40:38.310]testing framework in two
- [00:40:39.920]locations.
- [00:40:40.920]We have 6,000 plots to be
- [00:40:43.350]tested.
- [00:40:44.060]And selecting 20% of those 3,000
- [00:40:46.740]genotypes, we end up with 600
- [00:40:48.950]genotypes being selected for the
- [00:40:50.630]next cycle.
- [00:40:52.440]Considering that that cycle
- [00:40:54.320]takes four years to
- [00:40:56.320]complete, to go back
- [00:40:58.410]to the crossing block to begin
- [00:41:00.350]another cycle, it gives us an
- [00:41:02.210]absolute gain of
- [00:41:04.940]0.35 per year.
- [00:41:06.940]If we deploy sparse testing, we
- [00:41:08.640]can save resources, we can
- [00:41:10.050]screen those same 3,000 genotypes,
- [00:41:12.570]but instead of planting 6,000
- [00:41:14.700]plots, we're planting 3,000
- [00:41:16.850]plots.
- [00:41:17.560]And the absolute gain per year
- [00:41:20.490]is still 0.35.
- [00:41:23.000]But if we reinvest what we
- [00:41:24.700]saved from sparse testing into
- [00:41:26.830]screening more lines, still
- [00:41:28.810]keeping the total number of
- [00:41:30.620]plots at 6,000, we can screen 4,500
- [00:41:34.580]lines, selecting the same
- [00:41:36.340]number of lines, which means we're
- [00:41:38.740]not hampering the effective
- [00:41:40.670]population size of our breeding
- [00:41:42.830]program,
- [00:41:43.960]and that's also important
- [00:41:45.540]thinking about the future
- [00:41:47.120]diversity of our breeding
- [00:41:48.700]program.
- [00:41:49.520]And this gives us an absolute
- [00:41:52.100]gain of 0.4 per year.
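
The first two scenarios can be roughly reproduced with the breeder's equation under stated assumptions. The talk does not give the accuracy or genetic variance behind its numbers, so the sketch below lumps them into a single placeholder (`accuracy_times_sd`, set to 1.0) and uses the large-population normal approximation for selection intensity. Function and parameter names are hypothetical.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected):
    """Standardized selection differential under truncation selection
    on a normally distributed trait (large-population approximation)."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - proportion_selected)
    return std_normal.pdf(threshold) / proportion_selected


def gain_per_year(n_selected, n_screened, years_per_cycle, accuracy_times_sd=1.0):
    """Breeder's equation per year: i * r * sigma_g / L.

    accuracy_times_sd lumps r * sigma_g into one placeholder value (assumed 1.0).
    """
    intensity = selection_intensity(n_selected / n_screened)
    return intensity * accuracy_times_sd / years_per_cycle


conventional = gain_per_year(600, 3000, years_per_cycle=4)
reinvested = gain_per_year(600, 4500, years_per_cycle=4)
```

With these placeholder inputs the two values come out close to 0.35 and 0.40 per year. Selecting the same 600 lines from a larger screened population raises the selection intensity, which is where the extra gain comes from. The four-environment scenario is not reproduced here because the transcript does not state which other inputs change.
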
- [00:41:54.860]But if we reinvest that into
- [00:41:57.160]screening more genotypes
- [00:41:59.440]and into testing our genotypes
- [00:42:01.330]in four environments instead of
- [00:42:02.850]two,
- [00:42:04.780]still with a limit of 6,000
- [00:42:07.550]plots for 3,750 lines,
- [00:42:09.780]we have absolute gains of 0.51
- [00:42:17.040]per year.
- [00:42:19.320]This is an increase of 47% in
- [00:42:22.670]gains relative
- [00:42:24.320]to the conventional test.
- [00:42:26.820]And that's a lot, and that's the
- [00:42:28.920]impact
- [00:42:29.480]that just the sparse testing
- [00:42:31.230]design can have
- [00:42:33.680]on our cultivar development
- [00:42:34.960]pipeline.
- [00:42:35.580]So that's the end of my
- [00:42:37.580]presentation.
- [00:42:39.320]I would like to thank everyone
- [00:42:42.330]that's in our lab,
- [00:42:44.320]Dr. George Graef, Luis Posadas,
- [00:42:47.000]Dr. David Hyten.
- [00:42:48.800]He's also involved in a lot of
- [00:42:51.480]the research
- [00:42:52.780]that we carry out in our
- [00:42:54.190]program.
- [00:42:54.920]Grad students, technologists,
- [00:42:56.330]and technicians,
- [00:42:57.160]and also the committee of my
- [00:42:59.590]PhD, Dr. David Hyten,
- [00:43:03.320]Dr. Reka Howard, Dr. Jinliang
- [00:43:05.770]Yang, and Caio Diaz from Brazil.
- [00:43:08.320]And I would like also to thank
- [00:43:10.900]the support
- [00:43:12.040]from the Nebraska Soybean Board.
- [00:43:15.080]They are big supporters of most
- [00:43:17.510]of what we do here
- [00:43:18.820]in our program.
- [00:43:19.920]Thanks.
- [00:43:20.760]- Thanks, Isaac.
- [00:43:28.200]Yeah, great presentation.
- [00:43:29.580]So now we have some time for
- [00:43:31.120]some questions.
- [00:43:32.280]I'll have you start. So,
- [00:43:34.140]when you come up
- [00:43:37.370]with results that show
- [00:43:40.360]clear benefits from
- [00:43:42.000]optimization like this, in your
- [00:43:43.750]opinion, what is
- [00:43:45.680]the
- [00:43:45.960]response from a breeding
- [00:43:48.280]program to results like this?
- [00:43:50.370]Not just picking
- [00:43:52.740]on George,
- [00:43:53.660]but are
- [00:43:55.270]public breeding
- [00:43:56.810]programs quick to respond to
- [00:43:58.490]information
- [00:43:59.280]like this? Are they quick to
- [00:44:01.190]change? Because this is
- [00:44:03.180]relying on changing
- [00:44:05.270]where you've
- [00:44:06.300]been testing, how much you've
- [00:44:07.850]been testing. Yeah, for sure.
- [00:44:09.540]Those changes have to be
- [00:44:11.110]carried
- [00:44:11.660]out gradually. It's hard to
- [00:44:14.120]implement that right away,
- [00:44:16.480]because it can create a
- [00:44:18.920]big shift in the breeding
- [00:44:20.000]program if that gets
- [00:44:20.880]implemented right away. So I
- [00:44:22.070]would say it's a
- [00:44:22.820]gradual change, thinking
- [00:44:25.380]about public breeding programs,
- [00:44:28.230]where turnaround time doesn't
- [00:44:31.010]always
- [00:44:31.720]help us, let's say,
- [00:44:33.230]with genotyping. We have to genotype
- [00:44:35.130]things fast to make that
- [00:44:36.140]work efficiently. That doesn't
- [00:44:38.690]always happen, so we have to
- [00:44:41.230]really examine how to do that
- [00:44:43.760]in a good way to make that
- [00:44:46.010]succeed in the long term, I
- [00:44:48.580]would say.
- [00:44:52.100]Questions?
- [00:44:52.600]In the slide, you showed a big
- [00:44:58.410]field with lots and lots of
- [00:45:03.410]plots in it.
- [00:45:06.240]Yes.
- [00:45:06.520]How do you deal with spatial
- [00:45:07.930]variability in the soil?
- [00:45:09.320]Okay.
- [00:45:10.040]Yeah, there is spatial
- [00:45:11.100]variability in the soil, and we
- [00:45:12.690]handle spatial variability with
- [00:45:14.700]our experimental designs.
- [00:45:16.560]And our experimental designs,
- [00:45:18.000]they have been doing a great
- [00:45:19.220]job with controlling the
- [00:45:20.380]spatial variability in the soil.
- [00:45:22.100]So for advanced tests, we're
- [00:45:24.000]using complete block designs or
- [00:45:26.050]randomized complete block
- [00:45:27.810]designs with two reps in that
- [00:45:29.620]case.
- [00:45:30.280]And in the preliminary, just
- [00:45:31.890]one rep.
- [00:45:32.500]And that can control the
- [00:45:33.710]spatial variation in those.
- [00:45:35.360]And I've compared that with
- [00:45:37.240]spatial designs, which kind of
- [00:45:39.440]eliminate the spatial
- [00:45:41.100]variation in our trials.
- [00:45:43.040]And it's comparable.
- [00:45:44.280]The results are comparable.
- [00:45:45.260]So that's taken into account to
- [00:45:46.960]evaluate our trials.
- [00:45:49.240]We plant those and we evaluate
- [00:45:51.170]those in our statistical models
- [00:45:53.320]following statistical and
- [00:45:54.700]experimental designs.
- [00:45:56.060]So the spatial variation, it
- [00:45:58.310]exists, but we handle it.
- [00:46:00.800]Have you tried incorporating
- [00:46:04.740]satellite data as opposed to
- [00:46:08.510]drone data?
- [00:46:10.440]There are some. I've heard that
- [00:46:15.400]there are some startups trying
- [00:46:16.730]to do that with those big seed
- [00:46:17.940]companies,
- [00:46:19.240]using satellite data, taking photos
- [00:46:21.080]of the fields and providing
- [00:46:22.840]those photos to their
- [00:46:24.130]breeding programs. But the
- [00:46:25.840]thing here is resolution. I'm
- [00:46:27.740]not sure how good the
- [00:46:28.810]resolution would be. Because
- [00:46:30.650]here for the drone, we have a
- [00:46:32.260]pretty high resolution of one
- [00:46:33.960]centimeter per pixel. And that's
- [00:46:35.990]a lot. Because we need high
- [00:46:37.610]quality photos to identify
- [00:46:39.170]those patterns in maturity. But
- [00:46:41.150]yeah, maybe in the future, that's
- [00:46:43.160]going to become a reality, if
- [00:46:44.920]it's not already a reality that
- [00:46:48.480]I do not know about yet. Have you
- [00:46:50.370]compared any? No, I haven't.
- [00:46:53.730]Because I think you could pay
- [00:46:56.510]for high resolution satellite
- [00:46:59.550]photos on a particular day and
- [00:47:02.460]just compare those.
- [00:47:04.820]And to get one centimeter per
- [00:47:07.380]pixel, we fly our drones at 37
- [00:47:10.460]to 40 meters above ground.
- [00:47:14.180]So
- [00:47:14.900]the satellite is I don't know
- [00:47:17.220]where. So yeah, my concern is
- [00:47:19.830]about resolution.
- [00:47:21.700]Plus then Ryan wouldn't have a
- [00:47:23.510]job. So.
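The link between flight altitude and the one centimeter per pixel mentioned above can be sketched with the standard ground sampling distance relation for a downward-pointing camera. The camera parameters below are hypothetical and were chosen only for illustration; the talk does not state which camera or sensor is used.

```python
def ground_sampling_distance_cm(altitude_m, sensor_width_mm, focal_length_mm, image_width_px):
    """Ground footprint of one pixel, in centimeters, for a nadir-pointing camera."""
    # Width of ground covered by one image, by similar triangles.
    footprint_width_m = altitude_m * sensor_width_mm / focal_length_mm
    return footprint_width_m * 100.0 / image_width_px

# Hypothetical camera, NOT taken from the talk: 13.2 mm wide sensor,
# 8.8 mm focal length, 5472 px image width.
CAMERA = dict(sensor_width_mm=13.2, focal_length_mm=8.8, image_width_px=5472)

for altitude in (37, 40):
    gsd = ground_sampling_distance_cm(altitude, **CAMERA)
    print(f"{altitude} m above ground -> {gsd:.2f} cm per pixel")
```

Pixel size on the ground grows in proportion to altitude, which is why resolution is the concern raised here for imagery taken from much farther away.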
- [00:47:31.300]Thank you for the great
- [00:47:32.560]presentation. I'm curious in
- [00:47:34.360]terms of the testing hubs that
- [00:47:36.030]you have selected,
- [00:47:37.060]because all your data is based
- [00:47:38.930]on four years, right? 2020 to
- [00:47:41.540]2024? Yes. So let's suppose we
- [00:47:44.420]expand that range, right? From
- [00:47:46.170]the 1990s all the way up to here,
- [00:47:48.100]or maybe into the future. Would
- [00:47:49.780]you
- [00:47:50.100]expect the testing hub locations
- [00:47:51.940]would change, or would they stay
- [00:47:53.700]consistent throughout the years?
- [00:47:55.860]Yes, because it's dependent on
- [00:47:57.040]the genotypes that we're
- [00:47:57.960]evaluating. Since I integrated
- [00:47:59.250]genotype by
- [00:47:59.780]environment interaction
- [00:48:01.410]signals in those stratifying
- [00:48:03.090]models, if we go really far
- [00:48:04.630]from
- [00:48:04.980]what we have now in our program,
- [00:48:06.390]it's another variability,
- [00:48:07.540]another germplasm. I mean, it's
- [00:48:08.940]the
- [00:48:09.080]same germplasm, but with
- [00:48:10.280]different allele frequencies.
- [00:48:11.880]So I would expect that it would
- [00:48:13.320]change. It actually changes if
- [00:48:14.580]I do not include the genotype
- [00:48:15.650]by environment interaction
- [00:48:16.740]signals
- [00:48:17.160]in that environmental matrix.
- [00:48:19.010]So the results are different.
- [00:48:20.800]And it makes more sense after I
- [00:48:22.520]included those genotype by
- [00:48:23.710]environment interactions. So I
- [00:48:25.170]guess from that point of view,
- [00:48:26.560]right,
- [00:48:27.260]on a practicality note, after how
- [00:48:28.890]many years do you recommend
- [00:48:30.460]that we re-evaluate what is the
- [00:48:32.210]optimal location to test our
- [00:48:33.850]genotypes?
- [00:48:34.800]I would say anything from four
- [00:48:36.500]to seven.
- [00:48:37.180]Here, I evaluated four or five
- [00:48:38.610]years from 2020.
- [00:48:39.960]We still have data from 2025
- [00:48:41.890]that we can use because I can
- [00:48:43.300]still put that and include that
- [00:48:44.910]in my study.
- [00:48:45.780]Thinking about the initial
- [00:48:47.270]cross from the code for
- [00:48:48.550]development, when that line is
- [00:48:50.250]no longer in the testing
- [00:48:51.480]pipeline, that's seven years.
- [00:48:53.920]And four years is the cycle
- [00:48:55.460]when that line comes back to
- [00:48:57.070]the crossing block.
- [00:48:58.420]So four to seven years would be
- [00:49:00.120]a reasonable time window to
- [00:49:01.660]include those into stratified
- [00:49:03.420]environments.
- [00:49:04.600]Okay, thank you.
- [00:49:05.720]If you let me ask one more.
- [00:49:14.500]When you do your drone work,
- [00:49:16.770]the values that you get for
- [00:49:18.840]each plot, is that a mosaic of
- [00:49:21.210]10 or 15 different images that
- [00:49:24.220]have been blended together to
- [00:49:26.490]avoid the hotspot?
- [00:49:28.220]Yes.
- [00:49:28.800]Okay.
- [00:49:29.140]Yes.
- [00:49:29.600]Thank you.
- [00:49:30.220]Yeah, it's a time series.
- [00:49:31.340]Yeah, a lot of photos.
- [00:49:32.740]I have another question, Arthur.
- [00:49:35.320]Yes.
- [00:49:35.680]With the drone data, are
- [00:49:37.160]there other phenotypes that you
- [00:49:38.900]can get from the same images
- [00:49:40.380]other than maturity?
- [00:49:41.780]Yeah.
- [00:49:44.000]Height, lodging, and pivot
- [00:49:47.510]tracks. But it's tricky to do
- [00:49:51.160]that because, yeah, the
- [00:49:54.320]correlation is not
- [00:49:56.480]very good for lodging. For
- [00:49:58.330]pivot tracks, I tried to do
- [00:50:00.090]that and it's not been that
- [00:50:01.780]successful. I
- [00:50:02.960]just communicated with our
- [00:50:05.390]group that we need to take
- [00:50:07.590]maturity notes pivot tracks
- [00:50:10.110]this year because
- [00:50:11.760]I don't think that the drone is
- [00:50:13.460]going to capture good
- [00:50:14.660]information on that. And for
- [00:50:16.420]heights,
- [00:50:17.200]it has like an, I would say,
- [00:50:19.970]intermediate prediction
- [00:50:22.500]accuracy. And the difference is
- [00:50:25.920]that you have to fly the drone
- [00:50:27.390]before planting your genotypes
- [00:50:29.050]where you are going to plant
- [00:50:30.460]them.
- [00:50:30.800]Because it's just a difference
- [00:50:32.850]between the images that you
- [00:50:35.480]capture when the plants are
- [00:50:38.470]mature or
- [00:50:39.680]fully grown from when the
- [00:50:42.760]digital soil model is extracted
- [00:50:46.720]from those images. Any
- [00:50:49.740]questions online?
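The height estimate described in this answer is a difference between two elevation surfaces: one from a flight over bare soil before planting, and one from a flight when the plants are fully grown. A minimal sketch, assuming both surfaces are already aligned on the same grid, is shown below. The elevation values, grid size, and plot boundaries are made up for illustration, and using the plot median as the summary is an assumption rather than the speaker's stated method.

```python
from statistics import median

def canopy_height(surface, soil):
    """Cell-by-cell difference: elevation at full growth minus bare-soil elevation."""
    return [[s - g for s, g in zip(surface_row, soil_row)]
            for surface_row, soil_row in zip(surface, soil)]

def plot_height(chm, rows, cols):
    """Summarize one plot as the median height of the cells it covers (an assumption)."""
    return median(chm[r][c] for r in rows for c in cols)

# Made-up bare-soil elevations in meters (flight before planting).
soil = [[350.00, 350.02, 350.04, 350.06],
        [350.01, 350.03, 350.05, 350.07],
        [350.02, 350.04, 350.06, 350.08]]

# Made-up elevations at full growth: 0.8 m canopy on the left plot, 1.0 m on the right.
surface = [[g + (0.8 if c < 2 else 1.0) for c, g in enumerate(row)] for row in soil]

chm = canopy_height(surface, soil)
left_plot = plot_height(chm, rows=range(3), cols=range(0, 2))
right_plot = plot_height(chm, rows=range(3), cols=range(2, 4))
print(f"left plot: {left_plot:.2f} m, right plot: {right_plot:.2f} m")
```

Subtracting the soil surface removes the slope of the field, which is why the flight before planting has to cover the same ground where the genotypes will be grown.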
- [00:50:51.600]All right, this one from Jim
- [00:50:53.850]Specht online. Relative to the
- [00:50:56.350]first part of your seminar,
- [00:50:58.720]you mentioned whether
- [00:50:59.830]irrigation versus non-irrigation
- [00:51:01.560]locations played a role in
- [00:51:02.880]grouping locations. So did
- [00:51:04.140]whether the location
- [00:51:05.620]was irrigated play a role in
- [00:51:06.820]whether...
- [00:51:09.120]No, I did not stratify that in
- [00:51:10.530]terms of irrigation and non-irrigation.
- [00:51:12.760]So I cannot conclude anything
- [00:51:15.880]about that because I did not
- [00:51:17.640]include that type of
- [00:51:18.410]information in my stratification.
- [00:51:19.920]This one from Sean Jenkins
- [00:51:21.850]online.
- [00:51:22.700]How does the yield potential of
- [00:51:24.400]a location influence the
- [00:51:25.750]location's likelihood to be
- [00:51:27.320]selected as a hub?
- [00:51:28.520]He goes on to say, are the hubs
- [00:51:30.180]typically the highest yielding
- [00:51:32.000]locations in a cluster?
- [00:51:33.560]Yes.
- [00:51:34.440]Yes, we saw that some of the
- [00:51:36.280]highest-yielding locations that
- [00:51:38.620]we've seen just by evaluating
- [00:51:40.630]the data from those last five
- [00:51:42.550]years that I've been here and, for George,
- [00:51:44.880]the last 30 years, we could
- [00:51:47.290]see that there are some high-yielding
- [00:51:50.080]locations that are really
- [00:51:51.850]considered hubs.
- [00:51:53.360]I would say Phillips and Urbana,
- [00:51:55.500]but I've not evaluated what's
- [00:51:57.440]the actual contribution of
- [00:51:59.240]those high-yielding locations.
- [00:52:02.380]But high-yielding locations,
- [00:52:04.300]they're likely going to provide
- [00:52:06.230]better accuracy because of
- [00:52:07.870]better heritability.
- [00:52:09.480]So, yeah, that might influence
- [00:52:11.750]the stratification.
- [00:52:13.760]So you think you'd still have
- [00:52:15.130]high-yielding environments?
- [00:52:16.760]Even if you're selecting in
- [00:52:17.940]those environments, they'd
- [00:52:19.230]still be stable?
- [00:52:20.120]The goal was to identify stable
- [00:52:21.540]ones over that environment,
- [00:52:22.920]yeah.
- [00:52:23.380]And then one last question from
- [00:52:25.010]the business issue.
- [00:52:26.900]Can you clarify if the increase
- [00:52:28.800]in response from 0.35 to 0.51
- [00:52:31.720]with the sparse testing design
- [00:52:33.610]was from early stage or late
- [00:52:35.660]stage trials?
- [00:52:36.460]If I was using observed data to
- [00:52:39.220]do that, I could do that with
- [00:52:41.850]advanced or late.
- [00:52:44.540]But just thinking about the breeder's
- [00:52:47.880]equation, if you keep the
- [00:52:50.180]genetic variance and accuracy
- [00:52:53.530]constant, you can check the
- [00:52:55.400]selection intensity from a
- [00:52:57.800]standardized table.
- [00:53:00.040]So that's just playing with
- [00:53:01.810]numbers, playing with a
- [00:53:03.420]percentage of selected
- [00:53:04.970]individuals.
- [00:53:06.160]So that's not actually with
- [00:53:08.340]observed data.
- [00:53:09.780]That's actually with the
- [00:53:11.040]standardized table of selection
- [00:53:12.790]intensity.
- [00:53:14.540]playing with percentage of
- [00:53:16.080]selected individuals,
- [00:53:17.520]thinking that we have to
- [00:53:19.320]preserve the same number
- [00:53:21.220]of selected individuals,
- [00:53:22.400]regardless of the scenario.
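The reasoning in this answer can be sketched with the breeder's equation in the form response = selection intensity × accuracy × genetic standard deviation. Here the selection intensity is computed from the standard normal distribution, which is what the standardized tables tabulate. The candidate counts, accuracy, and genetic standard deviation below are hypothetical; this sketch is not meant to reproduce the 0.35 and 0.51 figures from the talk.

```python
from statistics import NormalDist

def selection_intensity(proportion_selected):
    """Standardized selection intensity for truncation selection of the top fraction."""
    if not 0.0 < proportion_selected < 1.0:
        raise ValueError("proportion_selected must be strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - proportion_selected)  # truncation point
    return std_normal.pdf(threshold) / proportion_selected

def expected_response(proportion_selected, accuracy, genetic_sd):
    """Breeder's equation with accuracy and genetic standard deviation held constant."""
    return selection_intensity(proportion_selected) * accuracy * genetic_sd

# Hypothetical scenario: the same 50 lines are selected in both cases,
# but the second design lets twice as many candidates be tested.
n_selected = 50
scenarios = {"baseline design": 500, "larger candidate pool": 1000}
accuracy, genetic_sd = 0.6, 1.0  # assumed constants, not values from the talk

for name, n_candidates in scenarios.items():
    p = n_selected / n_candidates
    r = expected_response(p, accuracy, genetic_sd)
    print(f"{name}: p = {p:.2f}, i = {selection_intensity(p):.3f}, response = {r:.3f}")
```

With the number of selected individuals fixed, testing more candidates lowers the selected proportion, raises the selection intensity, and therefore raises the expected response even though nothing else in the equation changes.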
- [00:53:23.780]Excellent.
- [00:53:27.780]So, .