Data-driven Modeling of Ecological Dynamics
Oct. 31, 2018 Seminar
Okay, thanks for the introduction, Jessica.
And thank you all for coming to my talk.
I'm really happy to be here and to talk a little bit
about my research.
I'm actually going to be here through Friday,
so if you don't get a chance to talk with me
and ask questions after the talk,
I guess email Jessica,
and we'll figure out a meeting time of sorts.
Yeah, if you happen to be on Twitter,
you're welcome to live-tweet it and tag me.
So, the research I'm going to present
involves a lot of people from different locations.
I'd like to acknowledge the contributions of my collaborators
at the Sugihara Lab at Scripps Institution of Oceanography,
work that we did with the Pacific Biological Station
at Fisheries and Oceans Canada,
the Southern California
Coastal Ocean Observation System at Scripps,
members of the Weecology Lab at the University of Florida,
and other members, other collaborators
across different universities throughout the world.
I also want to acknowledge that this research was conducted
and data collected on land
that is the traditional land
of indigenous tribes of North America.
And the talk that I'm giving today is on
the traditional land of the Pawnee tribe.
If you go to this website, you can find out more information
about indigenous tribes of North America.
Alright, so as Jessica mentioned,
I have this weird background.
So, I did my formal training in Computer Science,
and then I decided I was going to go study the human brain,
so I did a stint in Experimental Psychology.
And then I wound up at an oceanographic institution,
and now I'm postdocing in a Wildlife Ecology Department.
So you know, you can interpret this in many different ways.
One possibility is that I get bored very easily.
That's probably true most of the time.
The way I like to think about it is that
I'm really interested in understanding
puzzles and complex systems.
And so, you know, across these different fields,
there are very different kinds of
interesting problems
about complex systems.
And so, the research that I'm doing now,
I'm really interested in understanding
how ecosystems change.
So, there are a lot of challenges in this area.
One of the first ones I'm gonna talk about
is that ecosystems are complex.
So, they have lots of interacting components,
and the interactions between these components
can be nonlinear,
meaning that their effects will depend on each other.
And these interactions can also change over time
and produce effects that change over time.
So, when we're trying to understand
and model what's going on in ecosystems,
that can be a very difficult thing
to incorporate into our models.
The second kind of challenge for understanding
and modeling ecosystems is that
we don't have mathematical laws like in physics or chemistry.
We don't have like fundamental equations
for gravity or how chemical reactions occur.
And so, when we want to understand,
you know, like what is the effect of temperature
on species distributions,
you know, we don't have these convenient equations
that we can like, you know, plug in our data to,
and like parameterize our equations.
And so, that can also be a problem.
So, we have these theories and we have these hypotheses,
and they can be good descriptions
of different concepts in ecology,
but when we try to apply them to real ecosystems,
you know, how they interact,
we don't always know which mechanisms will be important.
We don't always know if the effects that we might
quantify in a laboratory environment
will translate easily into real ecosystems.
So, how do I resolve these different challenges?
So, the approach that I take is to
use data to build models.
So, one of the ways that I use data to build models
is to infer mechanism from time series data.
And the reason I'm going to use time series data
is because it is the most natural way to understand
how change occurs in ecosystems.
Since I'm interested in the processes and mechanisms
that produce the observed changes in ecosystems,
the most natural data that I'm going to use
are going to be the observations
of how those systems have actually changed in time.
So, to give you a little bit of background
about where time series come from.
This is an example from Fluid Dynamics.
So, this is the Lorenz attractor.
Alright, is the movie playing?
Okay, cool, so the movie's gonna play.
So, in this system, we have three variables,
X and Y and Z
and the behavior of the system in time
is governed by these three differential equations.
So, the behavior of X and Y and Z is gonna change
depending on the other variables in the system
according to those equations
as well as the parameters that we chose.
And so, when we take the system
and we make recordings of a single variable,
so in this case, recordings of variable Y,
and we make a sequence of observations of it,
all we get in the end is this time series of Y.
So, this time series of Y is going to record
not only the changes in Y,
but because it's recording those values of Y,
the time series also is going to record
how X and Z have influenced Y.
So, in this way, you know, we can think of these time series
as capturing actually quite a lot of information
about what's going on in the system,
even if we don't yet know how to unpack that
just from the data.
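To make this concrete, here is a minimal sketch of how you could generate such a time series of Y yourself. It assumes the standard Lorenz parameters (sigma = 10, rho = 28, beta = 8/3) and uses a crude Euler integration; none of these choices come from the talk itself, they are just one simple illustration.

```python
import numpy as np

def lorenz_y_series(n_steps=10000, dt=0.005,
                    sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Integrate the Lorenz system with simple Euler steps and
    record only the Y variable, mimicking an observer who can
    measure just one variable of the system."""
    x, y, z = 1.0, 1.0, 1.0  # arbitrary initial condition
    ys = np.empty(n_steps)
    for i in range(n_steps):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        ys[i] = y  # the only thing we "observe"
    return ys
```

Even though X and Z are never recorded, their influence on Y is baked into the sequence of observed values, which is exactly the point made above.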
So, how are we gonna make use of this time series?
So, there's some convenient math that we can use.
So, there's this theorem from Takens,
very spooky, I'm gonna skip all that and say,
we don't need to worry about that.
The summary of that is we can
actually rewrite all the complex dynamics of a system
in terms of lags of just one variable.
So, in the previous slide I showed you, you know,
we had this, the system in three variables X and Y and Z,
you might think, well if I were gonna simulate this
in a computer or something I would need to have,
you know, a variable X, a variable Y, a variable Z.
If I wanted to, you know, build a model from data,
I would need to make observations of X and Y and Z.
Well, Takens' Theorem says
you don't actually need all of those variables.
In fact, you can replace all of those variables
just with lags of the time series Y.
So, the way this works is, instead of structuring
the system using coordinates of X and Y and Z,
we're gonna replace those with Y and time lags of Y.
So, here instead of X and Y and Z,
I have the time series Y at time T,
the time series of Y at time T with a lag,
and then another lag of the same variable Y.
And so, if I use these as the coordinates,
if I just have this time series of Y,
I get this reconstruction.
And so, the idea of Takens' Theorem is that given,
you know, this time series Y
that's observed from the system,
if I have sufficient lags
and sufficient other conditions hold,
I can actually make this reconstructed
transformation of the system
that you can see on the right side here.
And that is mathematically identical
to the original system with X and Y and Z.
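As a sketch of what this reconstruction looks like in practice, here is a small, hypothetical delay-embedding function: given a single time series, it stacks lagged copies of that series as coordinates, which is the construction Takens' Theorem licenses. The embedding dimension E and the lag tau are free parameters you would choose for your system; they are not specified in the talk.

```python
import numpy as np

def delay_embed(y, E=3, tau=1):
    """Delay-coordinate reconstruction: each row is the point
    (y(t), y(t - tau), ..., y(t - (E - 1) * tau))."""
    y = np.asarray(y)
    start = (E - 1) * tau  # first time index with enough history
    cols = [y[start - j * tau: len(y) - j * tau] for j in range(E)]
    return np.column_stack(cols)
```

For the Lorenz example, `delay_embed(y, E=3)` plays the role of the (X, Y, Z) state space while using only observations of Y.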
So, that's one way that we can use
the time series that we have to look at mechanism,
but if we wanted to actually produce models
how do you do that, right?
So, instead of those equations in X and Y and Z,
now we have these lags of Y,
how do we actually like make sense of all of that?
Alright, well, the next thing that we can do
is that we can actually take those observed patterns
and we can infer the relationships.
So, in the absence of fundamental mathematical laws,
what we can do is
we can actually reconstruct those relationships from data.
So, the way that we can generally approach this problem
looks something like this.
So, I'm gonna say that, you know, this vector Z
is our ecosystem state.
So, you can, for whatever system you're in,
you can imagine that vector Z
has a bunch of different components
for all the important state variables of the system.
And so, at a particular time T,
you have observations of all those state variables.
So, when we want to model the dynamics of the system,
we think of it as, well,
there's some kind of transformation,
some kind of process that goes on
whereby we go from the state of the system at time T
to the state of the system at time T plus one.
So, there is some kind of, you know,
determinism that happens, you know,
the ecosystem behavior occurs
and we can now make observations of a future state.
And so, we can apply that same,
the same rules, the same mechanisms and processes
and we can continue to make observations into the future
as well as relating our current observations
to what we observe in the past.
And so, if you are in, like, a
traditional mathematical modeling class,
you know, one way that you would try and write
the function F would be something like this.
We're gonna break it out into different components
and we might establish some equations.
So, this is a paper I found for
like a plankton ecosystem
and you can get these equations
and then you can fit those equations to the data, right?
But what happens if you don't know what those equations are,
or if you think those equations maybe aren't that realistic?
Well, now we want to model F using some black box, right?
So, how do you model a black box?
So, luckily, you know, people in CS have been doing this
for, you know, the past decade
and, you know, making lots of money.
The answer here is machine learning.
Alright so, how machine learning works,
gonna oversimplify it a lot
and say it looks something like this.
You have inputs and outputs,
you feed this into a computer
and you run your like machine learning algorithms
and what you get out are the rules
for the relationship between the inputs and the outputs.
So, in the context of that like simplified description
of the ecosystem state changing in time,
our inputs we can think of those as
just the ecosystem state at time T,
the outputs are the ecosystem state at time T plus one.
So, what the future state looks like.
And then the rules are gonna be that function.
So, applying this methodology
allows us to take the data that we have,
the observations of the ecosystem state,
and infer what the function is that relates
the states at time T to the states at time T plus 1.
So, we can reconstruct those dynamics from the data.
Let's get a little bit more into detail
on how we actually generate those rules.
So, imagine we have our observed state,
our observed inputs, Z of T,
our corresponding outputs, Z of T plus 1,
and we want to infer just a simple function that describes,
well, how do we get from the current,
the state at time T to the state at time T plus 1?
So, I'm gonna use an example
of predicting tomorrow's weather, right?
So, suppose we don't know anything about physics
or meteorology, we just have a large collection
of data on historical weather
and we want to know what the weather looks like tomorrow.
So, one simple way to do this
is called the Lorenz Method of Analogues.
So, what we do is, for a given Z of T,
so imagine that's today's weather,
we look for its nearest neighbor in
all of our historical data.
So, we look for,
within our historical records
for data points where we have similar temperature,
similar time of year,
you know, precipitation or air pressure.
Basically, we look for whatever we think are
the most similar situations,
the ones that approximate all the ecosystem state
that is relevant for our system, our process.
And so, then what we do is,
because we have those observations of the past,
we can use the future state from those past observations
as a prediction for the weather tomorrow.
So, imagine we decide that, you know,
of all of our historical dates,
October 20, 1988, was the best approximation
or the most similar to today's weather.
Then we are gonna say that, well,
a reasonable prediction for tomorrow's weather,
is just gonna be October 21st, right?
So, we just take the past and we just advance it forward.
And so, this way we can make these predictions of weather
without actually knowing anything about physics, right?
All we did is we just looked at our historical data
and we looked for the most similar states.
So, this is a way that we can actually reconstruct mechanism
without actually, you know, needing to know about equations
and in a way that is always informed by the data.
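Here is a minimal sketch of the Lorenz Method of Analogues described above, assuming the history is stored as an array of state vectors in time order. The function name, the Euclidean distance, and the single-neighbor choice are all illustrative assumptions, not details from the talk.

```python
import numpy as np

def analogue_forecast(history, today):
    """Find the most similar past state and use its successor
    as the forecast; no physics needed, just historical data."""
    history = np.asarray(history, dtype=float)
    today = np.asarray(today, dtype=float)
    # Exclude the final state: it has no observed "tomorrow".
    dists = np.linalg.norm(history[:-1] - today, axis=1)
    best = np.argmin(dists)   # e.g. October 20, 1988
    return history[best + 1]  # ... so predict October 21
```

In practice you would use several neighbors and weight them, but the core move is the same: take the past and advance it forward.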
Okay, so at this point,
you might be feeling a little bit lost, but don't worry,
I'm gonna put a few examples up next to explain.
So, one study that I did in my Ph.D
was to look at recruitment dynamics
of Fraser River sockeye salmon.
So, salmon go through this kind of complicated lifecycle.
So, the adult salmon spawn
in freshwater rivers and lakes
and the eggs hatch, they grow up to be juvenile salmon
and then at some point, the juvenile salmon decide
they're gonna migrate into the ocean
and then after spending some time in the ocean,
they return back to
the rivers and streams to spawn again.
After they spawn, the adults die.
Okay, so, if you are,
you know, trying to manage this fishery
or if you are someone who is
like a fisherman and you are actually
catching these salmon to make money or,
because they're your food source,
one thing that you would want to know is,
how many adult salmon are gonna return every year.
And so, this is kind of a complicated, not complicated,
this is a challenging question to answer,
because we don't actually know,
we don't actually have good data
on the abundances of all the juvenile salmon in the rivers,
we don't have measurements of what the salmon do
once they're in the ocean.
Really, our best sources of data are going to be from when
the adult salmon are returning to spawn, and that's it.
So, we have like an idea
of how many eggs are produced by the salmon,
but, you know, throughout the rest of the lifecycle,
we don't have any data, so, we're trying to predict
how many adult salmon are gonna come back.
And, because the salmon die after they spawn,
you know, we don't have the ability to say,
well, look at how many adults we caught last year.
That might be a good, you know, estimate
for how many adults we are going to catch this year,
but, you know, those are like,
completely different cohorts of salmon,
because after they spawn, they die.
And so, you know, our measurements last year
might be completely unrelated
to how many salmon we catch this year.
So, the way that this has been done in fisheries
has been through looking at simple recruitment models
and the classical example of this is the Ricker model
which models recruitment as a function of the stock.
And so, in the case of salmon,
we have this equation where R is the recruitment,
so, the number of new adults,
and then S is the number of spawning adults.
So, in the case of the Fraser River sockeye salmon,
because their lifecycle is very close to four years,
you know, we can look at
how many adults return to spawn this year,
plug that number into this equation,
and that's supposed to give us the number
of adults that we can expect four years from now.
And so, if you have collected a lot of data,
you can fit that equation to the data,
get estimates for those parameters, alpha and beta,
and you have a fitted model, right?
So, you get something like this.
So, if you have, you know, lots of data,
so in this case, about 60 years of observations
of spawning abundance and recruitment,
so, all of those data points, you can fit this model
and here are the best fit models that curve through the data, right?
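As a hedged sketch of how such a fit might be done, here is the Ricker model in the common form R = S * exp(alpha - beta * S), fit by ordinary linear regression on log(R / S). The talk doesn't specify the exact parameterization or fitting procedure, so treat this as one standard textbook choice rather than the speaker's method.

```python
import numpy as np

def fit_ricker(S, R):
    """Fit R = S * exp(alpha - beta * S) via its log-linear form
    log(R / S) = alpha - beta * S."""
    S = np.asarray(S, dtype=float)
    R = np.asarray(R, dtype=float)
    slope, intercept = np.polyfit(S, np.log(R / S), 1)
    return intercept, -slope  # alpha, beta

def ricker_predict(S, alpha, beta):
    """Expected recruitment for a given spawner abundance."""
    S = np.asarray(S, dtype=float)
    return S * np.exp(alpha - beta * S)
```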
So, at this point you might say, okay, this is pretty good,
right, like we have lots of data,
we can parameterize the equation,
we can get our estimates out.
But the problem is,
there's still a lot of variability in recruitment
that's unexplained just by the spawning abundance.
So, you can see on the graph, you know,
if we pick a value for the spawning abundance,
well, the model says, you know,
our best estimate is along that line.
But if we look at the data,
there's a lot of scatter along the y-axis, right?
So, what we actually observe in terms of recruitment
can vary a lot, and in some cases, you know,
between 10,000 fish and 500,000 fish.
And if you're trying to like figure out for your business
whether you can catch 10,000 fish or 500,000 fish,
that's a very big difference from year to year, right?
Hopefully we can do something
to improve our forecast models, right?
We can make,
we can make better predictions of recruitment somehow.
And if we're an ecologist,
the natural thing to think about is
what covariates can we include?
Alright, well, so,
a lot of studies have been done on salmon.
They found that there are several important
environmental covariates we might think about.
The first is that there are these
Decadal-scale Pacific climate regimes
that are linked to salmon productivity.
So, those variables, those indicator variables,
might be useful to include in our models.
The next is that when juvenile salmon first enter the ocean,
they're undergoing these physiological changes
to adapt from freshwater to saltwater.
And so, the availability of food
and environmental stress can be important
for how many of those juveniles
actually survive that migration into the ocean.
So, we can think of environmental conditions
during that migration as being one critical factor
for how many juveniles will survive to adulthood.
So, the idea is, well, we should be able to include
these environmental variables.
Things like sea surface temperature,
river discharge, Pacific Decadal Oscillation
into our models and improve our predictions of recruitment.
So, how does that work?
So, from the simple Ricker Model,
we now have the Extended Ricker Model.
So, this is the simple Ricker Model.
So, recruitment as a function of stock size.
And Extended Ricker Model
adds a term for environmental covariate.
So, in this case, the environmental covariate
is just gonna be indicated by T.
You can think of that as temperature.
And then there's an additional parameter
that we're gonna fit to the data called gamma.
So, that's the sensitivity of recruitment to temperature.
And so, now that we have recruitment
as a function of stock size and temperature,
we can again collect the data
and fit a model surface to it, right?
So, now our predictors
are spawning abundance and sea surface temperature,
and on the z-axis, we have recruitment.
So, with all of these data points,
we can fit a best fit model surface.
So, that's that, you know,
nice curved surface that we have there.
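A sketch of the extension, under the same assumed parameterization as before: R = S * exp(alpha - beta * S + gamma * T), where gamma is the sensitivity of recruitment to temperature. In log space this is just a multiple linear regression, which is exactly why the environmental effect in this model cannot interact with stock size.

```python
import numpy as np

def fit_extended_ricker(S, R, T):
    """Fit R = S * exp(alpha - beta * S + gamma * T) by linear
    regression of log(R / S) on S and T.  Note the additive log
    form: temperature's effect is independent of stock size."""
    S, R, T = (np.asarray(v, dtype=float) for v in (S, R, T))
    A = np.column_stack([np.ones_like(S), -S, T])
    coefs, *_ = np.linalg.lstsq(A, np.log(R / S), rcond=None)
    alpha, beta, gamma = coefs
    return alpha, beta, gamma
```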
And this has been done by Fisheries and Oceans Canada
for, you know, about a decade or so,
and what they found is that actually,
including the environment didn't improve forecasts.
So, you know, you can do model selection
to identify what the best covariates are
for different stocks, you can choose those best models
and you can make forecasts from them and it turns out that
you don't actually do any better at predicting recruitment.
Okay, so, this is a problem, right?
Like, we want to make better models of recruitment,
and including these environmental covariates in this way
didn't really seem to help, so why might that be?
So, one thing that we can see is that
if we take the Extended Ricker Model
and we apply a log transformation,
well now, in the log space,
log recruitment is just a sum of these different terms.
So, if we think that
the effect of temperature on recruitment
might depend on whether we have a lot of adults to spawn
or very few adults to spawn, well, in this model structure,
we actually aren't able to capture that interaction, right?
What we see is that the effect of temperature
is just independent of the stock size on recruitment.
And so, in this model structure,
we don't actually have a way
to incorporate any interaction between those effects.
So, ideally we want to do, you know,
some kind of model fitting, some kind of model building,
that is able to be flexible for how those different effects
might actually interact to produce recruitment.
So, what we did is, we applied our nonlinear perspective
to build models that actually infer
what the relationship is from the data.
So, instead of saying, we have this equation
that says recruitment is this function
based on stock size and temperature,
we're gonna say, we are gonna use
our machine learning approaches to infer
what that functional relationship is from the data.
And so, again if we plot those points
and we use our method,
we get a model surface that looks something like this.
So, all right, a few caveats.
The first is that we don't actually believe that
the relationship is as complex as depicted by the surface.
This just happens to be that, you know,
the way that we've done our predictions,
you know, it can be very sensitive to
a few of the data points.
The important thing that, you know,
I really want you to get out of this
is that doing the function approximation in this way
allows us flexibility for how
temperature and stock abundance interact.
So, here we can see that where we have lots of data,
where we have observations at low spawning abundance,
we can see there is an effect of temperature
that we can capture in that model surface.
And in cases where we don't have a lot of data,
for high spawning abundance,
we don't have those observations
about what the recruitment might be.
So, in that kind of back area of the figure,
our predictions are flatter.
So, in this way we can allow the data to tell us about
the nonlinear relationship between
biology and the environment.
So, we took this approach and we applied it
to nine different populations of salmon in the system,
and we compared a couple different models.
We compared the original Ricker Model,
which was just recruitment based on stock abundance,
we looked at the Extended Ricker Model,
which was trying to incorporate
the best environmental covariates,
and then we looked at this nonlinear approach
where we allowed the data to tell us
what the relationship is
between the environment and stock abundance.
And so, what we found, you know,
kind of matched up with historical results.
So, here the original Ricker is this dash,
these dashed blue bars,
the y-axis is the accuracy of forecasts
and the x-axis are those nine different populations.
And so, going from the Ricker to the Extended Ricker,
so, including the environment in that equation context,
you get a little bit of an improvement,
but it turns out not to be significant.
When we try this nonlinear approach to look at
the interaction between temperature and abundance,
we are able to do better in some cases
and only a little bit better or not at all in other cases.
So, you know, this seems like a reasonable approach to
try and look at interactions that we might not otherwise
be able to predict in advance.
So, in other words using the data to tell us
what are the important variables and interactions
in the system.
Okay, so, moving on to a second example.
So, this is a study we did recently
to look at predictions of coastal algal blooms.
So, I did my PhD at Scripps which is in San Diego.
You can see that's the Scripps Pier over there.
And this is a nice map
of chlorophyll abundance from satellite.
And we have algal blooms in this area
and the really cool thing about these algal blooms
is that the algae are bioluminescent.
So, sometimes when that happens,
you can go out at night and you can take pictures
and you can actually get these like really cool
like waves where, when they crash, the algae decide,
oh no, man, something's happening, and they light up.
So, it's really cool.
So, it's a good thing they're not harmful algal blooms,
but regardless, this is like a biological phenomenon
that we want to understand
and we want to make predictive models for.
Okay, so, one of our co-authors in the study,
John McGowan at the
Southern California Coastal Ocean Observation System,
had worked previously with another researcher at Scripps,
Dan Rudnick, to look at this problem.
And after, you know,
actually nearly 10 years of data collection,
they thought they had discovered what
the ideal predictor covariate was.
So, here is their data.
So, green is the chlorophyll abundance
over this 10-year span,
and then blue is this temperature anomaly.
So, the idea is, if you take the temperature
measured out at the end of Scripps Pier,
at zero meters depth and five meters depth,
the difference between those tells you about
the physical conditions of the coastal ocean.
And so, when that difference is large,
the idea is that there is some kind of upwelling event
that brings nutrients into the region
and those nutrients are ideal conditions for
increases in chlorophyll, increases in plankton abundance.
Okay, so, we don't know what happened,
but they decided not to publish this.
You know, they got distracted by other projects
and, you know, this hypothesis was left aside.
More data was collected.
So, in this case 16 more years of data.
So now from 1994 to 2010
and using the same measured variables.
So, the chlorophyll abundance
and that temperature anomaly,
that relationship disappeared.
And actually, if you see that nice little gap,
that's where funding ran out,
which might be common to a lot of our studies.
So, we looked at this problem and we decided,
okay, let's go back. We don't think this is the best covariate
for predicting chlorophyll, so what are other possibilities?
So, we looked at things like
scatter plots of chlorophyll
against the physical environment, so water density,
different kinds of nutrient concentrations,
phosphate, nitrate, nitrite, as well as wind speed,
here, wind speed being, you know, a process,
a weather process that is affecting ocean upwelling,
which again, we think is one of the mechanisms
where nutrients get introduced into the surface waters
to produce algal blooms.
And so, producing all of these different scatter plots,
well, none of them really seems to show, like, a good,
you know, linear one-to-one relationship.
So, it doesn't seem like there's going to be signal there,
but what we do see is that these environmental variables
do show non-random associations with chlorophyll.
So, in other words, you know,
it doesn't seem like there,
the relationship between the X,
the variable on the x-axis and the y-axis
are completely at random.
There is some kind of like relationship there,
it's just not as simple as we would hope to see
from something like, you know,
smoking and lung cancer, right?
So, we think that these covariates may be necessary,
but not sufficient, predictors.
So, we want some way of being able to identify
which of these covariates are gonna be causal,
how can we best combine them?
So, our next task was to identify
which of these covariates are gonna be causal?
And so, again we can turn to this Takens' Theorem,
which says, if you just have one time series variable,
you can take the lags of that time series,
that one variable,
and you can reconstruct the system dynamics.
And so, one way that we can apply this to look at
which covariates might be causal
is we can take the lags of the affected variable,
so, this case chlorophyll,
and if we reconstruct the system dynamics
from the time series of chlorophyll,
it should contain within those dynamics
signal for the causal covariates.
Alright, so let me give you an example.
So, we have here the chlorophyll time series.
We are gonna make our reconstructed system dynamics
from lags of chlorophyll, something like this,
and then we're gonna see how well we can make a mapping
from this reconstruction of the system dynamics
to a possible causal covariate, like density.
So, we can do this for all of our
hypothesized covariates in the system
and see whether or not the signal that we get
is greater than what we'd expect by chance.
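As a rough, illustrative sketch of cross-mapping (in the spirit of convergent cross mapping, though simplified relative to the published method): reconstruct the dynamics from lags of the affected variable, then ask how well nearest neighbors in that reconstruction predict the candidate covariate. The embedding dimension, lag, and neighbor weighting here are assumptions, not the study's exact settings.

```python
import numpy as np

def cross_map_skill(y, x, E=3, tau=1):
    """Cross-map from the delay reconstruction of y (the affected
    variable, e.g. chlorophyll) to a candidate causal variable x
    (e.g. density).  Returns the correlation between cross-mapped
    estimates of x and its actual values."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    start = (E - 1) * tau
    M = np.column_stack([y[start - j * tau: len(y) - j * tau]
                         for j in range(E)])  # reconstruction
    x_aligned = x[start:]
    estimates = np.empty(len(M))
    for i in range(len(M)):
        d = np.linalg.norm(M - M[i], axis=1)
        d[i] = np.inf                      # leave self out
        nn = np.argsort(d)[:E + 1]         # E + 1 nearest neighbors
        w = np.exp(-d[nn] / max(d[nn[0]], 1e-12))  # distance weights
        estimates[i] = np.dot(w, x_aligned[nn]) / np.sum(w)
    return np.corrcoef(estimates, x_aligned)[0, 1]
```

High skill is then read as evidence that x leaves a causal signature in y's dynamics, which is then compared against what chance alone would give.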
So, in identifying which covariates are causal,
we looked at again those different nutrient variables,
nitrate, phosphate, silicate, nitrite,
physical variables, temperature, salinity, density,
wind speed and rainfall.
Okay, so in this table,
we have all of these Candidate variables.
The things that we are gonna look at
are going to be the Cross-map skill.
So, here, cross-mapping is our method
that we use to identify causal covariates.
And so, we can compare that, the measure of that signal,
to what we would find
if we just tried to do a Linear Cross-correlation
between those covariates and chlorophyll.
And so, ignoring Prediction time for now.
And so, what we found is that actually
a lot of these variables do seem like
they might be causal covariates, so, yellow bolded
means significant at the 0.05 level.
So yeah, a lot of these variables look like
they are causal covariates.
Salinity was significant through correlation,
but not through our time series method
for identifying whether it was causal.
And then we have another variable here
called Prediction time.
Alright, so, let me explain what that means.
When we are making the mapping from the
chlorophyll dynamics to those causal covariates,
what we can look at is whether that mapping occurs
with zero time lag
or whether it occurs with a negative time lag,
the idea being that if we are predicting,
if we're looking at a relationship
between our affected variable,
which is chlorophyll, and our covariates,
we should expect that the relationship is from
current chlorophyll values
to the covariates in the past, right?
So, past values of nutrients
or past values of the environment should be responsible
for affecting the current values of chlorophyll.
And so, what we ideally should find
is that the Prediction time for these covariates
should be negative.
And so, actually we do find that that is the case
for all of our causal covariates.
Again, salinity, the one variable
that shows up as significant using correlation,
looks like its best prediction time is zero.
So, salinity now
is significantly correlated with chlorophyll now.
So, it doesn't look like it's gonna be a good
variable for making predictive forecasts.
Right, if we want to make forecasts, we want to be,
we want to have that time lag response built in.
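A sketch of how this prediction-time idea might be scanned for, given some skill function that scores aligned pairs of series (the cross-map skill, or anything else). This is a hypothetical helper, not the study's code; the sign convention follows the talk, so a negative best lag means the affected variable maps best to past values of the covariate.

```python
def best_prediction_time(y, x, skill_fn, lags=range(-5, 3)):
    """Shift the candidate covariate x by each lag relative to the
    affected variable y, score each alignment with skill_fn, and
    report the lag with the highest skill (negative = x leads y,
    as a causal driver should)."""
    skills = {}
    for lag in lags:
        if lag < 0:
            skills[lag] = skill_fn(y[-lag:], x[:lag])
        elif lag > 0:
            skills[lag] = skill_fn(y[:-lag], x[lag:])
        else:
            skills[lag] = skill_fn(y, x)
    best = max(skills, key=skills.get)
    return best, skills
```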
Okay, so, by the time we went through this analysis
and identified several of these covariates
as being important,
you know, more data had been collected,
so we thought this was great.
We could use what we know so far
to build some predictive models
and we can try and validate those models
against the observations
that have been made in the meantime.
So, what we did is we went through the, you know,
like 30-some-year history of the data,
and we did our model selection.
So, we built a few best models,
and then what we wanted to do was
predict over the time span of the next couple years
where we had now collected additional data
that we hadn't used in our original analysis.
So, we looked at a couple different models
using year one, two, or three
of those environmental covariates.
We looked at the comparison against actual observations.
So, here the actual observations were in black,
and these different models are these different colors,
blue, green and red.
And we found, you know, that
they sometimes worked, sometimes didn't.
They all seemed to capture
the largest bloom that we had observed,
which was in October 2011;
in other cases, depending on the model,
they either did or did not seem to do so well.
So, you know, we're still working on trying to figure out
the best way to select the models in the future,
but in the meantime, if you're interested,
you can check out our publication.
Okay, so, what am I working on now?
So, one thing that I'm again interested in
is ecosystem change.
And so, one project that I'm working on
is to look at dynamic indicators for community change.
So, this is motivated by this problem
of understanding Ecological Regime Shifts.
So, a regime,
so ecosystems go through these like
large scale regime shifts, these abrupt changes.
Classical example comes from this 1974 study
where nutrients were added to this lake.
So, this is a photograph of the lake,
and you can see that yellow line in the middle,
that's the plastic divider that was inserted into the lake.
So, they introduced a divider
to separate the lake into these two portions
and then they added different kinds of nutrients.
So, in the top portion, carbon and nitrogen were added,
no change was found
in the resulting biological productivity.
In the bottom, carbon and nitrogen
and phosphorus were added
and we got these large algal blooms.
So, we can see like just like from this one added variable,
you can produce these large-scale changes in the system.
Alright, so, that's, you know, a pretty obvious example.
But how do we generalize this concept
of what's a regime shift?
And the reason this is a problem is
ecosystems are composed of many different components,
right, we have a lot of different species,
we have a lot of different populations,
we can measure the abundance of all those populations.
Well, if every population
increases or decreases substantially,
okay, that's a pretty obvious sign
that like something has gone on, right?
Like maybe a meteor hit and all the dinosaurs went extinct.
Okay, that's a regime shift, right?
What happens if only one population
increases or decreases?
Well, again, there are cases where,
you know, we have keystone species
like apex predators or we have like
some kind of really strongly competitive invasive species
that enters the system and that causes a large change.
So, in cases where we have one really important species
and that population undergoes a large change,
we might also be able to say, that's a regime shift.
But more commonly we have this case where
we have a bunch of different populations
and they're all fluctuating.
Sometimes some populations fluctuate a lot,
sometimes they fluctuate only a little bit.
There might be seasonal patterns
where some of the species change seasonally
and other ones are not so much.
So, how do we actually identify, you know,
what's a regime shift and what's not?
Okay, so, I'm gonna put a pin in that for now
and go to a study that I did
with some colleagues, led by Masayuki Ushio,
and published earlier this year, called Dynamic Stability.
And so, the idea is, you know,
to take community time series
and apply some of these techniques for
modeling time series and inferring interactions
to generate a single measure,
a single quantitative measure,
for how rapidly the system is changing.
So, you can think of this Dynamic Stability measure
as a time-varying analog
to the classical stability defined by Pimm,
where you either say
that the system is stable or it's unstable.
So, instead of having this one
binary variable that is fixed for the system,
what we have with Dynamic Stability approach
is a way to have a quantification
and to have that quantification change in time.
And so, the idea is,
with time series data, we should be able to tell
at different points in time
whether the community is stable or unstable.
Okay, so, brief summary of how this works.
Imagine we have these time series in this community.
So, this is example data
from the paper of a fish community.
So, each of these colored lines is a different fish species.
And so, the first thing that we do,
is we take this community data
and we build an interaction network.
So, we again use that same approach to identify
causal covariates; we can apply it to
each pair of fish in the system
and we can look for which interactions look like
they're occurring in the time series data.
So, identify the interactions between the species.
Alright, what do we do with the interaction network?
So, now that we know like, which interactions
we can infer from the data, we can fit population models.
So, the population model looks something like this.
And what we have here is
the populations of all of those fish species at time T
is our ecosystem state at time T.
Those future populations at time T plus one,
that's our ecosystem state at time T plus one,
and then what we are fitting from the data
are these time varying interactions.
So, all those interactions between
fish species i and fish species j go into this matrix.
And where we have the interaction network coming in
is we only allow those parameters to vary
if there is an interaction
that we've identified in the network.
So, if there's no interaction,
we can fix that value in the matrix at zero.
So, that really simplifies like
how many parameters we're gonna fit in our model.
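As a rough sketch of what that network-constrained fit looks like, here is a hypothetical Python illustration with made-up data, using a single ordinary-least-squares fit rather than the time-varying S-map fitting the actual study uses:

```python
import numpy as np

rng = np.random.default_rng(0)
T, S = 100, 3                      # time steps, number of species
x = rng.random((T, S))             # hypothetical abundance time series

# Adjacency mask from the inferred interaction network:
# mask[i, j] = True if species j is allowed to affect species i
# (the diagonal, each species' effect on itself, is always allowed).
mask = np.array([[True,  True,  False],
                 [False, True,  True],
                 [True,  False, True]])

# Fit x[t+1] ≈ A @ x[t], forcing entries with no identified
# interaction to stay at zero. The real method re-fits locally at
# every time step, which is what makes A time-varying.
A = np.zeros((S, S))
for i in range(S):
    cols = np.flatnonzero(mask[i])          # allowed predictors for species i
    coef, *_ = np.linalg.lstsq(x[:-1, cols], x[1:, i], rcond=None)
    A[i, cols] = coef
```

The point of the mask is exactly the parameter savings described above: only the entries with an inferred interaction are estimated.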
Okay, so now we have this
matrix of the effects through time.
How do we get some kind of single stability measure
out of that?
Well, we can do the same thing that we do
if we think of this as like a fixed interaction matrix,
which is we can look at
computing the dominant eigenvalue.
So, here this matrix,
we can compute the dominant eigenvalue.
This dominant eigenvalue is gonna change in time
and that's the thing that determines
whether the system is stable.
So, if the dominant eigenvalue happens to be
less than one at a given point in time,
that indicates that the model
tells us that perturbations to the system
are gonna decrease over time.
So, we have a stable condition.
If the dominant eigenvalue is greater than one,
then perturbations to the system increase over time
and we have an unstable condition.
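As a minimal numerical illustration of that stability check (Python, with a made-up interaction matrix; in the real analysis the matrix is different at every time step):

```python
import numpy as np

# Hypothetical fitted interaction matrix for one time step
A = np.array([[0.5, 0.2, 0.0],
              [0.0, 0.6, 0.1],
              [0.3, 0.0, 0.4]])

# In discrete time, the magnitude of the dominant eigenvalue
# (the spectral radius) decides stability:
#   < 1 : perturbations shrink over time (stable)
#   > 1 : perturbations grow over time (unstable)
dominant = max(abs(np.linalg.eigvals(A)))
stable = dominant < 1   # here the dominant eigenvalue is 0.7, so stable
```

Tracking `dominant` for each time step's fitted matrix yields the time series that gets plotted as Dynamic Stability.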
Alright, so, how do we look,
how do we use this to look at regime shifts?
So, what I did is I applied this approach,
and I'm gonna use Portal as a test case.
So, Portal is a long term experimental site that my lab,
half of my lab currently runs.
So, we collect rodents; if you remember from the photo,
that's a zoomed-in shot of one of our kangaroo rats
that I'm trying to weigh. It's not that happy about it.
So, we have these.
So, we have this long-running experimental site.
So, we have time series for the abundances
of all these different rodent species.
And so, they change through time
and a former student who just graduated
identified four different regime shifts in this system.
So, there were these periods of time where,
you know, through the analysis that she did
looking at the community time series,
the results showed
large structural change in the community,
using a different method.
So, I thought, this is great.
Like I have these long term community time series data,
I have this method that can look at stability,
I have a publication that identified existing regime shifts,
let's see if dynamic stability also identifies
those same time periods as being unstable, right?
Okay, so the output looks something like this.
So, over the same time span, time on the x-axis,
I have my dynamic stability measure on the y-axis
and again, the key indicator here
is whether that value is going to be above one,
in which case, it's unstable,
or below one, in which case, it's stable.
And specifically, we're looking at
those four identified regime shifts.
So, those four different time periods.
And in the ideal scenario,
Dynamic Stability should increase above one
during those regime shifts
or just slightly before those regime shifts
and then otherwise it could be stable.
Alright, so what happens
when we actually apply this to the data?
Well, we didn't really find anything.
So, we did see the Dynamic Stability measure
did change in time, so that's good.
Our like models didn't just produce like a constant value.
So, that was a good thing.
But it looks like the magnitude of the dominant eigenvalue
is kind of insensitive to these shifts in the community.
Okay, so we got stuck at this point.
We didn't know quite what to do.
But then we thought about, you know, kind of this,
like this, you know, back to the drawing board of,
well, ecosystems are really complex, right?
Like, what is this single measure actually telling us?
Well, so it's this magnitude of the dominant eigenvalue.
So, it's telling us in this model
where our population time series are,
where we're predicting the population time series,
it tells us at a given point in time
whether change in one direction
happens to be large enough to produce
instability, well, you know, maybe the thing to look at
is not just the magnitude in one direction.
Maybe the thing to look at
is really the direction of the change.
So, instead of looking at
the magnitude of that dominant eigenvalue,
we thought we could look at
the direction of the dominant eigenvector.
So, now it's a little bit more complicated.
Instead of one numerical value, which is the magnitude,
we have this seven dimensional vector.
So, the way I'm going to represent that
is that each of the species is gonna be in a different color.
And then the y-axis here is the relative
strength of that species in that eigenvector.
So, if, you know, all the species
have the same magnitude of change,
then you would expect their values
to be the same along the y-axis.
If we think that like one species is changing a lot
and the other species are not changing at all,
we would expect that one species to be,
to dominate the composition of the eigenvector
and the other species to be near zero.
And so, we can track the direction
of the eigenvector through time in this way.
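A small Python sketch of that bookkeeping (hypothetical; the per-species magnitudes of the unit dominant eigenvector are what gets plotted as each colored line):

```python
import numpy as np

def dominant_direction(A):
    """Per-species magnitudes of the dominant eigenvector of A.

    Applied to each time step's interaction matrix, a species with a
    value near 1 dominates the direction of fastest change, while a
    species near 0 is barely contributing to it.
    """
    vals, vecs = np.linalg.eig(A)
    v = vecs[:, np.argmax(abs(vals))]   # eigenvector of the dominant eigenvalue
    return abs(v) / np.linalg.norm(v)   # magnitudes of the unit vector

# Example: a system where species 0 changes much faster than the others,
# so it completely dominates the eigenvector.
direction = dominant_direction(np.diag([2.0, 1.0, 0.5]))
```

Comparing `direction` between consecutive time steps is what reveals the shifts in which species are changing fastest.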
And when we plot our results,
we get something that looks like this.
So, again, still very very messy, and we want to know,
well, what happens during these periods of regime shift
that have been previously identified?
Well, the cool thing we saw was that we don't know
what's going on in the first half of the time series,
but in the second half of the time series,
we do see that the direction of the dominant eigenvector
does show some cool signal.
So, again zooming out to, so that's the whole time series,
we're gonna look specifically at that third regime shift.
We can see that there is a strong shift
in the direction of the dominant eigenvector.
So, this was January 1999 to January 2000.
And then again in that last regime shift,
again, the strong shift in the direction of the eigenvector.
So, between August 2009 and January 2011.
So, in some ways this makes sense, right?
So, when the rodent community is reorganizing, it's,
you know, which species are dominant is changing,
we can expect that the changes in the abundances
will be reflected by this dominant eigenvector, right?
So the dominant eigenvector tells us
which species are changing most rapidly.
When the system is undergoing a regime shift,
we should expect that
the species that are changing most rapidly
will be different in that time period
compared to the time period
immediately before and after the regime shift.
So, that's what we find, preliminarily.
We're still working on the methodology
and trying to interpret what the results mean.
So, look forward to that.
Okay, so that was one of our Current Projects.
The next project that I'm looking at
is something that we're calling MATSS,
so the Macroecological
Analyses of Time Series Structure.
And so, this is a group project with a bunch of students.
And one thing that we are really kind of trying to address
is this question of Models versus Observations.
So, we have lots of methods
for investigating ecological change,
you know, open up ecological applications,
methods in ecology and evolution,
there are a bunch of papers out there.
How often are any of these methods
applied to more than one set of observations?
That's one problem. The other problem is,
well, we also have lots of observations.
So, how often do we actually use
multiple methods on a single dataset
to see what different patterns we observe?
So, we thought these are all kind of like issues
that we might want to address.
So, how are we gonna do this?
Well, we're gonna do all the analyses on all the time series
and see what happens.
So, our goal for this project is to build a platform
for a reproducible and collaborative research on change
across a bunch of different
ecological time series datasets.
We want to enable anyone to be able to reuse
any of the data or the methods
through sharing our code and data, and then finally,
because we're lazy and because computers are great,
we are gonna automate all our analyses
and report generation.
So, this project has two different components to it.
So, the first is this R package that we're developing.
So, this includes all the code
for ingesting and cleaning different datasets
and code for all the different analysis methods
that we're gonna look at,
and the second component is this Pipeline,
so, the part that actually does the work.
And so, this Pipeline is gonna do
all of the analysis and dataset calculations,
it's gonna collect all the results and produce reports,
and ideally, it's gonna be automated.
So, we don't actually need to touch it.
As soon as we update the code for a new dataset,
or we add code for an additional analysis,
this thing should ideally run automatically
and give us all our answers
which we then have to spend time arguing about.
Okay, so, how this works.
If you are an end-user of this product,
if you want to use the data, you can install the package,
you can run functions to get the data.
So, one line, install the package, one line, get the data
boom, you have your data.
Okay, suppose you want to use one of the methods.
It's a little bit more complicated.
Well again, you can install the package,
you have to prepare the data
that you want to use the method on,
but then you can run the method on the data.
So again, just a few lines of code, you can do all of that.
So, you can actually do all this
right now on your computers, if you want.
And then if you want to contribute data or methods,
we have our project set up on GitHub, you can navigate there,
and then you can follow the standard GitHub procedures:
you can contribute code to add an individual dataset,
you can contribute an analysis method,
and a way to generate synthesized reports,
and then hopefully the automation will run
and it'll take care of all the calculations,
generate the reports.
So, we're still working on the automation bit,
but everything else is in place.
So, we haven't figured out a way quite yet
to get the computer to click the Run button.
So, right now like the human has to go there
and like run the code, but like hopefully soon,
we'll get that figured out and up and running.
So, this is building off of other kinds of work
that we're doing in the Weecology Lab.
So, a project that was recently published
that I'm not involved in, but other members are,
is this automated forecasting.
So again, we run this long-term experimental site at Portal
and all the infrastructure,
the computational infrastructure,
is in place to generate automated forecasts from the data.
So, every month, members of our teams go out
and we count how many rodents there are.
As soon as we get back, those get entered into the computer
and then stuff happens in the cloud and forecasts are made.
And if you navigate to the website, you can see
we actually already have forecasts for November rodents.
So, we don't actually need to do anything
other than collecting the data and processing them.
And so, this is a really cool approach that I think
will be really valuable for ecology in the future.
Alright, so that was a lot of material.
So, to kind of wrap up.
So, I think these empirical ways of generating models
can be really valuable.
So we can look at ways of building
data-driven models from time series data
to identify and understand mechanisms that we might,
that might be difficult to understand a priori.
Ecosystems change in all of these different complex ways.
So, you know, one of the things that I think we need next
are these indicators that are dynamic,
so, changing in time, as well as multifaceted
to be able to track all these different kinds of changes.
And then, you know, a big thing that is,
you know, happening is people are being
more open and sharing of their data and their code
and we have supercomputers everywhere.
So, the kind of ideal next step for us as researchers,
I think, is really to synthesize all these things together
to provide guidance on selection of methods
and interpreting their results.
Alright, so these research projects were funded
by a lot of these different organizations,
so I want to thank all these.
And I'm happy to take any questions.
I'm interested in the,
because of the way that you're using that time series,
I'm interested in,
I'd like to hear you talk about indices
and how important that under, I don't wanna say quality,
but the underlying data that's in that time series,
if it's an index
or some more robust estimate of something.
Does that matter to these processes that you're describing?
Yeah, that's a great question, you know.
So, by indices, you mean like
uncertainty in the observations, or like,
different measurement schemes, maybe?
Am I understanding that correctly?
Well, like standard things
that an agency might use, like pheasants per square mile.
Yeah. On a road survey,
rather than a double observer
or a distance-generated estimate of pheasant density.
Yeah, okay, great question.
So, that's the part of Takens' Theorem
that I glossed over, which was "under suitable conditions,"
suitable conditions meaning like
your data are observed perfectly
and that you have an infinite amount of it.
So yeah, definitely one of the challenges,
like when you have transformations of data
and the data are observed with noise,
and then the system has dynamics that are nonlinear and complex,
weird things are gonna happen when you try
and reconstruct the dynamics just from the time series.
So, I don't have a great answer to kind of,
you know, how sensitive the methods are.
It really is going to depend on the individual system.
You know, there are, we are looking at
kind of extending the methods or rather,
building variants of the methods
that are more amenable to noise.
So, ways of actually estimating
some of these coefficients more robustly,
making predictions with uncertainty built into them,
that's, you know, that's definitely
something that we're working on.
So, I didn't touch on that,
but it's, yeah, stuff that's in the works, for sure.
Thanks, Hao, that is really awesome.
I'm trying to understand the Dynamic Stability index,
and if I...
So, the elements of that matrix A,
are they basically gonna be the ratio of...
So, let's just say
there's a link between species one and species four;
the element relevant to that is going to be
the ratio of species one and species four
between those two time steps, is that right?
So, we don't fix the
diagonal of that matrix,
which is like the self-effect on the future.
You know, the math kind of works out
that, regardless of exactly how you set it up,
as either the change, like the first derivative,
is that effect matrix times the population,
or the future population is the matrix times the population.
The dominant eigenvalue should be the same
between those different like representations.
Whether the threshold is at one or zero
does depend on whether you're doing the thing
in discrete time or continuous time,
but yeah, mathematically it doesn't really
make that big a difference.
So, then the eigenvector
is like, is sort of like the attractor in a sense?
Is that what you mean by the direction?
So the system might not actually be at that eigenvector,
but if it kept, but if that persisted,
then that's the direction it would go?
Yeah, the dominant direction it would go, right.
Because we're looking at the eigenvector
associated with the largest eigenvalue.
I was just wondering as a grad student,
these methods seem pretty cutting-edge
and I was just wondering what kind of resources
you would recommend for grad students
that could try and incorporate the equation-free approach
for their kind of studies?
Yeah, good question.
I can get out of this, right?
Okay, so I have an R package, and I'm still working on
the publication, but if you go to...
So, it's called rEDM, for Empirical Dynamic Modeling.
If you go to the website on GitHub,
or on CRAN, you can find your way to the GitHub page,
and you can click the link for the docs
and you should be able to find, under Reference?
Or article, is that article?
Yeah, there is a tutorial.
And this is a really long description
of kind of a little bit of the theory,
a little bit about how to do,
apply all the functions
and then some real data examples towards the end.
So, as far as a primer goes on the methods
and the software to use it,
yeah, refer to this and feel free to email me.
Cool, thank you,
I guess I'll go, so.
A lot of the methods that you used in the projects
that you presented today and the ongoing projects
seem to rely on long, high-quality, as Larga mentioned,
time series that appear to be periodic in nature.
And I've tried to understand the EDM,
but a lot of it goes over my head.
So, a lot of our time series
that aren't as long as the datasets that you've been using
will likely not have these periodic trends.
So, how sensitive is EDM
and these other methods that you've been playing around with
to that, to the nature of like nonperiodic data?
Yeah, so, a kind of good rule of thumb for
how long the time series needs to be is
roughly 30 data points in time
and/or five generation times.
So, you know, take the, take the characteristic time scale
of the dynamic you're trying to model
and multiply that by five,
you know, ideally the, you know, you should have,
like if it's periodic that you have enough observations
to capture like that signal.
In cases where
we don't have that.
Adam Clark published a paper on using spatial replicates.
So, if you have a lot of replicate
plots, where you only have like a couple data points
in each of those, how to combine that together
to apply this methodology.
So, that's one way to get around it, it's not ideal.
There are other things we're looking at
to look at
doing analyses in the absence of time,
but where you have multiple covariates.
Whether that is really good for recovering the mechanism,
I'm agnostic and skeptical about.
What I think it might be good at is hypothesis generation.
So, suggesting that like
this combination of covariates might be important,
that might, that is something that I think
it is possible if you don't have time series data,
but I think if you want to make
slightly stronger inferences about mechanism
you are gonna have to have like observations of changes.
I also really appreciated the fact that you put words like
regime and stability in quotes,
because as we discussed yesterday at length,
they're jargon, so they can be interpreted
by each of us in this room in a different way.
So, going back to that Christensen,
I think Christensen et al time series
of the rodentia community,
you guys highlighted regime shifts
that were of varying length.
So, I think they varied between,
I don't know, some months to a handful of years.
So, I guess I wonder if you might comment on
how you claim something is a regime shift
and at what period of time is it in that regime shift?
Are you basing this on your stability metric, I guess?
Does that make sense?
Yeah, so, the Christensen et al paper,
the regime shifts were,
they are identified using a change point model
that is probabilistic, so you are inserting
change points in the time series structure
and then, you know,
letting the Bayesian stuff work out
how many change points there should be
and at what time points they occur.
So, those periods are,
I think they're actually like
the 90% credible intervals for
the location of the change points.
So, after like throwing in the data into the model,
the posterior distribution suggests that there is
a change point somewhere in that
time span of 90% credibility.
That's kind of how that method worked.
Does that answer the question?
Sort of? Yeah, sort of for now.
Okay, we'll talk later.
Yeah, for sure.
Are there any other questions?
I just want to say, I'm a complete fan
of the whole automated prediction thing
and I'm working at, we're doing some stuff up here.
We're gonna be following in your footsteps.
So, thanks for putting together all the packages
and saying this is a good thing to do, because I agree.
Yeah, it also seems like really impossible
to do all that stuff that you're gonna do
with the students in the White Ernest Lab.
Yeah, I commend you.
So, thank you so much.
Please join me in thanking Dr. Hao Ye.