Data-driven Modeling of Ecological Dynamics

Hao Ye Author

11/01/2018 Added

Description

Oct. 31, 2018 Seminar

Searchable Transcript

[00:00:00.540]Okay, thanks for the introduction, Jessica.
[00:00:04.220]And thank you all for coming to my talk.
[00:00:06.802]I'm really happy to be here and to talk a little bit
[00:00:10.000]about my research.
[00:00:11.500]I'm actually going to be here through Friday,
[00:00:14.830]so if you don't get a chance to talk with me
[00:00:17.590]and ask questions after the talk,
[00:00:19.320]I guess email Jessica,
[00:00:20.630]and we'll figure out a meeting time of sorts.
[00:00:25.440]Yeah, if you happen to be on Twitter,
[00:00:28.180]you're welcome to live-tweet and tag me in it.
[00:00:32.000]So, the research I'm going to present
[00:00:34.290]involves a lot of people from different locations.
[00:00:38.400]I'd like to acknowledge the contributions of my collaborators
[00:00:43.060]at the Sugihara Lab, at Scripps Institution of Oceanography,
[00:00:47.040]work that we did with the Pacific Biological Station
[00:00:50.660]at Fisheries and Oceans Canada,
[00:00:52.910]the Southern California
[00:00:55.020]Coastal Ocean Observing System at Scripps,
[00:00:58.410]members of the Weecology Lab at the University of Florida,
[00:01:01.050]and other members, other collaborators
[00:01:03.310]across different universities throughout the world.
[00:01:08.410]I also want to acknowledge that the research was conducted
[00:01:10.860]and the data were collected on land
[00:01:14.010]that is the traditional land
[00:01:17.550]of indigenous tribes of North America.
[00:01:21.070]And the talk that I'm giving today is on
[00:01:24.377]the traditional land of the Pawnee tribe.
[00:01:26.640]If you go to this website, you can find out more information
[00:01:29.920]about indigenous tribes of North America.
[00:01:33.930]Alright, so as Jessica mentioned,
[00:01:35.357]I have this weird background.
[00:01:37.660]So, I did my formal training in Computer Science,
[00:01:40.520]and then I decided I was going to go study the human brain,
[00:01:45.050]so I did a stint in Experimental Psychology,
[00:01:47.870]and then I wound up at an oceanographic institution,
[00:01:50.800]and now I'm postdocing in a Wildlife Ecology Department.
[00:01:54.960]So you know, you can interpret this in many different ways.
[00:01:58.160]One possibility is that I get bored very easily.
[00:02:02.070]That's probably true most of the time.
[00:02:05.591]The way I like to think about it is that
[00:02:07.970]I'm really interested in understanding
[00:02:11.920]puzzles and complex systems.
[00:02:14.780]And so, you know, going through these different fields,
[00:02:17.340]there are very, you know,
[00:02:19.070]different kinds of interesting problems
[00:02:20.750]about complex systems.
[00:02:22.410]And so, the research that I'm doing now,
[00:02:25.930]I'm really interested in understanding
[00:02:27.910]how ecosystems change.
[00:02:30.660]So, there are a lot of challenges in this area.
[00:02:34.850]One of the, one of the first ones I'm gonna talk about,
[00:02:36.960]is that ecosystems are complex.
[00:02:39.430]So, they have lots of interacting components,
[00:02:42.290]and the interactions between these components
[00:02:44.250]can be nonlinear,
[00:02:45.590]meaning that their effects will depend on each other.
[00:02:48.810]And these interactions can also change over time
[00:02:51.890]and produce effects that change over time.
[00:02:53.610]So, when we're trying to understand
[00:02:56.210]and model what's going on in ecosystems,
[00:02:58.510]that can be a very difficult thing
[00:03:00.430]to incorporate into our models.
[00:03:04.420]The second kind of challenge for understanding
[00:03:06.380]and modeling ecosystems is that
[00:03:08.600]we don't have mathematical laws like physics or chemistry.
[00:03:12.360]We don't have like fundamental equations
[00:03:15.150]for gravity or how chemical reactions occur.
[00:03:18.900]And so, when we want to understand,
[00:03:20.610]you know, like what is the effect of temperature
[00:03:23.290]on species distributions,
[00:03:25.220]you know, we don't have these convenient equations
[00:03:27.700]that we can like, you know, plug in our data to,
[00:03:30.300]and like parameterize our equations.
[00:03:32.650]And so, that can also be a problem.
[00:03:35.070]So, we have these theories and we have these hypotheses,
[00:03:38.020]and they can be good descriptions
[00:03:39.580]of different concepts in ecology,
[00:03:41.870]but when we try to apply them to real ecosystems,
[00:03:45.100]you know, how they interact,
[00:03:47.120]we don't always know which mechanisms will be important.
[00:03:51.050]We don't always know if our descriptions of how they,
[00:03:54.700]you know, these effects that we might be able to
[00:03:58.550]quantify in the laboratory environment
[00:04:00.660]will translate easily into real ecosystems.
[00:04:05.920]So, how do I resolve these different challenges?
[00:04:09.220]So, the approach that I take is to
[00:04:12.220]look at using data to build models.
[00:04:14.530]So, this is the approach that I'm gonna call,
[00:04:16.370]Data-driven Modeling.
[00:04:18.640]So, one of the ways that I use data to build models
[00:04:23.310]is to infer mechanism from time series data.
[00:04:26.920]And the reason I'm going to use time series data
[00:04:29.300]is because it is the most natural way to understand
[00:04:32.420]how change occurs in ecosystems.
[00:04:34.720]Since I'm interested in the processes and mechanisms
[00:04:38.180]that produce the observed changes in ecosystems,
[00:04:41.500]the most natural data that I'm going to use
[00:04:43.730]are going to be the observations
[00:04:44.850]of how those systems have actually changed in time.
[00:04:49.140]So, to give you a little bit of background
[00:04:51.490]about where time series come from.
[00:04:53.610]This is an example from Fluid Dynamics.
[00:04:56.030]So, this is the Lorenz attractor, and the Lorenz attractor.
[00:05:00.560]Alright, is the movie playing?
[00:05:01.990]Okay, cool, so the movie's gonna play.
[00:05:05.380]So, in this system, we have three variables,
[00:05:08.160]X and Y and Z
[00:05:09.940]and the behavior of the system in time
[00:05:12.380]is governed by these three differential equations.
[00:05:15.040]So, the behavior of X and Y and Z is gonna change
[00:05:18.190]depending on the other variables in the system
[00:05:21.260]according to those equations
[00:05:22.730]as well as the parameters that we chose.
[00:05:26.040]And so, when we take the system
[00:05:29.240]and we make recordings of a single variable,
[00:05:31.640]so in this case, recordings of variable Y,
[00:05:33.727]and we make a sequence of observations of it,
[00:05:36.180]all we get in the end is this time series of Y.
[00:05:40.640]So, this time series of Y is going to record
[00:05:42.940]not only the changes in Y,
[00:05:44.650]but because it's recording those values of Y,
[00:05:47.740]the time series also is going to record
[00:05:49.900]how X and Z have influenced Y.
[00:05:52.630]So, in this way, you know, we can think of these time series
[00:05:55.180]as capturing actually quite a lot of information
[00:05:57.270]about what's going on in the system,
[00:05:59.180]even if we don't yet know how to unpack that out
[00:06:02.100]just from the data.
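A minimal sketch of the recording step just described: simulate the three-variable Lorenz system and keep only the Y variable as a time series. The parameter values (sigma = 10, rho = 28, beta = 8/3), the starting point, the step size, and the fixed-step Runge-Kutta integrator are all assumptions for illustration; the talk does not state which values were used in the movie.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the three Lorenz differential equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def simulate_lorenz(n_steps=5000, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Integrate the system with a fixed-step 4th-order Runge-Kutta scheme."""
    states = np.empty((n_steps, 3))
    state = np.array(start, dtype=float)
    for i in range(n_steps):
        states[i] = state
        k1 = lorenz_rhs(state)
        k2 = lorenz_rhs(state + 0.5 * dt * k1)
        k3 = lorenz_rhs(state + 0.5 * dt * k2)
        k4 = lorenz_rhs(state + dt * k3)
        state = state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return states

states = simulate_lorenz()
# "Recording a single variable": keep only the Y column.
y_series = states[:, 1]
```

Even though `y_series` holds one column, each value of Y was produced by equations that involve X and Z, which is the sense in which the series carries information about the whole system.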
[00:06:04.800]Alright.
[00:06:05.633]So, how are we gonna make use of this time series?
[00:06:08.630]So, there's some,
[00:06:10.320]there's some convenient math that we can use.
[00:06:13.640]So, there's this theorem from Takens,
[00:06:17.870]very spooky, I'm gonna skip all that and say,
[00:06:20.800]we don't need to worry about that.
[00:06:22.330]The summary of that is we can
[00:06:24.530]actually rewrite all the complex dynamics of a system
[00:06:27.750]in terms of lags of just one variable.
[00:06:30.150]So, in the previous slide I showed you, you know,
[00:06:33.020]we had this, the system in three variables X and Y and Z,
[00:06:37.220]you might think, well if I were gonna simulate this
[00:06:39.230]in a computer or something I would need to have,
[00:06:42.520]you know, a variable X, a variable Y, a variable Z.
[00:06:45.030]If I wanted to, you know, build a model from data,
[00:06:48.270]I would need to make observations of X and Y and Z.
[00:06:51.020]Well, Takens' Theorem says
[00:06:52.170]you don't actually need all of those variables.
[00:06:54.600]In fact, you can replace all of those variables
[00:06:56.690]just with lags of the time series Y.
[00:07:00.200]So, the way this works is instead of structuring the,
[00:07:04.967]the system using coordinates of X and Y and Z,
[00:07:08.410]we're gonna replace those with Y and time lags of Y.
[00:07:13.400]So, here instead of X and Y and Z,
[00:07:16.040]I have the time series Y at time T,
[00:07:18.821]the time series of Y at time T with a lag,
[00:07:21.480]and then another lag of the same variable Y.
[00:07:25.030]And so, if I use these as the coordinates,
[00:07:27.250]if I just have this time series of Y,
[00:07:29.470]I get this reconstruction.
[00:07:32.380]And so, the idea of Takens' Theorem is that given,
[00:07:37.600]you know, this time series Y
[00:07:38.780]that's observed from the system,
[00:07:40.550]if I have sufficient lags
[00:07:42.490]and sufficient other conditions hold,
[00:07:44.460]I can actually make this reconstructed
[00:07:48.350]transformation of the system
[00:07:49.960]that you can see on the right side here.
[00:07:52.120]And that is mathematically identical
[00:07:54.240]to the original system with X and Y and Z.
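The lagged coordinates described above can be sketched as a small function that turns one time series into points of the form (Y(t), Y(t - lag), Y(t - 2 lag)). The function name `delay_embed` and the default dimension and lag are hypothetical choices for illustration; in practice both are chosen from the data, and the reconstruction matches the original system only under the conditions of Takens' Theorem.

```python
import numpy as np

def delay_embed(series, dimension=3, lag=1):
    """Build lagged coordinate vectors from a single time series.

    Row i is [y(t), y(t - lag), ..., y(t - (dimension - 1) * lag)]
    for the i-th time t at which all of those lags exist.
    """
    series = np.asarray(series, dtype=float)
    n_points = len(series) - (dimension - 1) * lag
    if n_points <= 0:
        raise ValueError("series is too short for this dimension and lag")
    columns = []
    for k in range(dimension):
        # Column k is the series shifted back in time by k * lag steps.
        start = (dimension - 1 - k) * lag
        columns.append(series[start:start + n_points])
    return np.column_stack(columns)

# Toy series 0, 1, ..., 9 so the lag structure is easy to read off.
embedded = delay_embed(np.arange(10), dimension=3, lag=2)
first_point = embedded[0]   # y(4), y(2), y(0)
```

Plotting the three columns of `embedded` against each other for a recorded Y series gives the kind of reconstructed picture shown on the right side of the slide.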
[00:08:00.600]Okay.
[00:08:01.433]So, that's one way that we can use
[00:08:03.180]the time series that we have to look at mechanism,
[00:08:06.740]but if we wanted to actually produce models
[00:08:08.830]how do you do that, right?
[00:08:09.663]So, instead of those equations in X and Y and Z,
[00:08:12.280]now we have these lags of Y,
[00:08:13.780]how do we actually like make sense of all of that?
[00:08:15.950]Alright, well, the next thing that we can do
[00:08:18.260]is that we can actually take those observed patterns
[00:08:21.090]and we can infer the relationships.
[00:08:24.100]So, in the absence of fundamental mathematical laws,
[00:08:27.090]what we can do is
[00:08:27.923]we can actually reconstruct those relationships from data.
[00:08:31.253]So, the way that
[00:08:32.200]we can generally approach this problem
[00:08:34.944]looks something like this.
[00:08:36.440]So, I'm gonna say that, you know, this vector Z
[00:08:40.370]is our ecosystem state.
[00:08:42.010]So, you can, for whatever system you're in,
[00:08:44.750]you can imagine that vector Z
[00:08:46.990]has a bunch of different components
[00:08:48.440]for all the important state variables of the system.
[00:08:51.280]And so, at a particular time T,
[00:08:53.050]you have observations of all those state variables.
[00:08:56.680]So, when we want to model the dynamics of the system,
[00:08:59.490]we think of it as, well,
[00:09:00.730]there's some kind of transformation,
[00:09:02.220]some kind of process that goes on
[00:09:03.900]whereby we go from the state of the system at time T
[00:09:06.920]to the state of the system at time T plus one.
[00:09:09.390]So, there's some kind of, you know,
[00:09:11.440]determinism that happens, you know,
[00:09:13.360]the ecosystem behavior occurs
[00:09:16.190]and we can now make observations at a future state.
[00:09:18.930]And so, we can apply that same,
[00:09:20.620]the same rules, the same mechanisms and processes
[00:09:23.290]and we can continue to make observations into the future
[00:09:25.950]as well as relating our current observations
[00:09:28.280]to what we observe in the past.
[00:09:31.407]And so, if you are in, like,
[00:09:33.770]a traditional modeling, mathematical modeling class,
[00:09:36.320]you know, one way that you would try and write
[00:09:38.790]the function F would be something like this.
[00:09:41.560]We're gonna break it out into different components
[00:09:44.640]and we might establish some equations.
[00:09:46.860]So, this is a paper I found for
[00:09:50.780]like a plankton ecosystem
[00:09:52.460]and you can get these equations
[00:09:54.150]and then you can fit those equations to the data, right?
[00:09:56.710]But what happens if you don't know what those equations are,
[00:09:59.670]or if you think those equations maybe aren't that realistic?
[00:10:04.280]Well, now we want to model F using some black box, right?
[00:10:08.930]So, how do you model a black box?
[00:10:11.580]So, luckily, you know, people in CS have been doing this
[00:10:15.210]for, you know, the past decade
[00:10:16.850]and, you know, making lots of money.
[00:10:18.640]The answer here is machine learning.
[00:10:21.460]Alright so, how machine learning works,
[00:10:23.590]gonna oversimplify it a lot
[00:10:25.270]and say it looks something like this.
[00:10:26.710]You have inputs and outputs,
[00:10:29.580]you feed this into a computer
[00:10:31.617]and you run your like machine learning algorithms
[00:10:34.230]and what you get out are the rules
[00:10:35.720]for the relationship between the inputs and the outputs.
[00:10:39.040]So, in the context of that like simplified description
[00:10:42.250]of the ecosystem state changing in time,
[00:10:44.880]our inputs we can think of those as
[00:10:46.490]just the ecosystem state at time T,
[00:10:48.920]the outputs are the ecosystem state at time T plus one.
[00:10:52.620]So, what the future state looks like.
[00:10:54.770]And then the rules are gonna be that function.
[00:10:56.740]So, applying this methodology
[00:10:58.950]allows us to take the data that we have,
[00:11:01.374]the observations of the ecosystem state,
[00:11:03.230]and infer what the function is that relates
[00:11:06.820]the states at time T to the states at time T plus 1.
[00:11:09.210]So, we can reconstruct those dynamics from the data.
[00:11:13.410]Alright, so.
[00:11:15.750]To get a little bit more into detail
[00:11:17.650]to how we actually generate those rules.
[00:11:20.230]So, imagine we have our observed state,
[00:11:22.600]our observed inputs, Z of T,
[00:11:25.300]our corresponding outputs, Z of T plus 1,
[00:11:27.990]and we want to infer just a simple function that describes,
[00:11:31.100]well, how do we get from the current,
[00:11:33.200]the state at time T to the state at time T plus 1?
[00:11:35.820]So, I'm gonna use an example
[00:11:37.280]of predicting tomorrow's weather, right?
[00:11:39.670]So, suppose we don't know anything about physics
[00:11:42.190]or meteorology, we just have a lot of collection,
[00:11:45.330]a lot of data collection on historical weather
[00:11:49.310]and we want to know what the weather looks like tomorrow.
[00:11:51.790]So, one simple way to do this
[00:11:54.370]is called the Lorenz Method of Analogues.
[00:11:57.050]So, what we do is, for a given Z of T,
[00:12:00.890]so imagine that's today's weather,
[00:12:02.980]we look for its nearest neighbor in
[00:12:06.190]all of our historical data.
[00:12:07.690]So, we look for,
[00:12:10.580]within our historical records
[00:12:12.130]for data points where we have similar temperature,
[00:12:15.830]similar time of year,
[00:12:19.130]similar,
[00:12:21.530]you know, precipitation or air pressure.
[00:12:23.960]Basically we look for whatever we think are
[00:12:26.990]the most similar situations
[00:12:28.930]that approximate, like, all the ecosystem state
[00:12:32.470]that is relevant for our,
[00:12:35.140]our system, our process.
[00:12:37.290]And so, then what we do is,
[00:12:38.310]because we have those observations of the past,
[00:12:40.600]we can use the future state from those past observations
[00:12:44.430]as a prediction for the weather tomorrow.
[00:12:47.480]So, imagine we decide that, you know,
[00:12:49.770]of all of our historical dates,
[00:12:51.610]October 20, 1988, was the best approximation
[00:12:55.080]or the most similar to today's weather.
[00:12:57.360]Then we are gonna say that, well,
[00:12:59.550]a reasonable prediction for tomorrow's weather,
[00:13:01.290]is just gonna be October 21st, right?
[00:13:03.620]So, we just take the past and we just advance it forward.
[00:13:06.880]And so, this way we can make these predictions of weather
[00:13:10.310]without actually knowing anything about physics, right?
[00:13:12.380]All we did is we just looked at our historical data
[00:13:15.477]and we looked for the most similar states.
[00:13:19.740]So, this is a way that we can actually reconstruct mechanism
[00:13:23.240]without actually, you know, needing to know about equations
[00:13:26.550]and in a way that is always informed by the data.
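The analogue idea above can be sketched in a few lines: find the historical state closest to today's state, then use whatever followed it as the forecast. This is a single-nearest-neighbor sketch; the function name `analogue_forecast`, the Euclidean distance, and the toy weather numbers (temperature and air pressure) are assumptions made up for illustration, not data or code from the talk.

```python
import numpy as np

def analogue_forecast(history, current_state):
    """Forecast the next state from the most similar past state.

    history: array of shape (n_times, n_variables), ordered in time.
    Returns the state that followed the nearest neighbor, and its index.
    """
    history = np.asarray(history, dtype=float)
    current_state = np.asarray(current_state, dtype=float)
    # The last record has no known "next day", so it cannot be an analogue.
    candidates = history[:-1]
    distances = np.linalg.norm(candidates - current_state, axis=1)
    nearest = int(np.argmin(distances))
    return history[nearest + 1], nearest

# Hypothetical daily records: [temperature in C, air pressure in hPa].
history = np.array([
    [15.0, 1012.0],
    [17.0, 1009.0],
    [21.0, 1003.0],
    [18.0, 1008.0],
    [14.0, 1015.0],
])
today = [16.8, 1009.5]
prediction, nearest = analogue_forecast(history, today)
```

Here the second record is the closest match to `today`, so the forecast is simply the third record. With variables on very different scales, one would usually standardize each column before computing distances.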
[00:13:31.040]Okay, so at this point,
[00:13:32.330]you might be feeling a little bit lost, but don't worry,
[00:13:35.100]I'm gonna go through a few examples next to explain.
[00:13:43.770]So, one study that I did in my PhD
[00:13:47.450]was to look at recruitment dynamics
[00:13:49.240]of Fraser River sockeye salmon.
[00:13:51.890]So, salmon go through this kind of complicated lifecycle.
[00:13:57.400]So, the adult salmon spawn
[00:14:00.440]in freshwater rivers and lakes
[00:14:02.640]and the eggs hatch, they grow up to be juvenile salmon
[00:14:06.250]and then at some point, the juvenile salmon decide
[00:14:08.240]they're gonna migrate into the ocean
[00:14:10.000]and then after spending some time in the ocean,
[00:14:11.980]they return back to
[00:14:13.640]the rivers and streams to spawn again.
[00:14:16.150]After they spawn, the adults die.
[00:14:19.020]Okay, so, if you are,
[00:14:21.080]you know, trying to manage this fishery
[00:14:23.110]or if you are someone who is
[00:14:26.490]like a fisherman and you are actually
[00:14:28.390]catching these salmon to make money or,
[00:14:32.240]because they're your food source,
[00:14:35.030]one thing that you would want to know is,
[00:14:36.950]how many adult salmon are gonna return every year.
[00:14:40.910]And so, this is kind of a complicated, not complicated,
[00:14:44.960]this is a challenging question to answer,
[00:14:47.430]because we don't actually know,
[00:14:49.350]we don't actually have good data
[00:14:50.680]on the abundances of all the juvenile salmon in the rivers,
[00:14:54.320]we don't have measurements of what the salmon do
[00:14:57.550]once they're in the ocean.
[00:14:58.840]Really, our best sources of data are going to be from when
[00:15:01.470]the adult salmon are returning to spawn, and that's it.
[00:15:05.920]So, we.
[00:15:07.560]So, we have like an idea
[00:15:09.030]of how many eggs are produced by the salmon,
[00:15:11.520]but, you know, throughout the rest of the lifecycle,
[00:15:13.830]we don't have any data, so, we're trying to predict
[00:15:15.720]how many adult salmon are gonna come back.
[00:15:17.770]And, because the salmon die after they spawn,
[00:15:20.530]you know, we don't have the ability to say,
[00:15:22.180]well, look at how many adults we caught last year.
[00:15:24.670]That might be a good, you know, estimate
[00:15:26.740]for how many adults we are going to catch this year,
[00:15:28.930]but, you know, those are like,
[00:15:29.900]completely different cohorts of salmon,
[00:15:31.340]because after they spawn, they die.
[00:15:32.750]And so, you know, our measurements last year
[00:15:34.580]might be completely unrelated
[00:15:35.930]to how many salmon we catch this year.
[00:15:38.710]So, the way that this has been done in fisheries
[00:15:41.170]has been through looking at simple recruitment models
[00:15:43.480]and the classical example of this is the Ricker model
[00:15:46.160]which models recruitment as a function of the stock.
[00:15:49.210]And so, in the case of salmon,
[00:15:50.920]we have this equation where R is the recruitment,
[00:15:54.690]so, the number of new adults,
[00:15:56.290]and then S is the number of spawning adults.
[00:15:58.960]So, in the case of the Fraser River sockeye salmon,
[00:16:01.650]because their lifecycle is very close to four years,
[00:16:05.830]you know, we can look at
[00:16:07.060]how many adults return to spawn this year
[00:16:10.200]and then, you know,
[00:16:12.640]we can plug that number into this equation
[00:16:15.870]and that's supposed to give us the number
[00:16:17.590]of adults that we can expect four years from now.
[00:16:21.790]And so, if you have collected a lot of data,
[00:16:24.360]you can fit that equation to the data,
[00:16:26.350]get estimates for those parameters, alpha and beta,
[00:16:30.830]and you can fit that model, right?
[00:16:32.130]So, you get something like this.
[00:16:33.090]So, if you have, you know, lots of data,
[00:16:35.190]so in this case, about 60 years of observations
[00:16:37.800]of spawning abundance and recruitment,
[00:16:39.750]so, all of those data points, you can fit this model
[00:16:43.330]and here is the best-fit model, that curve going through, right?
[00:16:46.680]So, at this point you might say, okay, this is pretty good,
[00:16:49.130]right, like we have lots of data,
[00:16:50.290]we can parameterize the equation,
[00:16:51.980]we can get our estimates out.
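A rough sketch of what fitting the Ricker model to stock and recruitment data can look like. The slide's equation is not reproduced in the transcript, so this assumes one common form, R = alpha * S * exp(-beta * S), and fits it by linear regression of log(R/S) on S. The data are synthetic (60 made-up points with lognormal noise), and the parameter values are arbitrary illustration values, not estimates for Fraser River sockeye.

```python
import numpy as np

def ricker(spawners, alpha, beta):
    """Assumed Ricker form: R = alpha * S * exp(-beta * S)."""
    return alpha * spawners * np.exp(-beta * spawners)

def fit_ricker(spawners, recruits):
    """Estimate alpha and beta by regressing log(R / S) on S."""
    spawners = np.asarray(spawners, dtype=float)
    recruits = np.asarray(recruits, dtype=float)
    # log(R / S) = log(alpha) - beta * S is a straight line in S.
    slope, intercept = np.polyfit(spawners, np.log(recruits / spawners), 1)
    return np.exp(intercept), -slope

# Synthetic stand-in for about 60 years of spawner/recruit observations.
rng = np.random.default_rng(0)
true_alpha, true_beta = 5.0, 0.002
spawners = rng.uniform(50.0, 1500.0, size=60)
noise = np.exp(rng.normal(0.0, 0.5, size=60))
recruits = ricker(spawners, true_alpha, true_beta) * noise

alpha_hat, beta_hat = fit_ricker(spawners, recruits)
```

The multiplicative noise term is what produces the vertical scatter around the fitted curve: the same spawning abundance can map to very different recruitment values.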
[00:16:54.100]But the problem is,
[00:16:55.200]there's still a lot of variability in recruitment
[00:16:57.170]that's unexplained just by the spawning abundance.
[00:17:00.140]So, you can see on the graph, you know,
[00:17:02.010]if we pick a value for the spawning abundance,
[00:17:05.320]well, the model says, you know,
[00:17:06.950]our best estimate is along that line.
[00:17:09.138]But if we look at the data,
[00:17:10.120]there's a lot of scatter along the y-axis, right?
[00:17:12.880]So, what we actually observe in terms of recruitment
[00:17:15.250]can vary a lot, and in some cases, you know,
[00:17:17.200]between 10,000 fish and 500,000 fish.
[00:17:21.260]And if you're trying to like figure out for your business
[00:17:24.150]whether you can catch 10,000 fish or 500,000 fish,
[00:17:27.440]that's a very big difference from year to year, right?
[00:17:31.060]Alright, so.
[00:17:32.780]Hopefully we can do something
[00:17:34.420]to improve our forecast models, right?
[00:17:36.390]We can make,
[00:17:37.360]we can make better predictions of recruitment somehow.
[00:17:40.140]And if we're an ecologist,
[00:17:41.940]the natural thing to think about is
[00:17:43.560]what covariates can we include?
[00:17:46.050]Alright, well, so,
[00:17:47.250]a lot of studies have been done on salmon.
[00:17:49.430]They found that there are several important
[00:17:51.200]environmental covariates we might think about.
[00:17:53.370]The first is that there are these
[00:17:54.880]decadal-scale Pacific climate regimes
[00:17:57.170]that are linked to salmon productivity.
[00:17:58.880]So, those variables, those indicator variables,
[00:18:01.300]might be useful to include in our models.
[00:18:03.820]The next is that when juvenile salmon first enter the ocean,
[00:18:07.120]they're undergoing these physiological changes
[00:18:09.340]to adapt from freshwater to saltwater.
[00:18:11.580]And so, the availability of food
[00:18:14.460]and environmental stress can be important
[00:18:16.300]for how many of those juveniles
[00:18:17.790]actually survive that migration into the ocean.
[00:18:20.180]So, we can think of environmental conditions
[00:18:22.020]during that migration as being one critical factor
[00:18:25.190]for how many juveniles will survive to adulthood.
[00:18:29.580]So, the idea is, well, we should be able to include
[00:18:33.070]these environmental variables,
[00:18:34.230]things like sea surface temperature,
[00:18:35.820]river discharge, Pacific Decadal Oscillation,
[00:18:39.260]into our models and improve our predictions of recruitment.
[00:18:43.110]So, how does that work?
[00:18:44.830]So, from the simple Ricker Model,
[00:18:47.160]we now have the Extended Ricker Model.
[00:18:49.250]So, this is the simple Ricker Model.
[00:18:50.700]So, recruitment as a function of stock, of stock size.
[00:18:56.070]And the Extended Ricker Model
[00:18:57.210]adds a term for an environmental covariate.
[00:19:00.030]So, in this case, the environmental covariate
[00:19:01.670]is just gonna be indicated by T.
[00:19:03.630]You can think of that as temperature.
[00:19:05.550]And then there's an additional parameter
[00:19:06.810]that we're gonna fit to the data called gamma.
[00:19:09.040]So, that's the sensitivity of recruitment to temperature.
[00:19:13.180]And so, now that we have recruitment
[00:19:16.030]as a function of stock size and temperature,
[00:19:18.690]we can again collect the data
[00:19:20.590]and fit a model surface to it, right?
[00:19:23.240]So, now our variables that, our predictors
[00:19:25.690]are spawning abundance and sea surface temperature
[00:19:28.870]and on this z-axis, we have recruitment.
[00:19:32.690]So, with all of these data points,
[00:19:34.400]we can fit a best fit model surface.
[00:19:36.640]So, that's that, you know,
[00:19:38.910]nice curved surface that we have there.
[00:19:41.230]And this has been done by Fisheries and Oceans Canada
[00:19:43.960]for, you know, about a decade or so,
[00:19:45.690]and what they found is that actually,
[00:19:47.320]including the environment didn't improve forecasts.
[00:19:50.090]So, you know, you can do model selection
[00:19:53.440]to identify what the best covariates are
[00:19:56.680]for different stocks, you can choose those best models
[00:19:59.470]and you can make forecasts from them and it turns out that
[00:20:02.570]you don't actually do any better at predicting recruitment.
[00:20:06.680]Okay, so, this is a problem, right?
[00:20:08.660]Like, we want to make better models of recruitment,
[00:20:12.800]but including these environmental covariates in this way
[00:20:14.830]didn't really seem to help. Why might that be?
[00:20:18.309]So, one thing that we can see is that
[00:20:20.360]if we take the Extended Ricker Model
[00:20:22.260]and we apply a log transformation,
[00:20:25.490]well now, in the log space,
[00:20:27.120]log recruitment is just a sum of these different terms.
[00:20:30.790]So, if we think that
[00:20:32.730]the effect of temperature on recruitment
[00:20:34.690]might depend on whether we have a lot of adults to spawn
[00:20:37.870]or very few adults to spawn, well, in this model structure,
[00:20:41.440]we actually aren't able to capture that interaction, right?
[00:20:44.433]What we see is that the effect of temperature on recruitment
[00:20:47.580]is just independent of the stock size.
[00:20:52.290]And so, in this model structure,
[00:20:54.160]we don't actually have a way
[00:20:55.740]to incorporate any interaction between those effects.
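The point about the log transformation can be written out explicitly. The slide's equation is not in the transcript, so the form below is an assumption: one common way to write the Extended Ricker Model with stock size S, environmental covariate T, and parameters alpha, beta, and gamma.

```latex
\begin{align}
R &= \alpha \, S \, e^{-\beta S + \gamma T} \\
\log R &= \log \alpha + \log S - \beta S + \gamma T \\
\frac{\partial \log R}{\partial T} &= \gamma
\end{align}
```

Under this assumed form, log recruitment is a sum of a stock-only part and a temperature-only part, and the sensitivity of log recruitment to temperature is the constant gamma regardless of S. There is no term in which S and T appear together, which is why this structure cannot represent an interaction between them.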
[00:20:59.740]So, ideally we want to do, you know,
[00:21:01.970]some kind of model fitting, some kind of model building,
[00:21:03.867]that is able to be flexible for how those different effects
[00:21:07.700]might actually interact to produce recruitment.
[00:21:11.510]So, what we did is, we applied our nonlinear perspective
[00:21:15.930]to build models that actually infer
[00:21:19.370]what the relationship is from the data.
[00:21:21.800]So, instead of saying, we have this equation
[00:21:24.060]that says recruitment is this function
[00:21:26.140]based on stock size and temperature,
[00:21:28.330]we're gonna say, we are gonna use
[00:21:30.090]our machine learning approaches to infer
[00:21:31.820]what that functional relationship is from the data.
[00:21:35.890]And so, again if we plot those points
[00:21:37.740]and we use our method,
[00:21:39.070]we get a model surface that looks something like this.
[00:21:42.760]So, all right, a few caveats.
[00:21:45.390]The first is that we don't actually believe that
[00:21:47.680]the relationship is as complex as depicted by the surface.
[00:21:51.350]This just happens to be that, you know,
[00:21:54.120]the way that we've done our predictions,
[00:21:57.120]you know, it can be very sensitive to
[00:21:59.020]a few of the data points.
[00:22:00.850]The important thing that, you know,
[00:22:02.590]I really want you to get out of this
[00:22:04.860]is that doing the function approximation in this way
[00:22:09.070]allows us flexibility for how
[00:22:10.900]temperature and stock abundance interact.
[00:22:13.140]So, here we can see that where we have lots of data,
[00:22:16.400]where we have observations
[00:22:18.040]at low spawning abundance,
[00:22:19.130]we can see there is an effect of temperature
[00:22:21.220]that we can capture in that model surface.
[00:22:23.570]And in cases where we don't have a lot of data
[00:22:26.040]for high spawning abundance and the effect of temperature,
[00:22:30.050]we don't have those observations
[00:22:31.590]about what the recruitment might be.
[00:22:33.500]So, in fact, that kind of back area of the figure,
[00:22:37.670]it's more, our predictions are flatter.
[00:22:40.566]So, in this way we can allow the data to tell us about
[00:22:43.730]the nonlinear relationship between
[00:22:45.500]biology and the environment.
[00:22:48.290]So, we took this approach and we applied it
[00:22:50.990]to nine different populations of salmon in the system,
[00:22:54.770]and we compared a couple different models.
[00:22:56.810]We compared the original Ricker Model,
[00:22:59.170]which was just recruitment based on stock abundance,
[00:23:02.720]we looked at the Extended Ricker Model,
[00:23:04.620]which was trying to incorporate
[00:23:05.900]the best environmental covariates,
[00:23:07.800]and then we looked at this nonlinear approach
[00:23:11.100]where we allowed the data to tell us
[00:23:13.410]what the relationship is
[00:23:14.340]between the environment and stock abundance.
[00:23:17.690]And so, what we found, you know,
[00:23:19.100]kind of matched up with historical results.
[00:23:21.070]So, here the original Ricker is this dash,
[00:23:24.150]these dashed blue bars,
[00:23:25.510]the y-axis is the accuracy of forecasts
[00:23:28.770]and the x-axis are those nine different populations.
[00:23:33.420]And so, going from the Ricker to the Extended Ricker,
[00:23:35.990]so, including the environment in that equation context,
[00:23:39.410]you get a little bit of an improvement,
[00:23:40.690]but it turns out not to be significant
[00:23:42.770]when we try to do this nonlinear approach to look at
[00:23:46.230]the interaction between temperature and abundance.
[00:23:49.060]We are able to do better in some cases
[00:23:52.185]and only a little bit better or not at all in other cases.
[00:23:55.360]So, you know, this seems like a reasonable approach to
[00:23:58.540]try and look at interactions that we might not otherwise
[00:24:02.750]be able to predict in advance.
[00:24:04.850]So, in other words using the data to tell us
[00:24:07.740]what are the important variables and interactions
[00:24:11.100]in the system.
[00:24:14.180]Okay, so, moving on to a second example.
[00:24:16.930]So, this is a study we did recently
[00:24:20.450]to look at predictions of coastal algal blooms.
[00:24:25.880]So, I did my PhD at Scripps which is in San Diego.
[00:24:29.040]You can see that's the Scripps Pier over there.
[00:24:31.430]And this is a nice map
[00:24:33.630]of chlorophyll abundance from satellite.
[00:24:36.250]And we have algal blooms in this area
[00:24:38.650]and the really cool thing about these algal blooms
[00:24:40.440]is that the algae are bioluminescent.
[00:24:42.960]So, sometimes when that happens,
[00:24:44.930]you can go out at night and you can take pictures
[00:24:48.090]and you can actually get these like really cool
[00:24:49.990]like waves where when they crash, the algae decide,
[00:24:52.810]oh no, man, something's happening, and they light up.
[00:24:55.720]So, it's really cool.
[00:24:56.750]So, it's a good thing they're not harmful algal blooms,
[00:24:59.280]but regardless, this is like a biological phenomenon
[00:25:03.200]that we are, that we want to understand
[00:25:05.710]and we want to make predictive models for.
[00:25:08.550]Okay, so, some of our co-authors in the study,
[00:25:11.780]John McGowan at the
[00:25:13.890]Southern California Coastal Ocean Observation System
[00:25:17.060]had worked previously with another researcher at Scripps,
[00:25:20.450]Dan Rudnick, to look at this problem.
[00:25:22.620]And after, you know, several,
[00:25:25.650]actually nearly 10 years of data collection, they decide,
[00:25:29.350]they thought they had discovered what
[00:25:31.207]the ideal predictor covariate was.
[00:25:35.360]So, here is their data.
[00:25:37.760]So, green is the chlorophyll abundance
[00:25:42.090]over these 10 years, over this 10 year span,
[00:25:44.840]and then blue is this temperature anomaly.
[00:25:46.990]So, the idea is, if you take the temperature
[00:25:50.490]measured out at the end of Scripps Pier,
[00:25:52.520]at zero meters depth and five meters depth,
[00:25:55.130]the difference between those tells you about
[00:25:57.350]the physical conditions of the coastal ocean.
[00:26:01.470]And so, when they're, when that difference is large,
[00:26:04.010]the idea is that there is some kind of upwelling event
[00:26:06.710]that brings nutrients into the region
[00:26:08.410]and those nutrients are ideal conditions for
[00:26:12.220]increases in chlorophyll, increases in plankton abundance.
[00:26:16.650]Okay, so, we don't know what happened,
[00:26:20.027]but they decided not to publish this.
[00:26:21.800]You know, they got distracted by other projects
[00:26:23.400]and, you know, this hypothesis was left aside.
[00:26:28.340]More data was collected.
[00:26:30.630]So, in this case 16 more years of data.
[00:26:32.410]So now from 1994 to 2010
[00:26:34.930]and using the same measured variables.
[00:26:37.700]So, the chlorophyll abundance
[00:26:39.180]and that temperature anomaly,
[00:26:40.800]that relationship disappeared.
[00:26:43.440]And actually, if you see that nice little gap,
[00:26:46.670]that's where funding ran out,
[00:26:48.890]which might be common to a lot of our studies.
[00:26:52.090]So, we looked at this problem and we decided,
[00:26:54.000]okay, let's go back, we don't know,
[00:26:55.800]we don't think this is the best covariate
[00:26:57.930]for predicting chlorophyll, what are other possibilities?
[00:27:01.850]So, we looked at things like
[00:27:04.470]looking at scatter plots of chlorophyll
[00:27:06.160]against the physical environment, so water density,
[00:27:10.100]different kinds of nutrient concentrations,
[00:27:12.590]phosphate, nitrate, nitrite, as well as wind speed,
[00:27:16.590]here, wind speed being, you know, a process,
[00:27:20.070]a weather process that is affecting ocean upwelling,
[00:27:23.780]which again, we think is one of the mechanisms
[00:27:25.930]where nutrients get introduced into the surface waters
[00:27:29.080]to produce algal blooms.
[00:27:31.430]And so, producing all of these different scatter plots.
[00:27:33.810]Well, none of them really seem to show like a good,
[00:27:36.800]you know, linear one-to-one relationship.
[00:27:39.850]So, it doesn't seem like there's going to be signal there,
[00:27:43.030]but what we do see is that these environmental variables
[00:27:45.420]do show non-random associations with chlorophyll.
[00:27:48.620]So, in other words, you know,
[00:27:50.670]it doesn't seem like there,
[00:27:52.450]the relationship between the X,
[00:27:54.190]the variable on the x-axis and the y-axis
[00:27:55.970]are completely at random.
[00:27:56.930]There is some kind of like relationship there,
[00:27:59.420]it's just not as simple as we would hope to see
[00:28:02.460]from something like, you know,
[00:28:05.730]smoking and lung cancer, right?
[00:28:08.850]So, we think that these covariates may be necessary,
[00:28:11.330]but not sufficient predictors.
[00:28:12.830]So, we want some way of being able to identify
[00:28:14.930]which of these covariates are gonna be causal,
[00:28:17.510]how can we best combine them?
[00:28:20.170]So, our next task was to identify
[00:28:22.330]which of these covariates are gonna be causal?
[00:28:24.530]And so, again we can turn to this Takens' Theorem,
[00:28:27.420]which says, if you just have one time series variable,
[00:28:30.560]you can take the lags of that time series,
[00:28:32.530]that one variable,
[00:28:33.760]and you can reconstruct the system dynamics.
[00:28:36.680]And so, one way that we can apply this to look at
[00:28:39.160]which covariates might be causal
[00:28:40.860]is we can take the lags of the affected variable,
[00:28:44.750]so, this case chlorophyll,
[00:28:47.070]and if we reconstruct the system dynamics
[00:28:48.840]from the time series of chlorophyll,
[00:28:50.480]it should contain within those dynamics
[00:28:52.820]signal for the causal covariates.
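
A minimal sketch of the lag reconstruction described above, assuming a plain NumPy array of evenly spaced observations. The function name `delay_embed`, the parameters `dim` and `tau`, and the chlorophyll values are illustrative and not taken from the study.

```python
import numpy as np

def delay_embed(series, dim, tau=1):
    """Return rows [x(t), x(t - tau), ..., x(t - (dim - 1) * tau)] for every valid t."""
    x = np.asarray(series, dtype=float)
    start = (dim - 1) * tau
    if start >= len(x):
        raise ValueError("series too short for this embedding")
    # Column k holds the series lagged by k * tau time steps.
    columns = [x[start - k * tau : len(x) - k * tau] for k in range(dim)]
    return np.column_stack(columns)

# Made-up chlorophyll values, only to show the shape of the reconstruction.
chlorophyll = np.array([1.0, 2.0, 4.0, 3.0, 5.0, 6.0])
embedded = delay_embed(chlorophyll, dim=3, tau=1)
print(embedded)
```

Each row is one reconstructed system state built from a single measured variable and its lags.
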
[00:28:55.558]Alright, so let me give you an example.
[00:28:57.410]So, we have here the chlorophyll time series.
[00:28:59.634]We are gonna make our reconstructed system dynamics
[00:29:02.290]from lags of chlorophyll, something like this,
[00:29:05.540]and then we're gonna see how well we can make a mapping
[00:29:09.010]from this reconstruction of the system dynamics
[00:29:11.600]to a possible causal covariate, like density.
[00:29:14.110]So, we can do this for all of our
[00:29:16.610]hypothesized covariates in the system
[00:29:18.540]and see whether or not the signal that we get
[00:29:20.980]is greater than what we'd expect by chance.
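
The mapping step can be sketched as a nearest-neighbor estimate from the lag reconstruction to a candidate covariate. This is a simplified illustration, not the analysis code from the study: the function names, the exponential neighbor weighting, and the synthetic coupled logistic maps (standing in for chlorophyll and a covariate) are all assumptions.

```python
import numpy as np

def cross_map_skill(affected, candidate, dim=3):
    """Correlation between the candidate and its estimate from lags of `affected`."""
    y = np.asarray(affected, dtype=float)
    x = np.asarray(candidate, dtype=float)
    times = np.arange(dim - 1, len(y))
    # Reconstructed states: [y(t), y(t-1), ..., y(t-dim+1)].
    manifold = np.column_stack([y[times - k] for k in range(dim)])
    target = x[times]
    dists = np.linalg.norm(manifold[:, None, :] - manifold[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a state may not be its own neighbor
    nearest = np.argsort(dists, axis=1)[:, : dim + 1]
    d = np.take_along_axis(dists, nearest, axis=1)
    weights = np.exp(-d / np.maximum(d[:, :1], 1e-12))
    estimates = (weights * target[nearest]).sum(axis=1) / weights.sum(axis=1)
    return np.corrcoef(estimates, target)[0, 1]

def simulate(n, burn_in=100):
    """Synthetic example: x drives y (hypothetical parameters)."""
    x, y = 0.4, 0.2
    xs, ys = [], []
    for step in range(n + burn_in):
        x, y = x * (3.8 - 3.8 * x - 0.02 * y), y * (3.5 - 3.5 * y - 0.1 * x)
        if step >= burn_in:
            xs.append(x)
            ys.append(y)
    return np.array(xs), np.array(ys)

driver, affected = simulate(600)
noise = np.random.default_rng(0).random(len(driver))  # unrelated "covariate"

skill_driver = cross_map_skill(affected, driver)
skill_noise = cross_map_skill(affected, noise)
print(skill_driver, skill_noise)
```

Comparing the skill for a real driver against an unrelated series is one simple way to ask whether the signal exceeds what chance would give.
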
[00:29:24.780]So, in identifying which covariates are causal,
[00:29:28.000]we looked at again those different nutrient variables,
[00:29:31.210]nitrate, phosphate, silicate, nitrite,
[00:29:34.770]physical variables, temperature, salinity, density,
[00:29:37.070]wind speed and rainfall.
[00:29:40.620]Okay, so in this table,
[00:29:42.570]we have all of these Candidate variables.
[00:29:44.350]The things that we are gonna look at
[00:29:46.210]are going to be the Cross-map skill.
[00:29:48.540]So, here, cross-mapping is our method
[00:29:51.410]that we use to identify causal covariates.
[00:29:53.710]And so, we can compare that, the measure of that signal,
[00:29:56.700]to what we would find
[00:29:57.790]if we just tried to do a Linear Cross-correlation
[00:30:00.200]between those covariates and chlorophyll.
[00:30:03.000]And so, ignoring Prediction time for now.
[00:30:05.709]And so, what we found is that actually
[00:30:06.950]a lot of these variables do seem like
[00:30:09.110]they might be causal covariates, so, yellow bolded
[00:30:14.010]means significant at the 0.05 level.
[00:30:17.429]So yeah, a lot of these variables look like
[00:30:19.990]they are causal covariates.
[00:30:21.940]The salinity was significant through correlation,
[00:30:25.940]but not through our method for identifying whether the,
[00:30:29.980]it was causal using this time series method.
[00:30:32.010]And then we have another variable here
[00:30:34.600]called Prediction time.
[00:30:36.760]Alright, so, let me explain what that means.
[00:30:38.650]When we are making the mapping from the
[00:30:41.010]chlorophyll dynamics to those causal covariates,
[00:30:43.810]what we can look at is whether that mapping occurs
[00:30:47.650]with zero time lag
[00:30:49.690]or whether it occurs with a negative time lag,
[00:30:51.980]the idea being that if we are predicting,
[00:30:55.040]if we're looking at a relationship
[00:30:56.380]between our affected variable,
[00:30:58.450]which is chlorophyll, and our covariates,
[00:31:00.830]we should expect that the relationship is from
[00:31:04.570]current chlorophyll values
[00:31:06.510]to the covariates in the past, right?
[00:31:09.380]So, past values of nutrients
[00:31:11.960]or past values of the environment should be responsible
[00:31:14.740]for affecting the current values of chlorophyll.
[00:31:16.990]And so, what we ideally always should find
[00:31:19.430]is that the Prediction time for these covariates
[00:31:21.620]should be negative.
[00:31:23.410]And so, actually we do find that that is the case
[00:31:25.520]for all of our causal covariates.
[00:31:27.580]Again, salinity, the one variable
[00:31:29.810]that shows up as significant using correlation
[00:31:32.760]looks like it has the,
[00:31:35.130]you know, the best prediction time is zero.
[00:31:37.070]So, salinity now
[00:31:39.520]is significantly correlated with chlorophyll now.
[00:31:43.610]So, it doesn't look like it's gonna be a good
[00:31:46.480]variable for making predictive forecasts.
[00:31:48.950]Right, if we want to make forecasts, we want to be,
[00:31:51.020]we want to have that time lag response built in.
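
The prediction-time idea can be sketched by repeating the cross-map at several lags and keeping the lag with the highest skill. As before, this is an illustration under assumptions: the function names, the lag range, and the synthetic system in which `driver` affects `affected` one step later are all hypothetical.

```python
import numpy as np

def lagged_cross_map_skill(affected, candidate, lag, dim=3):
    """Skill of estimating candidate(t + lag) from [affected(t), ..., affected(t-dim+1)]."""
    y = np.asarray(affected, dtype=float)
    x = np.asarray(candidate, dtype=float)
    # Times t for which both the lag vector and candidate(t + lag) exist.
    times = np.arange(max(dim - 1, -lag), len(y) - max(lag, 0))
    manifold = np.column_stack([y[times - k] for k in range(dim)])
    target = x[times + lag]
    dists = np.linalg.norm(manifold[:, None, :] - manifold[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    nearest = np.argsort(dists, axis=1)[:, : dim + 1]
    d = np.take_along_axis(dists, nearest, axis=1)
    weights = np.exp(-d / np.maximum(d[:, :1], 1e-12))
    estimates = (weights * target[nearest]).sum(axis=1) / weights.sum(axis=1)
    return np.corrcoef(estimates, target)[0, 1]

def simulate(n, burn_in=100):
    """Synthetic example: the value of x at time t affects y at time t + 1."""
    x, y = 0.4, 0.2
    xs, ys = [], []
    for step in range(n + burn_in):
        x, y = x * (3.8 - 3.8 * x - 0.02 * y), y * (3.5 - 3.5 * y - 0.1 * x)
        if step >= burn_in:
            xs.append(x)
            ys.append(y)
    return np.array(xs), np.array(ys)

driver, affected = simulate(600)
skills = {lag: lagged_cross_map_skill(affected, driver, lag) for lag in range(-4, 3)}
best_lag = max(skills, key=skills.get)
print(best_lag, skills[best_lag])
```

A negative best lag means the current state of the affected variable carries the most information about past values of the covariate, which is the pattern expected of a causal driver.
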
[00:31:55.260]Okay, so, by the time we kind of went through this analysis,
[00:31:58.430]we identified several of these covariates
[00:32:00.360]as being important,
[00:32:02.640]we, you know, more data had been collected,
[00:32:04.900]so we thought this was great.
[00:32:05.990]We could use what we know so far
[00:32:08.030]to build some predictive models
[00:32:09.550]and we can try and validate those models
[00:32:11.250]against the observations
[00:32:12.083]that have been made in the meantime.
[00:32:14.810]So, what we did is we went through the, you know,
[00:32:17.290]the like 30-some year history of the data,
[00:32:21.470]we built our, you know, we did our model selection.
[00:32:24.400]So, we built a few best models
[00:32:26.190]and then what we wanted to do was predict that,
[00:32:29.350]predict over the time span for the next couple years
[00:32:33.570]where we had now collected data, additional data
[00:32:36.800]that we haven't used in our original analysis.
[00:32:40.310]So, we looked at a couple different models
[00:32:41.950]using either one, two, or three
[00:32:43.990]of those environmental covariates.
[00:32:46.610]We looked at the comparison against actual observations.
[00:32:50.120]So, here the actual observations were in black,
[00:32:52.950]and these different models are these different colors,
[00:32:56.380]blue, green and red.
[00:32:58.127]And we found, you know, that,
[00:33:00.870]you know, they sometimes worked, sometimes didn't.
[00:33:04.040]They all seemed to capture
[00:33:05.530]the largest bloom that we had observed,
[00:33:08.090]which is the October 2011,
[00:33:10.510]in other cases, depending on the models,
[00:33:12.751]they either did or did not seem to do so well.
[00:33:16.620]So, you know, we're still working on trying to figure out
[00:33:20.390]the best way to select the models in the future,
[00:33:24.395]but in the meantime, if you're interested,
[00:33:26.850]you can check out our publication.
[00:33:30.270]Okay, so, what am I working on now?
[00:33:32.920]So, one thing that I'm again interested in
[00:33:35.860]is ecosystem change.
[00:33:37.690]And so, one project that I'm working on
[00:33:40.040]is to look at dynamic indicators for community change.
[00:33:44.100]So, this is motivated by this problem
[00:33:46.770]of understanding Ecological Regime Shifts.
[00:33:50.260]So, a regime,
[00:33:52.080]so ecosystems go through these like
[00:33:53.880]large scale regime shifts, these abrupt changes.
[00:33:57.910]Classical example comes from this 1974 study
[00:34:01.810]where nutrients were added to this lake.
[00:34:05.060]So, this is a photograph of the lake,
[00:34:07.200]and you can see that yellow line in the middle,
[00:34:09.260]that's the plastic divider that was inserted into the lake.
[00:34:12.140]So, they introduced a divider
[00:34:14.560]to separate the lake into these two portions
[00:34:17.250]and then they added different kinds of nutrients.
[00:34:19.740]So, in the top portion, carbon and nitrogen were added,
[00:34:23.010]no change was found
[00:34:24.010]in the resulting biological productivity.
[00:34:26.960]In the bottom, carbon and nitrogen
[00:34:29.590]and phosphorus were added
[00:34:30.690]and we got these large algal blooms.
[00:34:33.410]So, we can see like just like from this one added variable,
[00:34:37.570]you can produce these large-scale changes in the system.
[00:34:41.210]Alright, so, that's, you know, a pretty obvious example.
[00:34:45.240]But how do we generalize this concept
[00:34:47.200]of what's a regime shift?
[00:34:49.760]And the reason this is a problem is
[00:34:51.690]ecosystems are composed of many different components,
[00:34:54.060]right, we have a lot of different species,
[00:34:56.080]we have a lot of different populations,
[00:34:57.750]we can measure the abundance of all those populations.
[00:35:00.430]Well, if every population
[00:35:02.420]increases or decreases substantially,
[00:35:04.910]okay, that's a pretty obvious sign
[00:35:06.370]that like something has gone on, right?
[00:35:07.910]Like maybe a meteor hit and all the dinosaurs went extinct.
[00:35:10.570]Okay, that's a regime shift, right?
[00:35:13.830]What happens if only one population
[00:35:15.780]increases or decreases?
[00:35:16.980]Well, again, there are cases where,
[00:35:19.160]you know, we have keystone species
[00:35:20.610]like apex predators or we have like
[00:35:23.370]some kind of really strongly competitive invasive species
[00:35:27.040]that enters the system and that causes a large change.
[00:35:29.920]So, in cases where we have one really important species
[00:35:33.310]and that population undergoes a large change,
[00:35:35.030]we might also be able to say, that's a regime shift.
[00:35:37.920]But more commonly we have this case where
[00:35:39.800]we have a bunch of different populations
[00:35:41.430]and they're all fluctuating.
[00:35:43.070]Sometimes some populations fluctuate a lot,
[00:35:45.390]sometimes they fluctuate only a little bit.
[00:35:47.270]There might be seasonal patterns
[00:35:48.620]where some of the species change seasonally
[00:35:50.930]and other ones are not so much.
[00:35:53.060]So, how do we actually identify, you know,
[00:35:55.260]what's a regime shift and what's not?
[00:35:57.795]Okay, so, I'm gonna put a pin in that for now
[00:36:01.120]and go to a study that I did
[00:36:04.970]with some colleagues, Masayuki Ushio,
[00:36:08.890]led by Ushio,
[00:36:10.120]and published earlier this year called Dynamic Stability.
[00:36:13.150]And so, the idea is, you know,
[00:36:15.000]to take community time series
[00:36:17.360]and apply some of these techniques for
[00:36:19.730]modeling time series and inferring interactions
[00:36:22.930]to generate a single measure,
[00:36:24.820]a single quantitative measure,
[00:36:26.230]for how rapidly the system is changing.
[00:36:29.310]So, you can think of this Dynamic Stability measure
[00:36:32.300]as a time varying analog
[00:36:34.680]to the classical stability defined in Pimm,
[00:36:38.160]where you either say
[00:36:39.560]that the system is stable or it's unstable.
[00:36:41.840]So, instead of having this one
[00:36:43.600]binary variable that is fixed for the system,
[00:36:47.100]what we have with the Dynamic Stability approach
[00:36:49.380]is a way to have a quantification
[00:36:52.260]and to have that quantification change in time.
[00:36:54.650]And so, the idea is with the,
[00:36:56.290]with time series data, we can be able to tell
[00:36:59.270]at given different points in time
[00:37:01.080]whether the community is stable or is unstable.
[00:37:06.280]Okay, so, brief summary of how this works.
[00:37:08.940]Imagine we have these time series in this community.
[00:37:11.460]So, this is example data
[00:37:12.970]from the paper of a fish community.
[00:37:16.720]So, each of these colored lines is a different fish species.
[00:37:21.140]And so, the first thing that we do,
[00:37:22.003]is we take this community data
[00:37:24.220]and we build an interaction network.
[00:37:25.830]So, again, that same approach we used to identify
[00:37:28.420]causal covariates we can apply to
[00:37:30.920]each pair of fish in the system
[00:37:34.590]and we can look for which interactions look like
[00:37:37.400]they're occurring in the time series data.
[00:37:39.490]So, identify the interactions between the species.
[00:37:42.930]Alright, what do we do with the interaction network?
[00:37:44.960]So, now that we know like, which interactions
[00:37:47.660]we can infer from the data, we can fit population models.
[00:37:51.370]So, the population model looks something like this.
[00:37:54.270]And what we have here is
[00:37:56.830]the populations of all of those fish species at time T
[00:38:01.460]is our ecosystem state at time T.
[00:38:04.870]Those future populations at time T plus one,
[00:38:07.250]that's our ecosystem state at time T plus one,
[00:38:09.570]and then what we are fitting from the data
[00:38:13.180]are these time varying interactions.
[00:38:15.100]So, all those interactions between
[00:38:17.230]fish species I and fish species J goes into this matrix.
[00:38:21.850]And where we have the interaction network coming in
[00:38:25.480]is we only allow those parameters to vary
[00:38:29.840]if there is an interaction
[00:38:31.270]that we've identified in the network.
[00:38:33.050]So, if there's no interaction,
[00:38:34.220]we can fix that value in the matrix at zero.
[00:38:37.450]So, that really simplifies like
[00:38:38.900]how many parameters we're gonna fit in our model.
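
The model described above can be written compactly. The slide itself is not part of the transcript, so the linear form and the symbols here are assumptions: \mathbf{x}(t) is the vector of species abundances at time t, \mathbf{C}(t) is the time-varying interaction matrix, and \mathcal{A} is the set of species pairs with an interaction identified in the network.

```latex
\mathbf{x}(t+1) = \mathbf{C}(t)\,\mathbf{x}(t),
\qquad
C_{ij}(t) = 0 \ \text{for all } t \ \text{whenever } (i, j) \notin \mathcal{A}
```

Only the entries of \mathbf{C}(t) that correspond to identified interactions are fitted, which is what reduces the number of parameters.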
[00:38:42.830]Okay, so now we have this
[00:38:43.890]matrix of the effects through time.
[00:38:46.230]How do we get some kind of single stability measure
[00:38:50.350]out of that?
[00:38:51.470]Well, we can do the same thing that we do
[00:38:53.620]if we think of this as like a fixed interaction matrix,
[00:38:57.120]which is we can look at
[00:39:00.369]computing the dominant eigenvalue.
[00:39:01.960]So, here this matrix,
[00:39:03.540]we can compute the dominant eigenvalue.
[00:39:06.190]This dominant eigenvalue is gonna change in time
[00:39:07.813]and that's the thing that determines
[00:39:09.580]whether the system is stable.
[00:39:11.050]So, if the dominant eigenvalue happens to be
[00:39:13.680]less than one at a given point in time,
[00:39:15.620]that indicates that the model
[00:39:19.030]tells us that perturbations to the system
[00:39:21.240]are gonna decrease over time.
[00:39:22.950]So, we have a stable condition.
[00:39:24.950]If the dominant eigenvalue is greater than one,
[00:39:27.370]then perturbations to the system increase over time
[00:39:29.337]and we have an unstable condition.
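
A minimal sketch of the stability calculation, assuming the fitted interaction matrices are available as a stack of square NumPy arrays, one per time point. The function name `dynamic_stability` and the two example matrices are made up for illustration.

```python
import numpy as np

def dynamic_stability(interaction_matrices):
    """Magnitude of the dominant eigenvalue of each matrix in a (time, n, n) stack."""
    stack = np.asarray(interaction_matrices, dtype=float)
    eigenvalues = np.linalg.eigvals(stack)  # shape (time, n), possibly complex
    return np.abs(eigenvalues).max(axis=-1)

# Triangular matrices, so the eigenvalues are simply the diagonal entries.
damped = np.array([[0.5, 0.1], [0.0, 0.4]])
growing = np.array([[1.2, 0.0], [0.3, 0.6]])

stability = dynamic_stability([damped, growing])
is_stable = stability < 1.0  # below one: perturbations shrink over time
print(stability, is_stable)
```
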
[00:39:32.470]Alright, so, how do we look,
[00:39:34.830]how do we use this to look at regime shifts?
[00:39:36.870]So, what I did is I applied this approach,
[00:39:39.880]and I'm gonna use Portal as a test case.
[00:39:42.300]So, Portal is a long term experimental site that my lab,
[00:39:47.870]half of my lab currently runs.
[00:39:49.550]So, we collect rodents, if you remember from the photo,
[00:39:52.260]that's a zoom in one of our kangaroo rats
[00:39:54.750]that I'm trying to weigh, it's not that happy about it.
[00:39:59.050]So, we have these.
[00:40:00.200]So, we have this long-running experimental site.
[00:40:02.290]So, we have time series for the abundances
[00:40:04.040]of all these different rodent species.
[00:40:06.840]And so, they change through time
[00:40:08.990]and a former student who just graduated
[00:40:13.570]identified four different regime shifts in this system.
[00:40:15.930]So, there were these periods of time where,
[00:40:19.790]you know, through the analysis that she did
[00:40:24.110]looking at the community time series,
[00:40:26.550]there were these periods where there was
[00:40:28.150]large structural change in the community
[00:40:31.060]using a different method.
[00:40:33.340]So, I thought, this is great.
[00:40:35.630]Like I have these long term community time series data,
[00:40:38.860]I have this method that can look at stability,
[00:40:40.840]I have a publication that identified existing regime shifts,
[00:40:45.320]let's see if dynamic stability also identifies
[00:40:48.240]those same time periods as being unstable, right?
[00:40:52.270]Okay, so the output looks something like this.
[00:40:55.080]So, over the same time span, time on the x-axis,
[00:40:59.080]I have my dynamic stability measure on the y-axis
[00:41:01.710]and again, the key indicator here
[00:41:03.690]is whether that value is going to be above one,
[00:41:06.010]in which case, it's unstable,
[00:41:07.780]or below one, in which case, it's stable.
[00:41:10.360]And specifically, we're looking at
[00:41:11.880]those four identified regime shifts.
[00:41:13.690]So, those four different time periods.
[00:41:16.970]And in the ideal scenario,
[00:41:19.670]Dynamic Stability should increase above one
[00:41:22.520]during those regime shifts
[00:41:23.620]or just slightly before those regime shifts
[00:41:25.660]and then otherwise it could be stable.
[00:41:28.160]Alright, so what happens
[00:41:28.993]when we actually apply this to the data?
[00:41:30.830]Well, we didn't really find anything.
[00:41:33.830]So, we did see the Dynamic Stability measure
[00:41:35.830]did change in time, so that's good.
[00:41:37.960]Our like models didn't just produce like a constant value.
[00:41:42.780]So, that was a good thing.
[00:41:44.279]But it looks like the magnitude of the dominant eigenvalue
[00:41:46.720]is kind of insensitive to these shifts in the community.
[00:41:51.516]Okay, so we got stuck at this point.
[00:41:53.490]We didn't know quite what to do.
[00:41:55.100]But then we thought about, you know, kind of this,
[00:41:57.473]like this, you know, back to the drawing board of,
[00:42:00.520]well, ecosystems are really complex, right?
[00:42:02.200]Like, what is this single measure actually telling us?
[00:42:06.110]Well, so it's this magnitude of the dominant eigenvalue.
[00:42:09.400]So, it's telling us in this model
[00:42:11.660]where our population time series are,
[00:42:15.790]where we're predicting the population time series,
[00:42:18.110]it tells us at a given point in time
[00:42:21.260]whether change in one direction
[00:42:24.230]happens to be large enough to produce
[00:42:28.870]instability. Well, you know, maybe the thing to look at
[00:42:33.040]is not just the magnitude in one direction.
[00:42:35.900]Maybe the thing to look at
[00:42:37.110]is really the direction of the change.
[00:42:39.790]So, instead of looking at
[00:42:41.110]the magnitude of that dominant eigenvalue,
[00:42:43.530]we thought we could look at
[00:42:44.710]the direction of the dominant eigenvector.
[00:42:47.050]So, now it's a little bit more complicated.
[00:42:50.560]Instead of one numerical value, which is the magnitude,
[00:42:53.850]we have this seven dimensional vector.
[00:42:56.170]So, the way I'm going to represent that
[00:42:57.640]is for each of the species is gonna be in a different color.
[00:43:00.530]And then the y-axis here is the relative
[00:43:03.760]strength of that species in that eigenvector.
[00:43:07.113]So, if, you know, all the species
[00:43:09.420]have the same magnitude of change,
[00:43:11.400]then you would expect their values
[00:43:13.010]to be the same along the y-axis.
[00:43:15.270]If we think that like one species is changing a lot
[00:43:18.640]and the other species are not changing at all,
[00:43:20.450]we would expect that one species to be,
[00:43:23.040]to dominate the composition of the eigenvector
[00:43:24.957]and the other species to be near zero.
[00:43:27.790]And so, we can track the direction
[00:43:29.030]of the eigenvector through time in this way.
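
The eigenvector view can be sketched the same way. Here the "relative strength" of each species is taken to be the absolute value of its component in the dominant eigenvector, rescaled to sum to one. That normalization, the function name `dominant_direction`, and the example matrices are assumptions for illustration.

```python
import numpy as np

def dominant_direction(matrix):
    """Relative strength of each species in the dominant eigenvector (sums to 1)."""
    eigenvalues, eigenvectors = np.linalg.eig(np.asarray(matrix, dtype=float))
    lead = np.argmax(np.abs(eigenvalues))      # index of the dominant eigenvalue
    strength = np.abs(eigenvectors[:, lead])   # column `lead` is its eigenvector
    return strength / strength.sum()

# One species changing on its own: it dominates the eigenvector.
one_species = np.diag([1.2, 0.5, 0.3])
# All species changing together: equal shares.
all_species = np.full((3, 3), 0.3)

single = dominant_direction(one_species)
shared = dominant_direction(all_species)
print(single, shared)
```

Applying this to the matrix at each time point gives one curve per species, which is how the direction can be tracked through time.
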
[00:43:32.360]And when we plot our results,
[00:43:35.720]we get something that looks like this.
[00:43:37.460]So, again, still very very messy, and we want to know,
[00:43:40.920]well, what happens during these periods of regime shift
[00:43:43.280]that have been previously identified?
[00:43:46.200]Well, the cool thing we saw was that we don't know
[00:43:49.500]what's going on in the first half of the time series,
[00:43:51.240]but in the second half of the time series,
[00:43:53.040]we do see that the direction of the dominant eigenvector
[00:43:55.570]does show some cool signal.
[00:43:58.530]So, again zooming out to, so that's the whole time series,
[00:44:03.890]we're gonna look specifically at that third regime shift.
[00:44:06.863]We can see that there is a strong shift
[00:44:08.910]in the direction of the dominant eigenvector.
[00:44:12.070]So, this was January 1999 to January 2000.
[00:44:16.400]And then again in that last regime shift,
[00:44:18.450]again, the strong shift in the direction of the eigenvector.
[00:44:21.480]So, between August 2009 and January 2011.
[00:44:25.260]So, in some ways this makes sense, right?
[00:44:28.730]So, when the rodent community is reorganizing, it's,
[00:44:33.180]you know, which species are dominant is changing,
[00:44:35.810]we can expect that the changes in the abundances
[00:44:40.570]will be reflected by this dominant eigenvector, right?
[00:44:43.040]So the dominant eigenvector tells us
[00:44:46.010]which species are changing most rapidly.
[00:44:48.110]When the system is undergoing a regime shift,
[00:44:50.330]we should expect that
[00:44:51.440]the species that are changing most rapidly
[00:44:53.430]will be different in that time period
[00:44:55.130]compared to the time period
[00:44:56.770]immediately before and after the regime shift.
[00:44:58.880]So, that's what we find, preliminarily.
[00:45:02.620]We're still working on the methodology
[00:45:05.040]and trying to interpret what the results mean.
[00:45:07.280]So, look forward to that.
[00:45:10.450]Okay, so that was one of our Current Projects.
[00:45:12.810]The next project that I'm looking at
[00:45:15.200]is something that we're calling MATSS,
[00:45:17.160]so, Macroecological
[00:45:18.220]Analyses of Time Series Structure.
[00:45:20.040]And so, this is a group project with a bunch of students.
[00:45:23.570]And one thing that we are really kind of trying to address
[00:45:27.890]is this question of Models versus Observations.
[00:45:30.740]So, we have lots of methods
[00:45:32.120]for investigating ecological change,
[00:45:34.530]you know, open up Ecological Applications,
[00:45:37.140]Methods in Ecology and Evolution,
[00:45:38.736]there are a bunch of papers out there.
[00:45:40.700]How often are any of these methods
[00:45:42.093]applied to more than one set of observations?
[00:45:45.737]Is one problem, the other problem is,
[00:45:47.280]well, we also have lots of observations.
[00:45:49.040]So, how often do we actually use
[00:45:51.170]multiple methods on a single dataset
[00:45:53.020]to see what different patterns we observe?
[00:45:56.590]So, we thought these are all kind of like issues
[00:45:59.060]that we might want to address.
[00:46:00.260]So, how are we gonna do this?
[00:46:02.310]Well, we're gonna do all the analyses on all the time series
[00:46:05.680]and see what happens.
[00:46:07.760]So, our goal for this project is to build a platform
[00:46:10.790]for reproducible and collaborative research on change
[00:46:13.710]across a bunch of different
[00:46:15.260]ecological time series datasets.
[00:46:17.100]We want to enable anyone to be able to reuse
[00:46:19.710]any of the data or the methods
[00:46:21.010]through sharing our code and data, and then finally,
[00:46:24.010]because we're lazy and because computers are great,
[00:46:26.560]we are gonna automate all our analyses
[00:46:28.330]and report generation.
[00:46:30.692]Okay.
[00:46:31.525]So, this project has two different components to it.
[00:46:34.270]So, the first is this R package that we're developing.
[00:46:36.700]So, this includes all the code
[00:46:38.290]for ingesting and cleaning different datasets
[00:46:41.290]and code for all the different analysis methods
[00:46:43.860]that we're gonna look at,
[00:46:45.453]and the second component is this Pipeline,
[00:46:47.950]so, the part that actually does the work.
[00:46:49.977]And so, this Pipeline is gonna do
[00:46:51.600]all of the analysis and dataset calculations,
[00:46:54.450]it's gonna collect all the results and produce reports,
[00:46:58.010]and ideally, it's gonna be automated.
[00:46:59.680]So, we don't actually need to touch it.
[00:47:01.530]As soon as we update the code for a new dataset,
[00:47:06.070]or we add code for an additional analysis,
[00:47:09.000]this thing should ideally run automatically
[00:47:11.450]and give us all our answers
[00:47:12.910]which we then have to spend time arguing about.
[00:47:16.740]Okay, so, how this works.
[00:47:18.390]If you are an end-user of this product,
[00:47:22.660]if you want to use the data, you can install the package,
[00:47:25.580]you can run functions to get the data.
[00:47:28.580]So, one line, install the package, one line, get the data
[00:47:32.010]boom, you have your data.
[00:47:34.256]Okay, suppose you want to use one of the methods.
[00:47:36.890]It's a little bit more complicated.
[00:47:37.890]Well again, you can install the package,
[00:47:39.780]you have to prepare the data
[00:47:40.920]that you want to use the method on,
[00:47:42.560]but then you can run the method on the data.
[00:47:45.920]So again, just a few lines of code, you can do all of that.
[00:47:49.240]So, you can actually do all this
[00:47:50.660]right now on your computers, if you want.
[00:47:53.370]And then if you want to contribute data or methods,
[00:47:55.560]we have our project set up on GitHub, you can navigate there,
[00:47:59.350]and then you can follow the standard GitHub procedures,
[00:48:02.520]you can contribute code to add an individual dataset,
[00:48:05.440]you can contribute an analysis method,
[00:48:07.550]and a way to generate synthesized reports,
[00:48:10.600]and then hopefully the automation will run
[00:48:13.430]and it'll take care of all the calculations,
[00:48:15.130]generate the reports.
[00:48:16.250]So, we're still working on the automation bit,
[00:48:17.950]but everything else is in place.
[00:48:19.460]So, we haven't figured out a way quite yet
[00:48:22.570]to get the computer to click the Run button.
[00:48:24.880]So, right now like the human has to go there
[00:48:26.540]and like run the code, but like hopefully soon,
[00:48:28.630]we'll get that figured out and up and running.
[00:48:32.160]So, this is building off of other kinds of work
[00:48:35.240]that we're doing in the Weecology Lab.
[00:48:37.370]So, a project that was recently published
[00:48:40.170]that I'm not involved in, but other members are,
[00:48:42.500]is this automated forecasting.
[00:48:45.540]So again, we run this long-term experimental site at Portal
[00:48:49.810]and all the infrastructure,
[00:48:51.400]the computational infrastructure,
[00:48:52.430]is in place to generate automated forecasts from the data.
[00:48:55.520]So, every month, members of our teams go out
[00:48:58.780]and we count how many rodents there are.
[00:49:00.900]As soon as we get back, those get entered into the computer
[00:49:03.620]and then stuff happens in the cloud and forecasts are made.
[00:49:08.260]And if you navigate to the website, you can see
[00:49:10.480]we actually already have forecasts for November rodents.
[00:49:13.920]So, we don't actually need to do anything
[00:49:16.140]other than collecting the data and processing them.
[00:49:18.660]And so, this is a really cool approach that I think
[00:49:21.470]will be really valuable for ecology in the future.
[00:49:26.080]Alright, so that was a lot of material.
[00:49:27.550]So, to kind of wrap up.
[00:49:29.440]So, I think these empirical ways of generating models
[00:49:32.710]can be really valuable.
[00:49:33.770]So we can look at ways of building
[00:49:36.730]data-driven models from time series data
[00:49:38.390]to identify and understand mechanisms that we might,
[00:49:41.950]that might be difficult to understand a priori.
[00:49:45.580]Ecosystems change in all of these different complex ways.
[00:49:48.380]So, you know, one of the things that I think we need next
[00:49:52.080]are these indicators that are dynamic,
[00:49:54.430]so, changing in time, as well as multifaceted
[00:49:57.150]to be able to track all these different kinds of changes.
[00:50:00.510]And then, you know, a big thing that is,
[00:50:02.650]you know, happening is people are being
[00:50:05.470]more open and sharing of their data and their code
[00:50:08.480]and we have supercomputers everywhere.
[00:50:11.340]So, the kind of ideal next step for us as researchers,
[00:50:14.390]I think, is really to synthesize all these things together
[00:50:17.200]to provide guidance on selection of methods
[00:50:19.120]and interpreting their results.
[00:50:22.270]Alright, so these research projects were funded
[00:50:23.890]by a lot of these different organizations,
[00:50:25.430]so I want to thank all these.
[00:50:27.260]And I'm happy to take any questions.
[00:50:30.246](audience applauding)
[00:50:33.931]I'm interested in the,
[00:50:35.300]because of the way that you're using that time series,
[00:50:38.520]I'm interested in,
[00:50:40.920]I'd like to hear you talk about indices
[00:50:43.490]and how important that under, I don't wanna say quality,
[00:50:48.740]but the underlying data that's in that time series,
[00:50:53.990]if it's an index
[00:50:55.530]or some more robust estimate of something.
[00:50:59.930]Does that matter to these processes that you're describing?
[00:51:04.938]Yeah, that's a great question, you know.
[00:51:08.920]So, by indices, you mean like
[00:51:11.740]uncertainty in the observations, or like,
[00:51:16.360]different measurement schemes, maybe?
[00:51:18.540]Am I understanding that correctly?
[00:51:19.840]Well, like standard things
[00:51:21.640]that an agency might, like pheasants per square mile.
[00:51:25.060]Yeah. On a road survey,
[00:51:26.730]rather than a double observer
[00:51:30.630]or a distance-generated estimate of pheasant density.
[00:51:36.147]Yeah, okay, great question.
[00:51:37.920]So, that's the part of the Takens' Theorem
[00:51:40.283]that I glossed over, which was under suitable conditions,
[00:51:44.180]suitable conditions meaning like
[00:51:46.170]your data are observed perfectly
[00:51:48.100]and that you have an infinite amount of it.
[00:51:50.980]So yeah, definitely one of the challenges,
[00:51:52.560]like when you have transformations of data
[00:51:54.900]and the data are observed with noise and then those,
[00:52:00.610]the system has dynamics that are nonlinear and complex,
[00:52:05.130]weird things are gonna happen when you try
[00:52:06.940]and reconstruct the dynamics just from the time series.
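
The reconstruction referred to here is usually done by time-delay embedding: each point in the reconstructed state space is a vector made of the current value and a few lagged values of the same series. A minimal Python sketch follows. The function name `delay_embed` and the parameters `E` (embedding dimension) and `tau` (lag) are illustrative choices, not taken from the talk or from any particular package, and the series is a toy example.

```python
def delay_embed(series, E, tau=1):
    """Return lagged coordinate vectors (x[t], x[t - tau], ..., x[t - (E-1)*tau])."""
    if E < 1 or tau < 1:
        raise ValueError("E and tau must be positive integers")
    first = (E - 1) * tau  # earliest index with a full set of lags
    return [
        tuple(series[t - k * tau] for k in range(E))
        for t in range(first, len(series))
    ]


# Toy series, for illustration only (not data from the talk).
toy = [1, 2, 3, 4, 5]
vectors = delay_embed(toy, E=3, tau=1)
print(vectors)
```

Because every observation appears in several lag vectors, observation noise enters every coordinate of the reconstruction, which is one reason noisy or transformed data can distort it.
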
[00:52:09.790]So, I don't have a great answer to kind of,
[00:52:13.650]you know, how sensitive the methods are.
[00:52:18.550]It really is going to depend on the individual system.
[00:52:22.180]You know, there are, we are looking at
[00:52:26.490]kind of extending the methods or rather,
[00:52:28.690]building variants of the methods
[00:52:30.013]that are more amenable to noise.
[00:52:32.586]So, ways of actually estimating
[00:52:35.100]some of these coefficients more robustly,
[00:52:37.480]making predictions with uncertainty built into them,
[00:52:40.370]that's, you know, that's definitely
[00:52:41.203]something that we're working on.
[00:52:42.940]So, I didn't touch on that,
[00:52:43.840]but it's, yeah, stuff that's in the works, for sure.
[00:52:49.860]Anyone else?
[00:52:58.010]Thanks, Hao, that is really awesome.
[00:53:00.750]I'm trying to understand the Dynamic Stability index,
[00:53:05.200]and if I...
[00:53:07.320]So, the elements of that matrix A?
[00:53:09.886]Mm hm?
[00:53:11.490]They're,
[00:53:13.230]are they basically gonna be the ratio of?
[00:53:17.130]So, let's just say one of them has a, there's one,
[00:53:20.340]there's a link between species one and species four,
[00:53:24.430]the element relevant to that is going to be
[00:53:27.210]the ratio of species one and species four
[00:53:30.900]between those two time steps, is that right?
[00:53:33.640]Yeah.
[00:53:34.473]So, we don't, we don't fix the,
[00:53:39.260]the diagonal of that matrix,
[00:53:40.980]which is like the self-effect on the future,
[00:53:45.410]you know, the math kind of works out
[00:53:47.610]that regardless of like exactly how you set it up
[00:53:50.610]as either the change is, like the first derivative
[00:53:55.040]is that effect of matrix times the population
[00:53:57.800]or the future population is the matrix times the population.
[00:54:03.230]The dominant eigenvalue should be the same
[00:54:05.070]between those different like representations.
[00:54:09.620]Whether that, whether the threshold is that one or zero
[00:54:14.500]does depend on like whether you're doing the thing
[00:54:16.550]in discrete time or continuous time,
[00:54:19.480]but yeah, mathematically it doesn't really
[00:54:22.490]make that a big difference, but yeah.
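
The threshold point can be checked on a small example. The sketch below assumes a hypothetical 2x2 matrix `B` for the continuous-time form dx/dt = B x and links it to a discrete-time form x[t+1] = A x[t] through a forward-Euler step, A = I + dt * B. That link, the matrix values, and the step size `dt` are all assumptions made for illustration, not the setup used in the talk. Under that link each eigenvalue of A is 1 + dt times an eigenvalue of B, so for a small step the stability verdict agrees while the threshold moves from 0 to 1.

```python
import cmath


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


# Hypothetical continuous-time interaction matrix B, with dx/dt = B x.
B = [[-0.5, 0.2],
     [0.1, -0.3]]
dt = 0.1  # assumed step size for the forward-Euler discretization

# Discrete-time counterpart A = I + dt * B, with x[t+1] = A x[t].
A = [[1 + dt * B[0][0], dt * B[0][1]],
     [dt * B[1][0], 1 + dt * B[1][1]]]

# Continuous time: dominant eigenvalue has the largest real part; threshold 0.
dom_B = max(eigenvalues_2x2(B), key=lambda z: z.real)
# Discrete time: dominant eigenvalue has the largest modulus; threshold 1.
dom_A = max(eigenvalues_2x2(A), key=abs)

stable_continuous = dom_B.real < 0
stable_discrete = abs(dom_A) < 1
print(dom_B, stable_continuous)
print(dom_A, stable_discrete)
```
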
[00:54:25.410]So, then the eigenvector
[00:54:29.580]is like, is sort of like the attractor in a sense?
[00:54:33.050]Yeah. Right?
[00:54:34.434]Is that what you mean by the direction?
[00:54:35.610]So the system might not actually be at that eigenvector,
[00:54:40.220]but if it kept, but if that persisted,
[00:54:42.680]then that's the direction it would go?
[00:54:46.679]Yeah, the dominant direction it would go, right.
[00:54:48.747]Because we're looking at the eigenvector
[00:54:50.460]associated with the largest eigenvalue.
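
The "direction it would go if that persisted" idea can be illustrated with power iteration: applying the same matrix over and over pulls almost any starting state toward the eigenvector of the largest eigenvalue. The matrix below is hypothetical and is held fixed, which is an assumption of this sketch; in the talk the interaction matrix changes through time.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def matvec(matrix, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, v)) for row in matrix]


# Hypothetical interaction matrix, held fixed over time (an assumption).
A = [[2.0, 1.0],
     [1.0, 2.0]]

state = normalize([1.0, 0.0])  # arbitrary starting direction
for _ in range(50):
    # Rescaling each step keeps only the direction, not the magnitude.
    state = normalize(matvec(A, state))

print(state)
```

For this matrix the eigenvalues are 3 and 1, and the state ends up along the eigenvector of the larger one, with equal weight on both components.
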
[00:55:00.230]Hi there.
[00:55:01.063]I was just wondering as a grad student,
[00:55:04.160]these methods seem pretty cutting-edge
[00:55:06.320]and I was just wondering what kind of resources
[00:55:08.530]you would recommend for grad students
[00:55:10.750]that could try and incorporate the equation-free approach
[00:55:13.920]for their kind of studies?
[00:55:17.190]Yeah, good question.
[00:55:18.885]I can get out of this, right?
[00:55:22.350]Okay, so I have an R package I'm still working on
[00:55:26.010]like the publication, but if you go to...
[00:55:41.150]So, it's called rEDM for Empirical Dynamic Modeling.
[00:55:47.139]Should work.
[00:55:49.010]If you go to the website on GitHub,
[00:55:52.490]or on CRAN, you can find your way to the GitHub page,
[00:55:55.910]and you can click the link for the docs
[00:55:58.500]and you should be able to find, under Reference?
[00:56:04.460]Or article, is that article?
[00:56:07.230]Yeah, there is a tutorial.
[00:56:10.380]And this is a really long description
[00:56:12.760]of kind of a little bit of the theory,
[00:56:14.620]a little bit about how to do,
[00:56:17.210]apply all the functions
[00:56:19.000]and then some real data examples towards the end.
[00:56:21.610]So, as far as like a primer goes on the methods
[00:56:26.490]and the software to use it,
[00:56:28.690]yeah, refer to this and feel free to email me.
[00:56:31.820]Cool, thank you,
[00:56:35.880]I guess I'll go, so.
[00:56:39.130]A lot of the methods that you used in the projects
[00:56:42.070]that you presented today and the ongoing projects
[00:56:44.550]seem to have long high-quality, as Larga mentioned,
[00:56:48.970]time series that appear to be periodic in nature.
[00:56:55.170]And I've tried to understand the EDM,
[00:56:57.420]but a lot of it goes over my head.
[00:56:59.320]So, a lot of our time series
[00:57:01.670]that aren't as long as the datasets that you've been using
[00:57:04.540]will likely not have these periodic trends.
[00:57:07.260]So, how sensitive is EDM
[00:57:08.800]and these other methods that you've been playing around with
[00:57:11.180]to that, to the nature of like nonperiodic data?
[00:57:14.310]Yeah, so, a kind of good rule of thumb for
[00:57:17.570]how long the time series needs to be is
[00:57:21.710]roughly 30 data points in time
[00:57:26.200]and/or five generation times.
[00:57:29.080]So, you know, take the, take the characteristic time scale
[00:57:32.770]of the dynamic you're trying to model
[00:57:35.470]and multiply that by five,
[00:57:36.900]you know, ideally the, you know, you should have,
[00:57:39.360]like if it's periodic that you have enough observations
[00:57:42.570]to capture like that signal.
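
The rule of thumb amounts to two quick checks. The sketch below reports them separately, since the talk gives them as "and/or"; the function name, argument names, and the example numbers are hypothetical.

```python
def meets_length_rule(n_points, series_span, characteristic_time,
                      min_points=30, min_cycles=5):
    """Check a time series against the two rule-of-thumb criteria separately."""
    return {
        "enough_points": n_points >= min_points,
        "enough_cycles": series_span >= min_cycles * characteristic_time,
    }


# Hypothetical example: 40 monthly surveys (about 3.3 years of data)
# of a population whose characteristic time scale is 1 year.
check = meets_length_rule(n_points=40, series_span=40 / 12, characteristic_time=1.0)
print(check)
```

Here the series has more than 30 points but covers fewer than five characteristic time scales, so only the first criterion is met.
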
[00:57:47.010]In cases where
[00:57:49.230]we don't have that.
[00:57:51.640]So...
[00:57:54.030]Adam Clark published a paper on using spatial replicates.
[00:57:58.420]So, if you have a lot of replicate
[00:58:02.880]plots, where you only have like a couple data points
[00:58:07.490]in each of those, how to combine that together
[00:58:10.570]to apply this methodology.
[00:58:12.860]So, that's one way to get around it, it's not ideal.
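
One way to picture combining replicates is to build lag vectors within each short series separately and then pool them, so that no vector straddles two plots. The sketch below is an illustration of that idea only, not necessarily the procedure in the paper; the function name, plot labels, and counts are made up.

```python
def pooled_delay_vectors(replicates, E, tau=1):
    """Embed each replicate series on its own, then pool the lag vectors."""
    pooled = []
    first = (E - 1) * tau  # earliest index with a full set of lags
    for plot_id, series in replicates.items():
        for t in range(first, len(series)):
            vector = tuple(series[t - k * tau] for k in range(E))
            pooled.append((plot_id, vector))
    return pooled


# Made-up counts from three hypothetical plots, four surveys each.
replicates = {
    "plot_a": [5, 7, 6, 8],
    "plot_b": [2, 3, 5, 4],
    "plot_c": [9, 8, 8, 7],
}
library = pooled_delay_vectors(replicates, E=2)
print(len(library))
```

Each four-point series yields only three two-dimensional vectors, but pooling across plots gives a larger library to work with.
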
[00:58:17.320]You know.
[00:58:19.600]There are other things we're looking at
[00:58:23.700]to look at
[00:58:27.290]doing analyses in the absence of time,
[00:58:29.690]but where you have multiple covariates.
[00:58:32.700]Whether that is really good for recovering the mechanism,
[00:58:39.030]I'm agnostic and skeptical about.
[00:58:42.700]What I think it might be good at is hypothesis generation.
[00:58:46.520]So, suggesting that like
[00:58:48.020]this combination of covariates might be important,
[00:58:50.430]that is something that I think
[00:58:52.630]is possible if you don't have time series data,
[00:58:56.020]but I think if you want to make
[00:58:57.280]slightly stronger inferences about mechanism
[00:58:59.300]you are gonna have to have like observations of changes.
[00:59:03.530]Great, thanks.
[00:59:04.810]I also really appreciated the fact that you put words like
[00:59:07.507]regime and stability in quotes,
[00:59:10.540]because as we discussed yesterday at length,
[00:59:15.200]they're jargon, so they can be interpreted
[00:59:17.730]by each of us in this room in a different way.
[00:59:20.150]So, going back to that Christensen,
[00:59:22.030]I think Christensen et al time series
[00:59:24.480]of the rodentia community,
[00:59:28.960]you guys highlighted regime shifts
[00:59:30.350]that were of varying length.
[00:59:31.480]So, I think they varied between,
[00:59:33.440]I don't know, some months to a handful of years.
[00:59:37.140]So, I guess I wonder if you might comment on
[00:59:42.340]how you claim something is a regime shift
[00:59:44.680]and at what period of time is it in that regime shift?
[00:59:48.420]Are you basing this on your stability metric, I guess?
[00:59:52.500]Does that make sense?
[00:59:53.700]Yeah, so, the Christensen et al paper,
[00:59:57.130]the regime shifts were,
[01:00:00.910]they are identified using a change point model
[01:00:05.000]that is probabilistic, so you are inserting
[01:00:09.540]change points in the time series structure
[01:00:13.040]and then selecting for,
[01:00:15.660]you know, letting the Bayesian stuff work at,
[01:00:17.870]work out how many change points there should be
[01:00:20.150]and where they, you know,
[01:00:23.150]what time point they occur at.
[01:00:25.140]So, those periods are,
[01:00:27.810]I think they're actually like
[01:00:28.910]the 90% credible intervals for
[01:00:33.360]the location of the change points.
[01:00:35.550]So, after like throwing in the data into the model,
[01:00:39.090]the posterior distribution suggests that there is
[01:00:40.990]a change point somewhere in that
[01:00:45.410]time span of 90% credibility.
[01:00:50.520]That's kind of how that method worked.
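
Reading a 90% credible interval off a posterior can be sketched as taking the 5th and 95th percentiles of the posterior draws for a change point's location. The code below uses synthetic draws as a stand-in; it is not the change point model from the Christensen et al. paper, and the function name and numbers are hypothetical.

```python
import random


def credible_interval(draws, mass=0.90):
    """Equal-tailed interval containing roughly `mass` of the posterior draws."""
    ordered = sorted(draws)
    tail = (1.0 - mass) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper


# Synthetic stand-in for posterior draws of one change point's location,
# in months since the start of the series (not real model output).
rng = random.Random(0)
draws = [rng.gauss(120.0, 6.0) for _ in range(2000)]

low, high = credible_interval(draws)
inside = sum(low <= d <= high for d in draws) / len(draws)
print(low, high, inside)
```

A wide interval then reflects uncertainty about when the change happened, not necessarily a long transition.
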
[01:00:53.040]Does that answer the question?
[01:00:56.017]Yes.
[01:00:57.099]Sort of? Yeah, sort of for now.
[01:00:58.780]Okay, we'll talk later.
[01:01:00.820]Yeah, for sure.
[01:01:02.210]Are there any other questions?
[01:01:04.960]Drew?
[01:01:08.053]Uh oh.
[01:01:09.760]I just want to say, I'm a complete fan
[01:01:12.040]of the whole automated prediction thing
[01:01:14.500]and I'm working at, we're doing some stuff up here.
[01:01:17.770]We're gonna be following in your footsteps.
[01:01:20.170]So, thanks for putting together all the packages
[01:01:22.502]and saying this is a good thing to do, because I agree.
[01:01:27.090]Awesome.
[01:01:28.000]Yeah, it also seems like really impossible
[01:01:30.050]to do all that stuff that you're gonna do
[01:01:31.533]with the students in the White Ernest Lab.
[01:01:34.890]Yeah, I commend you.
[01:01:37.675]So, thank you so much.
[01:01:38.640]Please join me in thanking Dr. Hao Ye.
[01:01:41.905](audience applauding)
