Investigating the Effects of Disasters on Social Unrest Through Clustering Analysis

James Erickson Author

04/05/2021 Added

Description

With Dr. Leen-Kiat Soh as an advisor, I worked on looking for trends in social unrest data in the aftermath of disasters in order to study the effects that disasters could have on social unrest.

Searchable Transcript

[00:00:00.330]Hello, my name is Jimmy Erickson,
[00:00:01.760]and today I'm going to be talking
[00:00:02.890]about investigating the effects of disasters
[00:00:04.850]on social unrest, through cluster analysis.
[00:00:07.290]I've been working with Dr. Soh on this as my advisor.
[00:00:09.860]So first off, social unrest activities
[00:00:11.870]are very complex to model.
[00:00:13.110]There are many different approaches that are needed
[00:00:15.310]and there's currently actually a SURGE project
[00:00:16.980]ongoing at UNL, and they're looking to model social unrest
[00:00:19.790]in four target countries, including India.
[00:00:22.090]And disasters are just one of many areas of interest
[00:00:24.480]in that kind of model.
[00:00:25.490]So I've been able to focus on that for my research.
[00:00:27.630]So, a little bit of an overview.
[00:00:28.870]The goal is to gain knowledge about trends in social unrest
[00:00:31.790]following disasters observed through clustering analysis.
[00:00:34.580]For data sources, we're using DesInventar for disaster data
[00:00:37.400]and GDELT for social unrest data.
[00:00:39.480]Those have both been determined
[00:00:40.720]to be the data sets that are best suited for our purposes
[00:00:43.560]in terms of their data collection methods
[00:00:46.030]and the availability.
[00:00:47.590]So some questions before we get into it
[00:00:49.670]that we want to keep in mind
[00:00:50.810]are what trends can be observed in the events
[00:00:53.020]that follow a given disaster.
[00:00:54.350]And can disaster data be clustered
[00:00:57.100]with the data about the events that follow it?
[00:00:59.600]If so, there could be a relationship between disasters
[00:01:02.470]and social unrest events that should be further explored.
[00:01:06.500]So onto our approaches.
[00:01:08.300]The GDELT data includes many different events.
[00:01:10.510]So we wanna filter that down to Protests,
[00:01:12.470]Aid events and Government Suppression Events.
[00:01:14.470]We identified Aid events and Government Suppression Events
[00:01:17.010]as potentially important events
[00:01:19.210]that could change people's perception
[00:01:20.490]about the handling of a disaster.
[00:01:22.270]The DesInventar data, we chose to focus on India
[00:01:25.050]and it's available for the states of Orissa,
[00:01:27.150]Tamil Nadu and Uttarakhand.
[00:01:28.750]So we're focusing on those states for this research.
[00:01:33.190]How to format the data to work with it.
[00:01:35.160]What we do is, we take a disaster event
[00:01:38.170]and then that starts a timeline.
[00:01:39.880]And so there are three timelines for every disaster event,
[00:01:43.410]which include events, the social events
[00:01:45.870]such as Protests, Aid or Government Suppression
[00:01:48.170]that occurred within 40, 80 and 120 kilometers
[00:01:51.090]of that disaster.
[00:01:52.660]Every timeline is limited to only events
[00:01:54.840]that occurred within one year after the disaster.
[00:01:57.340]So for one disaster, we have three different timelines.
[00:02:00.900]So we can observe the difference
[00:02:02.390]in clustering between a 40 kilometer range, 80 and 120.
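
The timeline construction described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the record fields (`date`, `lat`, `lon`), the helper names, the use of great-circle distance, and the 365-day reading of "one year" are all assumptions.

```python
import math
from datetime import date, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def build_timelines(disaster, events, radii_km=(40, 80, 120), window_days=365):
    """Return {radius: [events]}: one timeline per radius for a single disaster.

    `disaster` and each event are dicts with hypothetical keys
    'date' (datetime.date), 'lat' and 'lon' (degrees).
    """
    end = disaster["date"] + timedelta(days=window_days)
    timelines = {r: [] for r in radii_km}
    for ev in events:
        # Keep only events that happen after the disaster, within the window.
        if not (disaster["date"] <= ev["date"] <= end):
            continue
        d = haversine_km(disaster["lat"], disaster["lon"], ev["lat"], ev["lon"])
        for r in radii_km:
            if d <= r:
                timelines[r].append(ev)
    return timelines
```

Because the radii are nested, every event in the 40 kilometer timeline also appears in the 80 and 120 kilometer timelines.
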
[00:02:08.890]Now the clustering setup.
[00:02:09.980]We organize the data
[00:02:10.860]into three different categories of attributes.
[00:02:13.030]First off, we have the disaster attributes.
[00:02:15.150]That includes number of deaths,
[00:02:16.440]number of injuries. Then trend-based attributes:
[00:02:19.400]did X type of event increase or decrease
[00:02:21.950]over the course of the timeline?
[00:02:23.630]These attributes were calculated
[00:02:25.330]using the slope and error from linear regression.
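
As a rough sketch of a trend attribute, the function below fits an ordinary least-squares line to event counts over time and returns the slope together with an error term. The talk does not say which error measure was used, so the root-mean-square residual here is an assumption, as are the function name and the idea of binning events per month.

```python
import math

def linear_trend(xs, ys):
    """Fit y = slope * x + intercept by least squares; return (slope, rmse).

    A positive slope means the event type increased over the timeline,
    a negative slope means it decreased. rmse is the root-mean-square
    residual of the fit (an assumed choice of "error").
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return slope, rmse

# Hypothetical example: protests counted per month for four months.
months = [0, 1, 2, 3]
protests_per_month = [10, 8, 6, 4]
slope, rmse = linear_trend(months, protests_per_month)
```
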
[00:02:27.730]The third category are event count attributes,
[00:02:29.940]which for example, it's the number of protests
[00:02:31.680]in a timeline.
[00:02:32.700]And then what we did is
[00:02:33.890]we created six different combinations of these categories.
[00:02:37.200]So first off we have
[00:02:38.470]the three just plain single categories.
[00:02:40.760]Then we have category one and two,
[00:02:42.420]category one and three, and then all three of them.
[00:02:44.740]And this way we can observe
[00:02:46.730]which sets of attributes actually cluster best.
[00:02:49.200]So we can get an idea
[00:02:50.560]as to the nature of the possible relationship
[00:02:53.270]between a disaster event and the events that follow it.
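
The six combinations listed above could be encoded along these lines. The category numbering follows the order given in the talk (one: disaster attributes, two: trend-based attributes, three: event counts); the column names are hypothetical placeholders.

```python
# Hypothetical attribute names for each category.
CATEGORY_COLUMNS = {
    1: ["deaths", "injuries"],               # disaster attributes
    2: ["protest_slope", "protest_error"],   # trend-based attributes
    3: ["protest_count"],                    # event count attributes
}

# The six combinations named in the talk: each single category,
# one and two, one and three, and all three together.
COMBINATIONS = [(1,), (2,), (3,), (1, 2), (1, 3), (1, 2, 3)]

def feature_vector(timeline_row, combination):
    """Select the attributes of one timeline for a given combination."""
    columns = [col for cat in combination for col in CATEGORY_COLUMNS[cat]]
    return [timeline_row[col] for col in columns]

row = {"deaths": 12, "injuries": 40, "protest_slope": 0.5,
       "protest_error": 1.2, "protest_count": 7}
vector = feature_vector(row, (1, 2))
```
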
[00:02:57.460]So then after this, we wanna ask ourselves,
[00:02:59.920]what clustering methods do we want to perform?
[00:03:02.950]We identified K-Means clustering
[00:03:04.960]as one of the methods that we would like to use for this.
[00:03:07.410]What we do with this is you pass in a K value.
[00:03:09.870]And that is how many clusters the data is to be split into.
[00:03:13.560]This is very useful because this gives us an idea of say
[00:03:16.550]there are four main types of disasters.
[00:03:19.510]And then we could see how the disasters fit into those
[00:03:22.350]and the type of events or trends that follow that disaster.
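
For readers unfamiliar with K-Means, here is a minimal sketch of the standard algorithm (Lloyd's iteration) in plain Python. It is illustrative only; the talk does not say which implementation or initialization was used, and the seeded random initialization here is an assumption.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=100, seed=0):
    """Split `points` into k clusters; return (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical data: two well-separated groups of timelines.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2)
```

The K value is the only required parameter besides the data, which is why choosing a viable K matters so much in what follows.
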
[00:03:26.180]So when we approach this clustering
[00:03:27.910]we want to ask ourselves, what radius, the 40, 80 or 120,
[00:03:31.640]yields the most viable K value.
[00:03:34.200]The viable K value being,
[00:03:36.270]it's one that we are actually able to observe.
[00:03:39.100]So a K value of greater than 100
[00:03:41.970]there is no feasible way for us to, with our resources
[00:03:47.030]dive into and really understand
[00:03:48.810]what makes each of those clusters unique.
[00:03:51.440]And then also, which combination yields the most viable K value
[00:03:54.920]and combination being the combination of categories.
[00:03:57.770]And so the viable K value
[00:03:59.030]we want to be able to dig into it.
[00:04:00.790]And we also want it to just be quality clusters.
[00:04:03.360]So how do we find out if a set of clusters are high quality?
[00:04:07.270]Well, we're using a goodness variable, which is calculated
[00:04:10.370]by looking at the inter-cluster distance
[00:04:11.990]versus the intra-cluster distance for each K value.
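
The talk does not give the exact formula for the goodness variable. One plausible reading, sketched below as an assumption, is the ratio of the mean distance between cluster centroids (inter-cluster) to the mean distance from each point to its own centroid (intra-cluster), so that larger values mean tighter, better separated clusters. Under this reading the ratio is only defined for two or more clusters.

```python
import math
from itertools import combinations

def goodness(points, labels):
    """Assumed goodness: mean inter-centroid distance / mean intra-cluster distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("this ratio needs at least two clusters")
    centroids = {
        lab: [sum(col) / len(members) for col in zip(*members)]
        for lab, members in clusters.items()
    }
    # Intra-cluster: how far points sit from their own centroid, on average.
    intra = sum(math.dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)
    # Inter-cluster: how far apart the centroids are, on average.
    pairs = list(combinations(centroids.values(), 2))
    inter = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return math.inf if intra == 0 else inter / intra

# Hypothetical data: the same points with a sensible and a poor labeling.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = goodness(points, [0, 0, 0, 1, 1, 1])
poor = goodness(points, [0, 1, 0, 1, 0, 1])
```

Computing this value for each candidate K and plotting it against K gives the kind of curve discussed next.
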
[00:04:15.460]And what we did is we plotted these with the K value
[00:04:17.520]on the X axis and the goodness on the Y axis.
[00:04:20.470]So then we can see trends
[00:04:22.510]such as for categories one and two,
[00:04:24.600]we can see that the goodness greatly increases
[00:04:27.350]and then decreases after K equals 15
[00:04:30.250]for category one and K equals three for category two.
[00:04:32.830]That is good clustering.
[00:04:33.820]And that is what we want to see.
[00:04:35.250]Category three, this is bad clustering
[00:04:37.320]because it only ever decreases
[00:04:39.220]when you get greater than K equals one.
[00:04:41.000]Which means that it doesn't actually want to be clustered.
[00:04:44.070]For the three combinations along the bottom.
[00:04:46.450]They actually look very similar.
[00:04:48.130]So we don't want to draw any strong conclusions yet
[00:04:51.520]but what we can get from this
[00:04:52.580]is that categories one and category two,
[00:04:55.130]each cluster well independently
[00:04:56.720]and category three does not cluster well.
[00:05:00.040]So the next clustering method
[00:05:02.400]that we used was hierarchical clustering.
[00:05:04.830]The way that hierarchical clustering works
[00:05:07.010]is it clusters one data point at a time.
[00:05:09.020]So it basically lets us observe the process
[00:05:11.510]of the clustering
[00:05:12.450]so we can see how stable the clusters are.
[00:05:14.480]If you plot it on a dendrogram,
[00:05:15.910]you have each of the individual data points along the X axis
[00:05:18.890]and then the Y axis has the distance between each cluster.
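
A minimal sketch of agglomerative (bottom-up) hierarchical clustering follows. Every data point starts as its own cluster and the two closest clusters are merged at each step; the recorded merge distances are the heights a dendrogram draws on its Y axis. The talk does not name a linkage rule, so single linkage (closest pair of members) is an assumption here.

```python
import math

def agglomerative_merges(points):
    """Single-linkage agglomerative clustering.

    Returns the merges in order as (cluster_a, cluster_b, distance).
    Original points have ids 0..n-1; each merge creates a new id.
    """
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Single linkage: distance between the closest members.
                d = min(
                    math.dist(points[p], points[q])
                    for p in clusters[a]
                    for q in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

# Hypothetical one-dimensional data with two tight pairs and one outlier.
merges = agglomerative_merges([(0,), (1,), (10,), (11,), (50,)])
heights = [d for _, _, d in merges]
```

A large jump between consecutive merge heights suggests that the clusters below the jump are well separated, which is one way to read stability off a dendrogram.
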
[00:05:22.590]And this lets us look at the stability.
[00:05:24.470]So category one, fairly stable.
[00:05:27.530]You combine that with the goodness plot that we saw,
[00:05:29.650]and that is good, we wanna keep that.
[00:05:31.300]Category two, very stable.
[00:05:33.280]That combined with its goodness plot,
[00:05:34.810]it's great, it clusters well, we want to keep that.
[00:05:37.220]Category three, fairly stable.
[00:05:39.500]But again, we go back and look at the goodness plot
[00:05:41.690]and we do not want to be using it.
[00:05:43.120]Categories one and two combined, quite stable.
[00:05:46.470]They had a fairly good plot.
[00:05:48.390]And so that is looking quite good.
[00:05:51.510]And categories one and three, it looks okay
[00:05:56.360]but again, category three we do not want to use
[00:05:58.990]because it decreases the quality of the clustering.
[00:06:01.970]So the results of this clustering investigation
[00:06:04.380]is we're going to use categories one and two.
[00:06:06.400]And then we also chose to use the 120 kilometer range
[00:06:09.610]as the goodness values were higher
[00:06:11.760]and the clusters were more stable across the board
[00:06:15.060]for every combination for 120 kilometers.
[00:06:18.770]And so we settled on K equals eight.
[00:06:21.290]And so here on the right
[00:06:22.280]we can see the distribution of timelines per cluster.
[00:06:24.970]This lets us just see whether there are any clusters
[00:06:28.060]that are massive and some are insignificant.
[00:06:30.570]And so it looks quite good.
[00:06:32.310]So then what we can do is look at the mean
[00:06:34.150]of attributes by cluster and compare it
[00:06:35.450]with the standard deviation of attributes by cluster.
[00:06:37.600]And this lets us see just how extreme each cluster is
[00:06:40.980]for the disasters and just kind of
[00:06:43.840]what kind of effect the disasters in that cluster had.
[00:06:46.350]And then we can also see how variable they are.
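
The per-cluster summary behind such a heat map can be sketched as below: for each cluster, compute the mean and standard deviation of every attribute. The attribute names are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def cluster_summary(rows, labels, attributes):
    """Return {cluster: {attribute: (mean, std)}} over the clustered timelines."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    summary = {}
    for label, members in sorted(grouped.items()):
        summary[label] = {}
        for attr in attributes:
            values = [m[attr] for m in members]
            # Mean shows how extreme the cluster is; std shows how variable.
            summary[label][attr] = (mean(values), pstdev(values))
    return summary

# Hypothetical rows: one dict of attributes per timeline.
rows = [
    {"deaths": 2, "injuries": 10},
    {"deaths": 4, "injuries": 30},
    {"deaths": 100, "injuries": 5},
]
summary = cluster_summary(rows, labels=[0, 0, 1], attributes=["deaths", "injuries"])
```
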
[00:06:48.550]And so what really sticks out obviously is cluster seven.
[00:06:51.970]With this heat map, we can see that cluster seven
[00:06:54.290]except for injuries is very extreme
[00:06:57.130]in all of these disaster categories.
[00:07:00.090]And so that's one thing that sticks out.
[00:07:01.520]We can also see that it's quite variable
[00:07:03.480]for those disaster attributes.
[00:07:06.050]So in conclusion, disaster attributes can be clustered
[00:07:08.980]with trend-based attributes for social events.
[00:07:11.490]There are certain combinations of disaster attributes
[00:07:13.500]that line up with trends in the events that follow them.
[00:07:16.160]For future work, I would like to see a different approach
[00:07:18.250]taken to this that is able to dive into that relationship
[00:07:21.400]even more going off of the findings of this work.
[00:07:24.750]I would also like to see someone focus more
[00:07:26.240]on the sociological side,
[00:07:27.600]bringing a background of sociology
[00:07:29.130]and just be able to apply that to this problem.
[00:07:33.340]I would also like to see this incorporated
[00:07:34.690]into a larger social unrest model.
[00:07:38.950]Now I'd like to thank Dr. Soh for being my advisor
[00:07:41.170]and mentoring me on the research process.
[00:07:43.160]I would like to thank Dr. Samal and Dr. Joshi
[00:07:45.470]and the rest of the student members of the SURGE group
[00:07:47.670]for giving me weekly feedback at our meeting
[00:07:50.250]and just helping me figure out how to
[00:07:53.080]just get better at this process
[00:07:54.720]and improve the work that I'm doing, thank you.
