High Throughput Biological Data Analysis
Author: Tate Anderson
Added: 08/04/2020
Description
Video presentation of high throughput biological data analysis.
Searchable Transcript
- [00:00:00.000]Hi, my name is Tate Anderson.
- [00:00:03.341]This summer I worked with Dr. Hasan Otu
- [00:00:05.213]in the Department of Electrical and
- [00:00:06.945]Computer Engineering
- [00:00:08.570]and the topic of my research was
- [00:00:10.130]high throughput biological data analysis.
- [00:00:13.102]So to start off, biological experiments
- [00:00:15.214]have become increasingly data-centric.
- [00:00:17.084]We can leverage these large pools of data
- [00:00:19.947]to identify molecules that may
- [00:00:22.177]precede disease.
- [00:00:23.225]In order to do this, we need a way to
- [00:00:25.535]analyze thousands of samples and
- [00:00:27.769]detect these molecules, which are known
- [00:00:29.626]as biomarkers.
- [00:00:30.585]We will create a software tool with MATLAB
- [00:00:33.826]that uses machine learning and statistical
- [00:00:36.296]methods to identify these molecules.
- [00:00:38.783]The data that we want to analyze will
- [00:00:42.515]have both sample and feature attributes
- [00:00:44.871]that describe the samples.
- [00:00:46.645]The features will be any type of omic.
- [00:00:49.259]For proteins, it will be proteomics,
- [00:00:51.303]for genes it will be transcriptomics,
- [00:00:53.083]metabolites will be metabolomics,
- [00:00:55.169]and lipids will be lipidomics.
- [00:00:56.698]Any of these can be leveraged
- [00:00:58.462]with the tool to view the data.
- [00:01:00.886]Sample attributes will then further define
- [00:01:03.924]each feature and give more meaning
- [00:01:06.674]to them.
- [00:01:07.562]The user will then have the ability to
- [00:01:09.850]define groups and to view these
- [00:01:11.402]groups for data visualization.
- [00:01:13.051]Clustering, differential expression
- [00:01:15.207]analysis, and classification and
- [00:01:17.031]prediction will also be features that
- [00:01:18.793]the user can use to view the data.
- [00:01:23.501]Figures 1 and 2 give us two different
- [00:01:27.475]ways that the user can view the data.
- [00:01:30.539]Figure 1 is a sample box plot for a subset
- [00:01:33.407]of 72 samples. In this case the user will
- [00:01:36.307]have already defined multiple sample
- [00:01:38.739]groups and then in the next step will
- [00:01:41.369]be able to view them in a box plot
- [00:01:43.333]such as this one.
- [00:01:44.493]Figure 2 similarly is a histogram of
- [00:01:48.261]a subset of samples' signal values.
- [00:01:50.097]So again, the user has already
- [00:01:52.125]defined sample groups
- [00:01:54.085]and is now able to view them in
- [00:01:55.659]a histogram.
- [00:01:59.407]The software was built using
- [00:02:01.020]many of MATLAB's UI components:
- [00:02:02.798]uifigure, uitable, and uilabel,
- [00:02:05.984]to name a few.
- [00:02:07.166]These allow the user to have full
- [00:02:09.133]interactivity and full control over
- [00:02:11.320]the data.
- [00:02:12.353]Logical arrays and indexing were also
- [00:02:14.711]used to extract the defined groups
- [00:02:17.273]from the data set and then
- [00:02:19.747]from there let the user create
- [00:02:21.210]unions and intersections of the data
- [00:02:23.121]groups after they have been defined.
- [00:02:24.879]Time complexity also
- [00:02:27.671]had to be taken into consideration
- [00:02:29.348]because we are working with such large
- [00:02:31.131]data sets.
- [00:02:32.308]Figure 3 gives us a view of the sample
- [00:02:37.595]summary screen in the software tool.
- [00:02:39.818]This sample summary screen includes
- [00:02:43.441]three different tables.
- [00:02:45.738]The first table is the overall summary
- [00:02:47.618]of the samples.
- [00:02:48.633]In this case there are 252 samples and
- [00:02:53.014]we are given the mean, the median,
- [00:02:54.951]and the standard deviation of each
- [00:02:56.567]of the columns.
- [00:02:59.355]The second table is the summary
- [00:03:01.525]of the feature attributes.
- [00:03:04.096]This gives us a quick look at the
- [00:03:05.753]feature attributes that were
- [00:03:07.119]present in the data we were given.
- [00:03:08.777]The rows give us the number of entries,
- [00:03:11.609]the number of unique entries,
- [00:03:13.124]and the number of empty entries
- [00:03:14.616]in the data.
- [00:03:17.715]The software tool also goes through
- [00:03:19.657]and checks all the data for
- [00:03:21.370]missing entries. As we can see
- [00:03:22.965]here, we have no missing entries,
- [00:03:24.453]but if there were a missing entry,
- [00:03:26.130]it would prompt the user to go back
- [00:03:27.967]and fix that missing value.
- [00:03:31.558]The third and final table is the
- [00:03:33.714]summary of the sample attributes.
- [00:03:36.096]In this case we have four sample
- [00:03:38.303]attributes, and again we are given
- [00:03:39.933]the number of entries,
- [00:03:41.041]the number of unique entries,
- [00:03:42.767]and the number of empty entries
- [00:03:44.260]and then the name for each of the
- [00:03:46.436]sample attributes.
- [00:03:49.466]Moving on to Figure 4, which is the
- [00:03:51.220]next step, this shows us the
- [00:03:53.938]group selection and definition
- [00:03:55.782]screen in the software tool.
- [00:03:57.351]As we can see in the dropdown menu
- [00:03:59.504]we have all four of the
- [00:04:00.784]sample attributes.
- [00:04:01.748]In this case they are the ID, the time,
- [00:04:03.977]the type, and the run.
- [00:04:05.735]Again, the sample attributes
- [00:04:07.385]are different ways that the data is
- [00:04:09.712]organized and different ways that
- [00:04:13.281]we can differentiate the samples
- [00:04:15.539]from each other.
- [00:04:16.936]The first table gives the user
- [00:04:19.016]a look at all of the samples and
- [00:04:24.341]once the user chooses one of the
- [00:04:27.432]sample attributes or selects individual
- [00:04:29.889]samples to include or exclude, the user is
- [00:04:33.063]shown the sample group which is the
- [00:04:35.729]table on the right.
- [00:04:36.958]That is a summary of the group that
- [00:04:39.679]the user has selected.
- [00:04:41.488]After the user has selected
- [00:04:43.027]that group they are able to enter
- [00:04:44.761]the group name and then save and
- [00:04:46.905]continue, which will allow
- [00:04:48.322]the user to define another group,
- [00:04:49.893]or save and exit, which will take the user
- [00:04:52.643]to the next step in the program.
- [00:04:58.627]As of right now, the software can
- [00:05:00.421]take in the data file, show the user
- [00:05:02.480]the summary as we saw in Figure 3,
- [00:05:04.311]allow the user to define the sample
- [00:05:06.873]groups, and then from there
- [00:05:08.432]it will display the resulting
- [00:05:10.311]groups in a box plot after the
- [00:05:12.543]user decides if they want to view
- [00:05:14.175]the union or the intersection of the data.
- [00:05:16.437]In the future, we will add more data
- [00:05:19.589]visualization features, as well as
- [00:05:21.646]clustering and differential expression
- [00:05:23.852]analysis to further analyze the data.
- [00:05:25.835]We will also let the user perform
- [00:05:28.347]classification and prediction with
- [00:05:30.408]machine learning methods in order to
- [00:05:32.338]identify the biomarkers, as I
- [00:05:34.873]referenced earlier.
- [00:05:36.304]This has been a quick overview of the
- [00:05:38.450]software that we are building for high
- [00:05:41.064]throughput biological data analysis.
- [00:05:43.476]Thank you for watching.
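
The talk mentions that logical arrays and indexing are used to extract user-defined groups, and that the user can then take unions and intersections of those groups. The idea can be sketched roughly as follows. This is a minimal illustration in Python rather than the MATLAB code of the tool itself: the sample records, the attribute values, and the helper names (`group_mask`, `union`, `intersection`, `select`) are all hypothetical. Only the four attribute names (ID, time, type, run) come from the talk.

```python
# Illustrative sketch only: sample IDs and attribute values are invented,
# and the tool described in the talk is written in MATLAB, not Python.

# Each sample carries the four attributes named in the talk: ID, time, type, run.
samples = [
    {"id": "S1", "time": 0, "type": "control", "run": 1},
    {"id": "S2", "time": 0, "type": "treated", "run": 1},
    {"id": "S3", "time": 24, "type": "control", "run": 2},
    {"id": "S4", "time": 24, "type": "treated", "run": 2},
]


def group_mask(samples, attribute, value):
    """Return one boolean per sample: True where the attribute equals value."""
    return [s[attribute] == value for s in samples]


def union(mask_a, mask_b):
    """Samples that belong to either group (element-wise OR)."""
    return [a or b for a, b in zip(mask_a, mask_b)]


def intersection(mask_a, mask_b):
    """Samples that belong to both groups (element-wise AND)."""
    return [a and b for a, b in zip(mask_a, mask_b)]


def select(samples, mask):
    """Logical indexing: keep the IDs of samples whose mask entry is True."""
    return [s["id"] for s, keep in zip(samples, mask) if keep]


treated = group_mask(samples, "type", "treated")
late = group_mask(samples, "time", 24)

print(select(samples, union(treated, late)))         # ['S2', 'S3', 'S4']
print(select(samples, intersection(treated, late)))  # ['S4']
```

Each mask is built in a single pass over the samples, and combining two masks is another single pass, so the cost grows linearly with the number of samples. That may be one reason a mask-based design suits the large data sets the talk describes, though the talk does not say how the tool handles time complexity.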