High Throughput Biological Data Analysis

Tate Anderson Author

08/04/2020 Added

Description

Video presentation of high throughput biological data analysis.

Searchable Transcript

[00:00:00.000]Hi, my name is Tate Anderson.
[00:00:03.341]This summer I worked with Dr. Hasan Otu
[00:00:05.213]in the Department of Electrical and
[00:00:06.945]Computer Engineering
[00:00:08.570]and the topic of my research was
[00:00:10.130]high throughput biological data analysis.
[00:00:13.102]So to start off, biological experiments
[00:00:15.214]have become increasingly data-centric.
[00:00:17.084]We can leverage these large pools of data
[00:00:19.947]to identify molecules that may
[00:00:22.177]precede disease.
[00:00:23.225]In order to do this, we need a way to
[00:00:25.535]analyze thousands of samples and
[00:00:27.769]detect these molecules, which are known
[00:00:29.626]as biomarkers.
[00:00:30.585]We will create a software tool with MATLAB
[00:00:33.826]that uses machine learning and statistical
[00:00:36.296]methods to identify these molecules.
[00:00:38.783]The data that we want to analyze will
[00:00:42.515]have both sample and feature attributes
[00:00:44.871]that describe the samples.
[00:00:46.645]The features will be any type of omic.
[00:00:49.259]For proteins, it will be proteomics,
[00:00:51.303]for genes it will be transcriptomics,
[00:00:53.083]metabolites will be metabolomics,
[00:00:55.169]and lipids will be lipidomics.
[00:00:56.698]Any of these can be leveraged
[00:00:58.462]with the tool to view the data.
[00:01:00.886]Sample attributes will then further define
[00:01:03.924]each feature and give more meaning
[00:01:06.674]to them.
[00:01:07.562]The user will then have the ability to
[00:01:09.850]define groups and to view these
[00:01:11.402]groups for data visualization.
[00:01:13.051]Clustering, differential expression
[00:01:15.207]analysis, and classification and
[00:01:17.031]prediction will also be features that
[00:01:18.793]the user can use to view the data.
[00:01:23.501]Figures 1 and 2 give us two different
[00:01:27.475]ways that the user can view the data.
[00:01:30.539]Figure 1 is a sample box plot for a subset
[00:01:33.407]of 72 samples. In this case the user will
[00:01:36.307]have already defined multiple sample
[00:01:38.739]groups and then in the next step will
[00:01:41.369]be able to view them in a box plot
[00:01:43.333]such as this one.
[00:01:44.493]Figure 2 similarly is a histogram of
[00:01:48.261]a subset of samples' signal values.
[00:01:50.097]So again, the user has already
[00:01:52.125]defined sample groups
[00:01:54.085]and is now able to view them in
[00:01:55.659]a histogram.
[00:01:59.407]The software was built using
[00:02:01.020]many of MATLAB's UI components:
[00:02:02.798]uifigure, uitable, and uilabel,
[00:02:05.984]to name a few.
[00:02:07.166]These allow the user to have full
[00:02:09.133]interactivity and full control over
[00:02:11.320]the data.
[00:02:12.353]Logical arrays and indexing were also
[00:02:14.711]used to extract the defined groups
[00:02:17.273]from the data set and then
[00:02:19.747]from there let the user create
[00:02:21.210]unions and intersections of the data
[00:02:23.121]groups after they have been defined.
[00:02:24.879]Problems with time complexity also
[00:02:27.671]had to be taken into consideration
[00:02:29.348]when working with such large
[00:02:31.131]data sets.
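
The logical-array approach described above can be sketched as follows. This is a Python illustration, not the MATLAB code from the project; the sample table, the attribute values, and the helper names (`mask_where`, `union`, `intersection`, `select`) are all hypothetical. Each group is stored as one boolean per sample, so a union or intersection is a single linear pass over the samples, which helps keep the cost manageable on large data sets.

```python
# Hypothetical sample attributes; the values are made up for illustration.
samples = [
    {"id": "S1", "time": 0, "type": "control"},
    {"id": "S2", "time": 0, "type": "treated"},
    {"id": "S3", "time": 24, "type": "control"},
    {"id": "S4", "time": 24, "type": "treated"},
]


def mask_where(rows, attribute, value):
    """Return one boolean per sample: True where the attribute equals value."""
    return [row[attribute] == value for row in rows]


def union(mask_a, mask_b):
    """Element-wise OR of two group masks."""
    return [a or b for a, b in zip(mask_a, mask_b)]


def intersection(mask_a, mask_b):
    """Element-wise AND of two group masks."""
    return [a and b for a, b in zip(mask_a, mask_b)]


def select(rows, mask):
    """Keep only the rows whose mask entry is True."""
    return [row for row, keep in zip(rows, mask) if keep]


treated = mask_where(samples, "type", "treated")
late = mask_where(samples, "time", 24)

either_ids = [row["id"] for row in select(samples, union(treated, late))]
both_ids = [row["id"] for row in select(samples, intersection(treated, late))]
```

Here `either_ids` holds the samples that are treated or taken at time 24, and `both_ids` holds only those that satisfy both conditions.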
[00:02:32.308]Figure 3 gives us a view of the sample
[00:02:37.595]summary screen in the software tool.
[00:02:39.818]This sample summary screen includes
[00:02:43.441]three different tables.
[00:02:45.738]The first table is the overall summary
[00:02:47.618]of the samples.
[00:02:48.633]In this case there are 252 samples and
[00:02:53.014]we are given the mean, the median,
[00:02:54.951]and the standard deviation of each
[00:02:56.567]of the columns.
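
A minimal sketch of the per-column summary described above, written in Python rather than MATLAB. The column names and signal values are invented, and the use of the sample standard deviation (N-1 normalization) is an assumption; the transcript does not say which form the tool reports.

```python
import statistics

# Hypothetical signal values: one list per sample column (made-up numbers).
signal = {
    "sample_1": [2.0, 4.0, 4.0, 6.0],
    "sample_2": [1.0, 3.0, 5.0, 7.0],
}


def summarize(columns):
    """Compute mean, median, and sample standard deviation for each column."""
    summary = {}
    for name, values in columns.items():
        summary[name] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            # Assumption: sample standard deviation (divides by N-1).
            "std": statistics.stdev(values),
        }
    return summary


summary = summarize(signal)
```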
[00:02:59.355]The second table is the summary
[00:03:01.525]of the feature attributes.
[00:03:04.096]This gives us a quick look at the
[00:03:05.753]feature attributes that were
[00:03:07.119]present in the data we were given.
[00:03:08.777]The rows give us the number of entries,
[00:03:11.609]the number of unique entries,
[00:03:13.124]and the number of empty entries
[00:03:14.616]in the data.
[00:03:17.715]The software tool also goes through
[00:03:19.657]and checks all the data for
[00:03:21.370]missing entries. As we can see
[00:03:22.965]here, we have no missing entries,
[00:03:24.453]but if there were a missing entry,
[00:03:26.130]it would prompt the user to go back
[00:03:27.967]and fix that missing value.
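
The attribute summary and missing-entry check described above might look something like this in Python. The attribute table is hypothetical, and two choices here are assumptions rather than facts from the talk: an entry is treated as empty when it is `None` or a blank string, and "number of entries" counts only the non-empty ones.

```python
def is_empty(value):
    """Assumption: None and blank strings count as empty entries."""
    return value is None or str(value).strip() == ""


def summarize_attribute(values):
    """Count non-empty, unique, and empty entries in one attribute column."""
    filled = [v for v in values if not is_empty(v)]
    return {
        "entries": len(filled),
        "unique": len(set(filled)),
        "empty": len(values) - len(filled),
    }


def find_missing(table):
    """Return (attribute, row index) pairs for every empty entry."""
    return [
        (name, index)
        for name, values in table.items()
        for index, value in enumerate(values)
        if is_empty(value)
    ]


# Hypothetical attribute table with one blank entry in "type".
attributes = {
    "type": ["control", "treated", "control", ""],
    "run": [1, 1, 2, 2],
}

type_summary = summarize_attribute(attributes["type"])
missing = find_missing(attributes)
```

A non-empty `missing` list is the point at which a tool like the one described could prompt the user to go back and fix the data.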
[00:03:31.558]The third and final table is the
[00:03:33.714]summary of the sample attributes.
[00:03:36.096]In this case we have four sample
[00:03:38.303]attributes, and again we are given
[00:03:39.933]the number of entries,
[00:03:41.041]the number of unique entries,
[00:03:42.767]and the number of empty entries
[00:03:44.260]and then the name for each of the
[00:03:46.436]sample attributes.
[00:03:49.466]Moving on to Figure 4, which is the
[00:03:51.220]next step, this shows us the
[00:03:53.938]group selection and definition
[00:03:55.782]screen in the software tool.
[00:03:57.351]As we can see in the dropdown menu
[00:03:59.504]we have all four of the
[00:04:00.784]sample attributes.
[00:04:01.748]In this case they are the ID, the time,
[00:04:03.977]the type, and the run.
[00:04:05.735]Again the sample attributes
[00:04:07.385]are different ways that the data is
[00:04:09.712]organized and different ways that
[00:04:13.281]we can differentiate the samples
[00:04:15.539]from each other.
[00:04:16.936]The first table gives the user
[00:04:19.016]a look at all of the samples and
[00:04:24.341]once the user chooses one of the
[00:04:27.432]sample attributes or selects individual
[00:04:29.889]samples to include or exclude, the user is
[00:04:33.063]shown the sample group which is the
[00:04:35.729]table on the right.
[00:04:36.958]That is a summary of the group that
[00:04:39.679]the user has selected.
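
One way to picture the group selection described above is the Python sketch below. It is illustrative only: the `build_group` helper, its parameters, and the sample rows are hypothetical, and the rule that an exclusion overrides both the attribute match and an explicit inclusion is an assumption.

```python
def build_group(rows, attribute=None, value=None, include=(), exclude=()):
    """Return the IDs of samples chosen by attribute value or by hand.

    A sample joins the group if it matches the chosen attribute value or is
    explicitly included, unless it is explicitly excluded.
    """
    selected = []
    for row in rows:
        matches = attribute is not None and row[attribute] == value
        if (matches or row["id"] in include) and row["id"] not in exclude:
            selected.append(row["id"])
    return selected


# Hypothetical samples using the four attributes named in the talk.
samples = [
    {"id": "S1", "time": 0, "type": "control", "run": 1},
    {"id": "S2", "time": 0, "type": "treated", "run": 1},
    {"id": "S3", "time": 24, "type": "control", "run": 2},
    {"id": "S4", "time": 24, "type": "treated", "run": 2},
]

# Select all controls, add S2 by hand, and drop S3 by hand.
group = build_group(samples, "type", "control", include={"S2"}, exclude={"S3"})

# Saving under a user-supplied name, as in the "save and continue" step.
groups = {"my_group": group}
```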
[00:04:41.488]After the user has selected
[00:04:43.027]that group they are able to enter
[00:04:44.761]the group name and then save and
[00:04:46.905]continue which will allow
[00:04:48.322]the user to define another group,
[00:04:49.893]or save and exit which will take the user
[00:04:52.643]to the next step in the program.
[00:04:58.627]As of right now, the software can
[00:05:00.421]take in the data file, show the user
[00:05:02.480]the summary as we saw in Figure 3,
[00:05:04.311]allow the user to define the sample
[00:05:06.873]groups, and then from there
[00:05:08.432]it will display the resulting
[00:05:10.311]groups in a box plot after the
[00:05:12.543]user decides if they want to view
[00:05:14.175]the union or the intersection of the data.
[00:05:16.437]In the future, we will add more data
[00:05:19.589]visualization features, as well as
[00:05:21.646]clustering and differential expression
[00:05:23.852]analysis to further analyze the data.
[00:05:25.835]We will also let the user perform
[00:05:28.347]classification and prediction with
[00:05:30.408]machine learning methods in order to
[00:05:32.338]identify the biomarkers as I
[00:05:34.873]referenced earlier.
[00:05:36.304]This has been a quick overview of the
[00:05:38.450]software that we are building for high
[00:05:41.064]throughput biological data analysis.
[00:05:43.476]Thank you for watching.
