Pathway Coverage in Bacterial Species

Kyle Hancock Author

04/05/2021 Added

13 Plays

Description

Bacteria work together in nature. Complementarity is a measure of how well different bacteria work together. Our research begins a program to provide this measure of complementarity in a simple, numerical format.

Searchable Transcript

[00:00:01.200]Hi, my name is Kyle Hancock,
[00:00:02.670]and I will be presenting our findings on pathway coverage in bacterial species.
[00:00:07.260]Just a little bit of an introduction.
[00:00:08.740]Bacteria can live near, on, or even in other bacteria.
[00:00:11.970]The interactions between these bacteria have important consequences for the
[00:00:15.840]organisms they affect. For a prime example, you can look at humans.
[00:00:19.860]Humans have more bacterial cells than they have human cells in their system,
[00:00:23.700]so the interactions between the bacteria on our skin and in our gut are very
[00:00:27.840]critical to our general health and wellbeing.
[00:00:30.990]Complementarity is a measure of how well these bacteria can work together,
[00:00:34.980]and it is the focus of our research project.
[00:00:37.290]We are trying to compose a MATLAB program that will be able to provide this
[00:00:40.380]complementarity measure in a simple numerical format.
[00:00:43.500]Drawing from the KEGG database, different bacteria,
[00:00:46.140]specifically their individual genomic pathways, are compared,
[00:00:49.940]and they're analyzed using the Jaccard index,
[00:00:52.080]which is a simple measure of complementarity.
[00:00:56.100]Looking a bit more at the KEGG database: it's a huge online database,
[00:01:00.780]full of thousands of different organisms and their individual genomic
[00:01:04.980]information. This information is stored in the form of pathways.
[00:01:09.780]A KEGG pathway is a map of a biological process that can occur in an organism.
[00:01:14.550]This process can be anything from basic metabolism to a complex cellular
[00:01:18.270]process.
[00:01:19.350]And they're made up of networks of metabolites, proteins, and other biomolecules.
[00:01:24.570]Each biological process
[00:01:25.980]in that database has a corresponding map with a unique five-digit ID.
[00:01:30.690]So for the image here,
[00:01:31.890]you can see that it is called hsa00620.
[00:01:36.510]hsa is the abbreviation for Homo sapiens, or humans, and
[00:01:40.770]00620 is the ID for the process of pyruvate metabolism.
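The ID structure described here (an organism abbreviation followed by a five-digit pathway number) can be illustrated with a short sketch. The talk's program is written in MATLAB; this Python version is only an illustration, and the function name `split_pathway_id` and the three-or-four-letter pattern for the organism code are assumptions made for the example.

```python
import re

# Assumed pattern: a lowercase organism code of three or four letters
# followed by exactly five digits, as in "hsa00620".
_PATHWAY_ID = re.compile(r"^([a-z]{3,4})(\d{5})$")

def split_pathway_id(pathway_id):
    """Split an ID such as 'hsa00620' into (organism_code, pathway_number)."""
    match = _PATHWAY_ID.match(pathway_id.strip().lower())
    if match is None:
        raise ValueError(f"not a recognized pathway ID: {pathway_id!r}")
    return match.group(1), match.group(2)

print(split_pathway_id("hsa00620"))  # ('hsa', '00620')
```

Keeping the pathway number as a string preserves the leading zeros, which matters when the same number is used to match pathways across organisms.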
[00:01:46.470]As you can see, not all of the boxes in this map are green,
[00:01:49.770]which means that Homo sapiens do not possess all of the necessary genes for
[00:01:53.940]pyruvate metabolism.
[00:01:56.730]So our program is broken up into two phases.
[00:01:59.460]Phase one was more about file management.
[00:02:01.650]KEGG pathways can be represented using adjacency matrices.
[00:02:06.870]These matrices are obtained from KEGG through a program called KEGG2Net.
[00:02:10.920]They are simple grids of ones and zeros. However,
[00:02:13.620]they can be quite large, up to hundreds of columns and hundreds of rows long,
[00:02:17.850]and each bacteria can contain hundreds of these matrices. So
[00:02:21.750]this lends itself to creating a large, complex dataset that we have to manage.
[00:02:26.250]The output of phase one is sorted data,
[00:02:29.190]which is organized by bacteria and can be easily accessed for further
[00:02:32.400]manipulation and further analysis and calculations.
[00:02:36.810]So looking a little bit more at the file management process,
[00:02:39.630]adjacency matrices are stored as text files, as shown here.
[00:02:42.600]The first column and row are the gene IDs, and a one represents an interaction
[00:02:47.250]between genes. So here, you see 69,
[00:02:51.000]77, and 62.
[00:02:52.470]These are unique gene IDs that correspond to genes in a bacteria for
[00:02:57.390]this specific pathway. And where you see a one,
[00:03:01.420]say second column, first row,
[00:03:03.100]you see an interaction between gene 77 and gene 69.
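A labeled adjacency matrix of this kind could be read along the following lines. This is a Python sketch, not the MATLAB code from the project, and the exact file layout is an assumption: whitespace-delimited text, a first line listing the column gene IDs, and each later line holding a row gene ID followed by 0/1 entries. The function name `read_adjacency` is hypothetical.

```python
import io

def read_adjacency(handle):
    """Parse a labeled 0/1 adjacency matrix from an open text handle.

    Assumed layout (hypothetical): the first line lists the column gene IDs;
    every later line is a row gene ID followed by one 0/1 entry per column.
    Returns the row gene IDs and the set of (row_gene, column_gene) pairs
    marked with a 1.
    """
    lines = [line.split() for line in handle if line.strip()]
    col_ids = lines[0]
    row_ids, edges = [], set()
    for fields in lines[1:]:
        row_id, values = fields[0], fields[1:]
        if len(values) != len(col_ids):
            raise ValueError(f"row {row_id} has {len(values)} entries, "
                             f"expected {len(col_ids)}")
        row_ids.append(row_id)
        for col_id, value in zip(col_ids, values):
            if value == "1":
                edges.add((row_id, col_id))
    return row_ids, edges

# Small in-memory example using the gene IDs mentioned in the talk.
example = io.StringIO(
    "69 77 62\n"
    "69 0 1 0\n"
    "77 0 0 1\n"
    "62 0 0 0\n"
)
genes, edges = read_adjacency(example)
print(genes)          # ['69', '77', '62']
print(sorted(edges))  # [('69', '77'), ('77', '62')]
```

In this toy matrix, the 1 in the first data row under the column labeled 77 records the interaction between gene 69 and gene 77.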
[00:03:08.890]So hundreds of these files,
[00:03:10.330]these adjacency matrices are sorted and then placed into corresponding lists
[00:03:14.200]based on the bacteria they belong to.
[00:03:17.110]So as this program needs to run on both Mac and Windows systems,
[00:03:20.410]we had to deal with several idiosyncrasies that are unique to the
[00:03:24.160]systems. Mac and Windows have different file management systems and
[00:03:28.510]file storage systems,
[00:03:30.340]and they have different file names and file types that you have to deal with.
[00:03:33.820]So in order to handle this,
[00:03:35.620]we have a large amount of error checking and unique branches in our program
[00:03:39.700]that trigger based on which operating system you're using.
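The talk does not say which operating-system quirks the program handles, so the following Python sketch only illustrates the general idea under stated assumptions: that matrix files end in `.txt`, and that the clutter to skip consists of housekeeping files such as `.DS_Store` on macOS and `Thumbs.db` on Windows. The names `IGNORED_NAMES`, `is_matrix_file`, and `list_matrix_files` are hypothetical.

```python
from pathlib import Path

# Hypothetical list of OS-generated files that are not adjacency matrices.
IGNORED_NAMES = {".DS_Store", "Thumbs.db", "desktop.ini"}

def is_matrix_file(name):
    """Return True for names that look like adjacency-matrix text files."""
    if name in IGNORED_NAMES or name.startswith("._"):
        return False  # macOS and Windows housekeeping files
    return name.lower().endswith(".txt")  # tolerate .TXT as well as .txt

def list_matrix_files(folder):
    """List matrix files in a folder, in the same sorted order on any OS."""
    folder = Path(folder)  # pathlib hides the "/" versus "\" difference
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and is_matrix_file(p.name))

print(is_matrix_file("orgA_00010.TXT"))  # True
print(is_matrix_file(".DS_Store"))       # False
```

Filtering by name in one place, instead of branching on the operating system throughout the program, is one possible way to keep the platform-specific logic small.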
[00:03:45.400]So as far as the output of phase one, what we have shown here is the M matrix.
[00:03:50.590]Every pathway present in any of the eight bacteria is shown in the first
[00:03:54.400]column, and each of the bacteria analyzed is shown in the first row.
[00:03:59.140]The numbers in the matrix represent how many genes exist in that certain
[00:04:02.680]bacteria-pathway combination.
[00:04:04.960]If you look at the first entry, you have Bannie 00010,
[00:04:08.620]and you see a 14, meaning there are 14 different genes present in that
[00:04:13.390]00010 pathway for Bannie.
[00:04:17.440]This matrix serves as the basis for phase two of our project.
[00:04:22.420]All of the other matrices and KEGG information
[00:04:26.440]that we used to make this M matrix were also stored in a way we can reference
[00:04:31.120]later.
[00:04:32.130]Okay.
[00:04:34.410]Phase two of our program was moving on from file management into data analysis.
[00:04:38.760]So we wrote another program that uses the output data,
[00:04:41.910]the M matrix from phase one, as its input.
[00:04:45.540]It takes each bacteria and compares it to every other bacteria
[00:04:48.630]in the dataset. This allows for
[00:04:52.770]every single possible combination of bacteria at once. In order
[00:04:57.570]to analyze this set, we use the Jaccard index,
[00:05:00.480]which is a really simple parameter, easy to calculate by hand and by machine,
[00:05:04.590]which allows us to check our machine's work fairly easily
[00:05:09.270]in order to assess the inner workings of the program.
[00:05:12.420]It also gives us a good idea of complementarity as well.
[00:05:17.040]So the Jaccard index, as I said before, is easily calculated,
[00:05:20.040]and it is generally used for gauging the similarity and diversity of two data
[00:05:23.490]sets.
[00:05:24.750]The lowest Jaccard index of zero means the two data sets are entirely unique.
[00:05:28.770]They share nothing. A high Jaccard index
[00:05:31.380]of one means that the data sets are exactly the same.
[00:05:35.550]So how do you calculate this Jaccard index? You take data
[00:05:39.150]set A and data set B and you calculate their intersection.
[00:05:42.510]Their intersection is all of the data points in A and B that are
[00:05:47.460]shared. Then you take the shared data
[00:05:51.030]of A and B and you divide it by the union of A and B. That union of A
[00:05:55.110]and B is all of A plus all of B minus that intersection.
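The calculation just described, the size of the intersection divided by the size of the union, can be written as a few lines of Python. This is a minimal sketch, not the project's MATLAB code; the function name `jaccard_index` and the choice to return 0.0 for two empty sets are assumptions made here.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of items."""
    a, b = set(a), set(b)
    shared = len(a & b)
    union = len(a) + len(b) - shared  # all of A plus all of B minus the overlap
    if union == 0:
        return 0.0  # convention chosen here for two empty sets
    return shared / union

print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # 0.5
print(jaccard_index({1, 2}, {3, 4}))        # 0.0
print(jaccard_index({1, 2}, {1, 2}))        # 1.0
```

In the first call the sets share two items and the union holds 3 + 3 - 2 = 4 items, so the index is 2 / 4 = 0.5, which is easy to confirm by hand.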
[00:06:00.530]So for our purposes,
[00:06:01.610]we want a low Jaccard index, as that'll be a measure of good complementarity for
[00:06:05.180]us. Looking at the organismal-level Jaccard index,
[00:06:10.460]we calculate the Jaccard index for each possible bacteria pair.
[00:06:14.270]For each combination, the M matrix's corresponding columns are scanned.
[00:06:17.690]The shared pathways are the ones where both indices are not zero.
[00:06:21.440]For example, for [inaudible],
[00:06:26.120]this comparison would count as a shared pathway.
[00:06:29.480]A low Jaccard index here expands the possible biological processes available to
[00:06:33.320]each bacteria.
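The organism-level comparison can be sketched in Python as follows: for every pair of columns in a count matrix, a pathway counts as present when its entry is nonzero, and the Jaccard index is taken over the two sets of present pathways. This is an illustration under assumptions, not the project's MATLAB code; the matrix values, the labels `orgA` to `orgC`, and the function name `organism_level_jaccard` are hypothetical.

```python
from itertools import combinations

def organism_level_jaccard(pathways, organisms, matrix):
    """Jaccard index of pathway presence for every pair of organisms.

    matrix[i][j] is the gene count of organism j in pathway i; a pathway is
    treated as present in an organism when that count is nonzero.
    """
    present = {
        org: {p for p, row in zip(pathways, matrix) if row[j] != 0}
        for j, org in enumerate(organisms)
    }
    result = {}
    for a, b in combinations(organisms, 2):  # each unordered pair once
        union = present[a] | present[b]
        shared = present[a] & present[b]
        result[(a, b)] = len(shared) / len(union) if union else 0.0
    return result

# Hypothetical 3-pathway, 3-organism count matrix.
pathways = ["00010", "00020", "00620"]
organisms = ["orgA", "orgB", "orgC"]
matrix = [
    [14, 18, 0],
    [5, 0, 7],
    [0, 3, 2],
]
for pair, value in organism_level_jaccard(pathways, organisms, matrix).items():
    print(pair, round(value, 3))
# ('orgA', 'orgB') 0.333
# ('orgA', 'orgC') 0.333
# ('orgB', 'orgC') 0.333
```

In this toy matrix every pair shares exactly one of three pathways, so each index is 1/3.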
[00:06:36.200]And looking at the pathway level for the Jaccard index,
[00:06:39.380]we calculated it for each pathway
[00:06:42.260]and each possible bacteria pair in the dataset. So for example,
[00:06:45.800]if you take the first two bacteria and the first pathway, you get
[00:06:49.550][inaudible] and [inaudible],
[00:06:53.830]you can see that they have 14 and 18 genes respectively.
[00:06:56.720]So the program pulls [inaudible],
[00:07:01.280]the adjacency matrix for that bacteria-pathway
[00:07:06.050]combination, and the adjacency matrix for
[00:07:09.600][inaudible], and then scans them and compares
[00:07:14.600]them. It takes all of the genes that are shared in these two adjacency matrices,
[00:07:19.370]and then it divides that by the union between the two adjacency matrices.
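A pathway-level comparison of one organism pair might look like the Python sketch below, which takes the gene IDs found in each organism's adjacency matrix for a pathway and computes shared genes over the union. It is not the project's MATLAB code. It assumes that gene IDs are directly comparable between the two organisms, which the talk implies but does not spell out, and the gene IDs, organism data, and function name `pathway_level_jaccard` are hypothetical.

```python
def pathway_level_jaccard(genes_a, genes_b):
    """Per-pathway Jaccard index of gene sets for one pair of organisms.

    genes_a, genes_b: {pathway_id: iterable_of_gene_ids}, for example the
    row labels of each organism's adjacency matrix for that pathway.
    Assumes gene IDs are comparable across the two organisms.
    """
    result = {}
    for pathway in sorted(set(genes_a) | set(genes_b)):
        a = set(genes_a.get(pathway, ()))
        b = set(genes_b.get(pathway, ()))
        union = a | b
        result[pathway] = len(a & b) / len(union) if union else 0.0
    return result

# Hypothetical gene sets for two organisms.
org_a = {"00010": {"g1", "g2", "g3"}, "00620": {"g7"}}
org_b = {"00010": {"g2", "g3", "g4", "g5"}}
print(pathway_level_jaccard(org_a, org_b))
# {'00010': 0.4, '00620': 0.0}
```

For pathway 00010 the organisms share two genes out of five distinct genes, giving 2 / 5 = 0.4; pathway 00620 exists only in the first organism, so nothing is shared.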
[00:07:24.200]So a low Jaccard index here gives us an idea of how likely each bacteria is to
[00:07:28.520]work together
[00:07:29.030]to complete a certain pathway previously unattainable because of a missing gene.
[00:07:33.230]So, for example, if Bannie was missing gene a for a specific pathway and BBR,
[00:07:37.820]he had that gene for that pathway,
[00:07:40.010]the combination of them both might result in that pathway being attainable
[00:07:44.930]for the combination. So what have we done so far?
[00:07:50.180]We have a database builder program that takes input data,
[00:07:54.080]and it gives you sorted output data that's easily manipulated.
[00:07:57.230]And then we have a program that analyzes this database.
[00:08:00.290]The output of this program right now is merely a file with the resulting Jaccard
[00:08:04.130]index for each pathway and the Jaccard index for each organism for every single
[00:08:08.510]combination. So while
[00:08:12.670]this Jaccard index gives us a good idea of complementarity,
[00:08:16.520]it can definitely be improved upon, and that is what we plan to do.
[00:08:21.440]We need to add many more parameters to make our complementarity measure much
[00:08:25.400]more robust. So we will find these parameters and
[00:08:27.680]add them to the phase two program.
[00:08:30.080]And then in order to generate the final measure,
[00:08:32.570]we will weight each of our parameters and add all of them up.
[00:08:36.290]That sum will be the output of the program,
[00:08:38.930]which is the final goal of the project.
[00:08:41.900]thank you all for your time and for your consideration.
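The planned final measure, a weighted sum of several parameters, is described only in outline, so the following Python sketch is purely illustrative. The parameter names, their values, the weights, and the function name `weighted_score` are all hypothetical; the talk does not say which parameters will be added or how they will be weighted.

```python
def weighted_score(parameters, weights):
    """Return the weighted sum of named parameter values.

    parameters: {name: value}, weights: {name: weight}. Every parameter
    must have a weight so that nothing is silently dropped from the sum.
    """
    missing = set(parameters) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in parameters.items())

# Hypothetical parameter values and weights for one pair of bacteria.
parameters = {"organism_jaccard": 0.25, "pathway_jaccard": 0.5}
weights = {"organism_jaccard": 0.5, "pathway_jaccard": 0.5}
print(weighted_score(parameters, weights))  # 0.375
```

Because a low Jaccard index is treated as good complementarity in the talk, a real scoring scheme would also need to decide whether each parameter enters the sum directly or inverted; that choice is left open here.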
