Drawing RNA Secondary Structure
Alexander Batelaan
Author
08/04/2020
Added
501
Plays
Description
This research project focused on improving rna_draw, a Python program that draws RNA secondary structures, by switching its rendering to matplotlib.
Searchable Transcript
- [00:00:01.476]Hello, my name is Alexander Batelaan
- [00:00:04.436]and I will be presenting the research
- [00:00:06.446]that I have been working on this
- [00:00:07.630]summer, which is “Drawing RNA
- [00:00:09.398]Secondary Structures.”
- [00:00:13.530]RNA is an essential molecule in
- [00:00:15.440]cell biology. It transports genetic
- [00:00:17.390]instructions stored in DNA to
- [00:00:20.680]the cytoplasm to generate proteins.
- [00:00:23.290]RNA can fold into complex structures
- [00:00:25.700]which have the potential
- [00:00:27.140]to be used for nanomedicine.
- [00:00:30.060]In our research we wanted to
- [00:00:31.790]improve a program that can draw
- [00:00:33.130]the secondary structure of RNA.
- [00:00:35.310]The program, called rna_draw, used
- [00:00:37.300]the cairosvg library to generate images.
- [00:00:41.020]But cairosvg is not part of the
- [00:00:43.110]standard Python library. So we
- [00:00:44.990]decided to replace the cairosvg functions
- [00:00:47.577]with matplotlib functions, which are
- [00:00:49.943]widely available in scientific Python distributions.
- [00:00:52.450]It would be helpful to efficiently
- [00:00:54.180]draw the structure of RNA because
- [00:00:56.020]RNA is biologically important.
- [00:00:58.500]This image shows how RNA is
- [00:01:00.120]part of the process of decoding
- [00:01:01.800]genetic information to build proteins.
- [00:01:05.800]RNA folding is essential for its
- [00:01:07.610]biological functions. RNA structure is
- [00:01:10.850]described at the primary, secondary,
- [00:01:13.400]and tertiary levels. Primary
- [00:01:16.150]structure is the sequence of the
- [00:01:17.730]RNA nucleotides, secondary
- [00:01:19.890]structure is how the sequence
- [00:01:21.190]forms base pairs in a 2D space,
- [00:01:24.430]and tertiary structure is how the
- [00:01:26.120]2D structure folds in on itself to
- [00:01:28.300]make a 3D shape. There are programs
- [00:01:31.580]that can draw RNA secondary
- [00:01:33.500]structures. The rna_draw Python
- [00:01:36.030]code developed by the Yesselman
- [00:01:37.850]research group is an example of a
- [00:01:39.580]program that can draw
- [00:01:40.810]RNA secondary structures.
- [00:01:44.646]The rna_draw program works
- [00:01:46.236]by translating structure and
- [00:01:47.806]sequence inputs into an image
- [00:01:49.286]of the RNA secondary structure.
- [00:01:52.146]The secondary structure of RNA
- [00:01:53.936]can be represented by
- [00:01:55.256]dot-bracket notation. In
- [00:01:57.486]dot-bracket notation the brackets
- [00:01:59.296]represent paired nucleotides and
- [00:02:01.666]dots represent unpaired nucleotides.
- [00:02:05.086]If you look at the image of the
- [00:02:06.636]dot-bracket notation, you can see
- [00:02:08.396]that the arrows show which of
- [00:02:09.986]the brackets are connected to
- [00:02:11.516]each other. For example, the first
- [00:02:14.036]open bracket is paired with the
- [00:02:15.636]last closing bracket. This shows
- [00:02:17.606]that the first G in the sequence is
- [00:02:19.676]paired with the last C in the sequence.
- [00:02:22.206]The rna_draw program reads the
- [00:02:24.636]dot-bracket notation and produces
- [00:02:26.786]a pairmap. Then, a tree is built from
- [00:02:29.406]the pairmap. The tree is used to find the
- [00:02:31.746]coordinates for each nucleotide.
- [00:02:34.106]After the coordinates of the
- [00:02:35.826]nucleotides are found, the picture
- [00:02:38.326]of the RNA secondary structure is drawn.
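The first step described above, reading dot-bracket notation into a pairmap, can be sketched roughly as follows. This is a minimal illustration that assumes a simple stack-based parser; the function name `dot_bracket_to_pairmap`, the list-based pairmap layout, and the example sequence are assumptions made for this sketch and are not taken from the rna_draw source.

```python
def dot_bracket_to_pairmap(structure):
    """Return a list where entry i is the index paired with i, or -1 if unpaired.

    Hypothetical helper for illustration; not the rna_draw implementation.
    """
    pairmap = [-1] * len(structure)
    open_positions = []  # stack of indices of "(" not yet matched
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            j = open_positions.pop()  # most recent open bracket pairs with this one
            pairmap[i] = j
            pairmap[j] = i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return pairmap


# Example inputs chosen for illustration: a small hairpin.
sequence = "GGGAAAACCC"
structure = "(((....)))"
pairmap = dot_bracket_to_pairmap(structure)

# Report each base pair once, using 1-based positions.
for i, j in enumerate(pairmap):
    if j > i:
        print(f"{sequence[i]}{i + 1} pairs with {sequence[j]}{j + 1}")
```

In this example the first open bracket is matched with the last closing bracket, so the first G is paired with the last C, as in the talk's illustration.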
- [00:02:44.403]My research focused on changing
- [00:02:46.523]how the RNA secondary structure
- [00:02:48.333]image was drawn. I replaced cairosvg
- [00:02:51.463]functions with matplotlib functions.
- [00:02:54.403]One of the problems that I ran into
- [00:02:55.983]was determining the correct figure
- [00:02:57.653]size. I solved this problem by testing
- [00:03:00.063]RNA structures with a range of
- [00:03:01.773]different sizes. I printed out the
- [00:03:04.303]area covered by the RNA structures
- [00:03:06.083]according to the coordinates and
- [00:03:07.863]defined a figure size variable, which
- [00:03:09.763]I set to a suitable figure size for
- [00:03:11.583]each test structure. I plotted
- [00:03:14.333]Area versus the Figure Size Variable
- [00:03:16.303]and fit the curve. I then used the
- [00:03:18.923]curve fit equation to define the
- [00:03:21.493]figure size for all RNA structure sizes.
- [00:03:24.733]The four RNA structures that I tested
- [00:03:26.953]were the hairpin loop, tRNA,
- [00:03:29.703]coronavirus 5′ untranslated region,
- [00:03:33.173]and the 50S ribosomal subunit. I tested from
- [00:03:35.863]one of the smallest RNA structures,
- [00:03:37.883]the hairpin loop, all the way up to
- [00:03:39.373]the 50S ribosomal subunit, which is one of the
- [00:03:41.123]largest RNA structures known.
- [00:03:43.583]This gives rna_draw the capability
- [00:03:45.873]to plot RNA structures across a wide range of sizes.
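The calibration idea described above (record the area covered by each test structure, pick a figure size that looks right, fit a curve, then use the fit for new structures) can be sketched as follows. The talk does not say which curve was fitted, so the power-law form `fig_size = a * area**b` is an assumption of this sketch, as are the function names and the calibration numbers, which are made up purely for illustration.

```python
import math


def fit_power_law(areas, fig_sizes):
    """Least-squares fit of fig_size = a * area**b (assumed form).

    The fit is an ordinary straight-line regression in log-log space.
    """
    xs = [math.log(v) for v in areas]
    ys = [math.log(v) for v in fig_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b


def figure_size_for(area, a, b):
    """Predict a figure size for a structure covering the given area."""
    return a * area ** b


# Hypothetical calibration points (NOT measurements from the talk):
# area covered by the layout vs. a hand-picked figure size.
areas = [1.0e3, 2.0e4, 1.5e5, 4.0e6]
fig_sizes = [3.0, 8.0, 16.0, 50.0]

a, b = fit_power_law(areas, fig_sizes)
print(f"fig_size ~ {a:.3f} * area^{b:.3f}")
print(f"suggested size for area 5e5: {figure_size_for(5.0e5, a, b):.1f}")
```

The predicted value could then be passed to matplotlib as the figure size, for example through the `figsize` argument of `plt.figure`. A different functional form may fit real calibration data better.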
- [00:03:49.534]In conclusion, the curve fit that
- [00:03:51.294]I used to control the figure size
- [00:03:52.884]worked well. It successfully drew
- [00:03:54.944]RNA structures with a large
- [00:03:56.624]range of different sizes. The
- [00:03:58.564]quality of the fit was determined
- [00:04:00.364]visually by comparing the resulting
- [00:04:02.074]images. I was also successful
- [00:04:05.324]in modifying rna_draw to use
- [00:04:07.324]matplotlib instead of cairosvg.
- [00:04:10.364]This makes rna_draw a more
- [00:04:11.894]useful tool for RNA chemistry
- [00:04:13.544]research groups. It allows rna_draw
- [00:04:16.404]to be used in Jupyter Notebook
- [00:04:17.874]on Mac and Windows computers.
- [00:04:20.024]Here is an example of how to use
- [00:04:21.784]rna_draw. First, we define a secondary
- [00:04:32.084]structure, then we define a sequence, then we define a color
- [00:04:39.580]scheme. Then we run all, and we get
- [00:04:48.860]the resulting image.
- [00:04:55.250]In the future we are planning to
- [00:04:57.080]improve the secondary structure
- [00:04:58.700]images generated by rna_draw
- [00:05:00.670]even further. Currently, some of the
- [00:05:03.080]larger RNA structures are drawn
- [00:05:05.160]with overlapping regions. We are
- [00:05:07.640]planning to reorganize the coordinates
- [00:05:09.950]of the nucleotides into a Python class
- [00:05:12.690]to make them easier to manipulate
- [00:05:14.170]and to solve this problem.
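One way such a coordinate class might look is sketched below. This is purely illustrative: the talk does not describe the planned design, so the names `NucleotideCoord` and `Layout`, the `translate` and `overlapping_pairs` methods, and the distance-based overlap test are all assumptions of this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class NucleotideCoord:
    """Position of one nucleotide in the 2D drawing (hypothetical structure)."""
    index: int
    base: str
    x: float
    y: float


class Layout:
    """Holds all nucleotide coordinates so they can be moved and checked together."""

    def __init__(self, coords):
        self.coords = list(coords)

    def translate(self, indices, dx, dy):
        """Shift the selected nucleotides by (dx, dy)."""
        wanted = set(indices)
        for coord in self.coords:
            if coord.index in wanted:
                coord.x += dx
                coord.y += dy

    def overlapping_pairs(self, min_distance):
        """Return index pairs whose centers are closer than min_distance."""
        clashes = []
        for i, first in enumerate(self.coords):
            for second in self.coords[i + 1:]:
                if math.hypot(first.x - second.x, first.y - second.y) < min_distance:
                    clashes.append((first.index, second.index))
        return clashes


# Made-up coordinates: nucleotides 1 and 2 start almost on top of each other.
layout = Layout([
    NucleotideCoord(0, "G", 0.0, 0.0),
    NucleotideCoord(1, "A", 1.0, 0.0),
    NucleotideCoord(2, "C", 1.0, 0.2),
])
clashes_before = layout.overlapping_pairs(min_distance=0.5)
layout.translate([2], dx=0.0, dy=1.0)   # move nucleotide 2 out of the way
clashes_after = layout.overlapping_pairs(min_distance=0.5)
print(clashes_before, clashes_after)
```

Keeping the coordinates in one object means an overlap check and a correction step can work on the same data. The pairwise check here is quadratic in the number of nucleotides, so a large structure would likely need a faster spatial search.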
- [00:05:16.242]Here are my references and I
- [00:05:18.172]would like to thank UCARE for
- [00:05:19.832]sponsoring this research. I would
- [00:05:21.762]also like to thank Dr. Yesselman
- [00:05:23.632]and Chris Jurich for mentoring me
- [00:05:25.512]and helping me so much with
- [00:05:26.632]learning all of the Python programming.
- [00:05:29.302]Thank you for listening
- [00:05:30.522]to my presentation.