Presentation on how to divide a complex problem into smaller tasks
Ashu Guru
Author
03/13/2019
Added
Description
Presentation on how to divide a complex problem into smaller tasks by Ashu Guru, Univ of Nebraska Raikes School (2 mins)
Searchable Transcript
- [00:00:00.560]We will now talk about three of the most common variable types or data types that we will
- [00:00:07.760]be using in R and they are vectors, lists, and tables.
- [00:00:14.720]So let's just define what a programming variable is.
- [00:00:20.040]So you can think of a programming variable as a container that will maintain a certain value
- [00:00:25.560]which we can refer to later in that same program.
- [00:00:30.540]So for example, I could define a variable called myFirstVariable and assign it a value
- [00:00:37.440]of 24.
- [00:00:38.800]Similarly, I could define another variable which I name mySecondVariable and I assign
- [00:00:44.700]it a value 2.
- [00:00:47.480]What if I want to store the sum of myFirstVariable and mySecondVariable?
- [00:00:54.740]If I want to do that, I can do that.
- [00:00:55.540]I will do that as myThirdVariable.
- [00:00:58.580]In this case, I'm naming it as myThirdVariable.
- [00:01:01.620]It contains the value of myFirstVariable plus mySecondVariable.
- [00:01:05.620]So by the time this instruction is run, myThirdVariable contains the value of 26.
- [00:01:11.980]That is 24 plus 2.
- [00:01:13.580]So one thing that if you have previously programmed in other programming environments, you may
- [00:01:19.500]notice immediately is that the way the assignment operator works in R is
- [00:01:25.520]different.
- [00:01:26.860]In the majority of programming languages, the assignment operator is the equals sign,
- [00:01:31.380]but in R it is a combination of the less-than sign and the hyphen.
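The three assignments described so far can be sketched in R as follows. This is a minimal illustration that uses the variable names spoken in the presentation; the camel-case spelling is taken from the audio and may differ slightly from the slide.

```r
# Assign values with the <- operator (less-than sign followed by a hyphen)
myFirstVariable <- 24
mySecondVariable <- 2

# Store the sum of the first two variables in a third variable
myThirdVariable <- myFirstVariable + mySecondVariable

print(myThirdVariable)  # [1] 26
```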
- [00:01:36.360]I am going to launch my RStudio and observe these details interactively with the R console.
- [00:01:44.760]Here is my RStudio and I have created a script by creating a new R script and then typing
- [00:01:51.280]the commands that I will be using.
- [00:01:53.520]So my first instruction is similar
- [00:01:55.500]to what you saw in the PowerPoint that I am assigning a value of 24.
- [00:02:00.040]So if I have my cursor here and I click on run, RStudio copies this instruction to the
- [00:02:07.900]console and runs it.
- [00:02:10.320]So now I have a variable called myFirstVariable and it has a value of 24 assigned to it.
- [00:02:18.020]Now my cursor on the script is on line number 2.
- [00:02:23.360]So I am going to run the second instruction.
- [00:02:25.480]Now I have a second variable which has a value of 2 assigned to it.
- [00:02:32.720]Similarly I have now a third instruction which actually stores the value of the sum of myFirstVariable
- [00:02:41.100]and mySecondVariable in the variable called myThirdVariable.
- [00:02:45.640]So I could just print the value of myThirdVariable and I do that.
- [00:02:50.800]So it shows me that the value of myThirdVariable is currently 26.
- [00:02:55.460]So one thing you may want to notice is that I use camel case for defining the names of
- [00:02:59.820]the variables.
- [00:03:00.820]So I have my in lowercase, then a capital F and a capital V, in myFirstVariable.
- [00:03:06.580]I would recommend that your variable names are descriptive of what value they are holding.
- [00:03:12.380]So if I have a variable that needs to maintain the number of fruits in a box then I
- [00:03:19.140]would name it myNumFruits.
- [00:03:23.200]That way the code is much more readable.
- [00:03:25.440]And easy to maintain or modify.
- [00:03:32.300]A vector is the most fundamental object in the R environment.
- [00:03:37.040]And you can think of a vector as a linear container with columns.
- [00:03:40.620]Here is a diagram that you can think of that a vector is basically a container with multiple
- [00:03:46.460]cells and these cells are arranged in a line and each of the cell has its index so the
- [00:03:55.420]first cell has an index 1, the second cell has index 2 and so on.
- [00:03:59.300]So in this case we have 1, 2, 3, 4, 5, 6 cells and of course that means we have 6 column
- [00:04:07.060]indexes, and so if I define a vector which is called myVectorA then myVectorA
- [00:04:15.460]currently contains 23, 12, 8, 6, 8 and 1 as the integers.
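The six-cell vector from the diagram can be sketched in R as below. This is a minimal illustration: the name myVectorA is spelled in camel case on the assumption that it follows the naming style used earlier, and c() is R's built-in function for combining values into a vector.

```r
# Build a vector of six values with c()
myVectorA <- c(23, 12, 8, 6, 8, 1)

# R indexes start at 1, so the first cell is myVectorA[1]
print(myVectorA[1])       # [1] 23
print(myVectorA[3])       # [1] 8

# Number of cells in the vector
print(length(myVectorA))  # [1] 6
```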
- [00:04:25.400]So this is the graphical representation, and the way you actually create it in R is by defining myVectorA and then we
- [00:04:34.220]assign the value with a small c and parentheses and then the numbers inside
- [00:04:39.520]those parentheses. What this means is that myVectorA[1] contains the value 23 and
- [00:04:45.860]similarly myVectorA[3] contains the value 8, because this is index 1, so
- [00:04:51.260]myVectorA[1] has 23, this is index 2, this is
- [00:04:55.380]index 3 and myVectorA[3] will contain the value 8. Let's try this
- [00:05:01.020]interactively. So here is the RStudio. So I have defined a vector called
- [00:05:07.420]myVectorA. Once I run that, if I want to print the value of cell 1 then I run the
- [00:05:15.120]instruction myVectorA[1] and it shows that the value is 23, and similarly I can check what is the
- [00:05:25.360]value in myVectorA cell number 3 and it shows that the value is 8, like
- [00:05:32.740]we expected. One thing interesting that we should also notice is that even
- [00:05:38.200]when we define a variable which has a single value, so let's say
- [00:05:42.480]myVariableA has a value of 25 that I assigned to it, it is equivalent to
- [00:05:47.680]defining it with vector notation, so myVariableA has a
- [00:05:55.340]single cell with the value of 25. Let's look at it interactively as well. So I
- [00:06:02.460]am defining a variable called myVariableA here and I assign it a
- [00:06:11.340]value of 25. So I run that, so now myVariableA has a value of 25. I run
- [00:06:20.340]the next instruction where I print the value. It shows that it has
- [00:06:25.320]25 as the value. I'm defining the same variable with vector notation, so
- [00:06:32.040]myVariableA is assigned the value c(25). And next I'm going to print
- [00:06:40.260]myVariableA again and you can see that the output is the same. So as we expected, a
- [00:06:48.800]variable which holds a single value, even if you define it
- [00:06:55.300]without vector notation, is actually stored as a vector. Now let's look at
- [00:07:01.000]lists. Many times we want a vector of vectors and in that case we will turn to
- [00:07:09.060]what is called a list, so a list is basically a vector of vectors and here
- [00:07:15.760]is an example. So if I want to define a new list, let's say I want to name a
- [00:07:22.020]variable myList and it should hold
- [00:07:25.280]the value of a list, then the way I will define it is by saying myList is
- [00:07:30.000]assigned a value of list. And then this is the index string of the first vector, myEvens, which is equal to a
- [00:07:37.160]vector 2, 4, 6. The second vector I want to be known as myOdds and it is a vector
- [00:07:46.360]of values 3, 7, 9 and 11, and then similarly I want a third vector in my list, which is myPrimes,
- [00:07:53.810]which is a vector of 2, 3, 5, 7 and 11. So if I want to print the second element or the second
- [00:08:05.590]vector of my list then I will type in myList double square bracket 2 because then I'm referring
- [00:08:11.850]to it by index 2 (1, 2), and it will give me the odds vector. If I want, I can also print it by the index
- [00:08:21.770]string that I have provided. So in this case I'm printing what is the vector for the index string
- [00:08:30.550]myOdds. So here is the interactive mode. So here is my definition of the list. I'm calling it as
- [00:08:39.190]myList and
- [00:08:41.230]I'm creating it by the list command and the first element or the first vector of
- [00:08:49.050]this list may also be referred to by the string myEvens and has the
- [00:08:54.470]vector 2, 4, and 6. The second one may be referred to by myOdds. It has the values
- [00:09:02.530]3, 7, 9 and 11 as a vector. Then I also have a third vector for this list that
- [00:09:10.610]has the values 2, 3, 5, 7 and 11. So now if I run this, I have defined a new
- [00:09:23.010]variable myList. So now if I want to print the second vector by the index
- [00:09:28.550]number, I will run the command myList double square brackets 2 and it
- [00:09:36.410]prints 3, 7, 9 and 11 as we expected.
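The list built in this demonstration can be sketched in R as follows. This is a minimal illustration: the element names myEvens, myOdds and myPrimes are spelled in camel case as an assumption based on the audio. It shows access by index number and by index string, both using double square brackets.

```r
# A list whose three elements are named vectors
myList <- list(
  myEvens  = c(2, 4, 6),
  myOdds   = c(3, 7, 9, 11),
  myPrimes = c(2, 3, 5, 7, 11)
)

# Access the second element by its index number
print(myList[[2]])         # [1]  3  7  9 11

# Access the same element by its index string (its name)
print(myList[["myOdds"]])  # [1]  3  7  9 11

# Names of all elements, in order
print(names(myList))
```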
- [00:09:39.990]I could also refer to the same vector by the index string and I can do that by
- [00:09:51.090]myList double square brackets and in double quotes myOdds. So when I do that
- [00:09:56.370]I get the same result. Now let's just print the whole list and you can
- [00:10:07.470]see that
- [00:10:09.370]the first vector is myEvens, the second one is myOdds, the third one is
- [00:10:15.870]myPrimes. The last data type that I would like to talk about is what is called as
- [00:10:20.890]a table and a table can be thought of as a fast indexed data structure. So in
- [00:10:27.810]many cases we have data which is in the form of a frame or a grid as we see on
- [00:10:34.010]the screen. One of the data structures that R provides to handle
- [00:10:38.750]this kind of data is a table and you can actually have this data stored in a
- [00:10:46.370]regular text file and import the whole CSV file in R and then read that file
- [00:10:55.250]into a variable. So these are the steps that we will do and the command I'm
- [00:11:00.010]using is read.table and it takes as an argument the file name. It
- [00:11:08.130]takes another argument which is header. It is TRUE in this case. If there is no
- [00:11:13.710]header then we can make it FALSE, with the capital F, and in this case, since
- [00:11:19.050]we are using a CSV file, the separator is a comma. So here is a working example
- [00:11:23.850]code. There is one more concept that I am going to introduce that whenever we work
- [00:11:30.330]with a program or any software it has a working directory or a space on the file
- [00:11:37.050]system
- [00:11:37.510]where it is accessing, reading or writing files if needed. In R if you want to see
- [00:11:43.810]what the current working directory is, you can print it by typing the command getwd,
- [00:11:50.170]which is short for get working directory. R also provides a command
- [00:11:56.290]that allows you to change the working directory, so setwd takes an argument
- [00:12:01.650]where you provide the path to that directory. Finally we want to read the
- [00:12:06.890]CSV into a table, so in this case I am defining a variable
- [00:12:12.590]called myTabData and I use the command read.table. In the working
- [00:12:18.710]directory there will be a file called one dot average of interval for CSV. It
- [00:12:25.250]has headers and the separator is a comma. So if I just want to print the
- [00:12:32.210]group on column then I can refer to that by myTabData,
- [00:12:36.270]which is the name of the main variable and then a dollar sign and then the
- [00:12:39.910]header string. Similarly I can also get access to that column by using
- [00:12:48.510]myTabData and a square bracket and double quote and the header string. Now let's
- [00:12:53.070]work with the interactive mode. So let me print the current working
- [00:13:01.270]directory. So if I click run I see that R
- [00:13:05.650]currently sees users a guru as the working directory and if I want to
- [00:13:11.890]change it to my desktop I will run this statement or this instruction, setwd, and
- [00:13:20.410]I have changed it to users a guru desktop. I run that instruction. Once I do
- [00:13:25.630]that it has changed to the desktop. I can actually again print getwd and it shows
- [00:13:33.730]that now I am
- [00:13:35.030]able to read files from users a guru and desktop. On my desktop I have a file
- [00:13:42.370]called one dot average of interval dot csv dot txt. I will show that here so
- [00:13:50.330]here is the file it has group on attribute 1 attribute 2 attribute 3
- [00:13:56.130]separated by a comma and then all the data follows after that so I am going to
- [00:14:03.030]import this
- [00:14:04.410]into R by saying read.table, one dot average of interval dot csv dot text. It
- [00:14:14.610]has a header and is separated by a comma so I will run this instruction and once
- [00:14:26.370]done this variable now has all the values. So if I just want to print the values
- [00:14:33.790]in the group on column I will use the instruction myTabData dollar group
- [00:14:40.090]on, which is going to be a vector. So here it is, so it actually has 100 100.2
- [00:14:46.970]100.5, 0 which corresponds to 100, 100.2, 100.5, 100.75 and
- [00:14:57.350]so on. Similarly I can print it in another format by using
- [00:15:03.410]myTabData with group on in square brackets as the column index, and
- [00:15:12.410]here is the printout for that instruction. So these are the most common
- [00:15:18.830]data types that one may use for writing R scripts. Of course there are many
- [00:15:24.710]other data types but we can explore those as we need to use them.
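The table workflow from the last part of the presentation can be sketched in R as below. This is a minimal illustration under stated assumptions: the file contents and the column names group, attr1 and attr2 are hypothetical stand-ins, not the presenter's actual file, and the sketch writes a small temporary CSV file so that it runs without changing the working directory with setwd.

```r
# Show the current working directory
print(getwd())

# Create a small comma-separated file to read (hypothetical data and column names)
csvPath <- tempfile(fileext = ".csv")
writeLines(c("group,attr1,attr2",
             "100,1,2",
             "100.2,3,4",
             "100.5,5,6"), csvPath)

# Read the file: the first line is a header and the separator is a comma
myTabData <- read.table(csvPath, header = TRUE, sep = ",")

# The dollar sign returns the column as a vector
print(myTabData$group)

# Single square brackets with the header string return a one-column table
print(myTabData["group"])

# Remove the temporary file
unlink(csvPath)
```

The two print calls show the same values in different formats, because the first returns a plain vector and the second returns a data frame with one column.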