Presentation describing the merge function in R
Ashu Guru
Author
03/13/2019
Added
Description
Presentation describing the merge function in R , by Ashu Guru, Univ of Nebraska Raikes School (5 mins)
Searchable Transcript
- [00:00:00.280]Many times we have a scenario where we have two tables which have one common
- [00:00:06.500]column, and we would like to join the two tables side by side.
- [00:00:10.920]So for example, suppose we have data in two tables, and one
- [00:00:15.635]of them is stored in a file called merge.csv.txt, and it looks like it has the columns
- [00:00:22.383]merge on, attribute 1, and attribute 2, and then it has a set of rows
- [00:00:27.586]of values.
- [00:00:29.040]Similarly, we have a second table which is stored in
- [00:00:32.674]another file, and in this case I'm saying it's in merge2.csv.txt.
- [00:00:37.640]It has two columns, merge on and attribute 3, and the merge on column has the same
- [00:00:44.163]column header in both files, and this is the column with which we will join
- [00:00:50.857]these two tables side by side, and it has values 100.1 and 9, 100.1 and 8, and 100.3 and 2.
- [00:00:59.240]One thing you want to notice is that I have deliberately added two rows which have
- [00:01:07.094]the same join column value in the second table.
- [00:01:13.360]The reason I wanted to do that is just to include that in this tutorial as well.
- [00:01:20.040]So we will know the behavior of R when we would like to join tables based on a common
- [00:01:26.163]column and what happens when there are values which are duplicated on the
- [00:01:31.826]joining column and if there are missing ones as well.
- [00:01:35.960]Because in this case, in the first table we have a value of 100.25
- [00:01:40.102]in the joining column, but there is no 100.25 in the joining column of the second table.
- [00:01:44.640]So we'll see the behaviour of R.
- [00:01:48.360]So what I am going to do is as always I'm going to set the working directory to the
- [00:01:54.828]folder which contains my input file.
- [00:01:57.600]So it is in the folder which you see on screen and here is the file just to show
- [00:02:05.791]that we have it in merge.
- [00:02:13.480]So this is the first file: I have merge on, attribute 1,
- [00:02:17.427]attribute 2 and the rows of values, separated.
- [00:02:22.800]Then I have merge 2: again 2 columns and 3 rows of values, separated.
- [00:02:38.520]So the first step as always is to set the working directory.
- [00:02:41.960]I am going to run this instruction.
- [00:02:45.280]Next, I'm going to read the file merge.csv.txt.
- [00:02:49.720]It has headers and a separator, and I read it into a variable called tabdata1.
- [00:02:55.720]So I'm going to run this instruction.
- [00:02:57.800]Similarly, I'm going to read the second table from
- [00:03:01.221]the CSV file into a variable called tabdata2.
- [00:03:05.120]Now we will use the command merge, which takes as arguments the first table,
- [00:03:10.919]the second table, and the common column with which we have
- [00:03:15.738]to join them.
- [00:03:17.480]Sorry.
- [00:03:18.040]So I'm going to run this instruction and then I will print the value of
- [00:03:24.173]the merged table.
- [00:03:27.160]So what we see is that R was able to join the two tables, because I do see that there
- [00:03:33.026]is a third column now, attribute 3, in the merged table.
- [00:03:37.240]And what I also notice is that it only merges the rows which have common values
- [00:03:46.848]in the join column.
- [00:03:49.760]So the join column has 100.1, 100.2, 100.25, 100.3, and 100.1 and 100.3.
- [00:03:59.080]The common values in the two are 100.1, 100.1 and 100.3.
- [00:04:03.480]So the join column will only have the values which are common in the 2 tables.
- [00:04:11.560]What I also see is that since I had two values for 100.1 in the second table,
- [00:04:19.020]it has actually repeated 100.1 twice.
- [00:04:22.560]So I have 100.1, 20, 29 so 100.1, 20, 29 and then I have 100.1, 20, 28 so 100.1,
- [00:04:32.905]20, 28.
- [00:04:34.320]Then there is no 100.2 in the second table,
- [00:04:37.265]so skip that; there is no 100.25 in the second table.
- [00:04:40.800]Skip that; there is 100.3 in the second table,
- [00:04:45.517]so that will be 100.3, 35, 30. 100.3, 35, 30 and then 2.
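The walkthrough above can be sketched in R roughly as follows. This is a minimal illustration, not the presenter's actual script: only the join-column values (100.1, 100.2, 100.25, 100.3 in the first table; 100.1, 100.1, 100.3 in the second) and the variable names tabdata1 and tabdata2 come from the transcript. The column names (mergeon, attribute1, attribute2, attribute3), the attribute values, the comma separator, and the name merged_table are assumptions made for the example, and the data is read from inline text rather than from the files on disk.

```r
# Hypothetical stand-ins for the two input files. Only the join-column
# values are taken from the transcript; column names, attribute values,
# and the comma separator are assumptions for illustration.
table1_text <- "mergeon,attribute1,attribute2
100.1,1,10
100.2,2,20
100.25,3,30
100.3,4,40"

table2_text <- "mergeon,attribute3
100.1,500
100.1,600
100.3,700"

# read.table(text = ...) parses a string the same way it would parse a file
tabdata1 <- read.table(text = table1_text, header = TRUE, sep = ",")
tabdata2 <- read.table(text = table2_text, header = TRUE, sep = ",")

# Join on the shared column. With default arguments merge() keeps only
# rows whose join value appears in both tables, and a join value that is
# duplicated in one table produces one output row per duplicate.
merged_table <- merge(tabdata1, tabdata2, by = "mergeon")  # merged_table is a hypothetical name
print(merged_table)
```

With this data the merged table has three rows: two for 100.1 (one per matching row in the second table) and one for 100.3. The rows for 100.2 and 100.25 are dropped because they have no match in the second table.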