Data Management
Luqi Li
Author
08/06/2020
Added
30
Plays
Description
Keenan Amundsen - Data Management
Searchable Transcript
- [00:00:00.840](mellow music)
- [00:00:13.120]I'm Keenan Amundsen, turfgrass geneticist
- [00:00:15.350]at the University of Nebraska-Lincoln.
- [00:00:18.160]Today for the 2020 Nebraska Virtual Turfgrass Field Day,
- [00:00:21.710]I'm gonna chat a little bit about data management.
- [00:00:25.200]I'm gonna talk a little bit about my qualifications,
- [00:00:27.340]why I'm presenting this, why I think it's interesting,
- [00:00:31.280]and just talk about some strategies
- [00:00:33.420]that you might not have thought about before.
- [00:00:39.600]So, to start, I like to think about,
- [00:00:41.810]or just really look at some definitions.
- [00:00:44.300]So, what is data?
- [00:00:46.320]So, some purists make a distinction
- [00:00:48.480]between data and information,
- [00:00:51.800]but data is typically something that's observed,
- [00:00:55.070]measured, a set of values,
- [00:00:57.830]whereas information places that data in context,
- [00:01:01.480]and often comes after the data has been analyzed.
- [00:01:05.090]For example, if I say that I'm six feet, one inch tall,
- [00:01:09.220]it doesn't mean much,
- [00:01:11.890]and it just represents a single data point,
- [00:01:13.800]so it's just data.
- [00:01:15.950]If I say that the average height of male students
- [00:01:18.190]in my class is six foot three,
- [00:01:21.300]I can then complain about my neck pains
- [00:01:23.530]having to look up at my students,
- [00:01:26.810]and then it becomes a little more informative.
- [00:01:29.230]It becomes information.
- [00:01:32.120]Or, if I need a little pick-me-up,
- [00:01:33.770]I can compare myself to the average height
- [00:01:36.870]of US males, at five nine,
- [00:01:39.210]and I can tout my above-averageness.
- [00:01:42.500]The single height measurement,
- [00:01:43.600]again, doesn't mean much by itself,
- [00:01:46.510]but when compared with others, or analyzed,
- [00:01:49.510]it provides information, so that's really the distinction
- [00:01:52.730]between data and information.
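As a rough illustration of the distinction above, here is a minimal Python sketch that turns the single height measurement into information by comparing it with the two averages quoted in the talk. The function and variable names are made up for this example.

```python
def to_inches(feet: int, inches: int) -> int:
    """Convert a feet-and-inches height to total inches."""
    return feet * 12 + inches


def compare(value: int, reference: int) -> int:
    """Return how many inches `value` is above (+) or below (-) `reference`."""
    return value - reference


# Data: one observed value with no context.
my_height = to_inches(6, 1)

# Reference values quoted in the talk.
class_average = to_inches(6, 3)    # male students in the speaker's class
us_male_average = to_inches(5, 9)  # US males

# Information: the same value placed in context.
vs_class = compare(my_height, class_average)
vs_us = compare(my_height, us_male_average)
print(f"vs. class: {vs_class:+d} in; vs. US males: {vs_us:+d} in")
```

The measurement itself never changes; only the comparison makes it meaningful.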
- [00:01:56.690]The picture here is of one page
- [00:01:59.040]of my ornamental pearl millet field book.
- [00:02:02.500]The chicken scratch notes don't mean much to you,
- [00:02:05.920]but the shorthand notation is informative to me,
- [00:02:08.530]and helps guide breeding decisions.
- [00:02:11.170]So some of those little notes in there
- [00:02:13.270]are nothing more than data points,
- [00:02:16.200]but combined, it really helps me make
- [00:02:19.060]informed breeding decisions.
- [00:02:22.710]I volunteered to chat about this topic today
- [00:02:24.610]because I have a background in bioinformatics.
- [00:02:26.820]I use computers to analyze biological data sets.
- [00:02:29.620]Here's a file containing
- [00:02:30.630]about 20 million DNA sequencing reads.
- [00:02:33.010]I'm only showing a portion of it here,
- [00:02:35.920]and my research attempts to make some sense of it,
- [00:02:39.970]and put it in a biological context.
- [00:02:43.600]So, I use that data, for example,
- [00:02:45.350]to identify global changes in gene expression,
- [00:02:48.370]and identify genes important for how a plant responds
- [00:02:51.690]to a certain condition.
- [00:02:53.370]This figure here is representing some buffalo grass lines
- [00:02:56.530]that we were looking at to try and understand
- [00:03:00.470]how they respond to leaf spot disease,
- [00:03:04.530]and we either grew them in conditions
- [00:03:06.840]absent of the pathogen, or in the presence of the pathogen,
- [00:03:10.460]and we could see what genes turned on or off,
- [00:03:13.800]and so this is looking at global changes in gene expression,
- [00:03:16.970]so it's looking at, basically,
- [00:03:18.700]like 100,000 different genes,
- [00:03:20.640]but we can see if they're being turned on or off.
- [00:03:24.210]As a note, this isn't gonna be a talk
- [00:03:26.950]about my gene-expression work,
- [00:03:29.330]so just bear with me for a couple of slides here.
- [00:03:35.230]So, once we find out some
- [00:03:36.450]of that gene-expression information,
- [00:03:38.940]we can, for example, compare it to known species,
- [00:03:42.050]and make some decisions about the likely role
- [00:03:44.400]of the genes that we identify
- [00:03:46.170]in that conditional response that we're studying,
- [00:03:51.020]and ultimately what's most important for my program
- [00:03:53.560]is to develop molecular markers that we can use
- [00:03:55.780]to identify plants that respond
- [00:03:57.600]to a condition in a certain way
- [00:04:00.380]that maybe haven't been previously described,
- [00:04:02.900]and then we can use that in our breeding programs
- [00:04:05.680]to make informed decisions,
- [00:04:07.240]and it improves the efficiency
- [00:04:08.960]of our breeding and breeding cycles,
- [00:04:13.690]but thankfully, in turf,
- [00:04:14.720]you aren't likely dealing with millions of data points,
- [00:04:17.320]but the reality is that as computers become more powerful,
- [00:04:21.150]compare your first computer
- [00:04:23.110]to your current smartphone, for example,
- [00:04:25.570]and data becomes easier to collect and store,
- [00:04:28.700]the amount of data we have access to
- [00:04:31.730]is increasing faster than most of us
- [00:04:34.450]can keep up with, or make sense of.
- [00:04:37.610]So, what I wanna chat about today
- [00:04:38.980]is data lifecycle management.
- [00:04:42.120]It's a concept from the IT or computer science world,
- [00:04:45.670]where we need to consider,
- [00:04:46.820]or have a plan to handle our data.
- [00:04:50.600]If we keep storing data, it comes at an extra cost,
- [00:04:54.870]and there is a higher risk if it's lost.
- [00:04:58.320]There are ways to more efficiently store data.
- [00:05:00.610]So, for example, in the sequencing example
- [00:05:02.900]I gave at the beginning,
- [00:05:04.520]a typical study may have 36 files, and each is a text file
- [00:05:08.860]that's approximately three gigabytes in size.
- [00:05:12.170]After analysis, though, all of the data
- [00:05:14.450]is merged to a large table
- [00:05:16.430]that's just about 100 megabytes in size.
- [00:05:19.940]So, just thinking about that,
- [00:05:21.940]which one's better to store?
- [00:05:24.010]Those 36 files at three gigabytes apiece,
- [00:05:27.090]or a single file that's 100 megabytes in size
- [00:05:29.850]that I can store in multiple locations pretty effectively?
- [00:05:34.310]So, data lifecycle management
- [00:05:35.880]is really a plan for handling the data,
- [00:05:39.640]and it's a way of describing the creation of the data,
- [00:05:42.670]storage of data, how it's used,
- [00:05:45.250]sharing the data, archival of the data,
- [00:05:48.270]and when to destroy the data.
- [00:05:51.680]So, for the rest of the talk,
- [00:05:52.640]I'm just gonna briefly talk
- [00:05:54.730]about each of these steps as it applies to turf.
- [00:06:00.520]So, data creation, this is the observation step.
- [00:06:04.940]Think about the different types of data
- [00:06:06.790]you encounter frequently on the job.
- [00:06:09.260]You likely encounter personnel records, inventories,
- [00:06:12.750]budgets, application rates, fluid flow,
- [00:06:17.130]and as a side note,
- [00:06:18.130]I'm thinking about irrigation scale here,
- [00:06:20.670]not 12-ounce scale.
- [00:06:23.040]Password management, maybe insurance,
- [00:06:25.820]equipment logs, turf growth measurements,
- [00:06:28.450]pest occurrence, seasonal weather patterns.
- [00:06:31.740]There's tons of different types of data
- [00:06:34.910]that we run into on the job,
- [00:06:37.710]and so the big thing to consider at this stage
- [00:06:41.440]is really that old analogy of garbage in, garbage out.
- [00:06:44.650]So, think about how the data's collected.
- [00:06:47.430]Who's collecting the data?
- [00:06:48.770]If you're getting it from another source,
- [00:06:50.430]like a web source for weather data,
- [00:06:54.380]how reliable and accurate is that source?
- [00:06:58.570]Is it as reliable and accurate
- [00:07:01.270]as you need it to be for how you're using the data?
- [00:07:04.320]So, think about some of those kinds of questions.
- [00:07:06.490]Well, there are a lot of apps out there to help,
- [00:07:08.290]and other programs and things,
- [00:07:11.030]and I have an iPhone.
- [00:07:14.830]I'm not endorsing any specific products,
- [00:07:17.330]but I did a quick search for turf management,
- [00:07:22.440]and there are just a lot of apps.
- [00:07:24.923]So, the big thing is you really just wanna be selective.
- [00:07:28.930]Again, I'm not endorsing any specifically,
- [00:07:32.500]as I haven't really tried any of these.
- [00:07:36.120]QuickBooks, for example,
- [00:07:37.070]is one I've heard a lot of people talk about.
- [00:07:38.960]It's not on this list,
- [00:07:40.020]but it can help with accounting, for example,
- [00:07:45.087]and, you know, really you just have to look
- [00:07:48.490]at ones that are useful.
- [00:07:50.640]Probably when you get to these last two,
- [00:07:52.510]this, what was that, "Cooking Fever," or something,
- [00:07:56.020]and SpongeBob apps, they're probably less relevant.
- [00:07:59.260]So, you need to screen through,
- [00:08:00.770]and figure out which apps are useful and important.
- [00:08:06.830]So, next we wanna think about storage.
- [00:08:09.130]How are you gonna store that data after you collect it?
- [00:08:11.670]So, once you collect the data, what do you do with it?
- [00:08:14.190]Pencil and paper, like that pearl millet field book
- [00:08:16.550]I showed an image of earlier?
- [00:08:20.050]That's certainly an effective way for storing the data.
- [00:08:23.010]It's much more difficult to do
- [00:08:24.560]anything with that downstream.
- [00:08:26.950]I can't really run any analysis on that
- [00:08:29.430]without first transcribing it into a spreadsheet,
- [00:08:33.390]or something like that,
- [00:08:36.309]and so do you wanna keep it as pencil and paper,
- [00:08:38.900]and throw it on a bookshelf, or do you wanna digitize it?
- [00:08:41.870]So, consider your short-term storage
- [00:08:45.240]and use options, or what you plan to do with the data.
- [00:08:49.480]So, we'll talk about that in a little bit,
- [00:08:51.400]but the best example I can think of here
- [00:08:54.110]is that before the 2017 Nebraska Turf Conference,
- [00:08:59.430]I was planning to give three different talks,
- [00:09:02.470]and within the week preceding the conference,
- [00:09:05.270]my desktop crashed, my backup hard drive failed,
- [00:09:09.010]and my USB key became corrupted,
- [00:09:11.680]and so I had all of my talks in those three locations,
- [00:09:15.260]and I thought I was diligent about backing them up,
- [00:09:17.730]storing them, but they crashed.
- [00:09:21.100]All three, I managed to lose all three.
- [00:09:23.550]Thankfully, I'd emailed one of my colleagues
- [00:09:26.310]to look over my talks to get a little feedback,
- [00:09:29.600]and so I was able to pull the talks out of my email,
- [00:09:32.830]and so I didn't have to start over,
- [00:09:35.080]but the critical piece to think about here
- [00:09:37.530]is how to make the data accessible for use,
- [00:09:41.210]and how should it be stored in the short term?
- [00:09:43.850]So, consider the value of the data,
- [00:09:46.200]use cloud storage, whenever possible,
- [00:09:48.380]not a PC with an attached, external hard drive.
- [00:09:53.740]For example, if you had an office fire,
- [00:09:55.390]or something, you'd lose both, right,
- [00:09:57.280]so only use a USB key to transfer data
- [00:10:00.690]from one PC to another, and never for long-term storage.
- [00:10:04.210]So, just think about how you wanna store that data, right?
- [00:10:07.850]So as you collect that data,
- [00:10:09.750]what's the most effective way to store it?
- [00:10:17.030]Data by itself, like my height measurement,
- [00:10:19.450]isn't all that useful,
- [00:10:21.170]so think about how you wanna use the data.
- [00:10:24.230]Does it need to be analyzed to compare growth over time?
- [00:10:28.900]Labor hours spent on certain jobs, historical records,
- [00:10:33.810]pest trends associated with climate events, for example.
- [00:10:38.120]The use part of the lifecycle
- [00:10:39.830]is where the data becomes useful, so plan ahead.
- [00:10:44.210]When I collect data in the field, for example,
- [00:10:46.300]I use an iPad or my iPhone,
- [00:10:48.890]and collect it right into a spreadsheet application
- [00:10:51.580]where I can upload it directly to the cloud,
- [00:10:54.010]and analyze it with some predefined macros,
- [00:10:57.000]and I can do all of that in a single step,
- [00:10:59.600]and then it's basically analyzed
- [00:11:01.200]when I get back to my computer
- [00:11:03.290]and actually wanna do something with it.
- [00:11:05.280]So, that pre-planning can save a lot of time,
- [00:11:08.840]and really improve the efficiency,
- [00:11:10.880]so it's worth spending that time up front
- [00:11:13.660]to figure out how you wanna use the data.
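The talk does not show the macros themselves, so the following is only a sketch of the same idea in Python: decide in advance how the data will be summarized, so that analysis is a single step once the data is collected. The column names (`plot`, `date`, `quality`) and the sample values are invented purely for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict

# Invented sample data standing in for a spreadsheet export.
SAMPLE_CSV = """plot,date,quality
A1,2020-06-01,6
A1,2020-07-01,7
B2,2020-06-01,5
B2,2020-07-01,6
"""


def mean_quality_by_plot(csv_text: str) -> Dict[str, float]:
    """Average the `quality` column for each value of the `plot` column."""
    ratings = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ratings[row["plot"]].append(float(row["quality"]))
    return {plot: mean(values) for plot, values in ratings.items()}


summary = mean_quality_by_plot(SAMPLE_CSV)
for plot, average in summary.items():
    print(f"{plot}: mean quality {average:.1f}")
```

Because the summary function is written before the season starts, each new export can be analyzed the same way without extra setup.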
- [00:11:19.980]Data sharing, so this is the reporting phase.
- [00:11:23.510]In some instances,
- [00:11:24.343]the data is just important to you
- [00:11:26.520]as you're making informed decisions,
- [00:11:28.340]but other times reports, summaries need to be prepared.
- [00:11:32.290]So, create or use templates to simplify the process,
- [00:11:35.530]and use preexisting tools, it really helps.
- [00:11:39.380]Archival, so here it's important to place
- [00:11:42.670]a value on the data or information.
- [00:11:47.380]I worked for a developer in high school,
- [00:11:49.200]mostly pulling weeds,
- [00:11:50.510]but Doug Pearl was a very good business person,
- [00:11:53.970]and could assign a value to pretty much anything,
- [00:11:57.370]and I just remember he used to give me long lectures
- [00:11:59.910]about the value of an object
- [00:12:01.990]versus the time it's in a $20 storage unit.
- [00:12:05.830]The math is simple,
- [00:12:07.640]and the costs over time are easy to calculate,
- [00:12:10.200]but it's easy to justify that short-term use.
- [00:12:14.610]Another analogy is when you have a meeting,
- [00:12:17.230]consider the folks in the room, and their salaries.
- [00:12:20.390]Is the hour spent worth the X dollars of everyone's time?
- [00:12:25.050]And so you can, you have to think about it a little bit,
- [00:12:28.330]but you can assign value to your data.
- [00:12:32.170]Particularly if you're paying for data storage,
- [00:12:34.410]do you have to?
- [00:12:36.370]Is it really worth storing that in the long-term?
- [00:12:42.090]And again, I'm not sure how well those analogies work here,
- [00:12:44.590]but think about the value of your data.
- [00:12:49.580]Is it worth that X dollars in cloud storage?
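The storage-unit math mentioned above can be sketched as a small Python function. The talk gives only "a $20 storage unit", so treating $20 as a monthly fee is an assumption here, as are the example values and the function name.

```python
def months_until_fees_exceed_value(item_value: float, monthly_fee: float) -> int:
    """Return the first month in which total fees paid exceed the item's value."""
    if item_value < 0 or monthly_fee <= 0:
        raise ValueError("item_value must be >= 0 and monthly_fee must be > 0")
    # After n months the fees total n * monthly_fee; find the smallest n
    # for which that total is strictly greater than item_value.
    return int(item_value // monthly_fee) + 1


# Assumed example: an object worth $100 kept in a $20-per-month unit.
break_even_month = months_until_fees_exceed_value(100, 20)
print(f"Storage fees exceed the object's value in month {break_even_month}")
```

The same function applies to data if you substitute an estimate of what the data is worth to you and the monthly cost of the storage it occupies.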
- [00:12:54.310]So, what's worth keeping?
- [00:12:56.060]It's important to think about that.
- [00:12:58.440]Raw data is probably not that valuable,
- [00:13:00.650]unless you highly annotate it,
- [00:13:03.100]or it won't make much sense in a year.
- [00:13:06.090]The reports and analyzed information
- [00:13:08.330]are likely much more valuable
- [00:13:11.090]for assessing long-term trends,
- [00:13:13.490]so consider your storage space,
- [00:13:15.460]and constraints on that space.
- [00:13:22.680]This step is really the simple one,
- [00:13:26.880]but can be challenging for a lot of people.
- [00:13:30.360]If the data has no value anymore,
- [00:13:32.900]actively delete it, remove it.
- [00:13:39.290]At the university, we're going through an exercise now
- [00:13:41.650]of transitioning to a different cloud storage service,
- [00:13:46.150]and in doing so, I'm realizing that I have
- [00:13:49.150]gigabytes of files that I now have to go through
- [00:13:51.900]because I didn't spend time deleting stuff
- [00:13:54.410]that I just don't care about anymore,
- [00:13:56.630]but I have important stuff mixed in there,
- [00:13:59.280]and so I can't just delete everything.
- [00:14:01.670]I need to now go through and figure out
- [00:14:04.960]what has value and what doesn't, and delete it.
- [00:14:06.810]So, if you actively have a plan for doing that,
- [00:14:09.130]and you adhere to this data lifecycle management plan,
- [00:14:13.420]you would delete those things
- [00:14:15.150]as a regular part of that data lifecycle,
- [00:14:20.520]and in routinely deleting that stuff,
- [00:14:23.300]it'll really save a lot of headaches later on.
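A routine review is easier with a list of candidates to look at. The sketch below, using only the Python standard library, lists files that have not been modified for a given number of days. It deliberately deletes nothing, because, as noted above, important files are often mixed in with the rest. The function name and the one-year threshold in the usage note are illustrative assumptions.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def stale_files(root: Path, max_age_days: float,
                now: Optional[float] = None) -> List[Path]:
    """List files under `root` last modified more than `max_age_days` ago.

    This only reports candidates for review; it never deletes anything.
    """
    current = time.time() if now is None else now
    cutoff = current - max_age_days * SECONDS_PER_DAY
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

For example, `stale_files(Path("reports"), 365)` (the folder name is a placeholder) would return every file in that folder tree untouched for more than a year, which you could then review and remove by hand.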
- [00:14:30.990]So, this was a pretty quick talk.
- [00:14:32.170]What I hoped to accomplish with this talk
- [00:14:33.790]was to get you thinking about your data
- [00:14:35.650]in a way you might not have thought about it before,
- [00:14:39.000]and really the big thing
- [00:14:39.880]is just to develop a plan and stick to it.
- [00:14:43.200]If you have any questions, I work with lots of folks,
- [00:14:45.540]mainly students, but I work with folks all the time
- [00:14:48.250]to tailor solutions, so feel free to reach out.
- [00:14:53.030]I'm not familiar with any of these apps,
- [00:14:54.720]but I'm really good at figuring things out on the computer,
- [00:14:58.500]that's my computer background, and I enjoy doing that,
- [00:15:02.380]so if you do have questions, feel free to reach out.
- [00:15:04.130]My contact information is here.
- [00:15:09.395](mellow music)