Movie Assembly with Artificial Intelligence
Simon Schoenbeck
Author
04/04/2021
Description
Video poster presentation 2021 for the Movie Assembly with Artificial Intelligence project from the Story, Worlds, Speculative Design Lab. This project uses artificial intelligence to generate a new timeline from a template timeline and a set of scenes.
Transcript
- [00:00:00.000]Hello everyone, I am Simon Schoenbeck and welcome to this UCARE presentation on
- [00:00:04.019]Movie Assembly with Artificial Intelligence.
- [00:00:07.889]Through my work with Professor Ash Smith, I created a system that allows for the
- [00:00:12.126]automated creation of novel movies from a template movie and a set of input clips.
- [00:00:17.738]The goal of this project was to research applications of Machine Intelligence
- [00:00:21.648]and Machine Learning in the arts.
- [00:00:24.288]Professor Smith is a filmmaker, and she was interested in the application of
- [00:00:28.114]Machine Learning to filmmaking.
- [00:00:29.835]There are many machine learning techniques that can be used
- [00:00:32.698]for filmmaking.
- [00:00:33.833]The most common are extrapolations of the picture-to-picture strategy.
- [00:00:37.474]This includes frame interpolation, which can be used to increase frame rates;
- [00:00:41.527]video-to-video, which creates realistic video
- [00:00:44.054]from a sequence of semantic segmentation masks;
- [00:00:46.517]deepfakes, for transferring one actor’s motions to another;
- [00:00:49.789]and AI-assisted rotoscoping, for editing out backgrounds or adding in elements.
- [00:00:54.939]While many of these techniques can assist in the movie-making process,
- [00:00:58.459]most are not applicable to scenes.
- [00:01:00.902]I am defining a scene as a continuous series of frames with similar content,
- [00:01:04.779]usually shot from the same camera.
- [00:01:07.396]The use of machine learning across multiple scenes is not a common application.
- [00:01:11.697]This developed into the objective for this project.
- [00:01:15.840]The objective for this project was:
- [00:01:17.679]Given a template timeline and a set of scenes,
- [00:01:20.049]order these scenes into a novel timeline.
- [00:01:23.442]The first step in developing a machine learning algorithm
- [00:01:26.382]is understanding the features of the data set.
- [00:01:28.524]For our objective, the input will be mp4 video files.
- [00:01:31.532]Specifically, one long video file that is used as the timeline template
- [00:01:36.448]which we will assume has multiple scenes
- [00:01:39.978]and a set of video files which we assume contain one scene each.
- [00:01:44.471]To create a timeline, we must know when the scenes start and end
- [00:01:49.203]in the timeline template.
- [00:01:51.348]The long video is sent through a scene detector which compares pairs of frames
- [00:01:55.495]and if the frames are different enough
- [00:01:58.003]then the system marks it as a scene change.
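The scene detector described here can be sketched as simple frame differencing. This is an illustrative stand-in, not the project's actual detector, and the threshold value is an assumption that real footage would need tuned:

```python
import numpy as np

def detect_scene_changes(frames, threshold=30.0):
    """Mark a scene change wherever consecutive frames differ enough.

    frames: iterable of equally sized numpy arrays (H x W x 3, uint8).
    threshold: mean absolute pixel difference above which we call a cut
               (hypothetical value; real detectors are tuned per footage).
    Returns a list of frame indices where a new scene begins.
    """
    cuts = [0]  # the first scene starts at frame 0
    prev = None
    for i, frame in enumerate(frames):
        if prev is not None:
            # Average absolute difference over all pixels and channels
            diff = np.abs(frame.astype(np.int16) - prev.astype(np.int16)).mean()
            if diff > threshold:
                cuts.append(i)
        prev = frame
    return cuts
```

Production scene detectors typically compare frames in a perceptual color space rather than raw RGB, but the compare-and-threshold structure is the same.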
- [00:02:00.885]Next, we must analyze each scene.
- [00:02:03.765]While we could cut out each scene and process all of the scenes as mp4 files,
- [00:02:07.865]this would be computationally expensive.
- [00:02:10.741]Our solution is to take the middle frame from each scene and parse these images
- [00:02:14.883]into useful features.
- [00:02:16.778]Since we have the start and end frame of each scene, we can take the average
- [00:02:20.664]to find the middle frame from each scene.
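The middle-frame step is just the integer midpoint of each scene's boundaries. A minimal sketch, with made-up boundary values for illustration:

```python
# Scene boundaries as (start_frame, end_frame) pairs, as produced by the
# scene-detection step; these particular values are made up for illustration.
scene_bounds = [(0, 120), (121, 300), (301, 450)]

# Average the first and last frame number of each scene (integer midpoint)
middle_frames = [(start + end) // 2 for start, end in scene_bounds]
```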
- [00:02:23.490]With these frames we can extract features.
- [00:02:26.367]The features that I selected were color and objects.
- [00:02:29.252]The color detector measures how similar every pixel is to each of 27 colors
- [00:02:34.766]and the object detection is handled by
- [00:02:37.601]an object detection program “Darknet” running Yolov3.
- [00:02:42.290]These produce a CSV file with each row representing a different scene
- [00:02:47.338]and the columns representing different color and object features.
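The color half of the feature extraction can be sketched as assigning each pixel to its nearest reference color and reporting the resulting proportions. The transcript does not specify which 27 colors were used; the 3×3×3 RGB grid below is an assumption made purely for illustration:

```python
import numpy as np
from itertools import product

# A hypothetical 27-color palette: low/mid/high for each RGB channel
PALETTE = np.array(list(product([0, 127, 255], repeat=3)), dtype=np.float32)  # (27, 3)

def color_features(image):
    """Fraction of pixels closest to each palette color (27 features).

    image: H x W x 3 uint8 array, e.g. a scene's middle frame.
    Returns a length-27 vector that sums to 1.0.
    """
    pixels = image.reshape(-1, 3).astype(np.float32)                        # (N, 3)
    # Squared Euclidean distance from every pixel to every palette color
    dists = ((pixels[:, None, :] - PALETTE[None, :, :]) ** 2).sum(axis=2)   # (N, 27)
    nearest = dists.argmin(axis=1)                                          # (N,)
    counts = np.bincount(nearest, minlength=len(PALETTE))
    return counts / len(pixels)
```

Each image then contributes 27 color columns to the CSV, alongside the object counts from Darknet/YOLOv3.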
- [00:02:52.581]Now that we have a refined set of features we have a few options for
- [00:02:56.492]creating a new timeline.
- [00:02:58.392]The most straightforward method is directly comparing the given scenes to
- [00:03:02.096]the template timeline and picking the best scene for each.
- [00:03:06.837]This is done with a scene comparison method.
- [00:03:09.607]The difference of each feature is calculated and multiplied by a weight.
- [00:03:13.362]The weights are a measure of which features are more or less important.
- [00:03:17.325]These differences are summed into a difference score.
- [00:03:20.622]The given scene with the lowest
- [00:03:22.230]difference score is selected and placed in the timeline.
- [00:03:24.997]This repeats for each scene in the template timeline.
- [00:03:28.611]The final output of this method is a timeline that is very similar
- [00:03:31.973]to the input movie.
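The weighted-difference comparison described here can be sketched in a few lines. The weight values in the example are arbitrary placeholders; in practice they encode which features matter most:

```python
import numpy as np

def best_match(target_features, candidate_features, weights):
    """Pick the candidate scene most similar to a target scene.

    target_features: 1-D feature vector for one template scene.
    candidate_features: 2-D array, one row per available scene.
    weights: per-feature importance (hand-chosen values).
    Returns the row index of the candidate with the lowest difference score.
    """
    # Per-feature absolute difference, scaled by its weight, summed per scene
    scores = (np.abs(candidate_features - target_features) * weights).sum(axis=1)
    return int(scores.argmin())
```

Running this once per template scene, in order, yields the timeline that mirrors the input movie's structure.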
- [00:03:33.411]The second option is to create a machine learning model
- [00:03:36.235]that predicts the next scene.
- [00:03:37.924]Since the scenes occur in chronological order
- [00:03:40.600]this data is considered a time series.
- [00:03:42.748]One of the best machine learning methods to predict a time series
- [00:03:46.306]is a neural network.
- [00:03:47.940]I used the Python package TensorFlow to construct a neural network.
- [00:03:51.551]We train the model by feeding in portions of the template timeline
- [00:03:54.724]and having it predict the next scene.
- [00:03:56.521]This predicted scene is compared to the actual scene
- [00:03:59.494]and the difference is considered loss.
- [00:04:01.578]As the model trains the weights are adjusted to decrease the loss.
- [00:04:05.504]With a trained model we can predict a new timeline.
- [00:04:08.813]We begin with blank lines to indicate the start of the movie and feed this
- [00:04:12.528]into the model.
- [00:04:13.755]The model generates a predicted scene.
- [00:04:16.338]Using the same scene comparison method as before we select the best scene.
- [00:04:20.068]This scene is added to the timeline.
- [00:04:22.285]Then we generate the next scene using the new timeline.
- [00:04:25.914]This process can run indefinitely and can be stopped by setting a limit on how many
- [00:04:30.086]scenes are generated or the total runtime of the generated timeline.
- [00:04:34.365]The result is a timeline of scenes that are similar to the template video.
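The generation loop has a simple autoregressive shape. In the sketch below the trained TensorFlow model is replaced by an arbitrary callable, since the model's architecture is not specified in the talk; everything else follows the steps as described:

```python
import numpy as np

def generate_timeline(predict_next, candidates, weights, max_scenes):
    """Autoregressively build a new timeline of scene indices.

    predict_next: callable taking the timeline-so-far (list of feature
                  vectors) and returning a predicted feature vector; in the
                  real system this is the trained TensorFlow model.
    candidates: 2-D array of feature vectors for the available scenes.
    weights: per-feature importance used by the comparison step.
    max_scenes: stop after this many scenes (a total-runtime cap would
                work the same way).
    """
    n_features = candidates.shape[1]
    timeline = [np.zeros(n_features)]  # a blank entry marks the movie's start
    chosen = []
    for _ in range(max_scenes):
        predicted = predict_next(timeline)
        # Same weighted-difference comparison as the direct method:
        # pick the real scene closest to the predicted one
        scores = (np.abs(candidates - predicted) * weights).sum(axis=1)
        idx = int(scores.argmin())
        chosen.append(idx)
        timeline.append(candidates[idx])  # feed the choice back in
    return chosen
```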
- [00:04:40.521]If the timeline was generated directly from the template,
- [00:04:43.933]then the resulting timeline will have the same cut timings as the original movie
- [00:04:47.760]and the audio can be layered over the new scenes.
- [00:04:51.109]This produces the effect of replacing every scene in a film
- [00:04:54.385]with a different scene but keeps all of the sound timings.
- [00:05:14.719]Using the machine learning model to create a timeline leads to abstract
- [00:05:18.939]reinventions of the film since the model will generate target scenes based on both
- [00:05:23.148]the template movie and the given set of scenes.
- [00:05:26.835]The final process is able to analyze, select, and edit together clips
- [00:05:31.014]into a film with no user input.
- [00:05:34.287]The time taken to edit a set of clips into a feature-length film is under
- [00:05:38.457]eight hours of computation.
- [00:05:40.482]The time investment is significantly shorter than editing manually.
- [00:05:44.242]While this process may be of limited usefulness in the film industry
- [00:05:47.512]where each scene is crafted meticulously, it could be used
- [00:05:50.602]in an artistic manner.
- [00:05:52.819]Maintaining timing with sound and scene similarity
- [00:05:56.020]would lend this process to parody and music videos.
- [00:05:58.884]This is a novel application, and it needs further refinement,
- [00:06:02.611]but it is a useful step towards AI based editing systems.
- [00:06:07.515]Future work could involve more advanced feature selection.
- [00:06:10.335]Features such as
- [00:06:11.595]the shot type, center of attention, and position of objects
- [00:06:15.360]would be helpful in determining if two scenes match.
- [00:06:18.139]Another change would be to increase the number of frames taken from each scene.
- [00:06:22.867]This could further improve scene matching
- [00:06:25.398]by selecting an optimal set of frames from long sequences.
- [00:06:29.507]I would like to thank Professor Ash Smith for letting me work on this project,
- [00:06:33.528]Raber Umphenour for his input and knowledge of movie editing,
- [00:06:37.847]and Sean Strough for assisting me
- [00:06:40.643]in running this system on the university’s machines.
- [00:06:43.632]This has been Simon Schoenbeck, thank you and goodbye.