Computational Video Editing for Dialog Driven Scenes
Non-linear editing is of particular interest to me (given my background as one of the original developers of the Pro Tools digital audio editing system). So when I watched the video below and read the associated paper I got very excited. Let's take a look at a demo and then we can comment further.
So we're looking at an experimental system for editing dialog driven scenes in video. Conventionally this kind of editing is done in a non-linear video editing system; Avid, Final Cut, or Premiere would be common systems used for almost everything you watch today. All of which have their humble beginnings back in the golden olden times when we were doing that original Pro Tools development effort.
And as they say in the video demo above, hand editing this kind of scene in one of these non-linear editing systems is kind of a pain: time consuming and tedious. So developing semi-automated systems that can do a lot of the manual grunt work needed to create finished edited scenes is really exciting. At least to me; conventional film or video editors might feel somewhat intimidated by them.
This work is an example of what people are calling 'smart media'. It's a pretty wild topic area, one with all kinds of amazing implications as you think through how it's going to totally transform the media landscape over the next decade.
There is a general issue with any AI automated 'intelligent' system that you want creative professionals and artists to use in their work. It's what I call the marriage of the artist and the intelligent automated system. It's something we spent a lot of time thinking about and developing in Studio Artist, a program for digital artists that incorporates computational intelligence.
Now Studio Artist is 20 years old at this point, so we were restricted by the technologies available at the time (enhanced over the years of course). But technology marches on, and the kinds of things you can think about doing today are very exciting.
The work described in the video above is trying to do a few different things: bring automatic scene segmentation and labeling into the video editing pipeline, generate composable representations for film and video editing, and move the user interface for non-linear editing out of the frame based basement up to higher cognitive levels. The ultimate goal is to develop 'smart' editing systems that let a digital artist work at a much higher conceptual level when editing video.
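To make the 'composable representations' idea a bit more concrete, here's a minimal Python sketch. This is my own illustration, not the paper's actual data structures: it models labeled dialogue segments (who is speaking, over which frame range) and composes an edit with one very simple film idiom, "always cut to a take showing the current speaker."

```python
from dataclasses import dataclass

@dataclass
class Segment:
    take: str      # which camera take the frames come from
    speaker: str   # who is speaking during this segment
    start: int     # start frame (inclusive)
    end: int       # end frame (exclusive)

def show_the_speaker(lines, take_for_speaker):
    """Compose an edit from speaker-labeled dialogue lines using the
    'cut to the speaker' idiom. `lines` is a list of
    (speaker, start_frame, end_frame) tuples; `take_for_speaker` maps
    each speaker to the take that frames them."""
    return [
        Segment(take=take_for_speaker[speaker], speaker=speaker,
                start=start, end=end)
        for speaker, start, end in lines
    ]

# Hypothetical scene: Alice speaks for frames 0-120, then Bob to frame 200.
lines = [("Alice", 0, 120), ("Bob", 120, 200)]
takes = {"Alice": "closeup_A", "Bob": "closeup_B"}
edit = show_the_speaker(lines, takes)
```

The point of a representation like this is that idioms become interchangeable functions over labeled segments, so a 'smart' editor can swap or combine them without the artist ever touching individual frames.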
Now because of my background, when I looked at this work I immediately thought about how it could be applied to digital audio editing, because non-linear video editors and digital audio editors are essentially the same on a conceptual level. In non-linear video editing you are working with segments of video at the basement frame level. In non-linear audio editing you are also working with segments of digital audio files, potentially at the sample level, but usually at a higher bar/beat level of organization (since that is how music is generally structured).
I don't think you should view these systems as replacing the video editor. More as a tool to enhance the creative range of the video editor. Note that the system described in the video above allows a film editor to work at a much higher conceptual level than conventional frame based video editing systems allow.
Here's a link to the paper associated with the work described above if you are interested in learning more about what is going on under the hood of this 'synthetic media' system.