AI Opportunities: Transforming Coverage of Live Events

The AI in Production team at BBC R&D is looking at how Artificial Intelligence and Machine Learning could transform the business of producing media. Could they increase the range of programmes that broadcasters like the BBC are able to offer? Could they make it possible to broadcast every music festival in the UK?

We started our research with a project aimed squarely at broadening coverage in this way, and at opening up access to events that would be impractical or unaffordable to cover using conventional techniques. In our prototype system, which we have named “Ed”, a human operator sets up fixed, high-resolution cameras pointing at the area in which the action will take place, and then the automated system takes over. It attempts to create pleasingly framed shots by cropping the raw camera views, and then switches between those “virtual cameras” to try to follow the action. In many ways, this project is a successor to our prior work on automated production: the basic concept of covering live events by cutting between virtual cameras was explored previously by our Primer and SOMA projects.
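A “virtual camera” of this kind can be as simple as a rectangular window cropped out of a high-resolution frame. The sketch below is our own minimal illustration in Python, not Ed’s actual code; the NumPy array shapes and the 4K-to-720p sizes are assumptions chosen for the example.

```python
import numpy as np

def virtual_camera(frame: np.ndarray, centre_x: int, centre_y: int,
                   crop_w: int, crop_h: int) -> np.ndarray:
    """Crop a crop_w x crop_h window centred on (centre_x, centre_y),
    clamped so the window never leaves the source frame."""
    h, w = frame.shape[:2]
    x = min(max(centre_x - crop_w // 2, 0), w - crop_w)
    y = min(max(centre_y - crop_h // 2, 0), h - crop_h)
    return frame[y:y + crop_h, x:x + crop_w]

# A fixed 4K camera comfortably yields a 720p "close-up" crop.
frame = np.zeros((2160, 3840, 3), dtype=np.uint8)  # stand-in for a real frame
close_up = virtual_camera(frame, centre_x=1200, centre_y=800,
                          crop_w=1280, crop_h=720)
assert close_up.shape == (720, 1280, 3)
```

Because the source cameras are fixed and high resolution, reframing is just a matter of moving this crop window, with no physical camera operator required.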

One of the things that working with AI technologies really highlights is that there are big differences between how even “intelligent” computer systems view the world and how people do. If we think about the “unscripted” genres of television, such as sport, comedy and talk shows, most people would have little difficulty identifying what they want to see depicted on the screen – it will usually be the action around the ball in a game of football, for example, or the people who are talking in a televised conversation. AI systems have no idea what we humans will find interesting, and no easy way of finding out. We therefore decided to keep things simple: this first iteration of “Ed” looks for human faces, and then tries to show the viewer the face of whoever is talking at any given moment. These relatively simple rules are a reasonably good match for any genre consisting of people sitting down and talking – in particular, comedy panel shows, which is therefore the genre we have been targeting.
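Stated as code, that rule is straightforward. The following Python sketch is purely illustrative: the Face record and the active_speaker label, standing in for upstream face-detection and voice-activity components, are our assumptions, not details of Ed’s pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Face:
    person_id: str   # identity assigned by an upstream face tracker (assumed)
    centre_x: int    # face centre in source-frame pixels
    centre_y: int

def choose_subject(faces: List[Face],
                   active_speaker: Optional[str]) -> Optional[Face]:
    """Return the current speaker's face if it is visible;
    None signals that a wide shot is the safer choice."""
    for face in faces:
        if face.person_id == active_speaker:
            return face
    return None

panel = [Face("host", 900, 540), Face("guest", 2600, 600)]
print(choose_subject(panel, active_speaker="guest"))
```

If the speaker’s face isn’t visible, falling back to a wide shot is the conservative choice: a wide shot is rarely outright wrong.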

Our first version of Ed is entirely driven by rules like these, which we generated by asking BBC editorial staff how they carry out these tasks in real productions. To frame its shots, Ed rigidly applies the kinds of guidelines taught in film schools: the “rule of thirds”, “looking room”, and so forth. Selecting which shots to show, and when to change them, is similarly rule-based. Ed tries to show close-ups when people are speaking and wide shots when they aren’t; it tries not to use the same shot twice in quick succession; it changes shots every few seconds; and it tries not to cut to or from a speaker shortly after they start speaking or shortly before they stop.
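Rules like these translate fairly directly into a decision function. The sketch below is a hedged approximation rather than Ed’s implementation: the timing constants are invented, and knowing that a speaker is about to stop (until_speech_change) implies either a short processing delay or offline operation.

```python
MIN_SHOT_SECONDS = 3.0      # invented value: don't flit between shots
MAX_SHOT_SECONDS = 12.0     # invented value: don't linger forever, either
SPEECH_GUARD_SECONDS = 1.0  # invented value: keep cuts away from speech edges

def should_cut(shot_age: float,
               current_shot: str,
               candidate_shot: str,
               recent_shots: set,
               since_speech_change: float,
               until_speech_change: float) -> bool:
    """Decide whether to switch shots under the rules described above."""
    if candidate_shot == current_shot or candidate_shot in recent_shots:
        return False             # don't reuse a shot in quick succession
    if shot_age < MIN_SHOT_SECONDS:
        return False             # the current shot is too young to leave
    if since_speech_change < SPEECH_GUARD_SECONDS:
        return False             # a speaker has only just started or stopped
    if until_speech_change < SPEECH_GUARD_SECONDS:
        return False             # a speaker is about to start or stop
    # Otherwise cut, which keeps shots changing every few seconds and
    # guarantees no shot outlasts MAX_SHOT_SECONDS by much.
    return True
```

Encoding editorial craft as explicit, inspectable rules like this also makes it easy for programme-makers to see exactly why the system made a given cut.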

Having created a working system, we needed to test it. We’re proponents of “user-centred” approaches, and we believe that ultimately, the only test of our system that matters is what real audience members think of it. We want to compare our system’s decision-making, and the quality of the resulting viewing experience, to those of human programme-makers. We have a series of formal studies planned to evaluate and improve Ed, and we started with an evaluation of shot-framing.
