YouTube protects copyright with artificial intelligence

Harris Cohen, senior product manager of Content ID at YouTube.

YouTube is using artificial intelligence to win a game of cat and mouse with users who try to circumvent copyright. The Google-owned service already has algorithms for detecting copyrighted movie, video and music content that users post on YouTube.

Over the years, some users have developed tricks for getting around detection.

Some have posted videos with the colours inverted, or with each frame flipped vertically or horizontally. Other techniques include altering colours, changing the aspect ratio, cropping frames and adding a halo effect. The idea is to make a video unrecognisable as copyrighted content.

Some users have altered copyrighted music by shifting it up or down an octave, or by reversing tracks so that they play backwards.
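Why do octave shifts and reversal fail as disguises? A minimal sketch helps, assuming a melody has already been extracted as a list of note frequencies in hertz (the extraction itself is not shown). Raising a note by an octave doubles its frequency, which adds exactly 12 semitones, so folding each note into one of 12 pitch classes leaves the melody unchanged. A reversed track can be caught by also comparing the note sequence backwards. The function names are hypothetical and this is not how YouTube's system works, which the article does not describe. A real matcher would also tolerate timing differences and wrong notes instead of requiring exact equality.

```python
import math


def pitch_classes(freqs_hz):
    """Map note frequencies (Hz) to pitch classes 0-11, with A = 0.

    Doubling or halving a frequency (one octave) adds or subtracts
    12 semitones, so the pitch class is unchanged.
    """
    return [round(12 * math.log2(f / 440.0)) % 12 for f in freqs_hz]


def melody_matches(reference_hz, upload_hz):
    """Hypothetical check: does the upload carry the reference's note
    sequence, allowing for octave shifts and for the track being reversed?"""
    ref = pitch_classes(reference_hz)
    up = pitch_classes(upload_hz)
    return up == ref or up[::-1] == ref


# A4, B4, C5, D5, then the same phrase one octave higher.
melody = [440.0, 493.88, 523.25, 587.33]
octave_up = [880.0, 987.77, 1046.5, 1174.66]
print(melody_matches(melody, octave_up))     # octave-shifted copy
print(melody_matches(melody, melody[::-1]))  # reversed copy
```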

The altered files can then be shared with others, who reverse the effects to recover the original.

YouTube, however, is fighting back. It has been delving into artificial intelligence and machine learning to dissect video and music and outfox these cunning operators.

“That’s what we’re using machine learning for, to take out these things, and to work out they are the same image,” said Harris Cohen, senior product manager of Content ID at YouTube.

Cohen and his team use a “content ID”, a unique identifier for each piece of media uploaded by a rights holder. The system stores information about the content as well as the content itself.

“We have relationships with lots of different rights holders around the world, at this point thousands of them,” Cohen said.

“They provide us with reference files, media files that are samples of their content. They give us various other attributes, such as what is the title of that content, the nature of their rights, and what territories they own the content in.

“We take that reference file and create a digital fingerprint and we compare that to user uploads.”
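How a frame can be fingerprinted and compared, even after the disguises described earlier, can be sketched with a simple perceptual "average hash". This is a swapped-in, well-known technique and not YouTube's Content ID, whose fingerprinting is proprietary and, per the article, increasingly machine-learned. The sketch assumes frames have already been decoded and downscaled to a small grayscale grid (values 0-255). Each pixel becomes one bit (brighter than the frame's mean or not), and two frames match when few bits differ. To defeat mirroring, vertical flipping and colour inversion, the upload is checked in every combination of those transforms. The names and the `max_distance=5` threshold are hypothetical choices.

```python
from itertools import product

Frame = list[list[int]]  # grayscale pixel rows, values 0-255


def average_hash(frame: Frame) -> tuple[int, ...]:
    """One bit per pixel: 1 if brighter than the frame's mean, else 0."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of bit positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def variants(frame: Frame):
    """Yield the frame under every combination of the disguises the
    article mentions: horizontal mirror, vertical flip, colour inversion."""
    geometric = [frame, [row[::-1] for row in frame], frame[::-1]]
    for g, invert in product(geometric, (False, True)):
        yield [[255 - p for p in row] for row in g] if invert else g


def frames_match(reference: Frame, upload: Frame, max_distance: int = 5) -> bool:
    """True if any undisguised variant of the upload is close to the reference."""
    ref_hash = average_hash(reference)
    return min(hamming(ref_hash, average_hash(v)) for v in variants(upload)) <= max_distance


# A gradient "reference" frame and a mirrored, colour-inverted copy of it.
reference = [[(8 * y + x) * 4 for x in range(8)] for y in range(8)]
disguised = [[255 - p for p in row[::-1]] for row in reference]
print(frames_match(reference, disguised))
```

Mirroring the disguised copy and then inverting it restores the reference exactly, so the distance for that variant is zero. A production system would hash many frames and audio segments and match sequences of fingerprints, not single frames.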

The matching process involves checking uploaded media against the version supplied by the rights holder.

“It always starts with a media file that we can use as that example that we compare to,” he said. “When we find something we follow the instructions that the rights holder has given us.”

He said content owners typically chose among three options. They could block the content to make it unavailable; they could let the content stream but track and collect data about its usage; or they could monetise the content, displaying ads whenever the infringing content is viewed.

He said rights holders mostly chose not to block copyright breaches. “These days, about 90 per cent of the time rights holders choose to monetise rather than block a track. This has emerged as a multi-million dollar revenue stream, a win-win situation,” he said.

“It’s about creating new opportunities out of what was primarily a user legal challenge, where fans who want to use content, share it and celebrate it get to do that.”

Google recently announced it had generated $2bn in revenue from music content.

He said YouTube could also extract a melody from a track to find cover songs.

But does the algorithm get it right all the time? What happens if two people independently film the same event?

“That’s where the magic comes in,” said Cohen. “By doing a complicated statistical analysis frame-by-frame, we can often distinguish between two very similar takes.”

YouTube sees machine learning as a way to quickly identify and thwart new techniques for disguising copyrighted content.

“As soon as we are aware of that, we can generate a new fingerprint and watch for that much more quickly than we could in the past.

“We’ve got dozens of engineers working on this problem all the time so that we can stay on top of these things and identify the content that we need to be identifying.”

He said that in the past it might have taken six months to revamp the fingerprint technology; now it could be done much faster.