How Google / YouTube Understands Content in Videos

Apr 15, 2022

What happens after you post a video to your site or YouTube? How does Google’s search engine understand the content in the video in order to provide accurate search results?

Gary Illyes and Lizzi Sassman from the Google Search Central team and Danielle Marshak, a Google Search Product Manager for Videos, explain the ways that Google understands what’s inside a video in a recent podcast episode.

Extract text from the video.

Google can use the audio track of the video file to recognize the words being spoken and then split them into meaningful chunks. Google can also extract text that appears on screen through optical character recognition (OCR). For instance, the video may show a heading that says “We will discuss how Google search finds dogs”, and Google would extract this heading to understand the important moments of the video.

However, Danielle makes an important point: “videos are not just speech and text on images. If that’s all, they wouldn’t be quite so powerful and useful like we were talking about. Videos also have a lot of visual information, right?” So how does Google tackle the harder problem of identifying objects or motion in a video?

Use visual information.

Visual information such as objects, animals, or motion can also be extracted. Google uses machine learning to understand images, but the models still have a way to go, because visual information is hard for machines to interpret.

Danielle explains, “That’s exactly the type of thing that our technology needs to keep getting better at, because as humans, even as humans, some of those pictures are a little hard to distinguish, so we can’t necessarily expect computers to do it automatically.” Different teams at Google are working on different aspects of this visual problem, including a research team focused specifically on improving visual perception.

Structured data is still used.

Because visual recognition is still a work in progress, Gary recommends that publishers provide more textual data to help search engines understand the visual content better. Danielle explains, “Don’t abandon structured data just yet. We love structured data, because as we were talking about, even though we’ve come so far, we’re still at the tip of the iceberg, in terms of being able to really deeply understand videos… We also really rely on structured data and text signals from the page as well to make sure we understand what the video is about and what types of queries it could be useful for.”
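As an illustration of what this kind of structured data can look like, here is a minimal sketch that builds a schema.org `VideoObject` as JSON-LD and wraps it in a script tag for the page. The helper names (`build_video_jsonld`, `wrap_in_script_tag`) and all of the example values are hypothetical, and the chosen set of properties is an assumption for illustration, not a list taken from the podcast.

```python
import json


def build_video_jsonld(name, description, thumbnail_url, upload_date, content_url):
    """Return a schema.org VideoObject description as a JSON-LD string.

    The property names come from the schema.org VideoObject type; which of
    them a given search engine requires is not covered by this sketch.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,                      # title of the video
        "description": description,        # short text summary
        "thumbnailUrl": [thumbnail_url],   # list of thumbnail image URLs
        "uploadDate": upload_date,         # ISO 8601 date string
        "contentUrl": content_url,         # URL of the actual video file
    }
    return json.dumps(data, indent=2)


def wrap_in_script_tag(jsonld):
    """Wrap a JSON-LD string in the script tag used to embed it in HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# Hypothetical example values; replace them with your own video's details.
example = build_video_jsonld(
    name="How Google search finds dogs",
    description="A short walkthrough of how search understands dog videos.",
    thumbnail_url="https://example.com/thumbs/dogs.jpg",
    upload_date="2022-04-15",
    content_url="https://example.com/videos/dogs.mp4",
)
snippet = wrap_in_script_tag(example)
```

Printing `snippet` gives a block you could paste into the page that hosts the video, so the text signals sit right next to the player.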

Google’s Pitfalls

Understanding and indexing videos is still a work in progress, and there are many things the search engine cannot yet do.

Google wants to show all relevant videos for a search query, but sometimes the relevant section comes halfway through the video, and Google might not know that it’s there. Google relies most heavily on the title of the video when showing search results, rather than on the actual content within the video.

Another common issue is content protections, which publishers implement to prevent pirating or copying of the video. However, Danielle explains, “if these videos are kind of locked down in that way, if you see these changing URLs, then Google also can’t access the content file.” If the content is locked, Google has to rely solely on textual data.

Video Best Practices

Now that you know about these common pitfalls, here are some best practices that video publishers can use to optimize their content.

  • Use a DNS lookup (typically a reverse lookup on the requesting IP address) to verify that a request really comes from Google or another trusted bot, then allow those crawlers to access your content. Danielle says it best, “Let trusted actors understand it while keeping it out of the hands of bad actors.”
  • Use short-form videos. These are short videos, often under a minute, and usually in vertical format. Google has been showing short-form videos more often in search results, since users can consume a greater diversity of content in a shorter period of time. For instance, Lizzi explains, “My go-to example is Physics Girl, the content creator who is talking about quantum mechanics or quantum in general and astrophysics, and she can deliver great content in under three minutes.” Short-form video is expanding to education and many more topics, so now is the time to try it out!
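The DNS check from the first tip can be sketched roughly as follows: do a reverse lookup on the requesting IP address, confirm that the hostname belongs to a trusted crawler domain, then do a forward lookup to confirm that the hostname maps back to the same IP. This is only a sketch under stated assumptions. The function `is_trusted_crawler` and the suffix list are illustrative, and you should confirm the current list of crawler domains against the search engine's own documentation.

```python
import socket

# Assumed hostname suffixes for Google's crawlers; treat this list as
# illustrative and check it against the official documentation.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the primary hostname for an IP address (reverse DNS)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname):
    """Return the IPv4 addresses a hostname resolves to (forward DNS)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_trusted_crawler(ip, reverse=reverse_lookup, forward=forward_lookup,
                       suffixes=TRUSTED_SUFFIXES):
    """Check an IP with a reverse lookup followed by a confirming forward lookup.

    The resolver functions are injectable so the logic can be exercised
    without network access.
    """
    try:
        hostname = reverse(ip)
    except OSError:
        # No reverse DNS record (socket.herror and gaierror are OSError subclasses).
        return False

    # The leading dot in each suffix rejects look-alikes such as "fakegooglebot.com".
    if not hostname.lower().rstrip(".").endswith(suffixes):
        return False

    try:
        # The hostname must resolve back to the original IP, otherwise the
        # reverse record could have been spoofed.
        return ip in forward(hostname)
    except OSError:
        return False
```

In practice you would call `is_trusted_crawler(request_ip)` before serving the video file and fall back to your normal content protections for everything else. Note that `forward_lookup` as written only handles IPv4 addresses.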


When users run a Google search, they should see what is most relevant, and videos are a part of that search experience. Users come to Google to get everything they are looking for in one place. Understanding how Google processes and interprets a video can help you improve your impressions and views.

Still need help with video publishing and optimization? Schedule a discovery call with us today.