Searching video using audio and text

This CNet article on searching video by leveraging its audio presents an interesting idea, and an interesting challenge. Working from closed-caption text makes sense too, assuming it exists.

Now let’s say you have a series of “video” cameras in your home, taking pictures all the time. After Aunt Mini leaves following an afternoon visit, you want to ask the camera system to produce a picture of you and your aunt, one that captures the surprise on her face when you gave her a birthday present. Here there wouldn’t be closed captioning to leverage. Instead you would rely on a combination of audio, motion indexing along a timeline, and interactive selection by the user. With those you might locate just the right moment, then tell the system to synthesize a picture from whatever perspective you want.

Here’s another “search” problem I find interesting. Imagine you’ve misplaced your sunglasses. You know they’re somewhere in the house. What if you could call up the vision system within the house to display your path through the house over the last two hours? It might sound invasive, but it could also help you find your glasses.
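
To make the "where have I been" query a bit more concrete, here is a minimal Python sketch. It assumes the vision system has already reduced its video to a simple index of sightings, each recording a time and a room. The `Sighting` record and the `path_since` function are hypothetical names invented for this illustration, not part of any real system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sighting:
    """Hypothetical index entry: a camera saw the person in a room at a time."""
    time: datetime
    room: str


def path_since(sightings, now, window=timedelta(hours=2)):
    """Return the rooms visited within the window, in time order.

    Consecutive sightings in the same room are merged into one stop.
    """
    start = now - window
    recent = sorted(
        (s for s in sightings if start <= s.time <= now),
        key=lambda s: s.time,
    )
    path = []
    for s in recent:
        # Only record a room when it differs from the previous stop.
        if not path or path[-1] != s.room:
            path.append(s.room)
    return path
```

Given sightings in the kitchen, then the den, then the kitchen again within the last two hours, `path_since` would return those three stops in order, which is roughly the list of places you would check for the glasses. The hard part, of course, is producing the index from raw video in the first place.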

There are lots of interesting technological issues here, such as archiving the video, combining multiple video streams, indexing, designing a query and command language, synthesizing images from a virtual camera, and on and on.
