Using the Microsoft Video Indexer for OSINT

Working on a case in which you have to go through loads of videos? Wouldn’t it be awesome to just download the videos and have them automatically transcribed and indexed?

Imagine you are following a current event that is the topic of multiple videos across the internet. In some cases, you might not have the time to watch each and every video yourself. Wouldn’t it be great to download all these videos into one database and have them indexed by spoken content, topics and even the people that appear in them? And wouldn’t it be even better to be able to search for specific content in those indexed videos?

These features, and many more, are part of the tool-set that the Microsoft Video Indexer offers. Microsoft offers a trial account on this platform and lets you log in with several account types, Gmail among them. Let me point out some aspects of this platform that might be useful during OSINT investigations.

Let’s go back to August 2019. The G7 summit is taking place in France and we’re interested in collecting information on this topic. The summit is all over social media and there is also quite a lot of press reporting on it. We download videos from sources like YouTube. For this we can use Y2Mate, either by copying the YouTube link to their website or by adding ‘pp’ to the original YouTube URL as shown below. This will automatically redirect you to the site.
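If you have a whole list of links to process, the ‘pp’ trick can be scripted. The following is only a minimal sketch: it assumes the ‘pp’ goes directly after “youtube” in the host name (giving youtubepp.com), and the function name and the VIDEO_ID placeholder are mine, not part of any tool.

```python
from urllib.parse import urlsplit, urlunsplit


def to_y2mate_redirect(url: str) -> str:
    """Insert 'pp' after 'youtube' in the host name of a YouTube URL.

    Assumption: the redirect works via the host name 'youtubepp.com'.
    """
    parts = urlsplit(url)
    if "youtube." not in parts.netloc:
        raise ValueError(f"not a YouTube URL: {url}")
    # Only touch the host name, never the path or the query string.
    new_host = parts.netloc.replace("youtube.", "youtubepp.", 1)
    return urlunsplit(parts._replace(netloc=new_host))


# VIDEO_ID is a placeholder, not a real video.
print(to_y2mate_redirect("https://www.youtube.com/watch?v=VIDEO_ID"))
```

Opening the printed link in a browser should then behave like typing the ‘pp’ by hand.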


Remember that we’re not limited to YouTube videos. We can upload YouTube videos and any other video to the Video Indexer. It’s pretty self-explanatory; the only thing to be aware of is the video language. The default value is English. If you are working with videos in another language, I would advise manually adjusting the input language. I have run into issues when uploading longer videos. If you come across problems here, try splitting the videos.
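One way to split a long video is ffmpeg’s segment muxer. The sketch below only builds and prints the command, so you can check it before running it yourself. The file name and the 10-minute chunk length are assumptions of mine; I don’t know at what length the upload problems start.

```python
import shlex
from typing import List


def build_split_command(source: str, chunk_seconds: int = 600) -> List[str]:
    """Build an ffmpeg command that cuts `source` into fixed-length parts.

    The 600-second default is an arbitrary assumption, not a platform limit.
    """
    stem, dot, ext = source.rpartition(".")
    if not dot:
        raise ValueError("source needs a file extension, e.g. video.mp4")
    return [
        "ffmpeg", "-i", source,
        "-c", "copy",                 # copy streams, no re-encoding
        "-map", "0",                  # keep all streams of the input
        "-f", "segment",              # use the segment muxer
        "-segment_time", str(chunk_seconds),
        "-reset_timestamps", "1",     # each part starts at 00:00
        f"{stem}_part%03d.{ext}",     # part000, part001, ...
    ]


# "g7_summit.mp4" is a hypothetical file name.
print(shlex.join(build_split_command("g7_summit.mp4")))
```

Because the streams are copied rather than re-encoded, ffmpeg cuts at keyframes, so the parts may not be exactly the requested length.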


Once the video is uploaded, it will be indexed by the platform and this is where the magic happens. Here are some of the features that are included in this process:

  • Facial recognition
  • Full transcript of the audio, including translations
  • Topic detection
  • Item/setting detection
  • Sentiment detection

Let’s have a look at one of the videos I uploaded:


The panel on the right has two tabs: insights and timeline. Under insights you will find an overview of individuals that were identified in the video and also recognized by the underlying facial recognition software. As you can see, a guy named Stefan de Vries was recognized and the bar below shows the sections in which he appears in the video (highlighted in black). It also links to Bing search results for this person. If a person is not recognized and indexed automatically, you can manually edit this.


Unknown #12 is in fact Angela Merkel. By clicking on the edit button on the top right, we can change the name. If you give several detected people the same name, they will be merged automatically. The following two insight categories index general topics discussed in the video and also label the scenes by what can be seen. Marking a topic or label will show the section in which it appears in the video. Clicking on that highlighted section will jump forward to that specific part of the video, which is always displayed on the left. Keep in mind that these results are not always plausible. In my video, a scene showing Donald Trump starting to speak was labeled as toiletry (although some people consider him to be a douche).


Next up, named entities are extracted and the sentiment is evaluated. I assume the sentiment evaluation is based on the words used. Words such as good, great and awesome will likely lead to a positive sentiment rating. Remember that these words are not always used in the proper context by the speaker, so I usually ignore this feature.
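To illustrate why I don’t trust this feature, here is a toy word-counting scorer. It is not how the Video Indexer works (I don’t know what it does internally); it only mirrors my assumption above. The negative word list and both sample sentences are made up for the example.

```python
import re

# Tiny word lists: the positive ones are from the text above,
# the negative ones are my own additions for this illustration.
POSITIVE = {"good", "great", "awesome"}
NEGATIVE = {"bad", "terrible", "awful"}


def naive_sentiment(text: str) -> int:
    """Return (#positive words - #negative words); > 0 reads as positive."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


print(naive_sentiment("The talks were great and the outcome was good."))  # 2
print(naive_sentiment("Oh great, another summit without results."))       # 1
```

The second sentence is clearly sarcastic, yet it still scores as positive. That is exactly the context problem.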


Most of the data shown in the insights tab is based on the speaker transcription, which is displayed in the timeline tab. Although it works pretty well, you might need to manually edit some of the data. In the final sentence shown here, I manually edited something: instead of “my Chrome”, the speaker said “Macron”.
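If you keep a copy of a transcript outside the platform, recurring mishearings like this one can be fixed in bulk. This is just a sketch that works on plain text, not a feature of the Video Indexer; the function name and the sample sentence are made up, and only the “my Chrome” correction comes from my video.

```python
import re
from typing import Dict

# Known mishearings and what the speaker actually said.
CORRECTIONS: Dict[str, str] = {"my Chrome": "Macron"}


def apply_corrections(line: str, corrections: Dict[str, str] = CORRECTIONS) -> str:
    """Replace every known mishearing in one transcript line (case-insensitive)."""
    for wrong, right in corrections.items():
        line = re.sub(re.escape(wrong), right, line, flags=re.IGNORECASE)
    return line


# Invented sample line, not taken from a real transcript.
print(apply_corrections("President my Chrome welcomed the guests."))
```

Keep the correction list short and specific, otherwise you risk replacing phrases that were transcribed correctly.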


Looking into a video in a foreign language? In this case you can use the translate function to make it (kind of) readable. Just click on the world icon and choose the output language and the complete text will be translated.


So, we’ve uploaded a few videos, manually edited a few things and now have a fully indexed database of videos to run queries on. Going back to the main page of your profile, you will be able to search for anything that has been indexed: text, keywords, people and labels.


Searching for “Trump” will display the search results and categorize them by result types, as they are listed above the search results. This is just an excerpt of all the results, but you can see that a person, spoken text, a named entity and even written text were found. Written text? That’s one point I almost forgot. The Video Indexer also OCRs written text in videos.
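The idea behind this kind of search can be sketched in a few lines: every indexed item carries a type, and hits are grouped by that type. The record layout, file names and sample entries below are all invented for illustration and say nothing about how Microsoft stores or queries its index.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class IndexEntry(NamedTuple):
    video: str   # file the entry belongs to
    kind: str    # e.g. "person", "transcript", "named entity", "ocr"
    start: str   # timestamp where the entry appears
    text: str    # indexed content


def search(entries: Iterable[IndexEntry], query: str) -> Dict[str, List[IndexEntry]]:
    """Case-insensitive substring search, with hits grouped by result type."""
    needle = query.casefold()
    hits: Dict[str, List[IndexEntry]] = defaultdict(list)
    for entry in entries:
        if needle in entry.text.casefold():
            hits[entry.kind].append(entry)
    return dict(hits)


# Invented sample entries, not real Video Indexer output.
index = [
    IndexEntry("video_a.mp4", "person", "0:01:10", "Donald Trump"),
    IndexEntry("video_a.mp4", "transcript", "0:01:15", "President Trump arrived this morning"),
    IndexEntry("video_b.mp4", "ocr", "0:00:05", "TRUMP AT G7"),
    IndexEntry("video_b.mp4", "person", "0:02:30", "Angela Merkel"),
]

for kind, found in search(index, "trump").items():
    print(kind, [(entry.video, entry.start) for entry in found])
```

Grouping by type is what lets you tell at a glance whether a name was spoken, shown on screen, or matched to a face.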


That was just a brief overview of the possibilities of Microsoft’s Video Indexer. I think it can be useful for some OSINT investigations, and if you plan to use it more intensively, you might want to consider upgrading to a paid account.

I was actually thinking about uploading talks from conferences, so I could create a database in which I could query specific OSINT topics without having to watch the complete videos. A TL;DR for videos 😊

MW-OSINT / 08.03.2020
