Using the Microsoft Video Indexer for OSINT

Working on a case in which you have to go through loads of videos? Wouldn’t it be awesome to just download the videos and have them automatically transcribed and indexed?

Imagine you are following a current event that is the topic of multiple videos across the internet. In some cases, you might not have the time to watch each and every video yourself. Wouldn’t it be great to download all these videos into one database and have them indexed by spoken content, topics and even the people that appear in them? And wouldn’t it be even better to be able to search for specific content in those indexed videos?

These features, and many more, are part of the tool-set that the Microsoft Video Indexer offers. Microsoft offers a trial account on this platform, and you can log in with various account types, Gmail among them. Let me point out some aspects of this platform that might be useful during OSINT investigations.

Let’s go back to August 2019. The G7 summit is taking place in France and we’re interested in collecting information on this topic. The summit is all over social media and there is also quite a bit of press coverage. We download videos from sources like Youtube, for which we can use Y2Mate: either copy the Youtube link into their website, or add ‘pp’ to the original Youtube URL as shown below, which will automatically redirect you to the site.
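If you are collecting many links, the ‘pp’ trick can be scripted. A minimal sketch (assuming, as was the case at the time of writing, that inserting ‘pp’ after ‘youtube’ in the hostname redirects to Y2Mate):

```python
def to_y2mate(url: str) -> str:
    """Insert 'pp' into the hostname so the link redirects to Y2Mate.

    e.g. www.youtube.com -> www.youtubepp.com
    Only the first occurrence is replaced, so video IDs stay untouched.
    """
    return url.replace("youtube", "youtubepp", 1)

download_link = to_y2mate("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
```

Running this over a list of collected links gives you the corresponding download pages in one go.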


Remember that we’re not limited to Youtube videos; we can upload Youtube videos and any other video to the Video Indexer. The process is pretty self-explanatory, and the only thing to be aware of is the video language. The default value is English, so if you are working with videos in another language, I would advise manually setting the input language. I have also come across issues when uploading longer videos; if you run into problems here, try splitting the videos.
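The platform also exposes a REST API, which is handy once you upload in bulk. The sketch below only builds the upload request URL (the account ID and token are placeholders, and the endpoint path reflects Microsoft’s documentation at the time, so treat this as an assumption rather than a drop-in client); note the explicit language parameter, since the default is English:

```python
from urllib.parse import urlencode

# Hypothetical trial-account values -- replace with your own.
LOCATION = "trial"
ACCOUNT_ID = "00000000-0000-0000-0000-000000000000"

def build_upload_url(video_url: str, name: str,
                     access_token: str, language: str = "en-US") -> str:
    """Build the Video Indexer upload request URL.

    Setting 'language' explicitly avoids the English default when
    indexing foreign-language material.
    """
    params = urlencode({
        "name": name,
        "videoUrl": video_url,
        "language": language,
        "accessToken": access_token,
    })
    return (f"https://api.videoindexer.ai/{LOCATION}"
            f"/Accounts/{ACCOUNT_ID}/Videos?{params}")

upload_url = build_upload_url("https://example.com/g7.mp4",
                              "G7 summit clip",
                              "<access-token>",
                              language="fr-FR")
```

A POST to that URL would then queue the video for indexing.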


Once the video is uploaded, it will be indexed by the platform and this is where the magic happens. Here are some of the features that are included in this process:

  • Facial recognition
  • Full transcript of the audio, including translations
  • Topic detection
  • Item/setting detection
  • Sentiment detection

Let’s have a look at one of the videos I uploaded:


The panel on the right has two tabs: insights and timeline. Under insights you will find an overview of the individuals that were identified in the video and recognized by the underlying facial recognition software. As you can see, a guy named Stefan de Vries was recognized, and the bar below shows the sections in which he appears in the video (highlighted in black). It also links to Bing search results for this person. If a person is not recognized and indexed automatically, you can edit this manually.


Unknown #12 is in fact Angela Merkel. By clicking the edit button on the top right, we can change the name. If you give two detected faces the same name, they will be merged automatically. The following two insight categories index general topics discussed in the video and label the scenes by what can be seen. Marking a topic or label will show the section in which it appears in the video, and clicking on that highlighted section will jump to that specific part of the video, which is always displayed on the left. Keep in mind that these results are not always plausible: in my video, a scene showing Donald Trump starting to speak was labeled as toiletry (although some people consider him to be a douche).


Next up, named entities are extracted and the sentiment is evaluated. I assume the sentiment evaluation is based on the words used: words such as good, great and awesome will likely lead to a positive sentiment rating. Remember that these words are not always used in the proper context by the speaker, so I usually ignore this feature.
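To illustrate why word-based sentiment is so easy to mislead, here is a deliberately naive toy scorer (my own illustration, not how Microsoft’s model actually works): it simply counts “positive” words, so sarcasm or quoted speech scores just as positively as genuine praise.

```python
# Toy word-list sentiment: fraction of words found in a "positive" set.
POSITIVE = {"good", "great", "awesome"}

def naive_sentiment(text: str) -> float:
    """Return the share of positive words, ignoring all context."""
    words = text.lower().split()
    hits = sum(w.strip(".,!?") in POSITIVE for w in words)
    return hits / max(len(words), 1)

score = naive_sentiment("This summit went great, really awesome!")
```

A sentence like “the talks were not great” would still score as positive here, which is exactly the kind of context problem that makes me ignore the feature.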


Most of the data shown in the insights tab is based on the speech transcription, which is displayed in the timeline tab. Although it works pretty well, you might need to edit some of the data manually. In the final sentence shown here, I edited one thing: instead of “my Chrome”, the speaker said “Macron”.


Looking into a video in a foreign language? In that case you can use the translate function to make it (kind of) readable. Just click the world icon, choose the output language, and the complete text will be translated.


So, we’ve uploaded a few videos, manually edited a few things and now have a fully indexed database of videos to run queries on. Going back to the main page of your profile, you will be able to search for anything that has been indexed: text, keywords, people and labels.


Searching for “Trump” displays the results categorized by result type, as listed above the search results. This is just an excerpt of all the results, but you can see that a person, spoken text, a named entity and even written text were found. Written text? That’s one point I almost forgot: the Video Indexer also runs OCR on written text in videos.
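Such queries can also be run programmatically via the search endpoint of the API. As with the upload example, the sketch below only builds the request URL, and the path and parameter names are assumptions based on Microsoft’s documentation at the time:

```python
from urllib.parse import urlencode

# Hypothetical trial-account values -- replace with your own.
LOCATION = "trial"
ACCOUNT_ID = "00000000-0000-0000-0000-000000000000"

def build_search_url(query: str, access_token: str) -> str:
    """Build a search request URL over all indexed videos in the account.

    The response would contain matches across transcript, OCR text,
    people and labels -- the same result types shown in the web UI.
    """
    params = urlencode({"query": query, "accessToken": access_token})
    return (f"https://api.videoindexer.ai/{LOCATION}"
            f"/Accounts/{ACCOUNT_ID}/Videos/Search?{params}")

search_url = build_search_url("Trump", "<access-token>")
```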


That was just a brief overview of the possibilities of Microsoft’s Video Indexer. I think it can be useful for some OSINT investigations, and if you plan to use it more intensively, you might want to consider upgrading to a paid account.

I was actually thinking about uploading talks from conferences, so I could create a database in which I could query specific OSINT topics without having to watch the complete videos. A TL;DR for videos 😊

MW-OSINT / 08.03.2020

Вы понимаете? (Do you understand?) OSINT in Foreign Languages

It takes just one click in OSINT to land on a website in a foreign language. Investigations don’t have to stop there if you have the right tools.

In today’s interconnected world, OSINT investigations lead us to foreign-language content quite often. That does not mean we have to stop there. Thankfully, a broad variety of tools can support us in translating the content we find.

Before getting into specific tools: I have learned that you will get the best results if you define the input language manually. Most tools can autodetect the input language, but with short sentences or even single words this might not work reliably. Translating very long sentences can also produce awkward results; splitting a long sentence into components can help in this case. That said, let’s have a look at some tools I use during my investigations.
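Splitting long passages before feeding them to a translator is easy to automate. A minimal sketch using a naive sentence splitter (good enough for pre-chunking, not a full tokenizer):

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences at ., ! or ? followed by whitespace.

    Each chunk can then be translated on its own, which often reads
    better than translating one very long run-on sentence.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

chunks = split_sentences("First idea. Second idea! A question?")
```

Translate the chunks one by one, then reassemble the output in order.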

First off, I would like to point out DeepL, a German company that trains AI to understand and translate texts. When it comes to translating content in German, English, Spanish, Portuguese, Italian, Dutch, Russian and Polish, DeepL has proven to be more accurate than other tools. You can copy and paste a text or upload a document to have it translated. I let the platform have a try at an excerpt from one of the older Keyfindings’ posts in German.
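DeepL also exposes an API for scripted use. The sketch below only constructs the request body and does not send anything; the endpoint and field names reflect DeepL’s documented v2 REST interface, but check the current docs before relying on them. Note the explicit source_lang, in line with the advice above about defining the input language manually:

```python
from urllib.parse import urlencode

# Free-tier endpoint per DeepL's v2 API docs; the paid tier uses api.deepl.com.
DEEPL_ENDPOINT = "https://api-free.deepl.com/v2/translate"

def build_deepl_request(text: str, source_lang: str,
                        target_lang: str, auth_key: str) -> tuple[str, str]:
    """Return (endpoint, urlencoded form body) for a translate call.

    Fixing source_lang avoids unreliable autodetection on short inputs.
    """
    body = urlencode({
        "text": text,
        "source_lang": source_lang,
        "target_lang": target_lang,
        "auth_key": auth_key,
    })
    return DEEPL_ENDPOINT, body

endpoint, body = build_deepl_request("Guten Tag", "DE", "EN", "<your-api-key>")
```

A POST of that body to the endpoint would return the translation as JSON.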


The next must-have is Google Translate. This extension should be installed in any browser to easily decipher pages on the fly. Besides translating complete webpages, it will show you the original text when you hover the mouse over a passage. This can be helpful, especially when Google tries to translate names of people, places or companies as well.


What if neither DeepL nor the Google Translate extension works? Maybe you’re on a page that does not use the Latin alphabet, e.g. Chinese or Arabic, and some of the content is not ASCII-coded. This happens quite often with Asian websites. Another case might be handwritten information in such languages. One of my favorite tools for this is on the Google Translate website itself. Besides the obvious copying and pasting of text, as well as uploading documents, Google allows you to use foreign-language virtual keyboards to input information.


However, this isn’t always helpful. In Arabic, letters vary in shape depending on their position in the word. This makes it hard for someone not proficient in Arabic to use the keyboard. Luckily, there is a workaround!

The Google Translate page allows you to draw what you see and based on that it will make suggestions and translate them. This works really well with any character-based writing, such as Chinese, Korean and Japanese, as well as with other languages that don’t use the Latin alphabet (Russian, Hindi, etc.). I have added a quick video to demonstrate how it works.

As an alternative, I looked into Windows Ink in the Microsoft Translator, but Microsoft currently doesn’t offer an Arabic handwriting package. However, it does offer Russian, Chinese, Hindi and several other character-based alphabets and languages.

When trying to translate subtitles in videos, there is a workaround shared by Hugo Kamaan on Twitter, showing how you can use your cell phone camera to get instant translations.

There are definitely more tools out there, so feel free to add anything you use frequently or that you think is missing in the comments.

Я надеюсь, что это было полезно для вашего расследования OSINT! (I hope this was useful for your OSINT investigation!)

MW-OSINT / 21.07.2019