How our machines perceive

Overview

To understand video, our machines start by extracting the audio and the images into separate streams. They then apply a wide range of transformations and filters that enable efficient analysis of the content.
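As a simplified illustration of this separation step (a sketch only, not our production pipeline), the audio track and a set of still frames could be extracted with the ffmpeg command-line tool; the file names and sampling rate below are hypothetical:

```python
# Sketch of the stream-separation step. Assumes ffmpeg is installed;
# "input.mp4", "audio.wav", and the 1 frame-per-second rate are examples.
import subprocess

def split_streams(video_path: str) -> None:
    # Extract the audio track as uncompressed PCM for later sound analysis.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn",
         "-acodec", "pcm_s16le", "audio.wav"],
        check=True,
    )
    # Sample the visual stream as one JPEG frame per second of video.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vf", "fps=1", "frame_%05d.jpg"],
        check=True,
    )

split_streams("input.mp4")
```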

Subsequently, data is distributed to a number of modules that perform fine-grained predictions in the dimensions of Sound, Image, and Motion.

These predictions are then shared with a higher-order system that correlates and coalesces them into human-readable Concepts ready for indexing.
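The overall fan-out and fusion can be pictured with the following skeleton; the module outputs, labels, and confidence threshold are placeholders for illustration, not our actual models:

```python
# Illustrative skeleton of the fan-out/fusion flow described above.
# All labels, scores, and the 0.8 threshold are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Prediction:
    dimension: str    # "sound", "image", or "motion"
    label: str
    confidence: float

def analyze_sound(audio) -> list[Prediction]:
    return [Prediction("sound", "speech", 0.92)]     # placeholder output

def analyze_image(frames) -> list[Prediction]:
    return [Prediction("image", "person", 0.88)]     # placeholder output

def analyze_motion(frames) -> list[Prediction]:
    return [Prediction("motion", "walking", 0.81)]   # placeholder output

def coalesce(predictions: list[Prediction]) -> list[str]:
    # A higher-order system would correlate predictions across dimensions;
    # here we simply keep confident labels as index-ready Concepts.
    return [p.label for p in predictions if p.confidence > 0.8]

predictions = analyze_sound(None) + analyze_image(None) + analyze_motion(None)
print(coalesce(predictions))  # ['speech', 'person', 'walking']
```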

Sound

Sounds result from naturally recurring vibrations or collisions. Sounds can also be biologically generated for communication or navigation, and they can now be artificially synthesized to convey a particular emotion. Our Artificial Intelligence is designed to distinguish and index on all of these variations, including human speech.
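As a minimal sketch of how such variations become machine-readable, the extracted audio can be converted into a log-mel spectrogram that a sound classifier can consume; the library, sample rate, and file name below are assumptions for illustration:

```python
# Sketch of audio feature extraction. Assumes librosa is installed and
# "audio.wav" is the track separated earlier; 16 kHz / 64 mel bands are
# illustrative settings, not our actual configuration.
import librosa
import numpy as np

y, sr = librosa.load("audio.wav", sr=16000)          # load mono audio
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)       # log-mel spectrogram

# Each column is now a time step of 64 features, ready for a classifier
# that separates natural, biological, synthetic, and speech sounds.
print(log_mel.shape)
```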

Image

Images consist of colors distributed across a two-dimensional space. These colors result from radiation that is emitted, reflected, absorbed, and combined into blobs that can represent naturally recurring patterns, biological forms, or synthetic shapes. We employ a range of Computer Vision techniques to categorize these impressions.
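One common Computer Vision technique for categorizing a single frame is a pretrained image classifier; the sketch below uses an off-the-shelf ResNet from torchvision purely as an example, not as a description of our actual models:

```python
# Sketch of frame categorization with a generic pretrained classifier.
# Assumes torch, torchvision, and Pillow are installed and that
# "frame_00001.jpg" is one of the frames sampled earlier.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()                     # resize, crop, normalize

frame = Image.open("frame_00001.jpg").convert("RGB")
with torch.no_grad():
    logits = model(preprocess(frame).unsqueeze(0))

top = logits.softmax(dim=1).argmax(dim=1).item()
print(weights.meta["categories"][top])                # predicted category
```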

Motion

Certain movements induce changes in the colors being emitted or reflected. When these changes follow distinctive paths, it is likely that specific movements are taking place. When combined with detected sounds and objects or persons, they can describe highly elaborate actions that our machines are able to detect and index on.
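A simple way to picture these paths is dense optical flow, which assigns a displacement to every pixel between consecutive frames; the sketch below uses OpenCV's Farneback method as an illustrative stand-in for our motion analysis:

```python
# Sketch of motion estimation between two consecutive frames.
# Assumes opencv-python is installed; the frame files come from the
# sampling step above, and the parameters are commonly used defaults.
import cv2

prev = cv2.imread("frame_00001.jpg", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_00002.jpg", cv2.IMREAD_GRAYSCALE)

# Farneback dense optical flow: one (dx, dy) displacement per pixel.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])

# Large, consistently oriented displacements suggest a distinctive motion
# path that can be correlated with detected sounds, objects, and persons.
print("mean motion magnitude:", float(magnitude.mean()))
```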

Concepts

Sounds, images, motions, and even words can have very different meanings in different contexts. For this reason, our Artificial Intelligence employs a hyper-dimensional concept map to relate, contextualize, and disambiguate what it perceives.
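A toy example of such disambiguation: the same label can be resolved to whichever concept vector best matches the surrounding context. The concept names and vectors below are made up for illustration and do not reflect our real concept map:

```python
# Toy sketch of context-based disambiguation with cosine similarity.
# Concept names and all vectors are hypothetical placeholders.
import numpy as np

concepts = {
    "bat (animal)":   np.array([0.9, 0.1, 0.0]),
    "bat (baseball)": np.array([0.1, 0.9, 0.2]),
}
context = np.array([0.2, 0.8, 0.1])   # embedding of the surrounding scene

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The perceived "bat" is mapped to the concept closest to its context.
best = max(concepts, key=lambda name: cosine(concepts[name], context))
print(best)   # "bat (baseball)" for this particular context vector
```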