To understand video, our machines start by extracting the audio and the images into separate streams. They then apply a myriad of transformations and filters to enable efficient analysis of the content.
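As a rough illustration of the stream separation described above, the sketch below builds two command lines for the ffmpeg tool: one that keeps only the audio and one that keeps only sampled frames. The use of ffmpeg, the function name `build_split_commands`, the output file names, and the default sample rate and frame rate are all assumptions made for this example; the text does not say which tools or settings are actually used.

```python
from pathlib import Path
from typing import List, Tuple


def build_split_commands(
    video: str,
    out_dir: str,
    sample_rate: int = 16000,  # assumed value, not taken from the text
    fps: int = 1,              # assumed value, not taken from the text
) -> Tuple[List[str], List[str]]:
    """Build (audio_cmd, frames_cmd) argument lists for ffmpeg.

    Hypothetical helper: it only constructs the commands, it does not run them.
    """
    out = Path(out_dir)
    # -vn drops the video stream; -ac 1 mixes down to mono; -ar sets the sample rate.
    audio_cmd = [
        "ffmpeg", "-i", video,
        "-vn", "-ac", "1", "-ar", str(sample_rate),
        str(out / "audio.wav"),
    ]
    # -an drops the audio stream; the fps filter samples frames at a fixed rate.
    frames_cmd = [
        "ffmpeg", "-i", video,
        "-an", "-vf", f"fps={fps}",
        str(out / "frame_%05d.png"),
    ]
    return audio_cmd, frames_cmd


audio_cmd, frames_cmd = build_split_commands("clip.mp4", "out")
```

With ffmpeg installed, each list could be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the choice of flags easy to inspect.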
The predictions produced by this analysis are then shared with a higher-order system that correlates and coalesces them into human-readable Concepts ready for indexing.
Sounds result from natural, recurring vibrations or collisions. They can also be biologically generated for communication or navigation, or artificially synthesized to convey a particular emotion. Our Artificial Intelligence is designed to distinguish and index all of these variations, including human speech.
Images consist of colors distributed in a two-dimensional space. These colors result from radiation that is emitted, reflected, absorbed, and combined into blobs that can represent natural, recurring patterns, biological forms, or synthetic shapes. We employ a range of Computer Vision techniques to categorize these impressions.
Certain movements induce changes in the colors being emitted or reflected. When these changes follow distinctive paths, it is likely that specific movements are taking place. When combined with detected sounds, objects, and persons, these movements can describe highly elaborate actions that our machines are able to detect and index.
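One simple way to picture how color changes can trace a path is frame differencing: compare consecutive frames, find the pixels that changed, and follow the center of those changes over time. The sketch below is a minimal illustration of that general technique on tiny grayscale frames. It is not a description of the actual system, and the names `changed_pixels`, `centroid`, `motion_path`, and the threshold value are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities


def changed_pixels(prev: Frame, curr: Frame, threshold: int = 30) -> List[Tuple[int, int]]:
    """Return (row, col) of pixels whose intensity changed by more than threshold."""
    return [
        (r, c)
        for r, (row_prev, row_curr) in enumerate(zip(prev, curr))
        for c, (a, b) in enumerate(zip(row_prev, row_curr))
        if abs(a - b) > threshold
    ]


def centroid(points: List[Tuple[int, int]]) -> Optional[Tuple[float, float]]:
    """Average position of the given pixels, or None if there are none."""
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


def motion_path(frames: List[Frame], threshold: int = 30) -> List[Tuple[float, float]]:
    """Centroid of the changed pixels for each consecutive pair of frames."""
    path = []
    for prev, curr in zip(frames, frames[1:]):
        center = centroid(changed_pixels(prev, curr, threshold))
        if center is not None:  # skip pairs with no detectable change
            path.append(center)
    return path


# A single bright pixel moving left to right along the middle row.
frames = [
    [[0, 0, 0], [255, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 255, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 255], [0, 0, 0]],
]
path = motion_path(frames)
```

In this toy input the path moves steadily to the right along one row, which is the kind of distinctive trajectory that could then be matched against known movements.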
Sounds, images, motions, and even words can have very different meanings in different contexts. For this reason, our Artificial Intelligence employs a hyper-dimensional concept map to relate, contextualize, and disambiguate what it perceives.
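To make the idea of disambiguation in a concept space more concrete, the sketch below places candidate meanings and surrounding context in a small vector space and picks the meaning closest to the context by cosine similarity. This is a common general-purpose technique, shown here only as an assumption about how such a map might be used; the three-dimensional vectors, the labels, and the function names `cosine` and `disambiguate` are invented for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(candidates: Dict[str, List[float]], context: List[List[float]]) -> str:
    """Return the candidate whose vector is most similar to the mean context vector."""
    mean_context = [sum(dim) / len(context) for dim in zip(*context)]
    return max(candidates, key=lambda name: cosine(candidates[name], mean_context))


# Hypothetical concept vectors; the axes loosely stand for (nature, sports, sound).
senses = {
    "bat (animal)": [0.9, 0.0, 0.4],
    "bat (baseball)": [0.0, 0.9, 0.2],
}
cave_scene = [[1.0, 0.0, 0.1], [0.5, 0.0, 0.9]]     # e.g. "cave", "squeak"
stadium_scene = [[0.0, 1.0, 0.3], [0.1, 0.8, 0.9]]  # e.g. "stadium", "cheering"

in_cave = disambiguate(senses, cave_scene)
in_stadium = disambiguate(senses, stadium_scene)
```

The same detected item resolves to a different meaning depending on what surrounds it. A real concept map would have far more dimensions and learned rather than hand-written vectors.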