To understand video, our machines start by extracting the audio and the image into separate streams.
Then they apply a myriad of transformations and filters to enable efficient analysis of the content.
Subsequently, data is distributed to a number of modules that perform fine-grained predictions in the dimensions of sound, image, and motion.
These predictions are then shared with a higher-order system that correlates and coalesces them into human-readable concepts ready for indexing.
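As a rough illustration only, the flow described above (split the streams, hand them to per-modality modules, then merge the predictions) could be sketched as follows. Every name here (`Detection`, `demux`, `analyze`, `coalesce`), the toy video record, and the 0.5 threshold are hypothetical and chosen for the example; none of them describe the actual system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Detection:
    modality: str      # "sound", "image", or "motion"
    label: str         # fine-grained prediction from one module
    confidence: float  # score in [0, 1]


# A module maps one stream to a list of detections (hypothetical interface).
Module = Callable[[list], List[Detection]]


def demux(video: dict) -> Dict[str, list]:
    """Split a toy video record into separate audio and image streams."""
    return {"audio": video["audio"], "frames": video["frames"]}


def analyze(video: dict, modules: List[Tuple[str, Module]]) -> List[Detection]:
    """Send each stream to the modules registered for it and gather results."""
    streams = demux(video)
    detections: List[Detection] = []
    for stream_name, module in modules:
        detections.extend(module(streams[stream_name]))
    return detections


def coalesce(detections: List[Detection], threshold: float = 0.5) -> List[str]:
    """Keep the best score per label; return confident labels, best first."""
    best: Dict[str, float] = {}
    for d in detections:
        best[d.label] = max(best.get(d.label, 0.0), d.confidence)
    ranked = sorted(best.items(), key=lambda kv: -kv[1])
    return [label for label, score in ranked if score >= threshold]
```

In this sketch the "higher-order system" is reduced to a simple threshold over per-label scores; a real fusion step would presumably also correlate detections across time and modality.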
Sounds result from natural recurring vibrations or collisions. Sounds can also be biologically generated for
communication or navigation. They can now be artificially synthesized to convey particular emotions or concepts.
Our Artificial Intelligence is designed to distinguish and index all of these variations, including human speech.
Images comprise colors distributed in a two-dimensional space.
These colors result from radiation that is emitted, reflected, absorbed, and combined
into blobs that can represent natural recurring patterns, biological forms, or synthetic shapes.
We employ a range of Computer Vision techniques to categorize these impressions.
Certain movements induce changes in the colors being emitted or reflected. When these changes follow
distinctive paths, it is likely that specific movements are taking place.
When combined with detected sounds, objects, and persons, these movements can describe highly elaborate actions that our machines are able to detect and index.
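The idea that color changes following a distinctive path hint at movement can be illustrated with simple frame differencing. This is a minimal sketch under stated assumptions, not the technique the system actually uses: frames are assumed to be small grayscale grids (lists of rows of integers), and `changed_pixels`, `centroid`, `motion_path`, and the threshold of 30 are hypothetical.

```python
def changed_pixels(prev, curr, threshold=30):
    """Return (row, col) positions whose intensity changed by more than threshold."""
    return [
        (r, c)
        for r, row in enumerate(curr)
        for c, value in enumerate(row)
        if abs(value - prev[r][c]) > threshold
    ]


def centroid(points):
    """Average position of a set of pixels, or None if nothing changed."""
    if not points:
        return None
    rows = sum(r for r, _ in points) / len(points)
    cols = sum(c for _, c in points) / len(points)
    return (rows, cols)


def motion_path(frames, threshold=30):
    """Trace the centroid of changed pixels across consecutive frame pairs."""
    path = []
    for prev, curr in zip(frames, frames[1:]):
        point = centroid(changed_pixels(prev, curr, threshold))
        if point is not None:
            path.append(point)
    return path
```

A path whose column coordinate increases steadily would suggest something moving from left to right; classifying such paths into named movements is a separate step that this sketch does not attempt.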
Sounds, images, motions, and even words can have very diverse meanings in different contexts.
For this reason, our Artificial Intelligence employs a hyper-dimensional concept map to relate, contextualize, and disambiguate what it perceives.
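One common way to disambiguate a detection by context is to represent each candidate meaning and each surrounding detection as a vector, then pick the meaning closest to the context. The sketch below assumes that approach purely for illustration; the source does not say how the concept map is built, and the function names and three-dimensional vectors are hypothetical stand-ins for a much higher-dimensional space.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(senses, context):
    """Pick the sense whose vector is closest to the averaged context vector.

    senses:  dict mapping a sense name to its vector (hypothetical values)
    context: non-empty list of vectors for the surrounding detections
    """
    dims = len(context[0])
    mean = [sum(v[i] for v in context) / len(context) for i in range(dims)]
    return max(senses, key=lambda name: cosine(senses[name], mean))
```

For example, with one vector for "bat (animal)" and another for "bat (sports)", a context dominated by stadium and crowd detections would pull the choice toward the sports sense.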