To understand video, our machines extract the sound and the image into separate streams.
Then they apply a myriad of transformations and filters to enable efficient analysis of the content.
This data is then distributed to a number of modules that perform fine-grained predictions
in the dimensions of sound, image,
and motion.
Finally, these predictions are shared with a higher-order system that correlates and
coalesces them into human-readable concepts
ready for indexing.
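In broad strokes, this first step can be pictured as demuxing a file into its audio and visual streams and fanning them out to per-modality modules. The sketch below uses ffmpeg for the demuxing; the file names, sampling rate, and module stubs are illustrative assumptions, not our production pipeline.

```python
import subprocess

def split_streams(video_path: str) -> tuple[str, str]:
    """Demux a video file into an audio-only and a video-only stream (illustrative sketch)."""
    audio_path, visual_path = "audio.wav", "video_only.mp4"
    # Extract the audio track as 16 kHz mono PCM for the sound modules.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_path],
        check=True,
    )
    # Keep the video track, dropping audio, for the image and motion modules.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-an", "-c:v", "copy", visual_path],
        check=True,
    )
    return audio_path, visual_path

def analyze(video_path: str) -> dict:
    """Fan the streams out to per-modality modules, then hand their predictions to a fusion step."""
    audio_path, visual_path = split_streams(video_path)
    predictions = {
        "sound": classify_sounds(audio_path),    # hypothetical sound module
        "image": classify_frames(visual_path),   # hypothetical image module
        "motion": classify_motion(visual_path),  # hypothetical motion module
    }
    return merge_into_concepts(predictions)      # hypothetical higher-order correlation step

# Placeholder stubs so the sketch is self-contained; real modules replace these.
def classify_sounds(path: str) -> list: return []
def classify_frames(path: str) -> list: return []
def classify_motion(path: str) -> list: return []
def merge_into_concepts(predictions: dict) -> dict: return predictions
```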
Sounds result from naturally recurring vibrations or collisions. Sounds can also be biologically generated for
communication or navigation. They can now be artificially synthesized to convey particular emotions or concepts.
Our Artificial Intelligence is designed to distinguish and index on all these variations, including human speech.
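For a sense of how a sound stream is summarized before classification, the sketch below computes a log-mel spectrogram with the librosa library, assuming a 16 kHz mono file; the classifier itself is left as a placeholder rather than our actual models.

```python
import numpy as np
import librosa

def audio_features(audio_path: str) -> np.ndarray:
    """Turn a mono audio file into a log-mel spectrogram, a common input to sound classifiers."""
    waveform, sample_rate = librosa.load(audio_path, sr=16000, mono=True)
    mel = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_mels=64)
    return librosa.power_to_db(mel, ref=np.max)  # shape: (64 mel bands, time frames)

def classify_sounds(audio_path: str) -> list:
    """Placeholder: a trained model would map the spectrogram to labels such as 'speech' or 'engine'."""
    features = audio_features(audio_path)
    # labels = model.predict(features)  # hypothetical trained sound classifier
    return []
```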
Images comprise colors distributed in a two-dimensional space.
These colors result from radiation that is emitted, reflected, absorbed, and combined
into blobs that can represent naturally recurring patterns, biological forms, or synthetic shapes.
We employ a range of Computer Vision techniques to categorize these impressions.
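As one generic example of such a technique, the sketch below labels a single frame with a stock ImageNet classifier from torchvision (version 0.13 or later); it illustrates the idea of categorizing visual impressions, not the specific models we run.

```python
from PIL import Image
import torch
from torchvision.models import resnet50, ResNet50_Weights

def classify_frame(frame_path: str, top_k: int = 3) -> list:
    """Label one video frame with its most likely ImageNet categories (illustrative sketch)."""
    weights = ResNet50_Weights.DEFAULT
    model = resnet50(weights=weights).eval()
    preprocess = weights.transforms()           # resizing, cropping, normalization
    image = Image.open(frame_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)      # add a batch dimension
    with torch.no_grad():
        probabilities = model(batch).softmax(dim=1)[0]
    scores, indices = probabilities.topk(top_k)
    categories = weights.meta["categories"]
    return [(categories[i], float(s)) for i, s in zip(indices, scores)]
```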
Certain movements induce changes in the colors being emitted or reflected. When these changes follow
distinctive paths, it is likely that specific movements are taking place.
When combined with detected sounds and objects or persons, these can describe highly elaborate actions
that our machines are able to detect and index on.
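One common way to capture such paths of change is dense optical flow between consecutive frames. The OpenCV sketch below averages flow magnitudes as a rough measure of how much motion a clip contains; it is a generic illustration, not our motion models.

```python
import cv2
import numpy as np

def average_motion(video_path: str, max_frames: int = 100) -> float:
    """Estimate how much motion a clip contains by averaging dense optical-flow magnitudes."""
    capture = cv2.VideoCapture(video_path)
    ok, frame = capture.read()
    if not ok:
        return 0.0
    previous = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    magnitudes = []
    for _ in range(max_frames):
        ok, frame = capture.read()
        if not ok:
            break
        current = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Farneback dense optical flow: per-pixel displacement between the two frames.
        flow = cv2.calcOpticalFlowFarneback(previous, current, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitude, _angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        magnitudes.append(float(magnitude.mean()))
        previous = current
    capture.release()
    return float(np.mean(magnitudes)) if magnitudes else 0.0
```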
Sounds, images, motions, and even words can have very diverse meanings in different contexts.
For this reason, our Artificial Intelligence employs a hyper-dimensional concept map to relate, contextualize,
and disambiguate what it perceives.
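A toy sketch of the underlying idea: represent each detected label and its surrounding context as vectors, and let their similarity decide which sense applies. The vectors and senses below are invented purely for illustration; the actual concept map is far richer.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two concept vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings, 3-dimensional for readability; real concept vectors would be much larger.
SENSES = {
    "bark (dog sound)":    np.array([0.9, 0.1, 0.0]),
    "bark (tree surface)": np.array([0.0, 0.2, 0.9]),
}

def disambiguate(context_vector: np.ndarray) -> str:
    """Pick the sense whose concept vector best matches the surrounding context."""
    return max(SENSES, key=lambda sense: cosine(SENSES[sense], context_vector))

# A clip where the image stream detected a dog and the audio stream detected growling
# would yield a context vector close to the animal-related sense.
print(disambiguate(np.array([0.8, 0.3, 0.1])))  # -> "bark (dog sound)"
```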