Deep Learning: Image and Video Recognition
Written by Bruce Ho
BigR.io’s Chief Big Data Scientist
This paper illustrates the advancements in implementing Deep Neural Networks for automatic feature extraction in image and video for applications including facial recognition, programmatic video highlights, and image segmentation and object classification. Given the limitations of human abilities in earlier extraction methods, these networks exponentially increase accuracy, output, and available feature selection options for further analysis. BigR.io specializes in the following industry use cases:
- Image Recognition
- Video Highlights
- Anomaly Detection
BigR.io is a technology consulting firm empowering data to drive analytics for revenue growth and operational efficiencies. Our teams deliver software solutions, data science strategies, enterprise infrastructure, and management consulting to the world’s largest companies. We are an elite group with MIT roots, shining when tasked with complex missions: assembling mounds of data from a variety of sources, building high-volume, highly-available systems, and orchestrating analytics to transform technology into perceivable business value. With extensive domain knowledge, BigR.io has teams of architects and engineers that deliver best-in-class solutions across a variety of verticals. This diverse industry exposure and our constant run-in with the cutting edge empowers us with invaluable tools, tricks, and techniques. We bring knowledge and horsepower that consistently delivers innovative, cost-conscious, and extensible results to complex software and data challenges. Learn more at www.bigr.io.
Industry Use Cases
Reference: G Paltoglout, M Thelwall, Seeing Stars of Valence and Arousal in Blog Posts. Issue No. 01 Jan-Mar 2013 Vol. 4, IEEE Transactions on Affective Computing.
Points on the valence arousal plot can be translated to commonly understood emotions.
Today, we can automate programmatic video highlights using video recognition techniques. In addition to applying CNNs to static image features, Recurrent Neural Networks (RNNs) are able to classify video segments using optical flow between image frames. This technique is easily trained not only to extract official stat events, but also to extract any interesting player motion not explicitly logged and indexed — for example, an alley-oop in basketball. Due to the automated nature of these extraction tasks, studios can come up with new ideas at any time to build upon an existing menu of highlights.
Image: Durant eyeing Rihanna after hitting a 3-pointer (she was cheering for LeBron).
Independent Component Analysis (ICA) is one such approach with many proposed variants. An ICA-based deep sparse feature extraction strategy combined with a non-parametric Bayesian approach can automatically determine the most optimal dimension for the latent feature vector, removing the heavy labor in parameter tuning that a full deep learning approach would entail. The reported accuracy improvement exceeds 10% over previous results. Variants of Restricted Boltzmann Machines (RBMs) are another major direction of research for deep-sparse representation. While much progress has been made on the theoretical front, the experimental results thus far lag behind the best ICA models. Reference: Y. Cong, J. Yuan, and J. Liu, “Sparse reconstruction cost for abnormal event detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011, pp. 3449–3456