Investors, take note: machine learning is beginning to have a powerful impact on generative art non-fungible tokens (NFTs).

NFTs are cryptographic assets on a blockchain with unique identification codes and metadata that distinguish them from each other. Most often, NFTs are used to represent real-world items like artwork and real estate. “Tokenizing” these real-world tangible assets makes buying, selling, and trading them more efficient while reducing the probability of fraud.

With that in mind, AI is becoming increasingly important in the non-fungible token space. “Generative art” (art that has been created by AI) has quickly emerged as one of the main categories of the NFT market, driving innovative projects and investment in astonishing collections. From the works of AI art legends such as Refik Anadol or Sofia Crespo to Tyler Hobbs’s new QQL project, NFTs have become one of the main vehicles to access AI-powered art.

The rise of generative AI has come as a surprise even to many of the early AI pioneers, who mostly saw this discipline as a relatively obscure area of machine learning. Its leap to dominance in the NFT market has largely been driven by gains in computational power and next-gen ML algorithms that can help models learn without requiring large labeled datasets, which are scarce and expensive to build.

One of the most significant of these advances has been “text to image” (TTI). AI-driven TTI programs such as DALL-E and GLIDE allow users to describe in text what they would like the program to render, and it then creates an interpretive image based on the text, with some profoundly remarkable results. This makes TTI ideal for the creation of unique and marketable NFTs.

There are also similar generative art solutions such as “text-to-video” or “image-to-image,” but TTI, by far, is having the greatest impact on the NFT market because a disproportionate percentage of digital art collectibles are represented as static images.

Throughout the history of technology, there have been many examples of seemingly disparate trends coming together to form a market symbiosis that benefits both. The most recent example is the social-mobile-cloud revolution, in which each one of those trends expanded the market of the other two.

Generative AI and NFTs are starting to exhibit a similar dynamic. Both trends have been able to bring complex technology to mainstream culture. NFTs complement generative AI with digital ownership and distribution models that would be nearly impossible to implement otherwise. Similarly, generative AI is likely to become one of the most important sources of NFT creation now and into the future.

How BigRio Helps Facilitate Investment in AI Startups

As is occurring with NFTs and AI-generated art, BigRio looks for and helps to facilitate such market symbiosis.

We like to think of ourselves as a “Shark Tank for AI.”

If you are familiar with the TV series, then you know that, basically, what they do is hyper-accelerate the most important part of the incubation process – visibility. You can’t get better visibility than getting in front of celebrity investors and a TV audience of millions of viewers. Many entrepreneurs who have appeared on that program – even those who did not get picked up by the sharks – succeeded because others who were interested in their concepts saw them on the show.

At BigRio, we may not have a TV audience, but we can do the same. We have the contacts and the expertise not only to weed out the companies that are not ready, as the sharks on the TV show do, but also to mentor those that we feel are ready and get them noticed by the right people in the growing AI investment community.

Because we see so many potential AI innovators, we are also ideally suited to create the kind of synergy between concepts and applications such as is occurring with AI, ML and NFTs.

Rohit Mahajan is a Managing Partner with BigRio. He has a particular expertise in the development and design of innovative solutions for clients in Healthcare, Financial Services, Retail, Automotive, Manufacturing, and other industry segments.

BigRio is a technology consulting firm empowering data to drive innovation and advanced AI. We specialize in cutting-edge Big Data, Machine Learning, and Custom Software strategy, analysis, architecture, and implementation solutions. If you would like to benefit from our expertise in these areas or if you have further questions on the content of this article, please do not hesitate to contact us.

 

AI is known for its ability to make very accurate predictions. But often, human prognosticators are pretty good at it too! In this article, we take a look at what leading IT experts say they think will be the top five advances in AI and machine learning in 2023, as compiled by The Enterprisers Project.

1. There Will Be Continued Advancement of AI Applications in Healthcare

“AI will yield tremendous breakthroughs in treating medical conditions in the next few years. Just look at the 2021 Breakthrough Prize winner Dr. David Baker. Dr. Baker used AI to design completely new proteins. This ground-breaking technology will continue having huge ramifications in the life sciences, potentially developing life-saving medical treatments for diseases like Alzheimer’s and Parkinson’s.” — Michael Armstrong, Chief Technology Officer, Authenticx.

2. Continued Merging of AI and Quantum Computing

Phil Tee, Co-founder and CEO of Moogsoft, says, “Watch the crossover from fundamental physics into informatics in the guise of quantum and quantum-inspired computing. While I’m not holding my breath for a practical quantum computer, we will see crossover. The mix of advanced mathematics and informatics will unleash a new generation of engineers uniquely placed to exploit the AI wave.”

3. AI Will Not Replace Humans

Despite dozens of sci-fi movies and novels to the contrary, the experts do not believe that AI will replace humans in 2023 and the years ahead, but instead, they expect to see increased interaction between human and artificial intelligence with an increased synergy between the two. “While there will be growing adoption of AI to enhance our collective user experience at scale, it will be balanced with appropriate human intervention. Humans applying the insights provided by AI will be a more effective combination overall than either one doing it alone. How and where this balance is struck will vary depending on the industry and the criticality of the function being performed. For example, radiologists assisted by an AI screen for breast cancer more successfully than they do when they work alone, according to new research. That same AI also produces more accurate results in the hands of a radiologist than it does when operating solo.” – E.G. Nadhan, Global Chief Architect Leader, Red Hat

4. A Move Towards More Ethical AI and an AI Bill of Rights

As we reported earlier this year, the Biden administration launched a proposed “AI Bill of Rights” to help ensure the ethical use of AI. Not surprisingly, it is modeled after the sort of patient “bills of rights” people have come to expect as they interact with doctors, hospitals, and other healthcare professionals.

David Talby, CTO of John Snow Labs, says to expect continued movement in this direction. “We can expect to see a few major AI trends in 2023, and two to watch are responsible AI and generative AI. Responsible or ethical AI has been a hot-button topic for some time, but we’ll see it move from concept to practice next year. Smarter technology and emerging legal frameworks around AI are also steps in the right direction. The AI Act, for example, is a proposed, first-of-its-kind European law set forth to govern the risk of AI use cases. Similar to GDPR for data usage, The AI Act could become a baseline standard for responsible AI and aims to become law next Spring. This will have an impact on companies using AI worldwide.”

5. AI Will Support Increased and “Smarter” Automation

“Everyone understands the value of automation, and, in our software-defined world, almost everything can be automated. The decision point or trigger for the automation, however, is still one of the trickier elements. This is where AI will increasingly come in: AI can make more intelligent, less brittle decisions than automation’s traditional ‘if-this-then-that’ rules.” – Richard Whitehead, CTO and Chief Evangelist, Moogsoft.

How BigRio Helps Facilitate the Future of AI

At BigRio, we not only agree with these experts on these top five advances in AI that will likely occur in 2023, but we are also actively trying to facilitate them!

We like to think of ourselves as a “Shark Tank for AI.”

If you are familiar with the TV series, then you know that, basically, what they do is hyper-accelerate the most important part of the incubation process – visibility. You can’t get better visibility than getting in front of celebrity investors and a TV audience of millions of viewers. Many entrepreneurs who have appeared on that program – even those who did not get picked up by the sharks – succeeded because others who were interested in their concepts saw them on the show.

At BigRio, we may not have a TV audience, but we can do the same. We have the contacts and the expertise not only to weed out the companies that are not ready, as the sharks on the TV show do, but also to mentor those that we feel are ready and get them noticed by the right people in the growing AI investment community.

The Enterprisers Project is a community and online publication helping CIOs and IT leaders solve problems and drive business value. The Enterprisers Project, supported by Red Hat, also partners with Harvard Business Review.

NLP has evolved into an important way to track and categorize viewership in the age of cookie-less ad targeting. While users resist being identified by a single user ID, they are much less sensitive to, and even welcome, the chance for advertisers to personalize media content based on discovered preferences. This personalization comes from improvements that build upon the original Latent Dirichlet Allocation (LDA) algorithm and incorporate word2vec concepts.

The classic LDA algorithm, introduced by David Blei, Andrew Ng, and Michael Jordan, raised industry-wide interest in computerized understanding of documents. It incidentally also launched variational inference as a major research direction in Bayesian modeling. The ability of LDA to process massive numbers of documents, extract their main themes based on a manageable set of topics, and compute with relatively high efficiency (compared to the more traditional Monte Carlo methods, which sometimes run for months) made LDA the de facto standard in document classification.

However, the original LDA approach lacks certain desirable properties. It is, in the end, fundamentally just a word-counting technique. Consider these two statements:

“His next idea will be the breakthrough the industry has been waiting for.”

“He is praying that his next idea will be the breakthrough the industry has been waiting for.”

After removal of common stop words, these two semantically opposite sentences have almost identical word count features. It would be unreasonable to expect a classifier to tell them apart if that’s all you provide it as inputs.
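To make the point concrete, here is a minimal sketch of the word-count features for the two sentences above. The stop-word list and the `bag_of_words` helper are assumptions made up for this illustration, not part of any particular LDA implementation.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "he", "his", "is", "that",
              "will", "be", "has", "been", "for"}

def bag_of_words(text):
    """Lowercase, tokenize on letters, drop stop words, count what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

s1 = "His next idea will be the breakthrough the industry has been waiting for."
s2 = ("He is praying that his next idea will be the breakthrough "
      "the industry has been waiting for.")

bow1 = bag_of_words(s1)
bow2 = bag_of_words(s2)

# Words whose counts differ between the two sentences.
difference = (bow2 - bow1) + (bow1 - bow2)
print(dict(difference))
```

With this stop-word list, the only feature separating the two sentences is a single count for "praying", which is very little for a classifier to work with.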

The latest advances in the field improve upon the original algorithm on several fronts. Many of them incorporate the word2vec concept, where an embedded vector is used to represent each word in a way that reflects its semantic meaning, e.g., king – man + woman = queen.
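The vector arithmetic behind that analogy can be sketched with a toy example. The two-dimensional vectors below are hand-assigned for illustration (one axis for "royalty", one for "gender"). Real word2vec vectors are learned from text and have hundreds of dimensions, and the `analogy` helper is a hypothetical name.

```python
import math

# Toy, hand-assigned vectors: (royalty, gender). Not learned embeddings.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = tuple(x - y + z
                   for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = [w for w in VECTORS if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))

result = analogy("king", "man", "woman")
print(result)
```

Here the arithmetic lands exactly on the toy "queen" vector. With learned embeddings the result is only the nearest neighbor of the target, not an exact match.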

Autoencoding Variational Inference for Topic Models (AVITM) speeds up inference on new documents that are not part of the training set. Its variant ProdLDA uses a product of experts to achieve higher topic coherence. Topic-based classification can potentially perform better as a result.

Doc2vec – generates semantically meaningful vectors to represent a paragraph or entire document in a word-order-preserving manner.

LDA2vec – derives embedded vectors for the entire document in the same semantic space as the word vectors.

Both Doc2vec and LDA2vec provide document vectors ideal for classification applications.

All these new techniques achieve scalability using either GPU or parallel computing. Although research results demonstrate a significant improvement in topic coherence, many investigators now choose to deemphasize topic distribution as the means of document interpretation. Instead, the unique numerical representation of the individual documents has become the primary concern when it comes to classification accuracy. The derived topics are often treated as simply intermediate factors, not unlike the filtered partial image features in a convolutional neural network.

With all this talk of the bright future of Artificial Intelligence (AI), it’s no surprise that almost every industry is looking into how they will reap the benefits from the forthcoming (dare I say already existing?) AI technologies. For some, AI will merely enhance the technologies already being used. For others, AI is becoming a crucial component to keeping the industry alive. Healthcare is one such industry.

The Problem: Diminishing Labor Force

Part of the need for AI-based Healthcare stems from the concern that one-third of nurses are baby boomers, who will retire by 2030, taking their knowledge with them. This drastic shortage of healthcare workers creates an imminent need for replacements and, while the enrollment numbers in nursing school stay stable, the demand for experienced workers will continue to increase. This need for additional clinical support is one area where AI comes into play. In fact, these emerging technologies will serve as a force multiplier not only for experienced nurses, but for doctors and clinical support staff as well.

Healthcare-AI Automation Applications to the Rescue

One of the most notable solutions for this shortage will be automating processes for determining whether or not a patient actually needs to visit a doctor in person. Doctors’ offices are currently inundated with appointments and patients whose lower-level questions and concerns could be addressed without a face-to-face consultation via mobile applications. Usually in the form of chatbots, these AI-powered applications can provide basic healthcare support by “bringing the doctor to the patient” and alleviating the need for the patient to leave the comfort of their home, let alone schedule an appointment to go in-office and visit a doctor (saving time and resources for all parties involved).

Should a patient need to see a doctor, these applications also contain schedulers capable of determining appointment type, length, urgency, and available dates/times, forgoing the need for constant human-based clinical support and interaction. With these AI schedulers also come AI-based Physician’s Assistants that provide additional in-office support like scheduling follow-up appointments, taking comprehensive notes for doctors, ordering specific prescriptions and lab testing, providing drug interaction information for current prescriptions, etc. And this is just one high-level AI-based Healthcare solution (albeit with many components).

With these advancements, Healthcare stands to gain significant ground with the help of domain-specific AI capabilities that were historically powered by humans. As a result, the next generation of healthcare has already begun, and it’s being revolutionized by AI.

Sometimes I get to thinking that Alexa isn’t really my friend. I mean sure, she’s always polite enough (well, usually, but it’s normal for friends to fight, right?). But she sure seems chummy with that pickle-head down the hall too. I just don’t see how she can connect with us both — we’re totally different!

So that’s the state of the art of conversational AI: a common shared agent that represents an organization. A spokesman. I guess she’s doing her job, but she’s not really representing me or M. Pickle, and she can’t connect with either of us as well as she might if she didn’t have to cater to both of us at the same time. I’m exaggerating a little bit – there are some personalization techniques (*cough* crude hacks *cough*) in place to help provide a custom experience:

  • There is a marketplace of skills. Recently, I can even ask her to install one for me.
  • I have a user profile. She knows my name and zip code.
  • Through her marketplace, she can access my account and run my purchase through a recommendation engine (the better to sell you with, my dear!)
  • I changed her name to “Echo” because who has time for a third syllable? (If only I were hamming this up for the post; sadly, a true story)
  • And if I may digress to my other good friend Siri, she speaks British to me now because duh.

It’s a start but, if we’re honest, none of these change the agent’s personality or capabilities to fit with all of my quirks, moods, and ever-changing context and situation. Ok, then. What’s on my wishlist?

  • I want my own agent with its own understanding of me, able to communicate and serve as an extension of myself.
  • I want it to learn everything about how I speak. That I occasionally slip into a Western accent and say “ruf” instead of “roof”. That I throw around a lot of software dev jargon; Python is neither a trip to the zoo nor dinner (well, once, and it wasn’t bad. A little chewy.) That Pickle Head means my colleague S… never mind. You get the idea.
  • I want my agent to extract necessary information from me in a way that fits my mood and situation. Am I running late for a life-changing meeting on a busy street uphill in a snowstorm? Maybe I’m just goofing around at home on a Saturday.
  • I want my agent to learn from me. It doesn’t have to know how to do everything on this list out of the box – that would be pretty creepy – but as it gets to know me it should be able to pick up on my cues, not to mention direct instructions.

Great, sign me up! So how do I get one? The key is to embrace training (as opposed to coding, crafting, and other manual activities). As long as there is a human in the loop, it is simply impossible to scale an agent platform to this level of personalization. There would be a separate and ongoing development project for every single end user… great job security for developers, but it would have to sell an awful lot of stuff.

To embrace training, we need to dissect what goes into training. Let’s over-simplify the “brain” of a conversational AI for a moment: we have NLU (natural language understanding), DM (dialogue management), and NLG (natural language generation). Want an automatically-produced agent? You have to automate all three of these components.

  • NLU – As of this writing, this is the most advanced component of the three. Today’s products often do incorporate at least some training automation, and that’s been a primary enabler that leads to the assistants that we have now. Improvements will need to include individualized NLU models that continually learn from each user, and the addition of (custom, rapid) language models that can expand upon the normal and ubiquitous day-to-day vocabulary to include trade-specific, hobby-specific, or even made-up terms. Yes, I want Alexa to speak my daughter’s imaginary language with her.
  • DM – Sorry developers, if we make plugin skills à la Mobile Apps 2.0 then we aren’t going to get anywhere. Dialogues are just too complex, and rules and logic are just too brittle. This cannot be a programming exercise. Agents must learn to establish goals and reason about using conversation to achieve those goals in an automated fashion.
  • NLG – Sorry marketing folks, there isn’t brilliant copy for you to write. The agent needs the flexibility to communicate to the user in the most effective way, and it can’t do that if it’s shackled by canned phrases that “reflect the brand”.
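The three-part “brain” described above can be sketched as a simple pipeline. Every name here (`Intent`, `Action`, `Agent`, and the `toy_*` functions) is hypothetical, and the hand-written stand-ins are exactly the kind of brittle rules the post argues against. They only mark the slots where trained, per-user models would plug in.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Intent:
    """Structured meaning produced by NLU."""
    name: str
    slots: dict

@dataclass
class Action:
    """Next step chosen by dialogue management."""
    name: str
    payload: dict

@dataclass
class Agent:
    nlu: Callable[[str], Intent]    # text -> meaning
    dm: Callable[[Intent], Action]  # meaning -> next action
    nlg: Callable[[Action], str]    # action -> reply text

    def respond(self, utterance: str) -> str:
        return self.nlg(self.dm(self.nlu(utterance)))

# Placeholder components; a trained model would replace each one.
def toy_nlu(text: str) -> Intent:
    if "weather" in text.lower():
        return Intent("get_weather", {})
    return Intent("unknown", {})

def toy_dm(intent: Intent) -> Action:
    if intent.name == "get_weather":
        return Action("report_weather", {"forecast": "sunny"})
    return Action("ask_clarification", {})

def toy_nlg(action: Action) -> str:
    if action.name == "report_weather":
        return f"It looks {action.payload['forecast']} today."
    return "Sorry, could you rephrase that?"

agent = Agent(nlu=toy_nlu, dm=toy_dm, nlg=toy_nlg)
reply = agent.respond("What's the weather like?")
print(reply)
```

Because each stage is just a callable, a personalized agent could swap in a different learned component per user without touching the rest of the pipeline.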

In my experience, most current offerings are focusing on the NLU component – and that’s awesome! But to realize the potential of MicroAgents (yeah, that’s right. MicroAgents. You heard it here first) we need to automate the entire agent, which is easier said than done. But that’s not to say that it’s not going to happen anytime soon – in fact, it might happen sooner than you think.  

Echo, I’m done writing. Post this sucker.

Doh!


 

In the 2011 Jeopardy! face-off between IBM’s Watson and Jeopardy! champions Ken Jennings and Brad Rutter, Jennings acknowledged his brutal takedown by Watson during the last double jeopardy by stating, “I for one welcome our new computer overlords.” This display of computer “intelligence” sparked a great deal of conversation among many groups of people, many of whom became concerned at what they perceived as Watson’s ability to think like a human. But, as BigR.io’s Director of Business Development Andy Horvitz points out in his blog “Watson’s Reckoning,” even the Artificial Intelligence technology with which Watson was produced is now obsolete.

The thing is, while Watson was once considered to be the cutting-edge technology of Artificial Intelligence, Artificial Intelligence itself isn’t even cutting-edge anymore. Now, before you start lecturing me about how AI is cutting-edge, let me explain.

Defining Artificial Intelligence

You see, as Bernard Marr points out, Artificial Intelligence is the overarching term for machines having the ability to carry out human tasks. In this regard, modern AI as we know it has already been around for decades – since the 1950s at least (especially thanks to the influence of Alan Turing). Moreover, some form of the concept of artificial intelligence dates back to ancient Greece, when philosophers started describing human thought processes as a symbolic system. It’s not a new concept, and it’s a goal that scientists have been working towards for as long as there have been machines.

The problem is that the term “artificial intelligence” has become a colloquial term applied when a machine mimics “cognitive” functions that humans associate with other human minds, such as “learning” and “problem solving.” But the thing is, AI isn’t necessarily synonymous with “human thought capable machines.” Any machine that can complete a task in a similar way that a human might can be considered AI. And in that regard, AI really isn’t cutting-edge.

What is cutting-edge are the modern approaches to Machine Learning, which sit at the cusp of “human-like” AI technology (like Deep Learning, but that’s for another blog).

Though many people (scientists and common folk alike) use the terms AI and Machine Learning interchangeably, Machine Learning actually has the narrower focus of using the core ideas of AI to help solve real-world problems. For example, while Watson can perform the seemingly human task of critically processing and answering questions (AI), it lacks the ability to use these answers in a way that’s pragmatic to solve real-world problems, like synthesizing queried information to find a cure for cancer (Machine Learning).

Additionally, as I’m sure you already know, Machine Learning is based upon the premise that these machines train themselves with data rather than by being programmed, which is not necessarily a requirement of Artificial Intelligence overall.
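As a small illustration of “trained with data rather than programmed,” the sketch below learns the Celsius-to-Fahrenheit rule from four example pairs instead of hard-coding the formula. The example data and the `fit_line` helper are assumptions chosen for this illustration. Ordinary least squares is just one of many ways a machine can learn from data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# "Programmed" approach: a human writes the rule down.
def programmed_c_to_f(c):
    return c * 9 / 5 + 32

# "Learned" approach: the rule is recovered from example pairs.
celsius = [-40.0, 0.0, 37.0, 100.0]
fahrenheit = [-40.0, 32.0, 98.6, 212.0]
slope, intercept = fit_line(celsius, fahrenheit)

def learned_c_to_f(c):
    return slope * c + intercept

print(round(slope, 3), round(intercept, 3))
```

Nobody told the second function about 9/5 or 32. It recovered both from the examples, which is the premise in miniature.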

https://xkcd.com/1838/

Why Know the Difference?

So why is it important to know the distinction between Artificial Intelligence and Machine Learning? Well, in many ways, it’s not as important now as it might be in the future. Since the two terms are used so interchangeably and Machine Learning is seen as the technology driving AI, hardly anyone would correct you if you were to use them incorrectly. But, as technology is progressing ever faster, it’s good practice to know some distinction between these terms for your personal and professional gains.

Artificial Intelligence, while a hot topic, is not yet widespread – but it might be someday. For now, when you want to inquire about AI for your business (or personal use), you probably mean Machine Learning instead. By the way, did you know we can help you with that? Find out more here.

We’re seeing and doing all sorts of interesting work in the Image domain. Recent blog posts, white papers, and roundtables capture some of this work, from image segmentation and classification to video highlights. But an Image area of broad interest that, to this point, we’ve only scratched the surface of is Video-based Anomaly Detection. It’s a challenging data science problem, in part due to the velocity of data streams and missing data, but it has wide-ranging solution applicability.

In-store monitoring of customer movements and behavior.

Motion sensing, the antecedent to Video-based Anomaly Detection, isn’t new and there are a multitude of commercial solutions in that area. Anomaly Detection is something different and it opens the door to new, more advanced applications and more robust deployments. Part of the distinction between the two stems from “sensing” what’s usual behavior and what’s different.

Anomaly Detection

Walkers in the park look “normal”. The bicyclist is the anomaly. 

Anomaly detection requires the ability to understand a motion “baseline” and to trigger notifications based on deviations from that baseline. Having this ability offers the opportunity to deploy AI-monitored cameras in many more real-world situations across a wide range of security use cases, smart city monitoring, and more, wherein movements and behaviors can be tracked and measured with higher accuracy and at a much larger scale than ever before.
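A minimal sketch of the “baseline plus deviation” idea, assuming each tracked object has already been reduced to a single speed measurement. The speeds, the threshold, and the `find_anomalies` name are all hypothetical. A production system would learn a far richer baseline from video with Deep Learning rather than use a simple z-score.

```python
import statistics

def find_anomalies(baseline, observations, threshold=3.0):
    """Flag observations more than `threshold` standard deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return [x for x in observations if abs(x - mean) / stdev > threshold]

# Hypothetical walking speeds (m/s) observed in a park: the "normal" baseline.
walker_speeds = [1.2, 1.4, 1.3, 1.5, 1.1, 1.3, 1.4, 1.2]

# New observations: three walkers and one bicyclist.
observed = [1.3, 1.25, 6.0, 1.45]

anomalies = find_anomalies(walker_speeds, observed)
print(anomalies)
```

Only the 6.0 m/s reading is flagged, mirroring the walkers-versus-bicyclist picture. In a deployed system that flag would either alert a person or trigger an automated response.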

With 500 million video cameras in the world tracking these movements, a new approach is required to deal with this mountain of data. For this reason, Deep Learning and advances in edge computing are enabling a paradigm shift from video recording and human watchers toward AI monitoring. Many systems will have humans “in the loop,” with people being alerted to anomalies. But others won’t. For example, in the near future, smart cities will automatically respond to heavy traffic conditions with adjustments to the timing of stoplights, and they’ll do so routinely without human intervention.

Human in the Loop

As on many AI fronts, this is an exciting time and the opportunities are numerous. Stay tuned for more from BigR.io, and let’s talk about your ideas on Video-based Anomaly Detection or AI more broadly.

A few months back, Treasury Secretary Steve Mnuchin said that AI wasn’t on his radar as a concern for taking over the American labor force and went on to say that such a concern might be warranted in “50 to 100 more years.” If you’re reading this, odds are you also think this is a naive, ill-informed view.

An array of experts, including Mnuchin’s former employer, Goldman Sachs, disagree with this viewpoint. PwC estimates that 38% of US jobs are at high risk of automation by the early 2030s. On the surface, that’s terrifying, and not terribly far into the future. It’s also a reasonable, thoughtful view, and a future reality for which we should prepare.

Naysayers maintain that the same was said of the industrial and technological revolutions and pessimistic views of the future labor market were proved wrong. This is true. Those predicting doom in those times were dead wrong. In both cases, technological advances drove massive economic growth and created huge numbers of new jobs.

Is this time different?

It is. Markedly so.

The industrial revolution delegated our labor to machines. Technology has tackled the mundane and repetitive, connected our world, and, more, has substantially enhanced individual productivity. These innovations replaced our muscle and boosted the output of our minds. They didn’t perform human-level functions. The coming wave of AI will.

Truckers and taxi and delivery drivers are the obvious low-hanging fruit, ripe for AI replacement. But the job losses will be much wider, cutting deeply into retail and customer service and impacting professional services like accounting, legal, and much more. AI won’t just take jobs. Its impacts on all industries will create new opportunities for software engineers and data scientists. The rate of job creation, however, will lag far behind that of job erosion.

But it’s not all bad! AI is a massive economic catalyst. The economy will grow and goods will be affordable. We’re going to have to adjust to a fundamental disconnect between labor and economic output. This won’t be easy. The equitable distribution of the fruits of this paradigm shift will dominate the social and political conversation of the next 5-15 years. And if I’m right more than wrong in this post, basic income will happen (if only after much kicking and screaming by many). We’ll be able to afford it. Not just that — most will enjoy a better standard of living than today while also working less.

I might be wrong. The experts might be wrong. You might think I’m crazy (let’s discuss in the comments). But independent of specific outcomes, I hope we can agree that we’re on the precipice of another technological revolution and these are exciting times!

Deep Learning: Image and Video Recognition

Written by Bruce Ho

BigR.io’s Chief Big Data Scientist

Abstract

This paper illustrates the advancements in implementing Deep Neural Networks for automatic feature extraction in image and video for applications including facial recognition, programmatic video highlights, and image segmentation and object classification. Given the limitations of human abilities in earlier extraction methods, these networks exponentially increase accuracy, output, and available feature selection options for further analysis. BigR.io specializes in the following industry use cases:

  • Image Recognition

  • Video Highlights

  • Anomaly Detection

 

ABOUT BIGR.IO

BigR.io is a technology consulting firm empowering data to drive analytics for revenue growth and operational efficiencies. Our teams deliver software solutions, data science strategies, enterprise infrastructure, and management consulting to the world’s largest companies. We are an elite group with MIT roots, shining when tasked with complex missions: assembling mounds of data from a variety of sources, building high-volume, highly-available systems, and orchestrating analytics to transform technology into perceivable business value. With extensive domain knowledge, BigR.io has teams of architects and engineers that deliver best-in-class solutions across a variety of verticals. This diverse industry exposure and our constant run-in with the cutting edge empower us with invaluable tools, tricks, and techniques. We bring knowledge and horsepower that consistently deliver innovative, cost-conscious, and extensible results to complex software and data challenges. Learn more at www.bigr.io.

 

OVERVIEW

Over the past few years, Deep Neural Network (DNN) capabilities have surpassed human parity in recognizing and interpreting images. These DNNs use Convolutional Neural Networks (CNNs) to automatically extract features from an input image with the use of convolution filters. Backpropagation then facilitates the learning by these filters of their kernel functions, starting with random values and ending up with elemental features that best represent the class of images being trained (for instance, nose, eye, and jaw shapes for face images). Image recognition is also where the highly coveted idea of transfer learning got its early foothold. Pre-trained models based on certain categories of images can be repurposed for various classification applications using only a small dataset. Since data preparation and labeling is one of the most challenging steps when carrying out supervised learning, the impact this concept has on accelerating this process cannot be overstated. Published models and datasets by some of the biggest players in the field (Google, Microsoft, etc.) now serve as a strong starting point to build robust application-specific models for businesses with only modest means for development.
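To show what a convolution filter does, here is a minimal sketch in plain Python. The kernel is hand-set to respond to vertical edges. In a real CNN the kernel values start random and are learned through backpropagation, as described above. Like most deep learning frameworks, `conv2d` here computes a cross-correlation (no kernel flip). The function and variable names are illustrative assumptions.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Tiny 4x4 "image": dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-set filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the column where the edge sits. That is the sense in which a filter “extracts” a feature, wherever it appears in the image.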

 

INDUSTRY USE CASES

Similar to the adoption of best practices in big data and data science across several industry verticals, image and video recognition solutions affect business outcomes across diverse government agencies and businesses. In this paper, we specifically examine use cases in the security and professional sports segments, but these solutions illustrate applications across all areas of video content creation, consumption, and monitoring.

 

IMAGE INSIGHTS

FCN8s

 

Image recognition can go beyond classifying an entire image. In dense prediction, we ask the neural network to detect the semantic context of any given pixel in a document or image. CNNs work by first finding image features that resemble certain filter functions, then floating such features to a top-level representation as a translation-invariant descriptor (e.g., detection of a nose, regardless of its position within the image). By combining both coarse- and fine-grained features at different scales, we obtain both the semantic context and location information of any one pixel. This opens the door for pixel-level semantic segmentation (aka dense prediction). Recent work on Fully Convolutional Networks (FCNs) leverages this capability to extract the semantic context of a digitized document. One could, for example, use FCNs to detect whether a particular pixel belongs to a title, section header, figure caption, image, or long paragraph. A mobile user could then easily re-layout or restyle an electronic document using the extracted semantic context. FCNs have also been successfully applied to segment parts of an image, as well as full documents, with remarkable accuracy.

How does a system pick potential customers from an image of a crowd, a soccer team, or a room full of event attendees? Given a close-up face shot, is this person happy to be here, in the target age group, or responding positively to the last sales message? Answering these audience measurement questions for marketing is one of the hot areas in need of a deep learning solution. Many classic approaches to facial feature extraction and classification, Support Vector Machines for example, have been devoted to this long-standing problem. Deep learning research in facial identification is relatively new but already outperforms older techniques by a wide margin. This development, and many other impressive improvements achieved by deep learning, are generally attributed to the automatic feature extraction function of neural networks and the incremental accuracy boost that deep learning techniques achieve when given a huge training dataset.

In many applications, a high-quality, close-up facial shot is not available. Picking faces out of an ordinary action photo may be the first step before applying any facial feature analysis. For this, region-based CNNs (R-CNNs) excel in both speed and accuracy. The R-CNN approach proposes a number of bounding boxes in the original photo using what is called Selective Search. In this method, initial object boundaries are set using a graph-based pixel similarity approach. Neighboring boxes with high pixel similarity metrics are then merged to further reduce the object count. Finally, each boxed object can be classified based on a pre-trained image recognition model.
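The box-merging step described above can be sketched as a greedy loop. This is a deliberately simplified stand-in, not the actual Selective Search algorithm: it assumes each candidate region is already summarized by a bounding box and a single mean intensity, and it treats a small intensity difference as "high pixel similarity". The names `merge_regions`, `boxes_touch`, and the `max_diff` threshold are hypothetical.

```python
def boxes_touch(a, b):
    """True if two (x1, y1, x2, y2) boxes overlap or share an edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def merge_regions(regions, max_diff=0.1):
    """Greedily merge touching regions whose mean intensities are similar.

    Each region is a (box, mean_intensity) pair with a non-zero box area.
    """
    regions = list(regions)
    merged = True
    while merged:
        merged = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                (box_a, mean_a), (box_b, mean_b) = regions[i], regions[j]
                if boxes_touch(box_a, box_b) and abs(mean_a - mean_b) <= max_diff:
                    # Replace the pair with their bounding union.
                    union = (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
                             max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))
                    wa, wb = area(box_a), area(box_b)
                    mean = (mean_a * wa + mean_b * wb) / (wa + wb)
                    del regions[j]
                    regions[i] = (union, mean)
                    merged = True
                    break
            if merged:
                break
    return regions


# Two adjacent, similar regions and one distant, dissimilar region (toy values).
candidates = [
    ((0, 0, 10, 10), 0.80),
    ((10, 0, 20, 10), 0.82),
    ((40, 40, 50, 50), 0.20),
]
proposals = merge_regions(candidates)
print(proposals)
```

Each surviving box would then be cropped and passed to a pre-trained classifier, a step that is outside the scope of this sketch.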

 

In other efforts, researchers have extended facial analysis to emotion detection. Classically, this simply involved image labeling: the subject exhibits a range of facial expressions, and a group of volunteers marks each as happy, sad, angry, etc., typically choosing among up to eight emotions. More recent work also incorporates dynamic facial movements, for example, capturing the complete sequence of facial movements for a smile or frown. A more generalizable model can be developed using linear scoring along the valence-arousal graph. A prediction of valence and arousal scores on future subjects can then be interpreted using a wider range of emotion states instead of the initial selection of about eight.

 

valence-arousal plot

Reference: G. Paltoglou and M. Thelwall, “Seeing Stars of Valence and Arousal in Blog Posts,” IEEE Transactions on Affective Computing, vol. 4, no. 1, Jan.–Mar. 2013.

Points on the valence-arousal plot can be translated to commonly understood emotions.
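One simple way to do that translation is a nearest-neighbor lookup against labeled anchor points. The sketch below assumes valence and arousal are both scaled to the range -1 to 1; the anchor coordinates in `EMOTION_ANCHORS` are illustrative guesses chosen for the example, not values taken from the cited paper.

```python
import math

# Illustrative anchor points on a valence-arousal plane scaled to [-1, 1].
# These coordinates are assumptions for the sketch, not published values.
EMOTION_ANCHORS = {
    "happy": (0.8, 0.5),
    "excited": (0.6, 0.9),
    "calm": (0.6, -0.6),
    "sad": (-0.7, -0.4),
    "angry": (-0.7, 0.7),
    "bored": (-0.4, -0.8),
}


def nearest_emotion(valence, arousal):
    """Return the label whose anchor is closest to the predicted point."""
    point = (valence, arousal)
    return min(EMOTION_ANCHORS, key=lambda name: math.dist(point, EMOTION_ANCHORS[name]))


print(nearest_emotion(0.7, 0.4))    # high valence, moderate arousal
print(nearest_emotion(-0.6, -0.5))  # low valence, low arousal
```

Because the model predicts two continuous scores rather than a fixed label, the set of anchors can be extended later without retraining.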

 

VIDEO HIGHLIGHTS

There are numerous highlights in every major sporting event. Manual real-time extraction of these highlights by fully attentive labelers is error-prone, labor-intensive, very expensive, and hard to scale. Furthermore, while the most recent games may benefit from manual labeling, there are years of archived footage that remain unprocessed. Human observers who are instructed to look only for specific events also overlook most off-stats highlights, such as a ball boy slipping while chasing a tennis ball or a Major League splitter in a Little League game.

Today, we can extract video highlights programmatically using video recognition techniques. In addition to applying CNNs to static image features, Recurrent Neural Networks (RNNs) are able to classify video segments using optical flow between image frames. This technique is easily trained not only to extract official stat events, but also to extract any interesting player motion not explicitly logged and indexed, for example, an alley-oop in basketball. Due to the automated nature of these extraction tasks, studios can come up with new ideas at any time to build upon an existing menu of highlights.
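As a rough illustration of using motion between frames to locate candidate segments, the sketch below thresholds simple frame differencing. This is a crude stand-in for optical flow and involves no RNN or training; it only shows how per-frame motion signals can be turned into segment boundaries. The function names, the `threshold`, and the `min_length` parameter are assumptions for the example.

```python
def motion_energy(prev_frame, frame):
    """Mean absolute pixel change between two equally sized grayscale frames."""
    total = 0.0
    count = 0
    for prev_row, row in zip(prev_frame, frame):
        for p, q in zip(prev_row, row):
            total += abs(q - p)
            count += 1
    return total / count


def highlight_segments(frames, threshold, min_length=2):
    """Return inclusive (start, end) frame-index pairs with sustained motion.

    Index k refers to the change between frame k-1 and frame k.
    """
    segments = []
    start = None
    for idx in range(1, len(frames)):
        active = motion_energy(frames[idx - 1], frames[idx]) > threshold
        if active and start is None:
            start = idx
        elif not active and start is not None:
            if idx - start >= min_length:
                segments.append((start, idx - 1))
            start = None
    # Close a segment that runs to the end of the clip.
    if start is not None and len(frames) - start >= min_length:
        segments.append((start, len(frames) - 1))
    return segments


# Toy clip of uniform 2x2 frames: still, then three frames of change, then still.
levels = [0, 0, 0, 5, 10, 15, 15, 15]
clip = [[[v, v], [v, v]] for v in levels]
segments = highlight_segments(clip, threshold=1.0)
print(segments)
```

In a production pipeline, the per-frame signal would come from a learned model over optical flow and image features, with the segment logic staying much the same.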

Going beyond sporting events, any kind of motion picture, video ad, or short-form video opens itself up for potential indexing and repurposing. For example, a DC Comics fan may want the ability to easily find all encounters between female superheroes within the DC universe. This task requires automatic video highlight extraction, which is the key to reviving and monetizing vast archives of content that would otherwise remain buried and forgotten.

 

Image: Durant eyeing Rihanna after hitting a 3-pointer (she was cheering for LeBron).

 

ANOMALY DETECTION

Independent Component Analysis (ICA) is one approach to learning the sparse feature representations used for anomaly detection, with many proposed variants. An ICA-based deep sparse feature extraction strategy combined with a non-parametric Bayesian approach can automatically determine the optimal dimension for the latent feature vector, removing the heavy labor in parameter tuning that a full deep learning approach would entail. The reported accuracy improvement exceeds 10% over previous results. Variants of Restricted Boltzmann Machines (RBMs) are another major direction of research for deep sparse representation. While much progress has been made on the theoretical front, the experimental results thus far lag behind the best ICA models.

Reference: Y. Cong, J. Yuan, and J. Liu, “Sparse reconstruction cost for abnormal event detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011, pp. 3449–3456.

The graph on the right is a sparse vector representation of the image on the left. The vector dimensions, called training bases, are laid out along the x-axis, with the bars representing the coefficients for the bases needed to represent the image. A normal sample (top) can be represented as a sparse linear combination of the training bases, while an anomalous sample (bottom) requires a large number of bases.
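The contrast between a sparse and a dense code can be reproduced on a toy scale. The sketch below uses greedy matching pursuit over a small hand-built orthonormal dictionary and counts how many bases each sample needs. The dictionary, the samples, and the function name `matching_pursuit` are assumptions for illustration; this is not the method or data from the cited paper, and real training bases would be learned from normal footage.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(sample, bases, tol=1e-6, max_steps=None):
    """Greedy sparse coding: return {basis_index: coefficient}.

    Assumes every basis vector has unit length.
    """
    residual = list(sample)
    coeffs = {}
    steps = max_steps if max_steps is not None else len(bases)
    for _ in range(steps):
        if dot(residual, residual) ** 0.5 <= tol:
            break
        # Pick the basis most correlated with what is still unexplained.
        scores = [dot(residual, b) for b in bases]
        best = max(range(len(bases)), key=lambda k: abs(scores[k]))
        coeffs[best] = coeffs.get(best, 0.0) + scores[best]
        residual = [r - scores[best] * x for r, x in zip(residual, bases[best])]
    return coeffs


# Toy orthonormal "training bases" (illustrative only).
BASES = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]

NORMAL = [1.5, 1.5, 0.5, 0.5]     # built from two bases
ANOMALOUS = [1.0, 0.0, 0.0, 0.0]  # spreads across every basis

normal_code = matching_pursuit(NORMAL, BASES)
anomalous_code = matching_pursuit(ANOMALOUS, BASES)
print(len(normal_code), len(anomalous_code))
```

A detector could flag a sample when the number of active bases, or the total coefficient magnitude, exceeds a threshold calibrated on normal data.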

 

CONCLUSION

Recent advancements in image and video recognition pave the way for many business applications that would have been unimaginably hard or expensive to implement before. BigR.io excels at the application of deep learning to images and electronic documents for use cases ranging from facial recognition, to programmatic video highlights, to image segmentation and object classification.

For many years, and with rapidly accelerating levels of targeting sophistication, marketers have been tailoring their messaging to our tastes. Leveraging our data and capitalizing upon our shopping behaviors, they have successfully delivered finely-tuned, personalized messaging.

Consumers are curating their media ever more by the day. We’re buying smaller cable bundles, cutting cords, and buying OTT services a la carte. At the same time, we’re watching more and more short-form video. Video media is tilting toward snack-size bites and, of course, on demand.

Cable has been in decline for years, and the effects are now hitting ESPN, once the mainstay of a cable package. Even live sports programming, long considered must-see and even bulletproof by media executives, has seen declining viewership.

 

So what’s to be done?

To thrive, and perhaps merely to survive, content owners must adapt. Leagues and networks have come a long way toward embracing a “TV Everywhere” distribution model despite the obnoxious gates at every turn. But that’s not enough and the sports leagues know it.

While there are many reasons for declining viewership and low engagement among younger audiences, the length of games and broadcasts is a significant factor. The leagues recognize that games are too long. The NBA has made some changes that will speed up the action, and the NFL is also considering shortening games to avoid losing viewership. MLB has long been tinkering in the same vein. These changes are small and incremental, and they do little to stem the decline in viewership.

Most sporting events are characterized by long stretches of calm, less interesting play that is occasionally accented by higher intensity action. Consider for a moment how much actual action there is in a typical football or baseball game. Intuitively, most sports fans know that the bulk of the three-hour event is consumed by time between plays and pitches. Still, it’s shocking to see the numbers from the Wall Street Journal, which point out that there are only 11 minutes of action in a typical football game and a mere 18 minutes in a typical baseball game.

 

A transformational opportunity

There is so much more the leagues can do. Recent advances in neural network technology have enabled an array of features to be extracted from streaming video. The applications are broad and the impacts significant. In this sports media context, the opportunity is nothing short of transformational.

Computers can now be trained to programmatically classify the action in the underlying video. With intelligence around what happens where in the game video, the productization opportunities are endless. Fans could catch all of the action, or whatever plays and players are most important to them, in just a few minutes. With a large indexed database of sports media content, the leagues could present near unlimited content personalization to fans.

Want to see David Ortiz’s last ten home runs? Done.

Want to see Tom Brady’s last ten TD passes? You’re welcome.
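Requests like these reduce to a lookup against an index keyed by player and event type. The sketch below is a minimal in-memory version; the `Clip` fields, the placeholder player names, and the clip IDs are all hypothetical, and a real system would populate the index from the output of the video classifier and store it in a database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    clip_id: str
    player: str
    event: str
    timestamp: int  # any sortable time value, e.g. seconds since epoch


def build_index(clips):
    """Group clips by (player, event), newest first."""
    index = {}
    for clip in clips:
        index.setdefault((clip.player, clip.event), []).append(clip)
    for bucket in index.values():
        bucket.sort(key=lambda c: c.timestamp, reverse=True)
    return index


def latest(index, player, event, limit=10):
    """Return up to `limit` most recent clips for a player and event type."""
    return index.get((player, event), [])[:limit]


# Placeholder records standing in for classifier output.
CLIPS = [
    Clip("c1", "player_a", "home_run", 100),
    Clip("c2", "player_a", "home_run", 300),
    Clip("c3", "player_b", "td_pass", 200),
    Clip("c4", "player_a", "home_run", 200),
]

index = build_index(CLIPS)
recent = [clip.clip_id for clip in latest(index, "player_a", "home_run", limit=2)]
print(recent)
```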

Robust features like these will drive engagement and revenue. With this level of control, fans are more likely to subscribe to premium offerings, generating predictable recurring revenue that will outpace advertising in the long run.

Computer-driven, personalized content is going to happen. It’s going to be amazing, and we are one step closer to getting there.