Artificial Intelligence has been promising to revolutionize healthcare for quite some time; however, one look at any modern hospital or healthcare facility, and it is easy to see that the revolution is already here.

In almost every patient touchpoint AI is already having an enormous impact on changing the way healthcare is delivered, streamlining operations, improving diagnostics, and improving outcomes.

Although the deployment of AI in the healthcare sector is still in its infancy, it is becoming a much more common sight. According to technology consulting firm, Gartner, healthcare IT spending for 2021 was a hefty $140 billion worldwide, with enterprises listing “AI and robotic process automation (RPA)” as their lead spending priorities.

Here, in no particular order or importance, are seven of the top areas where healthcare AI solutions are being developed and currently deployed.

1. Operations and Administration
A hospital’s operation and administration expenses can be a major drain on the healthcare system. AI is already providing tools and solutions that are designed to improve and streamline administration. Such AI algorithms are proving to be invaluable for insurers, payers, and providers alike. Specifically, there are several AI programs and AI healthcare startups that are dedicated to finding and eliminating fraud. It has been estimated that healthcare fraud costs insurers anywhere between $70 billion and $234 billion each year, harming both patients and taxpayers.

2. Medical Research
Probably one of the most promising areas where AI is making a major difference in healthcare is in medical research. AI tools and software solutions are making an astounding impact on streamlining every aspect of medical research, from improved screening of candidates for clinical trials, to targeted molecules in drug discovery, to the development of “organs on a chip” – AI combined with the power of ever-improving Natural Language Processing (NLP) is changing the very nature of medical research for the better.

3. Predictive Outcomes and Resource Allocation
AI is being used in hospital settings to better predict patient outcomes and more efficiently allocate resources. This proved extraordinarily helpful during the peak of the pandemic when facilities were able to use AI algorithms to predict upon admission to the ER, which patients would most benefit from ventilators, which were in very short supply. Similarly, a Stanford University pilot project is using AI algorithms to determine which patients are at high risk of requiring ICU care within an 18 to 24 hours period.

4. Diagnostics
AI applications in diagnostics, particularly in the field of medical imaging, are extraordinary. AI can “see” details in MRIs and other medical images far greater than the human eye and, when tied into the enormous volume of medical image databases, can make far more accurate diagnoses of conditions such as breast cancer, eye disease, heart, and lung disease and so much more. AI can look at vast numbers of medical images and then identify patterns in seconds that would take human technicians hours or days to do. AI can also detect minor variations that humans simply could not find, no matter how much time they had. This not only improves patient outcomes but also saves money. For example, studies have found that earlier diagnosis and treatment of most cancers can cut treatment costs by more than 50%.

5. Training
AI is allowing medical students and doctors “hands-on training” via virtual surgeries and other procedures that can provide real-time feedback on success and failure. Such AI-based training programs allow students to learn techniques in safe environments and receive immediate critique on their performance before they get anywhere near a patient. One study found that med students learned skills 2.6 times faster and performed 36% better than those not taught with AI.

6. Telemedicine
Telemedicine has revolutionized patient care, particularly since the pandemic, and now AI is taking remote medicine to a whole new level where patients can tie AI-driven diagnostic tools through their smartphones and provide remote images and monitoring of changes in detectable skin cancers, eye conditions, dental conditions and more. AI programs are also being used to remotely monitor heart patients, diabetes patients, and others with chronic conditions and help to ensure they are complying with taking their medications.

7. Direct treatment
In addition to adding better clinical outcomes with improved diagnostics and resource allocation, AI is already making a huge difference in the direct delivery of treatments. One exciting and extremely profound example of this is robotic/AI-driven surgical procedures. Minimally invasive and non-invasive AI-guided surgical procedures are already becoming quite common. Soon, all but some of the most major surgeries, such as open heart surgeries, can and will be done as minimally invasive procedures, and even the most complex “open procedures” will be made safer, more accurate, and more efficient thanks to surgical AI and digital twins of major organs such as lungs and the heart.

Rohit Mahajan is a Managing Partner with BigRio. He has a particular expertise in the development and design of innovative solutions for clients in Healthcare, Financial Services, Retail, Automotive, Manufacturing, and other industry segments.
BigRio is a technology consulting firm empowering data to drive innovation, and advanced AI. We specialize in cutting-edge Big Data, Machine Learning, and Custom Software strategy, analysis, architecture, and implementation solutions. If you would like to benefit from our expertise in these areas or if you have further questions on the content of this article, please do not hesitate to contact us.

NLP evolved to be an important way to track and categorize viewership in the age of cookie-less ad targeting. While users resist being identified by a single user ID, they are much less sensitive to and even welcome the chance for advertisers to personalize media content based on discovered preferences. This personalization comes from improvements made upon the original LDA algorithm and incorporate word2vec concepts.

The classic LDA algorithm developed at Columbia University raised industry-wide interest in computerized understanding of documents. It incidentally also launched variational inference as a major research direction in Bayesian modeling. The ability of LDA to process massive amounts of documents, extract their main theme based on a manageable set of topics and compute with relative high efficiency (compared to the more traditional Monte Carlo methods which sometimes run for months) made LDA the de facto standard in document classification.

However, the original LDA approach left the door open on certain desirable properties. It is, at the end, fundamentally just a word counting technique. Consider these two statements:

“His next idea will be the breakthrough the industry has been waiting for.”

“He is praying that his next idea will be the breakthrough the industry has been waiting for.”

After removal of common stop words, these two semantically opposite sentences have almost identical word count features. It would be unreasonable to expect a classifier to tell them apart if that’s all you provide it as inputs.

The latest advances in the field improve upon the original algorithm on several fronts. Many of them incorporate the word2vec concept where an embedded vector is used to represent each word in a way that reflects its semantic meaning. E.g. king – man + woman = queen

Autoencoder variational inference (AVITM) speeds up inference on new documents that are not part of the training set. It’s variant prodLDA uses product of experts to achieve higher topic coherence. Topic-based classification can potentially perform better as a result.

Doc2vec – generates semantically meaningful vectors to represent a paragraph or entire document in a word order preserving manner.

LDA2vec – derives embedded vectors for the entire document in the same semantic space as the word vectors.

Both Doc2vec and LDA2vec provide document vectors ideal for classification applications.

All these new techniques achieve scalability using either GPU or parallel computing. Although research results demonstrate a significant improvement in topic coherence, many investigators now choose to deemphasize topic distribution as the means of document interpretation. Instead, the unique numerical representation of the individual documents became the primary concern when it comes to classification accuracy. The derived topics are often treated as simply intermediate factors, not unlike the filtered partial image features in a convolutional neural network.

With all this talk of the bright future of Artificial Intelligence (AI), it’s no surprise that almost every industry is looking into how they will reap the benefits from the forthcoming (dare I say already existing?) AI technologies. For some, AI will merely enhance the technologies already being used. For others, AI is becoming a crucial component to keeping the industry alive. Healthcare is one such industry.

The Problem: Diminishing Labor Force

Part of the need for AI-based Healthcare stems from the concern that one-third of nurses are baby boomers, who will retire by 2030, taking their knowledge with them. This drastic shortage in healthcare workers poses the imminent need for replacements and, while the enrollment numbers in nursing school stay stable, the demand for experienced workers will continue to increase. This need for additional clinical support is one area where AI comes into play. In fact, these emerging technologies will not only help serve as a multiplier force for experienced nurses, but for doctors and clinical staff support as well.

Healthcare-AI Automation Applications to the Rescue

One of the most notable solutions for this shortage will be automating processes for determining whether or not a patient actually needs to visit a doctor in-person. Doctors’ offices are currently inundated with appointments and patients who’s lower-level questions and concerns could be addressed without a face-to-face consultation via mobile applications. Usually in the from of chatbots, these AI-powered applications can provide basic healthcare support by “bringing the doctor to the patient” and alleviating the need for the patient to leave the comfort of their home, let alone scheduling an appointment to go in-office and visit a doctor (saving time and resources for all parties involved).

Should a patient need to see a doctor,  these applications also contain schedulers capable of determining appointment type, length, urgency, and available dates/times, foregoing the need for constant human-based clinical support and interaction. With these AI schedulers also comes AI-based Physician’s Assistants that provide additional in-office support like scheduling follow-up appointments, taking comprehensive notes for doctors, ordering specific prescriptions and lab testing, providing drug interaction information for current prescriptions, etc. And this is just one high-level AI-based Healthcare solution (albeit with many components).

With these advancements, Healthcare stands to gain significant ground with the help of domain-specific AI capabilities that were historically powered by humans. As a result, the next generation of healthcare has already begun, and it’s being revolutionized by AI.

Sometimes I get to thinking that Alexa isn’t really my friend. I mean sure, she’s always polite enough (well, usually, but it’s normal for friends to fight, right?). But she sure seems chummy with that pickle-head down the hall too. I just don’t see how she can connect with us both — we’re totally different!

So that’s the state of the art of conversational AI: a common shared agent that represents an organization. A spokesman. I guess she’s doing her job, but she’s not really representing me or M. Pickle, and she can’t connect with either of us as well as she might if she didn’t have to cater to both of us at the same time. I’m exaggerating a little bit – there are some personalization techniques (*cough* crude hacks *cough*) in place to help provide a custom experience:

  • There is a marketplace of skills. Recently, I can even ask her to install one for me.
  • I have a user profile. She knows my name and zip code.
  • Through her marketplace, she can access my account and run my purchase through a recommendation engine (the better to sell you with, my dear!)
  • I changed her name to “Echo” because who has time for a third syllable? (If only I were hamming this up for the post; sadly, a true story)
  • And if I may digress to my other good friend Siri, she speaks British to me now because duh.

It’s a start but, if we’re honest, none of these change the agent’s personality or capabilities to fit with all of my quirks, moods, and ever-changing context and situation. Ok, then. What’s on my wishlist?

  • I want my own agent with its own understanding of me, able to communicate and serve as an extension of myself.
  • I want it to learn everything about how I speak. That I occasionally slip into a Western accent and say “ruf” instead of “roof”. That I throw around a lot of software dev jargon; Python is neither a trip to the zoo nor dinner (well, once, and it wasn’t bad. A little chewy.) That Pickle Head means my colleague S… nevermind. You get the idea.
  • I want my agent to extract necessary information from me in a way that fits my mood and situation. Am I running late for a life-changing meeting on a busy street uphill in a snowstorm? Maybe I’m just goofing around at home on a Saturday.
  • I want my agent to learn from me. It doesn’t have to know how to do everything on this list out of the box – that would be pretty creepy – but as it gets to know me it should be able to pick up on my cues, not to mention direct instructions.

Great, sign me up! So how do I get one? The key is to embrace training (as opposed to coding, crafting, and other manual activities). As long as there is a human in the loop, it is simply impossible to scale an agent platform to this level of personalization. There would be a separate and ongoing development project for every single end user… great job security for developers, but it would have to sell an awful lot of stuff.

To embrace training, we need to dissect what goes into training. Let’s over-simplify the “brain” of a conversational AI for a moment: we have NLU (natural language understanding), DM (dialogue management), and NLG (natural language generation). Want an automatically-produced agent? You have to automate all three of these components.

  • NLU – As of this writing, this is the most advanced component of the three. Today’s products often do incorporate at least some training automation, and that’s been a primary enabler that leads to the assistants that we have now. Improvements will need to include individualized NLU models that continually learn from each user, and the addition of (custom, rapid) language models that can expand upon the normal and ubiquitous day-to-day vocabulary to include trade-specific, hobby-specific, or even made-up terms. Yes, I want Alexa to speak my daughter’s imaginary language with her.
  • DM – Sorry developers, if we make plugin skills ala Mobile Apps 2.0 then we aren’t going to get anywhere. Dialogues are just too complex, and rules and logic are just too brittle. This cannot be a programming exercise. Agents must learn to establish goals and reason about using conversation to achieve those goals in an automated fashion.
  • NLG – Sorry marketing folks, there isn’t brilliant copy for you to write. The agent needs the flexibility to communicate to the user in the most effective way, and it can’t do that if it’s shackled by canned phrases that “reflect the brand”.

In my experience, most current offerings are focusing on the NLU component – and that’s awesome! But to realize the potential of MicroAgents (yeah, that’s right. MicroAgents. You heard it here first) we need to automate the entire agent, which is easier said than done. But that’s not to say that it’s not going to happen anytime soon – in fact, it might happen sooner than you think.  

Echo, I’m done writing. Post this sucker.

Doh!

 

In the 2011 Jeopardy! face-off between IBM’s Watson and Jeopardy! champions Ken Jennings and Brad Rutter, Jennings acknowledged his brutal takedown by Watson during the last double jeopardy in stating “I for one welcome our new computer overlords.” This display of computer “intelligence” sparked mass amounts of conversation amongst myriad groups of people, many of whom became concerned at what they perceived as Watson’s ability to think like a human. But, as BigR.io’s Director of Business Development Andy Horvitz points out in his blog “Watson’s Reckoning,” even the Artificial Intelligence technology with which Watson was produced is now obsolete.

The thing is, while Watson was once considered to be the cutting-edge technology of Artificial Intelligence, Artificial Intelligence itself isn’t even cutting-edge anymore. Now, before you start lecturing me about how AI is cutting-edge, let me explain.

Defining Artificial Intelligence

You see, as Bernard Marr points out, Artificial Intelligence is the overarching term for machines having the ability to carry out human tasks. In this regard, modern AI as we know it has already been around for decades – since the 1950s at least (especially thanks to the influence of Alan Turing). Moreso, some form of the concept of artificial intelligence dates back to ancient Greece when philosophers started describing human thought processes as a symbolic system. It’s not a new concept, and it’s a goal that scientists have been working towards for as long as there have been machines.

The problem is that the term “artificial intelligence” has become a colloquial term applied when a machine mimics “cognitive” functions that humans associate with other human minds, such as “learning” and “problem solving.” But the thing is, AI isn’t necessarily synonymous with “human thought capable machines.” Any machine that can complete a task in a similar way that a human might can be considered AI. And in that regard, AI really isn’t cutting-edge.

What is cutting-edge are the modern approaches to Machine Learning, which have become the cusp of “human-like” AI technology (like Deep Learning, but that’s for another blog).

Though many people (scientists and common folk alike) use the terms AI and Machine Learning interchangeably, Machine Learning actually has the narrower focus of using the core ideas of AI to help solve real-world problems. For example, while Watson can perform the seemingly human task of critically processing and answering questions (AI), it lacks the ability to use these answers in a way that’s pragmatic to solve real-world problems, like synthesizing queried information to find a cure for cancer (Machine Learning).

Additionally, as I’m sure you already know, Machine Learning is based upon the premise that these machines train themselves with data rather than by being programmed, which is not necessarily a requirement of Artificial Intelligence overall.

https://xkcd.com/1838/

Why Know the Difference?

So why is it important to know the distinction between Artificial Intelligence and Machine Learning? Well, in many ways, it’s not as important now as it might be in the future. Since the two terms are used so interchangeably and Machine Learning is seen as the technology driving AI, hardly anyone would correct you if were you to use them incorrectly. But, as technology is progressing ever faster, it’s good practice to know some distinction between these terms for your personal and professional gains.

Artificial Intelligence, while a hot topic, is not yet widespread – but it might be someday. For now, when you want to inquire about AI for your business (or personal use), you probably mean Machine Learning instead. By the way, did you know we can help you with that? Find out more here.

We’re seeing and doing all sorts of interesting work in the Image domain. Recent blog posts, white papers, and roundtables capture some of this work, such as image segmentation and classification to video highlights. But an Image area of broad interest that, to this point, we’ve but scratched the surface of is Video-based Anomaly Detection. It’s a challenging data science problem, in part due to the velocity of data streams and missing data, but has wide-ranging solution applicability.

In-store monitoring of customer movements and behavior.

Motion sensing, the antecedent to Video-based Anomaly Detection, isn’t new and there are a multitude of commercial solutions in that area. Anomaly Detection is something different and it opens the door to new, more advanced applications and more robust deployments. Part of the distinction between the two stems from “sensing” what’s usual behavior and what’s different.

Anomaly Detection

Walkers in the park look “normal”. The bicyclist is the anomaly. 

Anomaly detection requires the ability to understand a motion “baseline” and to trigger notifications based on deviations from that baseline. Having this ability offers the opportunity to deploy AI-monitored cameras in many more real-world situations across a wide range of security use cases, smart city monitoring, and more, wherein movements and behaviors can be tracked and measured with higher accuracy and at a much larger scale than ever before.

With 500 million video cameras in the world tracking these movements, a new approach is required to deal with this mountain of data. For this reason, Deep Learning and advances in edge computing are enabling a paradigm shift from video recording and human watchers toward AI monitoring. Many systems will have humans “in the loop,” with people being alerted to anomalies. But others won’t. For example, in the near future, smart cities will automatically respond to heavy traffic conditions with adjustments to the timing of stoplights, and they’ll do so routinely without human intervention.

Human in the Loop

Human in the loop.

As on many AI fronts, this is an exciting time and the opportunities are numerous. Stay tuned for more from BigR.io, and let’s talk about your ideas on Video-based Anomaly Detection or AI more broadly.

A few months back, Treasury Secretary Steve Mnuchin said that AI wasn’t on his radar as a concern for taking over the American labor force and went on to say that such a concern might be warranted in “50 to 100 more years.” If you’re reading this, odds are you also think this is a naive, ill-informed view.

An array of experts, including Mnuchin’s former employer, Goldman Sachs, disagree with this viewpoint. As PwC states, 38% of US jobs will be gone by 2030. On the surface, that’s terrifying, and not terribly far into the future. It’s also a reasonable, thoughtful view, and a future reality for which we should prepare.

Naysayers maintain that the same was said of the industrial and technological revolutions and pessimistic views of the future labor market were proved wrong. This is true. Those predicting doom in those times were dead wrong. In both cases, technological advances drove massive economic growth and created huge numbers of new jobs.

Is this time different?

It is. Markedly so.

The industrial revolution delegated our labor to machines. Technology has tackled the mundane and repetitive, connected our world, and, more, has substantially enhanced individual productivity. These innovations replaced our muscle and boosted the output of our minds. They didn’t perform human-level functions. The coming wave of AI will.

Truckers, taxi and delivery drivers, they are the obvious, low-hanging fruit, ripe for AI replacement. But the job losses will be much wider, cutting deeply into retail and customer service, impacting professional services like accounting, legal, and much more. AI won’t just take jobs. Its impacts on all industries will create new opportunities for software engineers and data scientists. The rate of job creation, however, will lag far behind that of job erosion.

But it’s not all bad! AI is a massive economic catalyst. The economy will grow and goods will be affordable. We’re going to have to adjust to a fundamental disconnect between labor and economic output. This won’t be easy. The equitable distribution of the fruits of this paradigm shift will dominate the social and political conversation of the next 5-15 years. And if I’m right more than wrong in this post, basic income will happen (if only after much kicking and screaming by many). We’ll be able to afford it. Not just that — most will enjoy a better standard of living than today while also working less.

I might be wrong. The experts might be wrong. You might think I’m crazy (let’s discuss in the comments). But independent of specific outcomes, I hope we can agree that we’re on the precipice of another technological revolution and these are exciting times!

When I was in graduate school, I designed a construction site of the future. It was in collaboration with Texas Instruments in the late 90s. The big innovation, at the time, was RFID (radio-frequency identification). Not that RFID was new. In fact, it has been around since World War II where it was used to identify allied planes. After the war, it made its way into industry through anti-theft applications. In the 80s, a group of scientists from Los Alamos National Laboratory formed a company using RFID for toll payment systems (still in use today). A separate group of scientists there also created a system for tracking medication management in livestock. From here it made its way into multiple other applications and began to proliferate.

RFID got a boost in 1999 when two MIT professors, David Brock and Sanjay Sarma, reversed the trend of adding more memory and more functionality to the tags and stripped them down to a low-cost, very simple microchip. The data gleaned from the chip was stored in a database and was accessible via the web. This was right at the time that the wireless web emerged (good old CDPD) as well, which really bolstered widespread adoption. This also precipitated funding from large companies, like Procter & Gamble and Gillette (this was before P&G acquired Gillette), to institute the Auto-ID Center at MIT, which furthered the creation of standards and cemented RFID as an invaluable weapon for companies, especially those with complex supply chains.

OK, as you can tell, RFID has a special place in my heart. I even patented the idea of marrying RFID with images, but that is another story. Anyway, up to this point you’ve probably decided this is a post about RFID, but it’s not. It’s a post about RFID to IoT (Internet of Things). The term Internet of Things (IoT) was first coined by British entrepreneur Kevin Ashton in 1999 while working at Auto-ID Labs, specifically referring to a global network of objects connected by RFID. But RFID is just one type of sensor and there are numerous sensors out there. I like this definition from Wikipedia:

In the broadest definition, a sensor is an electronic component, module, or subsystem whose purpose is to detect events or changes in its environment and send the information to other electronics, frequently a computer processor. A sensor is always used with other electronics, whether as simple as a light or as complex as a computer.

Sensors have been around for quite some time in various forms. The first thermostat came to market in 1883, and many consider this the first modern, manmade sensor. Infrared sensors have been around since the late 1940s, even though they’ve really only recently entered the popular nomenclature. Motion detectors have been in use for a number of years as well. Originally invented by Heinrich Hertz in the late 1800s, they were advanced in World War II in the form of radar technology. There are numerous other sensors: biotech, chemical, natural (e.g. heat and pressure), sonar, infrared, microwave, and silicon sensors to name a few.

According to Gartner, there are currently 8 Billion IoT Units worldwide and there will be 20 Billion by 2020. Suffice to say there are numerous sources of data to track “things” within an organization and throughout supply chains. There are also numerous complexities to managing all of these sensors, the data they generate, and the actionable intelligence that is extracted and needs to be acted on. Some major obstacles are networks with time delays, switching topologies, density of units in a bounded region, and metadata management (especially across trading partners and customers). These are all challenges we at BigR.io have helped customers work through and resolve. A great example is our Predictive Maintenance offering.

Let’s get back to RFID to IoT. There is a tight coupling because the IP address of the unit needs to be supplemented with other information about the thing (for example, condition, context, location, security, etc). RFID and other sensors working in unison can provide this supplemental information. This marriage enables advanced analytics including the ability to make predictions. Large sensor networks must be properly architected to enable effective sensor fusion. Machine Learning helps take IoT to the next level of sophistication for predictions and automation for fixes and can help figure out when and where every ”thing” fits in the ecosystem that they play in. A proper IoT agent should monitor the health of the systems individually and in relation to other parts. Consensus filters will help in the analysis of the convergence, noise propagation reduction, and ability to track fast signals.

There are other factors that play into why IoT is so hot right now: the whole Big Data phenomenon has lent itself to the growth, endless compute power has served as a foundation by which advanced applications using IoT can run, and the Machine Learning libraries have been democratized by companies like Google, Facebook, and Microsoft. In general, Machine Learning thrives when mounds of data are available. However, storing all data is cost prohibitive and there is so much data being generated that most companies opt to only store bits of critical data. Some companies only store the data to freeze it from failures. You may not want to store all data, but you don’t want to lose “metadata,” or the key information that the data is trying to tell you, whether from the sensor itself or indirectly through neighboring sensors. I had a stint where we supported Federal and Defense-related sensor fusion initiatives and I picked up a handy classification of data:

  • Data
  • Information
  • Knowledge
  • Intelligence

The flow is moving the metadata being generated down the line into information → knowledge → intelligence that can be acted upon.

There also exists the ABCs of Data Context:

[A]pplication Context: Describes how raw bits are interpreted for use.

[B]ehavioral Context: Information about how data was created and used by real people or systems.

[C]hange Over Time: The version history of the other two forms of data context.

Data context plays a major role in harnessing the power of an IoT network. As we progress to smarter networks, more sophisticated sensors, and artificial intelligence that manages our “things,” the architecture of your infrastructure (enterprise data hub), the cultivation and management of your data flows, and the analytics automation that rides on top of everything become critical for day-to-day operations. The good news is that if this is all done properly, you will reap the rewards of thing harmony (coined here first folks).

Please visit our Deep Learning Neural Networks for IoT white paper for a more technical slant.

For many years, and with rapidly accelerating levels of targeting sophistication, marketers have been tailoring their messaging to our tastes. Leveraging our data and capitalizing upon our shopping behaviors, they have successfully delivered finely-tuned, personalized messaging.

Consumers are curating their media ever more by the day. We’re buying smaller cable bundles, cutting cords, and buying OTT services a la carte. At the same time, we’re watching more and more short-form video. Video media is tilting toward snack-size bites and, of course, on demand.

Cable has been in decline for years and the effects are now hitting ESPN, once the mainstay of a cable package. Even live sports programming, long considered must see and even bulletproof by media executives, has seen declining viewership.

 

So what’s to be done?

To thrive, and perhaps merely to survive, content owners must adapt. Leagues and networks have come a long way toward embracing a “TV Everywhere” distribution model despite the obnoxious gates at every turn. But that’s not enough and the sports leagues know it.

While there are many reasons for declining viewership and low engagement among younger audiences, length of games and broadcasts are a significant factor. The leagues recognize that games are too long. The NBA has made some changes that will speed up the action and the NFL is also considering shortening games to avoid losing viewership. MLB has long been tinkering in the same vein. These changes are small, incremental, and of little consequence to the declining number of viewers.

Most sporting events are characterized by long stretches of calm, less interesting play that is occasionally accented by higher intensity action. Consider for a moment how much actual action there is in a typical football or baseball game. Intuitively, most sports fans know that the bulk of the three-hour event is consumed by time between plays and pitches. Still, it’s shocking to see the numbers from the Wall Street Journal, which point out that there are only 11 minutes of action in a typical football game and a mere 18 minutes in a typical baseball game.

 

A transformational opportunity

There is so much more they can do. Recent advances in neural network technology have enabled an array of features to be extracted from streaming video. The applications are broad and the impacts significant. In this sports media context, the opportunity is nothing short of transformational.

Computers can now be trained to programmatically classify the action in the underlying video. With intelligence around what happens where in the game video, the productization opportunities are endless. Fans could catch all of the action, or whatever plays and players are most important to them, in just a few minutes. With a large indexed database of sports media content, the leagues could present near unlimited content personalization to fans.

Want to see David Ortiz’s last ten home runs? Done.

Want to see Tom Brady’s last ten TD passes? You’re welcome.

Robust features like these will drive engagement and revenue. With this level of control, fans are more likely to subscribe to premium offerings, offering predictable recurring revenue that will outpace advertising in the long run.

Computer-driven, personalized content is going to happen. It’s going to be amazing, and we are one step closer to getting there.

Scientists have been working on the puzzle of human vision for many decades. Convolutional Neural Network (CNN or convnet)-based Deep Learning reached a new landmark for image recognition when Microsoft announced it had beat the human benchmark in 2015. Five days later, Google one-upped Microsoft with a 0.04% improvement.

Figure 1. In a typical convnet model, the forward pass reduces the raw pixels into a vector representation of visual features. In its condensed form, the features can be effectively classified using fully connected layers.

source: BigRio

 

Data Scientists don’t sleep. The competition immediately moved to the next battlefield of object segmentation and classification for embedded image content. The ability to pick out objects inside a crowded image is a precursor to fantastic capabilities, like image captioning, where the model describes a complex image in full sentences. The initial effort to translate full-image recognition to object classification involved different means of localization to efficiently derive bounding boxes around candidate objects. Each bounding box is then processed with a CNN to classify the single object inside the box. A direct pixel-level dense prediction without preprocessing was, for a long time, a highly sought-after prize.

 

Figure 2. Use bounding box to classify embedded objects in an image

source: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html

In 2016, a UC Berkeley group, led by E. Shelhamer, achieved this goal using a technique called Fully Convolutional Neural Network. Instead of using convnet to extract visual features followed by fully connected layers to classify the input image, the fully connected layers are converted to additional layers of convnet. Whereas the fully connected layers completely lose all information on the original pixel locations, the cells in the final layer of a convnet are path-connected to the original pixels through a construct called receptive fields.

Figure 3. During the forward pass, a convnet reduces raw pixel information to condensed visual features which can then be effectively classified using fully connected neural network layers. In this sense, the feature vectors contain the semantic information derived from looking at the image as a whole.

source: BigRio

 

Figure 4. In dense prediction, we want to both leverage the semantic information contained in the final layers of the convnet and assign the semantic meaning back to the pixels that generated the semantic information. The upsampling step, also known as the backward pass, maps the feature representations back onto the original pixels positions.

source: BigRio

 

The upsampling step is something of great interest. In a sense, it deconvolutes the dense representation back to its original resolution and the deconvolution filters can be learned through Stochastic Gradient Descent, just like any forward pass learning process. A good visual demonstration of deconvolution can be found here. The most practical way to implement this deconvolution step is through bilinear interpolation, as discussed later.

The best dense prediction goes beyond just upsampling the last and coarsest convnet layer. By fusing results from shallower layers, the result becomes much more finely detailed. Using a skip architecture as shown in Figure 4, the model is able to make accurate local predictions that respect global structure. The fusion operation is based on concatenating vectors from two layers and perform a 1 x 1 convolution to reduce the vector dimension back down again.

 

Figure 5. Fuse upsampling results from shallower layers push the prediction limits to a finer scale.

source: BigRio

LABELED DATA

As is often the case when working with Deep Learning, collecting high-quality training data is a real challenge. In the image recognition field, we are blessed with open source data from PASCAL VOC Project. The 2011 dataset provides 11,530 images with 20 classes. Each image is pre-segmented with pixel-level precision by academic researchers. Examples of segmented images can be found here.

 

OPEN SOURCE MODELS

Computer vision enthusiasts also benefit hugely from open source projects which implement almost every exciting new development in the deep learning field. The author’s group posted a Caffe implementation of FCNN. For keras implementations, you will find no fewer than 9 FCN projects on GitHub. After trying out a few, we focused on the Aurora FCNproject, which started running with very little modifications. The authors provided rather detailed instruction on environment setup and downloading of datasets. We chose the AstrousFCN_Resnet50_16s model out of the six included in the project. The training took 4 weeks on a two Nvidia 1080 card cluster, which was surprising but perhaps understandable given the huge number of layers. The overall model architecture can be visualized by either a JSON tree or with PNG graphics, although both are too long to fit on one page. The figure below shows just one tiny chunk of the overall model architecture.

Figure 6. Top portion of the FCN model. The portion shown is less than one-tenth of the total.

source: https://github.com/aurora95/Keras-FCN/blob/master/Models/AtrousFCN_Resnet50_16s/model.png

It is important to point out that the authors of the paper and code both leveraged established image recognition models, generally the winning entries of the ImageNet competition, such as the VGG nets, ResNet, AlexNet, and the GoogLeNet. Imaging is the one area where transfer learning applies readily. Researchers without the near infinite resources found at Google and Microsoft can still leverage their training results and retrain high-quality models by adding only small new datasets or make minor modifications. In this case, the proven classification architectures named above are modified by stripping away the fully connected layers at the end and replaced with fully convolutional and upsampling layers.

RESNET (RESIDUAL NETWORK)

In particular, the open source code we experimented with is based on Resnet from Microsoft. Resnet has the distinction of being the deepest network ever presented on ImageNet, with 152 layers. In order to make such a deep network converge, the submitting group had to tackle a well-known problem where error rate tends to rise rather than drop after a certain depth. They discovered that by adding skip (aka highway) connections, the overall network converges much better. The explanation lies with the relative ease in training intermediates to minimize residuals rather the originally intended mapping (thus the name Residual Network). The figure below illustrates the use skip connections used in the original ResNet paper, which are found in the open source FCN model derived from ResNet.

Figure 7a. Resnet uses multiple skip connections to improve the overall error rate of a very deep network

source: https://arxiv.org/abs/1512.03385

 

Figure 7b. Middle portion of the Aurora model displaying skip connections, which is a characteristic of ResNet.

source: https://github.com/aurora95/Keras-FCN/blob/master/Models/AtrousFCN_Resnet50_16s/model.png

The exact intuition behind Residual Network is less than obvious. There is plenty good discussion in this Quora blog.

BILINEAR UPSAMPLING

As alluded to in Figure 4, at the end stage the resolution of the tensor must be brought back to original dimension using an upsampling step. The original paper stated that a simple bilinear interpolation is fast and effective. And this is the approach taken in the Aurora project, as illustrated below.

Figure 8. Only a single upsampling stage was implemented in the open source code.

source https://github.com/aurora95/Keras-FCN/blob/master/Models/AtrousFCN_Resnet50_16s/model.png

Although the paper authors pointed out the improvement achieved by use of skips and fusions in the upsampling stage, it is not implemented by the Aurora FCN project. The diagram for the end stage illustrates that only a single up sampling layer is used. This may leave room for further improvement in error rate.

The code simply makes a TensorFlow call to implement this upsampling stage:

X = tf.image.resize_bilinear(X, new_shape)

 

ERROR RATE

The metrics used to measure segmentation accuracy is intersection over union (IOU). The IOU measured over 21 randomly selected test images are:

[ 0.90853866  0.75403876  0.35943439  0.63641792  0.46839113  0.55811771

0.76582419  0.70945356  0.74176198  0.23796475  0.50426148  0.34436233

0.5800221   0.59974548  0.67946723  0.79982366  0.46768033  0.58926592

0.33912701  0.71760929  0.54273803]

These have a mean of 0.585907. This mean is very close to the number published in the original paper. The pixel level classification accuracy is very high at 0.903266, meaning when a pixel is classified as certain object type, it is correct about 90% of the time.

 

CONCLUSION

The ability to identify image pixels as members of a particular object without a pre-processing step of bounding box detection is a major step forward for deep image recognition. The techniques demonstrated by Shelhamer’s paper achieves this goal by combining coarse-level semantic identification with pixel-level location information. This technique leverages transfer learning based on pre-trained image recognition models that were winning entries in the ImageNet competition. Various open source project replicated the results. Certain implementations require extraordinarily long training time.