With all this talk of the bright future of Artificial Intelligence (AI), it’s no surprise that almost every industry is looking into how they will reap the benefits from the forthcoming (dare I say already existing?) AI technologies. For some, AI will merely enhance the technologies already being used. For others, AI is becoming a crucial component to keeping the industry alive. Healthcare is one such industry.

The Problem: Diminishing Labor Force

Part of the need for AI-based Healthcare stems from the concern that one-third of nurses are baby boomers, who will retire by 2030, taking their knowledge with them. This drastic shortage in healthcare workers poses the imminent need for replacements and, while the enrollment numbers in nursing school stay stable, the demand for experienced workers will continue to increase. This need for additional clinical support is one area where AI comes into play. In fact, these emerging technologies will not only help serve as a multiplier force for experienced nurses, but for doctors and clinical staff support as well.

Healthcare-AI Automation Applications to the Rescue

One of the most notable solutions for this shortage will be automating processes for determining whether or not a patient actually needs to visit a doctor in-person. Doctors’ offices are currently inundated with appointments and patients who’s lower-level questions and concerns could be addressed without a face-to-face consultation via mobile applications. Usually in the from of chatbots, these AI-powered applications can provide basic healthcare support by “bringing the doctor to the patient” and alleviating the need for the patient to leave the comfort of their home, let alone scheduling an appointment to go in-office and visit a doctor (saving time and resources for all parties involved).

Should a patient need to see a doctor,  these applications also contain schedulers capable of determining appointment type, length, urgency, and available dates/times, foregoing the need for constant human-based clinical support and interaction. With these AI schedulers also comes AI-based Physician’s Assistants that provide additional in-office support like scheduling follow-up appointments, taking comprehensive notes for doctors, ordering specific prescriptions and lab testing, providing drug interaction information for current prescriptions, etc. And this is just one high-level AI-based Healthcare solution (albeit with many components).

With these advancements, Healthcare stands to gain significant ground with the help of domain-specific AI capabilities that were historically powered by humans. As a result, the next generation of healthcare has already begun, and it’s being revolutionized by AI.

Sometimes I get to thinking that Alexa isn’t really my friend. I mean sure, she’s always polite enough (well, usually, but it’s normal for friends to fight, right?). But she sure seems chummy with that pickle-head down the hall too. I just don’t see how she can connect with us both — we’re totally different!

So that’s the state of the art of conversational AI: a common shared agent that represents an organization. A spokesman. I guess she’s doing her job, but she’s not really representing me or M. Pickle, and she can’t connect with either of us as well as she might if she didn’t have to cater to both of us at the same time. I’m exaggerating a little bit – there are some personalization techniques (*cough* crude hacks *cough*) in place to help provide a custom experience:

  • There is a marketplace of skills. Recently, I can even ask her to install one for me.
  • I have a user profile. She knows my name and zip code.
  • Through her marketplace, she can access my account and run my purchase through a recommendation engine (the better to sell you with, my dear!)
  • I changed her name to “Echo” because who has time for a third syllable? (If only I were hamming this up for the post; sadly, a true story)
  • And if I may digress to my other good friend Siri, she speaks British to me now because duh.

It’s a start but, if we’re honest, none of these change the agent’s personality or capabilities to fit with all of my quirks, moods, and ever-changing context and situation. Ok, then. What’s on my wishlist?

  • I want my own agent with its own understanding of me, able to communicate and serve as an extension of myself.
  • I want it to learn everything about how I speak. That I occasionally slip into a Western accent and say “ruf” instead of “roof”. That I throw around a lot of software dev jargon; Python is neither a trip to the zoo nor dinner (well, once, and it wasn’t bad. A little chewy.) That Pickle Head means my colleague S… nevermind. You get the idea.
  • I want my agent to extract necessary information from me in a way that fits my mood and situation. Am I running late for a life-changing meeting on a busy street uphill in a snowstorm? Maybe I’m just goofing around at home on a Saturday.
  • I want my agent to learn from me. It doesn’t have to know how to do everything on this list out of the box – that would be pretty creepy – but as it gets to know me it should be able to pick up on my cues, not to mention direct instructions.

Great, sign me up! So how do I get one? The key is to embrace training (as opposed to coding, crafting, and other manual activities). As long as there is a human in the loop, it is simply impossible to scale an agent platform to this level of personalization. There would be a separate and ongoing development project for every single end user… great job security for developers, but it would have to sell an awful lot of stuff.

To embrace training, we need to dissect what goes into training. Let’s over-simplify the “brain” of a conversational AI for a moment: we have NLU (natural language understanding), DM (dialogue management), and NLG (natural language generation). Want an automatically-produced agent? You have to automate all three of these components.

  • NLU – As of this writing, this is the most advanced component of the three. Today’s products often do incorporate at least some training automation, and that’s been a primary enabler that leads to the assistants that we have now. Improvements will need to include individualized NLU models that continually learn from each user, and the addition of (custom, rapid) language models that can expand upon the normal and ubiquitous day-to-day vocabulary to include trade-specific, hobby-specific, or even made-up terms. Yes, I want Alexa to speak my daughter’s imaginary language with her.
  • DM – Sorry developers, if we make plugin skills ala Mobile Apps 2.0 then we aren’t going to get anywhere. Dialogues are just too complex, and rules and logic are just too brittle. This cannot be a programming exercise. Agents must learn to establish goals and reason about using conversation to achieve those goals in an automated fashion.
  • NLG – Sorry marketing folks, there isn’t brilliant copy for you to write. The agent needs the flexibility to communicate to the user in the most effective way, and it can’t do that if it’s shackled by canned phrases that “reflect the brand”.

In my experience, most current offerings are focusing on the NLU component – and that’s awesome! But to realize the potential of MicroAgents (yeah, that’s right. MicroAgents. You heard it here first) we need to automate the entire agent, which is easier said than done. But that’s not to say that it’s not going to happen anytime soon – in fact, it might happen sooner than you think.  

Echo, I’m done writing. Post this sucker.

Doh!

 

In the 2011 Jeopardy! face-off between IBM’s Watson and Jeopardy! champions Ken Jennings and Brad Rutter, Jennings acknowledged his brutal takedown by Watson during the last double jeopardy in stating “I for one welcome our new computer overlords.” This display of computer “intelligence” sparked mass amounts of conversation amongst myriad groups of people, many of whom became concerned at what they perceived as Watson’s ability to think like a human. But, as BigR.io’s Director of Business Development Andy Horvitz points out in his blog “Watson’s Reckoning,” even the Artificial Intelligence technology with which Watson was produced is now obsolete.

The thing is, while Watson was once considered to be the cutting-edge technology of Artificial Intelligence, Artificial Intelligence itself isn’t even cutting-edge anymore. Now, before you start lecturing me about how AI is cutting-edge, let me explain.

Defining Artificial Intelligence

You see, as Bernard Marr points out, Artificial Intelligence is the overarching term for machines having the ability to carry out human tasks. In this regard, modern AI as we know it has already been around for decades – since the 1950s at least (especially thanks to the influence of Alan Turing). Moreso, some form of the concept of artificial intelligence dates back to ancient Greece when philosophers started describing human thought processes as a symbolic system. It’s not a new concept, and it’s a goal that scientists have been working towards for as long as there have been machines.

The problem is that the term “artificial intelligence” has become a colloquial term applied when a machine mimics “cognitive” functions that humans associate with other human minds, such as “learning” and “problem solving.” But the thing is, AI isn’t necessarily synonymous with “human thought capable machines.” Any machine that can complete a task in a similar way that a human might can be considered AI. And in that regard, AI really isn’t cutting-edge.

What is cutting-edge are the modern approaches to Machine Learning, which have become the cusp of “human-like” AI technology (like Deep Learning, but that’s for another blog).

Though many people (scientists and common folk alike) use the terms AI and Machine Learning interchangeably, Machine Learning actually has the narrower focus of using the core ideas of AI to help solve real-world problems. For example, while Watson can perform the seemingly human task of critically processing and answering questions (AI), it lacks the ability to use these answers in a way that’s pragmatic to solve real-world problems, like synthesizing queried information to find a cure for cancer (Machine Learning).

Additionally, as I’m sure you already know, Machine Learning is based upon the premise that these machines train themselves with data rather than by being programmed, which is not necessarily a requirement of Artificial Intelligence overall.

https://xkcd.com/1838/

Why Know the Difference?

So why is it important to know the distinction between Artificial Intelligence and Machine Learning? Well, in many ways, it’s not as important now as it might be in the future. Since the two terms are used so interchangeably and Machine Learning is seen as the technology driving AI, hardly anyone would correct you if were you to use them incorrectly. But, as technology is progressing ever faster, it’s good practice to know some distinction between these terms for your personal and professional gains.

Artificial Intelligence, while a hot topic, is not yet widespread – but it might be someday. For now, when you want to inquire about AI for your business (or personal use), you probably mean Machine Learning instead. By the way, did you know we can help you with that? Find out more here.

We’re seeing and doing all sorts of interesting work in the Image domain. Recent blog posts, white papers, and roundtables capture some of this work, such as image segmentation and classification to video highlights. But an Image area of broad interest that, to this point, we’ve but scratched the surface of is Video-based Anomaly Detection. It’s a challenging data science problem, in part due to the velocity of data streams and missing data, but has wide-ranging solution applicability.

In-store monitoring of customer movements and behavior.

Motion sensing, the antecedent to Video-based Anomaly Detection, isn’t new and there are a multitude of commercial solutions in that area. Anomaly Detection is something different and it opens the door to new, more advanced applications and more robust deployments. Part of the distinction between the two stems from “sensing” what’s usual behavior and what’s different.

Anomaly Detection

Walkers in the park look “normal”. The bicyclist is the anomaly. 

Anomaly detection requires the ability to understand a motion “baseline” and to trigger notifications based on deviations from that baseline. Having this ability offers the opportunity to deploy AI-monitored cameras in many more real-world situations across a wide range of security use cases, smart city monitoring, and more, wherein movements and behaviors can be tracked and measured with higher accuracy and at a much larger scale than ever before.

With 500 million video cameras in the world tracking these movements, a new approach is required to deal with this mountain of data. For this reason, Deep Learning and advances in edge computing are enabling a paradigm shift from video recording and human watchers toward AI monitoring. Many systems will have humans “in the loop,” with people being alerted to anomalies. But others won’t. For example, in the near future, smart cities will automatically respond to heavy traffic conditions with adjustments to the timing of stoplights, and they’ll do so routinely without human intervention.

Human in the Loop

Human in the loop.

As on many AI fronts, this is an exciting time and the opportunities are numerous. Stay tuned for more from BigR.io, and let’s talk about your ideas on Video-based Anomaly Detection or AI more broadly.

A few months back, Treasury Secretary Steve Mnuchin said that AI wasn’t on his radar as a concern for taking over the American labor force and went on to say that such a concern might be warranted in “50 to 100 more years.” If you’re reading this, odds are you also think this is a naive, ill-informed view.

An array of experts, including Mnuchin’s former employer, Goldman Sachs, disagree with this viewpoint. As PwC states, 38% of US jobs will be gone by 2030. On the surface, that’s terrifying, and not terribly far into the future. It’s also a reasonable, thoughtful view, and a future reality for which we should prepare.

Naysayers maintain that the same was said of the industrial and technological revolutions and pessimistic views of the future labor market were proved wrong. This is true. Those predicting doom in those times were dead wrong. In both cases, technological advances drove massive economic growth and created huge numbers of new jobs.

Is this time different?

It is. Markedly so.

The industrial revolution delegated our labor to machines. Technology has tackled the mundane and repetitive, connected our world, and, more, has substantially enhanced individual productivity. These innovations replaced our muscle and boosted the output of our minds. They didn’t perform human-level functions. The coming wave of AI will.

Truckers, taxi and delivery drivers, they are the obvious, low-hanging fruit, ripe for AI replacement. But the job losses will be much wider, cutting deeply into retail and customer service, impacting professional services like accounting, legal, and much more. AI won’t just take jobs. Its impacts on all industries will create new opportunities for software engineers and data scientists. The rate of job creation, however, will lag far behind that of job erosion.

But it’s not all bad! AI is a massive economic catalyst. The economy will grow and goods will be affordable. We’re going to have to adjust to a fundamental disconnect between labor and economic output. This won’t be easy. The equitable distribution of the fruits of this paradigm shift will dominate the social and political conversation of the next 5-15 years. And if I’m right more than wrong in this post, basic income will happen (if only after much kicking and screaming by many). We’ll be able to afford it. Not just that — most will enjoy a better standard of living than today while also working less.

I might be wrong. The experts might be wrong. You might think I’m crazy (let’s discuss in the comments). But independent of specific outcomes, I hope we can agree that we’re on the precipice of another technological revolution and these are exciting times!

When I was in graduate school, I designed a construction site of the future. It was in collaboration with Texas Instruments in the late 90s. The big innovation, at the time, was RFID (radio-frequency identification). Not that RFID was new. In fact, it has been around since World War II where it was used to identify allied planes. After the war, it made its way into industry through anti-theft applications. In the 80s, a group of scientists from Los Alamos National Laboratory formed a company using RFID for toll payment systems (still in use today). A separate group of scientists there also created a system for tracking medication management in livestock. From here it made its way into multiple other applications and began to proliferate.

RFID got a boost in 1999 when two MIT professors, David Brock and Sanjay Sarma, reversed the trend of adding more memory and more functionality to the tags and stripped them down to a low-cost, very simple microchip. The data gleaned from the chip was stored in a database and was accessible via the web. This was right at the time that the wireless web emerged (good old CDPD) as well, which really bolstered widespread adoption. This also precipitated funding from large companies, like Procter & Gamble and Gillette (this was before P&G acquired Gillette), to institute the Auto-ID Center at MIT, which furthered the creation of standards and cemented RFID as an invaluable weapon for companies, especially those with complex supply chains.

OK, as you can tell, RFID has a special place in my heart. I even patented the idea of marrying RFID with images, but that is another story. Anyway, up to this point you’ve probably decided this is a post about RFID, but it’s not. It’s a post about RFID to IoT (Internet of Things). The term Internet of Things (IoT) was first coined by British entrepreneur Kevin Ashton in 1999 while working at Auto-ID Labs, specifically referring to a global network of objects connected by RFID. But RFID is just one type of sensor and there are numerous sensors out there. I like this definition from Wikipedia:

In the broadest definition, a sensor is an electronic component, module, or subsystem whose purpose is to detect events or changes in its environment and send the information to other electronics, frequently a computer processor. A sensor is always used with other electronics, whether as simple as a light or as complex as a computer.

Sensors have been around for quite some time in various forms. The first thermostat came to market in 1883, and many consider this the first modern, manmade sensor. Infrared sensors have been around since the late 1940s, even though they’ve really only recently entered the popular nomenclature. Motion detectors have been in use for a number of years as well. Originally invented by Heinrich Hertz in the late 1800s, they were advanced in World War II in the form of radar technology. There are numerous other sensors: biotech, chemical, natural (e.g. heat and pressure), sonar, infrared, microwave, and silicon sensors to name a few.

According to Gartner, there are currently 8 Billion IoT Units worldwide and there will be 20 Billion by 2020. Suffice to say there are numerous sources of data to track “things” within an organization and throughout supply chains. There are also numerous complexities to managing all of these sensors, the data they generate, and the actionable intelligence that is extracted and needs to be acted on. Some major obstacles are networks with time delays, switching topologies, density of units in a bounded region, and metadata management (especially across trading partners and customers). These are all challenges we at BigR.io have helped customers work through and resolve. A great example is our Predictive Maintenance offering.

Let’s get back to RFID to IoT. There is a tight coupling because the IP address of the unit needs to be supplemented with other information about the thing (for example, condition, context, location, security, etc). RFID and other sensors working in unison can provide this supplemental information. This marriage enables advanced analytics including the ability to make predictions. Large sensor networks must be properly architected to enable effective sensor fusion. Machine Learning helps take IoT to the next level of sophistication for predictions and automation for fixes and can help figure out when and where every ”thing” fits in the ecosystem that they play in. A proper IoT agent should monitor the health of the systems individually and in relation to other parts. Consensus filters will help in the analysis of the convergence, noise propagation reduction, and ability to track fast signals.

There are other factors that play into why IoT is so hot right now: the whole Big Data phenomenon has lent itself to the growth, endless compute power has served as a foundation by which advanced applications using IoT can run, and the Machine Learning libraries have been democratized by companies like Google, Facebook, and Microsoft. In general, Machine Learning thrives when mounds of data are available. However, storing all data is cost prohibitive and there is so much data being generated that most companies opt to only store bits of critical data. Some companies only store the data to freeze it from failures. You may not want to store all data, but you don’t want to lose “metadata,” or the key information that the data is trying to tell you, whether from the sensor itself or indirectly through neighboring sensors. I had a stint where we supported Federal and Defense-related sensor fusion initiatives and I picked up a handy classification of data:

  • Data
  • Information
  • Knowledge
  • Intelligence

The flow is moving the metadata being generated down the line into information → knowledge → intelligence that can be acted upon.

There also exists the ABCs of Data Context:

[A]pplication Context: Describes how raw bits are interpreted for use.

[B]ehavioral Context: Information about how data was created and used by real people or systems.

[C]hange Over Time: The version history of the other two forms of data context.

Data context plays a major role in harnessing the power of an IoT network. As we progress to smarter networks, more sophisticated sensors, and artificial intelligence that manages our “things,” the architecture of your infrastructure (enterprise data hub), the cultivation and management of your data flows, and the analytics automation that rides on top of everything become critical for day-to-day operations. The good news is that if this is all done properly, you will reap the rewards of thing harmony (coined here first folks).

Please visit our Deep Learning Neural Networks for IoT white paper for a more technical slant.

For many years, and with rapidly accelerating levels of targeting sophistication, marketers have been tailoring their messaging to our tastes. Leveraging our data and capitalizing upon our shopping behaviors, they have successfully delivered finely-tuned, personalized messaging.

Consumers are curating their media ever more by the day. We’re buying smaller cable bundles, cutting cords, and buying OTT services a la carte. At the same time, we’re watching more and more short-form video. Video media is tilting toward snack-size bites and, of course, on demand.

Cable has been in decline for years and the effects are now hitting ESPN, once the mainstay of a cable package. Even live sports programming, long considered must see and even bulletproof by media executives, has seen declining viewership.

 

So what’s to be done?

To thrive, and perhaps merely to survive, content owners must adapt. Leagues and networks have come a long way toward embracing a “TV Everywhere” distribution model despite the obnoxious gates at every turn. But that’s not enough and the sports leagues know it.

While there are many reasons for declining viewership and low engagement among younger audiences, length of games and broadcasts are a significant factor. The leagues recognize that games are too long. The NBA has made some changes that will speed up the action and the NFL is also considering shortening games to avoid losing viewership. MLB has long been tinkering in the same vein. These changes are small, incremental, and of little consequence to the declining number of viewers.

Most sporting events are characterized by long stretches of calm, less interesting play that is occasionally accented by higher intensity action. Consider for a moment how much actual action there is in a typical football or baseball game. Intuitively, most sports fans know that the bulk of the three-hour event is consumed by time between plays and pitches. Still, it’s shocking to see the numbers from the Wall Street Journal, which point out that there are only 11 minutes of action in a typical football game and a mere 18 minutes in a typical baseball game.

 

A transformational opportunity

There is so much more they can do. Recent advances in neural network technology have enabled an array of features to be extracted from streaming video. The applications are broad and the impacts significant. In this sports media context, the opportunity is nothing short of transformational.

Computers can now be trained to programmatically classify the action in the underlying video. With intelligence around what happens where in the game video, the productization opportunities are endless. Fans could catch all of the action, or whatever plays and players are most important to them, in just a few minutes. With a large indexed database of sports media content, the leagues could present near unlimited content personalization to fans.

Want to see David Ortiz’s last ten home runs? Done.

Want to see Tom Brady’s last ten TD passes? You’re welcome.

Robust features like these will drive engagement and revenue. With this level of control, fans are more likely to subscribe to premium offerings, offering predictable recurring revenue that will outpace advertising in the long run.

Computer-driven, personalized content is going to happen. It’s going to be amazing, and we are one step closer to getting there.

Scientists have been working on the puzzle of human vision for many decades. Convolutional Neural Network (CNN or convnet)-based Deep Learning reached a new landmark for image recognition when Microsoft announced it had beat the human benchmark in 2015. Five days later, Google one-upped Microsoft with a 0.04% improvement.

Figure 1. In a typical convnet model, the forward pass reduces the raw pixels into a vector representation of visual features. In its condensed form, the features can be effectively classified using fully connected layers.

source: BigRio

 

Data Scientists don’t sleep. The competition immediately moved to the next battlefield of object segmentation and classification for embedded image content. The ability to pick out objects inside a crowded image is a precursor to fantastic capabilities, like image captioning, where the model describes a complex image in full sentences. The initial effort to translate full-image recognition to object classification involved different means of localization to efficiently derive bounding boxes around candidate objects. Each bounding box is then processed with a CNN to classify the single object inside the box. A direct pixel-level dense prediction without preprocessing was, for a long time, a highly sought-after prize.

 

Figure 2. Use bounding box to classify embedded objects in an image

source: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html

In 2016, a UC Berkeley group, led by E. Shelhamer, achieved this goal using a technique called Fully Convolutional Neural Network. Instead of using convnet to extract visual features followed by fully connected layers to classify the input image, the fully connected layers are converted to additional layers of convnet. Whereas the fully connected layers completely lose all information on the original pixel locations, the cells in the final layer of a convnet are path-connected to the original pixels through a construct called receptive fields.

Figure 3. During the forward pass, a convnet reduces raw pixel information to condensed visual features which can then be effectively classified using fully connected neural network layers. In this sense, the feature vectors contain the semantic information derived from looking at the image as a whole.

source: BigRio

 

Figure 4. In dense prediction, we want to both leverage the semantic information contained in the final layers of the convnet and assign the semantic meaning back to the pixels that generated the semantic information. The upsampling step, also known as the backward pass, maps the feature representations back onto the original pixels positions.

source: BigRio

 

The upsampling step is something of great interest. In a sense, it deconvolutes the dense representation back to its original resolution and the deconvolution filters can be learned through Stochastic Gradient Descent, just like any forward pass learning process. A good visual demonstration of deconvolution can be found here. The most practical way to implement this deconvolution step is through bilinear interpolation, as discussed later.

The best dense prediction goes beyond just upsampling the last and coarsest convnet layer. By fusing results from shallower layers, the result becomes much more finely detailed. Using a skip architecture as shown in Figure 4, the model is able to make accurate local predictions that respect global structure. The fusion operation is based on concatenating vectors from two layers and perform a 1 x 1 convolution to reduce the vector dimension back down again.

 

Figure 5. Fuse upsampling results from shallower layers push the prediction limits to a finer scale.

source: BigRio

LABELED DATA

As is often the case when working with Deep Learning, collecting high-quality training data is a real challenge. In the image recognition field, we are blessed with open source data from PASCAL VOC Project. The 2011 dataset provides 11,530 images with 20 classes. Each image is pre-segmented with pixel-level precision by academic researchers. Examples of segmented images can be found here.

 

OPEN SOURCE MODELS

Computer vision enthusiasts also benefit hugely from open source projects which implement almost every exciting new development in the deep learning field. The author’s group posted a Caffe implementation of FCNN. For keras implementations, you will find no fewer than 9 FCN projects on GitHub. After trying out a few, we focused on the Aurora FCNproject, which started running with very little modifications. The authors provided rather detailed instruction on environment setup and downloading of datasets. We chose the AstrousFCN_Resnet50_16s model out of the six included in the project. The training took 4 weeks on a two Nvidia 1080 card cluster, which was surprising but perhaps understandable given the huge number of layers. The overall model architecture can be visualized by either a JSON tree or with PNG graphics, although both are too long to fit on one page. The figure below shows just one tiny chunk of the overall model architecture.

Figure 6. Top portion of the FCN model. The portion shown is less than one-tenth of the total.

source: https://github.com/aurora95/Keras-FCN/blob/master/Models/AtrousFCN_Resnet50_16s/model.png

It is important to point out that the authors of the paper and code both leveraged established image recognition models, generally the winning entries of the ImageNet competition, such as the VGG nets, ResNet, AlexNet, and the GoogLeNet. Imaging is the one area where transfer learning applies readily. Researchers without the near infinite resources found at Google and Microsoft can still leverage their training results and retrain high-quality models by adding only small new datasets or make minor modifications. In this case, the proven classification architectures named above are modified by stripping away the fully connected layers at the end and replaced with fully convolutional and upsampling layers.

RESNET (RESIDUAL NETWORK)

In particular, the open source code we experimented with is based on Resnet from Microsoft. Resnet has the distinction of being the deepest network ever presented on ImageNet, with 152 layers. In order to make such a deep network converge, the submitting group had to tackle a well-known problem where error rate tends to rise rather than drop after a certain depth. They discovered that by adding skip (aka highway) connections, the overall network converges much better. The explanation lies with the relative ease in training intermediates to minimize residuals rather the originally intended mapping (thus the name Residual Network). The figure below illustrates the use skip connections used in the original ResNet paper, which are found in the open source FCN model derived from ResNet.

Figure 7a. Resnet uses multiple skip connections to improve the overall error rate of a very deep network

source: https://arxiv.org/abs/1512.03385

 

Figure 7b. Middle portion of the Aurora model displaying skip connections, which is a characteristic of ResNet.

source: https://github.com/aurora95/Keras-FCN/blob/master/Models/AtrousFCN_Resnet50_16s/model.png

The exact intuition behind Residual Network is less than obvious. There is plenty good discussion in this Quora blog.

BILINEAR UPSAMPLING

As alluded to in Figure 4, at the end stage the resolution of the tensor must be brought back to original dimension using an upsampling step. The original paper stated that a simple bilinear interpolation is fast and effective. And this is the approach taken in the Aurora project, as illustrated below.

Figure 8. Only a single upsampling stage was implemented in the open source code.

source https://github.com/aurora95/Keras-FCN/blob/master/Models/AtrousFCN_Resnet50_16s/model.png

Although the paper authors pointed out the improvement achieved by use of skips and fusions in the upsampling stage, it is not implemented by the Aurora FCN project. The diagram for the end stage illustrates that only a single up sampling layer is used. This may leave room for further improvement in error rate.

The code simply makes a TensorFlow call to implement this upsampling stage:

X = tf.image.resize_bilinear(X, new_shape)

 

ERROR RATE

The metrics used to measure segmentation accuracy is intersection over union (IOU). The IOU measured over 21 randomly selected test images are:

[ 0.90853866  0.75403876  0.35943439  0.63641792  0.46839113  0.55811771

0.76582419  0.70945356  0.74176198  0.23796475  0.50426148  0.34436233

0.5800221   0.59974548  0.67946723  0.79982366  0.46768033  0.58926592

0.33912701  0.71760929  0.54273803]

These have a mean of 0.585907. This mean is very close to the number published in the original paper. The pixel level classification accuracy is very high at 0.903266, meaning when a pixel is classified as certain object type, it is correct about 90% of the time.

 

CONCLUSION

The ability to identify image pixels as members of a particular object without a pre-processing step of bounding box detection is a major step forward for deep image recognition. The techniques demonstrated by Shelhamer’s paper achieves this goal by combining coarse-level semantic identification with pixel-level location information. This technique leverages transfer learning based on pre-trained image recognition models that were winning entries in the ImageNet competition. Various open source project replicated the results. Certain implementations require extraordinarily long training time.

Voice Ordering Is Here. Voice Shopping Is Coming… And It’s Far More Interesting

Siri has been with us for years, but it’s in the last few months and largely due to Amazon that voice assistants have won rapid adoption and heightened awareness.

Over these past few months, we’ve been shown the power of a new interaction paradigm. I have an Echo Dot and I love it. Controlling media and the home controls (doing some lights, maybe thermostat soon) seem among the most useful and sticky applications. The Rock, Paper, Scissors skill… yeah, that one’s probably not going to see as much use. But let’s not forget that this slick device is brought to us by the most dominant eCommerce business in the known universe. So it’s great for voice shopping, right? No, not at all, as it doesn’t actually do “shopping.”

“But I heard the story about the six-year-old who ordered herself a dollhouse?” So did I, and it reinforces my point. Let me explain. The current state of commerce via Alexa is most like a broad set of voice operated Dash Buttons. For quick reorders of things you buy regularly and when you’re not interested in price comparisons, it’s fine. What it’s not — voice shopping. Shopping is an exercise in exploration, research, and comparison. That experience requires a friendly and intelligent guide. As such, voice shopping isn’t supported by the ubiquitous directive-driven (do X, response, end) voice assistants.

 

Enter Jaxon and Conversational AI

Shopping is about feature and price comparison, consideration of reviews, suggestions from smart recommendation engines, and more. Voice shopping is enabled by a conversational voice experience, one that understands history and context and delivers a far richer experience than is widely available today.

 

The Mobile Impact

Mobile commerce isn’t new but is still growing fast. Yet, despite consumers spending far more time on mobile devices than on desktops (broadly defined, including laptops), small screen eCommerce spending still lags far behind.

So why can’t merchants close on mobile? The small screen presents numerous challenges. Small screens make promotion difficult and negatively impact upselling and cross-selling. Another major factor, and one you’ve probably experienced, is the often terrible mobile checkout process. Odds are you’ve abandoned a mobile purchase path after fiddling with some poorly designed forms. I have. Maybe you went back via your laptop. Maybe you didn’t. Either way, that’s a terrible user experience.

Our approach to Conversational AI solves these small screen challenges. Merchants can now bring a human commerce experience to the small screen without the mess. It’s a new, unparalleled engagement opportunity — a chance to converse with your customer, capture real intelligence about their needs, and offer just the right thing. It’s an intelligent personal shopper in the hands of every customer.

Come re-imagine voice shopping with us. Imagine product discovery and comparison, driven by voice. Imagine being offered just what you were looking for, based on a natural language description of what you need. Imagine adjusting your cart with your voice. Imagine entering your payment and shipping info quickly and seamlessly, via voice. It’s all possible and it’s happening now with Jaxon.

Customers often ask what gives us the qualifications to work in their industry (industries like these, for example). They wonder whether we are able to able to handle the massive amounts and types of data they have available within their respective industries. Before we answer these questions, consider the following:

Picture in your mind the industry you work for. Do you think you have an ability to offer a unique set of skills of which no other industry can compare? Are your data sources large and unwieldy, seemingly more complex than other industries? Do you feel as though it takes a person within your industry to fully comprehend the data complexities you have to manage?

If you answered “yes” to any of these questions, you’re wrong.

That’s not entirely true, you might not be completely wrong. But chances are that, while your data may be unique in some ways, it’s probably not harder or more complex than most other extant industries. Now you’re saying to yourself “Well, how do you know? You don’t work in my industry, do you?” But you might be surprised to find that we do work in your industry. In fact, we work in all industries.

When it comes to leveraging Big Data, breadth of skill set and ability are key to managing the overwhelmingly complex sets of data that you encounter in your industry. The problem many of these industries face is that they don’t actually have that breadth to work with. Yes, they may be leaders in their industry, but that still means they are held within the confines of only one industry, not knowing what else is out there that might work for them. That is where we come in. You see, our work in multitudinous industries (eCommerce, Healthcare, Finance, Manufacturing, and Life Sciences, to name a few) across myriad platforms has provided us with a vast breadth of skill sets and abilities that pertain not only to the industry in which they were acquired but to innumerable other industries as well.

Often times, problems that may seem unprecedented or distinct within one industry have more than likely already occurred along a similar vein within another industry. Since BigR.io works in multiple organizations across many industries, we have the ability to identify and solve many, many problems and compare them to many other problems experienced within those industries. Additionally, as Country Music Hall of Famer Kenny Rogers so eloquently explains, you got to know when to hold ‘em, know when to fold ‘em, know when to walk away and know when to run. The same principle applies to solving Big Data problems. We have high-horsepower, high-caliber data scientists with good judgment who know when to bridge across organizations and industries, when to focus within the single industry, and when to find another solution entirely.

BigR.io‘s engineering team has extensive experience across many industries and thrives in new environments, and can help you with your company’s Big Data, Machine Learning, and Custom Software needs. For more information on how we can help handle these needs, visit our library full of case studies and white papers.