The Black Arts of Machine Learning


As I outline in the Machine Learning Field Guide, the concept of Machine Learning arose from an interest in having machines learn from data. The industry has seen cycles of stagnation and resurgence in machine learning and AI research since as early as the 1950s. During the 1980s, we saw the emergence of the Multi-layer Perceptron and its backpropagation training mechanism, both fundamental to today’s highly sophisticated Deep Learning architectures capable of image recognition and behavior analysis. However, to reach its zenith, the field depended on advances in data proliferation and acquisition that wouldn’t materialize for decades to come. As promising as the initial results were, early attempts at industrial application of artificial intelligence fizzled.

Though the practice of Machine Learning ascended to prominence only recently, much of its mathematical foundation dates back centuries. Thomas Bayes, father of the Bayesian method on which we base contemporary statistical inference, wrote his famous equation in the 1700s. Shortly after, in the early 1800s, immortalized academics like Legendre and Gauss developed early forms of the statistical regression models we use today. Statistical analysis as a discipline remained an academic curiosity from that time until the commoditization of low-cost computing in the 1990s and the onslaught of social media and sensor data in the 2000s.

What does this mean for Machine Learning today? Enterprises are sitting on data gold mines and collecting more at a staggering rate and with ever greater complexity. Today’s Machine Learning is about mining this treasure trove, extracting actionable business insights, predicting future events, and prescribing next best actions, all in laser-focused pursuit of business goals. In the rush to harvest these gold mines, Machine Learning is entering its golden age, buoyed by Big Data technology, Cloud infrastructure, and abundant access to open source software. Intense competition in the annual ImageNet contest between global leaders like Microsoft, Google, and Tencent rapidly propels machine learning and image recognition technology forward, and source code for winning entries is routinely made available to the public free of charge. Most contestants on the Kaggle machine learning platform share their work in the same spirit. In addition to this source code, excellent free machine learning tutorials compete for mindshare on Coursera, edX, and YouTube. Hardware suppliers such as Nvidia and Intel further the cause by continuing to push the boundary for denser packaging of high-performance GPUs that speed up Neural Networks. Thanks to these abundant resources, any aspiring entrepreneur or lone-wolf researcher has access to petabytes of storage, utility-scale massively parallel computing, open source data, and software libraries. As of 2015, this access had already produced computer image recognition systems that outperform humans on benchmark recognition tasks.

With recent stunning successes in Deep Learning research, the floodgates have opened for industrial applications of all kinds. Practitioners enjoy a wide array of options when targeting specific problems. While Neural Networks clearly lead in the high-complexity, high-data-volume end of the problem space, classical machine learning still achieves higher prediction and classification quality for low-sample-count applications, not to mention the drastic savings in computing time and hardware. Research suggests that the crossover occurs at around one hundred thousand to one million samples. Just a short time ago, numbers like these would have scared away any level-headed project manager. Nowadays, data scientists are asking for more data and getting it expediently and conveniently. A good Data Lake and data pipeline are necessary precursors to any machine learning practice. Mature data enterprises emphasize close collaboration between data engineering (infrastructure) teams and data science teams. “Features” are the lingua franca of their interactions, not “files,” “primary keys,” or “provisions.”
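To make that crossover concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset, that pits a classical model against a small neural network on a modest sample count; the dataset, models, and parameter values are illustrative assumptions, not figures from the Field Guide.

    # Illustrative comparison: classical model vs. a small neural network
    # on a low-sample-count problem (assumes scikit-learn is installed).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # A small synthetic dataset stands in for a "low sample count" application.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    classical = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    neural = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0).fit(X_train, y_train)

    print("Logistic regression accuracy:", accuracy_score(y_test, classical.predict(X_test)))
    print("Small neural network accuracy:", accuracy_score(y_test, neural.predict(X_test)))

At sample counts this small, the simpler model is often the better engineering choice even when the two scores are comparable, given its far lower training cost.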

Furthermore, execution environments should be equipped with continuous and visual monitoring capabilities, as any long-running Neural Network training session (days to weeks) involves frequent mid-course adjustments based on feedback from evolving model parameters. Whether the model is the most common Linear Regression or the deepest Convolutional Neural Network, the challenge of any machine learning experiment is wading through the maze of configuration parameters and picking out a winning combination. After selecting the candidate models, a competent data scientist navigates a series of decisions, from starting point, to learning rate, to sample size, to regularization setting, while constantly examining convergence across parallel training runs and applying runtime tuning, all in an attempt to get the most accurate model in the shortest amount of time.
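The sketch below gives a flavor of that search: a minimal hyperparameter sweep over learning rate and regularization with convergence feedback. It assumes scikit-learn, and the grid values and model are illustrative stand-ins rather than recommended settings.

    # Illustrative hyperparameter sweep with convergence feedback
    # (assumes scikit-learn; grid values and model are placeholders).
    from itertools import product
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

    best_score, best_config = 0.0, None
    for lr, reg in product([1e-2, 1e-3], [1e-4, 1e-2]):  # learning rate x regularization strength
        model = MLPClassifier(hidden_layer_sizes=(32,), learning_rate_init=lr,
                              alpha=reg, max_iter=300, random_state=0)
        model.fit(X_train, y_train)
        score = model.score(X_val, y_val)
        # model.loss_curve_ holds the per-iteration training loss, the kind of
        # convergence signal a practitioner watches during long training runs.
        print(f"lr={lr}, alpha={reg}: val accuracy={score:.3f}, final loss={model.loss_curve_[-1]:.3f}")
        if score > best_score:
            best_score, best_config = score, (lr, reg)
    print("Best configuration:", best_config, "with validation accuracy", best_score)

In a real engagement the grid would be far larger, the runs would proceed in parallel, and the loss curves would stream to a dashboard rather than print to a console, which is precisely why continuous, visual monitoring belongs in the execution environment.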

As I state in my recent e-book “Machine Learning Field Guide,” Machine Learning is smarter than ever and improving rapidly. This predictive juggernaut is coming fast and furious and will transform any business in its path. For the moment, it’s still black magic in the hands of the high priests of statistics. As an organization with a mission to deliver its benefits to clients, BigR.io has trained an internal team of practitioners, organized an external board of AI advisors, and packaged a Solutions Playbook as a practice guide. We have harnessed best practices, specialty algorithms, experiential guidelines, and training tutorials, all in an effort to streamline delivery and concentrate our engagement efforts on the areas that require specific customization.

To find out more, check out the Machine Learning Field Guide by Chief Data Scientist Bruce Ho.