
Cybersyn, an AI startup that is only one year out from its launch, has announced it has received a $62.9 million influx of capital from some of the most well-known investors in the world.

Cybersyn, a data startup founded in 2022 by Alex Izydorczyk, recently announced that it raised the multi-million-dollar round from Snowflake Inc., Coatue Management, and Sequoia Capital.

According to a press release announcing the huge influx of cash, Cybersyn is “a company that sells proprietary economic datasets to investors, government agencies, and corporate clients.”

Cybersyn plans to use the funding to expand its small team — it has eight employees — and acquire additional proprietary data for expanded growth. In an interview with Reuters, Izydorczyk touched on the future of the company.

“We’re not trying to just be a data broker,” he said. “We’re trying to actually add value to the data we acquire and combine it.”

Among the big-name investors in the data space is Snowflake Inc., a publicly traded company and provider of cloud-based data warehousing solutions. It provides services such as data warehouse modernization, data exchange, and engineering and data science.

Christian Kleinerman, senior vice president of products at Snowflake, added the following:

“Cybersyn is a company that was built for this era of data sharing and moving with agility. We think of the marketplace as a core part of our offering. If someone is willing to be strategically aligned with us, we’re happy to invest.”

To date, Cybersyn has released both free and paid data sets on the Snowflake Marketplace. These data sets have potential buyers across various industries ranging from consumer goods to pharmaceuticals.

What Does the Scale of This Investment Mean for AI Startups?

Data has been described as “digital gold.” Some of the largest brands in the world, like Apple Inc., Meta Platforms Inc., and Amazon.com, Inc., make much of their billions in the trillion-dollar data market. Amazon’s cloud business does billions in revenue per year, supporting some of the largest companies in the world, and Meta has an advertising empire at its fingertips. This is likely why AI startups like Cybersyn, whose solutions traffic in Big Data, are having little trouble securing tens of millions despite being relatively new companies.

How BigRio Helps Bring Investors to AI Startups

There is no shortage of innovative young AI startups such as Cybersyn out there. Often the challenge is getting investors to see their potential and securing the capital they need to take their AI and data solutions to the next level.

BigRio prides itself on being a facilitator and incubator for such advances in leveraging AI to improve the digital world.

In fact, we like to think of ourselves as a “Shark Tank for AI.”

If you are familiar with the TV series, then you know that, basically, what they do is hyper-accelerate the most important part of the incubation process – visibility. You can’t get better visibility than getting out in front of celebrity investors and a TV audience of millions of viewers. Many entrepreneurs who have appeared on that program – even those who did not get picked up by the Sharks – succeeded because others who were interested in their concepts saw them on the show.

At BigRio, we may not have a TV audience, but we can do the same. We have the expertise not only to weed out the companies that are not ready for the market, as the sharks on the TV show do, but also to mentor those that we feel are ready and get them noticed by the right people in the AI investment community.

You can read much more about how AI is redefining Big Data in my new book Quantum Care: A Deep Dive into AI for Health Delivery and Research. While the book’s primary focus is on healthcare delivery, it also takes a deep dive into AI in general, with specific chapters on the marriage of AI and data technologies.

Rohit Mahajan is a Managing Partner with BigRio. He has particular expertise in the development and design of innovative solutions for clients in Healthcare, Financial Services, Retail, Automotive, Manufacturing, and other industry segments.

BigRio is a technology consulting firm empowering data to drive innovation and advanced AI. We specialize in cutting-edge Big Data, Machine Learning, and Custom Software strategy, analysis, architecture, and implementation solutions. If you would like to benefit from our expertise in these areas or if you have further questions on the content of this article, please do not hesitate to contact us.

When I was in graduate school, I designed a construction site of the future. It was a collaboration with Texas Instruments in the late 90s. The big innovation at the time was RFID (radio-frequency identification). Not that RFID was new. In fact, it has been around since World War II, when it was used to identify Allied planes. After the war, it made its way into industry through anti-theft applications. In the 80s, a group of scientists from Los Alamos National Laboratory formed a company using RFID for toll payment systems (still in use today). A separate group of scientists there also created a system for tracking medication management in livestock. From there, it made its way into multiple other applications and began to proliferate.

RFID got a boost in 1999 when two MIT professors, David Brock and Sanjay Sarma, reversed the trend of adding more memory and more functionality to the tags and stripped them down to a low-cost, very simple microchip. The data gleaned from the chip was stored in a database and was accessible via the web. This was right at the time that the wireless web emerged (good old CDPD) as well, which really bolstered widespread adoption. This also precipitated funding from large companies, like Procter & Gamble and Gillette (this was before P&G acquired Gillette), to institute the Auto-ID Center at MIT, which furthered the creation of standards and cemented RFID as an invaluable weapon for companies, especially those with complex supply chains.

OK, as you can tell, RFID has a special place in my heart. I even patented the idea of marrying RFID with images, but that is another story. Anyway, up to this point you’ve probably decided this is a post about RFID, but it’s not. It’s a post about RFID to IoT (Internet of Things). The term Internet of Things (IoT) was first coined by British entrepreneur Kevin Ashton in 1999, while he was working at Auto-ID Labs, specifically to refer to a global network of objects connected by RFID. But RFID is just one type of sensor, and there are numerous sensors out there. I like this definition from Wikipedia:

In the broadest definition, a sensor is an electronic component, module, or subsystem whose purpose is to detect events or changes in its environment and send the information to other electronics, frequently a computer processor. A sensor is always used with other electronics, whether as simple as a light or as complex as a computer.

Sensors have been around for quite some time in various forms. The first thermostat came to market in 1883, and many consider this the first modern, manmade sensor. Infrared sensors have been around since the late 1940s, even though they’ve really only recently entered the popular nomenclature. Motion detectors have been in use for a number of years as well; building on Heinrich Hertz’s work with radio waves in the late 1800s, they were advanced in World War II in the form of radar technology. There are numerous other sensors: biotech, chemical, natural (e.g. heat and pressure), sonar, infrared, microwave, and silicon sensors, to name a few.

According to Gartner, there are currently 8 billion IoT units worldwide, and there will be 20 billion by 2020. Suffice it to say, there are numerous sources of data to track “things” within an organization and throughout supply chains. There are also numerous complexities in managing all of these sensors, the data they generate, and the actionable intelligence that is extracted and needs to be acted on. Some major obstacles are networks with time delays, switching topologies, density of units in a bounded region, and metadata management (especially across trading partners and customers). These are all challenges we at BigR.io have helped customers work through and resolve. A great example is our Predictive Maintenance offering.

Let’s get back to RFID to IoT. There is a tight coupling because the IP address of the unit needs to be supplemented with other information about the thing (for example, condition, context, location, security, etc.). RFID and other sensors working in unison can provide this supplemental information. This marriage enables advanced analytics, including the ability to make predictions. Large sensor networks must be properly architected to enable effective sensor fusion. Machine Learning takes IoT to the next level of sophistication for predictions and automated fixes, and can help figure out when and where every “thing” fits in the ecosystem it plays in. A proper IoT agent should monitor the health of the systems individually and in relation to other parts. Consensus filters help with convergence analysis, noise-propagation reduction, and the ability to track fast signals.
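To make the consensus-filter idea concrete, here is a minimal sketch of a handful of sensor nodes iteratively averaging with their neighbors. The node names, topology, readings, and step size are invented for illustration, not drawn from any particular deployment.

```python
# Minimal consensus-filter sketch: each sensor node repeatedly nudges its
# estimate toward its neighbors' estimates, so the network converges on a
# shared value while damping per-sensor noise. Topology and readings are
# hypothetical.

NEIGHBORS = {                     # node -> list of neighboring nodes
    "dock": ["gate", "yard"],
    "gate": ["dock", "yard"],
    "yard": ["dock", "gate"],
}
readings = {"dock": 21.4, "gate": 23.1, "yard": 22.0}   # noisy local measurements
EPSILON = 0.3                     # step size; small enough for convergence here

def consensus_step(estimates):
    """One synchronous update: move each node toward its neighbors' estimates."""
    return {
        node: value + EPSILON * sum(estimates[n] - value for n in NEIGHBORS[node])
        for node, value in estimates.items()
    }

estimates = dict(readings)
for _ in range(25):
    estimates = consensus_step(estimates)

print(estimates)   # every node ends up near the network-wide average (~22.17)
```

In a real sensor network the same update runs locally on each node using only messages from its neighbors, which is what makes the approach attractive when topologies switch or links are delayed.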

There are other factors that play into why IoT is so hot right now: the whole Big Data phenomenon has lent itself to the growth, endless compute power has served as a foundation on which advanced applications using IoT can run, and Machine Learning libraries have been democratized by companies like Google, Facebook, and Microsoft. In general, Machine Learning thrives when mounds of data are available. However, storing all data is cost prohibitive, and there is so much data being generated that most companies opt to store only bits of critical data. Some companies store only the data needed to recover from failures. You may not want to store all data, but you don’t want to lose the “metadata,” or the key information that the data is trying to tell you, whether from the sensor itself or indirectly through neighboring sensors. I had a stint supporting Federal and Defense-related sensor fusion initiatives, where I picked up a handy classification of data:

  • Data
  • Information
  • Knowledge
  • Intelligence

The flow moves the data being generated down the line: data → information → knowledge → intelligence that can be acted upon.

There are also the ABCs of Data Context (see the short sketch after these definitions):

[A]pplication Context: Describes how raw bits are interpreted for use.

[B]ehavioral Context: Information about how data was created and used by real people or systems.

[C]hange Over Time: The version history of the other two forms of data context.
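
As a quick illustration, here is a hypothetical sketch of how those three kinds of context might travel alongside a raw sensor reading. The field names, device names, and values are invented, not a standard schema.

```python
# Hypothetical sketch: carrying the ABCs of data context alongside a raw reading.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ContextualReading:
    raw_value: float                    # the raw bits from the sensor
    application_context: str            # [A] how the raw bits are interpreted
    behavioral_context: str             # [B] how/by whom the data was created or used
    change_log: list = field(default_factory=list)   # [C] history of the other two

    def reinterpret(self, new_application_context: str):
        """Record a change in interpretation without losing the old context."""
        self.change_log.append(
            (datetime.now(timezone.utc).isoformat(), self.application_context)
        )
        self.application_context = new_application_context

reading = ContextualReading(
    raw_value=7.3,
    application_context="vibration amplitude (mm/s) on pump P-104",
    behavioral_context="emitted by edge gateway GW-2 during nightly batch",
)
reading.reinterpret("vibration amplitude (mm/s), recalibrated firmware 2.1")
```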

Data context plays a major role in harnessing the power of an IoT network. As we progress to smarter networks, more sophisticated sensors, and artificial intelligence that manages our “things,” the architecture of your infrastructure (enterprise data hub), the cultivation and management of your data flows, and the analytics automation that rides on top of everything become critical for day-to-day operations. The good news is that if this is all done properly, you will reap the rewards of thing harmony (coined here first folks).

Please visit our Deep Learning Neural Networks for IoT white paper for a more technical slant.

It’s a very exciting time to be in the data world, with new and groundbreaking technologies released seemingly every day. There is every temptation to pick up today’s new shiny, find an excuse to throw it into production, and call it an architecture. Of course, a more deliberate approach is required for long-term success – but that doesn’t mean that there isn’t a time and place to incorporate the newest technologies!

In this post, we take a look at the different phases of data architecture development: Plan, PoC, Prototype, Pilot, and Production. Formalizing this lifecycle, and the principles behind it, ensures that we deliver low-risk business value… and still get to play with the new shiny.

Phases of data architecture development

Plan

Before a single line of code is written, a single distribution downloaded, or the first line or box drawn on a whiteboard, we need to define and understand a data strategy and use that to derive business objectives. The best way to accomplish this? Start by locking business and technical stakeholders together in a room (it helps to be in the room with them). Success is defined by business value, and we need to combine strategic and tactical business goals with real-world technical and organizational constraints. Considerations such as platform scalability, data governance, and data dynamics are important – but all are in support of the actual business uses for that data.

This is not limited to new “green field” architectures – unless a business is a brand new startup still in the garage, there is data and there is a (perhaps organic, default) data architecture. This architecture can be assessed for points of friction, and then adjusted per business objectives.

PoC

As the business objectives are solidified, the architect will assemble likely combinations of technologies both well-known and, yes, shiny. All such candidate architectures have tradeoffs and unknowns – while the core technologies may be well-understood, it’s a given that the exact application of those technologies to specific, unique business objectives is, well, unique. Don’t believe anybody who says they have a one-size-fits-all solution! While some layers of data architecture are becoming common, if not standard, in 2016 modern data architecture is still very much about gluing together disparate components in specific ways.

To this end, certain riskier approaches and technologies for a given business objective will be identified. Often a proof of concept (PoC) will be developed to validate the feasibility of these possibilities. This phase should be considered experimental, will often utilize representative “toy” problems, and failure is considered a useful (and not uncommon) outcome. It goes without saying that a PoC is not intended to be a production-quality system.

Prototype

Once areas of technical risk have been addressed with appropriate PoCs and an overall candidate architecture selected, the overall architecture should be tested against more representative use cases. Given the “glue” nature of data architectures, there is plenty of room for the unknown in the overall system even when the individual components are well-understood. A prototype may use manufactured, manageable data sets, but the data and the system should reflect realistic end-to-end business objectives. The prototype is also not intended to be production quality.

Pilot

When a prototype has demonstrated systemic feasibility, it is time to implement a pilot. A pilot is a full-quality production implementation of the architecture, limited in scope to a narrow (but complete) business objective. The pilot should strategically be a high-win project that is capable of providing real and visible value, even as a standalone system. Most organizations will use the pilot as a means to earn buy-in from all stakeholders to move into full production, which typically impacts the entire organization.

Production

After an architecture has gone into full production, it should continue to be monitored and re-evaluated in an iterative process. Where is the architecture really performing well, and where are the weaker points? What new business objectives arise, and is any new functionality required to support them? Have any new technologies been released that may have an impact on the “weaker” points of the architecture? What’s different about the business today than when the architecture was originally planned?

 

A recent WSJ article echoes an FTC report released last Wednesday warning of the possible consequences of bias in Big Data applications. The article identifies a number of valid concerns around privacy, equal opportunity, and accuracy. It also rightly hints at possible positive consequences as well.

For example, they cite cases where people judged to be poor credit risks by conventional means may receive loans as a result of big-data techniques. Good news for those people, and time will tell whether the lenders identified an underserved but viable market, or whether bias simply caused them to make a poor investment. All models, including traditional analyses, will have error, and ultimately we want to reduce both false positives and false negatives.

So we know that our models will have error, and the theme of this article is that a significant part of that error comes in the form of bias. It is a poor assumption that an analysis of any single data set – social media is a popular case – represents the whole population. Do people of all ages, nationalities, races, and income levels use social media in the same proportion as the general population? Probably not.

So how can we make this work to advantage?

  1. Consider bias as a first-cut classification. A common application of big data techniques is to classify large numbers of people into specific, targeted subgroups. We get our first coarse-grained categorization for free.
  2. Use the bias to select additional complementary data sets. If you understand the bias in your current data set, then you can strategically select additional data sets that give the best bang for the buck in an effort to broadly analyze the general population. Calibrate your aggregate model by combining complementary data sets (see the sketch after this list).
  3. Monitor production models. As the article observes, blind trust in correlations can be dangerous. Still, correlations can represent opportunities to be exploited. The key to safe utilization without a solid understanding of root cause is to assume those opportunities are temporary. Monitor their performance, and blow whistles as soon as the results begin to deviate from expectations.
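
To make point 2 concrete, here is a hypothetical sketch of calibrating against known population proportions: two biased samples (one skewed toward social-media users, one toward loyalty-card holders) are blended with a mixing weight chosen so the combined age mix best matches the general population. All numbers are invented for illustration.

```python
# Hypothetical sketch: reweight two biased samples so their blend matches
# known population proportions (a simple post-stratification-style calibration).

population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}   # census-style truth

# Age mix observed in each (biased) data set
social_media_share = {"18-34": 0.60, "35-54": 0.30, "55+": 0.10}
loyalty_card_share = {"18-34": 0.15, "35-54": 0.35, "55+": 0.50}

def blended_share(group, alpha):
    """Share of an age group when mixing the two sources with weight alpha."""
    return alpha * social_media_share[group] + (1 - alpha) * loyalty_card_share[group]

# Grid-search the mixing weight that best matches the population age mix
best_alpha = min(
    (round(a * 0.01, 2) for a in range(101)),
    key=lambda a: sum(
        (blended_share(g, a) - population_share[g]) ** 2 for g in population_share
    ),
)
print(best_alpha)   # ~0.35: weight the social-media sample about a third
```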

George Box had it right: all models are wrong, but some are useful!
Figure: Example of complementary data sets (hypothetical, and for illustration only!).