Big Data Architecture Lifecycle

It’s a very exciting time to be in the data world, with new and groundbreaking technologies released seemingly every day. There is every temptation to pick up today’s new shiny, find an excuse to throw it into production, and call it an architecture. Of course, a more deliberate approach is required for long-term success – but that doesn’t mean that there isn’t a time and place to incorporate the newest technologies!

In this post, we take a look at the different phases of data architecture development: Plan, PoC, Prototype, Pilot, and Production. Formalizing this lifecycle, and the principles behind it, ensure that we deliver low-risk business value… and still get to play with the new shiny.

Phases of data architecture development

Plan

Before a single line of code is written, a single distribution downloaded, or the first line or box drawn on a whiteboard, we need to define and understand a data strategy and use that to derive business objectives. The best way to accomplish this? Start by locking business and technical stakeholders together in a room (it helps to be in the room with them). Success is defined by business value, and we need to combine strategic and tactical business goals with real-world technical and organizational constraints. Considerations such as platform scalability, data governance, and data dynamics are important – but all are in support of the actual business uses for that data.

This is not limited to new “green field” architectures – unless a business is a brand new startup still in the garage, there is data and there is a (perhaps organic, default) data architecture. This architecture can be assessed for points of friction, and then adjusted per business objectives.

PoC

As the business objectives are solidified, the architect will assemble likely combinations of technologies both well-known and, yes, shiny. All such candidate architectures have tradeoffs and unknowns – while the core technologies may be well-understood, it’s a given that the exact application of those technologies to specific, unique business objectives are, well, unique. Don’t believe anybody who says they have a one-size-fits-all solution! While some layers of data architecture are becoming common, if not standard, in 2016 modern data architecture is still very much about gluing together disparate components in specific ways.

To this end, certain riskier possibilities will be identified to apply approaches and technologies to a given business objective. Often a proof of concept (PoC) will be developed to validate the feasibility of these possibilities. This phase should be considered experimental, will often utilize representative “toy” problems, and failure is considered a useful (and not uncommon) outcome. It goes without saying that a PoC is not intended to be a production-quality system.

Prototype

Once areas of technical risk have been addressed with appropriate PoCs and an overall candidate architecture selected, the overall architecture should be tested against more representative use cases. Given the “glue” nature of data architectures, there is plenty of room for the unknown in the overall system even when the individual components are well-understood. A prototype may use manufactured, manageable data sets, but the data and the system should reflect realistic end-to-end business objectives. The prototype is also not intended to be production quality.

Pilot

When a prototype has demonstrated systemic feasibility, it is time to implement a pilot. A pilot is a full-quality production implementation of the architecture, limited in scope to a narrow (but complete) business objective. The Pilot should strategically be a high-win project, that is capable of providing real and visible value, even as a standalone system. Most organizations will use the pilot as means to earn buy-in from all stakeholders to move into full production, which typically impacts the entire organization.

Production

After an architecture has gone into full production, it should continue to be monitored and re-evaluated in an iterative process. Where is the architecture really performing well, and where are the weaker points? What new business objectives arise, and is any new functionality required to support them? Have any new technologies been released that may have impact on “weaker” points of the architecture? What’s different about the business today than when the architecture was originally planned?