Data lakes

The rapid spread of unstructured data is both an opportunity and a challenge, as it provides the possibility of new insights at the cost of changing infrastructure needs and new approaches to managing, storing, and manipulating that data. The real value of Big Data lies in combining historical structured data with this new unstructured data; this combination results in better insights, more powerful analytics, and improved business performance.

Managing Unstructured Data with Data Lakes

Data Lakes take a store-all approach to data, providing a repository that retains all incoming data in its original raw format. This data can come from both structured and unstructured sources, and the Data Lake architecture supports ingest and storage of the largest and fastest-moving data sets:

  • Log files
  • Sensor data
  • IoT device streams
  • Media streams
  • Genomic data

Traditional Data Warehousing is a standard approach for obtaining business intelligence from well-known classes of structured data. Data Warehouses can work well when paired with a Data Lake, for example in an Enterprise Data Hub architecture. But a Data Lake offers a number of flexibility benefits in addition to the incorporation of unstructured data:

  • It isn’t possible to know, at acquisition time, all the questions that may be asked of a data set in the future.
  • It isn’t possible to know, at design time, all the data sets that may be acquired in the future.
  • The schema-on-read paradigm allows different semantic and schematic models to be applied to the same data set, providing valuable context and tailored points of view on that data.
  • New data can be ingested inexpensively and quickly without an upfront investment in ETL transformation.
  • Support for Data-as-a-Service (DaaS), exposing curated data sets for consumption across the organization.
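
To make the schema-on-read idea concrete, here is a minimal Python sketch. It is an illustration only, not a description of any particular Data Lake product: the sample events, the field names, and the `read_with_schema` helper are all hypothetical. The raw records are stored untouched, and two different views are applied to the same data only when it is read.

```python
import json
from datetime import datetime, timezone

# Raw events kept exactly as they arrived (hypothetical sample data).
RAW_EVENTS = [
    '{"device": "pump-7", "ts": 1700000000, "temp_c": "71.5", "site": "east"}',
    '{"device": "pump-9", "ts": 1700000060, "temp_c": "64.0", "site": "west"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps an output field name to (source key, converter).
    This helper is an assumption made for illustration.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: convert(record[key])
            for field, (key, convert) in schema.items()
        }


# View 1: an operations view that cares about temperature per device.
OPS_SCHEMA = {
    "device": ("device", str),
    "temperature_c": ("temp_c", float),
}

# View 2: an audit view that cares about where and when events occurred.
AUDIT_SCHEMA = {
    "site": ("site", str),
    "observed_at": ("ts", lambda s: datetime.fromtimestamp(s, tz=timezone.utc)),
}

# The same raw data yields two differently shaped result sets.
ops_rows = list(read_with_schema(RAW_EVENTS, OPS_SCHEMA))
audit_rows = list(read_with_schema(RAW_EVENTS, AUDIT_SCHEMA))
```

Because the raw strings are never rewritten, a third view can be added later without re-ingesting anything, which is the flexibility the bullets above describe.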

Upgrading Data Warehouses with Data Lakes

BigR.io can work with your team to design a Data Lake that can grow with your organization’s data needs and utilize the investment that you’ve already made in existing Data Warehouses and other structured storage and analytics systems. Design parameters include:

  • Size, scalability, and technology and infrastructure choices for data storage.
  • Ingest and egress processes and transfer validation.
  • Harmonious handling of batch, micro-batch, and streaming data.
  • Discovery and exploration.
  • Governance and security.
  • Context: apples-to-apples schema and semantics for consistent analysis.
  • Data lineage and lifecycle.
  • Integration with existing data systems.
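
As one illustration of the transfer validation item above, the following Python sketch copies a file into a raw zone unchanged and confirms the copy with a SHA-256 checksum. The `ingest_raw` function, the directory layout, and the file names are assumptions made for this example; a production ingest path would more likely rely on the validation features of its storage platform.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_raw(source, lake_dir):
    """Copy a file into the lake as-is and verify the copy by checksum.

    Hypothetical helper: returns the target path and the verified digest.
    """
    lake_dir = Path(lake_dir)
    lake_dir.mkdir(parents=True, exist_ok=True)
    target = lake_dir / Path(source).name

    expected = sha256_of(source)
    shutil.copyfile(source, target)

    if sha256_of(target) != expected:
        target.unlink()  # do not keep a corrupted copy in the lake
        raise IOError(f"checksum mismatch while ingesting {source}")
    return target, expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sensor.log"
        sample.write_bytes(b"raw sensor line\n")
        stored, checksum = ingest_raw(sample, Path(tmp) / "lake" / "raw")
        print(stored.name, checksum)
```

Recording the digest alongside the stored file also gives data lineage a simple anchor: any later copy can be checked against the value captured at ingest.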

A Data Lake may help your organization reduce cost and time-to-insight for new analytics, and support exponential data growth. If you are considering building a Data Lake, or recognize some of these pains and benefits, talk to BigR.io about how we can help.
