- View data as a shared asset. A modern data architecture needs to eliminate departmental data silos and give all stakeholders a complete view of the company, including a 360-degree view of customers and the ability to correlate valuable data signals across all business functions, such as manufacturing and logistics.
- Provide user interfaces for consuming data. Beyond breaking down silos, modern data architectures need to provide interfaces that make it easy for users to consume data with tools suited to their jobs. Data must be able to move freely to and from data warehouses, data lakes, and data marts.
- Ensure security and access controls. Modern data architectures must be designed for security, and they must support data policies and access controls directly on the raw data, not in a web of downstream data stores and applications.
- Establish a common vocabulary. Shared data assets, such as product catalogs, fiscal calendar dimensions, and KPI definitions, require a common vocabulary to help avoid disputes during analysis.
- Curate the data. Invest in core functions that perform data curation, such as modeling important relationships, cleansing raw data, and curating key dimensions and measures.
- Optimize data flows for agility. Limit the number of times data must be moved to reduce cost, increase data freshness, and improve enterprise agility.
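As a minimal sketch of the security principle, assuming a simple column-level policy per role, the access check can live in the data layer itself so that every consumer reads through the same rule instead of each downstream store reimplementing it. The `AccessPolicy` and `GovernedTable` names, the role names, and the column-projection approach are hypothetical illustrations, not a specific product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    # Hypothetical column-level policy: role name -> columns that role may read.
    allowed_columns: dict[str, set[str]]

    def permitted(self, role: str) -> frozenset[str]:
        if role not in self.allowed_columns:
            # Deny by default: a role with no policy entry gets no access at all.
            raise PermissionError(f"no access policy for role {role!r}")
        return frozenset(self.allowed_columns[role])


@dataclass
class GovernedTable:
    """A table that enforces its access policy on every read, where the data lives."""
    rows: list[dict[str, object]]
    policy: AccessPolicy

    def read(self, role: str) -> list[dict[str, object]]:
        columns = self.policy.permitted(role)
        # Project each row down to only the columns this role is allowed to see.
        return [{k: v for k, v in row.items() if k in columns} for row in self.rows]


if __name__ == "__main__":
    policy = AccessPolicy({"analyst": {"region", "revenue"}, "support": {"customer_id", "email"}})
    table = GovernedTable(
        rows=[{"customer_id": 1, "email": "a@example.com", "region": "EU", "revenue": 120.0}],
        policy=policy,
    )
    print(table.read("analyst"))  # [{'region': 'EU', 'revenue': 120.0}]
```

Because the projection happens inside `read`, adding a new consumer does not require copying or re-securing the data; it only requires a policy entry for that consumer's role.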
Data architecture components
A modern data architecture consists of the following components, according to software company BMC:
- Data pipelines. A data pipeline is the process by which data is collected, moved, and refined. It spans data collection, refinement, storage, analysis, and delivery.
- Cloud storage. Not all data architectures leverage cloud storage, but many modern data architectures use public, private, or hybrid clouds to provide agility.
- Cloud computing. In addition to using cloud for storage, many modern data architectures make use of cloud computing to analyze and manage data.
- Application programming interfaces. Modern data architectures use APIs to make it easy to expose and share data.
- AI and machine learning models. AI and ML can automate tasks such as data collection and labeling. At the same time, modern data architectures can help organizations apply AI and ML at scale.
- Data streaming. Data streaming is data flowing continuously from a source to a destination for processing and analysis in real time or near real time.
- Container orchestration. A container orchestration system, such as open-source Kubernetes, is often used to automate software deployment, scaling, and management.
- Real-time analytics. The goal of many modern data architectures is to deliver real-time analytics — the ability to perform analytics on new data as it arrives in the environment.
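The pipeline, streaming, and real-time analytics components can be illustrated together with Python generators: each stage consumes events as they arrive and passes them on, and the final stage updates a per-source rolling average on every new event. This is a toy, single-process sketch; the `Event` shape, the stage names `collect`, `refine`, and `rolling_mean`, and the choice of a sliding-window mean as the analytic are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, e.g. a reading from a manufacturing line.
    source: str
    value: Optional[float]


def collect(raw: Iterable[dict]) -> Iterator[Event]:
    """Collection stage: turn raw records into typed events."""
    for record in raw:
        yield Event(source=record["source"], value=record.get("value"))


def refine(events: Iterable[Event]) -> Iterator[Event]:
    """Refinement stage: drop events with missing values (simple cleansing)."""
    for event in events:
        if event.value is not None:
            yield event


def rolling_mean(events: Iterable[Event], window: int) -> Iterator[tuple[str, float]]:
    """Analytics stage: emit each source's mean over its last `window` values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # One bounded buffer per source; deque(maxlen=...) discards the oldest value.
    recent = defaultdict(lambda: deque(maxlen=window))
    for event in events:
        buf = recent[event.source]
        buf.append(event.value)
        yield event.source, sum(buf) / len(buf)


if __name__ == "__main__":
    raw = [
        {"source": "line-1", "value": 10.0},
        {"source": "line-1", "value": None},  # removed by refine()
        {"source": "line-1", "value": 20.0},
        {"source": "line-1", "value": 30.0},
    ]
    for source, mean in rolling_mean(refine(collect(raw)), window=2):
        print(source, mean)  # line-1 10.0, then line-1 15.0, then line-1 25.0
```

Because generators are lazy, each event flows through all three stages before the next one is read, which mirrors analyzing data as it arrives rather than waiting for a complete batch.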
Data architecture vs. data modeling
According to the Data Management Body of Knowledge (DMBOK 2), data architecture defines the blueprint for managing data assets by aligning with organizational strategy to establish strategic data requirements and designs to meet those requirements. DMBOK 2 defines data modeling, on the other hand, as “the process of discovering, analyzing, representing, and communicating data requirements in a precise form called the data model.”
While both data architecture and data modeling seek to bridge the gap between business goals and technology, data architecture is about the macro view that seeks to understand and support the relationships between an organization’s functions, technology, and data types. Data modeling takes a more focused view of specific systems or business cases.
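To make the distinction concrete, a data model expresses the requirements of one specific system in a precise, checkable form. The following sketch writes a hypothetical order-management model as Python dataclasses; the entities, fields, and the `orphaned_orders` integrity check are illustrative assumptions, not definitions taken from DMBOK 2.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Customer:
    # Hypothetical entity in an order-management data model.
    customer_id: int
    name: str
    region: str


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int  # relationship: must match an existing Customer.customer_id
    order_date: date
    amount: Decimal


def orphaned_orders(customers: list[Customer], orders: list[Order]) -> list[int]:
    """Return the IDs of orders that reference a customer the model does not know."""
    known = {c.customer_id for c in customers}
    return [o.order_id for o in orders if o.customer_id not in known]


if __name__ == "__main__":
    customers = [Customer(1, "Acme", "EU")]
    orders = [
        Order(10, 1, date(2024, 1, 5), Decimal("99.50")),
        Order(11, 2, date(2024, 1, 6), Decimal("10.00")),  # customer 2 does not exist
    ]
    print(orphaned_orders(customers, orders))  # [11]
```

Stating the model this precisely means a requirement such as "every order belongs to a known customer" can be tested directly, which is the focused, system-level concern that sets data modeling apart from the broader architectural view.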