Friday, May 14, 2010

Data Vaulting: Store data,quality data

This blog aims to give a basic idea of what Data Vaulting is, what are its components and how it can be constructed.
Definition : The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of enterprise data warehouses.
The Data Vault is architected to meet the needs of the data warehouse, not to be confused with a data mart. It can double as an Operational Data Store (ODS) if the correct hardware and database engine is in place to support it. The Data Vault can handle massive sets of granular data in a smaller, more normalized physical space in comparison to both 3NF and star schema. The Data Vault is foundationally strong. It is based on the mathematical principles that support the normalized data models. Inside the Data Vault model are familiar structures that match traditional definitions of star schema and 3NF that include dimensions, many to many linkages and standard table structures. The differences lie in relationship representations, field structuring and granular time-based data storage. The modeling techniques built into the Data Vault have undergone years of design and testing across many different scenarios providing them with a solid foundational approach to data warehousing.
Data Vault Modeling is a method of designing a database to provide historical storage of data coming in from multiple operational systems with complete tracing of where all the data in the database came from. Data Vault’s philosophy is that all data is relevant data, even if it is “wrong.

Components: components, specifically the Hub, Link and Satellite Entities. The Data Vault design is focused around the functional areas of business with the Hub representing the primary key. The Link Entities provide transaction integration between the Hubs. The Satellite Entities provide the context of the Hub primary key. Each entity is designed to provide maximum flexibility and scalability while retaining most of the traditional skill sets of data modeling expertise.

Building a Data Vault : The Data Vault should be built as follows:
1. Model the Hubs. This requires an understanding of business keys and their usage across the designated scope.
2. Model the Links. Forming the relationships between the keys – formulating an understanding of how the business operates today in context to each business key.
3. Model the Satellites. Providing context to each of the business keys as well as the transactions (Links) that connect the Hubs together. This begins to provide the complete picture of the business.
4. Model the point-in-time tables. This is a Satellite derivative, of which the structure and definition is outside the scope of this blog.

I aim to get into few details in the upcoming blogs regarding the need for Data vaulting, E-vaulting etc…Watch this space…Omnitech with the help of its data centers provides Data Vaulting services.

No comments:

Post a Comment