Hubbry Logo
logo
Data warehouse
Community hub

Data warehouse

logo
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Contribute something to knowledge base
Hub AI

Data warehouse AI simulator

(@Data warehouse_simulator)

Data warehouse

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is a core component of business intelligence. Data warehouses are central repositories of data integrated from disparate sources. They store current and historical data organized in a way that is optimized for data analysis, generation of reports, and developing insights across the integrated data. They are intended to be used by analysts and managers to help make organizational decisions.

The data stored in the warehouse is uploaded from operational systems (such as marketing or sales). The data may pass through an operational data store and may require data cleansing for additional operations to ensure data quality before it is used in the data warehouse for reporting.

The two main workflows for building a data warehouse system are extract, transform, load (ETL) and extract, load, transform (ELT).

The environment for data warehouses and marts includes the following:

Operational databases are optimized for the preservation of data integrity and speed of recording of business transactions through use of database normalization and an entity–relationship model. Operational system designers generally follow Codd's 12 rules of database normalization to ensure data integrity. Fully normalized database designs (that is, those satisfying all Codd rules) often result in information from a business transaction being stored in dozens to hundreds of tables. Relational databases are efficient at managing the relationships between these tables. The databases have very fast insert/update performance because only a small amount of data in those tables is affected by each transaction. To improve performance, older data are periodically purged.

Data warehouses are optimized for analytic access patterns, which usually involve selecting specific fields rather than all fields as is common in operational databases. Because of these differences in access, operational databases (loosely, OLTP) benefit from the use of a row-oriented database management system (DBMS), whereas analytics databases (loosely, OLAP) benefit from the use of a column-oriented DBMS. Operational systems maintain a snapshot of the business, while warehouses maintain historic data through ETL processes that periodically migrate data from the operational systems to the warehouse.

Online analytical processing (OLAP) is characterized by a low rate of transactions and complex queries that involve aggregations. Response time is an effective performance measure of OLAP systems. OLAP applications are widely used for data mining. OLAP databases store aggregated, historical data in multi-dimensional schemas (usually star schemas). OLAP systems typically have a data latency of a few hours, while data mart latency is closer to one day. The OLAP approach is used to analyze multidimensional data from multiple sources and perspectives. The three basic operations in OLAP are roll-up (consolidation), drill-down, and slicing & dicing.

Online transaction processing (OLTP) is characterized by a large numbers of short online transactions (INSERT, UPDATE, DELETE). OLTP systems emphasize fast query processing and maintaining data integrity in multi-access environments. For OLTP systems, performance is the number of transactions per second. OLTP databases contain detailed and current data. The schema used to store transactional databases is the entity model (usually 3NF).[citation needed] Normalization is the norm for data modeling techniques in this system.

See all
system used for reporting and data analysis, as a core component of business intelligence
User Avatar
No comments yet.