| A data warehouse consists of a set of materialized views that contain derived data from several data sources. Materialized views are beneficial because they allow efficient retrieval of summary data. However, materialized views need to be refreshed periodically in order to avoid staleness. During a materialized view refresh, only the changes to the base tables are transmitted from the data sources to the data warehouse, so the data warehouse should itself contain the base-table data that is relevant to the refresh. In this paper we explore how this additional data, commonly referred to as auxiliary views, can be reduced in size. We present novel algorithms that exploit non-trivial integrity constraints and that can handle materialized views defined over queries with grouping and aggregation.
A data warehouse contains aggregated data derived from a number of data sources and is typically used by On-Line Analytical Processing (OLAP) tools and data mining tools to support decision making (see Figure 1 and [GM95]).
The data sources consist of several databases, which usually contain huge amounts of data (e.g., the day-to-day transactions of a store chain). In contrast, materialized views (MVs) contain summary data compiled from several data sources. The main challenge in implementing the data warehouse architecture is keeping the materialized views up to date. |
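To make the role of auxiliary views concrete, the following is a minimal sketch of an incremental refresh for a view with grouping and aggregation. It is an illustration under stated assumptions, not one of the algorithms presented in this paper: the schema (a `sale` table joined to a `product` table, summed per category), the names `aux_product`, `mv`, and `refresh`, and the sample values are all hypothetical. The sketch shows that when only the changes to `sale` reach the warehouse, the warehouse still needs a local copy of the relevant `product` columns in order to place each change in the correct group.

```python
from collections import defaultdict

# Hypothetical auxiliary view kept at the warehouse: product_id -> category.
# Only the columns needed to refresh the materialized view are stored.
aux_product = {1: "toys", 2: "books"}

# Materialized view: category -> [sum of amounts, number of contributing rows].
# The row count lets a group be dropped once its last row is deleted.
mv = defaultdict(lambda: [0, 0])


def refresh(mv, aux_product, inserted, deleted):
    """Apply a batch of sale changes, given as (product_id, amount) pairs."""
    for sign, rows in ((1, inserted), (-1, deleted)):
        for product_id, amount in rows:
            # Join the change against the auxiliary view, not the remote source.
            category = aux_product[product_id]
            group = mv[category]
            group[0] += sign * amount
            group[1] += sign
            if group[1] == 0:
                del mv[category]


# First refresh: three new sales arrive from the data source.
refresh(mv, aux_product, inserted=[(1, 10), (1, 5), (2, 7)], deleted=[])
# Second refresh: one new sale and one deleted sale.
refresh(mv, aux_product, inserted=[(2, 3)], deleted=[(1, 5)])

result = dict(mv)
```

In this sketch the auxiliary view holds only `product_id` and `category`, since no other `product` column is needed to maintain the aggregate. Reducing auxiliary data in this spirit, by keeping only what a refresh can actually require, is the kind of saving the paper is concerned with.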