Managing time consistency for active data warehouse environments

Published: 2011-06-30 05:18:26

Robert M. Bruckner, A M. Tjoa
Institute of Software Technology
Vienna University of Technology
Favoritenstr. 9-11 /188, A-1040 Vienna, Austria
{bruckner, tjoa}@ifs.tuwien.ac.at

Abstract. Computer systems generally discover real-world changes only after a delay. The typical update patterns of traditional data warehouses, which run overnight or even weekly, lengthen this propagation delay before the information becomes available to knowledge workers. Traditional data warehouses typically focus on summarized data (at some level) rather than detail data. Active data warehouse environments additionally require detailed data about individual entities in order to check data conditions and trigger actions. Keeping data current and consistent in that context is therefore not an easy task. In this paper we present an approach for modeling conceptual time consistency problems and introduce a data model that deals with time delays. It helps knowledge workers find out why (or why not) an active system responded to a certain state of the data. To that end, the model enables analytical processing of detail data (enhanced by valid time) based on the knowledge state at a specified instant of time. All states that were not yet knowable to the system at that point in time are consistently ignored.

1. Introduction

The observation of real-world events by computer systems is characterized by a delay. This so-called propagation delay is the time interval a monitoring system needs to notice a state change that has occurred. While operational systems are designed, among other things, to meet well-specified response time requirements, the focus of data warehouses (DWHs) [8] is generally the strategic analysis of data integrated from heterogeneous systems. The data integration process is very complex and covers the acquisition, extraction, transformation (staging area), and loading of the data into the DWH.
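The idea of querying detail data against the knowledge state at a given instant can be illustrated with a minimal sketch. It is not the data model of this paper: the `Fact` record, its field names (`valid_from`, `known_at`), and the function `state_as_known` are hypothetical names chosen for illustration, and the sketch assumes each record carries both the valid time of the state change and the time at which the warehouse learned of it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Fact:
    """One detail record; all field names are hypothetical."""
    entity: str
    value: float
    valid_from: datetime  # when the state change happened in the real world
    known_at: datetime    # when the warehouse learned of it (after loading)

    @property
    def propagation_delay(self) -> timedelta:
        # Time between the real-world change and its discovery by the system.
        return self.known_at - self.valid_from


def state_as_known(facts: Iterable[Fact], entity: str,
                   valid_at: datetime, knowledge_at: datetime) -> Optional[Fact]:
    """Return the state of `entity` valid at `valid_at`, using only facts
    the system already knew at `knowledge_at`; later arrivals are ignored."""
    visible = [f for f in facts
               if f.entity == entity
               and f.valid_from <= valid_at
               and f.known_at <= knowledge_at]
    # Most recent valid state wins; ties are broken by the later arrival.
    return max(visible, key=lambda f: (f.valid_from, f.known_at), default=None)


# Assumed sample data: the second change reaches the warehouse 15 hours late.
facts = [
    Fact("acct-1", 100.0, datetime(2011, 6, 1, 9), datetime(2011, 6, 1, 23)),
    Fact("acct-1", 40.0, datetime(2011, 6, 2, 10), datetime(2011, 6, 3, 1)),
]
noon = datetime(2011, 6, 2, 12)

# What the system believed at noon about the state valid at noon.
print(state_as_known(facts, "acct-1", noon, knowledge_at=noon).value)  # 100.0
# The same valid-time question, asked after the delayed load has arrived.
print(state_as_known(facts, "acct-1", noon,
                     knowledge_at=datetime(2011, 6, 3, 2)).value)      # 40.0
```

The two queries ask about the same real-world instant but differ in the knowledge instant, which is the kind of distinction a knowledge worker needs when reconstructing why an active system did or did not react.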
Traditionally there is no real-time connection between a DWH and its data sources, because the write-once read-many decision support characteristics would conflict with the continuous update workload of operational systems and cause poor analysis concurrency. Data loading is done during regularly scheduled update windows (e.g. once a week, every night), when the analysis capabilities of DWHs are not affected. Consequently, until recently, timeliness requirements [7] (the relative availability of data to support a given process within the timetable required to perform the process) were restricted to the mid-term or long-term. W. H. Inmon, known as the founder of data warehousing, cites time variance [8] as one of four salient characteristics of a DWH. Timeliness can be viewed as an aspect of data quality,
