The First International Workshop on Data Quality in Collaborative Information Systems was held in conjunction with the 13th International Conference on Database Systems for Advanced Applications, 19-22 March 2008, in New Delhi, India. The present volume contains the texts of the four accepted papers and two invited papers presented at the workshop. It also includes one additional paper presented at the co-located workshop on analysis of high dimensional discrete data.
Poor data quality is known to compromise the credibility and efficiency of commercial as well as public endeavours. Contributions from both industry and academia have gone a long way towards addressing the problem. These contributions come from three groups. Analysts and practitioners have designed strategies and methodologies for data governance. Solution architects, including software vendors, have developed system architectures that promote data integration. Data experts have applied computational techniques to data quality problems such as duplicate detection, outlier identification, consistency checking and many more. True data quality is attained at the convergence of these three aspects: organizational, architectural and computational.
At the same time, the importance of managing data quality has increased manifold in today's global information sharing environments, as the diversity of sources and formats and the volume of data grow. In this workshop we target data quality in the light of collaborative information systems, where data creation and ownership are increasingly difficult to establish. Collaborative settings are evident in enterprise systems, where partner or customer data may pollute enterprise databases, raising the need for data source attribution. They are also evident in scientific applications, where data lineage must be established across long-running collaborative scientific processes. Collaborative settings therefore call for a pipeline of data quality methods and techniques. Such a pipeline begins with (source) data assessment and proceeds through data cleansing, methods for sustaining quality, and integration and linkage. It culminates in the ability to audit and attribute data.