Although this book is titled Cody’s Data Cleaning Techniques Using SAS, I hope that it is more than that. Not only will you discover ways to detect data errors, but you will also be exposed to some DATA step programming techniques and SAS procedures that might be new to you.
I have been teaching a two-day data cleaning workshop for SAS, based on the first edition of this book, for several years. I have thoroughly enjoyed traveling to interesting places and meeting other SAS programmers who have a need to find and fix errors in their data. This experience has also helped me identify techniques that other SAS users will find useful.
There have been some significant changes in SAS since the first edition was published, most notably the release of SAS®9. SAS®9 includes many new functions that make the task of finding and correcting data errors much easier. In addition, SAS®9 allows you to create integrity constraints and audit trails. Integrity constraints are rules about your data that are stored in the data descriptor portion of a SAS data set. When you try to add data to an existing data set, any observations that violate one of these constraints are rejected. SAS can also create an audit trail data set that shows which new observations were added and which observations were rejected, along with the reason for their rejection.
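As a rough sketch of how these two features fit together, the following program adds a check constraint to a small data set, starts an audit trail, and then tries to append one valid and one invalid observation. The data set names (PATIENTS, NEW_PATIENTS), the variables (PATNO, GENDER), the constraint name (GENDER_CHK), and the message text are all hypothetical and chosen only for illustration.

```sas
/* Hypothetical base data set (names are illustrative only) */
data work.patients;
   length patno $ 3 gender $ 1;
   input patno gender;
datalines;
001 M
002 F
;

/* Store a rule in the data set: GENDER must be F or M */
proc datasets library=work nolist;
   modify patients;
   ic create gender_chk = check(where=(gender in ('F','M')))
      message = "Gender must be F or M";
run;
quit;

/* Start an audit trail for the same data set */
proc datasets library=work nolist;
   audit patients;
   initiate;
run;
quit;

/* Hypothetical new data: patient 004 violates the constraint */
data work.new_patients;
   length patno $ 3 gender $ 1;
   input patno gender;
datalines;
003 F
004 X
;

/* Valid observations are added; violating observations are rejected */
proc append base=work.patients data=work.new_patients;
run;

/* List the audit trail, including the reason for any rejection */
proc print data=work.patients(type=audit);
run;
```

In a run like this, you would expect the audit trail listing to include the automatic variables _ATOPCODE_ and _ATMESSAGE_, which identify what happened to each observation and, for a rejected one, why.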
So, besides a new chapter on integrity constraints and audit trails, I have added several macros that might make your data cleaning tasks easier. I also corrected or removed several programs that the compulsive programmer in me could not allow to remain.
Finally, a short description of a SAS product called DataFlux® was added. DataFlux is a comprehensive collection of programs, with an interactive front-end, that perform many advanced data cleaning techniques such as address standardization and fuzzy matching.