Cody’s Data Cleaning Techniques Using SAS, Second Edition
Product Details
Perfect Paperback: 272 pages
Publisher: SAS Press; 2nd edition (May 13, 2008)
Language: English
ISBN-10: 1599946599
ISBN-13: 978-1599946597
Product Dimensions: 9 x 7.5 x 0.9 inches
Product Description
Thoroughly updated for SAS 9, this second edition addresses tasks that nearly every SAS programmer needs to do - that is, make sure that data errors are located and corrected. Written in Ron Cody's signature informal, tutorial style, this book develops and demonstrates data cleaning programs and macros that you can use as written or modify for your own special data cleaning needs. Each topic is developed through specific examples, and every program and macro is explained in detail.
You'll learn how to
find and correct errors in character and numeric values
develop programming techniques related to dates and missing values
use SQL approaches to data cleaning
develop techniques for correcting your data errors
use integrity constraints and audit trails to prevent errors from being added to a clean data set
Novice and experienced SAS users will discover ways to detect and correct data errors while learning how to apply DATA step programming techniques and SAS procedures.
Review
"Clean data is critical to accurate analysis. By implementing programs and macros in Cody's Data Cleaning Techniques Using SAS, Second Edition, you can achieve the goal of a clean SAS data set. Easy-to-follow examples identify invalid, missing, or out-of-range data. Also included are chapters working with dates and matching Primary Key (Identifier) variables across multiple files. This Second Edition incorporates new features in SAS9.
This book is a valuable tool for all SAS users to prepare data for analysis." --Karol Katz, MS, Programmer/Analyst, Yale University School of Medicine
"Many veteran coders become comfortable - sometimes too comfortable - with the coding techniques that they learned early on in their careers. They believe that there is no need to adopt enhanced features since their old skills continue to provide an adequate return. Dr. Ron Cody is NOT one of those people; his published works on SAS embrace the changes that have occurred in the SAS language over the years. Some of his books, most notably SAS Functions by Example and Learning SAS by Example: A Programmer's Guide, are benchmarks by which other books should be measured. He's now taken one of his earlier works, Cody's Data Cleaning Techniques Using SAS Software and updated it to take advantage of what SAS has introduced in the 9 years since the original version was published.
Folks who purchased his original volume should be prepared to put their first copy away and begin to use the newer work at their earliest opportunity." --Andrew T. Kuligowski, SouthEast SAS Users Group
Contents
1 Checking Values of Character Variables
2 Checking Values of Numeric Variables
3 Checking for Missing Values
4 Working with Dates
5 Looking for Duplicates and "n" Observations per Subject
6 Working with Multiple Files
7 Double Entry and Verification (PROC COMPARE)
8 Some PROC SQL Solutions to Data Cleaning
9 Correcting Errors
10 Creating Integrity Constraints and Audit Trails
11 DataFlux and dfPower Studio
Table of Contents
List of Programs ix
Preface xv
Acknowledgments xvii
Checking Values of Character Variables
Introduction 1
Using PROC FREQ to List Values 1
Description of the Raw Data File PATIENTS.TXT 2
Using a DATA Step to Check for Invalid Values 7
Describing the VERIFY, TRIM, MISSING, and NOTDIGIT Functions 9
Using PROC PRINT with a WHERE Statement to List Invalid Values 13
Using Formats to Check for Invalid Values 15
Using Informats to Remove Invalid Values 18
Checking Values of Numeric Variables
Introduction 23
Using PROC MEANS, PROC TABULATE, and PROC UNIVARIATE to Look for Outliers 24
Using an ODS SELECT Statement to List Extreme Values 34
Using PROC UNIVARIATE Options to List More Extreme Observations 35
Using PROC UNIVARIATE to Look for Highest and Lowest Values by Percentage 37
Using PROC RANK to Look for Highest and Lowest Values by Percentage 43
Presenting a Program to List the Highest and Lowest Ten Values 47
Presenting a Macro to List the Highest and Lowest "n" Values 50
Using PROC PRINT with a WHERE Statement to List Invalid Data Values 52
Using a DATA Step to Check for Out-of-Range Values 54
Identifying Invalid Values versus Missing Values 55
Listing Invalid (Character) Values in the Error Report 57
Creating a Macro for Range Checking 60
Checking Ranges for Several Variables 62
Using Formats to Check for Invalid Values 66
Using Informats to Filter Invalid Values 68
Checking a Range Using an Algorithm Based on Standard Deviation 71
Detecting Outliers Based on a Trimmed Mean and Standard Deviation 73
Presenting a Macro Based on Trimmed Statistics 76
Using the TRIM Option of PROC UNIVARIATE and ODS to Compute Trimmed Statistics 80
Checking a Range Based on the Interquartile Range 86
Checking for Missing Values
Introduction 91
Inspecting the SAS Log 91
Using PROC MEANS and PROC FREQ to Count Missing Values 93
Using DATA Step Approaches to Identify and Count Missing Values 96
Searching for a Specific Numeric Value 100
Creating a Macro to Search for Specific Numeric Values 102
Working with Dates
Introduction 105
Checking Ranges for Dates (Using a DATA Step) 106
Checking Ranges for Dates (Using PROC PRINT) 107
Checking for Invalid Dates 108
Working with Dates in Nonstandard Form 111
Creating a SAS Date When the Day of the Month Is Missing 113
Suspending Error Checking for Known Invalid Dates 114
Checking a Range Using an Algorithm Based on the Standard Deviation 169
Checking for Missing Values 170
Range Checking for Dates 172
Checking for Duplicates 173
Identifying Subjects with "n" Observations Each 174
Checking for an ID in Each of Two Files 174
More Complicated Multi-File Rules 176
Correcting Errors
Introduction 181
Hardcoding Corrections 181
Describing Named Input 182
Reviewing the UPDATE Statement 184
Creating Integrity Constraints and Audit Trails
Introducing SAS Integrity Constraints 187
Demonstrating General Integrity Constraints 188
Deleting an Integrity Constraint Using PROC DATASETS 193
Creating an Audit Trail Data Set 193
Demonstrating an Integrity Constraint Involving More than One Variable 200
Demonstrating a Referential Constraint 202
Attempting to Delete a Primary Key When a Foreign Key Still Exists 205
Attempting to Add a Name to the Child Data Set 207
Demonstrating the Cascade Feature of a Referential Constraint 208
Demonstrating the SET NULL Feature of a Referential Constraint 210
Demonstrating How to Delete a Referential Constraint 211