A Better Way to Perform Quality Control on Statistical Analysis System Programs

By Timothy Harrington, DataCeutics Senior SAS® Programmer

Having worked on a very large number of Statistical Analysis System (SAS®) programs, I have discovered some better ways to do things, and I am sharing some of these discoveries with you. SAS log files are scanned to monitor, evaluate, and improve the quality of data processes. Programmers are trained to look for ERROR and WARNING messages in the log when these Quality Control (QC) scans are performed, but all too often they stop there. A more thorough QC also requires diligently looking for many other, less obvious discrepancies. The following is a checklist of some of the “red flags” that should also be searched for:

1. The value of _ERROR_ being 1 instead of 0 (Amongst the causes of runtime errors are an input data set or variable name that is not found, attempted division by zero, or an invalid argument in a SCAN or SUBSTR function. The SAS automatic variable _N_ indicates the observation number where the error occurred.)

2. Any variable which is uninitialized (An uninitialized variable is one that is referenced or declared but never assigned a value. Such an indication may be the result of a mistyped variable name, a variable that was defined in a LENGTH statement or given a FORMAT but never used, or a variable that is no longer needed but was not deleted.)

3. Any repeats of BY values in a MERGE (This is an attempt at a 'many to many' MERGE, which rarely produces the intended result. It is also an indication of one or more duplicate key values which should be unique.)

4. Implicit conversion (A message such as 'Numeric values were converted to character at ...' indicates an incorrect data type, or an improperly formatted PUT or INPUT. Values may be truncated, or be set to an incorrect numeric precision, or be set to missing.)

5. Generated missing values (The message 'Missing values were generated at ...' may indicate a section of code or a SAS function has come across a missing value as input.)

These messages are usually the result of 'loose ends' and unaccounted-for situations, such as a temporary variable not being deleted after use, a SELECT value that matches none of the expected cases, unexpected data such as a character in something which should be numeric, or a longer than expected text string exceeding its defined LENGTH or format.
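The checklist above lends itself to automation. The following is a minimal Python sketch, not a definitive tool, of a scanner that flags the five items in a log. The function name scan_log, the labels, and the regular expressions are assumptions made for illustration. The sample lines are typical SAS log wording written by hand, not output from a real run, and the exact message text can vary between SAS versions.

```python
import re

# Hypothetical pattern table: one regular expression per checklist item.
# The wording is an assumption based on typical SAS log messages and
# should be adjusted to match the logs actually being reviewed.
RED_FLAGS = {
    "error_flag": re.compile(r"_ERROR_=1\b"),
    "uninitialized": re.compile(r"\bis uninitialized\b", re.IGNORECASE),
    "merge_repeats": re.compile(r"repeats of BY values", re.IGNORECASE),
    "implicit_conversion": re.compile(
        r"values (?:were|have been) converted to (?:character|numeric)",
        re.IGNORECASE,
    ),
    "missing_generated": re.compile(r"Missing values were generated", re.IGNORECASE),
}


def scan_log(lines):
    """Return (line_number, label, text) for every red-flag line found."""
    findings = []
    for line_no, line in enumerate(lines, start=1):
        for label, pattern in RED_FLAGS.items():
            if pattern.search(line):
                findings.append((line_no, label, line.strip()))
    return findings


# Hand-written sample lines for illustration only.
SAMPLE_LOG = [
    "NOTE: Variable wieght is uninitialized.",
    "NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).",
    "NOTE: Missing values were generated as a result of performing an operation on missing values.",
    "NOTE: MERGE statement has more than one data set with repeats of BY values.",
    "x=. y=0 _ERROR_=1 _N_=3",
    "NOTE: The data set WORK.DEMO has 10 observations and 4 variables.",
]

for line_no, label, text in scan_log(SAMPLE_LOG):
    print(f"line {line_no}: [{label}] {text}")
```

Because scan_log accepts any iterable of lines, an open log file can be passed to it directly. A scan like this supplements a manual review of the log rather than replacing it, since it finds only the messages it has patterns for.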

Taking these extra steps when checking a log file demonstrates pro-activity and a professional approach. A client who sees a log file free of any questionable messages, not merely free of errors and warnings, is a satisfied client.

Because SAS programmers working on clinical reporting projects are often under intense pressure to meet tight timelines and to produce the best quality SAS code as per regulatory guidelines, quality assurance can be given a lower priority. Quality Control can also sometimes be under-funded or overlooked, but I cannot stress enough that for a project to be successful, it must include effective Quality Control. Analytics are there, in part, to alert programmers to potential performance and quality issues, but programmers must be diligent in looking for these alerts and tending to the issues.

For more detailed information on this topic, please see “What Makes a Good QC Programmer?” and download the White Paper presented at PharmaSUG 2018.