cdisc

Verify SAS datasets against CDISC standards

%cdisc (datlib = data library, 
        datname = dataset name);

Where     Is Type...   And represents...

datlib    C (200)      Library reference for the location where the dataset resides.
datname   C (200)      Name of the dataset to be verified. Wildcards can be specified, such as ae*. You can also specify several datasets individually, separated by spaces.

Details
This tool verifies SAS datasets against the CDISC submission data domain models, version 3.1.1, as specified in SDTM--3.1.1ImplementationGuide.pdf. It is intended to catch deviations from the standard, including the following:

  1. Required Fields: Required identifier variables including: DOMAIN, USUBJID, STUDYID and --SEQ.
  2. Subject Variable: (4.1.2.3) For variable names, labels and comments, use the word "Subject" when referring to "patients" or "healthy volunteers".
  3. Variable Length: (4.1.2.1) Variable names are limited to 8 characters with labels up to 40 characters.
  4. Yes/No: (4.1.3.7) Variables where the response is Yes or No (Y/N) should normally be populated for both Yes and No responses.
  5. Date Time Format: (4.1.4.1) Date or Datetime must be in ISO 8601 format.
  6. Study Day Variable: (4.1.4.4) Study day variable has the name ---DY.
  7. Variable Names: (3.2.3) If a variable name matches a CDISC variable name, the associated label has to match.
  8. Variable Label: (3.2.3) If a variable label matches a CDISC label, the associated variable name has to match.
  9. Variable Type: (3.2.3) If a variable matches a CDISC variable, the associated type has to match.
  10. Dataset Names: (3.2.3) If a dataset name matches a CDISC dataset name, the associated dataset label has to match.
  11. Dataset Labels: (3.2.3) If a dataset label matches a CDISC dataset label, the associated dataset name has to match.
  12. Abbreviations: (10.3.1) (10.4) The following abbreviations are suggested for variable names and data sets.
     

    Acronym  Descriptive Text

    AE Adverse Events
    AU Autopsy
    BM Bone Mineral Density (BMD) Data
    BR Biopsy
    CM Concomitant Meds
    CO Comments
    DA Drug Accountability
    DC Disease Characteristics
    DM Demographics
    DS Disposition
    DV Protocol Deviations
    EE EEG
    EG ECG
    EX Exposure
    HU Healthcare Resource Utilization
    IE Inclusion/Exclusion
    IM Imaging
    LB Laboratory Data
    MB Microbiology Specimens
    MH Medical History
    ML Meal Data
    MS Microbiology Susceptibility
    OM Organ Measurements
    PC PK Concentration
    PE Physical Exam
    PP PK Parameters
    PG Pharmacogenomics
    QS Questionnaires
    SC Subject Characteristics
    SE Subject Elements
    SG Surgery
    SK Skin Test
    SL Sleep (Polysomnography) Data
    SS Signs and Symptoms
    ST Stress (Exercise) Test Data
    SU Substance Use
    SV Subject Visits
    TA Trial Arms
    TE Trial Elements
    TI Trial Inclusion/Exclusion Criteria
    TS Trial Summary
    TV Trial Visits
    VS Vital Signs
    ACN ACTION
    ADJ ADJUSTMENT
    AD ANALYSIS DATASET
    BL BASELINE
    BRTH BIRTH
    BOD BODY
    CAN CANCER
    CAT CATEGORY
    C CHARACTER
    CND CONDITION
    CLAS CLASS
    CD CODE
    COM COMMENT
    CON CONCOMITANT
    CONG CONGENITAL
    DTC DATE TIME - CHARACTER
    DY DAY
    DTH DEATH
    DECOD DECODE
    DRV DERIVED
    DESC DESCRIPTION
    DISAB DISABILITY
    DOS DOSE
    DOS DOSAGE
    DOSE DOSE
    DOSE DOSAGE
    DUR DURATION
    EL ELAPSED
    ET ELEMENT
    EM EMERGENT
    END END
    EN END
    ETHNIC ETHNICITY
    X EXTERNAL
    EVAL EVALUATOR
    EVL EVALUATION
    FAST FASTING
    FN FILENAME
    FL FLAG
    FRM FORMULATION, FORM
    FREQ FREQUENCY
    GR GRADE
    GRP GROUP
    HI HIGHER LIMIT
    HOSP HOSPITALIZATION
    ID IDENTIFIER
    INDC INDICATION
    INDC INDICATOR
    INT INTERVAL
    INTP INTERPRETATION
    INV INVESTIGATOR
    LIFE LIFE-THREATENING
    LOC LOCATION
    LOINC LOINC CODE
    LO LOWER LIMIT
    MIE MEDICALLY-IMPORTANT EVENT
    NAM NAME
    NST NON-STUDY THERAPY
    NR NORMAL RANGE
    ND NOT DONE
    NUM NUMBER
    N NUMERIC
    ONGO ONGOING
    ORD ORDER
    ORIG ORIGIN
    OR ORIGINAL
    OTH OTHER
    O OTHER
    OUT OUTCOME
    OD OVERDOSE
    PARM PARAMETER
    PATT PATTERN
    POP POPULATION
    POS POSITION
    QUAL QUALIFIER
    REAS REASON
    REF REFERENCE
    RF REFERENCE
    RGM REGIMEN
    REL RELATED
    R RELATED
    REL RELATIONSHIP
    R RELATIONSHIP
    RES RESULT
    RL RULE
    SEQ SEQUENCE
    S SERIOUS
    SER SERIOUS
    SEV SEVERITY
    SPEC SPECIMEN
    SPC SPECIMEN
    SPEC SPONSOR
    SPC SPONSOR
    ST STANDARD
    STD STANDARD
    ST START
    STD START
    STAT STATUS
    SCAT SUBCATEGORY
    SUBJ SUBJECT
    SUPP SUPPLEMENTAL
    SYS SYSTEM
    TXT TEXT
    TM TIME
    TPT TIMEPOINT
    TOT TOTAL
    TOX TOXICITY
    TRANS TRANSITION
    TRT TREATMENT
    U UNIT
    U UNIQUE
    UP UNPLANNED
    VAR VARIABLE
    VAL VALUE
    V VEHICLE
     
  13. SEQ Values: When the --SEQ variable is used, it must have unique values for each USUBJID within each domain.
  14. Label Casing: For dataset labels and variable labels, all nontrivial words (more than three characters) must start with a capital letter, with the rest of the characters in lowercase.
  15. Required Values: (4.1.1.5) For required fields such as those listed in item 1, checks whether values are present. If any values are missing, reports the observation numbers where they are missing.
  16. Similar Parentheses: For labels with matching values inside parentheses, such as (Yes/No), within the same dataset, checks whether the variables have the same type and length. If not, the differences are reported.
  17. Required Variables: (4.1.1.5) A Required variable is any variable that is basic to the identification of a data record (i.e., essential key variables and a topic variable) or is necessary to make the record meaningful. Required variables should always be included in the dataset and cannot be null for any record.
  18. Expected Variable: (4.1.1.5) An Expected variable is any variable necessary to make a record useful in the context of a specific domain. Columns for Expected variables are assumed to be present in each submitted dataset even if some values are null.
  19. Zero Rows: (IR4000) Identifies domain table that has zero rows and therefore contains no data.
  20. Empty Value: (IR4001)  Identifies a null (empty) value found in a column where (Standard) Core attribute is 'Req' which means required.
  21. No Record Baseline: (IR4005) For Findings domains, this identifies subjects where there are no records with a value of 'Y' in the baseline flag variable (Baseline Flag).
  22. Consistent Lab Values: (IR4006) For LAB domains, this identifies Short Name of Measurement, Test or Examination values where standard units value (Standard Units) is not consistent across all records.
  23. MedDRA Term Mismatch: (IR4007) For AE domains, this identifies records where the value for the Preferred Term could not be found in the MedDRA dictionary.
  24. Serious AE: (IR4008) For AE domains, this identifies records where Serious Event='Y' but none of Involves Cancer, Congenital Anomaly or Birth Defect, Persist or Signif Disability/Incapacity, Results in Death, Requires or Prolongs Hospitalization, Is Life Threatening, Other Medically Important Serious Event, or Occurred with Overdose equals 'Y'.
  25. Unit and Status Null: (IR4009) Identifies records where Result or Finding in Original Units and Status both have a value, or where both are null.
  26. Visit Number Decimal: (IR4010) Identifies records where the value for Visit Number is formatted to more than two decimal places.
  27. DM Arm Code: (IR4011) For DM domain, this identifies records that violate the condition [If Arm Code='SCRNFAIL' then Description of Arm must equal 'Screen Failure', and vice versa].
  28. TA Arm Code: (IR4012) For TA domain, this identifies records that violate the condition [If Arm Code='SCRNFAIL' then Description of Arm must equal 'Screen Failure', and vice versa].
  29. Study Day End: (IR4100) For all timing variables, this identifies records that violate the condition [(Study Day of Start of Observation less than or equal to Study Day of End of Observation)], limited to records where [Study Day of Start of Observation is not null and Study Day of End of Observation is not null].
  30. Start End Records: (IR4101) For all timing variables, this identifies records that violate the condition [(Start Date/Time of Observation less than or equal to End Date/Time of Observation)], limited to records where [Start Date/Time of Observation is not null and End Date/Time of Observation is not null].
  31. Baseline Null (IR4102) For all Findings domains, this identifies records that violate the condition [Baseline Flag either 'Y' or null].
  32. Derived Flag Null (IR4103) For Findings domains, this identifies records that violate the condition [Derived Flag either 'Y' or null].
  33. Reference Periods (IR4104) For Events and Interventions domains, this identifies records that violate the condition [End Relative to Reference Period in('BEFORE','DURING','AFTER','DURING/AFTER','U')], limited to records where [End Relative to Reference Period is not null].
  34. Fasting Status Null (IR4105) For Findings domains, this identifies records that violate the condition [Fasting Status in ('Y','N','U')], limited to records where [Fasting Status is not null].
  35. Occurrence Null (IR4106) For Events and Interventions domains, this identifies records that violate the condition [Occurrence in ('Y','N')], limited to records where [Occurrence is not null].
  36. Status Not Done (IR4107) Identifies records that violate the condition [Status='NOT DONE'], limited to records where [Status is not null].
  37. Reference Period (IR4108) For Events and Interventions domains, this identifies records that violate the condition [Start Relative to Reference Period in ('BEFORE','DURING','AFTER')], limited to records where [Start Relative to Reference Period is not null].
  38. Dose Null (IR4109) For Interventions domains, this identifies records that violate the condition [Dose greater than or equal to 0], limited to records where [Dose is not null].
  39. Duration Zero (IR4110) For All (Timing) variables, this identifies records that violate the condition [Duration greater than or equal to 0], limited to records where [Duration is not null].
  40. Original Units Null (IR4111) For Findings domains, this identifies records that violate the condition [Result or Finding in Original Units is null], limited to records where [Derived Flag='Y'].
  41. Format Null (IR4112) For Findings domains, this identifies records that violate the condition [Result or Finding in Standard Format is not null], limited to records where [Derived Flag='Y'].
  42. Test Name Length (IR4113) For Findings domains, this identifies records that violate the condition [LENGTH (Name of Measurement, Test or Examination) less than or equal to 40 characters].
  43. Short Name Length (IR4114) For Findings domains, this identifies records that violate the condition [LENGTH(Short Name of Measurement, Test or Examination) less than or equal to 8 chars, cannot start with a number or contain special chars].
  44. Trial Summary Length (IR4115) For TS domains, this identifies records that violate the condition [LENGTH(Trial Summary Parameter) less than or equal to 40 chars].
  45. Trial Summary Short (IR4116) For TS domains, this identifies records that violate the condition [LENGTH(Trial Summary Parameter Short Name) less than or equal to 8 chars, cannot start with a number or contain special chars].
  46. End Reference Period (IR4117) For All (Timing) variables, this identifies records that violate the condition [End Relative to Reference Period is not null], limited to records where [End Date/Time of Observation is null].
  47. Start Period Null (IR4118) For All (Timing) variables, this identifies records that violate the condition [Start Relative to Reference Period is not null], limited to records where [Start Date/Time of Observation is null].
  48. Elapse Time Zero (IR4119) For EX domains, this identifies records that violate the condition [Planned Elapsed Time from Reference Pt greater than or equal to 0], limited to records where [Planned Elapsed Time from Reference Pt is not null].
  49. Evaluation Interval Zero (IR4120) For All (Timing) variables, this identifies records that violate the condition [Evaluation Interval greater than or equal to 0], limited to records where [Evaluation Interval is not null].
  50. Toxicity Grade Valid (IR4121) For Events domains, this identifies records that violate the condition [Toxicity Grade is a valid number], limited to records where [Toxicity Grade is not null].
  51. Reason Done Null (IR4122) For All domains, this identifies records that violate the condition [Reason Not Done is null], limited to records where [Status is null].
  52. Date Collection Null (IR4123) For Findings domains, this identifies records that violate the condition [Date/Time of Collection is not null], limited to records where [End Date/Time of Observation is not null].
  53. Date Less End Date (IR4124) For Findings domains, this identifies records that violate the condition [Date/Time of Collection less than or equal to End Date/Time of Observation], limited to records where [Date/Time of Collection is not null and End Date/Time of Observation exists].
  54. Results Units Null (IR4125) For Findings domains, this identifies records that violate the condition [Original Units is not null], limited to records where [Result or Finding in Original Units is not null].
  55. Original Units Null (IR4126) For Findings domains, this identifies records that violate the condition [Original Units is null], limited to records where [Result or Finding in Original Units is null].
  56. Upper Limit Range (IR4127) For Findings domains, this identifies records that violate the condition [Normal Range Upper Limit-Standard Units greater than or equal to Normal Range Lower Limit-Standard Units], limited to records where [Normal Range Lower Limit-Standard Units is not null and Normal Range Upper Limit-Standard Units is not null].
  57. Standard Unit Null (IR4128) For Findings domains, this identifies records that violate the condition [Standard Units is not null], limited to records where [Result or Finding in Standard Format is not null].
  58. Standard Format Null (IR4129) For Findings domains, this identifies records that violate the condition [Standard Units is null], limited to records where [Result or Finding in Standard Format is null].
  59. Start Observation Null (IR4130) For All (Timing) variables, this identifies records that violate the condition [Start Date/Time of Observation is not null], limited to records where [End Date/Time of Observation is not null].
  60. Time Name Null (IR4131) For All (Timing) variables, this identifies records that violate the condition [Planned Time Point Name is not null], limited to records where [Planned Time Point Number is not null].
  61. Time Number Null (IR4132) For All (Timing) variables, this identifies records that violate the condition [Planned Time Point Number is not null], limited to records where [Planned Time Point Name is not null].
  62. Time Reference Null (IR4133) For All (Timing) variables, this identifies records that violate the condition [Time Point Reference is not null], limited to records where [Elapsed Time from Reference Point is not null].
  63. Dose Unit Null (IR4134) For Interventions domains, this identifies records that violate the condition [Dose Units is not null], limited to records where [Dose is not null].
  64. Result Format Null (IR4135) For Findings domains, this identifies records that violate the condition [Result or Finding in Standard Format is not null], limited to records where [Result or Finding in Original Units is not null].
  65. Code List Found (IR4136) For All domains, this identifies records where values are not found in the study-specific codelist attached to a variable.
  66. Study Day Zero (IR4137) For All domains, this identifies records that violate the condition [Study Day of Visit/Collection/Exam doesn't equal 0].
  67. Treatment Emergent AE (IR4255) For AE domains, this identifies a Sponsor-provided Flag Variable for 'Treatment emergent AE' where the derivation can't be executed.
  68. Clinically Significant Lab (IR4256) For LB domains, this identifies a Sponsor-provided Flag Variable for 'Clinically Significant Lab' where the derivation can't be executed.
  69. Clinically Significant Vitals (IR4257) For VS domains, this identifies a Sponsor-provided Flag Variable for 'Clinically Significant Vital Sign' where the derivation can't be executed.
  70. SUPPQUAL USUBJID (IR4258) For Supplemental Qualifiers domains, this identifies a domain that appears to contain supplemental qualifier data but does not contain the Unique Subject Identifier variable.
  71. SAS Label (IR4260) For All domains, this identifies a variable present in the SAS dataset but not present in the (study specific) description file.
  72. DM Sequence (IR4500) For All domains, this identifies non-Sequence Number domain subjects not found in the Demographics domain.
  73. Subject Visit (IR4501) For All domains, this identifies Unique Subject Identifier+Visit Name+Visit Number combinations not found in the SV domain.
  74. Arm Code TA (IR4502) For DM domains, this identifies records where the value for Arm Code is not found in the TA domain.
  75. Subject Element Code (IR4503) For All domains, this identifies records where the value for Subject Element Code is not found in the TE domain.
  76. IE Short Name (IR4504) For IE domains, this identifies records where the value for Inclusion/Exclusion Criterion Short Name is not found in the TI domain.
  77. DS Subject (IR4505) For DM domains, this identifies Sequence Number subjects where no record for the subject is found in the Disposition domain.
  78. DM Sequence (IR4506) For DM domains, this identifies Sequence Number subjects where no record for the subject is found in the Exposure domain.
  79. Arm Code DM (IR4507) For DM domains, this identifies Sequence Number treatment arms (Description of Arm+Arm Code combination) not found in the TA domain.
  80. Unknown CO (IR4508) For CO domains, this identifies CO domain reference to an unknown related domain.
  81. Related Records Unknown (IR4509) For RELREC domains, this identifies Related Records domain reference to an unknown related domain.
  82. Unknown SUPPQUAL (IR4510) For Supplemental Qualifiers domains, this identifies Supplemental Qualifiers domain reference to an unknown related domain.
  83. RELREC Key (IR4511) For RELREC domains, this identifies Related Records domain reference to a key variable that isn't defined in the target domain.
  84. SUPPQUAL Key (IR4512) For Supplemental Qualifiers domains, this identifies Supplemental Qualifiers domain reference to a key variable that isn't defined in the target domain.
  85. RELREC Target Domain (IR4513) For RELREC domains, this identifies Related Records domain reference to a record that doesn't exist in the target domain.
  86. SUPPQUAL Target Domain (IR4514) For Supplemental Qualifiers domains, this identifies Supplemental Qualifiers domain reference to a record that doesn't exist in the target domain.
  87. DM Unique Subject (R4005) For DM domains, this identifies records where values for Unique Subject ID variable(s) are not unique, limited to records where [Unique Subject ID is not null].
  88. Age Greater Zero (R4006) For DM domains, this identifies records that violate the condition [AGE greater than or equal to 0], limited to records where [AGE is not null].
  89. Sex Code List (R4007) For DM domains, this identifies records where value for [SEX] is not found in Codelist [SEX].
  90. Country Codelist (R4008) For DM domains, this identifies records where value for [COUNTRY] is not found in Codelist [COUNTRY].
  91. Yes No Codelist (R4019) For AE domains, this identifies records where value for [Serious Event] is not found in Codelist [YESNO].
  92. Birth Defect Codelist (R4023) For AE domains, this identifies records where value for [Congenital Anomaly or Birth Defect] is not found in Codelist [YESNO], limited to records where [Congenital Anomaly or Birth Defect is not null].
  93. Disability Codelist (R4024) For AE domains, this identifies records where value for [Persist or Signif Disability/Incapacity] is not found in Codelist [YESNO], limited to records where [Persist or Signif Disability/Incapacity is not null].
  94. Death Codelist (R4025) For AE domains, this identifies records where value for [Results in Death] is not found in Codelist [YESNO], limited to records where [Results in Death is not null].
  95. Hospitalization Codelist (R4026) For AE domains, this identifies records where value for [Requires or Prolongs Hospitalization] is not found in Codelist [YESNO], limited to records where [Requires or Prolongs Hospitalization is not null].
  96. Life Threatening Codelist (R4027) For AE domains, this identifies records where value for [Is Life Threatening] is not found in Codelist [YESNO], limited to records where [Is Life Threatening is not null].
  97. Inclusion Exclusion Codelist (R4031) For IE domains, this identifies records where value for [Inclusion/Exclusion Category] is not found in Codelist [INCEX], limited to records where [Inclusion/Exclusion Category is not null].
  98. Conmed Codelist (R4043) For AE domains, this identifies records where value for [Concomitant or Additional Trtmnt Given] is not found in Codelist [YESNO].
  99. Cancer Codelist (R4045) For AE domains, this identifies records where value for [Involves Cancer] is not found in Codelist [YESNO], limited to records where [Involves Cancer is not null].
    (R4046) For AE domains, this identifies records where value for [Other Medically Important Serious Event] is not found in Codelist [YESNO], limited to records where [Other Medically Important Serious Event is not null].
  100. Overdose Codelist (R4047) For AE domains, this identifies records where value for [Occurred with Overdose] is not found in Codelist [YESNO], limited to records where [Occurred with Overdose is not null].
  101. Age Unit Codelist (R4062) For DM domains, this identifies records where value for [AGEU] is not found in Codelist [AGEUNITS2], limited to records where [AGEU is not null].
  102. IE Units Codelist (R4071) For IE domains, this identifies records where value for [I/E Criterion Original Result] is not found in Codelist [YESNO], limited to records where [I/E Criterion Original Result is not null].
  103. Study Format Codelist (R4072) For IE domains, this identifies records where value for [I/E Criterion Result in Std Format] is not found in Codelist [YESNO], limited to records where [I/E Criterion Result in Std Format is not null].
  104. IE Results Codelist (R4073) For IE domains, this identifies records that violate the condition [I/E Criterion Original Result = I/E Criterion Result in Std Format].
  105. SUPPQUAL Study ID (R4083) For Supplemental Qualifiers domains, this identifies records where values for [Study Identifier, Unique Subject Identifier, Identifying Variable, Identifying Variable Value, Qualifier Variable Name] variable(s) are not unique.
  106. DM Reference Date (R4096) For DM domains, this identifies records that violate the condition [Subject Reference Start Date/Time is not null], limited to records where [Arm Code doesn't equal 'SCRNFAIL'].
  107. DM End Date (R4097) For DM domains, this identifies records that violate the condition [Subject Reference End Date/Time is not null], limited to records where [Arm Code doesn't equal 'SCRNFAIL'].
  108. TE End Element (R4101) For TE domains, this identifies records that violate the condition [Rule for End of Element is not null or Planned Duration of Element is not null].
  109. AE Death (R4102) For AE domains, this identifies records that violate the condition [Results in Death='Y'], limited to records where [Outcome of Adverse Event='FATAL'].
  110. AE Fatal (R4103) For AE domains, this identifies records that violate the condition [Outcome of Adverse Event='FATAL'], limited to records where [Results in Death='Y'].
  111. DV Count (R4104) For DV domains, this identifies records that violate the condition [count (distinct Protocol Deviation Term) = count(distinct Protocol Deviation Coded Term)], limited to records where [Protocol Deviation Term is not null].
  112. SE Unplanned Element (R4105) For SE domains, this identifies records that violate the condition [Description of Unplanned Element is not null], limited to records where [Subject Element Code='UNPLAN'].
  113. Age Unit Null (R4106) For DM domains, this identifies records that violate the condition [AGEU is not null], limited to records where [AGE is not null].
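
To make the kind of check listed above more concrete, the following is a minimal sketch of how two of the items (5, Date Time Format, and 13, SEQ Values) might be verified by hand. It is not the macro's actual implementation. The libref MYLIB, the dataset AE, the variables AESEQ and AESTDTC, and the output dataset names are assumptions chosen for illustration, and the date pattern is a simplified subset of ISO 8601.

```sas
/* Illustrative sketch only; not the code used by %cdisc.          */
/* Assumes libref MYLIB with an AE dataset containing USUBJID,     */
/* AESEQ and AESTDTC.                                              */

/* Item 13: list USUBJID/AESEQ combinations that occur more than once. */
proc sql;
  create table work.dup_seq as
  select usubjid, aeseq, count(*) as n_records
  from mylib.ae
  group by usubjid, aeseq
  having count(*) > 1;
quit;

/* Item 5: keep records whose start date is populated but does not    */
/* match a simplified ISO 8601 pattern: YYYY, YYYY-MM, YYYY-MM-DD,     */
/* optionally followed by Thh:mm or Thh:mm:ss.                         */
data work.bad_dtc;
  set mylib.ae;
  if not missing(aestdtc) and
     not prxmatch('/^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2})?)?)?)?\s*$/', aestdtc);
run;
```

If both output datasets are empty, the AE dataset passes these two simplified checks. The pattern validates the shape of the value only, so an impossible date such as 2009-13-45 would still pass.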

The findings from the above evaluation will be stored in a dataset named WORK.CDISC.   Each test case will be identified by a column named "case" which corresponds to each item listed above.  An HTML report will be generated documenting the findings.  The report name will be the same name as the current program with the (.html) file extension.
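
As a rough sketch of how the findings dataset might be reviewed after a run, the steps below summarize and list WORK.CDISC by the "case" column described above. Only the dataset name and the "case" column come from this page. The sorted copy WORK.CDISC_SORTED is a hypothetical name, and no assumption is made about the other columns or about whether "case" is numeric or character.

```sas
/* Count how many findings were reported for each test case. */
proc freq data=work.cdisc;
  tables case / nocum nopercent;
run;

/* List the findings grouped by test case.                         */
/* WORK.CDISC_SORTED is a hypothetical name for the sorted copy.   */
proc sort data=work.cdisc out=work.cdisc_sorted;
  by case;
run;

proc print data=work.cdisc_sorted;
  by case;
run;
```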

If the macro is called without any parameters, a dialog box will be opened to allow users to select the datasets to be evaluated.

Example

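/* Verify a single dataset */
%cdisc (datlib=mylib, 
        datname=ae);
/* Verify every dataset whose name starts with "ae" */
%cdisc (datlib=mylib, 
        datname=ae*);
/* Verify several datasets listed individually */
%cdisc (datlib=mylib, 
        datname=ae demog term);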
    CDISC Builder - CDISC Data Tools Software,  Meta-Xceed Inc.© 2009