cdisc

Verify SAS datasets against CDISC standards

%cdisc (datlib = data library, 
        datname = dataset name);
Where     Is Type...  And represents...
datlib    C (200)     Libref identifying the library where the dataset resides.
datname   C (200)     Name of the dataset to be verified. Wildcards such as ae* can be specified.
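The macro's internals are not documented here, so the following is only a sketch of how a wildcard value such as ae* could be resolved to concrete dataset names. It queries the DICTIONARY.TABLES view and translates * into the % wildcard used by the SQL LIKE operator. The sample datasets (WORK.AE, WORK.AESUPP, WORK.CM) and the macro variable DSLIST are hypothetical; the %LET statements simply stand in for the datlib= and datname= parameters.

```sas
/* Hypothetical sample data: AE and AESUPP match ae*, CM does not */
data work.ae;     usubjid = "001"; run;
data work.aesupp; usubjid = "001"; run;
data work.cm;     usubjid = "001"; run;

%let datlib  = work;   /* stands in for datlib=  */
%let datname = ae*;    /* stands in for datname= */

proc sql noprint;
  /* DICTIONARY.TABLES stores LIBNAME and MEMNAME in uppercase, so the
     inputs are upcased and the * wildcard is turned into LIKE's %     */
  select memname
    into :dslist separated by ' '
    from dictionary.tables
    where libname = upcase("&datlib")
      and memtype = 'DATA'
      and memname like translate(upcase("&datname"), '%', '*')
    order by memname;
quit;

%put NOTE: Datasets to verify: &dslist;
```

Each name in DSLIST could then be verified in turn, one dataset per pass.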

Details
This tool verifies SAS datasets against the CDISC submission data domain models, version 3.0, as specified at: http://www.cdisc.org/pdf/V3CRTStandardV1_2.pdf.  It is intended to catch deviations from the standard, including the following:

  1. Required Fields: (2.4.5) Required identifier variables, including DOMAIN, USUBJID, STUDYID and --SEQ, must be present.
  2. Subject Variable: (3.5.1.2.8) In variable names, labels and comments, use the word "Subject" when referring to "patients" or "healthy volunteers".
  3. Variable Length: (3.5.1.2.6) Variable names are limited to 8 characters with labels up to 40 characters.
  4. Yes/No: (3.5.1.3.18) Variables where the response is Yes or No (Y/N) should normally be populated for both Yes and No responses.
  5. Date Time Format: (3.5.1.4.19) Use the yymmdd10. format; yymmdd8. is also acceptable.
  6. Study Day Variable: (3.5.1.4.22) Study day variables are named --DY.
  7. Variable Names: (3.5.2) If a variable name matches a CDISC variable name, its label must match the CDISC label.
  8. Variable Label: (3.5.2) If a variable label matches a CDISC label, the variable name must match the CDISC variable name.
  9. Variable Type: (3.5.2) If a variable matches a CDISC variable, its type must match the CDISC type.
  10. Dataset Names: (3.5.2) If a dataset name matches a CDISC domain name, its dataset label must match the CDISC label.
  11. Dataset Labels: (3.5.2) If a dataset label matches a CDISC domain label, the dataset name must match the CDISC name.
  12. Abbreviations: (3.5.2) The following abbreviations are suggested for variable names and dataset names:

     
    Abbreviation  Descriptive Text
    ae            Adverse Events
    bl            Baseline
    brth          Birth
    cat           Category
    clas          Class
    cd            Code
    cm            Concomitant Medications
    dt            Date
    dy            Day
    dm            Demographics
    ds            Disposition
    dos           Dose
    dur           Duration
    eg            ECG
    eval          Evaluator
    ex            Exposure
    fl            Flag
    frm           Form
    frq           Frequency
    gr            Grade
    hosp          Hospital
    id            Identifier
    ie            Inclusion/Exclusion Exceptions
    ind           Indication
    ind           Indicator
    inv           Investigator
    lb            Labs
    loc           Location
    lo            Lower
    mh            Medical History
    num           Number
    occur         Occurrence
    ord           Order
    or            Original
    pe            Physical Exam
    pt            Point
    pos           Position
    reas          Reason
    ref           Reference
    rgm           Regimen
    r             Related
    rel           Relationship
    rl            Rule
    seq           Sequence
    stat          Status
    scat          Subcategory
    subj          Subject
    sc            Subject Characteristics
    su            Substance Use
    txt           Text
    tm            Time
    tot           Total
    tox           Toxicity
    tran          Transition
    trt           Treatment
    ex            Treatment or Exposure
    u             Unique
    u             Units
    up            Unplanned
    hi            Upper
    val           Value
    var           Variable
    vs            Vital Signs

     

  13. SEQ Values: (4.3.2.1) When the --SEQ variable is used, it must have unique values for each USUBJID within each domain.
  14. Label Casing: In dataset labels and variable labels, every nontrivial word (more than three characters) must start with a capital letter, with the remaining characters in lowercase.
  15. Character Length: CDISC standardizes on a length of 100 for character variables.  For character variables with a length of 40 or greater, a length of 100 is recommended.
  16. Required Values: For required fields such as those listed in item 1, the tool checks that values are present.  If any values are missing, it reports the observation numbers where they are missing.
  17. Similar Parentheses: For labels within the same dataset that contain the same text in parentheses, such as (Yes/No), the tool checks whether the corresponding variables have the same type and length.  If they do not, it reports the differences.
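The metadata-based checks for variable names and labels (item 3) and character length (item 15) can be approximated with the DICTIONARY.COLUMNS view. This is a sketch under stated assumptions, not the macro's actual implementation: the sample dataset WORK.AE, its variables, and the output dataset WORK.META_FINDINGS are hypothetical, and variables of exactly length 100 are assumed not to be flagged.

```sas
/* Hypothetical AE domain with two deliberate problems:
   - aeterm_verbatim has a name longer than 8 characters (item 3)
   - aeterm is character with length 60: >= 40 but not 100 (item 15) */
data work.ae;
  length studyid $20 domain $2 usubjid $20 aeseq 8
         aeterm $60 aeterm_verbatim $100;
  label aeterm = "Reported Term for the Adverse Event";
  studyid = "STUDY01"; domain = "AE"; usubjid = "STUDY01-001";
  aeseq = 1; aeterm = "HEADACHE"; aeterm_verbatim = "headache";
run;

proc sql;
  /* DICTIONARY.COLUMNS holds one row per variable; each flag is 1 or 0 */
  create table work.meta_findings as
  select name, type, length, label,
         (length(name) > 8)                                  as long_name,
         (length(label) > 40)                                as long_label,
         (type = 'char' and length >= 40 and length ne 100) as short_char
  from dictionary.columns
  where libname = 'WORK' and memname = 'AE'
    and (calculated long_name or calculated long_label
         or calculated short_char)
  order by name;
quit;
```

Keeping one flag per rule, rather than a single message, lets a variable that breaks several rules report all of them.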
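The data-level checks for unique --SEQ values within a subject (item 13) and missing required values (item 16) can be sketched with a PROC SQL grouping query and a DATA step that records the observation number (_N_) of each missing value. The dataset WORK.AE, its values, and the output datasets WORK.DUP_SEQ and WORK.MISSING_REQ are hypothetical; the macro's own logic and output layout may differ.

```sas
/* Hypothetical AE domain: subject 001 repeats AESEQ=1,
   and observation 3 has a missing USUBJID */
data work.ae;
  length studyid $20 domain $2 usubjid $20;
  input studyid $ domain $ usubjid $ aeseq;
  datalines;
STUDY01 AE 001 1
STUDY01 AE 001 1
STUDY01 AE . 2
STUDY01 AE 002 1
;
run;

/* Item 13: --SEQ must be unique for each USUBJID */
proc sql;
  create table work.dup_seq as
  select usubjid, aeseq, count(*) as n_obs
  from work.ae
  group by usubjid, aeseq
  having count(*) > 1;
quit;

/* Item 16: report the observation number of every missing required value */
data work.missing_req (keep=obs varname);
  set work.ae;
  length varname $32;
  array req_c {*} studyid domain usubjid;   /* required character variables */
  do i = 1 to dim(req_c);
    if missing(req_c{i}) then do;
      obs = _n_;
      varname = vname(req_c{i});
      output;
    end;
  end;
  if missing(aeseq) then do;                /* required numeric --SEQ */
    obs = _n_;
    varname = 'aeseq';
    output;
  end;
run;
```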

The findings are stored in a dataset named WORK.CDISC.  Each test case is identified by a variable named "case", whose value corresponds to the item number listed above.  An HTML report documenting the findings is also generated; it has the same name as the current program, with an .html file extension.
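Once %cdisc has run, the findings can be summarized with ordinary SAS procedures. The sketch below tallies findings per test case with PROC FREQ. It relies only on the "case" variable described above; the stand-in dataset WORK.CDISC_SAMPLE, its numeric CASE values, and the output dataset WORK.CASE_COUNTS are hypothetical, and a separate name is used so the real WORK.CDISC is not overwritten.

```sas
/* Hypothetical stand-in for WORK.CDISC; in practice run %cdisc first
   and point DATA= at WORK.CDISC instead */
data work.cdisc_sample;
  input case;
  datalines;
13
16
16
;
run;

/* Tally the findings by test case (item number) */
proc freq data=work.cdisc_sample;
  tables case / nocum out=work.case_counts;
run;
```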

Example

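/* Verify a single dataset, MYLIB.AE */
%cdisc (datlib=mylib,
        datname=ae);

/* Verify every dataset in MYLIB whose name begins with AE */
%cdisc (datlib=mylib,
        datname=ae*);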