Verify SAS datasets against CDISC standards
%cdisc (datlib = data library,
datname = dataset name);
Where | Is Type... | And represents... |
datlib | C (200) | Library name reference pointing to the location where the dataset resides. |
datname | C (200) | Name of the dataset to be verified. Wildcards can be specified, such as ae*. You can also specify several datasets individually, separated by spaces. |
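As a minimal sketch of how a value for datlib might be prepared: in SAS, a library reference is normally assigned with a LIBNAME statement before the macro is called. The libref mylib matches the examples on this page; the path is a placeholder and an assumption, and the sketch also assumes the %cdisc macro is already available in the session.

```sas
/* Assign a libref for the folder that holds the datasets. */
/* The path is a placeholder (assumption); use your own.   */
libname mylib "/path/to/sdtm/datasets";

/* Verify the AE dataset in that library */
%cdisc (datlib = mylib, datname = ae);
```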
Details
This tool verifies SAS datasets against the CDISC submission data domain models, version 3.1.1, as specified in SDTM--3.1.1ImplementationGuide.pdf. It is intended to catch deviations from the standards, including the following:
- Required Fields: Required identifier variables, including DOMAIN, USUBJID, STUDYID and --SEQ.
- Subject Variable: (4.1.2.3) For variable names, labels and comments, use the word "Subject" when referring to "patients" or "healthy volunteers".
- Variable Length: (4.1.2.1) Variable names are limited to 8 characters, with labels up to 40 characters.
- Yes/No: (4.1.3.7) Variables where the response is Yes or No (Y/N) should normally be populated for both Yes and No responses.
- Date Time Format: (4.1.4.1) Date or datetime values must be in ISO 8601 format.
- Study Day Variable: (4.1.4.4) The study day variable has the name --DY.
- Variable Names: (3.2.3) If any variable name matches a CDISC variable name, the associated label has to match.
- Variable Label: (3.2.3) If any variable label matches a CDISC label, the associated variable name has to match.
- Variable Type: (3.2.3) If any variable matches a CDISC variable, the associated type has to match.
- Dataset Names: (3.2.3) If any dataset name matches a CDISC dataset name, the associated dataset label has to match.
- Dataset Labels: (3.2.3) If any dataset label matches a CDISC dataset label, the associated dataset name has to match.
- Abbreviations: (10.3.1) (10.4) The following abbreviations are suggested for variable names and datasets.
Acronym | Descriptive Text |
AE | Adverse Events |
AU | Autopsy |
BM | Bone Mineral Density (BMD) Data |
BR | Biopsy |
CM | Concomitant Meds |
CO | Comments |
DA | Drug Accountability |
DC | Disease Characteristics |
DM | Demographics |
DS | Disposition |
DV | Protocol Deviations |
EE | EEG |
EG | ECG |
EX | Exposure |
HU | Healthcare Resource Utilization |
IE | Inclusion/Exclusion |
IM | Imaging |
LB | Laboratory Data |
MB | Microbiology Specimens |
MH | Medical History |
ML | Meal Data |
MS | Microbiology Susceptibility |
OM | Organ Measurements |
PC | PK Concentration |
PE | Physical Exam |
PP | PK Parameters |
PG | Pharmacogenomics |
QS | Questionnaires |
SC | Subject Characteristics |
SE | Subject Elements |
SG | Surgery |
SK | Skin Test |
SL | Sleep (Polysomnography) Data |
SS | Signs and Symptoms |
ST | Stress (Exercise) Test Data |
SU | Substance Use |
SV | Subject Visits |
TA | Trial Arms |
TE | Trial Elements |
TI | Trial Inclusion/Exclusion Criteria |
TS | Trial Summary |
TV | Trial Visits |
VS | Vital Signs |
ACN | ACTION |
ADJ | ADJUSTMENT |
AD | ANALYSIS DATASET |
BL | BASELINE |
BRTH | BIRTH |
BOD | BODY |
CAN | CANCER |
CAT | CATEGORY |
C | CHARACTER |
CND | CONDITION |
CLAS | CLASS |
CD | CODE |
COM | COMMENT |
CON | CONCOMITANT |
CONG | CONGENITAL |
DTC | DATE TIME - CHARACTER |
DY | DAY |
DTH | DEATH |
DECOD | DECODE |
DRV | DERIVED |
DESC | DESCRIPTION |
DISAB | DISABILITY |
DOS | DOSE |
DOS | DOSAGE |
DOSE | DOSE |
DOSE | DOSAGE |
DUR | DURATION |
EL | ELAPSED |
ET | ELEMENT |
EM | EMERGENT |
END | END |
EN | END |
ETHNIC | ETHNICITY |
X | EXTERNAL |
EVAL | EVALUATOR |
EVL | EVALUATION |
FAST | FASTING |
FN | FILENAME |
FL | FLAG |
FRM | FORMULATION, FORM |
FREQ | FREQUENCY |
GR | GRADE |
GRP | GROUP |
HI | HIGHER LIMIT |
HOSP | HOSPITALIZATION |
ID | IDENTIFIER |
INDC | INDICATION |
INDC | INDICATOR |
INT | INTERVAL |
INTP | INTERPRETATION |
INV | INVESTIGATOR |
LIFE | LIFE-THREATENING |
LOC | LOCATION |
LOINC | LOINC CODE |
LO | LOWER LIMIT |
MIE | MEDICALLY-IMPORTANT EVENT |
NAM | NAME |
NST | NON-STUDY THERAPY |
NR | NORMAL RANGE |
ND | NOT DONE |
NUM | NUMBER |
N | NUMERIC |
ONGO | ONGOING |
ORD | ORDER |
ORIG | ORIGIN |
OR | ORIGINAL |
OTH | OTHER |
O | OTHER |
OUT | OUTCOME |
OD | OVERDOSE |
PARM | PARAMETER |
PATT | PATTERN |
POP | POPULATION |
POS | POSITION |
QUAL | QUALIFIER |
REAS | REASON |
REF | REFERENCE |
RF | REFERENCE |
RGM | REGIMEN |
REL | RELATED |
R | RELATED |
REL | RELATIONSHIP |
R | RELATIONSHIP |
RES | RESULT |
RL | RULE |
SEQ | SEQUENCE |
S | SERIOUS |
SER | SERIOUS |
SEV | SEVERITY |
SPEC | SPECIMEN |
SPC | SPECIMEN |
SPEC | SPONSOR |
SPC | SPONSOR |
ST | STANDARD |
STD | STANDARD |
ST | START |
STD | START |
STAT | STATUS |
SCAT | SUBCATEGORY |
SUBJ | SUBJECT |
SUPP | SUPPLEMENTAL |
SYS | SYSTEM |
TXT | TEXT |
TM | TIME |
TPT | TIMEPOINT |
TOT | TOTAL |
TOX | TOXICITY |
TRANS | TRANSITION |
TRT | TREATMENT |
U | UNIT |
U | UNIQUE |
UP | UNPLANNED |
VAR | VARIABLE |
VAL | VALUE |
V | VEHICLE |
- SEQ Values: When the --SEQ variable is used, it must have unique values for each USUBJID within each domain.
- Label Casing: For dataset labels and variable labels, all non-trivial words (more than three characters) must start with a capital letter, with the rest of the characters in lowercase.
- Required Values: (4.1.1.5) For required fields, such as those listed under Required Fields above, the tool checks that values are present. If any values are missing, it reports the observation number where the value is missing.
- Similar Parenthesis: For labels with matching values inside parentheses, such as (Yes/No), within the same dataset, the tool checks whether the variables have the same type and length. If not, it reports the differences.
- Required Variables: (4.1.1.5) A Required variable is any variable that is basic to the identification of a data record (i.e., essential key variables and a topic variable) or is necessary to make the record meaningful. Required variables should always be included in the dataset and cannot be null for any record.
- Expected Variable: (4.1.1.5) An Expected variable is any variable necessary to make a record useful in the context of a specific domain. Columns for Expected variables are assumed to be present in each submitted dataset even if some values are null.
- Zero Rows: (IR4000) Identifies a domain table that has zero rows and therefore contains no data.
- Empty Value: (IR4001) Identifies a null (empty) value found in a column where the (Standard) Core attribute is 'Req', which means required.
- No Record Baseline: (IR4005) For Findings domains, this identifies subjects where there are no records with a value of 'Y' in the baseline flag variable (Baseline Flag).
- Consistent Lab Values: (IR4006) For LAB domains, this identifies Short Name of Measurement, Test or Examination values where the standard units value (Standard Units) is not consistent across all records.
- MedDRA Term Mismatch: (IR4007) For AE domains, this identifies records where the value for the Preferred Term could not be found in the MedDRA dictionary.
- Serious AE: (IR4008) For AE domains, this identifies records where Serious Event='Y' but none of Involves Cancer, Congenital Anomaly or Birth Defect, Persist or Signif Disability/Incapacity, Results in Death, Requires or Prolongs Hospitalization, Is Life Threatening, Other Medically Important Serious Event, or Occurred with Overdose equals 'Y'.
- Unit and Status Null: (IR4009) Identifies records where Result or Finding in Original Units and Status both have a value, or where both are null.
- Visit Number Decimal: (IR4010) Identifies records where the value for Visit Number is formatted to more than two decimal places.
- DM Arm Code: (IR4011) For DM domain, this identifies records that violate the condition [If Arm Code='SCRNFAIL' then Description of Arm must equal 'Screen Failure', and vice versa].
- TA Arm Code: (IR4012) For TA domain, this identifies records that violate the condition [If Arm Code='SCRNFAIL' then Description of Arm must equal 'Screen Failure', and vice versa].
- Study Day End: (IR4100) For all timing variables, this identifies records that violate the condition [(Study Day of Start of Observation less than or equal to Study Day of End of Observation)], limited to records where [Study Day of Start of Observation is not null and Study Day of End of Observation is not null].
- Start End Records: (IR4101) For all timing variables, this identifies records that violate the condition [(Start Date/Time of Observation less than or equal to End Date/Time of Observation)], limited to records where [Start Date/Time of Observation is not null and End Date/Time of Observation is not null].
- Baseline Null: (IR4102) For all Findings domains, this identifies records that violate the condition [Baseline Flag either 'Y' or null].
- Derived Flag Null: (IR4103) For Findings domains, this identifies records that violate the condition [Derived Flag either 'Y' or null].
- Reference Periods: (IR4104) For Events and Interventions domains, this identifies records that violate the condition [End Relative to Reference Period in ('BEFORE','DURING','AFTER','DURING/AFTER','U')], limited to records where [End Relative to Reference Period is not null].
- Fasting Status Null: (IR4105) For Findings domains, this identifies records that violate the condition [Fasting Status in ('Y','N','U')], limited to records where [Fasting Status is not null].
- Occurrence Null: (IR4106) For Events and Interventions domains, this identifies records that violate the condition [Occurrence in ('Y','N')], limited to records where [Occurrence is not null].
- Status Not Done: (IR4107) Identifies records that violate the condition [Status='NOT DONE'], limited to records where [Status is not null].
- Reference Period: (IR4108) For Events and Interventions domains, this identifies records that violate the condition [Start Relative to Reference Period in ('BEFORE','DURING','AFTER')], limited to records where [Start Relative to Reference Period is not null].
- Dose Null: (IR4109) For Interventions domains, this identifies records that violate the condition [Dose greater than or equal to 0], limited to records where [Dose is not null].
- Duration Zero: (IR4110) For All (Timing) variables, this identifies records that violate the condition [Duration greater than or equal to 0], limited to records where [Duration is not null].
- Original Units Null: (IR4111) For Findings domains, this identifies records that violate the condition [Result or Finding in Original Units is null], limited to records where [Derived Flag='Y'].
- Format Null: (IR4112) For Findings domains, this identifies records that violate the condition [Result or Finding in Standard Format is not null], limited to records where [Derived Flag='Y'].
- Test Name Length: (IR4113) For Findings domains, this identifies records that violate the condition [LENGTH(Name of Measurement, Test or Examination) less than or equal to 40 characters].
- Short Name Length: (IR4114) For Findings domains, this identifies records that violate the condition [LENGTH(Short Name of Measurement, Test or Examination) less than or equal to 8 chars, cannot start with a number or contain special chars].
- Trial Summary Length: (IR4115) For TS domains, this identifies records that violate the condition [LENGTH(Trial Summary Parameter) less than or equal to 40 chars].
- Trial Summary Short: (IR4116) For TS domains, this identifies records that violate the condition [LENGTH(Trial Summary Parameter Short Name) less than or equal to 8 chars, cannot start with a number or contain special chars].
- End Reference Period: (IR4117) For All (Timing) variables, this identifies records that violate the condition [End Relative to Reference Period is not null], limited to records where [End Date/Time of Observation is null].
- Start Period Null: (IR4118) For All (Timing) variables, this identifies records that violate the condition [Start Relative to Reference Period is not null], limited to records where [Start Date/Time of Observation is null].
- Elapse Time Zero: (IR4119) For EX domains, this identifies records that violate the condition [Planned Elapsed Time from Reference Pt greater than or equal to 0], limited to records where [Planned Elapsed Time from Reference Pt is not null].
- Evaluation Interval Zero: (IR4120) For All (Timing) variables, this identifies records that violate the condition [Evaluation Interval greater than or equal to 0], limited to records where [Evaluation Interval is not null].
- Toxicity Grade Valid: (IR4121) For Events domains, this identifies records that violate the condition [Toxicity Grade is a valid number], limited to records where [Toxicity Grade is not null].
- Reason Done Null: (IR4122) For All domains, this identifies records that violate the condition [Reason Not Done is null], limited to records where [Status is null].
- Date Collection Null: (IR4123) For Findings domains, this identifies records that violate the condition [Date/Time of Collection is not null], limited to records where [End Date/Time of Observation is not null].
- Date Less End Date: (IR4124) For Findings domains, this identifies records that violate the condition [Date/Time of Collection less than or equal to End Date/Time of Observation], limited to records where [Date/Time of Collection is not null and End Date/Time of Observation exists].
- Results Units Null: (IR4125) For Findings domains, this identifies records that violate the condition [Original Units is not null], limited to records where [Result or Finding in Original Units is not null].
- Original Units Null: (IR4126) For Findings domains, this identifies records that violate the condition [Original Units is null], limited to records where [Result or Finding in Original Units is null].
- Upper Limit Range: (IR4127) For Findings domains, this identifies records that violate the condition [Normal Range Upper Limit-Standard Units greater than or equal to Normal Range Lower Limit-Standard Units], limited to records where [Normal Range Upper Limit-Standard Units is not null and STNRHI is not null].
- Standard Unit Null: (IR4128) For Findings domains, this identifies records that violate the condition [Standard Units is not null], limited to records where [Result or Finding in Standard Format is not null].
- Standard Format Null: (IR4129) For Findings domains, this identifies records that violate the condition [Standard Units is null], limited to records where [Result or Finding in Standard Format is null].
- Start Observation Null: (IR4130) For All (Timing) variables, this identifies records that violate the condition [Start Date/Time of Observation is not null], limited to records where [End Date/Time of Observation is not null].
- Time Name Null: (IR4131) For All (Timing) variables, this identifies records that violate the condition [Planned Time Point Name is not null], limited to records where [Planned Time Point Number is not null].
- Time Number Null: (IR4132) For All (Timing) variables, this identifies records that violate the condition [Planned Time Point Number is not null], limited to records where [Planned Time Point Name is not null].
- Time Reference Null: (IR4133) For All (Timing) variables, this identifies records that violate the condition [Time Point Reference is not null], limited to records where [Elapsed Time from Reference Point is not null].
- Dose Unit Null: (IR4134) For Interventions domains, this identifies records that violate the condition [Dose Units is not null], limited to records where [Dose is not null].
- Result Format Null: (IR4135) For Findings domains, this identifies records that violate the condition [Result or Finding in Standard Format is not null], limited to records where [Result or Finding in Original Units is not null].
- Code List Found: (IR4136) For All domains, this identifies records where values are not found in the study-specific codelist attached to a variable.
- Study Day Zero: (IR4137) For All domains, this identifies records that violate the condition [Study Day of Visit/Collection/Exam doesn't equal 0].
- Treatment Emergent AE: (IR4255) For AE domains, this identifies a Sponsor-provided Flag Variable for 'Treatment emergent AE' where the derivation can't be executed.
- Clinically Significant Lab: (IR4256) For LB domains, this identifies a Sponsor-provided Flag Variable for 'Clinically Significant Lab' where the derivation can't be executed.
- Clinically Significant Vitals: (IR4257) For VS domains, this identifies a Sponsor-provided Flag Variable for 'Clinically Significant Vital Sign' where the derivation can't be executed.
- SUPPQUAL USUBJID: (IR4258) For Supplemental Qualifiers domains, this identifies a domain that appears to contain supplemental qualifier data but does not contain the Unique Subject Identifier variable.
- SAS Label: (IR4260) For All domains, this identifies a variable present in the SAS dataset but not present in the (study-specific) description file.
- DM Sequence: (IR4500) For All domains, this identifies non-Sequence Number domain subjects not found in the Demographics domain.
- Subject Visit: (IR4501) For All domains, this identifies Unique Subject Identifier+Visit Name+Visit Number combinations not found in the SV domain.
- Arm Code TA: (IR4502) For DM domains, this identifies records where the value for Arm Code is not found in the TA domain.
- Subject Element Code: (IR4503) For All domains, this identifies records where the value for Subject Element Code is not found in the TE domain.
- IE Short Name: (IR4504) For IE domains, this identifies records where the value for Inclusion/Exclusion Criterion Short Name is not found in the TI domain.
- DS Subject: (IR4505) For DM domains, this identifies Sequence Number subjects where no record for the subject is found in the Disposition domain.
- DM Sequence: (IR4506) For DM domains, this identifies Sequence Number subjects where no record for the subject is found in the Exposure domain.
- Arm Code DM: (IR4507) For DM domains, this identifies Sequence Number treatment arms (Description of Arm+Arm Code combination) not found in the TA domain.
- Unknown CO: (IR4508) For CO domains, this identifies a CO domain reference to an unknown related domain.
- Related Records Unknown: (IR4509) For RELREC domains, this identifies a Related Records domain reference to an unknown related domain.
- Unknown SUPPQUAL: (IR4510) For Supplemental Qualifiers domains, this identifies a Supplemental Qualifiers domain reference to an unknown related domain.
- RELREC Key: (IR4511) For RELREC domains, this identifies a Related Records domain reference to a key variable that isn't defined in the target domain.
- SUPPQUAL Key: (IR4512) For Supplemental Qualifiers domains, this identifies a Supplemental Qualifiers domain reference to a key variable that isn't defined in the target domain.
- RELREC Target Domain: (IR4513) For RELREC domains, this identifies a Related Records domain reference to a record that doesn't exist in the target domain.
- SUPPQUAL Target Domain: (IR4514) For Supplemental Qualifiers domains, this identifies a Supplemental Qualifiers domain reference to a record that doesn't exist in the target domain.
- DM Unique Subject: (R4005) For DM domains, this identifies records where values for Unique Subject ID variable(s) are not unique, limited to records where [Unique Subject ID is not null].
- Age Greater Zero: (R4006) For DM domains, this identifies records that violate the condition [AGE greater than or equal to 0], limited to records where [AGE is not null].
- Sex Code List: (R4007) For DM domains, this identifies records where value for [SEX] is not found in Codelist [SEX].
- Country Codelist: (R4008) For DM domains, this identifies records where value for [COUNTRY] is not found in Codelist [COUNTRY].
- Yes No Codelist: (R4019) For AE domains, this identifies records where value for [Serious Event] is not found in Codelist [YESNO].
- Birth Defect Codelist: (R4023) For AE domains, this identifies records where value for [Congenital Anomaly or Birth Defect] is not found in Codelist [YESNO], limited to records where [Congenital Anomaly or Birth Defect is not null].
- Disability Codelist: (R4024) For AE domains, this identifies records where value for [Persist or Signif Disability/Incapacity] is not found in Codelist [YESNO], limited to records where [Persist or Signif Disability/Incapacity is not null].
- Death Codelist: (R4025) For AE domains, this identifies records where value for [Results in Death] is not found in Codelist [YESNO], limited to records where [Results in Death is not null].
- Hospitalization Codelist: (R4026) For AE domains, this identifies records where value for [Requires or Prolongs Hospitalization] is not found in Codelist [YESNO], limited to records where [Requires or Prolongs Hospitalization is not null].
- Life Threatening Codelist: (R4027) For AE domains, this identifies records where value for [Is Life Threatening] is not found in Codelist [YESNO], limited to records where [Is Life Threatening is not null].
- Inclusion Exclusion Codelist: (R4031) For IE domains, this identifies records where value for [Inclusion/Exclusion Category] is not found in Codelist [INCEX], limited to records where [Inclusion/Exclusion Category is not null].
- Conmed Codelist: (R4043) For AE domains, this identifies records where value for [Concomitant or Additional Trtmnt Given] is not found in Codelist [YESNO].
- Cancer Codelist: (R4045) For AE domains, this identifies records where value for [Involves Cancer] is not found in Codelist [YESNO], limited to records where [Involves Cancer is not null].
- (R4046) For AE domains, this identifies records where value for [Other Medically Important Serious Event] is not found in Codelist [YESNO], limited to records where [Other Medically Important Serious Event is not null].
- Overdose Codelist: (R4047) For AE domains, this identifies records where value for [Occurred with Overdose] is not found in Codelist [YESNO], limited to records where [Occurred with Overdose is not null].
- Age Unit Codelist: (R4062) For DM domains, this identifies records where value for [AGEU] is not found in Codelist [AGEUNITS2], limited to records where [AGEU is not null].
- IE Units Codelist: (R4071) For IE domains, this identifies records where value for [I/E Criterion Original Result] is not found in Codelist [YESNO], limited to records where [I/E Criterion Original Result is not null].
- Study Format Codelist: (R4072) For IE domains, this identifies records where value for [I/E Criterion Result in Std Format] is not found in Codelist [YESNO], limited to records where [I/E Criterion Result in Std Format is not null].
- IE Results Codelist: (R4073) For IE domains, this identifies records that violate the condition [I/E Criterion Original Result = I/E Criterion Result in Std Format].
- SUPPQUAL Study ID: (R4083) For Supplemental Qualifiers domains, this identifies records where values for [Study Identifier, Unique Subject Identifier, Identifying Variable, Identifying Variable Value, Qualifier Variable Name] variable(s) are not unique.
- DM Reference Date: (R4096) For DM domains, this identifies records that violate the condition [Subject Reference Start Date/Time is not null], limited to records where [Arm Code doesn't equal 'SCRNFAIL'].
- DM End Date: (R4097) For DM domains, this identifies records that violate the condition [Subject Reference End Date/Time is not null], limited to records where [Arm Code doesn't equal 'SCRNFAIL'].
- TE End Element: (R4101) For TE domains, this identifies records that violate the condition [Rule for End of Element is not null or Planned Duration of Element is not null].
- AE Death: (R4102) For AE domains, this identifies records that violate the condition [Results in Death='Y'], limited to records where [Outcome of Adverse Event='FATAL'].
- AE Fatal: (R4103) For AE domains, this identifies records that violate the condition [Outcome of Adverse Event='FATAL'], limited to records where [Results in Death='Y'].
- DV Count: (R4104) For DV domains, this identifies records that violate the condition [count(distinct Protocol Deviation Term) = count(distinct Protocol Deviation Coded Term)], limited to records where [Protocol Deviation Term is not null].
- SE Unplanned Element: (R4105) For SE domains, this identifies records that violate the condition [Description of Unplanned Element is not null], limited to records where [Subject Element Code='UNPLAN'].
- Age Unit Null: (R4106) For DM domains, this identifies records that violate the condition [AGEU is not null], limited to records where [AGE is not null].
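Some of the checks in the list are simple enough to reproduce by hand when a finding needs to be confirmed. The following is a rough sketch, not the macro's own implementation: it builds a small, made-up AE dataset and applies the SEQ Values and Date Time Format rules to it. The dataset WORK.AE, its contents, the output dataset names and the simplified ISO 8601 pattern are all assumptions made for illustration.

```sas
/* Hypothetical sample data: one duplicated AESEQ, one non-ISO date */
data work.ae;
  length STUDYID $8 DOMAIN $2 USUBJID $12 AESEQ 8 AESTDTC $19;
  input STUDYID $ DOMAIN $ USUBJID $ AESEQ AESTDTC $;
  datalines;
ABC123 AE ABC123-001 1 2005-03-14
ABC123 AE ABC123-001 1 2005-03-20
ABC123 AE ABC123-002 1 14MAR2005
;
run;

/* SEQ Values: list USUBJID/AESEQ pairs that occur more than once */
proc sql;
  create table work.dup_seq as
  select USUBJID, AESEQ, count(*) as n_records
  from work.ae
  group by USUBJID, AESEQ
  having count(*) > 1;
quit;

/* Date Time Format: keep records whose date does not look like ISO 8601. */
/* Simplified pattern (assumption): YYYY[-MM[-DD[Thh:mm[:ss]]]]; it does  */
/* not validate value ranges or every form the standard permits.          */
data work.bad_dates;
  set work.ae;
  if not missing(AESTDTC) and
     not prxmatch('/^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2})?)?)?)?\s*$/', AESTDTC);
run;
```

With these sample rows, WORK.DUP_SEQ should hold the pair ABC123-001 / 1, and WORK.BAD_DATES should hold the record dated 14MAR2005.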
The findings from the above checks are stored in a dataset named WORK.CDISC. Each test case is identified in a column named "case", whose values correspond to the items listed above. An HTML report documenting the findings is also generated. The report has the same name as the current program, with the .html file extension.
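Once the macro has run, the findings dataset can be summarized with ordinary SAS procedures. A minimal sketch, assuming %cdisc has already been called in the current session and that WORK.CDISC contains the "case" column described above; other columns are not documented on this page, so the code does not rely on them.

```sas
/* Count findings per test case */
proc freq data=work.cdisc;
  tables case / nocum nopercent;
run;

/* Browse the first 20 findings */
proc print data=work.cdisc (obs=20);
run;
```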
If the macro is called without any parameters, a dialog box opens to let users select the datasets to be evaluated.
Example
%cdisc (datlib=mylib, datname=ae);
%cdisc (datlib=mylib, datname=ae*);
%cdisc (datlib=mylib, datname=ae demog term);