Verify SAS datasets against CDISC standards
%cdisc (datlib = data library, datname = dataset name);
Where | Is Type... | And represents... |
datlib | C (200) | Library reference name for the location where the dataset resides. |
datname | C (200) | Name of the dataset to be verified. Wildcards can be specified, such as ae*. |
Details
This tool verifies SAS datasets against version 3.0 of the CDISC submission data
domain models, as specified at http://www.cdisc.org/pdf/V3CRTStandardV1_2.pdf.
It is intended to catch deviations from the standard, including the following:
- Required Fields: (2.4.5) Required identifier variables must be present, including DOMAIN, USUBJID, STUDYID and --SEQ.
- Subject Variable: (3.5.1.2.8) For variable names, labels and comments, use the word "Subject" when referring to "patients" or "healthy volunteers".
- Variable Length: (3.5.1.2.6) Variable names are limited to 8 characters and labels to 40 characters.
- Yes/No: (3.5.1.3.18) Variables where the response is Yes or No (Y/N) should normally be populated for both Yes and No responses.
- Date Time Format: (3.5.1.4.19) Use the yymmdd10. format; yymmdd8. is also acceptable.
- Study Day Variable: (3.5.1.4.22) The study day variable is named --DY.
- Variable Names: (3.5.2) If a variable name matches a CDISC variable name, its label must match the CDISC label.
- Variable Label: (3.5.2) If a variable label matches a CDISC label, the variable name must match the CDISC variable name.
- Variable Type: (3.5.2) If a variable matches a CDISC variable, its type must match the CDISC type.
- Dataset Names: (3.5.2) If a dataset name matches a CDISC dataset name, its dataset label must match the CDISC label.
- Dataset Labels: (3.5.2) If a dataset label matches a CDISC dataset label, the dataset name must match the CDISC name.
- Abbreviations: (3.5.2) The following abbreviations are suggested for variable names and datasets.
Acronym | Descriptive Text |
ae | Adverse Events |
bl | Baseline |
brth | Birth |
cat | Category |
clas | Class |
cd | Code |
cm | Concomitant Medications |
dt | Date |
dy | Day |
dm | Demographics |
ds | Disposition |
dos | Dose |
dur | Duration |
eg | ECG |
eval | Evaluator |
ex | Exposure |
fl | Flag |
frm | Form |
frq | Frequency |
gr | Grade |
hosp | Hospital |
id | Identifier |
ie | Inclusion/Exclusion Exceptions |
ind | Indication |
ind | Indicator |
inv | Investigator |
lb | Labs |
loc | Location |
lo | Lower |
mh | Medical History |
num | Number |
occur | Occurrence |
ord | Order |
or | Original |
pe | Physical Exam |
pt | Point |
pos | Position |
reas | Reason |
ref | Reference |
rgm | Regimen |
r | Related |
rel | Relationship |
rl | Rule |
seq | Sequence |
stat | Status |
scat | Subcategory |
subj | Subject |
sc | Subject Characteristics |
su | Substance Use |
txt | Text |
tm | Time |
tot | Total |
tox | Toxicity |
tran | Transition |
trt | Treatment |
ex | Treatment or Exposure |
u | Unique |
u | Units |
up | Unplanned |
hi | Upper |
val | Value |
var | Variable |
vs | Vital Signs |
- SEQ Values: (4.3.2.1) When the --SEQ variable is used, it must
have unique values for each USUBJID within each domain.
- Label Casing: In dataset labels and variable labels, every nontrivial word (more than three characters) must start with a capital letter, with the remaining characters in lowercase.
- Character Length: CDISC standardizes on a length of 100 for character variables. For character variables with a length of 40 or greater, the tool recommends a length of 100.
- Required Values: For required fields, such as those listed under Required Fields, the tool checks that values are present. If any values are missing, it reports the observation numbers where they are missing.
- Similar Parenthesis: For labels within the same dataset that contain matching text in parentheses, such as (Yes/No), the tool checks whether the variables have the same type and length. If not, it reports the differences.
The findings from the checks above are stored in a dataset named WORK.CDISC.
Each test case is identified by a column named "case", which corresponds to
the items listed above. An HTML report documenting the findings is also
generated. The report has the same name as the current program, with the
.html file extension.
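Once the macro has run, the findings dataset can be summarized with standard SAS procedures. The following is a minimal sketch that assumes only what is described above: that WORK.CDISC exists and contains a column named case. No other column names are assumed.

```sas
/* Count the findings recorded for each test case.              */
/* Assumes %cdisc has already run and created WORK.CDISC.       */
proc freq data=work.cdisc;
  tables case / nocum nopercent;
  title "CDISC findings by test case";
run;
title;

/* Print the first 20 findings for a quick look at the columns. */
proc print data=work.cdisc (obs=20);
run;
```

The frequency table gives a quick view of which rules produced the most findings before opening the HTML report.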
Example
%cdisc (datlib=mylib, datname=ae);
%cdisc (datlib=mylib, datname=ae*);
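To cross-check an individual finding by hand, some of the rules can be reproduced directly. The sketch below is not the macro's implementation; it independently illustrates the SEQ Values and Variable Length rules for the mylib.ae dataset from the example, assuming that dataset contains the variables USUBJID and AESEQ. The output table names dup_seq and long_names are hypothetical.

```sas
proc sql;
  /* SEQ Values: find AESEQ values that repeat within a subject. */
  create table work.dup_seq as
  select usubjid, aeseq, count(*) as n_rows
  from mylib.ae
  group by usubjid, aeseq
  having count(*) > 1;

  /* Variable Length: names over 8 or labels over 40 characters. */
  /* DICTIONARY.COLUMNS stores libname and memname in uppercase.  */
  create table work.long_names as
  select name, label
  from dictionary.columns
  where libname = 'MYLIB'
    and memname = 'AE'
    and (length(name) > 8 or length(label) > 40);
quit;
```

If both tables come back with zero rows, the dataset passes these two rules as sketched here.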