Study phase:
Dimension:
8
There is no duplication in the data set: data has not been entered twice for the same participant.
Examples

1: A duplicate-identification process is in place to ensure that the codes assigned to unique subjects by each institution are entered only once across repository data sets (e.g., blood samples, imaging sequences and other assessments).

2: Audits are conducted to reconcile record linkages and ensure that no duplicates are entered (e.g., image sequence files are not submitted twice; each subject CRF is entered only once in the database; biospecimens have unique IDs at different timepoints and are entered only once).

3: For data held across multiple platforms, ensure that the same subject ID is used in the database, for example a Global Unique Identifier (GUID).
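
A duplicate audit of the kind described above can be sketched in a few lines of Python. This is only an illustration: the field names (subject_id, assessment, timepoint) and the choice of those three fields as the uniqueness key are assumptions, not part of any specific repository schema.

```python
from collections import Counter

def find_duplicate_entries(records, key_fields=("subject_id", "assessment", "timepoint")):
    """Return the key tuples that occur more than once in `records`."""
    counts = Counter(tuple(rec[f] for f in key_fields) for rec in records)
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical records; the last one repeats the first.
records = [
    {"subject_id": "S001", "assessment": "blood_sample", "timepoint": "baseline"},
    {"subject_id": "S001", "assessment": "blood_sample", "timepoint": "6_months"},
    {"subject_id": "S002", "assessment": "mri_sequence", "timepoint": "baseline"},
    {"subject_id": "S001", "assessment": "blood_sample", "timepoint": "baseline"},
]

duplicates = find_duplicate_entries(records)
print(duplicates)
```

Including the time point in the key means that repeated sampling of the same subject is not flagged, while a second entry of the same sample is.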

Study phase:
Dimension:
7
Each individual has a unique identifier.
Examples

1: The GUID (Global Unique IDentifier) or the GUPI (Generated Unique Patient Identifier) is a method of ensuring that patients are not duplicated across datasets. 
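
Whether every individual carries exactly one identifier can be checked mechanically. The sketch below does not generate GUIDs or GUPIs; it assumes a hypothetical enrolment registry of (site, local record number, global identifier) rows and reports three kinds of problem: missing identifiers, one participant with several identifiers, and one identifier shared by several participants at the same site.

```python
from collections import defaultdict

def check_identifiers(registry):
    """Audit (site, local_no, global_id) rows for identifier problems."""
    ids_per_participant = defaultdict(set)
    participants_per_id = defaultdict(set)
    missing = []
    for site, local_no, global_id in registry:
        if not global_id:
            missing.append((site, local_no))
            continue
        ids_per_participant[(site, local_no)].add(global_id)
        participants_per_id[(site, global_id)].add(local_no)
    return {
        "missing": missing,
        "multiple_ids": sorted(p for p, ids in ids_per_participant.items() if len(ids) > 1),
        "shared_ids": sorted(k for k, ps in participants_per_id.items() if len(ps) > 1),
    }

# Hypothetical registry rows with made-up identifiers.
registry = [
    ("site_A", 1, "G-0001"),
    ("site_A", 2, "G-0002"),
    ("site_A", 3, "G-0002"),  # same identifier given to two participants
    ("site_B", 1, None),      # no identifier assigned
    ("site_B", 2, "G-0001"),  # same person seen at another site: allowed
]

report = check_identifiers(registry)
print(report)
```

The same identifier appearing at two sites is deliberately not treated as an error, since linking one patient across datasets is the purpose of a global identifier.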

Study phase:
Dimension:
6
Relational databases have been appropriately normalised: steps have been taken to eliminate redundant data and remove potentially inconsistent or overly complex data dependencies.
Examples

1: Similar data will be collected in one table.

2: Repeated measurements are not stored in separate fields; instead, each is identified by a time point indicator.
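
The second example amounts to storing repeated measurements in "long" form, one row per subject and time point, rather than in one column per time point. A minimal sketch, assuming hypothetical column names of the form measure_timepoint (here score_baseline and score_6m):

```python
def wide_to_long(wide_rows, id_field, measures, timepoints):
    """Turn one-column-per-timepoint rows into one-row-per-timepoint rows."""
    long_rows = []
    for row in wide_rows:
        for tp in timepoints:
            values = {m: row.get(f"{m}_{tp}") for m in measures}
            if all(v is None for v in values.values()):
                continue  # nothing was recorded at this time point
            long_rows.append({id_field: row[id_field], "timepoint": tp, **values})
    return long_rows

# Hypothetical wide-format data.
wide = [
    {"subject_id": "S001", "score_baseline": 13, "score_6m": 15},
    {"subject_id": "S002", "score_baseline": 9},
]

long_rows = wide_to_long(wide, "subject_id", ["score"], ["baseline", "6m"])
for r in long_rows:
    print(r)
```

With this layout, adding a new follow-up visit adds rows rather than new fields, so the table structure does not change as the study grows.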

Study phase:
Dimension:
5
Variables are named and encoded in a way that is easy to understand.
Examples

1: When data is extracted from the database, it should be easy to recognize what the variable names refer to.

2: When there are many variables of the same kind, the names follow a pattern that is consistent.

3: The same name should be used across the CRF and database design so that confusion is minimised. 
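
Naming rules like these can be checked automatically at design time. The sketch below assumes one possible convention (lower-case words joined by underscores) and compares hypothetical database and CRF variable lists; both the convention and the names are illustrative assumptions.

```python
import re

# Assumed convention: lower-case snake_case, starting with a letter.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def check_variable_names(db_names, crf_names):
    """Return (badly styled names, names only in database, names only in CRF)."""
    bad_style = sorted(n for n in db_names if not NAME_PATTERN.fullmatch(n))
    only_in_db = sorted(set(db_names) - set(crf_names))
    only_in_crf = sorted(set(crf_names) - set(db_names))
    return bad_style, only_in_db, only_in_crf

# Hypothetical variable lists; the third blood pressure reading breaks the pattern.
db = ["age_years", "bp_systolic_1", "bp_systolic_2", "BPsys3"]
crf = ["age_years", "bp_systolic_1", "bp_systolic_2", "bp_systolic_3"]

bad_style, only_in_db, only_in_crf = check_variable_names(db, crf)
print(bad_style, only_in_db, only_in_crf)
```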

Study phase:
Dimension:
4
Data-types are specified for each variable.
Examples

1: Each variable is specified as floating point, integer, factor or free text.
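
One way to make such a specification usable is to write it down as data and validate records against it. The following is a sketch only: the variable names, the factor levels and the decision to accept whole numbers in floating-point fields are all assumptions made for illustration.

```python
# Hypothetical data-type specification: one entry per variable.
SPEC = {
    "age_years": {"type": int},
    "weight_kg": {"type": float},
    "sex": {"type": "factor", "levels": {"female", "male", "unknown"}},
    "notes": {"type": str},
}

def type_errors(record, spec=SPEC):
    """Return the names of variables whose values do not match the spec."""
    errors = []
    for name, rule in spec.items():
        value = record.get(name)
        if value is None:
            continue  # missing values are a separate check
        if rule["type"] == "factor":
            ok = value in rule["levels"]
        elif rule["type"] is float:
            # Assumption: whole numbers are acceptable in floating-point fields.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        elif rule["type"] is int:
            ok = isinstance(value, int) and not isinstance(value, bool)
        else:
            ok = isinstance(value, rule["type"])
        if not ok:
            errors.append(name)
    return errors

good = {"age_years": 42, "weight_kg": 70.5, "sex": "female", "notes": "none"}
bad = {"age_years": "42", "weight_kg": 70, "sex": "F", "notes": ""}
print(type_errors(good), type_errors(bad))
```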

Study phase:
Dimension:
3
The data ontology is consistent with published standards (common data elements) to the greatest extent possible.
Examples

1: Database fields conform, as appropriate, to the FITBIR NINDS Common Data Elements for Traumatic Brain Injury; CDISC; CDASH; and Abbreviated Injury Scores.

2: The data ontology is pre-specified at design-time.

3: Published data elements are used as far as possible. Where this is not possible, deviations from published specifications are clearly documented and have robust scientific justification.
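
The third example suggests a simple audit: every database field is either a published data element or a deviation with a recorded justification. The sketch below uses placeholder element names (element_a, local_x and so on), not real common data elements, and assumes the deviation log is a plain mapping from field name to justification text.

```python
def audit_against_standard(fields, published_elements, documented_deviations):
    """Split fields into conforming, justified deviations and undocumented deviations."""
    conforming, justified, undocumented = [], [], []
    for field in sorted(fields):
        if field in published_elements:
            conforming.append(field)
        elif documented_deviations.get(field):
            justified.append(field)
        else:
            undocumented.append(field)
    return conforming, justified, undocumented

# Placeholder names only; a real audit would load the published element list.
published = {"element_a", "element_b", "element_c"}
fields = ["element_a", "element_b", "local_x", "local_y"]
deviations = {"local_x": "placeholder justification text"}

conforming, justified, undocumented = audit_against_standard(fields, published, deviations)
print(conforming, justified, undocumented)
```

Any field reported as undocumented would need either to be mapped to a published element or to have its justification recorded before the design is finalised.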
