Data Standardization
stan-dard-ize – 1: to compare with a standard; 2: to bring into conformity with a standard
trans-form – 1a: to change in composition or structure; b: to change the outward form or appearance of; c: to change in character or condition, convert; 2: to subject to mathematical transformation
Analysts of clinical data often need to standardize or transform data. For example, data sets are frequently compared or combined to validate findings, to increase sample size, or to test whether clinical conclusions generalize. Some data must be standardized before such comparisons are meaningful. Data standardization is also often necessary to comply with the data-sharing requirements of public repositories, such as those of the NIH’s National Center for Biotechnology Information, and for publication in peer-reviewed journals. Portable electronic medical records likewise depend on standardization through standards such as HL7. Ultimately, standardization supports both statistical validity and clinical insight.
Several important analytical issues should be considered during the data standardization process. Whether two data sets are directly comparable is first judged on qualitative criteria. The saying “comparing apples to apples instead of apples to oranges” reflects such a judgment. Data from different studies, for example, often carry batch, time, center, assay, or other sources of variability that may affect their comparability. Therefore, when combining data sets and performing meta-analyses, it can be useful to define meta-data (i.e., data about the data) so that other analysts and reviewers can judge for themselves whether comparisons are valid. For instance, an analyst may wish to compare serum protein levels from two different clinical populations. Here the data would be the measured values, while the meta-data might include the age and gender of the subjects, the manufacturer of the test, and the units in which results are reported. How similar the meta-data must be for a comparison to be valid is ultimately a subjective judgment.
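One way to make the data/meta-data distinction concrete is to store them side by side and list where two data sets' meta-data disagree. The sketch below is a minimal illustration that assumes meta-data can be kept as simple key/value pairs. The class `Dataset`, the function `metadata_differences`, the field names, and the vendor names and values are all hypothetical. The function only reports differences and does not decide comparability, because that decision remains a judgment for the analyst or reviewer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical container: measured values (the data) plus descriptive meta-data."""
    name: str
    values: list[float]  # the data, e.g. measured serum protein levels
    metadata: dict[str, str] = field(default_factory=dict)  # data about the data


def metadata_differences(a: Dataset, b: Dataset) -> dict[str, tuple]:
    """Return each meta-data key whose value differs (or is missing) between a and b."""
    keys = sorted(set(a.metadata) | set(b.metadata))
    return {
        key: (a.metadata.get(key), b.metadata.get(key))
        for key in keys
        if a.metadata.get(key) != b.metadata.get(key)
    }


# Hypothetical cohorts for illustration only; values are not real study data.
cohort_1 = Dataset("cohort_1", [7.1, 6.8],
                   {"manufacturer": "Vendor A", "units": "g/dL", "age_range": "40-65"})
cohort_2 = Dataset("cohort_2", [70.0, 68.5],
                   {"manufacturer": "Vendor B", "units": "g/L", "age_range": "40-65"})

diffs = metadata_differences(cohort_1, cohort_2)
for key, (left, right) in diffs.items():
    print(f"{key}: {left!r} vs {right!r}")
```

Here the report flags the differing test manufacturer and units while omitting the shared age range. A differing unit can usually be converted, whereas a differing manufacturer may reflect assay variability that no conversion removes.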
