3 Sources of Real-World Data

3.1 Overview

This chapter covers the primary sources of real-world data (RWD), highlighting their characteristics, strengths, limitations, and appropriate use cases in clinical and translational research.

3.1.1 Learning Objectives

  • Understand the major categories of RWD sources.
  • Compare the advantages and limitations of different data sources.
  • Identify appropriate use cases for each source of RWD.
  • Recognize the implications of data provenance for study design and interpretation.

3.2 Categories of RWD Sources

3.2.1 Electronic Health Records (EHR)

  • Captured during routine clinical care.
  • Contains both structured data (e.g., lab results, diagnosis codes) and unstructured data (e.g., clinical notes).
  • Source for detailed clinical information and longitudinal patient histories.
  • Limitations: variable data quality, missingness, variation across systems, and collection for clinical care rather than research.

3.2.2 Administrative Claims Data

  • Generated for billing and reimbursement purposes.
  • Includes information on diagnoses, procedures, and prescriptions.
  • Broad coverage of insured populations (e.g., Medicare, Medicaid, private insurers).
  • Strengths: standardization, large populations, consistent coding.
  • Limitations: lacks clinical detail, potential miscoding.

3.2.3 Registries

  • Disease-specific or procedure-specific data collections.
  • Often curated with specific inclusion criteria and data standards.
  • Examples: cancer registries, transplant registries.
  • High-quality and structured, but may lack generalizability.

3.2.4 Patient-Generated Health Data (PGHD)

  • Includes data from wearable devices, apps, and home monitoring tools.
  • Provides real-time insights outside of clinical settings.
  • Challenges: data volume, reliability, integration with clinical systems.
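The volume and reliability challenges above can be illustrated with a minimal sketch that filters implausible wearable readings and collapses the rest into daily summaries. The readings, the `daily_summary` helper, and the plausibility bounds are all hypothetical and chosen only for illustration; they are not a clinical standard.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical wearable readings: (ISO timestamp, heart rate in beats per minute).
readings = [
    ("2024-05-01T08:00:00", 62),
    ("2024-05-01T08:01:00", 0),     # sensor dropout
    ("2024-05-01T12:30:00", 88),
    ("2024-05-02T07:45:00", 70),
    ("2024-05-02T07:46:00", 410),   # implausible spike
]

# Assumed plausibility bounds, for illustration only.
MIN_BPM, MAX_BPM = 25, 250


def daily_summary(rows, low=MIN_BPM, high=MAX_BPM):
    """Drop readings outside [low, high] and summarize the rest per calendar day."""
    by_day = defaultdict(list)
    dropped = 0
    for stamp, bpm in rows:
        if low <= bpm <= high:
            day = datetime.fromisoformat(stamp).date().isoformat()
            by_day[day].append(bpm)
        else:
            dropped += 1
    summary = {
        day: {"n": len(values), "mean_bpm": mean(values)}
        for day, values in sorted(by_day.items())
    }
    return summary, dropped


summary, dropped = daily_summary(readings)
print(summary)
print("readings dropped:", dropped)
```

Reporting the number of dropped readings alongside the summary keeps the filtering step visible, which matters when device reliability is itself a study limitation.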

3.2.5 Other Sources

  • Public health surveillance databases.
  • Social determinants of health datasets.
  • Biobanks and genomic databases.
  • Data from pragmatic trials and learning health systems.

3.3 Comparing Sources

Data Source | Strengths                            | Limitations                              | Best Use Cases
EHR         | Rich clinical data, longitudinal     | Incomplete, messy, variable              | Clinical outcomes, phenotyping
Claims      | Large scale, consistent coding       | Limited clinical granularity             | Utilization, economic outcomes
Registries  | Focused, high-quality data           | Selection bias, limited population scope | Quality improvement, comparative effectiveness
PGHD        | Real-world behavior, continuous data | Variable quality, integration challenges | Adherence, behavior monitoring

3.4 Considerations for Researchers

  • Provenance: Understand who collected the data, how, and for what purpose (further discussed in chapter __).
  • Data Quality: Assess missingness, timeliness, accuracy, and validation (further discussed in chapter __).
  • Access and Governance: Determine legal, ethical, and institutional requirements (further discussed in chapter __).
  • Population Representativeness: Understand which groups may be underrepresented (further discussed in chapter __).
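Two of the checks above, missingness and population representativeness, lend themselves to a short worked example. The sketch below assumes a hypothetical extract of lab records and an assumed reference distribution; the field names, values, and helper functions are illustrative and do not come from any particular system.

```python
from datetime import date

# Hypothetical extract: one dict per record; None marks a missing value.
records = [
    {"patient_id": "p1", "hba1c": 7.2, "sex": "F", "recorded": date(2024, 1, 10)},
    {"patient_id": "p2", "hba1c": None, "sex": "M", "recorded": date(2024, 3, 2)},
    {"patient_id": "p3", "hba1c": 6.1, "sex": "F", "recorded": date(2023, 11, 20)},
    {"patient_id": "p4", "hba1c": None, "sex": "F", "recorded": date(2024, 2, 14)},
]


def missing_fraction(rows, field):
    """Fraction of rows in which `field` is absent or None."""
    if not rows:
        raise ValueError("no rows to assess")
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


def group_shares(rows, field):
    """Share of rows falling in each category of `field`."""
    counts = {}
    for row in rows:
        counts[row[field]] = counts.get(row[field], 0) + 1
    return {group: n / len(rows) for group, n in counts.items()}


def representation_gaps(sample_shares, reference_shares):
    """Sample share minus reference share per group (negative = underrepresented)."""
    return {
        group: sample_shares.get(group, 0.0) - ref
        for group, ref in reference_shares.items()
    }


hba1c_missing = missing_fraction(records, "hba1c")
sex_shares = group_shares(records, "sex")
# Assumed reference distribution, for illustration only.
gaps = representation_gaps(sex_shares, {"F": 0.5, "M": 0.5})
# A simple timeliness indicator: the most recent record date in the extract.
latest_record = max(row["recorded"] for row in records)

print("HbA1c missing:", hba1c_missing)
print("Representation gaps:", gaps)
print("Latest record:", latest_record)
```

In practice the reference distribution would come from the target population the study aims to describe, and the same checks would be repeated for each variable and subgroup of interest.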

3.5 Summary

  • Different RWD sources serve different research needs.
  • Understanding their provenance, structure, and limitations is key to rigorous study design.
  • Triangulation of multiple data sources may improve validity but introduces additional complexity.
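As a rough illustration of triangulation, the sketch below compares whether two sources record the same condition for the same patients. The patient identifiers, flags, and the `compare_sources` helper are hypothetical; a real linkage would also need a validated way to match patients across sources, which is part of the added complexity.

```python
# Hypothetical patient-level flags: does each source record the condition of interest?
ehr_flags = {"p1": True, "p2": False, "p3": True, "p4": True}
claims_flags = {"p1": True, "p2": False, "p3": False, "p5": True}


def compare_sources(a, b):
    """Summarize overlap and agreement between two patient-level flag mappings."""
    shared = sorted(set(a) & set(b))
    discordant = [pid for pid in shared if a[pid] != b[pid]]
    agreement = (len(shared) - len(discordant)) / len(shared) if shared else None
    return {
        "shared_patients": len(shared),
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "discordant": discordant,
        "agreement": agreement,
    }


result = compare_sources(ehr_flags, claims_flags)
print(result)
```

Patients who appear in only one source and patients with discordant flags are the cases that usually need manual review or an explicit rule for resolving conflicts.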

3.6 Suggested Readings