3 Sources of Real-World Data
3.1 Overview
This chapter covers the primary sources of real-world data (RWD), highlighting their characteristics, strengths, limitations, and appropriate use cases in clinical and translational research.
3.2 Categories of RWD Sources
3.2.1 Electronic Health Records (EHR)
- Captured during routine clinical care.
- Contains structured data (e.g., lab results, diagnosis codes) and unstructured data (e.g., clinical notes).
- Provides detailed clinical information and longitudinal patient histories.
- Limitations: variable data quality, missing data, variation across systems and institutions, and data not collected for research purposes.
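Structured EHR fields are often combined into rule-based phenotypes that identify patients with a condition. The sketch below is a hypothetical, simplified example and not a validated algorithm. It flags type 2 diabetes when a record contains an ICD-10-CM code in category E11 or an HbA1c result of at least 6.5%, and it returns None when neither data element is present, so that missing data is not silently counted as a negative. The `PatientRecord` structure and `has_t2dm_phenotype` function are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatientRecord:
    """Hypothetical patient record assembled from structured EHR fields."""
    patient_id: str
    diagnosis_codes: list[str] = field(default_factory=list)  # ICD-10-CM codes
    hba1c_values: list[float] = field(default_factory=list)   # HbA1c results in %


def has_t2dm_phenotype(record: PatientRecord, hba1c_threshold: float = 6.5) -> Optional[bool]:
    """Illustrative, unvalidated rule for type 2 diabetes.

    Returns True if any code falls in ICD-10-CM category E11 or any HbA1c
    result meets the threshold, False if data exist but neither criterion is
    met, and None if the record has no diagnosis codes and no HbA1c results.
    """
    if not record.diagnosis_codes and not record.hba1c_values:
        return None  # no evidence either way: treat as unknown, not negative
    by_code = any(code.strip().upper().startswith("E11") for code in record.diagnosis_codes)
    by_lab = any(value >= hba1c_threshold for value in record.hba1c_values)
    return by_code or by_lab


patients = [
    PatientRecord("p1", diagnosis_codes=["E11.9"]),
    PatientRecord("p2", hba1c_values=[5.4, 7.1]),
    PatientRecord("p3", diagnosis_codes=["J45.909"], hba1c_values=[5.6]),
    PatientRecord("p4"),
]
for p in patients:
    print(p.patient_id, has_t2dm_phenotype(p))
# p1 True
# p2 True
# p3 False
# p4 None
```

Returning three states (True, False, None) keeps "no data" distinct from "tested negative", which matters when EHR missingness is not random.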
3.2.2 Administrative Claims Data
- Generated for billing and reimbursement purposes.
- Includes information on diagnoses, procedures, and prescriptions.
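- Can cover large populations, in some cases nationally (e.g., Medicare, Medicaid, private insurers), but only for people enrolled with the payer.
- Strengths: standardized coding (e.g., ICD-10-CM diagnoses, CPT/HCPCS procedures, NDC drug codes) and large populations.
- Limitations: lacks clinical detail (e.g., lab result values, disease severity), potential miscoding, coding shaped by reimbursement incentives, and no data on uninsured individuals or on periods outside enrollment.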
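Prescription claims typically record a fill date and a days' supply, which makes claims-based adherence measures possible. A widely used one is the proportion of days covered (PDC): the number of days in a measurement period on which the patient had medication on hand, divided by the number of days in the period. The sketch below assumes fills are given as hypothetical `(fill_date, days_supply)` pairs and, as one common convention, shifts early refills forward so overlapping supply is not double-counted. Real implementations differ in how they handle hospital stays, switching between drugs in a class, and supply carried in from before the period.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, period_start, period_end):
    """Proportion of days covered (PDC) in [period_start, period_end], inclusive.

    fills: iterable of (fill_date, days_supply) tuples from pharmacy claims.
    A refill obtained before the previous supply runs out is assumed to start
    the day after that supply ends (one common convention), so overlapping
    supply is not double-counted. Days outside the period are ignored.
    """
    period_days = (period_end - period_start).days + 1
    covered_days = 0
    next_available = date.min  # first day not yet covered by earlier fills
    for fill_date, days_supply in sorted(fills):
        start = max(fill_date, next_available)
        end = start + timedelta(days=days_supply - 1)  # last day of this supply
        overlap_start = max(start, period_start)
        overlap_end = min(end, period_end)
        if overlap_end >= overlap_start:
            covered_days += (overlap_end - overlap_start).days + 1
        next_available = end + timedelta(days=1)
    return covered_days / period_days


fills = [(date(2024, 1, 1), 10), (date(2024, 1, 8), 10), (date(2024, 1, 25), 10)]
pdc = proportion_of_days_covered(fills, date(2024, 1, 1), date(2024, 1, 31))
print(round(pdc, 3))  # 0.871  (27 covered days out of 31)
```

In this example the second fill is shifted to January 11 and the third fill's supply runs past January 31, so only 27 days count. A PDC of at least 0.80 is a commonly used, though not universal, adherence threshold.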
3.2.3 Registries
- Disease-specific or procedure-specific data collections.
- Often curated with specific inclusion criteria and data standards.
- Examples: cancer registries, transplant registries.
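- Often high-quality and structured, but may lack generalizability because participants are selected by the registry's inclusion criteria and participating sites.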
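Registry inclusion criteria are also what limit generalizability, so it helps to record how many candidates each criterion removes, as in an enrollment flow diagram. The sketch below applies hypothetical criteria in sequence and reports the exclusions per step; the `Candidate` fields, criteria, and function name are illustrative assumptions, not any particular registry's rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    """Hypothetical candidate record screened for registry enrollment."""
    patient_id: str
    age: int
    diagnosis_status: str  # e.g., "confirmed" or "suspected"
    consented: bool


# Hypothetical inclusion criteria, applied in the order listed.
CRITERIA: list[tuple[str, Callable[[Candidate], bool]]] = [
    ("age >= 18", lambda c: c.age >= 18),
    ("confirmed diagnosis", lambda c: c.diagnosis_status == "confirmed"),
    ("consent given", lambda c: c.consented),
]


def apply_inclusion_criteria(candidates, criteria):
    """Apply criteria in sequence; return enrolled candidates and per-criterion exclusions."""
    remaining = list(candidates)
    attrition = []
    for name, rule in criteria:
        kept = [c for c in remaining if rule(c)]
        attrition.append((name, len(remaining) - len(kept)))
        remaining = kept
    return remaining, attrition


screened = [
    Candidate("a", 45, "confirmed", True),
    Candidate("b", 16, "confirmed", True),
    Candidate("c", 60, "suspected", True),
    Candidate("d", 30, "confirmed", False),
    Candidate("e", 70, "confirmed", True),
]
enrolled, attrition = apply_inclusion_criteria(screened, CRITERIA)
print([c.patient_id for c in enrolled])  # ['a', 'e']
print(attrition)  # [('age >= 18', 1), ('confirmed diagnosis', 1), ('consent given', 1)]
```

Because criteria are applied in order, the exclusion count for each step depends on the steps before it; reporting the order alongside the counts keeps the attrition table interpretable.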
3.3 Comparing Sources
Data Source | Strengths | Limitations | Best Use Cases |
---|---|---|---|
EHR | Rich clinical detail, longitudinal histories | Missing data, inconsistent documentation, variation across systems | Clinical outcomes, phenotyping |
Claims | Large scale, standardized coding | Limited clinical granularity, restricted to enrolled populations | Health care utilization, economic outcomes |
Registries | Focused, high-quality data | Selection bias, limited population scope | Quality improvement, comparative effectiveness |
PGHD (patient-generated health data, e.g., wearables, mobile apps) | Captures behavior outside clinical settings, continuous data | Variable quality, integration challenges | Adherence and behavior monitoring |
3.4 Considerations for Researchers
- Provenance: Understand who collected the data, how, and for what purpose (further discussed in chapter __).
- Data Quality: Assess missingness, timeliness, accuracy, and whether key variables have been validated against a reference standard (further discussed in chapter __).
- Access and Governance: Determine legal, ethical, and institutional requirements (further discussed in chapter __).
- Population Representativeness: Identify which groups may be under- or overrepresented relative to the target population (further discussed in chapter __).
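The missingness and representativeness checks listed above can be started with simple profiling before any analysis. The sketch below assumes records are Python dictionaries with hypothetical field names and that the analyst supplies reference population shares (for example, from census data); the function names are illustrative. It is a first look, not a full data quality assessment.

```python
from collections import Counter


def missingness_by_field(records, fields):
    """Fraction of records in which each field is absent or None (assumes records is non-empty)."""
    n = len(records)
    return {f: sum(1 for r in records if r.get(f) is None) / n for f in fields}


def representation_ratios(records, group_field, reference_shares):
    """Share of each group in the data divided by its share in a reference population.

    Values below 1 suggest under-representation and values above 1 suggest
    over-representation. Records with a missing group value are excluded.
    """
    counts = Counter(r[group_field] for r in records if r.get(group_field) is not None)
    total = sum(counts.values())
    return {g: (counts[g] / total) / share for g, share in reference_shares.items()}


records = [
    {"age": 54, "sex": "F", "smoking": None},
    {"age": 61, "sex": "M", "smoking": "never"},
    {"age": None, "sex": "F"},  # smoking field absent entirely
    {"age": 47, "sex": "F", "smoking": "current"},
]
print(missingness_by_field(records, ["age", "sex", "smoking"]))
# {'age': 0.25, 'sex': 0.0, 'smoking': 0.5}
print(representation_ratios(records, "sex", {"F": 0.5, "M": 0.5}))
# {'F': 1.5, 'M': 0.5}
```

Treating an absent key and an explicit None the same way is a deliberate choice here; some sources distinguish "not asked" from "asked but unknown", and a real profile should keep that distinction when it exists.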
3.5 Summary
- Different RWD sources serve different research needs.
- Understanding their provenance, structure, and limitations is key to rigorous study design.
- Triangulating multiple data sources may improve validity but introduces additional complexity, such as linking records across sources and reconciling differing data definitions.
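Triangulating sources usually begins with linking records that belong to the same person. One simple approach is deterministic linkage on normalized identifiers. The sketch below hashes hypothetical fields (last name, birth date, sex) after trimming and lowercasing them, and links only keys that match exactly and uniquely in both sources. It also shows why linkage adds complexity: a one-day difference in a birth date, as in the second pair, prevents a link. Production systems often use probabilistic matching and keyed or salted hashing instead; an unsalted hash of common identifiers can be reversed by brute force, so it is not by itself a privacy safeguard.

```python
import hashlib
from collections import defaultdict

LINK_FIELDS = ("last_name", "birth_date", "sex")  # hypothetical identifier fields


def linkage_key(record):
    """SHA-256 hash of normalized identifiers (trimmed, lowercased, joined with '|')."""
    normalized = "|".join(str(record[f]).strip().lower() for f in LINK_FIELDS)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deterministic_link(source_a, source_b):
    """Return (id_a, id_b) pairs whose keys match exactly and uniquely.

    A key that occurs more than once in either source is treated as
    ambiguous and left unlinked rather than guessed.
    """
    index_a, index_b = defaultdict(list), defaultdict(list)
    for rec in source_a:
        index_a[linkage_key(rec)].append(rec["id"])
    for rec in source_b:
        index_b[linkage_key(rec)].append(rec["id"])
    links = []
    for key, ids_a in index_a.items():
        ids_b = index_b.get(key, [])
        if len(ids_a) == 1 and len(ids_b) == 1:
            links.append((ids_a[0], ids_b[0]))
    return links


ehr = [
    {"id": "E1", "last_name": "Garcia ", "birth_date": "1970-03-02", "sex": "F"},
    {"id": "E2", "last_name": "Smith", "birth_date": "1985-11-20", "sex": "M"},
]
claims = [
    {"id": "C9", "last_name": "GARCIA", "birth_date": "1970-03-02", "sex": "f"},
    {"id": "C7", "last_name": "Smith", "birth_date": "1985-11-21", "sex": "M"},
]
print(deterministic_link(ehr, claims))  # [('E1', 'C9')]
```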
3.6 Suggested Readings
- U.S. Food and Drug Administration (FDA). Framework for FDA’s Real-World Evidence Program (2018).
- Gliklich et al. Registries for Evaluating Patient Outcomes: A User’s Guide. Agency for Healthcare Research and Quality (AHRQ).