Derive
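The derive module is the main phenotype construction layer. It converts raw UK Biobank (UKB) fields and records from multiple sources (self-report, hospital episodes, first-occurrence fields, and the cancer and death registries) into analysis-ready covariates, disease outcomes, event dates, timing variables, and follow-up durations.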
Scope
| Group | Functions |
|---|---|
| General variables | derive_missing(), derive_covariate(), derive_cut() |
| Disease sources | derive_selfreport(), derive_hes(), derive_first_occurrence(), derive_cancer_registry(), derive_death_registry() |
| Case definitions | derive_icd10(), derive_case() |
| Survival variables | derive_timing(), derive_age(), derive_followup() |
Workflow Role
Use derive_missing() early to normalise UKB special missing values. Build source-specific disease variables next, combine them with derive_icd10() or derive_case(), and finish by deriving timing, age, and follow-up variables for downstream association models.
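The ordering described above can be illustrated with a minimal, language-neutral sketch in plain Python. It does not reproduce the module's API: `normalise_missing`, `earliest_event`, and `followup_years` are hypothetical helpers, the special missing codes `-1` and `-3` are an assumption that should be checked against each field's coding, and the participant record is made up for illustration.

```python
from datetime import date

# Assumption: negative codes mark special missing values such as
# "do not know" or "prefer not to answer"; check the coding for each field.
SPECIAL_MISSING = {-1, -3}


def normalise_missing(value):
    """Return None for special missing codes, otherwise the value unchanged."""
    return None if value in SPECIAL_MISSING else value


def earliest_event(source_dates):
    """Combine per-source event dates into (status, first_date)."""
    known = [d for d in source_dates.values() if d is not None]
    if not known:
        return 0, None  # no source recorded the condition
    return 1, min(known)


def followup_years(baseline, event_date, censor_date):
    """Years from baseline to the event, or to censoring if no event came first."""
    end = censor_date if event_date is None else min(event_date, censor_date)
    return (end - baseline).days / 365.25


# Illustrative record only; the keys are hypothetical, not real column names.
participant = {
    "smoking": -3,
    "baseline": date(2008, 5, 1),
    "sources": {
        "selfreport": None,
        "hes": date(2012, 5, 1),
        "death": date(2015, 1, 10),
    },
}

# Step 1: normalise special missing values.
smoking = normalise_missing(participant["smoking"])
# Step 2: combine source-specific dates into one case definition.
status, event_date = earliest_event(participant["sources"])
# Step 3: derive follow-up for downstream models.
years = followup_years(participant["baseline"], event_date, date(2020, 12, 31))

print(smoking, status, event_date, round(years, 2))
# prints: None 1 2012-05-01 4.0
```

Taking the earliest date across sources is one common way to define the event date; whether the module does the same is not stated here.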
Review Focus
Reviews of this module should check for:
- consistent date parsing across source types;
- unambiguous prevalent and incident case definitions;
- stable naming conventions for status, date, age, timing, and follow-up columns;
- clear warnings when requested source columns are absent.
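As a rough illustration of what these checks might look for, the sketch below shows a single shared date parser, an explicit prevalent/incident rule, and a warning for absent columns. Everything in it is hypothetical: the helper names, the column names, and the two date formats are assumptions, and treating an event on the baseline date as prevalent is one possible convention rather than the module's documented rule.

```python
import warnings
from datetime import date, datetime

# Assumption: these are the date layouts encountered across sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Parse a date with one shared routine; None if blank or unparseable."""
    if not text:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def check_columns(record, requested):
    """Warn about requested columns that are absent; return those present."""
    missing = [c for c in requested if c not in record]
    if missing:
        warnings.warn(f"source columns not found: {', '.join(missing)}")
    return [c for c in requested if c in record]


def classify_case(event_date, baseline):
    """Prevalent if on or before baseline, incident if strictly after."""
    if event_date is None:
        return "none"
    return "prevalent" if event_date <= baseline else "incident"


# Hypothetical column names following a <source>_date pattern.
record = {"hes_date": "2012-05-01", "selfreport_date": "03/02/2005"}
baseline = date(2008, 5, 1)

# "cancer_date" is absent, so this call emits a warning and skips it.
for column in check_columns(record, ["hes_date", "selfreport_date", "cancer_date"]):
    event = parse_date(record[column])
    print(column, event, classify_case(event, baseline))
```

Routing every source through one parser and one classification rule keeps the prevalent/incident boundary identical across sources, which is the property the first two review points are meant to protect.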