
Adding pred_measure feature #363

Draft
florence-bockting wants to merge 21 commits into loo-v3.0.0 from pred_measure

florence-bockting commented Jun 2, 2026

Contributor

Fixes #281

Summary

Adds a unified API for predictive performance measures on top of loo-v3.0.0. Density scores, scoring rules, and point-prediction metrics can all be computed under in-sample, PSIS-LOO, k-fold, and holdout evaluation via the *_pred_measure() functions.

Design log: developer-notes.Rmd (rendered: developer-notes.md)
Related: #363

Issues addressed

This PR consolidates work that was tracked across several related issues:

How to navigate this PR

This is a large PR. These documents explain design, scope, and current status:

What's new

  • Entry points: insample_pred_measure(), loo_pred_measure(), kfold_pred_measure(), test_pred_measure(), pred_measure()
  • Built-in measures: measure_*() functions plus supported_measures_list() (including r2, classification metrics, Brier score, MAE/MSE/RMSE, and ELPD/IC)
  • Unified scoring rule: measure_rps() / measure_srps() take a single ypred matrix and use a PWM/ECDF estimator; they cover both continuous and ordered categorical outcomes (CRPS/RPS and their scaled variants in the new API)
  • Custom measures: pass a function (or a named list of functions) to measure
  • PSIS reuse: supply a psis_object, or reuse the weights from a previous loo_pred_measure() result via predperf
  • S3 print methods for pred_measure objects (including Pareto-$\hat{k}$ diagnostics for LOO)
  • Docs: migration-guide vignette, plus website-only overview-measures and pred-measure-workflow articles
  • Tests & CI: test suite, pre-fitted .Rds fixtures, pkgdown vignette-data generation step
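The single-matrix estimator behind measure_rps() / measure_srps() can be illustrated with a short base-R sketch. This is not the package code: the helper names (expected_abs_diff(), expected_abs_error(), crps_sketch(), scrps_sketch(), rps_direct()) are hypothetical, ypred is assumed to be an S x N matrix with draws in rows and observations in columns, and scores are returned as losses (lower is better), which may not match the sign convention used in loo. The PWM form estimates E|X - X'| from sorted draws with the unbiased 1 / (S(S - 1)) normalisation; the ECDF form uses 1 / S^2, and on integer-coded ordered categories it reduces exactly to the ranked probability score, which illustrates how one estimator can cover both continuous and ordered categorical outcomes.

```r
# Hypothetical sketch (not the loo implementation): CRPS / scaled CRPS from a
# single S x N matrix `ypred` (rows = draws, columns = observations).
# Scores are returned as losses (lower is better); this sign convention is an assumption.

# E|X - X'| per column, computed from sorted draws.
# fair = TRUE  -> PWM / unbiased U-statistic, normalised by S * (S - 1)
# fair = FALSE -> ECDF / V-statistic, normalised by S^2
expected_abs_diff <- function(ypred, fair = TRUE) {
  S <- nrow(ypred)
  w <- 2 * seq_len(S) - S - 1  # weight of the i-th order statistic
  denom <- if (fair) S * (S - 1) else S^2
  apply(ypred, 2, function(x) 2 * sum(w * sort(x)) / denom)
}

# E|X - y| per column (sweep subtracts y[j] from column j)
expected_abs_error <- function(ypred, y) {
  colMeans(abs(sweep(ypred, 2, y)))
}

crps_sketch <- function(ypred, y, fair = TRUE) {
  expected_abs_error(ypred, y) - 0.5 * expected_abs_diff(ypred, fair)
}

# Scaled variant: E|X - y| / E|X - X'| + 0.5 * log E|X - X'|
scrps_sketch <- function(ypred, y, fair = TRUE) {
  exx <- expected_abs_diff(ypred, fair)
  expected_abs_error(ypred, y) / exx + 0.5 * log(exx)
}

# Ranked probability score for categories coded 1..K, computed directly
# from cumulative predictive probabilities, for comparison
rps_direct <- function(ypred, y, K) {
  thresholds <- seq_len(K - 1)
  sapply(seq_along(y), function(j) {
    Fk <- sapply(thresholds, function(k) mean(ypred[, j] <= k))
    sum((Fk - (y[j] <= thresholds))^2)
  })
}

set.seed(1)
ypred_cat <- matrix(sample(1:4, 400 * 3, replace = TRUE), nrow = 400)
y_cat <- c(1, 3, 4)
# TRUE: on integer-coded categories the ECDF form equals the RPS
all.equal(crps_sketch(ypred_cat, y_cat, fair = FALSE),
          rps_direct(ypred_cat, y_cat, K = 4))
```

Sorting makes each column O(S log S) instead of the O(S^2) cost of comparing all pairs of draws, and only one draw matrix is needed.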

Deprecations (implementations retained)

Existing functions are deprecated but still work with their current APIs:

| Deprecated | New workflow |
|---|---|
| elpd() | measure_elpd() / *_pred_measure() |
| crps(), scrps() (x, x2) | measure_rps(), measure_srps() (ypred) |
| loo_crps(), loo_scrps() | loo_pred_measure(..., measure = "rps" \| "srps") |
| loo_predictive_metric() | loo_pred_measure() |

In *_pred_measure(), elpd is computed automatically when ylp is supplied; request measure = "ic" separately for the information criterion.
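As a rough illustration of what the automatic elpd and the separately requested IC involve, the sketch below assumes ylp is an S x N matrix of pointwise log predictive densities with draws in rows. All function names are hypothetical. In-sample elpd takes the log of the posterior-mean density per observation; an importance-weighted version (as in PSIS-LOO) first normalises per-observation log weights; and the IC is taken as -2 times elpd, the usual looic convention.

```r
# Hypothetical sketch: pointwise and summed elpd from an S x N matrix `ll`
# of log predictive densities (assumed to be what `ylp` holds), optionally
# weighted by per-observation log importance weights `lw` of the same shape.

log_sum_exp <- function(v) {
  m <- max(v)
  m + log(sum(exp(v - m)))
}

elpd_pointwise <- function(ll, lw = NULL) {
  if (is.null(lw)) {
    # In-sample: log of the posterior mean predictive density per column
    return(apply(ll, 2, function(v) log_sum_exp(v) - log(length(v))))
  }
  # Importance-weighted (e.g. PSIS-LOO): normalise log weights per column first
  sapply(seq_len(ncol(ll)), function(i) {
    lwi <- lw[, i] - log_sum_exp(lw[, i])
    log_sum_exp(lwi + ll[, i])
  })
}

# Sum, standard error of the sum, and IC = -2 * elpd
summarise_elpd <- function(pointwise) {
  n <- length(pointwise)
  elpd <- sum(pointwise)
  c(elpd = elpd, se = sqrt(n * var(pointwise)), ic = -2 * elpd)
}

set.seed(3)
# Synthetic log densities, for illustration only
ll <- matrix(dnorm(rnorm(1000 * 5), log = TRUE), nrow = 1000)
summarise_elpd(elpd_pointwise(ll))       # in-sample
summarise_elpd(elpd_pointwise(ll, -ll))  # raw (unsmoothed) IS-LOO weights
```

With raw weights lw = -ll this reduces to the harmonic-mean importance-sampling LOO estimate. Pareto-smoothed log weights could instead come from loo's existing psis() object, e.g. weights(psis(-ll, r_eff = 1), log = TRUE); whether psis_object is consumed this way inside *_pred_measure() is an assumption here.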

The new RPS functions use a different estimator (a single draw matrix, PWM) from the deprecated CRPS functions (two draw matrices, permutation). Results are highly correlated but not identical; see the comparison section and figures in developer-notes.Rmd. The comparison tests are in tests/testthat/test_crps.R.
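To see why the two estimators agree closely without matching exactly, the sketch below computes E|X - X'| per observation both from a single sorted draw matrix (PWM) and from pairs formed with a second matrix, here taken to be a row permutation of the first as an illustrative assumption. The data are synthetic normal draws with a different scale per observation; the function names are hypothetical, and this is not the comparison reported in developer-notes.Rmd.

```r
# Hypothetical comparison sketch: two estimates of E|X - X'| per observation.

# Single matrix, PWM / U-statistic over all ordered pairs of draws
expected_abs_diff_pwm <- function(ypred) {
  S <- nrow(ypred)
  w <- 2 * seq_len(S) - S - 1
  apply(ypred, 2, function(x) 2 * sum(w * sort(x)) / (S * (S - 1)))
}

# Two matrices: pair each draw with a draw from a second matrix, here a
# row permutation of the first (assumption for illustration)
expected_abs_diff_perm <- function(ypred) {
  x2 <- ypred[sample(nrow(ypred)), , drop = FALSE]
  colMeans(abs(ypred - x2))
}

set.seed(2)
S <- 1000
sds <- seq(0.5, 3, length.out = 20)
ypred <- sapply(sds, function(s) rnorm(S, sd = s))  # S x 20 synthetic draws
a <- expected_abs_diff_pwm(ypred)
b <- expected_abs_diff_perm(ypred)
cor(a, b)        # correlation across observations
max(abs(a - b))  # largest per-observation difference
```

The permutation estimate averages over only S pairs per observation, so it carries extra Monte Carlo noise relative to the PWM form, which uses all S(S - 1) ordered pairs.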

Dependencies

Vignettes, test-data generation, and pkgdown CI install brms from GitHub master (paul-buerkner/brms), because k-fold support for categorical / multinomial models (brms#1890, fixing brms#1889) is not yet on CRAN. Once a CRAN release includes that fix, the GitHub pin can be dropped.

Known limitations

  • group_ids grouping not yet implemented
  • loo_compare integration still outstanding
  • Additional documentation still in progress (formula derivations article, glossary extension)
  • brms GitHub pin remains until a CRAN release includes the categorical k-fold fix

See developer-notes.Rmd for open design questions and remaining tasks.

Acknowledgements

This work builds on an initial prototype of the pred_measure feature implemented by @VisruthSK (see VisruthSK/loo-sandbox) under guidance from @jgabry.

Test plan

  • devtools::test() and devtools::check() pass
    • devtools::test(): 1 failure — unrelated test_psislw.R snapshot drift (Warning in psislw()Warning:)
    • R CMD check: 1 error — kfold_pred_measure() roxygen example uses measure = "rmse" without mupred
  • Deprecated functions warn (test_deprecated_measures.R)
  • Deprecated functions return the same results (snapshot tests in test_crps.R; not re-diffed against the loo-v3.0.0 branch)
  • CRPS/RPS comparison tests pass (test_crps.R, 32 tests)
  • New *_pred_measure() examples run (migration guide + workflow vignette; rendered locally with NOT_CRAN=true)
  • pkgdown site builds with GitHub brms (workflow and migration-guide articles render; full pkgdown::build_site() not verified locally because the existing docs/ directory blocks the build)
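The R CMD check error above arises because a point-prediction measure (rmse) was requested without point predictions. As a minimal sketch of why these measures need them, assuming mupred holds one point prediction per observation (for example the posterior predictive mean), the hypothetical helpers below compute MAE/MSE/RMSE and a binary Brier score in base R. They are not the loo API.

```r
# Hypothetical sketch, not the loo API: point-prediction measures compare
# one prediction per observation (assumed to be what `mupred` holds) with y.
point_metrics <- function(y, mupred) {
  err <- y - mupred
  c(mae = mean(abs(err)), mse = mean(err^2), rmse = sqrt(mean(err^2)))
}

# Brier score for binary y in {0, 1} and predicted probabilities p = P(y = 1)
brier_score <- function(y, p) mean((p - y)^2)

# One possible source of point predictions: column means of a draw matrix
set.seed(4)
ypred <- matrix(rnorm(500 * 3, mean = 1:3), nrow = 500, byrow = TRUE)
mupred <- colMeans(ypred)
point_metrics(y = c(1.2, 1.8, 3.1), mupred = mupred)

point_metrics(y = c(1, 2, 3), mupred = c(1, 2, 5))  # mae = 2/3, mse = 4/3, rmse = sqrt(4/3)
brier_score(y = c(0, 1), p = c(0.2, 0.6))           # 0.1
```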

github-actions Bot commented Jun 2, 2026


This is how benchmark results would change (along with a 95% confidence interval in relative change) if 023add0 is merged into loo-v3.0.0:

  • ✔️loo_function: 1.9s -> 1.91s [-0.35%, +0.74%]
  • ✔️loo_matrix: 1.77s -> 1.78s [-0.16%, +0.61%]

Further explanation regarding interpretation and methodology can be found in the documentation.

florence-bockting commented Jun 8, 2026

Contributor Author

Linking issue #179 here, as it seems to be about LOO scores. We should check whether it is still relevant and whether it can be implemented within the loo refactoring.

florence-bockting commented Jun 8, 2026

Contributor Author

There is an open issue (#343) requesting a subset() method for psis objects, and an (unrelated) open PR #110. It might be worth considering whether we can address this in this refactoring PR as well. If yes, we should

Moved this to PR #379

VisruthSK added this to the v3.0.0 milestone Jun 9, 2026
florence-bockting mentioned this pull request Jul 1, 2026
florence-bockting changed the title from "LOO refactoring and adding pred_measure feature" to "Adding pred_measure feature"
florence-bockting changed the base branch from master to loo-v3.0.0 July 1, 2026 09:33