This runbook executes the Phase 3 classifier-and-risk pipeline in an isolated output folder.
naive_bayes, logreg_tfidf, or privacybert).

From the latest full-data comparable runs:
- artifacts/phase-3-nb/ using multinomial_naive_bayes
- artifacts/phase-3-logreg/ using logreg_tfidf

opp115::consolidation-0.75

Default source:
- data/raw/OPP-115

Optional source:
- text, label, policy_uid (optional: example_id, category)
- data/raw/Polisis/normalized) with .jsonl and/or .csv files.

Normalized Polisis row contract:
- text
- category
- label (user|system|organization), policy_uid, example_id

Run Phase 3 baseline using OPP-115 default input set:
PYTHONPATH=src python scripts/run_phase3_baseline.py
Equivalent package command:
prert-phase3
Run with a custom labeled JSONL dataset:
PYTHONPATH=src python scripts/run_phase3_baseline.py \
--labeled-input-path data/processed/phase3_labeled.jsonl \
--output-dir artifacts/phase-3
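Before pointing the script at a custom file, it can help to pre-check the rows. The sketch below assumes the field list given earlier (text, label, policy_uid, with example_id and category optional) describes the custom labeled JSONL, and that "required" means present and non-empty. The function names are hypothetical and are not part of the pipeline.

```python
import json

# Field names are taken from the runbook's field list; treating the first
# three as required and non-empty is an assumption made for illustration.
REQUIRED_FIELDS = ("text", "label", "policy_uid")


def validate_labeled_row(row):
    """Return a list of problems found in one labeled row (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if value is None or value == "":
            problems.append(f"missing or empty required field: {field}")
    return problems


def validate_jsonl_lines(lines):
    """Map 1-based line numbers to problems, skipping blank lines."""
    report = {}
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            report[number] = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(row, dict):
            report[number] = ["row is not a JSON object"]
            continue
        problems = validate_labeled_row(row)
        if problems:
            report[number] = problems
    return report
```

Passing an open file handle as `lines` works because file objects iterate line by line; an empty report means every non-blank row carries the three required fields.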
Run with normalized Polisis source files:
PYTHONPATH=src python scripts/run_phase3_baseline.py \
--polisis-root data/raw/Polisis \
--polisis-input-set normalized \
--output-dir artifacts/phase-3
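Both input contracts carry a policy_uid, and the output checks expect dataset_manifest.json.policy_overlap.* to be 0. One way to get that property is to assign whole policies to a split, as sketched below. The split fractions, the split names, and the overlap key format are assumptions for illustration; the runbook does not say how the pipeline actually splits.

```python
import random
from collections import defaultdict


def split_by_policy(rows, seed=42, fractions=(0.8, 0.1, 0.1)):
    """Assign whole policies to splits so that no policy_uid is shared."""
    by_policy = defaultdict(list)
    for row in rows:
        by_policy[row["policy_uid"]].append(row)

    # Sort before shuffling so the result depends only on the seed.
    policy_ids = sorted(by_policy)
    random.Random(seed).shuffle(policy_ids)

    n_train = int(len(policy_ids) * fractions[0])
    n_val = int(len(policy_ids) * fractions[1])
    groups = {
        "training": policy_ids[:n_train],
        "validation": policy_ids[n_train:n_train + n_val],
        "test": policy_ids[n_train + n_val:],  # remainder goes to test
    }
    return {
        name: [row for pid in ids for row in by_policy[pid]]
        for name, ids in groups.items()
    }


def policy_overlap(splits):
    """Count policy_uid values shared by each pair of splits."""
    ids = {name: {row["policy_uid"] for row in rows} for name, rows in splits.items()}
    names = sorted(ids)
    return {
        f"{a}_{b}": len(ids[a] & ids[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Because each policy lands in exactly one group, every pairwise count returned by `policy_overlap` is zero by construction.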
Run a bounded sample for quick iteration:
PYTHONPATH=src python scripts/run_phase3_baseline.py \
--max-rows 5000 \
--seed 42
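The runbook does not state whether --max-rows takes the first rows or a random subset. As one possible reading, the sketch below draws a seeded random subset and keeps the original row order; `bounded_sample` is a hypothetical helper, not the script's implementation.

```python
import random


def bounded_sample(rows, max_rows, seed):
    """Return at most max_rows rows, chosen reproducibly, in original order."""
    rows = list(rows)
    if max_rows is None or len(rows) <= max_rows:
        return rows
    rng = random.Random(seed)  # local generator, leaves global state untouched
    keep = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in keep]
```

Reusing the same seed on the same input yields the same subset, which is what makes quick iterations comparable with each other.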
Run with explicit run metadata and measurement controls:
PYTHONPATH=src python scripts/run_phase3_baseline.py \
--run-id phase3-2026-04-07-a \
--calibration-bins 10 \
--bootstrap-resamples 1000 \
--output-dir artifacts/phase-3-nb
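To make the two measurement controls concrete, here is a minimal sketch of what a bin count and a resample count typically govern: equal-width binning for expected calibration error (ECE) and a percentile bootstrap for a 95% interval. These are the standard textbook definitions, written as an assumption about what the flags control; the pipeline's own formulas may differ.

```python
import random


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted mean of |accuracy - mean confidence| over bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to the last bin
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


def bootstrap_interval_95(values, statistic, n_resamples=1000, seed=42):
    """Percentile bootstrap: resample with replacement, take the 2.5/97.5 percentiles."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = stats[int(0.025 * (n_resamples - 1))]
    upper = stats[int(0.975 * (n_resamples - 1))]
    return lower, upper
```

More bins give a finer calibration picture but noisier per-bin estimates, and more resamples stabilize the interval endpoints at the cost of runtime.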
Run with the upgraded TF-IDF + weighted logistic regression model:
PYTHONPATH=src python scripts/run_phase3_baseline.py \
--model-type logreg_tfidf \
--max-features 20000 \
--ngram-max 2 \
--max-iter 1000 \
--output-dir artifacts/phase-3-logreg
Run with the PrivacyBERT backend scaffold:
PYTHONPATH=src python scripts/run_phase3_baseline.py \
--model-type privacybert \
--privacybert-model-name bert-base-uncased \
--privacybert-epochs 2 \
--privacybert-batch-size 8 \
--privacybert-learning-rate 5e-5 \
--privacybert-max-length 256 \
--output-dir artifacts/phase-3-privacybert
Run with custom Bayesian priors (Bayesian scoring is enabled by default):
PYTHONPATH=src python scripts/run_phase3_baseline.py \
--model-type logreg_tfidf \
--bayesian-priors-path configs/phase3_bayesian_priors.json \
--bayesian-top-k 5 \
--output-dir artifacts/phase-3
Disable Bayesian scoring output (benchmark/diagnostic only):
PYTHONPATH=src python scripts/run_phase3_baseline.py \
--model-type logreg_tfidf \
--disable-bayesian-scoring \
--output-dir artifacts/phase-3-no-bayes
Run a comparable full-data Naive Bayes baseline for side-by-side benchmarking:
PYTHONPATH=src python scripts/run_phase3_baseline.py \
--model-type naive_bayes \
--output-dir artifacts/phase-3-nb
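Once two or more runs have finished, their headline metric can be compared with a few lines of Python. This sketch reads classifier_metrics.json from each output directory and ranks the runs by macro_f1, using the validation.macro_f1 and test.macro_f1 paths named in the output checks. The helper names are hypothetical.

```python
import json
from pathlib import Path


def load_macro_f1(output_dir, split="test"):
    """Read macro_f1 for one split from a run's classifier_metrics.json."""
    path = Path(output_dir) / "classifier_metrics.json"
    metrics = json.loads(path.read_text(encoding="utf-8"))
    return float(metrics[split]["macro_f1"])


def compare_runs(output_dirs, split="test"):
    """Return (output_dir, macro_f1) pairs sorted from best to worst."""
    scores = [(str(d), load_macro_f1(d, split)) for d in output_dirs]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, `compare_runs(["artifacts/phase-3-nb", "artifacts/phase-3-logreg"])` lists the two comparable runs with the stronger one first.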
Run a proposal-aligned Phase 3 acceptance freeze (PrivacyBERT + Bayesian-primary checks):
PYTHONPATH=src python scripts/run_phase3_acceptance_freeze.py \
--model-type privacybert \
--strict \
--output-dir artifacts/phase-3-freeze
Run acceptance freeze with Polisis advisory reporting (non-blocking for the current milestone):
PYTHONPATH=src python scripts/run_phase3_acceptance_freeze.py \
--polisis-root data/raw/Polisis \
--polisis-input-set normalized \
--output-dir artifacts/phase-3-freeze
Written to the selected --output-dir (for example artifacts/phase-3/, artifacts/phase-3-nb/, or artifacts/phase-3-logreg/):
- training_dataset.jsonl
- validation_dataset.jsonl
- test_dataset.jsonl
- dataset_manifest.json
- classifier_checkpoint/model.json (naive_bayes)
- classifier_checkpoint/model.pkl (logreg_tfidf)
- classifier_checkpoint/privacybert/ (privacybert)
- classifier_metrics.json
- classifier_metrics.jsonl
- validation_predictions.jsonl
- test_predictions.jsonl
- calibration_validation.json
- calibration_test.json
- threshold_sweep_validation.json
- threshold_sweep_test.json
- bootstrap_ci_validation.json
- bootstrap_ci_test.json
- bayesian_risk_validation.json (when Bayesian scoring is enabled)
- bayesian_risk_test.json (when Bayesian scoring is enabled)
- model_card.md
- scoring_spec.md
- prototype_demo.md
- phase3_manifest.json
- artifacts/phase3_run_history.jsonl (canonical run-history index)
- phase3_acceptance_report.json (acceptance-freeze runs)
- phase3_acceptance_report.md (acceptance-freeze runs)

Checks on the outputs:
- dataset_manifest.json.policy_overlap.* == 0
- classifier_metrics.json.validation.macro_f1 and classifier_metrics.json.test.macro_f1 are in [0, 1]
- classifier_metrics.json.bayesian.enabled == true for Bayesian-primary runs
- classifier_metrics.json.bayesian.primary_score is in [0, 1] when enabled
- calibration_test.json.overall.ece and calibration_test.json.overall.brier are in [0, 1]
- threshold_sweep_test.json.by_label.*[].precision|recall|f1 are in [0, 1]
- bootstrap_ci_test.json.metrics.*.interval_95.lower <= upper
- phase3_manifest.json includes input config, split counts, and output file references
- phase3_manifest.json.primary_metric_surface is bayesian_posterior for default runs
- phase3_manifest.json.execution_metadata.run_id and executed_at are populated

Run pipeline tests:
PYTHONPATH=src pytest -q tests/test_phase3_pipeline.py tests/test_phase3_analytics.py
Run the cross-phase regression check used in this workspace:
PYTHONPATH=src pytest -q tests/test_phase2_pipeline.py tests/test_phase3_pipeline.py
Regenerate dashboard figures (Figure 5-17):
PYTHONPATH=src python scripts/generate_phase3_dashboard_figures.py