This runbook executes Phase 2 in an isolated output folder so Phase 1 code and artifacts remain unchanged.
Required:
- artifacts/phase-1/controls_all.jsonl
Optional:
- A public dataset in .csv or .jsonl (for example ENISA/PRC extracts)
- data/raw/OPP-115 (reference corpus)
Notes on OPP-115:
- The raw corpus is stored in data/raw/OPP-115.
- The raw corpus is not a direct input for --public-input, which expects a flat .csv or .jsonl table.
- [...] (event_date, sector, records_affected).
- Alternatively, omit --public-input and use OPP-115 only as a reference corpus.
Create processed OPP-115 flat files (default: consolidation threshold 0.75):
PYTHONPATH=src python scripts/process_opp115_for_phase2.py
Create processed OPP-115 files from the raw annotation set instead of consolidated annotations:
PYTHONPATH=src python scripts/process_opp115_for_phase2.py \
--input-set annotations
Equivalent package command:
prert-opp115
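Because --public-input expects a flat .csv or .jsonl table, it can help to sanity-check a processed export before running Phase 2. The sketch below is not part of the project; `is_flat_table` is a hypothetical helper, and its definition of "flat" (consistent CSV column counts, or JSONL objects without nested values) is an assumption rather than the pipeline's actual rule.

```python
import csv
import json
from pathlib import Path


def is_flat_table(path: Path) -> bool:
    """Return True if the file looks like a flat .csv or .jsonl table.

    Hypothetical pre-check; the real loader may accept or reject more.
    """
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)
            if not header:
                return False
            # Every non-empty data row must match the header's column count.
            return all(len(row) == len(header) for row in reader if row)
    if suffix == ".jsonl":
        with path.open(encoding="utf-8") as handle:
            rows = [json.loads(line) for line in handle if line.strip()]
        # Assumed meaning of "flat": each row is an object with no nested
        # dict or list values.
        return bool(rows) and all(
            isinstance(row, dict)
            and not any(isinstance(value, (dict, list)) for value in row.values())
            for row in rows
        )
    # Any other extension (including a raw corpus directory) is rejected.
    return False
```

For example, `is_flat_table(Path("data/processed/opp115_public_mapping.csv"))` should return `True` for a well-formed export, while a raw directory such as `data/raw/OPP-115` is rejected by the extension check.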
Run Phase 2 using OPP-115 as a reference corpus only (no --public-input):
PYTHONPATH=src python scripts/run_phase2_metrics.py
Run Phase 2 with a processed OPP-115 CSV export:
PYTHONPATH=src python scripts/run_phase2_metrics.py \
--public-input data/processed/opp115_public_mapping.csv
Run Phase 2 with a processed OPP-115 JSONL export:
PYTHONPATH=src python scripts/run_phase2_metrics.py \
--public-input data/processed/opp115_public_mapping.jsonl
Run Phase 2 with processed OPP-115 input and a custom output folder:
PYTHONPATH=src python scripts/run_phase2_metrics.py \
--public-input data/processed/opp115_public_mapping.csv \
--output-dir artifacts/phase-2
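The runbook's goal is that Phase 1 artifacts remain unchanged. One way to confirm this is to hash the contents of artifacts/phase-1/ before and after a run and compare the results. The helpers below (`snapshot`, `changed_files`) are hypothetical and not part of the project's scripts; this is a minimal sketch using only the standard library.

```python
import hashlib
from pathlib import Path


def snapshot(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to the SHA-256 of its bytes."""
    return {
        str(path.relative_to(folder)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List paths that were added, removed, or modified between two snapshots."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

In use, take `before = snapshot(Path("artifacts/phase-1"))`, run one of the Phase 2 commands above, take a second snapshot, and expect `changed_files(before, after)` to be an empty list.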
Written to artifacts/phase-2/:
- metric_specs.jsonl
- synthetic_events.jsonl
- public_data_mapped.jsonl
- baseline_scores.jsonl
- phase2_manifest.json
- synthetic_data_dictionary.md
Checks after a run:
- phase2_manifest.json.coverage_summary.mapped_controls == total_controls
- phase2_manifest.json.coverage_summary.missing_controls is empty.
- baseline_scores.jsonl values for compliance_score and risk_score stay within [0, 1].
- public_data_mapped.jsonl rows with missing required fields are flagged in dq_missing_required_fields.
- No files under artifacts/phase-1/ are modified.
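The coverage and score-range checks can be scripted. The sketch below rests on assumptions about the file layout that the runbook does not spell out: that `total_controls` sits beside `mapped_controls` inside `coverage_summary`, and that `compliance_score` and `risk_score` are top-level fields of each row in baseline_scores.jsonl. The function names are hypothetical.

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Load one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def check_manifest(manifest: dict) -> list[str]:
    """Return failure messages for the coverage checks (empty list means pass)."""
    summary = manifest.get("coverage_summary", {})
    failures = []
    # Assumption: total_controls is stored next to mapped_controls.
    mapped = summary.get("mapped_controls")
    total = summary.get("total_controls")
    if mapped is None or mapped != total:
        failures.append(f"mapped_controls={mapped!r} != total_controls={total!r}")
    if summary.get("missing_controls"):
        failures.append("missing_controls is not empty")
    return failures


def check_scores(rows: list[dict]) -> list[str]:
    """Flag compliance_score / risk_score values outside [0, 1]."""
    failures = []
    for index, row in enumerate(rows):
        for field in ("compliance_score", "risk_score"):
            if field not in row:
                continue  # Assumption: a row may carry only one of the scores.
            value = row[field]
            if not isinstance(value, (int, float)) or not 0 <= value <= 1:
                failures.append(f"row {index}: {field}={value!r} outside [0, 1]")
    return failures


def validate_outputs(output_dir: Path) -> list[str]:
    """Run both checks against a Phase 2 output folder."""
    manifest = json.loads(
        (output_dir / "phase2_manifest.json").read_text(encoding="utf-8")
    )
    scores = read_jsonl(output_dir / "baseline_scores.jsonl")
    return check_manifest(manifest) + check_scores(scores)
```

Calling `validate_outputs(Path("artifacts/phase-2"))` after a run returns an empty list when both checks pass, or a list of messages describing each failure.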