weppcloud integration plan for Culvert_web_app

Implementation plan based on docs/culvert-at-risk-integration/weppcloud-integration.spec.md

Guiding requirements (spec highlights)

Endpoint: POST /rq-engine/api/culverts-wepp-batch/ (multipart, FastAPI rq-engine to avoid 30s Caddy timeout; extend timeout there as needed).
Storage: /wc1/culverts/<culvert_batch_uuid>/ with per-culvert runs under /runs/<Point_ID>/ and _base/ seeded from culvert.cfg.
Payload ZIP: topo/breached_filled_DEM_UTM.tif + topo/streams.tif + culverts/culvert_points.geojson + culverts/watersheds.geojson + metadata.json + model-parameters.json; all inputs in the same projected CRS (meters).
GeoJSON validation: culvert_points must use Point geometries, watersheds must use Polygon/MultiPolygon geometries, and each GeoJSON includes a named CRS matching the rasters.
DEM handling: new Ron.symlink_dem() to symlink the canonical DEM into each run and populate ron.map.
Streams: provided by Culvert_web_app (no mcl/csa parameters needed).
Watersheds: GeoJSON polygons with Point_ID attribute (no raster, no culvert_id_map needed).
RQ job status: use /rq-engine/api/jobstatus/{job_id}; artifacts via browse/download under /weppcloud/culverts/<batch_uuid>/browse/ and /weppcloud/culverts/<batch_uuid>/download/{subpath} (requires browse_token from submit response). Batch archive path: /weppcloud/culverts/<batch_uuid>/download/weppcloud_run_skeletons.zip.
Outputs: per-culvert GeoJSON + parquet + WEPP interchange; batch-level batch_summary.json plus per-run run_metadata.json.
Limits: max ZIP 2GB, max 300 culverts; error responses are structured 400s.
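
Sketch (not part of the spec): one way a client could assemble a payload ZIP with the member layout listed above. The function name build_payload_zip and its arguments are hypothetical; only the member paths come from the requirements.

```python
import zipfile
from pathlib import Path

# Member paths required by the spec, relative to the ZIP root.
REQUIRED_MEMBERS = (
    "topo/breached_filled_DEM_UTM.tif",
    "topo/streams.tif",
    "culverts/culvert_points.geojson",
    "culverts/watersheds.geojson",
    "metadata.json",
    "model-parameters.json",
)


def build_payload_zip(source_dir: Path, zip_path: Path) -> list[str]:
    """Zip the required members from source_dir; raise if any is missing."""
    missing = [m for m in REQUIRED_MEMBERS if not (source_dir / m).is_file()]
    if missing:
        raise FileNotFoundError(f"missing payload members: {missing}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for member in REQUIRED_MEMBERS:
            # arcname keeps the spec's relative layout inside the archive.
            zf.write(source_dir / member, arcname=member)
    return list(REQUIRED_MEMBERS)
```

Failing fast on missing members client-side avoids uploading a large ZIP only to receive a structured 400.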

Phase 0 - Contract finalization and fixtures (COMPLETE)

Scope: finalize metadata.json + model-parameters.json schema, idempotency rules, retention policy; use the Santee_10m_no_hydroenforcement project (local at /wc1/culvert_app_instance_dir/user_data/) as the baseline payload for validation.
Dependencies: Culvert_web_app owners for schema fields and retention expectations; ops for cleanup window and storage constraints; security for initial auth choice.
Deliverables: JSON schema docs for metadata.json/model-parameters.json, minimal sample payloads (synthetic + real), updated spec notes on retry/idempotency/retention.
Risks: schema churn after implementation starts; large real payloads exceeding test budgets; mismatch between culvert outputs and payload contract.
Verification: validated the baseline payload from Santee_10m_no_hydroenforcement; created tests/culverts/test_payloads/santee_10m_no_hydroenforcement/payload.zip and tests/culverts/test_payloads/manifest.md. The second payload will be added once the user 1 copy completes.

`metadata.json` schema (v1)

schema_version (string, required; culvert-metadata-v1)
source (object, required: system string, project_id string, user_id string optional)
created_at (ISO 8601 string, required)
culvert_count (int, required)
crs (object, required: proj4 string, epsg int optional)
dem (object, required: path string, resolution_m number, width int, height int, nodata number)
streams (object, required: path string, nodata number, value_semantics = binary)
culvert_points (object, required: path string, point_id_field = Point_ID, feature_count int optional)
watersheds (object, required: path string, point_id_field = Point_ID, feature_count int optional)
flow_accum_threshold (int, optional; preserved for traceability when provided in metadata)
hydro_enforcement_select (string, optional; normalized from hydroEnforcementSelect in Culvert_web_app)

Notes:

culvert_batch_uuid is minted by wepp.cloud and returned in the API response (not required in metadata.json).
Payload hash/size are computed by wepp.cloud at upload time and are not required in metadata.json.
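
Sketch (illustrative only): a minimal structural check for the required top-level fields of metadata.json v1. This is not the actual culvert_payload_validator; the function name validate_metadata and the error strings are assumptions, and nested fields are not checked beyond point_id_field.

```python
SCHEMA_VERSION = "culvert-metadata-v1"

# Required top-level fields and their JSON types, per the v1 field list.
REQUIRED_FIELDS = {
    "schema_version": str,
    "source": dict,
    "created_at": str,
    "culvert_count": int,
    "crs": dict,
    "dem": dict,
    "streams": dict,
    "culvert_points": dict,
    "watersheds": dict,
}


def validate_metadata(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in meta:
            errors.append(f"missing required field: {name}")
        elif not isinstance(meta[name], expected):
            errors.append(f"{name} must be of type {expected.__name__}")
    if meta.get("schema_version") not in (None, SCHEMA_VERSION):
        errors.append(f"schema_version must be {SCHEMA_VERSION}")
    # Both GeoJSON layers are keyed by the Point_ID attribute.
    for layer in ("culvert_points", "watersheds"):
        block = meta.get(layer)
        if isinstance(block, dict) and block.get("point_id_field") != "Point_ID":
            errors.append(f"{layer}.point_id_field must be Point_ID")
    return errors
```

Collecting all problems instead of raising on the first one fits a structured 400 response that reports every issue at once.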

`model-parameters.json` schema (v1)

schema_version (string, required; culvert-model-params-v1)
base_project_runid (string, optional)
nlcd_db (string, optional; overrides landuse.nlcd_db)
order_reduction_passes (integer, optional; overrides culvert_runner.order_reduction_passes)
flow_accum_threshold (integer, optional; flow accumulation threshold from Culvert_web_app)

Notes:

mcl/csa parameters are NOT included—streams are pre-computed by Culvert_web_app and provided in topo/streams.tif.
Climate duration and soils DB use defaults from culvert.cfg (no override keys in v1).
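
Sketch (illustrative only): how the two config-overriding keys could be layered onto defaults read from culvert.cfg. The function name apply_overrides and the dict-of-dicts config shape are assumptions; only the key-to-option mapping comes from the schema above.

```python
# model-parameters key -> (config section, option), per the v1 schema.
OVERRIDE_TARGETS = {
    "nlcd_db": ("landuse", "nlcd_db"),
    "order_reduction_passes": ("culvert_runner", "order_reduction_passes"),
}


def apply_overrides(config: dict, params: dict) -> dict:
    """Return a copy of config with model-parameters overrides applied."""
    if params.get("schema_version") != "culvert-model-params-v1":
        raise ValueError("unsupported model-parameters schema_version")
    # Copy one level deep so the caller's defaults are left untouched.
    merged = {section: dict(values) for section, values in config.items()}
    for key, (section, option) in OVERRIDE_TARGETS.items():
        if params.get(key) is not None:
            merged.setdefault(section, {})[option] = params[key]
    return merged
```

Keys absent from model-parameters.json leave the culvert.cfg default in place, which matches the "optional" marking on every override.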

Phase 1 - API ingestion, validation, and job enqueue (rq-engine) (COMPLETE)

Scope: implement /rq-engine/api/culverts-wepp-batch/ in rq-engine (FastAPI); accept multipart upload, mint culvert_batch_uuid, validate payload inline (payload_validator), extract payload, enqueue RQ job, return {job_id, culvert_batch_uuid, status_url, browse_token, browse_token_expires_at}. Add /rq-engine/api/culverts-wepp-batch/{batch_uuid}/retry/{point_id} for flake-checking reruns. This keeps validation inside the ingestion path and avoids the 30s Caddy timeout applied to weppcloud routes. Long term, migrate /rq/api/* to /rq-engine/api/*.
Request parameters (optional): zip_sha256, total_bytes to capture payload metadata since the ZIP is created client-side.
Dependencies: Phase 0 schema decisions; RQ queue configuration; open endpoint for POC (auth deferred to Phase 6).
Deliverables: rq-engine route + request/response contract; payload_validator module + error types; RQ job function stub (run_culvert_batch_rq); batch_metadata.json written at batch root; logging to batch root.
Risks: upload timeouts for large ZIPs; unbounded disk usage; duplicate POSTs creating multiple batches without idempotency key; CRS parsing differences between raster/GeoJSON libs; Point_ID dtype mismatches; DEM/streams extent alignment.
Verification: tests for rq-engine ingestion/validation (happy + invalid payloads); manual curl with a real payload; verify status via /rq-engine/api/jobstatus/{job_id}.
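
Sketch (illustrative only): computing the optional zip_sha256 and total_bytes request parameters client-side. The function name payload_digest is hypothetical, and how the values are attached to the request (form fields vs. query parameters) is not specified here.

```python
import hashlib
from pathlib import Path


def payload_digest(zip_path: Path, chunk_size: int = 1 << 20) -> dict:
    """Stream the ZIP once and return its SHA-256 hex digest and byte count."""
    sha = hashlib.sha256()
    total = 0
    with open(zip_path, "rb") as fh:
        # Read in chunks so a ZIP near the 2GB limit is never fully in memory.
        while chunk := fh.read(chunk_size):
            sha.update(chunk)
            total += len(chunk)
    return {"zip_sha256": sha.hexdigest(), "total_bytes": total}
```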

Phase 2 - Culvert batch runner scaffolding (COMPLETE)

Scope: create CulvertsRunner (or extend BatchRunner) to manage batch state/logging; implement _base project copy using culvert.cfg and base_runid; create per-culvert run dirs; add Ron.symlink_dem() and symlink topo/flovec.tif + topo/netful.tif into each run; map Point_ID to run IDs; set run_group metadata; add symlink_channels_map (WBT-only) to use the shared netful raster instead of build_channels.
Dependencies: Phase 1 job entrypoint; culvert.cfg definition; agreement on runid template and run_group name.
Deliverables: runner class + nodb state; Ron.symlink_dem() implementation; symlink_channels_map that enforces WBT backend in raster mode and emits netful.geojson via polygonize_netful() + json_to_wgs(); culvert config template; stub updates if .pyi exists.
Risks: filesystem contention when creating many runs; symlink breakage on cleanup; runid collisions for non-unique Point_ID.
Verification: unit tests for Ron.symlink_dem() and run directory creation; integration test that _base + one run hydrate NoDb singletons without errors.

Phase 1/2 combined handoff summary

Implemented rq-engine package split (wepppy/microservices/rq_engine/) with APIRouter modules; /rq-engine/api/culverts-wepp-batch/ now accepts multipart payload.zip, validates inline, extracts to /wc1/culverts/<culvert_batch_uuid>/, writes batch_metadata.json, enqueues run_culvert_batch_rq, and returns {job_id, culvert_batch_uuid, status_url, browse_token, browse_token_expires_at}.
Validator lives in wepppy/microservices/culvert_payload_validator.py (required files, CRS alignment, DEM/streams alignment, Point_ID coverage) and returns structured 400s.
RQ entrypoint stub lives in wepppy/rq/culvert_rq.py (StatusMessenger wiring, TIMEOUT=43200).
CulvertsRunner NoDb added (wepppy/nodb/culverts_runner.py) to create per-culvert runs under /wc1/culverts/<culvert_batch_uuid>/runs/<Point_ID>/, set run_group culvert;;<batch_uuid>;;<runid>, and record completion metadata.
Ron.symlink_dem() + Watershed.symlink_channels_map() added to reuse shared DEM/topo rasters (WBT-only) and generate netful.geojson/netful.WGS.geojson.
Added culvert.cfg with [culvert_runner] base_runid and culvert run_group resolution in get_wd() (uses CULVERTS_ROOT); CulvertsRunner now copies the base run into each batch _base.
Tests added in tests/microservices/test_rq_engine_culverts.py, tests/microservices/test_rq_engine_jobinfo.py, and tests/culverts/test_culverts_runner.py; run with wctl run-pytest tests/microservices/test_rq_engine_culverts.py, wctl run-pytest tests/microservices/test_rq_engine_jobinfo.py, and wctl run-pytest tests/culverts/test_culverts_runner.py.
Verification (2025-01-05): wctl run-pytest tests/culverts/test_culverts_runner.py tests/microservices/test_rq_engine_culverts.py (pass).
Dependency update: add python-multipart==0.0.12 to docker/requirements-uv.txt and rebuild the weppcloud image so multipart parsing works in fresh containers.
Model-parameters overrides now applied during run setup (base_project_runid, nlcd_db).
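
Sketch (illustrative only): formatting and resolving the culvert run_group string. The helper names are hypothetical and the real resolution lives in get_wd(); only the culvert;;<batch_uuid>;;<runid> shape and the /wc1/culverts/<batch_uuid>/runs/<runid>/ layout come from the notes above.

```python
from pathlib import PurePosixPath

CULVERTS_ROOT = PurePosixPath("/wc1/culverts")  # storage root named in the spec


def format_run_group(batch_uuid: str, runid: str) -> str:
    """Build the run_group string for a culvert run."""
    return f"culvert;;{batch_uuid};;{runid}"


def resolve_run_dir(run_group: str) -> PurePosixPath:
    """Map a culvert run_group back to its run directory."""
    parts = run_group.split(";;")
    if len(parts) != 3 or parts[0] != "culvert" or not all(parts[1:]):
        raise ValueError(f"not a culvert run_group: {run_group!r}")
    _, batch_uuid, runid = parts
    return CULVERTS_ROOT / batch_uuid / "runs" / runid
```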

Phase 3 - Per-culvert WEPP orchestration (COMPLETE)

Scope: orchestrate delineation, landuse, soils, climate, and WEPP per culvert using existing run tasks; incorporate WhiteboxToolsTopazEmulator; use symlink_channels_map to avoid build_channels; apply model-parameters overrides; record per-run success/failure in run_metadata.json.
Dependencies: Phase 2 scaffolding; availability of WBT, PRISM, soils datasets in container; confirmed model-parameters schema.
Deliverables: RQ orchestration pipeline; per-run execution logs; per-culvert metadata record (timings, versions, config).
Risks: long runtimes for 300 culverts; missing datasets for runs; error isolation (one culvert failure should not cancel entire batch).
Verification: integration test with a tiny payload (1-2 culverts) using mocked heavy tasks; manual run with the santee_mini_4culverts payload in the container, confirming per-culvert outputs.

Phase 3 handoff summary

Orchestration updated in wepppy/rq/culvert_rq.py to create runs, load watershed features, and execute the per-culvert pipeline sequentially: find_outlet (watershed polygon), build_subcatchments, abstract_watershed, build_landuse, build_soils, build_climate, run_wepp_hillslopes, run_wepp_watershed.
Per-culvert failures are isolated (caught/logged) and do not stop later runs; completion/retention timestamps are set only after all runs finish.
run_metadata.json is written per run with runid, point_id, culvert_batch_uuid, config, status, started_at, completed_at, duration_seconds, optional wepppy_version, and error details when failed.
CulvertsRunner.load_watershed_features() added to reuse Point_ID validation and construct WatershedFeature objects; stubs updated.
Integration test added in tests/culverts/test_culvert_orchestration.py (monkeypatched heavy methods) to validate metadata creation, failure isolation, and completed_at set after processing.
Verification: wctl run-pytest tests/culverts/test_culvert_orchestration.py (pass; warnings only).
Manual run: santee_mini_4culverts payload completed end-to-end after fixes to WBT outlet access, shared topo generation (flovec/netful/chnjnt), and WBT symlink handling.
BatchRunner WEPP post-processing now explicitly ensures hillslope interchange outputs, totalwatsed3.parquet, watershed interchange outputs, and query-engine activation when missing (mirrors _build_hillslope_interchange_rq, _build_totalwatsed3_rq, _post_watershed_interchange_rq behavior).
Consolidated WEPP post-processing helpers into wepppy/nodb/wepp_nodb_post_utils.py (and .pyi) and refactored wepppy/nodb/batch_runner.py + wepppy/rq/culvert_rq.py to use the shared utilities.
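
Sketch (illustrative only): the failure-isolation pattern described above, where each run is wrapped so one failure is recorded in run_metadata.json and later runs still execute. The function name, the "success" status value, the error block shape, and the use of the run ID as point_id are assumptions; "failed" and the field names come from the handoff notes.

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Iterable


def run_batch_isolated(batch_root: Path, batch_uuid: str,
                       run_ids: Iterable[str],
                       pipeline: Callable[[str], None]) -> dict:
    """Run pipeline(run_id) for every run; never let one failure stop the rest."""
    summary = {"total": 0, "succeeded": 0, "failed": 0}
    for run_id in run_ids:
        run_dir = batch_root / "runs" / run_id
        run_dir.mkdir(parents=True, exist_ok=True)
        t0 = time.monotonic()
        meta = {
            "runid": run_id,
            "point_id": run_id,  # assumption: run ID equals Point_ID
            "culvert_batch_uuid": batch_uuid,
            "started_at": datetime.now(timezone.utc).isoformat(),
        }
        try:
            pipeline(run_id)
            meta["status"] = "success"  # assumed label for non-failed runs
        except Exception as exc:  # isolate: log the failure and continue
            meta["status"] = "failed"
            meta["error"] = {"type": type(exc).__name__, "message": str(exc)}
        meta["completed_at"] = datetime.now(timezone.utc).isoformat()
        meta["duration_seconds"] = round(time.monotonic() - t0, 3)
        (run_dir / "run_metadata.json").write_text(json.dumps(meta, indent=2))
        summary["total"] += 1
        summary["failed" if meta["status"] == "failed" else "succeeded"] += 1
    return summary
```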

Phase 3b - Parallelized culvert/batch execution (RQ batch queue)

Status: complete.
Scope: enqueue one RQ job per culvert run (mirrors batch_rq pattern) so culvert batches execute in parallel without blocking interactive workloads; use a dedicated rq-worker-batch service and queue for both batch + culvert jobs.
Dependencies: Phase 3 orchestration complete; RQ queue routing decisions; agreement on per-job runid format so logging attaches to rq.log.
Deliverables:
- New RQ queue (e.g., batch) for culvert + batch work; culvert orchestrator enqueues per-run jobs into this queue and uses a finalizer job with depends_on.
- New rq-worker-batch service in dev/prod compose (4 workers) listening to batch only; increase rq-worker (default queue) to 6 workers for interactive traffic.
- Concurrency clamps for batch workers:
  - Set WEPPPY_NCPU=6 in the rq-worker-batch service (caps NCPU-driven pools in climate/watershed/etc).
Risks: nested pools (ProcessPool + ThreadPool) inside a single culvert run can oversubscribe CPU if batch worker count is too high; misrouted jobs could starve interactive queues.
Verification: enqueue a small payload (1–2 culverts) and confirm parallel job fan-out + finalizer; validate worker isolation by running an interactive job on the default queue while batch queue executes.

Phase 3b handoff summary

Status: complete.
Routed culvert ingestion and batch fan-out to the batch RQ queue in wepppy/microservices/rq_engine/culvert_routes.py and wepppy/rq/batch_rq.py.
Split culvert orchestration in wepppy/rq/culvert_rq.py into run_culvert_batch_rq (orchestrator), run_culvert_run_rq (per-run worker, runid first arg for rq.log), and _final_culvert_batch_complete_rq (finalizer); per-run jobs are enqueued in Queue("batch") with depends_on for the finalizer.
Orchestrator records per-run job IDs in both job.meta and CulvertsRunner._runs[run_id]["job_id"]; finalizer reads per-run run_metadata.json to compute totals and writes batch_summary.json while updating CulvertsRunner._completed_at and _retention_days.
WEPPPY_NCPU caps added for flowpath pools, soil prep, hillslope runs, fixed climate pools, and watershed interchange task fan-out in wepppy/nodb/core/wepp.py, wepppy/nodb/core/climate.py, and wepppy/wepp/interchange/watershed_interchange.py (no behavior change when unset).
Stubs updated in wepppy/rq/culvert_rq.pyi and stubs/wepppy/rq/culvert_rq.pyi.
Tests updated in tests/culverts/test_culvert_orchestration.py to call run_culvert_run_rq + finalizer; verification: wctl run-pytest tests/culverts/test_culvert_orchestration.py tests/microservices/test_rq_engine_culverts.py (pass; warnings only).
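
Sketch (illustrative only): the kind of clamp WEPPPY_NCPU implies for pool sizing, with no behavior change when the variable is unset. The helper name clamp_ncpu is hypothetical; the actual call sites in wepp.py, climate.py, and watershed_interchange.py may apply the cap differently.

```python
import os
from typing import Mapping


def clamp_ncpu(requested: int, env: Mapping[str, str] = os.environ) -> int:
    """Cap a requested pool size at WEPPPY_NCPU when that variable is set."""
    raw = env.get("WEPPPY_NCPU")
    if not raw:
        return requested  # unset or empty: leave the pool size unchanged
    return max(1, min(requested, int(raw)))
```

With WEPPPY_NCPU=6 on rq-worker-batch, a pool that would otherwise request 16 workers is capped at 6, while smaller pools are unaffected.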

Phase 3c - Per-run creation (batch_rq parity + parallel run setup)

Status: complete.
Scope: move culvert run creation into per-run RQ jobs to match the batch_rq/BatchRunner.run_batch_project pattern, avoiding a serial create_runs() step in the orchestrator.
Dependencies: Phase 3b fan-out in place; _base creation is reliable; runid format and run_group metadata finalized.
Deliverables:
- CulvertsRunner.create_run_if_missing() (idempotent) that creates a single run directory, rewrites .nodb metadata, clears caches/locks, and symlinks DEM + topo rasters.
- run_culvert_batch_rq updated to only compute run_ids, ensure _base, and enqueue per-run jobs + finalizer (no full create_runs()).
- run_culvert_run_rq updated to call create_run_if_missing() before executing the per-run pipeline.
- Optional: helper to load a single watershed feature by Point_ID to avoid re-parsing the full GeoJSON on every run.
Risks: race conditions if multiple jobs attempt the same run_id; partial run directories if a job crashes mid-copy; repeated GeoJSON parsing overhead.
Verification: run a small payload (1–2 culverts) and confirm per-run creation happens inside worker jobs; verify _base remains unchanged, per-run logs are scoped correctly, and reruns are idempotent.

Phase 3c handoff summary

Status: complete.
Added CulvertsRunner.create_run_if_missing() to copy _base into runs/<run_id>, rewrite .nodb state (wd, _run_group, _group_name), clear Redis cache/locks, and symlink DEM + WBT topo rasters (flovec, netful, relief, chnjnt) in wepppy/nodb/culverts_runner.py.
run_culvert_batch_rq now loads run_ids, ensures _base, and enqueues per-run jobs without serial create_runs(); per-run jobs create runs on demand in wepppy/rq/culvert_rq.py.
run_culvert_run_rq now instantiates CulvertsRunner when missing and calls create_run_if_missing() before executing the pipeline, preserving run_group semantics (culvert;;<batch_uuid>;;<run_id>).
Orchestrator still records job IDs in CulvertsRunner._runs while create_run_if_missing() preserves existing entries to avoid clobbering job_id.
Stubs updated in wepppy/nodb/culverts_runner.pyi and stubs/wepppy/nodb/culverts_runner.pyi.
Tests updated to assert per-run job creates the run directory and is idempotent; run with wctl run-pytest tests/culverts/test_culvert_orchestration.py.
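
Sketch (illustrative only): one way to make per-run creation idempotent and resistant to the partial-directory risk noted in Phase 3c, by copying _base into a staging directory and renaming it into place. The staging-then-rename approach is an assumption, not a description of the actual CulvertsRunner.create_run_if_missing(), which also rewrites .nodb state, clears caches/locks, and symlinks rasters.

```python
import os
import shutil
import tempfile
from pathlib import Path


def create_run_if_missing(base_dir: Path, runs_dir: Path, run_id: str) -> bool:
    """Copy base_dir to runs_dir/run_id once; return True if this call created it."""
    target = runs_dir / run_id
    if target.exists():
        return False  # already created by an earlier job or a rerun
    runs_dir.mkdir(parents=True, exist_ok=True)
    # Stage next to the target so the final rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=f".{run_id}.", dir=runs_dir))
    try:
        shutil.copytree(base_dir, staging, dirs_exist_ok=True, symlinks=True)
        os.rename(staging, target)  # a crash before this leaves no target dir
    except OSError:
        shutil.rmtree(staging, ignore_errors=True)
        if not target.exists():
            raise
        return False  # another job won the race; keep its directory
    return True
```

A crash mid-copy leaves only a dot-prefixed staging directory, so a rerun sees no runs/<run_id> and starts cleanly.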

Phase 3e - Stream pruning + order reduction (culvert batches)

Status: complete.
Scope: add order_reduction_passes to CulvertsRunner and post-process topo/netful.tif once per batch before per-run jobs are enqueued: prune short streams (WBT remove_short_streams, min_length = 2 * cellsize_m), compute a Strahler order raster from the pruned stream mask, run PruneStrahlerStreamOrder N times (binary output on the final pass), then generate chnjnt.tif from the final netful.tif.
Sanity check: applying the prune once at the batch root keeps the per-run symlink flow intact and avoids redundant work in each culvert job; overwriting netful.tif is acceptable because the payload is per-batch and isolated.
Dependencies: whitebox_tools from the weppcloud fork (/workdir/weppcloud-wbt) must expose StrahlerStreamOrder and PruneStrahlerStreamOrder (with --binary_output); payload provides topo/streams.tif.
Deliverables:
- CulvertsRunner.order_reduction_passes NoDb property, read from [culvert_runner] order_reduction_passes (default culvert.cfg value; allow 0 to disable).
- run_culvert_batch_rq prunes short streams (remove_short_streams), builds a Strahler order raster from the pruned stream mask, then prunes stream order (PruneStrahlerStreamOrder) with binary output on the final pass, then generates topo/chnjnt.tif from the final topo/netful.tif before enqueuing culvert runs.
- Logs emitted showing pass count and inputs/outputs; failure raises early before jobs are enqueued.
Risks: pruning changes the number of channels/hillslopes; ensure the weppcloud-wbt fork with --binary_output is deployed so downstream binary stream masks stay compatible with polygonize_netful.
Verification: run santee_mini_4culverts with order_reduction_passes=1 and compare hillslope counts/logs before/after; confirm pruned netful.tif is used in dem/wbt/netful.tif symlinks, chnjnt.tif is regenerated from the pruned netful raster, and the batch completes.

Phase 3f - Stream coverage validation + per-run fallback

Status: complete.
Scope: ensure each run has stream pixels inside the watershed boundary before outlet detection. If the pruned topo/netful.tif has zero stream pixels within the watershed mask, fall back to the full topo/streams.tif for that run only and use the batch-generated topo/chnjnt.streams.tif so junctions align with the fallback stream map. Allow Watershed.find_outlet() to run with a pre-built target_watershed.tif (no feature argument) so culvert orchestration can reuse cached masks.
Deliverables:
- CulvertsRunner stream-source selection helper that rasterizes the watershed polygon into dem/target_watershed.tif, checks stream coverage, and selects either netful.tif + chnjnt.tif or streams.tif + chnjnt.streams.tif.
- run_culvert_batch_rq generates topo/chnjnt.streams.tif once per batch (alongside topo/chnjnt.tif) for per-run symlinking.
- Watershed.find_outlet() accepts an optional WatershedFeature and uses the cached target_watershed.tif when present.
- Stubs/tests updated to cover the new optional signature and payload fixtures include topo/streams.tif.
Risks: fallback stream maps can increase channel density for specific culverts; ensure junction maps always match the chosen stream raster.
Verification: run a batch where a culvert polygon does not intersect pruned netful.tif and confirm the run uses streams.tif + chnjnt.streams.tif while other runs keep netful.tif.
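
Sketch (illustrative only): the per-run stream-source decision on in-memory grids. The real helper rasterizes the watershed polygon to dem/target_watershed.tif and reads rasters from disk; here the pruned stream grid and watershed mask are plain nested lists on the same grid, and the function name select_stream_source is hypothetical.

```python
from typing import Sequence, Tuple

Grid = Sequence[Sequence[int]]


def select_stream_source(pruned_streams: Grid, watershed_mask: Grid) -> Tuple[str, str]:
    """Return (stream raster, junction raster) names for one run.

    Nonzero cells mean "stream" in pruned_streams and "inside" in watershed_mask.
    """
    covered = any(
        stream and inside
        for stream_row, mask_row in zip(pruned_streams, watershed_mask)
        for stream, inside in zip(stream_row, mask_row)
    )
    if covered:
        return ("netful.tif", "chnjnt.tif")
    # No pruned stream pixels inside the watershed: fall back for this run only.
    return ("streams.tif", "chnjnt.streams.tif")
```

Returning the junction raster together with the stream raster keeps the two paired, which is the invariant called out under Risks.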

Phase 3g - LPT queue ordering (area proxy)

Status: complete.
Scope: compute per-feature area inside WatershedFeature (geodesic for lat/long, planar for projected CRS) and enqueue runs in descending area (longest-processing-time-first, LPT) to reduce stragglers for large batches/culvert collections.
Deliverables:
- WatershedFeature.area_m2 cached property with geodesic fallback for geographic CRS.
- run_culvert_batch_rq orders run_ids by descending area_m2.
- run_batch_rq orders watershed jobs by descending area_m2 (LPT).
Risks: area is a proxy for hillslopes/runtime; MultiPolygon holes/invalid geometries may skew ranking.
Verification: confirmed enqueue order is largest → smallest in RQ metadata/logs for mixed-size payloads.
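
Sketch (illustrative only): planar area via the shoelace formula and descending-area ordering for projected coordinates. The function names are hypothetical and the geodesic branch of WatershedFeature.area_m2 is not shown; polygons are given as a list of rings (exterior first, then holes).

```python
from typing import Dict, List, Sequence, Tuple

Ring = Sequence[Tuple[float, float]]


def ring_area_m2(ring: Ring) -> float:
    """Unsigned planar area of one ring (shoelace formula)."""
    pts = list(ring)
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping back to the first.
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_area_m2(rings: Sequence[Ring]) -> float:
    """Exterior ring area minus the area of any holes."""
    exterior, *holes = rings
    return ring_area_m2(exterior) - sum(ring_area_m2(h) for h in holes)


def order_by_area(polygons: Dict[str, Sequence[Ring]]) -> List[str]:
    """Run IDs sorted largest area first, so big runs start earliest."""
    return sorted(polygons, key=lambda rid: polygon_area_m2(polygons[rid]), reverse=True)
```

Starting the largest watersheds first means the longest jobs overlap with the short ones instead of running alone at the tail of the batch.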

Phase 4 - Artifact delivery and browse integration (COMPLETE)

Scope: expose browse access for culvert + batch roots; skeletonize per-run folders after run_metadata.json is written; package the skeletonized runs into a batch artifact (weppcloud_run_skeletons.zip). Do not copy artifacts into a culvert/ subfolder in this phase.
Dependencies: Phase 3 outputs; browse service routing for /weppcloud/culverts/ and /weppcloud/batch/; agreement on skeletonization allowlist + denylist; shared skeletonize_run helper in wepppy/nodb.
Required per-run artifact (MVP, stored in run root):
- run_metadata.json (written during Phase 3; not copied or renamed).
Batch-level artifact (MVP):
- weppcloud_run_skeletons.zip at /culverts/<uuid>/weppcloud_run_skeletons.zip containing the skeletonized runs/ tree.
- runs_manifest.md at /culverts/<uuid>/runs_manifest.md listing run IDs, watershed labels (when present), job metadata, and available counts. Job fields (job_status, job_created) are best-effort from the worker; they may show - if a worker fails before updating NoDb, and no finalizer backfill runs yet.
- culverts_runner.nodb at /culverts/<uuid>/culverts_runner.nodb with batch summary and per-run metadata (includes job_status/job_created when available).
Browse paths:
- /weppcloud/culverts/<uuid>/browse/ for batch root browsing (includes runs/<id>/...).
- /weppcloud/batch/<batch_name>/browse/ for BatchRunner outputs.
Skeletonization (per run, run after run_metadata.json is written and before zipping):
- Keep list (allowlist; remove everything else):
  - *.log
  - climate.nodb
  - disturbed.nodb
  - landuse.nodb
  - nodb.version
  - soils.nodb
  - redisprep.dump
  - ron.nodb
  - run_metadata.json
  - unitizer.nodb
  - watershed.nodb
  - wepp.nodb
  - climate/
  - dem/wbt/*.geojson
  - disturbed/disturbed_land_soil_lookup.csv
  - landuse/landuse.parquet
  - soils/soils.parquet
  - watershed/channels.parquet
  - watershed/hillslopes.parquet
  - watershed/network.txt
  - watershed/structure.json
  - wepp/output/interchange/
- Exclude list (denylist; override allowlist):
  - wepp/output/interchange/H.pass.parquet
- Note: _logs/ directories are not retained; profile recorder logs are intentionally dropped.
Failure representation:
- Always write run_metadata.json; for failed runs it must include status=failed and the error block already written in Phase 3.
- Do not create placeholder files in Phase 4.
Deferred (post-MVP):
- Artifact manifests that enumerate required outputs and mark missing files for failed runs.
- Explicit culvert/ packaging or copied outputs (if clients need a curated directory later).
Deliverables: browse route support for /weppcloud/culverts/ + /weppcloud/batch/; reusable skeletonize_run helper in wepppy/nodb + hook in _process_culvert_run (or equivalent); weppcloud_run_skeletons.zip batch artifact (MVP).
Risks: browse service path mapping gaps; missing outputs for failed culverts; large artifact sizes; skeletonization removing debug inputs needed for re-runs.
Verification: browse integration test that lists the batch root and runs/ tree; confirm run_metadata.json is present in skeletonized runs; confirm weppcloud_run_skeletons.zip contains only the allowlist minus denylist; verify runs_manifest.md has one row per run.

Phase 4 handoff summary

Skeletonization: added reusable skeletonize_run (git clean allowlist/denylist) and invoked it after run_metadata.json is written; _logs/ are intentionally dropped; wepp/output/interchange/H.pass.parquet is explicitly excluded.
Batch artifacts: finalizer writes batch_summary.json, runs_manifest.md (Source + Batch Summary + runs table), and weppcloud_run_skeletons.zip (includes skeletonized runs/, runs_manifest.md, and culverts_runner.nodb).
NoDb state: culverts_runner.nodb now carries per-run job metadata (job_status, job_created) plus a persisted batch summary.
Browse/DTale/download/gdalinfo: added /weppcloud/culverts/... and /weppcloud/batch/... routes in browse, download, gdalinfo, and D-Tale services; Caddy routes updated to proxy these paths; path traversal checks hardened while allowing symlinked assets.
Tests: added browse route tests and extended culvert orchestration test to validate manifest + NoDb summary; submit-response browse_token download coverage is in tests/microservices/test_rq_engine_culverts.py::test_culvert_submit_browse_token_downloads_batch_skeleton_zip.
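
Sketch (illustrative only): the allowlist-minus-denylist decision for a single run-relative path. The real skeletonize_run uses a git clean based mechanism whose matching rules may differ; the function names and the simplified matching below (slash-free patterns match run-root files only, trailing-slash patterns match a whole subtree) are assumptions.

```python
from fnmatch import fnmatchcase

ALLOWLIST = (
    "*.log", "climate.nodb", "disturbed.nodb", "landuse.nodb", "nodb.version",
    "soils.nodb", "redisprep.dump", "ron.nodb", "run_metadata.json",
    "unitizer.nodb", "watershed.nodb", "wepp.nodb", "climate/",
    "dem/wbt/*.geojson", "disturbed/disturbed_land_soil_lookup.csv",
    "landuse/landuse.parquet", "soils/soils.parquet",
    "watershed/channels.parquet", "watershed/hillslopes.parquet",
    "watershed/network.txt", "watershed/structure.json",
    "wepp/output/interchange/",
)
DENYLIST = ("wepp/output/interchange/H.pass.parquet",)


def _matches(rel_path: str, pattern: str) -> bool:
    if pattern.endswith("/"):
        return rel_path.startswith(pattern)  # keep the whole subtree
    if "/" not in pattern:
        # Simplification: slash-free patterns apply to run-root files only.
        return "/" not in rel_path and fnmatchcase(rel_path, pattern)
    return fnmatchcase(rel_path, pattern)


def should_keep(rel_path: str) -> bool:
    """True if a run-relative POSIX path survives skeletonization."""
    if any(_matches(rel_path, p) for p in DENYLIST):
        return False  # the denylist overrides the allowlist
    return any(_matches(rel_path, p) for p in ALLOWLIST)
```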

Phase 4b - Batch landuse/soils downscale (COMPLETE)

Scope: for culvert batches, fetch NLCD + SSURGO once at 30m for the payload DEM extent, then downscale locally to the DEM grid (matches subwta); run this in run_culvert_batch_rq before enqueuing child jobs, store canonical rasters at the batch root, and symlink into runs.
Shared batch outputs (kept outside runs/):
- landuse/nlcd_30m.tif, landuse/nlcd.tif
- soils/ssurgo_30m.tif, soils/ssurgo.tif
Run behavior:
- Require landuse/nlcd.tif + soils/ssurgo.tif to exist at the batch root, then symlink them before Landuse.build()/Soils.build().
- Call Landuse.build(retrieve_nlcd=False) and Soils.build(retrieve_gridded_ssurgo=False) to skip cleanup and remote retrieval.
Assumptions: all runs in a culvert batch share the same DEM grid/extent; subwta grid matches the DEM grid.
Notes: skeletonization removes run-local symlinks; canonical rasters live at the batch root.

Phase 4b handoff summary

Batch rasters: landuse/nlcd_30m.tif, landuse/nlcd.tif, soils/ssurgo_30m.tif, soils/ssurgo.tif generated once per batch from the payload DEM extent before jobs are queued and shared by runs.
Per-run wiring: landuse/soils directories are cleaned, then symlinked to the batch rasters before Landuse.build(retrieve_nlcd=False) and Soils.build(retrieve_gridded_ssurgo=False).
Overrides: model_parameters.nlcd_db is respected to select the 30m sources; defaults fall back to the base project settings.
Skeletonization: run-level symlinks are removed by skeletonization; batch rasters remain at the batch root.

Phase 4c - Cropped VRT symlinks for large DEMs (COMPLETE)

Scope: generate windowed VRTs (single SimpleSource + srcWin) for DEM + shared topo rasters using watershed feature bounds + pixel padding to reduce WBT/Peridot memory footprint.
Touch points:
- wepppy/all_your_base/geo/vrt.py centralizes VRT creation (build_windowed_vrt, build_windowed_vrt_from_window, CRS-aware bbox handling).
- Ron.symlink_dem accepts as_cropped_vrt (default false) + crop_window, persists _dem_is_vrt + crop window metadata, and writes dem.vrt when cropping.
- Watershed.symlink_channels_map uses the Ron crop window to build flovec/netful/relief/chnjnt VRTs and persists _flovec_netful_relief_chnjnt_are_vrt.
- Landuse.symlink_landuse_map and Soils.symlink_soils_map accept as_cropped_vrt (default false), persist _landuse_is_vrt/_soils_is_vrt, and create nlcd.vrt/ssurgo.vrt when cropping.
- NoDbBase no longer exposes lc_dir/soils_dir/lc_fn/ssurgo_fn; callers must use Landuse/Soils instances directly.
- WatershedFeature.get_padded_bbox now requires an explicit output_crs to avoid ambiguous coordinate systems.
- CulvertsRunner sources crop padding from culvert.cfg (crop_pad_px).
Runtime behavior:
- Ron.dem_fn composes dem.vrt vs dem.tif from _dem_is_vrt; VRT creation requires a crop window; call sites now resolve DEMs via Ron.
- WhiteboxToolsTopazEmulator composes relief/flovec/netful/chnjnt paths from _flovec_netful_relief_chnjnt_are_vrt; WBT builds reset this flag to .tif.
- Landuse.lc_fn and Soils.ssurgo_fn resolve .vrt vs .tif based on _landuse_is_vrt/_soils_is_vrt; retrieval workflows reset these flags to .tif.
- elevationquery accepts dem.vrt; _compute_ruggedness_from_dem uses ron.dem_fn.
Assumptions: culvert payloads are UTM; run raster grids are aligned so a shared crop window is valid across DEM/flovec/netful/relief/chnjnt.

Phase 4c handoff summary

VRT helper: wepppy/all_your_base/geo/vrt.py provides CRS-aware window computation and VRT creation (single SimpleSource + srcWin).
DEM selection: Ron now persists _dem_is_vrt and owns dem_fn, eliminating file-system guessing in downstream modules.
Channel raster selection: Watershed persists _flovec_netful_relief_chnjnt_are_vrt and keeps the WBT emulator in sync; VRTs use .vrt extensions for WBT compatibility.
Landuse/soils selection: Landuse/Soils now persist _landuse_is_vrt/_soils_is_vrt and own lc_fn/ssurgo_fn; call sites use the instances directly (no NoDbBase helpers).
Culvert config: culvert.cfg carries crop_pad_px for DEM/topo crop padding; culvert runs call the new VRT-enabled symlink methods.
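
Sketch (illustrative only): computing a padded srcWin (xoff, yoff, xsize, ysize) from a watershed bounding box for a north-up raster with square pixels. The function name padded_src_win and its parameters are hypothetical; the helpers in wepppy/all_your_base/geo/vrt.py also handle CRS conversion, which is omitted here.

```python
import math
from typing import Tuple


def padded_src_win(bbox: Tuple[float, float, float, float],
                   origin_x: float, origin_y: float, cellsize: float,
                   width: int, height: int, pad_px: int) -> Tuple[int, int, int, int]:
    """Pixel window covering bbox plus pad_px, clamped to the raster extent.

    bbox is (minx, miny, maxx, maxy) in the raster CRS; origin_x/origin_y is
    the upper-left corner of the raster; rows increase southward.
    """
    minx, miny, maxx, maxy = bbox
    col0 = math.floor((minx - origin_x) / cellsize) - pad_px
    col1 = math.ceil((maxx - origin_x) / cellsize) + pad_px
    row0 = math.floor((origin_y - maxy) / cellsize) - pad_px
    row1 = math.ceil((origin_y - miny) / cellsize) + pad_px
    # Clamp so the window never reads outside the source raster.
    col0, row0 = max(col0, 0), max(row0, 0)
    col1, row1 = min(col1, width), min(row1, height)
    if col1 <= col0 or row1 <= row0:
        raise ValueError("bbox does not intersect the raster")
    return col0, row0, col1 - col0, row1 - row0
```

Because the DEM and the shared topo rasters are assumed to sit on the same grid, one window computed this way can be reused for every VRT in the run.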

Phase 4e - wbt_abstract_watershed memory optimization (COMPLETE)

Scope: reduce memory footprint of wbt_abstract_watershed (Peridot) to prevent memory-watchdog kills during parallel culvert batch processing; integrate --skip-flowpaths flag to eliminate unnecessary flowpath output generation.
Problem: initial Hubbard Brook batch (7 watersheds) with 4 concurrent workers peaked at 101GB used / 24GB available, triggering the memory-watchdog kill threshold (25GB available). Per-process memory was 15-21GB each.
Dependencies: optimized Peridot binary with reduced raster footprint and --skip-flowpaths CLI flag.

Peridot optimizations (Rust binary)

Raster footprint reduced: f32 for relief/fvslop/taspec (was f64).
flovec uses u8 with in-place remap (was i8 with separate allocation).
Precomputed indices_map avoids redundant cell iteration.
--skip-flowpaths flag skips flowpaths.csv and slope_files/flowpaths/ output (not needed for culvert runs).

Code changes

Binary: copied optimized wbt_abstract_watershed to wepppy/topo/peridot/bin/.
wepppy/topo/peridot/peridot_runner.py: added skip_flowpaths parameter to run_peridot_wbt_abstract_watershed(); updated post_abstract_watershed() to handle missing flowpaths.csv.
wepppy/nodb/core/watershed.py: added skip_flowpaths property with getter/setter; passed to runner in abstract_watershed().
wepppy/rq/culvert_rq.py: set watershed.skip_flowpaths = True before abstract_watershed() for batch processing.

Validation metrics (Hubbard Brook batch, 7 watersheds)

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Peak memory used | 101GB | 51GB | -50% |
| Memory available | 24GB | 73-82GB | +200% |
| Per-process (large watershed) | 15-21GB | 7-10GB | -50% |
| Per-process (small watershed) | 8-12GB | 3-5GB | -60% |
| Watchdog kills | 1 | 0 | eliminated |

Per-run memory profile (from `_peridot.log`)

| Run | Raster Cells | Hillslopes | Memory (optimized) |
| --- | --- | --- | --- |
| 200 | 3.98M | 87 | ~10GB |
| 174 | 3.31M | 78 | ~7GB |
| 59 | 4.18M | 78 | ~10GB |
| 184 | 3.55M | 78 | ~7GB |
| 210 | 1.51M | 43 | ~3GB |
| 1 | 1.64M | 28 | ~3GB |

Memory-heavy phases (from log analysis)

Raster loading: ~50-70MB combined for subwta/relief/flovec/fvslop/taspec per run.
Indices map construction: ~5-18MB depending on hillslope count.
Hillslope abstraction: dominant memory consumer; scales with cell count × hillslope count; produces flowpath indices arrays (~15-17MB for large watersheds).

Notes

Memory pressure depends on concurrent process overlap, not individual watershed size; 4 large watersheds overlapping caused peak usage.
--skip-flowpaths eliminates flowpath CSV/slope file I/O overhead but primary savings come from reduced raster footprint.
Memory watchdog thresholds: warn at 30GB available, kill at 25GB available (/home/workdir/wepppy/scripts/memory-watchdog.sh).

Phase 4f - Per-run stream junction generation + job staggering (COMPLETE)

Scope: optimize per-run stream junction (chnjnt.tif) generation and reduce VRT file contention during parallel batch processing.
Problem: culvert runs use VRT files that reference shared source TIFs; when multiple workers start simultaneously, file contention can cause transient failures.

Stream junction generation

Original approach: clip netful.vrt → netful.masked.tif using target_watershed.tif (not bound.tif), then run stream_junction_identifier(flovec, netful.masked.tif) → chnjnt.tif.
Attempted optimization: clip pre-computed chnjnt.vrt directly to watershed mask → chnjnt.tif (skip stream_junction_identifier).
Result: optimization failed—stream junctions must be recalculated for the masked stream network because junction topology changes at watershed boundaries.
Final implementation: kept original approach with _generate_masked_stream_junctions() in wepppy/rq/culvert_rq.py (mask source is target_watershed.tif, not bound.tif):
1. Clip netful.vrt to target_watershed.tif mask → netful.masked.tif
2. Run wbt.stream_junction_identifier(d8_pntr=flovec.vrt, streams=netful.masked.tif) → chnjnt.tif
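
Toy illustration of why clipping a pre-computed junction raster is not equivalent to recomputing junctions on the masked network. This is a sketch only: the network is a plain dict rather than a raster, and a junction is simplified to "a stream cell that receives flow from two or more stream cells", which is an assumption and not the WBT definition.

```python
from collections import Counter
from typing import Dict, Optional, Set


def find_junctions(downstream: Dict[str, Optional[str]]) -> Set[str]:
    """Return cells that receive flow from two or more stream cells."""
    inflow = Counter(dst for dst in downstream.values() if dst is not None)
    return {cell for cell, count in inflow.items() if count >= 2}


def mask_network(downstream: Dict[str, Optional[str]], mask: Set[str]) -> Dict[str, Optional[str]]:
    """Keep only stream cells inside the mask; links that leave the mask end."""
    return {
        cell: (dst if dst in mask else None)
        for cell, dst in downstream.items()
        if cell in mask
    }


# Tributaries A and B meet at J, which drains to the outlet O.
network = {"A": "J", "B": "J", "J": "O", "O": None}
watershed = {"A", "J", "O"}  # tributary B lies outside the watershed mask

precomputed = find_junctions(network)                          # junction at J
clipped = precomputed & watershed                              # J survives the clip (stale)
recomputed = find_junctions(mask_network(network, watershed))  # no junction remains
```

Clipping keeps J as a junction even though only one tributary remains inside the mask, while recomputing on the masked network drops it.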

Job staggering for VRT contention

Problem: transient "bound file was not created" failures when multiple workers read the same source TIF through VRT references simultaneously.
Solution: added 1-second delay between job submissions in run_culvert_batch_rq() to stagger worker starts.
Code change: time.sleep(1) after each q.enqueue_call() in wepppy/rq/culvert_rq.py.
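
Minimal sketch of the staggering pattern, assuming a generic enqueue callable rather than the real RQ queue. The helper name enqueue_staggered and the injectable sleep parameter are hypothetical; the actual change is the time.sleep(1) call described above.

```python
import time
from typing import Callable, Iterable, List


def enqueue_staggered(
    enqueue: Callable[[str], str],
    run_ids: Iterable[str],
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Enqueue one job per run, pausing after each submission."""
    job_ids = []
    for run_id in run_ids:
        job_ids.append(enqueue(run_id))
        sleep(delay_s)  # stagger worker starts to reduce shared-file contention
    return job_ids


# Demo with a fake enqueue and a recording sleep so nothing actually waits.
delays: List[float] = []
jobs = enqueue_staggered(lambda rid: f"job-{rid}", ["200", "174", "59"], sleep=delays.append)
```

Injecting sleep keeps the helper testable without waiting; with the default, three runs add about three seconds to submission.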

Notes

chnjnt.vrt is created per-run as a cropped view of the batch's topo/chnjnt.tif, but it cannot be used directly because masking to the watershed boundary changes which stream segments exist, which changes junction locations.
The 1s delay adds ~N seconds to batch submission time (where N = number of runs) but prevents file contention failures.
Use target_watershed.tif for masking (not bound.tif), consistent with _generate_masked_stream_junctions() above: clipping channels to bound.tif can misidentify headwater pixels and cause a mismatch with netw0.tif.

Phase 4g - Representative flowpath optimization (COMPLETE)

Status: complete.
Scope: replace the O(pixels) per-hillslope flowpath walking in wbt_abstract_watershed with a single representative flowpath per hillslope, dramatically reducing abstraction time for 1.0m DEM culvert batches.
Problem: wbt_abstract_watershed was the bottleneck for high-resolution (1.0m) culvert batches. Run 184 (78 hillslopes, 3.5M cells) took 4409 seconds (~73 minutes) for abstraction alone, walking 2.4M flowpath indices.

Peridot optimizations (Rust binary)

New --representative-flowpath flag for wbt_abstract_watershed (WBT-only).
Forces --skip-flowpaths when enabled (no flowpaths.csv or slope_files/flowpaths output).
Loads dem/wbt/discha.tif (distance-to-channel raster) to select seed cells.
Seed selection: picks a median-distance source cell (no upstream neighbor per D8) with deterministic tie-breaks (higher relief, then row-major).
Walks one downstream flowpath to the channel with a fallback candidate sweep.
Builds hillslope summary from that single path while preserving existing length/width logic (source hillslopes use path length; left/right use channel length for width and area/width for length).
Deprecated get_edge_flowpaths in favor of faster get_edge_flowpaths2 (O(N) per hillslope mask vs. O(F^2 * L)).
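
One possible reading of the seed selection rule, sketched in Python. The Rust implementation is not shown in this plan, so the data layout, the use of the lower median, and the exact tie-break ordering are assumptions.

```python
from typing import NamedTuple, Sequence


class SourceCell(NamedTuple):
    row: int
    col: int
    distance: float  # distance to channel, as read from the discha raster
    relief: float


def pick_seed(cells: Sequence[SourceCell]) -> SourceCell:
    """Pick the source cell closest to the median distance-to-channel.

    Ties break on higher relief, then row-major order (row, then col).
    """
    if not cells:
        raise ValueError("no source cells to choose from")
    distances = sorted(c.distance for c in cells)
    median = distances[(len(distances) - 1) // 2]  # lower median: always an observed value
    return min(cells, key=lambda c: (abs(c.distance - median), -c.relief, c.row, c.col))


cells = [
    SourceCell(5, 2, 10.0, 3.0),
    SourceCell(1, 7, 40.0, 8.0),
    SourceCell(3, 4, 40.0, 12.0),
    SourceCell(0, 0, 90.0, 20.0),
]
seed = pick_seed(cells)  # two cells sit at the median distance; higher relief wins
```

Because every tie-break is a total order on cell attributes, the same inputs always produce the same seed, which is the point of the deterministic rule.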

Code changes

Binary: updated wbt_abstract_watershed in wepppy/topo/peridot/bin/.
wepppy/topo/peridot/peridot_runner.py: added representative_flowpath parameter to run_peridot_wbt_abstract_watershed().
wepppy/nodb/core/watershed.py: added representative_flowpath property with getter/setter; passed to runner in abstract_watershed().
wepppy/rq/culvert_rq.py: set watershed.representative_flowpath = True before abstract_watershed() for batch processing.

Performance comparison (Run 184: 78 hillslopes, 3.5M cells)

Metric	Before (full flowpath)	After (representative)	Improvement
Abstraction time	4409.51s (~73 min)	0.17s	25,938x faster
Indices walked	2,374,966	3,740	635x fewer
Points output	27,405	3,740	7.3x fewer
Memory (flowpaths)	~19.16 MiB	~0.17 MiB	113x less
Total run time	4711.5s (~78 min)	292.5s (~4.9 min)	16x faster

Batch-level performance (Hubbard Brook, 40 culverts)

Metric	Before	After	Improvement
Status	Failed (all runs)	Success (all runs)	Fixed
Avg run duration	N/A (failed at ~2.6h)	110.8s	Viable
Min run	N/A	43.5s	-
Max run	N/A	322.4s	-

Hillslope geometry comparison (Run 184: 78 hillslopes)

Metric	Old (full flowpath)	New (representative)	Difference
Mean length	121.8m	88.0m	-33.8m (-27.8%)
Min length	1.8m	1.0m	-
Max length	626.4m	909.0m	-
Correlation	-	-	0.55

Note: Representative flowpath mode produces different hillslope geometries because it walks a single flowpath per hillslope rather than aggregating all flowpaths. Large differences occur for hillslopes where the representative path is much shorter/longer than the weighted average of all paths.

WEPP output comparison (Run 184)

Metric	Old	New	Diff (%)
Contributing area (ha)	237.94	237.89	-0.0%
Avg. Ann. Precipitation (m³)	3,650,763	3,649,976	-0.0%
Avg. Ann. Water discharge (m³)	2,164,516	2,167,289	+0.1%
Avg. Ann. Hillslope soil loss (t)	2.70	1.40	-48.1%
Avg. Ann. Channel soil loss (t)	67.20	67.10	-0.1%
Avg. Ann. Sediment discharge (t)	19.00	17.80	-6.3%
Sediment Delivery Ratio	0.271	0.261	-3.7%

Key observation: Hillslope soil loss is 48% lower with representative flowpath mode. This is expected because representative flowpaths tend to be shorter on average, and soil loss is sensitive to hillslope length. Channel processes remain nearly identical since channel geometry is unchanged. The watershed-level sediment discharge difference (6.3%) is much smaller than the hillslope-level difference due to the dominance of channel erosion in this watershed.

Known limitations

discha.tif fallback: many hillslopes show "no discha candidates" warnings, falling back to first pixel index instead of median distance selection. This affects seed quality but doesn't break the abstraction. Future work: investigate discha raster generation to ensure valid values within hillslope boundaries.
This mode is WBT-only and intentionally diverges from TOPAZ behavior.
Hillslope lengths can differ significantly from full-flowpath mode; this affects soil loss predictions but provides acceptable approximations for culvert risk screening.
WBT junction detection limitation: small or simple watersheds with minimal stream networks may fail during subcatchment delineation with "Current cell is not recognized as a junction" (WhiteboxAppError). This occurs when the outlet cell doesn't land on a recognized stream junction in WBT's stream_link_identifier topology. This is a preexisting WBT limitation, not specific to representative flowpath mode.

Verification

Manual batch: Hubbard Brook payload (40 culverts, 1.0m DEM) completed successfully with 100% success rate.
Peridot logs confirm representative flowpath mode active and abstraction completing in sub-second times.

Phase 4h - Minimal stream seeding for edge-case watersheds (COMPLETE)

Status: complete.
Scope: handle watersheds where find_outlet candidates fall outside the watershed mask by extending the mask and seeding a minimal stream/junction.
Problem: WBT find_outlet reports candidate row/col at the VRT edge for culvert watersheds where the flow path exits the raster outside the polygon mask.

Root cause analysis

find_outlet returns candidates on the raster edge that are outside target_watershed.tif, so the tool never sees a stream cell inside the mask.
Walking upstream from the candidate is ambiguous because multiple cells can drain into the same downstream pixel, so the fix uses the candidate location directly.

Solution: extend watershed mask + seed outlet

Parse find_outlet error candidates; only proceed if all candidates converge.
Ensure dem/target_watershed.tif exists (rebuild from the watershed feature if missing).
Extend the watershed mask to include the candidate pixel.
Seed netful at the candidate (plus upstream neighbor when available) and ensure chnjnt contains a junction at the outlet.
Retry find_outlet with the cached mask.
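
Sketch of the first and third steps above on a list-of-lists mask. The helper names are hypothetical (the real code lives in _extend_watershed_mask_to_candidate() and related functions), and "converge" is assumed here to mean that all candidates share the same row/col.

```python
from typing import List, Optional, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col)


def converged_candidate(candidates: Sequence[Cell]) -> Optional[Cell]:
    """Return the single agreed candidate, or None if candidates disagree."""
    unique = set(candidates)
    return next(iter(unique)) if len(unique) == 1 else None


def extend_mask(mask: List[List[int]], cell: Cell) -> List[List[int]]:
    """Return a copy of the mask with the candidate cell switched on."""
    row, col = cell
    if not (0 <= row < len(mask) and 0 <= col < len(mask[0])):
        raise IndexError(f"candidate {cell} is outside the raster")
    extended = [r[:] for r in mask]  # copy so the original mask is untouched
    extended[row][col] = 1
    return extended


mask = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]
candidate = converged_candidate([(2, 2), (2, 2)])  # edge cell outside the mask
extended = extend_mask(mask, candidate) if candidate is not None else mask
```

Returning None for disagreeing candidates mirrors the "only proceed if all candidates converge" rule; the caller then skips the fallback.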

Code changes

wepppy/rq/culvert_rq.py:
- Added _extend_watershed_mask_to_candidate()
- find_outlet fallback now rebuilds the mask if missing, extends the mask to include the candidate, seeds netful/chnjnt, and retries find_outlet
- Uses _parse_outlet_candidates_from_error(), _seed_outlet_pixel(), _ensure_outlet_junction()
wepppy/nodb/core/watershed.py:
- Minimal structure handling when network.txt is missing (1 hillslope, 1 channel)
wepppy/nodb/core/wepp.py:
- Minimal pw0.str generation for 1 hillslope/1 channel

Test payload

tests/culverts/test_payloads/Hubbard_Brook_subset_11/payload.zip (11 edge-case watersheds)

Results

Batch UUID: ceed1b38-1ef4-4c19-83d8-7edd625c1d6c
Summary (pre-fix batch): total 11, succeeded 9, failed 2, skipped_no_outlet 0
Resolved: Point_ID 9 ZeroDivisionError fixed by guarding disturbed soil pct_coverage when total_area <= 0.0; run 9 now completes.
Remaining failure: Point_ID 207 WBT junction error ("Current cell is not recognized as a junction")

Remaining work

Point_ID 207 follow-up: the issue was malformed inputs (culvert point outside watershed), not WBT. Added a guard that validates point-in-watershed and raises NoOutletFoundError early.

Phase 5 - Observability, error handling, retention (COMPLETE)

Status: complete; remaining work moved to Phase 6a and cleanup follow-ups.
Scope: run-level validation + error propagation into batch artifacts (run_metadata.json, runs_manifest.md, culverts_runner.nodb), publish status events to Redis DB 2 (optional), add validation metrics, and implement cleanup/retention policy in /wc1/culverts/ (delete 7 days after job completion, with completion time stored in CulvertsRunner state).
Dependencies: Phase 1 RQ job framework; ops decision on retention window.
Deliverables:
- Culvert point-in-watershed validation (WatershedFeature.contains_point) before modeling; supports culvert_runner.contains_point_buffer_m (meters) to tolerate small alignment offsets; failures recorded as CulvertPointOutsideWatershedError.
- Run-level errors from run_metadata.json merged into culverts_runner.nodb + runs_manifest.md.
- Finalizer computes validation metrics (culvert/outlet coords, distance, target area, bounds area) and stores them in culverts_runner.nodb + runs_manifest.md.
- NoDb contention retry for CulvertsRunner writes when batch workers overlap.
- Retry/backoff for flaky bound.tif creation (WBT watershed step) to reduce VRT -> TIF contention failures.
- Batch-scoped logging: culvert_rq logger now routes into the CulvertsRunner file handler under the batch UUID.
- Outlet seeding log includes D8 neighbor mask sum from target_watershed.tif for diagnostics.
- Remaining: status event payloads, cleanup job (cron or RQ) that reads CulvertsRunner.completed_at + retention window, run/batch log summaries.
Risks: retention job deleting active batches; missing completion timestamp on failed jobs; missing error propagation in RQ engine.
Verification: Hubbard Brook edge-case payload confirms outside-watershed failures appear in run_metadata.json, culverts_runner.nodb, and runs_manifest.md; run-level metrics populated when outputs exist.
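
Standalone sketch of a point-in-watershed check with a buffer tolerance, assuming a single polygon ring in a projected CRS (meters). This is not WatershedFeature.contains_point; the function below only borrows the name, and the ray-casting plus edge-distance approach is an assumption about one reasonable way to honor contains_point_buffer_m.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _inside(pt: Point, ring: Sequence[Point]) -> bool:
    """Ray-casting test: count edge crossings to the right of the point."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def _dist_to_segment(pt: Point, a: Point, b: Point) -> float:
    """Shortest distance from pt to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def contains_point(ring: Sequence[Point], pt: Point, buffer_m: float = 0.0) -> bool:
    """True if pt is inside the ring or within buffer_m of its boundary."""
    if _inside(pt, ring):
        return True
    n = len(ring)
    nearest = min(_dist_to_segment(pt, ring[i], ring[(i + 1) % n]) for i in range(n))
    return nearest <= buffer_m


square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
strict = contains_point(square, (101.5, 50.0))                   # 1.5 m outside
tolerant = contains_point(square, (101.5, 50.0), buffer_m=2.0)   # within tolerance
```

A small buffer lets points that sit just outside the polygon because of alignment offsets pass validation, while points far from the boundary still fail.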

Phase 5a - Minimum watershed area filtering (COMPLETE)

Scope: filter micro-watersheds using culvert_runner.minimum_watershed_area_m2 (configured in culvert.cfg) when the watershed GeoJSON provides area_sqm; reject runs below the threshold with a structured validation error.
Deliverables:
- CulvertsRunner.minimum_watershed_area_m2 config hook.
- Run-level guard after point-in-watershed validation and after target_watershed_path creation (NoOutletFoundError fallback path).
- Failure surfaced as WatershedAreaBelowMinimumError in run_metadata.json, culverts_runner.nodb, and runs_manifest.md.
Risks: inconsistent area_sqm values in payloads; missing area_sqm means no filtering (intentional).
Verification: Hubbard Brook payload analysis shows 100 m^2 threshold eliminates micro-watersheds without blocking valid small catchments.
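
Sketch of the guard logic described above. The exception class is a local stand-in for WatershedAreaBelowMinimumError and the function name is hypothetical; treating a missing threshold the same as a missing area_sqm (no filtering) is an assumption.

```python
from typing import Optional


class WatershedAreaBelowMinimum(ValueError):
    """Local stand-in for the error recorded in run_metadata.json."""


def check_minimum_area(area_sqm: Optional[float], minimum_m2: Optional[float]) -> None:
    """Raise when the watershed area is known and below the configured minimum."""
    if area_sqm is None or minimum_m2 is None:
        return  # missing area_sqm means no filtering (intentional)
    if area_sqm < minimum_m2:
        raise WatershedAreaBelowMinimum(
            f"watershed area {area_sqm:.1f} m^2 is below minimum {minimum_m2:.1f} m^2"
        )


check_minimum_area(None, 100.0)    # no area attribute: run proceeds
check_minimum_area(2500.0, 100.0)  # valid small catchment: run proceeds
try:
    check_minimum_area(40.0, 100.0)
    rejected = False
except WatershedAreaBelowMinimum:
    rejected = True
```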

Phase 5b - Watershed simplification issue documentation (COMPLETE)

Scope: Document critical issue where Culvert_web_app's 1.0m watershed simplification causes weppcloud to skip culverts.
Problem: weppcloud validates that each culvert's pour point is inside its associated watershed polygon. Simplified watersheds often fail this check.

Impact analysis (Hubbard Brook dataset)

Metric	Unsimplified	Simplified (1.0m)
Pour points inside watershed	208 / 210	116 / 210
Culverts skipped by weppcloud	2	94

With simplified watersheds, 94 of 210 culverts (45%) are skipped because simplification moves polygon boundaries past the pour points.

Vertex reduction from simplification

Point_ID	Unsimplified	Simplified	Ratio
130	3,958	62	64x
162	5,608	49	114x
112	1,575	5	315x

Culvert_web_app deleted resources

The following files are created but deleted after processing:

ws_raster_UTM.tif - Watershed raster (cell values = pour point FID)
ws_polygon_UTM.shp - Unsimplified polygons before simplification

Source code location: subroutine_nested_watershed_delineation.py:

# Line ~1198: Creates unsimplified polygons (temporary)
wbt.raster_to_vector_polygons(i=output_watershed_raster_path, output=watershed_polygon_path)

# Line ~1213: Simplifies with 1.0m tolerance BEFORE saving
watershed_poly_gdf_merged = simplify_geometry(watershed_poly_gdf_merged, tolerance=1.0)

Recommended fix for Culvert_web_app

# Preserve unsimplified polygons before calling simplify_geometry():
ws_polygon_unsimplified_path = os.path.join(user_output_WS_deln_path, "ws_polygon_unsimplified_UTM.shp")
watershed_poly_gdf_merged.to_file(ws_polygon_unsimplified_path)

# Then apply simplification for the standard output
watershed_poly_gdf_merged = simplify_geometry(watershed_poly_gdf_merged, tolerance=1.0)

Deliverables

Updated docs/culvert-at-risk-integration/dev-package/README.md with "CRITICAL: Watershed Simplification Issue" section.
build_payload.py uses simplified watersheds as-is (no reconstruction attempted).
Culvert_web_app team informed: to run all culverts, provide unsimplified watersheds.geojson.

Phase 5c - Stream network scaling (COMPLETE)

Context: flow_accum_threshold in Culvert_web_app is cell-count based (default 100). For high-resolution DEMs (1.0m), that yields a much denser stream network than a 9–10m DEM using the same threshold, exploding hillslopes and runtime.
Baseline reference: weppcloud assumes flow_accum_threshold=100 plus one stream-order reduction pass, but the real calibration target is the 30m DEM workflow where channel initiation is driven by critical source area (typically 5–10 ha).
Scaling guidance:
- Target area: A_target_m2 = flow_accum_threshold * (cellsize_m^2)
- If re-running extract_streams to match the 30m/100-cell baseline (~90,000 m²), reasonable targets are:
  - 10m DEM: flow_accum_threshold ≈ 900
  - 1m DEM: flow_accum_threshold ≈ 90,000
- Order-reduction mapping (when culvert_runner.order_reduction_mode = "map"):
  - Compute an effective cellsize: cellsize_m * sqrt(flow_accum_threshold / 100) (default flow_accum_threshold=100 when missing).
  - <= 1m → 3 passes; <= 4m → 2 passes; <= 10m → 1 pass (default for coarser DEMs)
Mitigations available today:
- Re-run extract_streams in Culvert_web_app with a scaled threshold (best fidelity).
- Adjust culvert_runner.order_reduction_passes as a heuristic simplifier when re-running streams is not feasible (less direct than thresholding).
Current action: test order_reduction_passes values (start with 2, then 3) on 1.0m DEM batches to measure hillslope and runtime reduction.
Deliverables:
- Document target-area scaling approach and recommended thresholds by DEM resolution.
- Decide whether to re-run stream extraction for high-resolution projects or rely on order-reduction passes.
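
Worked example of the scaling arithmetic above. The function names are hypothetical, and the mapping treats "1 pass" as the result for any effective cellsize above 4m, which is one reading of "default for coarser DEMs".

```python
import math
from typing import Optional

BASELINE_CELLSIZE_M = 30.0
BASELINE_THRESHOLD = 100
# A_target_m2 = flow_accum_threshold * cellsize_m^2 for the 30m/100-cell baseline
BASELINE_AREA_M2 = BASELINE_THRESHOLD * BASELINE_CELLSIZE_M ** 2  # 90,000 m^2


def scaled_threshold(cellsize_m: float, target_area_m2: float = BASELINE_AREA_M2) -> int:
    """Cell-count threshold that keeps the same source area at this resolution."""
    return round(target_area_m2 / cellsize_m ** 2)


def effective_cellsize(cellsize_m: float, flow_accum_threshold: Optional[float] = None) -> float:
    """cellsize_m * sqrt(flow_accum_threshold / 100); threshold defaults to 100."""
    threshold = 100 if flow_accum_threshold is None else flow_accum_threshold
    return cellsize_m * math.sqrt(threshold / 100)


def order_reduction_passes(cellsize_m: float, flow_accum_threshold: Optional[float] = None) -> int:
    """Map effective cellsize to order-reduction passes (<=1m: 3, <=4m: 2, else 1)."""
    eff = effective_cellsize(cellsize_m, flow_accum_threshold)
    if eff <= 1.0:
        return 3
    if eff <= 4.0:
        return 2
    return 1


threshold_10m = scaled_threshold(10.0)  # 900
threshold_1m = scaled_threshold(1.0)    # 90000
passes_default_1m = order_reduction_passes(1.0)         # threshold 100 -> 3 passes
passes_scaled_1m = order_reduction_passes(1.0, 90_000)  # effective 30m -> 1 pass
```

The last two lines show how the two mitigations interact: a 1m DEM with the default threshold gets three passes, while the same DEM with a scaled threshold of 90,000 behaves like a 30m DEM and needs only one.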

Phase 5d - Native CRS retrieval for landuse/soils + wepppyo3 nodata guard (COMPLETE)

Scope: avoid WGS84 round-trip clipping by requesting NLCD/SSURGO with native UTM extents; extend wmesque2 + client to accept native CRS bounding boxes; guard identify_mode_single_raster_key against 100% nodata hillslopes.

Problem analysis (batch `55b28bb9-2d61-43f3-9f45-10779e93c501`, run 7)

Run 7 failed with KeyError: '1441' in wepp.run_hillslopes() when looking up landcover.
Root cause: 6 hillslopes (1441, 1541, 1592, 1601, 1602, 1603) had 100% nodata in the NLCD raster.
NLCD retrieval used WGS84 bbox which, when transformed back to UTM, produced a smaller extent than the DEM:
- DEM extent: 261900–273608 (11.7km width)
- NLCD 30m extent: 262478–273068 (10.6km width, ~1km missing on edges)
identify_mode_single_raster_key (wepppyo3) correctly skipped hillslopes with all-nodata pixels, but downstream code expected all hillslope IDs to be present in domlc_d.

Deliverables

wmesque2 native CRS support (complete):
- wmesque2 accepts optional bbox_crs (EPSG/proj4) for projected extent requests.
- wmesque client (wmesque_retrieve) accepts extent_crs parameter, appends bbox_crs for v2 only.
- Culvert batch landuse/soils retrieval uses native UTM extent to avoid WGS84 round-trip clipping.
wepppyo3 nodata guard (complete; updated wepppyo3 mounted in container):
- identify_mode_single_raster_key and identify_mode_intersecting_raster_keys must return entries for all keys in the key raster.
- When a hillslope has 100% nodata in the parameter raster, return the nodata value (or a sentinel) instead of silently skipping.
- Test fixture created: /workdir/wepppyo3/tests/raster_characteristics/fixtures/ with subwta_nodata_edge.tif (4 hillslopes) and nlcd_nodata_edge.tif (68% nodata coverage).
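
Pure-Python sketch of the guard behavior, operating on flattened cell lists rather than rasters. This is not the wepppyo3 implementation; the function name, the nodata value 255, and the sample class codes are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def mode_by_key(keys: Sequence[int], values: Sequence[int], nodata: int) -> Dict[int, int]:
    """Return the most common non-nodata value per key.

    Every key seen in the key raster gets an entry; keys whose cells are
    all nodata map to the nodata value instead of being dropped.
    """
    counts: Dict[int, Counter] = {}
    for key, value in zip(keys, values):
        counter = counts.setdefault(key, Counter())  # register the key even if value is nodata
        if value != nodata:
            counter[value] += 1
    return {key: (c.most_common(1)[0][0] if c else nodata) for key, c in counts.items()}


# Hillslope IDs from the fixture; landcover values are made up for the example.
keys = [1312, 1312, 1323, 1442, 1442, 1443]
values = [42, 42, 71, 255, 255, 255]
result = mode_by_key(keys, values, nodata=255)
```

With this behavior a downstream lookup such as domlc_d[hillslope_id] no longer raises KeyError, but callers must still decide how to handle a nodata landcover value.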

Test fixture details

Source: cropped from batch 55b28bb9-2d61-43f3-9f45-10779e93c501 run 7
subwta_nodata_edge.tif: 500x200 px, hillslopes [1312, 1323, 1442, 1443], channel [1444]
nlcd_nodata_edge.tif: 500x200 px, 32% valid / 68% nodata (right edge nodata)
Expected: all 4 hillslopes must appear in identify_mode_single_raster_key result; 1442/1443 may have nodata as value

Risks

Incorrect CRS strings or non-projected CRS inputs to wmesque2.
Downstream code must handle nodata values returned for hillslopes with no valid landcover data.

Verification

wepppyo3 test using fixture confirms all keys are returned.
Run 7 retry succeeds after wmesque2 native CRS fix is deployed.

Phase 5 handoff summary

Validation now checks culvert points against watershed polygons; outside points fail fast with CulvertPointOutsideWatershedError.
Minimum watershed area filter rejects micro-watersheds when area_sqm is present, using WatershedAreaBelowMinimumError.
Finalizer merges run_metadata.json status/errors into culverts_runner.nodb and runs_manifest.md for consistent reporting.
Runs manifest includes validation metrics and culvert/outlet distance for downstream QA.
NoDb contention retry pattern applied to CulvertsRunner updates in batch jobs.

Phase 6 - Auth and webhook enhancements (post-POC)

Scope: JWT issuance/rotation, webhook registration + retries, HMAC signing, opt-in callbacks on completion/failure.
Dependencies: security decisions on token lifecycle; agreement on webhook payload schema.
Deliverables: auth middleware, webhook dispatcher, docs for Culvert_web_app.
Risks: callback storms for large batches; secrets management; backward compatibility with POC auth.
Verification: auth enforcement tests; webhook retry tests with mock endpoints; manual validation with Culvert_web_app dev instance.
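
Sketch of HMAC signing for webhook bodies using the Python standard library. The payload fields and secret below are hypothetical, since the webhook payload schema is still a listed dependency; only the hmac/hashlib calls are standard.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    return hmac.compare_digest(sign_payload(secret, body), signature)


secret = b"example-shared-secret"  # hypothetical; real secrets come from secrets management
body = json.dumps({"batch_uuid": "example", "status": "finished"}, sort_keys=True).encode()
signature = sign_payload(secret, body)

accepted = verify_signature(secret, body, signature)
tampered = verify_signature(secret, body + b" ", signature)
```

Signing the raw bytes (not a re-serialized dict) avoids mismatches from key ordering or whitespace on the receiving side.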

Phase 6a - Error schema standardization (DEFERRED)

Scope: standardize error payloads across rq-engine routes. (Note: jobstatus/jobinfo do not currently surface error_code/error_detail.)
Dependencies: agreement on error taxonomy and client expectations for job status polling.
Deliverables: updated response helpers, updated spec/dev-package docs.
Risks: client-side parsing changes; backward compatibility for existing integrations.
Verification: regression tests for error responses; manual checks against culvert payload uploads.

Phase 6b - Payload naming + ws_deln metadata alignment (READY FOR REVIEW)

Scope: rename the DEM payload path to topo/breached_filled_DEM_UTM.tif (no misleading "hydro-enforced" filename) and capture hydroEnforcementSelect as hydro_enforcement_select in metadata.json.
Dependencies: Culvert_web_app form values and user_ws_deln_responses.txt output; payload builder updates in the dev package.
Deliverables:
- Payload validator + runner use the new DEM filename.
- build_payload.py copies the DEM to topo/breached_filled_DEM_UTM.tif and emits hydro_enforcement_select in metadata.json.
- Docs/specs updated to reference the new DEM filename and metadata field.
Compatibility: this is a breaking rename as we move from dev to production; older payloads using topo/hydro-enforced-dem.tif are intentionally unsupported and must be reexported.
Risks: culvert docs drifting out of sync.
Verification: update culvert tests/fixtures to use topo/breached_filled_DEM_UTM.tif; validate a real payload from /wc1/culvert_app_instance_dir/user_data/.

Cross-phase test strategy (minimum)

Implemented tests

tests/microservices/test_rq_engine_culverts.py (rq-engine culvert ingestion).
tests/microservices/test_rq_engine_jobinfo.py (rq-engine jobinfo).
tests/culverts/test_culverts_runner.py (run scaffolding + symlinks).
tests/culverts/test_culvert_orchestration.py (per-culvert orchestration + metadata).
Payload fixtures: tests/culverts/test_payloads/.

Planned tests

Unit: Ron.symlink_dem(), culvert ID mapping, error formatting.
Integration: per-culvert orchestration, browse URLs, RQ job status.
Smoke/manual: real payloads from /workdir/culvert_app_instance_dir/user_data/ in container; verify outputs and download.

Open questions / blockers

Idempotency/retry rules for duplicate POSTs (keying on payload hash? caller-supplied id?).
Retention/cleanup window and ownership of /wc1/culverts/ storage.
Concurrency/timeouts for 300-culvert batches and RQ worker sizing.
