# Failure Analysis

## Failed stage

**Stage:** `Run the CLUBB restart test script`

**Exact error symptom:**

- `python3 run_scripts/run_restart_test.py bomex`
- `Running standard bomex case... Done!`
- `Running restart bomex case from halfway point... Done!`
- `Results not bit-for-bit!`
- Jenkins then reports `ERROR: script returned exit code 1` and marks the build `FAILURE`.

## What I inspected

I reviewed:

- the Jenkins console tail and explicit failure lines,
- the recent source diff for `HEAD~10..HEAD` across `src`, `run_scripts`, and `clubb_python_api`,
- the HEAD commit summary.

## Most likely cause in source

The strongest regression candidate is in:

- `src/CLUBB_core/advance_clubb_core_module.F90`
  - In the restart branch, the code changed from:
    - `time_current = time_restart`
  - to:
    - `time_current = time_initial`

This is a direct restart-path behavior change. The restart test compares a continuous run against a split/restarted run, so changing the model clock to the initial time during restart can alter iteration bookkeeping and any time-dependent logic that depends on `time_current` or `iinit`.

## Why this plausibly breaks BFB restart behavior

A restart run must reproduce the same state evolution as the uninterrupted run. Setting `time_current` to `time_initial` in the restart branch can:

- shift the timestep index calculation,
- change which timestep-dependent branches execute,
- and desynchronize restart bookkeeping from the saved restart state.

That is consistent with the observed symptom: both runs complete, but the outputs are no longer bit-for-bit identical.
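The mechanism can be illustrated with a toy model. The sketch below is not CLUBB code: `step`, `run`, and the sinusoidal forcing are hypothetical stand-ins, chosen only to show how a restarted run that resets its clock to the initial time drifts away from the continuous run, while one that resumes at the saved restart time stays bit-for-bit identical.

```python
import math

def step(state, time, dt):
    """Advance a toy scalar state by one step with a time-dependent forcing."""
    return state + dt * math.sin(time)

def run(state, time_start, n_steps, dt):
    """Integrate n_steps from time_start; return the final state and clock."""
    time_current = time_start
    for _ in range(n_steps):
        state = step(state, time_current, dt)
        time_current += dt
    return state, time_current

TIME_INITIAL = 0.0   # hypothetical model start time (seconds)
DT = 60.0            # hypothetical timestep (seconds)
N_STEPS = 10
HALF = N_STEPS // 2

# Reference: one uninterrupted run.
full_state, _ = run(1.0, TIME_INITIAL, N_STEPS, DT)

# First half of the split run; its final clock plays the role of time_restart.
half_state, time_restart = run(1.0, TIME_INITIAL, HALF, DT)

# Restart that resumes the clock at time_restart (the earlier behavior).
good_state, _ = run(half_state, time_restart, N_STEPS - HALF, DT)

# Restart that resets the clock to time_initial (the suspected regression).
bad_state, _ = run(half_state, TIME_INITIAL, N_STEPS - HALF, DT)

print(good_state == full_state)  # True
print(bad_state == full_state)   # False
```

In the toy, the reset clock replays the forcing from the first half of the run, so the final state differs even though both runs finish without error. That mirrors the symptom reported by the restart test, but it does not by itself prove that the same path is taken inside CLUBB.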
## Secondary change that may contribute

The same diff also changed several closure/clipping interfaces and made clipping counters mutable/inout:

- `src/CLUBB_core/advance_clubb_core_module.F90`
- `src/CLUBB_core/advance_windm_edsclrm_module.F90`
- `src/CLUBB_core/advance_xm_wpxp_module.F90`
- `src/CLUBB_core/advance_xp2_xpyp_module.F90`
- `src/CLUBB_core/clip_explicit.F90`
- matching wrappers in `clubb_python_api/`

Examples:

- `order_*` arguments were replaced by counters like `wprtp_cl_num`, `wpthlp_cl_num`, `upwp_cl_num`, `vpwp_cl_num`
- these are now passed as `intent(inout)` in the Fortran wrappers

If restart state depends on these counters being initialized or advanced identically across the two halves of the run, that could also contribute to non-BFB output. However, the restart-time change above is the clearest direct regression.

## Conclusion

**Proven failure:** restart test for `bomex` failed with `Results not bit-for-bit!` and exit code 1.

**Most likely root cause:** `src/CLUBB_core/advance_clubb_core_module.F90` restart-branch change from `time_restart` to `time_initial`.

**Confidence:** high for the restart-time bug; moderate for the new mutable clipping-counter plumbing as a possible secondary contributor.

## Analysis Metadata

- Job: `clubb_restart_gfortran_test_branch`
- Build: `20`
- Build URL: http://carson.math.uwm.edu/jenkins/job/clubb_restart_gfortran_test_branch/20/
- Tool rounds: `4`

---

Generated by `analyze_failure.py` in 17.0s.