Restructuring and Porting of Compute_mixing_length subroutine(Phase 1) (#1052)
Restructuring and Porting of Compute_mixing_length subroutine(Phase 2) (#1054)
Breaking up column loop in mono_flux_limiter. This may not be the final form for GPUization, but it's definitely a start, no longer do we have to copy single column variables to multicolumn ones anywhere. (#1051)
Adds to tuner error bars on bias removal (arrow) plots
Adding OpenACC data directives for mixing length and adg routines
Fixing error causing GPU code not to run. Some variables to be copied were labelled as (ngrdcol,nz) when it should be (:ngrdcol,:nz). I've just removed the data length specifiers completely since they are not necessary in general.
Removing update_pressure since it is no longer called anywhere in clubb or host models. The addition of this subroutine was discussed in larson-group/e3sm#6 and the removal of the call to it was discussed in larson-group/clubb#926.
Removing update_pressure from public list. This was causing compilation crashes. RESOLVED:8c7230fecb877d04fb129ef5e143e0993b4b29b1
Fixing bug, arrays given a dummy index in 0fafc6b0b1f1a6058d37bf3db4bb3708204504db are declared nsize, but are only used up to nlevels, thus we need the (1,1:nlevels) specifier when passing them. This issue was only caught by our _debug tests, so that's good evidence the new flags we added to initialize unused memory were effective.
Fixing bug. This was only triggered when l_input_fields=.true., which I am only testing because it needs to be true so that I can test ADG2_driver.
Removing usage of gr from pdf_closure. It was only ever used for nz, which is now fed in directly.
Clubb ticket #1025: Implemented way to make esa tuner reproducible, h… (#1068)
added autocommit message maker to clubb so I have an easier time testing it
changes to integrate message maker into gitUpdate scripts
Oops, I made 1 small error, should be consistent now.
Implementing changes to the initial conditions.
This commit most definitely does not change any bits,
Add scripts to configure and run convergence test
reorganize the scripts for convergence test simulations
Moving compute_cloud_cover outside of if ( l_use_cloud_cover ) then statement, the cloud_cover and rcm_in_layer variables they compute aren't output in clubb_standalone, but are in cam, causing cam bit diff tests to break.
Undoing README update, 1 space = 1 byte and we should maximize file sizes to deter hackers from stealing our data. Also BIT_CHANGING:fb4556e4cc4cb3d4b6df3520370a28a824f357ef for configs where l_use_cloud_cover = .false., which means I was wrong about this ever not being bit changing, so I must've either only tested with l_use_cloud_cover = .true. or only tested the multicol diffs when I put compute_cloud_cover inside the if statement.
Adding Skthl_zm to the update host list, I missed this in the last PR. I noticed this by comparing results with and without managed memory, now I've checked BFBness with arm, mpace_b, mc3e, and gabls2.
Restructuring and Porting of Compute_mixing_length subroutine(Phase 1) (#1052)
* Restructuring and Porting of Compute_mixing_length subroutine(Phase 1)
Restructure: compute_mixing_length is one of the most expensive routines, taking 35-50% of the total time in a single timestep. The subroutine has been restructured to push the i-loop further down to expose vectorization and parallelization. The restructuring also introduces sat_mixrat_liq_acc routines to expose parallelism when called inside an OpenACC parallel region.
Porting: OpenACC directives are inserted to port the restructured compute_mixing_length code to GPUs. This port is currently unoptimized and there is still room for improvement.
NOTE: Currently, only the l_sat_mixrat_lookup = false and saturation_formula = saturation_flatau case (the Earthworks config options) is supported in the OpenACC build. All other options work on CPUs as usual. OpenACC declare create directives are inserted in model_flags and constants_clubb, as these module variables are used inside the saturation routines.
* Added debug message about only supporting l_sat_mixrat_lookup = false and saturation_formula = saturation_flatau on GPUs. Answers are Bit for Bit with arm-multicolumn case + nvhpc compiler.
* Changing CLUBB debug level 1 to 0 for the saturation formula support running on GPUs
* Changing indentation to make gfortran happy, it wants ifdefs to start at the beginning of the line.
* Adding use statements for error checks and printouts, also making the errors set err_code to clubb_fatal_error.
Restructuring and Porting of Compute_mixing_length subroutine(Phase 2) (#1054)
* Restructuring and Porting of Compute_mixing_length subroutine(Phase 2)
Restructure: sat_mixrat_liq_2D_acc is now called directly instead of calling the 1D version inside the column loop. sat_mixrat_liq_2D_acc is changed from a function to a subroutine, with an output array and start_index as additional arguments. This is a workaround for passing sub-arrays: OpenACC doesn't like sub-arrays being passed and fails validation.
Porting: OpenACC directives are added inside sat_mixrat_liq_2D_acc for porting
Validation: Answers are Bit for Bit with arm-multicolumn case + nvhpc compiler
* Fix for compilation issues
Issue 1: Missed declaring 'start_index' while integrating the change
Issue 2: The use of error_code module and the procedures inside it causes OpenACC compilation issues when run on the device.
* Removing the sat_mixrat_liq_acc and sat_mixrat_liq_2D_acc, making the normal sat_mixrat_liq work for all current use cases, and making the other versions of sat_mixrat_liq (bolton,gfdl,lookup) functional with OPENACC.
Breaking up column loop in mono_flux_limiter. This may not be the final form for GPUization, but it's definitely a start, no longer do we have to copy single column variables to multicolumn ones anywhere. (#1051)
Adding OpenACC data directives for mixing length and adg routines
OpenACC structured data regions are added to optimize the data transfers between CPU and GPU. These data regions will be converted to unstructured data regions in a later optimization phase. Results are bit for bit.
Fixing error causing GPU code not to run. Some variables to be copied were labelled as (ngrdcol,nz) when it should be (:ngrdcol,:nz). I've just removed the data length specifiers completely since they are not necessary in general.
bias removal plot, a colored correlation matrix, a matrix of linear-based 2-point bounds on parameter perturbations, and a web-page version of the 3-dot subplots.
Removing update_pressure since it is no longer called anywhere in clubb or host models. The addition of this subroutine was discussed in larson-group/e3sm#6 and the removal of the call to it was discussed in larson-group/clubb#926.
Fixing bug, arrays given a dummy index in 0fafc6b0b1f1a6058d37bf3db4bb3708204504db are declared nsize, but are only used up to nlevels, thus we need the (1,1:nlevels) specifier when passing them. This issue was only caught by our _debug tests, so that's good evidence the new flags we added to initialize unused memory were effective.
Porting pdf_closure subroutine with OpenACC (#1059)
* Porting pdf_closure subroutine with OpenACC
OpenACC directives are added to the pdf_closure subroutine. The necessary structured data region is also added for optimizing data movement across kernels. There is an opportunity for task parallelism using streams, which will be explored in the future.
Clubb ticket #1025: Implemented way to make esa tuner reproducible, h… (#1068)
* Clubb ticket #1025: Implemented way to make esa tuner reproducible, hid error output of optional diagnostic variables behind check, fixed parallelization issue with tuner, esa max_iters parameter is now in stats namelist, fixed issue with TUNER compiler directive, some small fixes.
- New namelist variables prescribed_rand_seed and l_use_prescribed_rand_seed determine whether the esa tuner uses a random or a fixed value as the random seed. Added descriptions to README.
- Added max_iters to stats namelist to make it more modifiable.
- Renamed stp_adjst_intercept_in and stp_adjst_slope_in to stp_adjst_shift_in and stp_adjst_factor_in, respectively, to better reflect their influence on step size.
- The error output in src/CLUBB_core/pdf_closure_module.F90 for the diagnostic variables wprtp2, wpthlp2, wprtpthlp, and rcp2 is now hidden behind existence checks for these variables. A clarification was also added to the "#ifdef TUNER" directive.
- NetCDF file access caused the tuner to crash in parallel mode (-fopenmp flag in config file and multiple cases). Adding an $OMP CRITICAL structure around the call to stats_init in clubb_driver.F90 fixed that.
- Fixed compile/README. Config files are specified with the -c option.
- The -t option in run_scripts/run_tuner.bash interfered with the previous usage of the TUNER compiler directive. Renamed the old TUNER directive to NR_SP, short for "numerical recipes, single precision". TUNER is now the option to "turn on" code changes required to run the tuner.
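The effect of the two seed-related namelist variables can be illustrated with a minimal Python sketch. This is not the tuner's Fortran code: `make_rng` and `draw_perturbations` are hypothetical helpers, and only the names `prescribed_rand_seed` and `l_use_prescribed_rand_seed` come from the commit message.

```python
import random


def make_rng(l_use_prescribed_rand_seed, prescribed_rand_seed):
    """Return a random generator (hypothetical helper, not CLUBB code)."""
    if l_use_prescribed_rand_seed:
        # Fixed seed: every run draws the same sequence, so results repeat.
        return random.Random(prescribed_rand_seed)
    # No seed supplied: Python seeds from the OS, so runs generally differ.
    return random.Random()


def draw_perturbations(rng, n):
    """Draw n illustrative parameter perturbations in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Two "runs" with the same prescribed seed produce identical draws.
run_a = draw_perturbations(make_rng(True, 1), 5)
run_b = draw_perturbations(make_rng(True, 1), 5)
```

With the flag off, each call to `make_rng` starts from a fresh, unseeded generator, which is the non-reproducible behaviour the ticket set out to make optional.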
Implementing changes to the initial conditions. This commit contains code changes related to the modified initial conditions for convergence test simulations. These code changes can be activated by setting l_modify_ic_with_cubic_int = .true. in the namelist on a case-by-case basis.
Along with this option, the sounding profiles are also modified for the BOMEX, RICO, DYCOMS2_RF02 and Wangara cases.
-- For the BOMEX, RICO, and Wangara cases, we add more height levels to the original sounding profiles so that cubic spline interpolation produces profiles consistent with those from linear interpolation
-- For DYCOMS2_RF02, instead of using the formulations in the code to derive the initial condition profiles (which results in grid-spacing-dependent initial conditions when the grid is refined), we construct a sounding profile (still using the same formulas as in src/sounding.F90) on a high-resolution grid (the standard grid refined by a factor of 2^7), then save the profile in dycoms2_rf02_sounding.in. In this way, the model initialization always reads the same sounding profile when the user refines the vertical model grid.
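The idea behind the DYCOMS2_RF02 change can be sketched in Python under simplifying assumptions: a fixed high-resolution table is built once, and every model grid interpolates from that same table. The grid levels, the linear profile, and the helper names `refine_grid` and `interp_linear` are all made up for illustration; the actual profile comes from the formulas in src/sounding.F90.

```python
import bisect


def refine_grid(levels, factor):
    """Insert factor-1 evenly spaced points between neighbouring levels."""
    refined = []
    for lo, hi in zip(levels[:-1], levels[1:]):
        refined.extend(lo + (hi - lo) * k / factor for k in range(factor))
    refined.append(levels[-1])
    return refined


def interp_linear(z, z_table, values):
    """Linearly interpolate a tabulated profile to height z (clamped at the ends)."""
    if z <= z_table[0]:
        return values[0]
    if z >= z_table[-1]:
        return values[-1]
    i = bisect.bisect_right(z_table, z)
    w = (z - z_table[i - 1]) / (z_table[i] - z_table[i - 1])
    return values[i - 1] + w * (values[i] - values[i - 1])


# Hypothetical standard grid (m) and a made-up profile tabulated on the
# grid refined by 2^7; this table plays the role of the saved sounding file.
standard = [0.0, 100.0, 200.0, 300.0]
table_z = refine_grid(standard, 2 ** 7)
table_v = [290.0 + 0.01 * z for z in table_z]

# Initial conditions on the standard grid and on a grid refined by 2.
coarse = [interp_linear(z, table_z, table_v) for z in standard]
fine_grid = refine_grid(standard, 2)
fine = [interp_linear(z, table_z, table_v) for z in fine_grid]
```

Because both grids read the same table, the values at levels they share are identical, which is the grid-independence property the commit is after.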
This commit most definitely does not change any bits, but commit 67878ef was BIT_CHANGING for the DYCOMS-II RF02 family of cases, RICO (and RICO SILHS), BOMEX, and Wangara.
Add scripts to configure and run convergence test This commit contains new scripts created to configure and run convergence test simulations. There are four scripts:
1. run_scripts/run_cnvg_test_multi_cases.csh: this script is used to compile and run convergence simulations with specific configurations (see the script for explanations). After the simulations, the space-time convergence plots are also generated.
2. run_scripts/convergence_config.py: this script is called by the first script to generate the namelist file for CLUBB-SCM simulations. With this script, the modified configuration is applied in the case run directory, while the files in default clubb are not touched.
3. run_scripts/convergence_function.py: this script contains a function to modify the initial condition profile for convergence test simulations. It is called by run_scripts/convergence_config.py when the model is configured to use modified initial conditions.
4. run_scripts/plot_l2_convergence.py: this is a sample script to generate the space-time convergence plots.
reorganize the scripts for convergence test simulations Move the scripts associated with convergence test simulations into the folder run_scripts/convergence_run
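As a rough illustration of what a space-time convergence analysis measures, the following Python sketch computes a discrete L2 error and the observed order of convergence between two successive refinements. It is not taken from plot_l2_convergence.py; the function names and the error values are made up.

```python
import math


def l2_error(solution, reference):
    """Root-mean-square (discrete L2) difference between two equal-length profiles."""
    if len(solution) != len(reference):
        raise ValueError("profiles must have the same length")
    total = sum((s - r) ** 2 for s, r in zip(solution, reference))
    return math.sqrt(total / len(solution))


def observed_order(err_coarse, err_fine, refinement=2.0):
    """Observed convergence order when the grid/time step is refined by `refinement`."""
    return math.log(err_coarse / err_fine) / math.log(refinement)


# Made-up errors from three runs, each refined by a factor of 2.
errors = [0.08, 0.04, 0.02]
orders = [observed_order(errors[i], errors[i + 1]) for i in range(len(errors) - 1)]
```

Errors that halve with each factor-of-2 refinement, as in this made-up series, correspond to first-order convergence.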
* Fixing bug. This was only triggered when l_input_fields=.true., which I am only testing because it needs to be true so that I can test ADG2_driver.
* Removing usage of gr from pdf_closure. It was only ever used for nz, which is now fed in directly.
* Making openacc statements more consistent. Ensuring all statements on double loops have specified gang and vector, and that all parallel loops have an end parallel loop statement at the end of them. Everything BFB on CPUs and GPUs.
* Pushing acc data region to outermost parts of mixing_length.
* Removing pdf_implicit_coefs_terms from acc copyin and copyout. It is only used when iiPDF_type == iiPDF_new .or. iiPDF_type == iiPDF_new_hybrid, so we do not need to do any copying with it. The inclusion of it also caused the data statement to copy unallocated arrays, which are just garbage pointers, and that was causing random occasional crashes (either segfaults or gpu out of memory).
* The update device clauses for return variables seem to only be required for arrays contained in types. See https://github.com/larson-group/clubb/issues/1049#issuecomment-1440624778
* Moving acc end data to end of pdf_closure. This required removing any conditional return statements that appear before the final return, since we're not allowed to branch out of an acc region early. I also moved a large printout statement outside of a loop. The only reason it was in the loop to begin with was because pdf_params used to be an array of types, but now is a type of arrays, allowing us to print the full arrays directly.
* Making loop an acc loop. If we weren't outputting w_[up/down]_in_cloud (iw_up_in_cloud <= 0 .or. iw_down_in_cloud <= 0), then these arrays were only being zeroed out on the CPU and would've gotten overwritten by the uninitialized GPU data at the end of the data statement. This change causes the arrays to get correctly zeroed out on the GPU when needed.
* Update VariableGroupNondimMoments.py
Fixed a typo
* Merging new changes from master
* Removing need for -gpu=deepcopy, pushing some acc data statements up call tree, and replacing some acc data statements with acc declare statements so that return statements can be added back in.
* Adding back an acc loop that was accidentally removed during a merge.
---------
Co-authored-by: Brian Griffin <31553422+bmg929@users.noreply.github.com>
Code changes to implement modified boundary condition This commit contains code changes to implement modified boundary conditions for convergence test simulations. These code changes can be activated by setting l_modify_bc_for_cnvg_test = .true. in the CLUBB namelist.
This code change is expected to be BIT_CHANGING for cases in which `l_predict_upwp_vpwp = T`, `l_mono_flux_lim_um = T` or `l_mono_flux_lim_vm = T`, and the monotonic flux limiter is triggered.
This bug fix prevents non-conservation of momentum when the vertical integral of either of the wind components `um` or `vm` is negative.
I should've added the answer-changing tag to my previous commit: BIT_CHANGING:903169a.
That commit doesn't really change answers, but adds a new tunable parameter which will show up in the output files and therefore they will differ from previous output files.
RESOLVED:8e473e08b858df61c5c5116e37e26f3df2431a0b Above committed on March 8th, 2023 BIT_CHANGING:5cbf4f80a34cfafd2fd164415af5ec7d6239bcdd Above was committed on March 14th, 2023
Moving compute_cloud_cover outside of if ( l_use_cloud_cover ) then statement, the cloud_cover and rcm_in_layer variables they compute aren't output in clubb_standalone, but are in cam, causing cam bit diff tests to break.
Undoing README update, 1 space = 1 byte and we should maximize file sizes to deter hackers from stealing our data. Also BIT_CHANGING:fb4556e4cc4cb3d4b6df3520370a28a824f357ef for configs where l_use_cloud_cover = .false., which means I was wrong about this ever not being bit changing, so I must've either only tested with l_use_cloud_cover = .true. or only tested the multicol diffs when I put compute_cloud_cover inside the if statement.
Adding Skthl_zm to the update host list, I missed this in the last PR. I noticed this by comparing results with and without managed memory, now I've checked BFBness with arm, mpace_b, mc3e, and gabls2.
Code changes to implement modifications on wp3 clippings This commit contains code changes to implement modifications of skewness clippings on wp3 in src/CLUBB_core/clip_explicit.F90. The default method attempts to apply smaller (larger) clippings below (above) the 100 m AGL level, which can cause a discontinuity around 100 m AGL. This clipping is found to trigger sawtooth oscillations in wp3 when linear diffusion is used. Such sawtooth oscillations are eliminated if a smoothed Heaviside function is introduced to obtain a smooth transition of clippings around the 100 m AGL level. The change is necessary to obtain first-order convergence in CLUBB-SCM when linear diffusion is used.
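A smooth transition of this kind can be sketched in Python as below. The tanh form of the smoothed Heaviside function, the transition width, and the limit values are assumptions chosen for illustration; the commit message does not state which form clip_explicit.F90 uses.

```python
import math


def smoothed_heaviside(x, width):
    """Smooth step from 0 to 1 centred at x = 0 (tanh form, an assumed choice)."""
    return 0.5 * (1.0 + math.tanh(x / width))


def clip_limit(z_agl, limit_below, limit_above, z_transition=100.0, width=10.0):
    """Blend two clipping limits across z_transition instead of switching abruptly."""
    weight = smoothed_heaviside(z_agl - z_transition, width)
    return limit_below + (limit_above - limit_below) * weight


# Hypothetical limits: a smaller one near the surface, a larger one aloft.
heights = [0.0, 80.0, 100.0, 120.0, 300.0]
limits = [clip_limit(z, 1.0, 3.0) for z in heights]
```

Far below the transition height the limit approaches `limit_below`, far above it approaches `limit_above`, and at the transition height it is the average of the two, so there is no jump for the diffusion scheme to amplify.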
This commit contains code changes to implement modifications on limiters in three places:
1. remove the limiters in the denominator of the equation for brunt_vaisala_freq_sqd_smth, which affects the computed eddy dissipation time scale in turbulent fluxes (wpxp). (in mixing_length.F90)
2. reduce the threshold values of the limiters in the equation for the Richardson number (sqrt_Ri_zm) (in mixing_length.F90)
3. introduce the smoothed max/min function for limiters in the equation of Cx_fnc_Richardson. (in advance_helper_module.F90).
After the modification, we also apply a zt2zm(zm2zt) smoothing on the calculated quantities. These modifications are found to be beneficial for improving solution convergence in CLUBB-SCM
The code changes are controlled by a newly introduced flag named "l_modify_limiters_for_cnvg_test", which is set to .false. (meaning that the modifications to the limiters are turned off) by default.
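For readers unfamiliar with smoothed limiters, one common differentiable replacement for max/min is sketched below in Python. This particular square-root form and the smoothing parameter `eps` are assumptions for illustration and are not confirmed by the commit message as the form used in advance_helper_module.F90.

```python
import math


def smooth_max(a, b, eps):
    """Differentiable approximation of max(a, b); exact when eps == 0."""
    return 0.5 * (a + b + math.sqrt((a - b) ** 2 + eps ** 2))


def smooth_min(a, b, eps):
    """Differentiable approximation of min(a, b); exact when eps == 0."""
    return 0.5 * (a + b - math.sqrt((a - b) ** 2 + eps ** 2))


# Applying a lower bound of 0.0 to a few sample values (all numbers made up).
samples = [-1.0, -0.1, 0.0, 0.1, 1.0]
hard = [max(x, 0.0) for x in samples]
soft = [smooth_max(x, 0.0, 0.2) for x in samples]
```

The hard limiter has a kink where its two arguments are equal, while the smoothed version rounds that corner off over a range set by `eps`, at the cost of slightly overshooting the true maximum near the crossover.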