Adding space because commit ba3b1ef5ceaa07706f078cc61cfdac22d3c6a1c3 in the sys_admin repo fixed a bug causing tests to fail, this means we have RESOLVED:d8df4eea1e41bee8cbd3de374148241f040fc03c, which was only labelled as failing since we erroneously checked the commit range it was included in. (details)
Making clubb_driver only call the multicolumn driver if num_standalone_columns > 1 (details)
Minor adjustment to eliminate spikes in thlm tendencies from the monotonic flux limiter. (#1043) (details)
Replacing $ with & in namelist definition to make consistent. (details)
Creates separate file for inputting data to the tuner (details)
I added "smooth" max clipping for invrs_tau_shear, which is a variable (details)
ADG1_pdf_driver subroutine port with OpenACC (details)
Replacing old elemental ADG1_w_closure with new GPUized one. Making G_unit tests work with new version. Also making mixt_frac_max_mag a scalar since it was only being used as such. larson-group/clubb#1049 (details)
Making the multicolumn code open the netcdf file before writing and close when finished writing. This fixes a bug where the netcdf data wasn't being written. (details)
For tuning dashboard, modify the separate input file (details)
Delete tuner file that was accidentally committed (details)
Prevent accidental commits of python .pyc files (details)
I can now safely remove all the "ifdef E3SM" statements from CLUBB's (details)
For the dashboard, creates a forward model solution function. (details)
I updated and augmented the instructions to run the tuner. (details)
added netcdf var collector and readme to clubb repo (details)
Adds to tuner code for min dp for stubborn parameters (details)
Adding config script to compile with openacc using nvhpc. (details)
Restructuring and Porting of Compute_mixing_length subroutine(Phase 1) (#1052) (details)
Restructuring and Porting of Compute_mixing_length subroutine(Phase 2) (#1054) (details)
Breaking up column loop in mono_flux_limiter. This may not be the final form for GPUization, but it's definitely a start, no longer do we have to copy single column variables to multicolumn ones anywhere. (#1051) (details)
Adds to tuner error bars on bias removal (arrow) plots (details)
Adding OpenACC data directives for mixing length and adg routines (details)
Fixing error causing GPU code not to run. Some variables to be copied were labelled as (ngrdcol,nz) when they should be (:ngrdcol,:nz). I've just removed the data length specifiers completely since they are not necessary in general. (details)
Removing update_pressure since it is no longer called anywhere in clubb or host models. The addition of this subroutine was discussed in larson-group/e3sm#6 and the removal of the call to it was discussed in larson-group/clubb#926. (details)
Removing update_pressure from public list. This was causing compilation crashes. RESOLVED:8c7230fecb877d04fb129ef5e143e0993b4b29b1 (details)
Fixing bug, arrays given a dummy index in 0fafc6b0b1f1a6058d37bf3db4bb3708204504db are declared nsize, but are only used up to nlevels, thus we need the (1,1:nlevels) specifier when passing them. This issue was only caught by our _debug tests, so that's good evidence the new flags we added to initialize unused memory were effective. (details)
Fixing bug. This was only triggered when l_input_fields=.true., which I am only testing because it needs to be true so that I can test ADG2_driver. (details)
Removing usage of gr from pdf_closure. It was only ever used for nz, which is now fed in directly. (details)
Clubb ticket #1025: Implemented way to make esa tuner reproducible, h… (#1068) (details)
added autocommit message maker to clubb so I have an easier time testing it (details)
changes to integrate message maker into gitUpdate scripts (details)
Oops, I made 1 small error, should be consistent now. (details)
Implementing changes to the initial conditions. (details)
This commit most definitely does not change any bits, (details)
Add scripts to configure and run convergence test (details)
reorganize the scripts for convergence test simulations (details)
Moving compute_cloud_cover outside of if ( l_use_cloud_cover ) then statement, the cloud_cover and rcm_in_layer variables they compute aren't output in clubb_standalone, but are in cam, causing cam bit diff tests to break. (details)
Undoing README update, 1 space = 1 byte and we should maximize file sizes to deter hackers from stealing our data. Also BIT_CHANGING:fb4556e4cc4cb3d4b6df3520370a28a824f357ef for configs where l_use_cloud_cover = .false., which means I was wrong about this ever not being bit changing, so I must've either only tested with l_use_cloud_cover = .true. or only tested the multicol diffs when I put compute_cloud_cover inside the if statement. (details)
Adding Skthl_zm to the update host list, I missed this in the last PR. I noticed this by comparing results with and without managed memory, now I've checked BFBness with arm, mpace_b, mc3e, and gabls2. (details)
Small improvements to diff_netcdf_outputs.py, removing reliance on ncdiff, now it is entirely in python. Cleaning up linux_x86_64_nvhpc_gpu.bash, removing outdated parts, improving default parallel compilation, changing pgfortran to nvfortran. (details)
Small tweaks to fix some GPU bugs. Some variables were uninitialized on the CPU while we were saving them. This could only have been caught by comparing consecutive runs and checking _zt and _zm files, even then few cases were having problems. (details)
Fixing a labelling error in redirect_interpolated_azt_2D and similar procedures, since this interpolates to zt the input should be zm. I think this was my fault, so I cleaned all the zt2zm and zm2zt things up to make it a little nicer. Also ordered the routines _k _1D _2D to make it easier to jump around, it was a bit confusing as they were out of order and the typo really made it hard. (details)
bug fixes for the autocommit message maker code (details)
Making it so sclr_tol is set to 0 before the specified sclr_tol_in. This is so that it is initialized to 0 in the case that sclr_dim = 0, since now we are setting it to have a minimum allocation size of 1 and would otherwise have a garbage value. This is what broke the clubb_openmp_gfortran_test. (details)
This commit is a commit that changes absolutely nothing. It is meant to trigger a change in the git update scripts, so that I can start the commit message logging in the autocommit updates larson-group/sys_admin#797 (details)
Fixing an error with the autocommit_update script that was causing it new works (details)
this is another commit that changes nothing that will trigger the gitUpdate scripts (details)
Updates to make the convergence tests run on Anvil. (details)
adding an update that changes nothing and is just a test for the autoupdate script (details)
Making CLUBB's splatting scheme implicit and smoother (#1075) (details)
change to calc pressure to trigger autoupdate (details)
Editing convergence scripts to show that the directory should be placed in (details)
Committing scripts for use in running CLUBB convergence tests in the background. (details)
Updated the background run scripts for the convergence tests with a comment (details)
I updated the README to finish the section on the convergence tests. (details)
Edited the README section on CLUBB convergence tests. (details)
I added dycoms2_rf01 to the list of cases that could be run. (details)
I updated the run_cnvg_test_multi_cases_revall.csh script to include (details)
Modified run_cnvg_test_multi_cases_default.csh and (details)
Added comments to the script to explain ambiguous portions of my code (details)
GPUizing Lscale_width_vert_avg. Loops have been restructured for simplicity, and algorithm has a different starting value to avoid k dependency. Results are BFB. (#1083) (details)
GPUizing most of advance_clubb_core (#1084) (details)
advance_wp2_wp3 with explicitly managed memory (#1085) (details)
advance_xp2_xpyp with explicitly managed memory (#1086) (details)
advance_windm_edsclrm with explicitly managed memory (#1087) (details)
Moving data statements to outermost parts of clubb and a little fix in advance_wp2_wp3 (#1088) (details)
Modifying README. Last commit was BIT_CHANGING:0b0ab3d530bef06eb90bf6dde21b26eb25780214 see https://github.com/larson-group/clubb/pull/1091 for details. (details)
Loop to `nz-1` instead of `nz` because upper boundary condition (details)
Added code to subroutine run_clubb in src/clubb_driver.F90 which prints git log and git diff information to the <case>_setup.txt output file. (#1101) (details)
Adding space because commit ba3b1ef5ceaa07706f078cc61cfdac22d3c6a1c3 in the sys_admin repo fixed a bug causing tests to fail, this means we have RESOLVED:d8df4eea1e41bee8cbd3de374148241f040fc03c, which was only labelled as failing since we erroneously checked the commit range it was included in.
A different way of dealing with monotonic flux limiter spikes (#1046)
* A different way of dealing with monotonic flux limiter spikes in CAM, by increasing the value of thl_tol_mfl. Also reverts the earlier fix. BIT_CHANGING.
See https://github.com/NCAR/amwg_dev/discussions/134#discussioncomment-4165447.
* Clubb ticket #1025: Implemented changes dealing with pdf_params%thl1/2 and wp2 floating point errors occurring in tuning runs. BIT_CHANGING
- Added command-line option -t/--tuner to compile.bash which enables the -DTUNER compiler flag.
- Added line to gfortran compilation config file to easily disable openMP
- Added a couple error messages and cleaned up some instances of error handling in src/error.F90, src/clubb_driver.F90, and src/CLUBB_core/advance_clubb_core_module.F90
- Added global constant wp2_max in src/CLUBB_core/constants_clubb.F90 which sets the upper bound for wp2
- In pdf_closure, added sanity checks for pdf_params%thl1/2 (>=190K, <=1000K)
- Added debug warning in src/CLUBB_core/advance_wp2_wp3_module.F90 when wp2 is clipped.
- Added wp2_sfc clipping in src/CLUBB_core/sfc_varnce_module.F90
- Added debug_level_check to NaN check in clubb_driver.F90
- Added mention of the new compiler option to the README
I added "smooth" max clipping for invrs_tau_shear, which is a variable that is supposed to be positive definite, yet was obtaining negative values at the model lower boundary owing to linear extension at the boundaries as part of the linear interpolation call.
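The idea of a "smooth" max can be sketched as follows. This is only an illustration of the general technique, not the formula used in CLUBB: the function names `smooth_max` and `clip_positive`, the square-root form, and the `smoothing` parameter are all assumptions made for this example.

```python
import math


def smooth_max(a, b, smoothing):
    """Differentiable approximation of max(a, b).

    Hypothetical form: 0.5 * (a + b + sqrt((a - b)**2 + smoothing**2)).
    With smoothing = 0 it reduces to the ordinary max; a larger value
    rounds off the kink where a == b.
    """
    return 0.5 * (a + b + math.sqrt((a - b) ** 2 + smoothing ** 2))


def clip_positive(values, floor=0.0, smoothing=1.0e-4):
    """Apply the smooth max against a floor to every entry of a profile."""
    return [smooth_max(v, floor, smoothing) for v in values]


# A profile whose lowest level went negative, e.g. through linear
# extension at the boundary (values are made up for illustration).
profile = [-2.0e-3, 1.0e-3, 4.0e-3]
clipped = clip_positive(profile)
```

Because the square root is never smaller than |a - b|, the result is never below the ordinary max, so the clipped profile cannot fall under the floor.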
Added OpenACC related flags in linux_x86_64_nvhpc_casper.bash You can enable/disable OpenACC compilation using OPENACC=true/false. Added OpenACC directives in ADG1_pdf_driver subroutine.
Replacing old elemental ADG1_w_closure with new GPUized one. Making G_unit tests work with new version. Also making mixt_frac_max_mag a scalar since it was only being used as such. larson-group/clubb#1049
Making the multicolumn code open the netcdf file before writing and close when finished writing. This fixes a bug where the netcdf data wasn't being written.
I can now safely remove all the "ifdef E3SM" statements from CLUBB's parameters_tunable.F90. This code is now located in the clubb_intr.F90 portion of E3SM.
Restructuring and Porting of Compute_mixing_length subroutine(Phase 1) (#1052)
* Restructuring and Porting of Compute_mixing_length subroutine(Phase 1)
Restructure: compute_mixing_length is one of the most expensive routines, taking 35-50% of the total time in a single timestep. The subroutine has been restructured to push the i-loop further down to extract vectorization and parallelization. The restructuring also involves the introduction of sat_mixrat_liq_acc routines to extract parallelism when called inside an OpenACC parallel region.
Porting: OpenACC directives are inserted to port the restructured compute_mixing_length code on to the GPUs. This port is currently unoptimized and there is still room for improvement.
NOTE: Currently, only the l_sat_mixrat_lookup = false and saturation_formula = saturation_flatau (Earthworks config options) case is supported on the OpenACC build. Any other options work on CPUs as usual. OpenACC declare create directives are inserted in model_flags and constants_clubb, as these module variables are used inside the saturation routines.
* Added debug message about only supporting l_sat_mixrat_lookup = false and saturation_formula = saturation_flatau on GPUs. Answers are Bit for Bit with arm-multicolumn case + nvhpc compiler.
* Changing CLUBB debug level 1 to 0 for the saturation formula support running on GPUs
* Changing indentation to make gfortran happy, it wants ifdefs to start at the beginning of the line.
* Adding use statements for error checks and printouts, also making the errors set err_code to clubb_fatal_error.
Restructuring and Porting of Compute_mixing_length subroutine(Phase 2) (#1054)
* Restructuring and Porting of Compute_mixing_length subroutine(Phase 2)
Restructure: sat_mixrat_liq_2D_acc is now called directly instead of calling the 1D version inside the column loop. sat_mixrat_liq_2D_acc is changed from a function to a subroutine, with an output array and start_index as additional arguments. This is a workaround for passing sub-arrays: OpenACC doesn't like sub-arrays being passed and fails the validation.
Porting: OpenACC directives are added inside sat_mixrat_liq_2D_acc for porting
Validation: Answers are Bit for Bit with arm-multicolumn case + nvhpc compiler
* Fix for compilation issues
Issue 1: Missed out declaring 'start_index' while integrating the change
Issue 2: The use of error_code module and the procedures inside it causes OpenACC compilation issues when run on the device.
* Removing the sat_mixrat_liq_acc and sat_mixrat_liq_2D_acc, making the normal sat_mixrat_liq work for all current use cases, and making the other versions of sat_mixrat_liq (bolton,gfdl,lookup) functional with OPENACC.
Breaking up column loop in mono_flux_limiter. This may not be the final form for GPUization, but it's definitely a start, no longer do we have to copy single column variables to multicolumn ones anywhere. (#1051)
Adding OpenACC data directives for mixing length and adg routines
OpenACC structured data regions are added to optimize the data transfers between CPU and GPU. These data regions will be converted to unstructured data regions in a later optimization phase. Results are bit for bit.
Fixing error causing GPU code not to run. Some variables to be copied were labelled as (ngrdcol,nz) when they should be (:ngrdcol,:nz). I've just removed the data length specifiers completely since they are not necessary in general.
bias removal plot, a colored correlation matrix, a matrix of linear-based 2-point bounds on parameter perturbations, and a web-page version of the 3-dot subplots.
Removing update_pressure since it is no longer called anywhere in clubb or host models. The addition of this subroutine was discussed in larson-group/e3sm#6 and the removal of the call to it was discussed in larson-group/clubb#926.
Fixing bug, arrays given a dummy index in 0fafc6b0b1f1a6058d37bf3db4bb3708204504db are declared nsize, but are only used up to nlevels, thus we need the (1,1:nlevels) specifier when passing them. This issue was only caught by our _debug tests, so that's good evidence the new flags we added to initialize unused memory were effective.
Porting pdf_closure subroutine with OpenACC (#1059)
* Porting pdf_closure subroutine with OpenACC
OpenACC directives are added to the pdf_closure subroutine. The necessary structured data region is also added for optimizing data movement across kernels. There is an opportunity to task parallelize using streams, which will be explored in the future.
Clubb ticket #1025: Implemented way to make esa tuner reproducible, h… (#1068)
* Clubb ticket #1025: Implemented way to make esa tuner reproducible, hid error output of optional diagnostic variables behind check, fixed parallelization issue with tuner, esa max_iters parameter is now in stats namelist, fixed issue with TUNER compiler directive, some small fixes.
- New namelist variables prescribed_rand_seed and l_use_prescribed_rand_seed determine if the esa tuner will use a random or a fixed value as random seed. Added descriptions to README.
- Added max_iters to stats namelist to make it more modifiable.
- Renamed stp_adjst_intercept_in and stp_adjst_slope_in to stp_adjst_shift_in and stp_adjst_factor_in, respectively, to better reflect their influence on step size.
- The error output in src/CLUBB_core/pdf_closure_module.F90 for the diagnostic variables wprtp2, wpthlp2, wprtpthlp, and rcp2 is now hidden behind existence checks for these variables. A clarification was also added to the "#ifdef TUNER" directive.
- NetCDF file access caused the tuner to crash in parallel mode (-fopenmp flag in config file and multiple cases). Adding an $OMP CRITICAL structure around the call to stats_init in clubb_driver.F90 fixed that.
- Fixed compile/README. Config files are specified with the -c option.
- The -t option in run_scripts/run_tuner.bash interfered with the previous usage of the TUNER compiler directive. Renamed the old TUNER directive to NR_SP, short for "numerical recipes, single precision". TUNER now is the option to "turn on" code changes required to run the tuner.
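The switch between a random and a prescribed seed can be illustrated with a short Python sketch. Only the two names prescribed_rand_seed and l_use_prescribed_rand_seed come from the commit message. The helper `make_rng` and the use of Python's `random` module are assumptions for illustration and say nothing about how the Fortran tuner seeds its generator.

```python
import random


def make_rng(l_use_prescribed_rand_seed, prescribed_rand_seed):
    """Return a random number generator for a tuning run.

    With the flag set, the generator starts from the prescribed seed, so
    repeated runs draw the same sequence. Otherwise Python seeds the
    generator from system entropy and runs differ.
    """
    if l_use_prescribed_rand_seed:
        return random.Random(prescribed_rand_seed)
    return random.Random()


# Two runs with the same prescribed seed draw identical perturbations.
run_a = make_rng(True, 12345)
run_b = make_rng(True, 12345)
draws_a = [run_a.random() for _ in range(3)]
draws_b = [run_b.random() for _ in range(3)]
```

Fixing the seed is what makes a stochastic search repeatable: any difference between two runs can then be attributed to code or configuration changes rather than to the random draws.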
Implementing changes to the initial conditions. This commit contains code changes related to the modified initial conditions for convergence test simulations. These code changes can be activated by setting l_modify_ic_with_cubic_int = .true. in the namelist on a case-by-case basis.
Along with this option, the sounding profiles are also modified for the BOMEX, RICO, DYCOMS2_RF02 and Wangara cases.
-- For BOMEX, RICO, Wangara cases, we add more height levels in the original sounding profiles so that the cubic spline interpolation produces profiles consistent with those from linear interpolation
-- For DYCOMS2_RF02, instead of using the formulations in the code to derive the initial condition profiles (which results in grid-spacing dependent initial conditions when we refine the grid), we construct a sounding profile (still using the same formulas as in src/sounding.F90) on a high-resolution grid (refining the standard grid by a factor of 2^7), then save the profile in dycoms2_rf02_sounding.in. In this way, the model initialization will always read the same sounding profile when the user refines the vertical model grid.
This commit most definitely does not change any bits, but commit 67878ef was BIT_CHANGING for the DYCOMS-II RF02 family of cases, RICO (and RICO SILHS), BOMEX, and Wangara.
Add scripts to configure and run convergence test This commit contains new scripts created to configure and run convergence test simulations. There are four scripts:
1. run_scripts/run_cnvg_test_multi_cases.csh: this script is used to compile and run convergence simulations with specific configurations (see details in scripts for explanations). After the simulations, the space-time convergence plots will also be generated.
2. run_scripts/convergence_config.py: this script is called by the first script to generate the namelist file for CLUBB-SCM simulations. With this script, the modified configuration will be applied in the case run directory, while the files in default clubb will not be touched.
3. run_scripts/convergence_function.py: this script contains a function to modify the initial condition profile for convergence test simulations. It is called by run_scripts/convergence_config.py when the model is configured to use modified initial conditions.
4. run_scripts/plot_l2_convergence.py: this is a sample script to generate the space-time convergence plots.
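What a convergence plot measures can be sketched in a few lines. This is a generic illustration and is not taken from plot_l2_convergence.py: the function name `observed_order`, the error values, and the refinement ratio of 2 are assumptions.

```python
import math


def observed_order(errors, refinement_ratio=2.0):
    """Estimate the order of convergence from errors on successively refined grids.

    errors[i] is an error norm (e.g. an L2 norm against a reference
    solution) at refinement level i, where each level divides the grid
    spacing and time step by refinement_ratio.
    """
    return [
        math.log(coarse / fine) / math.log(refinement_ratio)
        for coarse, fine in zip(errors, errors[1:])
    ]


# Made-up errors that halve with each refinement (first-order behaviour).
errors = [0.08, 0.04, 0.02]
orders = observed_order(errors)
```

On a log-log plot of error against grid spacing, these estimates correspond to the slope between neighbouring points, so a slope near 1 indicates first-order convergence.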
reorganize the scripts for convergence test simulations Move the scripts associated with convergence test simulations into the folder run_scripts/convergence_run
* Fixing bug. This was only triggered when l_input_fields=.true., which I am only testing because it needs to be true so that I can test ADG2_driver.
* Removing usage of gr from pdf_closure. It was only ever used for nz, which is now fed in directly.
* Making openacc statements more consistent. Ensuring all statements on double loops have specified gang and vector, and that all parallel loops have an end parallel loop statement at the end of them. Everything BFB on CPUs and GPUs.
* Pushing acc data region to outermost parts of mixing_length.
* Removing pdf_implicit_coefs_terms from acc copyin and copyout. It is only used when iiPDF_type == iiPDF_new .or. iiPDF_type == iiPDF_new_hybrid, so we do not need to do any copying with it. The inclusion of it also caused the data statement to copy unallocated arrays, which are just garbage pointers, and that was causing random occasional crashes (either segfaults or gpu out of memory).
* The update device clauses for return variables seem to only be required for arrays contained in types. See https://github.com/larson-group/clubb/issues/1049\#issuecomment-1440624778
* Moving acc end data to end of pdf_closure. This required removing any conditional return statements that appear before the final return, since we're not allowed to branch out of an acc region early. I also moved a large printout statement outside of a loop. The only reason it was in the loop to begin with was because pdf_params used to be an array of types, but now is a type of arrays, allowing us to print the full arrays directly.
* Making loop an acc loop. If we weren't outputting w_[up/down]_in_cloud (iw_up_in_cloud <= 0 .or. iw_down_in_cloud <= 0), then these arrays were only being zeroed out on the CPU and would've gotten overwritten by the uninitialized GPU data at the end of the data statement. This change causes the arrays to get correctly zeroed out on the GPU when needed.
* Update VariableGroupNondimMoments.py
Fixed a typo
* Merging new changes from master
* Removing need for -gpu=deepcopy, pushing some acc data statements up call tree, and replacing some acc data statements with acc declare statements so that return statements can be added back in.
* Adding back an acc loop that was accidentally removed during a merge.
---------
Co-authored-by: Brian Griffin <31553422+bmg929@users.noreply.github.com>
Code changes to implement modified boundary condition This commit contains code changes to implement modified boundary conditions for convergence test simulations. These code changes can be activated by setting l_modify_bc_for_cnvg_test = .true. in the CLUBB namelist.
This code change is expected to be BIT_CHANGING for cases in which `l_predict_upwp_vpwp = T`, `l_mono_flux_lim_um = T` or `l_mono_flux_lim_vm = T`, and the monotonic flux limiter is triggered.
This bug fix prevents non-conservation of momentum when the vertical integral of either of the wind components `um` or `vm` is negative.
I should've added the answer-changing tag to my previous commit: BIT_CHANGING:903169a.
That commit doesn't really change answers, but adds a new tunable parameter which will show up in the output files and therefore they will differ from previous output files.
RESOLVED:8e473e08b858df61c5c5116e37e26f3df2431a0b Above committed on March 8th, 2023 BIT_CHANGING:5cbf4f80a34cfafd2fd164415af5ec7d6239bcdd Above was committed on March 14th, 2023
Moving compute_cloud_cover outside of if ( l_use_cloud_cover ) then statement, the cloud_cover and rcm_in_layer variables they compute aren't output in clubb_standalone, but are in cam, causing cam bit diff tests to break.
Undoing README update, 1 space = 1 byte and we should maximize file sizes to deter hackers from stealing our data. Also BIT_CHANGING:fb4556e4cc4cb3d4b6df3520370a28a824f357ef for configs where l_use_cloud_cover = .false., which means I was wrong about this ever not being bit changing, so I must've either only tested with l_use_cloud_cover = .true. or only tested the multicol diffs when I put compute_cloud_cover inside the if statement.
Adding Skthl_zm to the update host list, I missed this in the last PR. I noticed this by comparing results with and without managed memory, now I've checked BFBness with arm, mpace_b, mc3e, and gabls2.
Code changes to implement modifications on wp3 clippings This commit contains code changes to implement modifications of skewness clippings on wp3 in src/CLUBB_core/clip_explicit.F90. The default method attempts to apply smaller (larger) clippings below (above) the 100m AGL level, which can cause a discontinuity around the 100m AGL level. This clipping is found to trigger sawtooth oscillations in wp3 when linear diffusion is used. Such sawtooth oscillations are eliminated if a smoothed Heaviside function is introduced to obtain a smooth transition of clippings around the 100m AGL level. The change is necessary to obtain first-order convergence in CLUBB-SCM when linear diffusion is used.
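A smoothed Heaviside transition of this kind can be sketched as follows. The tanh form, the 20 m transition width, the function names, and the two clipping factors are assumptions chosen for illustration. The formula in clip_explicit.F90 is not given here and may differ.

```python
import math


def smooth_heaviside(z, z_transition=100.0, width=20.0):
    """Smoothed step: near 0 well below z_transition, near 1 well above it."""
    return 0.5 * (1.0 + math.tanh((z - z_transition) / width))


def blended_clip_factor(z, below, above, z_transition=100.0, width=20.0):
    """Blend the clipping factor used below the transition into the one used above."""
    h = smooth_heaviside(z, z_transition, width)
    return (1.0 - h) * below + h * above


# Hypothetical factors: a smaller clipping near the surface, a larger one aloft.
heights = [0.0, 50.0, 100.0, 150.0, 300.0]
factors = [blended_clip_factor(z, below=1.0, above=2.0) for z in heights]
```

A hard switch at 100 m would make the factor jump between adjacent grid levels. The blended version changes continuously with height, which removes the discontinuity that the commit message identifies as the trigger for the oscillations.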
This commit contains code changes to implement modifications on limiters in three places:
1. remove the limiters in denominator of equation for brunt_vaisala_freq_sqd_smth, which affects the computed eddy dissipation time scale in turbulent fluxes (wpxp). (in mixing_length.F90)
2. reduce the threshold values of limiters in the equation for richardson number (sqrt_Ri_zm) (in mixing_length.F90)
3. introduce the smoothed max/min function for limiters in equation of Cx_fnc_Richardson. (in advance_helper_module.F90).
After the modification, we also apply a zt2zm(zm2zt) smoothing on the calculated quantities. These modifications are found to be beneficial for improving solution convergence in CLUBB-SCM
The code changes are controlled by a newly introduced flag named "l_modify_limiters_for_cnvg_test", which is set to .false. (meaning that the modifications on limiters are turned off) by default.
* Adding Skthl_zm to the update host list, I missed this in the last PR. I noticed this by comparing results with and without managed memory, now I've checked BFBness with arm, mpace_b, mc3e, and gabls2.
* Small GPU fixes (#1076)
* Fixing small things that I caught by adding the default(present) onto acc loops.
* Moving default(present) to the end because it looks nicer there.
* Adding default(present) to all acc loop statements. Also adding azt to a copyin statement, which was missed previously. All BFB.
* Incremental update, not well tested yet.
* Removing some copies and making the sclr_dim change.
* Fixing a bug that only seemed detectable with astex_a209. We need to pass only single arrays to functions; calling ddzt( nz, ngrdcol, gr, rho_ds_zt * K_zt_nu ) was resulting in rho_ds_zt * K_zt_nu being evaluated on the CPU, but the values were only valid on the GPU. So we need to evaluate that expression on the GPU, save it into an array (currently K_zt_nu_tmp), then pass that to ddzt.
* GPUizing calc_turb_adv_range
* GPUizing mono_flux_limiter
* Cleaning up data statements and a couple other things.
* Updated for some different options.
* More updates needed for various options.
* Reverting accidental flag change
* Should be the final changes, all options tested now.
* Replacing some comments in monoflux limiter, and also modifying it to make it BFB on CPUs. Also changing incorrect error conditions on tridiag.
* Adding max_x_allowable to update host statement, missed previously.
* Properly naming tmp variables and variables calculated from ddzt and ddzm start with ddzt_ and ddzm_.
* Replacing constants with named ones from constants_clubb.
* Replacing hard coded numbers in lhs variables representing the number of bands they contain with fortran parameters.
Small improvements to diff_netcdf_outputs.py, removing reliance on ncdiff, now it is entirely in python. Cleaning up linux_x86_64_nvhpc_gpu.bash, removing outdated parts, improving default parallel compilation, changing pgfortran to nvfortran.
Small tweaks to fix some GPU bugs. Some variables were uninitialized on the CPU while we were saving them. This could only have been caught by comparing consecutive runs and checking _zt and _zm files, even then few cases were having problems.
Fixing a labelling error in redirect_interpolated_azt_2D and similar procedures, since this interpolates to zt the input should be zm. I think this was my fault, so I cleaned all the zt2zm and zm2zt things up to make it a little nicer. Also ordered the routines _k _1D _2D to make it easier to jump around, it was a bit confusing as they were out of order and the typo really made it hard.
Making it so sclr_tol is set to 0 before the specified sclr_tol_in. This is so that it is initialized to 0 in the case that sclr_dim = 0, since now we are setting it to have a minimum allocation size of 1 and would otherwise have a garbage value. This is what broke the clubb_openmp_gfortran_test.
* Making 2 new functions zm2zt2zm and zt2zm2zt to handle smoothing by interpolation. Replaced the spots in clubb that I know use this to smooth things. This is just a nice to have and could allow for easy optimizations in the future by inlining the interpolations. All cases BFB on CPU and GPU, checked all relevant options too.
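Smoothing by interpolating to the other grid and back can be sketched on a simplified grid. The sketch below assumes a uniform staggered grid in which each thermodynamic (zt) level sits midway between two momentum (zm) levels, with linear extension at the two ends. The Python functions only borrow the names zm2zt, zt2zm and zm2zt2zm, and are not a port of the CLUBB routines.

```python
def zm2zt(field_zm):
    """Average neighbouring momentum levels onto the zt levels between them."""
    return [0.5 * (lo + hi) for lo, hi in zip(field_zm, field_zm[1:])]


def zt2zm(field_zt):
    """Interpolate zt levels back to zm levels (needs at least two zt values).

    Interior levels are midpoint averages. The two boundary levels lie half
    a grid spacing outside the zt range and are filled by linear extension.
    """
    interior = [0.5 * (lo + hi) for lo, hi in zip(field_zt, field_zt[1:])]
    bottom = 1.5 * field_zt[0] - 0.5 * field_zt[1]
    top = 1.5 * field_zt[-1] - 0.5 * field_zt[-2]
    return [bottom] + interior + [top]


def zm2zt2zm(field_zm):
    """Smooth a momentum-level field by a round trip through the zt grid."""
    return zt2zm(zm2zt(field_zm))


linear = zm2zt2zm([0.0, 1.0, 2.0, 3.0, 4.0])    # a linear profile
sawtooth = zm2zt2zm([0.0, 1.0, 0.0, 1.0, 0.0])  # grid-scale noise
```

Under these assumptions the round trip leaves a linear profile unchanged and flattens a grid-scale sawtooth, which is why the round trip acts as a smoother.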
* GPUizing diagnose_Lscale_from_tau
* Removing some unused variables.
* Moving acc data statements from calc_Lscale_directly up to advance_clubb_core.
* Removing an unused variable.
* GPUizing the l_smooth_min_max option.
* GPUizing l_avg_Lscale
* Changes to variable names to avoid gross long names only used once.
* GPUizing pvertinterp even though I don't think we care about the l_do_expldiff_rtm_thlm flag
* Fixing bug. Setting l_do_expldiff_rtm_thlm causes us to use edsclrm, so we need to also ensure that edsclrm > 1 (1 because it uses a edsclr_dim-1 index)
This commit is a commit that changes absolutely nothing. It is meant to trigger a change in the git update scripts, so that I can start the commit message logging in the autocommit updates larson-group/sys_admin#797
* BIT_CHANGING:3b086a40085284aa49c71d32c001d20153a8ddb4 the last commit is bit changing for only some cases and only when using higher than -O2 optimization. uf min seems to be the first calculation that starts to differ bitwise. Using the check_multicol script confirms the differences are small.
* Adding a tweak to surface values in the extra columns. This helped me check calc_sfc_varnce, since we were not changing any arrays that would've affected calculations there.
* Small optimization, making wstar and ustar2 scalars.
* GPUizing calc_sfc_varnce
* Removing conditional around some stats calls. Now we will always save sfc values to stats, because this will change stats files when gr%zm(i,1) > sfc_elevation, this is potentially BIT_CHANGING.
* Merging with latest clubb changes and making work on GPUs again.
This contained 2 commits that are BIT_CHANGING in some situations.
Editing convergence scripts to show that the directory should be placed in scratch space, where there is plentiful room to run, given the size of the output files.
GPUizing Lscale_width_vert_avg. Loops have been restructured for simplicity, and algorithm has a different starting value to avoid k dependency. Results are BFB. (#1083)
Modifying README. Last commit was BIT_CHANGING:0b0ab3d530bef06eb90bf6dde21b26eb25780214 see https://github.com/larson-group/clubb/pull/1091 for details.
* Changing acc declare statements to acc enter data statement
* Making acc statements more consistent
* Making lapack useable while using openacc. Lapack is still run on the CPU
* Updating setup_clubb_core to now accept clubb_config_flags as an input, and adding warning in case clubb is running with lapack but was compiled with openacc.
* Making all end parallel directives specify end loop
* Replacing last acc declare in a procedure with acc enter/exit data commands
* Removing pure declarations, turns out they don't really improve performance and openmp isn't allowed within them.
* Adding explicit directives to penta_lu and tridiag_lu, as opposed to the previously used kernels directive. Also splitting up a loop to improve GPU performance.
* Splitting up a couple loops for performance reasons.
* Moving functions called inside loops to their own 2D subroutines. This is for performance, but is also BIT_CHANGING.
* Correct range of values to calculate for term_dp1_lhs
* Comments and cleanup
* Slight change to Skx_func, this is mathematically equivalent but BIT_CHANGING. This is faster on GPUs, and doesn't seem to have a significant impact on CPU performance.
* Introducing wp_coef and wp_coef_zt to reduce needed computations. This changes order of operations, so it is BIT_CHANGING.
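Why a mathematically equivalent rewrite can still be BIT_CHANGING comes down to floating-point addition and multiplication not being associative. The sketch below is a generic double-precision illustration in Python with arbitrarily chosen values. It is not taken from Skx_func or the wp_coef code.

```python
# The same three numbers summed with two different groupings.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # grouping as originally written
regrouped = a + (b + c)       # same sum after reordering the operations

# Equal on paper, but each intermediate result is rounded to the nearest
# double, so the two groupings can differ in the last bit.
bitwise_identical = left_to_right == regrouped
difference = abs(left_to_right - regrouped)

print(bitwise_identical)  # False
print(difference)         # on the order of 1e-16
```

A difference this small is far below any physical significance, yet it is enough to make a bit-for-bit comparison of output files fail, which is why such commits carry the BIT_CHANGING tag.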
BIT_CHANGING! Added e-folding code for mixed Brunt Vaisala frequency - CLUBB ticket #1069
- Included usage for bv_efold in src/CLUBB_core/advance_helper_module.F90
- Added intent(out) for correction_stability in subroutine calc_correction_stability
- Removed brunt_vaisala_freq_sqd_plus
- Idiot proofed interaction between l_diag_Lscale_from_tau and l_use_invrs_tau_N2_iso
- Moved stat_update_var(invrs_tau_wp3_zm) out of if ( l_diag_Lscale_from_tau )
- Fixed some typos in src/CLUBB_core/stats_zt_module.F90
- Added comment to input_misc/tuner/README about t_variables and input/stats/tuning_stats.in