Adding a git clone step to the Cam Clubb Copy test.
Trying out the git command rather than checkout for cloning.
Adding the correct scm command and removing the redundant run command from the end.
Adding space because commit 26b7e8f8d7007f78d655b8fc15045015ce8d4790 fixed the Jenkins tests that were failing, but did not include the proper resolving messages. RESOLVED:7046396a59f73087323f04b267cd8614ff1653f2 RESOLVED:30559bb00c20ce8a8c21242272f6d16cecb12b45 RESOLVED:970b880d9144efe00ce9d6fa34bcd49a493ac65b RESOLVED:633e158c8e6d5818b39963e6836ef71c1489faf9 RESOLVED:4ae233813e39a8bc7e3b58e600c0f71103fff8b3
Renamed smth_range to heaviside_smth_range.
Added tot_vartn_normlzd statistics. Renamed sclr in advance_helper_module to scalar to be more consistent with clubb naming schemes.
Implemented three further normalized variation stats. Included a (most likely temporary) check because, in a few cases, the denominator for normalization would be 0.
Changed priorities of total normalized variation stats; included error handling in total normalized variation stats.
Renamed pdf_output_filename. Added grid_level constant to avoid magic numbers in stats_update_var_pt calls.
Bugfix for merge #1000. stderr was not imported in stat_clubb_utilities.F90, causing compilation to fail.
Added ability to apply smooth min/max functions in mixing_length.F90.
Updated deprecated documentation of smooth min and max functions in advance_helper_module.F90. https://github.com/larson-group/clubb/issues/965
Checked whether results really are identical even with round-off when we have l_smooth_min_max=T and smooth_min_max_mag=zero. Next commit rolls some of these changes back for merge into master. https://github.com/larson-group/clubb/issues/965
Added test cases to smooth_min_max_tests.F90 and updated documentation.
Constructing rcm within SILHS (as rcm_pdf) (#1011)
Updating for changes to CLUBB. See https://github.com/larson-group/clubb/commit/e4f125ba067ba8083f917e0e06b6b2398483d3e4.
The G_unit tests never allocated pdf_implicit_coefs_terms. It's unclear to me how these were working before, but it seems that something about making these 2D allocatable arrays exposed the bug.
Making zt2zm calls with pdf_implicit_coefs_terms use the 2D version.
Removing setup_grid and setup_parameters functionality from setup_clubb_core. This is because in host models the required grid information may not be known during the setup process, resulting in dummy arguments for setup_clubb_core and the grid and parameters being set up during runtime anyway. Now, in the host models we can call these subroutines immediately after setup_clubb_core to maintain identical functionality (sam, wrf), and in others we can wait to call these until the main timestepping procedure (cam, e3sm).
Making compatible with latest clubb change.
Making 2D versions of setup_grid and setup_parameters.
Now using 2D versions of setup_grid and setup_parameters.
Making lmin a scalar again; it was a mistake to make it an array, since it can only take on one value.
Making compatible with latest clubb change.
Making nu_vert_res_dep a type containing arrays, as opposed to being an array of types.
Making compatible with latest clubb change.
Making silhs use the new 2D version of setup_grid. This also inadvertently fixes a bug added in commit c816bcbd4bb8b8cfa89aa9c0f976ee799cb795e2. The issue was that we used pver when we wanted pverp.
Changing NETCDF path in jenkins test command.
Modifying shell commands to use the more widely recognized form. |& was expected to be understood by bash version >= 3.2.25, but that does not seem to be the case (the |& operator was only added in bash 4.0), so we're reverting it to the equivalent 2>&1 |
Changing the cam_global_ERP_Ln9_gfortran_test jenkins test to specify the computer when configuring.
The big grid change. Converting gr from being an array of types containing 1D arrays, to a type containing 2D arrays. All cases BFB, cam multicolumn+silhs BFB, and cam multicolumn (no silhs) with backwards compatible settings BFB.
Making compatible with latest clubb change. update_xp2_mc will be broken by this, but all we need to do to fix it is push the column loop into it.
Removing the zt2zm interface from clubb_api and making the api calls just redirect to the grid class interface for it.
Pushing column loop into mean advection procedures.
Making linear_interpolated_azt_2D and linear_interpolated_azm_2D subroutines just to avoid a needless data copy.
Making update_xp2_mc 2D and creating interface for 1D calls.
Making use of new 2D call to update_xp2_mc.
Declaring local variables with state%ncol rather than pcols. This avoids occasional slicing and other complications. Also making everything in clubb_tend_cam follow the same indentation scheme.
Added ulimit tag to arm97 case of scam test to avoid stack overflow errors. larson-group/sys_admin#781
Adding run script for gpu runs without silhs. The only thing making this a GPU script right now is that it's set for casper using nvhpc with acc flags, but there might be more settings in the future we want to add here.
Updating compiler and machine scripts to include the latest changes in the cime repo, while also keeping our custom configurations.
Adding compiler options needed to compile with the latest GNU version. These were accidentally removed and caused tests to fail. RESOLVED:f98790fed7b9283cce7e9981d7e0c2be7fe8e3fb
Fixing a bug in mono_flux_limiter.F90. (#1026)
I added stats output for w_down_in_cloud to all_stats.in.
Updating for the latest changes to CLUBB.
I altered the w_up_in_cloud and w_down_in_cloud code so that a
I have optimized the new w_up_in_cloud and w_down_in_cloud code by
Refactored fill_holes_vertical to make GPUization simple. This is BIT_CHANGING, but results are bit-for-bit when using -O0 optimization, thus it is not answer changing. The first pass over each grid column will not parallelize well; the k-loop needs to be done in serial. Maximum parallelization has been exposed for the global hole-filling though, at the cost of occasionally doing unnecessary calculations. larson-group/clubb#972.
Removing fill_holes_multiplicative and replacing magic numbers with parameters from constants_clubb. larson-group/clubb#972
Moving vertical_avg and vertical_integral to advance_helper_module. larson-group/clubb#972
Removing elementalness from sat_vapor_press_liq and making internal procedures subroutines rather than functions to prevent unnecessary data copies. Doing the same for thlm2T_in_K since it is often used in conjunction with sat_vapor_press_liq. Bit-for-bit confirmed with O0 using all single column cases with or without l_diag_Lscale_from_tau, and with cam_coarse_res. larson-group/clubb#972
Cleaning up new subroutine calc_liquid_cloud_frac_component, and making sat_mixrat_ice a subroutine that works the same way as sat_mixrat_liq. larson-group/clubb#972
Removing these parentheses is BIT_CHANGING since it modifies the order of operations, but allows the multiplication and subtraction to be done in parallel in complex pipelines.
Adding line to indicate BIT_CHANGING:0f8ff1baa73dd122911de7978d0067ad1fcc348b. This is because of actual bit changing commits, and is expected. See https://github.com/larson-group/clubb/pull/1034.
Improvements. Netcdf output is now functional and we can detect errors with multiple columns even when the standard output is identical.
Adds commented-out line that prevents the monotonic flux
Updating monotonic flux limiter code to remove spikes. (#1038)
Creating new flags to control monotonic flux limiter (#1039)
Updating for changes to CLUBB. See https://github.com/larson-group/clubb/pull/1039.
I am adding cloudy_updraft_frac and cloudy_downdraft_frac as
Modified for changes to advance_clubb_core_api.
Minor adjustment to eliminate spikes in thlm tendencies from the monotonic flux limiter. (#1043)
Adding line to indicate BIT_CHANGING:5abaa74c4daad3b827fff70359563703a9152d1b. This is because of actual bit changing commits, and is expected.
Adding space to file since previous commit was intentionally BIT_CHANGING:a466f958c140acbd260290c59213a96f3c049793.
Adding capability to change matrix solving method via clubb_config_flags.
Fixing a small bug: we need to pass _copy arrays to prevent LAPACK from overwriting the real ones.
Making GPU and CPU versions of the penta_lu solver the same as discussed in larson-group/clubb#1024.
Making compatible with latest clubb change.
I added "smooth" max clipping for invrs_tau_shear, which is a variable
ADG1_pdf_driver subroutine port with OpenACC
Replacing old elemental ADG1_w_closure with new GPUized one. Making G_unit tests work with new version. Also making mixt_frac_max_mag a scalar since it was only being used as such. larson-group/clubb#1049
I can now safely remove all the "ifdef E3SM" statements from CLUBB's
Restructuring and Porting of Compute_mixing_length subroutine (Phase 1) (#1052)
Restructuring and Porting of Compute_mixing_length subroutine (Phase 2) (#1054)
Breaking up column loop in mono_flux_limiter. This may not be the final form for GPUization, but it's definitely a start; we no longer have to copy single column variables to multicolumn ones anywhere. (#1051)
Adding OpenACC data directives for mixing length and adg routines
Fixing error causing GPU code not to run. Some variables to be copied were labelled as (ngrdcol,nz) when it should be (:ngrdcol,:nz). I've just removed the data length specifiers completely since they are not necessary in general.
Removing update_pressure since it is no longer called anywhere in clubb or host models. The addition of this subroutine was discussed in larson-group/e3sm#6 and the removal of the call to it was discussed in larson-group/clubb#926.
Removing update_pressure from public list. This was causing compilation crashes. RESOLVED:8c7230fecb877d04fb129ef5e143e0993b4b29b1
Moving compute_cloud_cover outside of the if ( l_use_cloud_cover ) then statement. The cloud_cover and rcm_in_layer variables it computes aren't output in clubb_standalone, but are in cam, causing cam bit diff tests to break.
Adding Skthl_zm to the update host list; I missed this in the last PR. I noticed this by comparing results with and without managed memory. I've now checked BFBness with arm, mpace_b, mc3e, and gabls2.
Small tweaks to fix some GPU bugs. Some variables were uninitialized on the CPU while we were saving them. This could only have been caught by comparing consecutive runs and checking _zt and _zm files, and even then, few cases had problems.
Fixing a labelling error in redirect_interpolated_azt_2D and similar procedures: since this interpolates to zt, the input should be zm. I think this was my fault, so I cleaned up all the zt2zm and zm2zt code to make it a little nicer. Also ordered the routines _k, _1D, _2D to make it easier to jump around; it was a bit confusing because they were out of order, and the typo made it really hard to follow.
Making it so sclr_tol is set to 0 before being set to the specified sclr_tol_in. This ensures it is initialized to 0 when sclr_dim = 0: it now has a minimum allocation size of 1 and would otherwise hold a garbage value. This is what broke the clubb_openmp_gfortran_test.
This commit is a commit that changes absolutely nothing. It is meant to trigger a change in the git update scripts, so that I can start the commit message logging in the autocommit updates larson-group/sys_admin#797
This is another commit that changes nothing; it will trigger the gitUpdate scripts.
Adding an update that changes nothing and is just a test for the autoupdate script.
Making CLUBB's splatting scheme implicit and smoother (#1075)
Change to calc pressure to trigger autoupdate.
GPUizing Lscale_width_vert_avg. Loops have been restructured for simplicity, and algorithm has a different starting value to avoid k dependency. Results are BFB. (#1083)
GPUizing most of advance_clubb_core (#1084)
advance_wp2_wp3 with explicitly managed memory (#1085)
advance_xp2_xpyp with explicitly managed memory (#1086)
advance_windm_edsclrm with explicitly managed memory (#1087)
Moving data statements to the outermost parts of clubb and a little fix in advance_wp2_wp3 (#1088)
Adding space because commit 26b7e8f8d7007f78d655b8fc15045015ce8d4790 fixed the Jenkins tests that were failing, but did not include the proper resolving messages. RESOLVED:7046396a59f73087323f04b267cd8614ff1653f2 RESOLVED:30559bb00c20ce8a8c21242272f6d16cecb12b45 RESOLVED:970b880d9144efe00ce9d6fa34bcd49a493ac65b RESOLVED:633e158c8e6d5818b39963e6836ef71c1489faf9 RESOLVED:4ae233813e39a8bc7e3b58e600c0f71103fff8b3
* Pushing column loop into xm_wpxp_clipping_and_stats and monotonic_turbulent_flux_limit. This essentially completes advance_xm_wpxp for now. larson-group/clubb#972
* Replacing the i loop iterator for scalars with sclr.
* Pushing loop into advance_xp2_wpxp.
* Moving the lhs_dp1 calculation to outside of xp2_xpyp_lhs. This gives us more flexibility, since we want to pass lhs_dp1 into xp2_xpyp_implicit_stats.
* Moving the lmm_stepping and stats calls to immediately after calls to xp2_xpyp_solve. This is because xp2_xpyp_implicit_stats saves quantities held in scratch variables, and these scratch variables are set by the lhs and rhs setup calls. So for the multiple-lhs routine, we need to save the scratch variables immediately after these calls, before they are overwritten by calling the lhs and rhs setup again for a different variable. Since the stats also save the variable we're solving for, we also have to move the lmm_stepping to before this call.
* Removing the need for stats scratch variables by passing the lhs terms to save through argument lists. This makes everything better in every way.
* Adding slicing for some lhs arrays being saved in stats. This was causing bit changing in a few lhs terms when run in multicolumn.
* Changing names of dp1 variables for up2 and vp2, and moving the code that sets them inside an l_stats_samp block.
* Breaking up column loop in advance_xp2_xpyp and pushing column loop into calc_xp2_xpyp_ta_terms. larson-group/cam#972
* Breaking up column loop in calc_xp2_xpyp_ta_terms.
* Pushing loop into solve_xp2_xpyp_with_single_lhs.
Implemented three further normalized variation stats. Included a (most likely temporary) check because, in a few cases, the denominator for normalization would be 0.
* Pushing loop down through advance_windm. Other files needed to be touched because they were using a simple function, xpwp_func, which has been replaced in relevant places by the few lines of code it takes to do.
* Cleaned up intents
* Moving xpwp calculations back to procedure, but making procedure a 1/2D interface in advance helper. Other various small tweaks as well.
Checked whether results really are identical even with round-off when we have l_smooth_min_max=T and smooth_min_max_mag=zero. Next commit rolls some of these changes back for merge into master. https://github.com/larson-group/clubb/issues/965
Constructing rcm within SILHS (as rcm_pdf) (#1011)
* Constructing rcm within SILHS (as rcm_pdf) to avoid having to pass rcm directly to SILHS. Relevant to possibly moving PDF call placement into the "post" position in CAM.
See https://github.com/ESCOMP/CAM/issues/582.
* Reconstructing rcm inside of setup_pdf_parameters instead of passing it through the argument lists. Might help NCAR move the PDF call placement to "post" position in CAM.
See https://github.com/larson-group/clubb/issues/997.
but the following commits are BIT_CHANGING:29e08c789aef09db2b76a418d3ab4c35bfc50a04 BIT_CHANGING:741338adec67b4d8d6087af4c8c3e2206692dc04 BIT_CHANGING:ed63800262913a0c130c80cd16067e7d68548613 for simulations that include silhs
The G_unit tests never allocated pdf_implicit_coefs_terms. It's unclear to me how these were working before, but it seems that something about making these 2D allocatable arrays exposed the bug.
Removing setup_grid and setup_parameters functionality from setup_clubb_core. This is because in host models the required grid information may not be known during the setup process, resulting in dummy arguments for setup_clubb_core and the grid and parameters being set up during runtime anyway. Now, in the host models we can call these subroutines immediately after setup_clubb_core to maintain identical functionality (sam, wrf), and in others we can wait to call these until the main timestepping procedure (cam, e3sm).
Making silhs use the new 2D version of setup_grid. This also inadvertently fixes a bug added in commit c816bcbd4bb8b8cfa89aa9c0f976ee799cb795e2. The issue was that we used pver when we wanted pverp.
Modifying shell commands to use the more widely recognized form. |& was expected to be understood by bash version >= 3.2.25, but that does not seem to be the case (the |& operator was only added in bash 4.0), so we're reverting it to the equivalent 2>&1 |
The big grid change. Converting gr from being an array of types containing 1D arrays, to a type containing 2D arrays. All cases BFB, cam multicolumn+silhs BFB, and cam multicolumn (no silhs) with backwards compatible settings BFB.
Declaring local variables with state%ncol rather than pcols. This avoids occasional slicing and other complications. Also making everything in clubb_tend_cam follow the same indentation scheme.
Adding run script for gpu runs without silhs. The only thing making this a GPU script right now is that it's set for casper using nvhpc with acc flags, but there might be more settings in the future we want to add here.
Adding compiler options needed to compile with the latest GNU version. These were accidentally removed and caused tests to fail. RESOLVED:f98790fed7b9283cce7e9981d7e0c2be7fe8e3fb
1) The denominator term is now the "cloudy updraft" frac in each PDF component, rather than just the PDF component cloud_frac. This is more consistent with the quantity being integrated.
2) I also added a w_down_in_cloud for cloudy downdraft velocity.
Since these fields are not output as part of standard stats, this commit will be bit-for-bit for the normal CLUBB output files.
I altered the w_up_in_cloud and w_down_in_cloud code so that thresholding is used, similar to what is seen in the cloud fraction and cloud water code. If the PDF component mean of w is more than the maximum allowed number of standard deviations away from 0, the PDF component is either all-updrafty or all-downdrafty, and the code avoids expensive computations where large-magnitude values can potentially be fed into ERF or EXP.
Of course, the results are not bit-for-bit with the previous version, meaning that there are some situations where these thresholds come into effect. However, plots of all cases show no visible differences in w_up_in_cloud and w_down_in_cloud.
Since w_up_in_cloud and w_down_in_cloud are not included in normal stats output, this commit does not change the bit-for-bitness of CLUBB code.
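The thresholding described above can be illustrated for a single Gaussian PDF component of w. The Python sketch below is hypothetical: the function name `updraft_fraction_and_mean` and the cutoff `max_num_stdevs = 6.0` are assumptions, and the in-cloud weighting used by the real w_up_in_cloud calculation is left out. It returns the updraft fraction P(w > 0) and the mean w within the updraft, and skips the ERF/EXP evaluations entirely when the component mean lies more than `max_num_stdevs` standard deviations from 0.

```python
import math
from typing import Tuple


def updraft_fraction_and_mean(mu: float, sigma: float,
                              max_num_stdevs: float = 6.0) -> Tuple[float, float]:
    """Return (P(w > 0), mean of w over the updraft part) for w ~ N(mu, sigma**2).

    Hypothetical helper for illustration only: max_num_stdevs is an assumed
    cutoff, and the cloud weighting used by w_up_in_cloud is omitted.
    """
    # Threshold: the component lies almost entirely above 0, so treat it as
    # all updraft and skip ERF/EXP (also covers sigma == 0 with mu > 0).
    if mu > max_num_stdevs * sigma:
        return 1.0, mu
    # Threshold: the component lies almost entirely below 0 (or has no
    # spread), so there is no updraft part.
    if sigma <= 0.0 or mu < -max_num_stdevs * sigma:
        return 0.0, 0.0

    t = mu / sigma                                             # standardized mean
    frac = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))          # Phi(t) = P(w > 0)
    phi_t = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)  # phi(t)
    partial_moment = mu * frac + sigma * phi_t                 # E[w; w > 0]
    return frac, partial_moment / frac


print(updraft_fraction_and_mean(0.0, 1.0))   # symmetric component
print(updraft_fraction_and_mean(10.0, 1.0))  # (1.0, 10.0): threshold branch
```

Between the cutoffs, the result uses the standard Gaussian partial moment E[w; w > 0] = mu * Phi(mu/sigma) + sigma * phi(mu/sigma). In the all-updraft branch the sketch returns exactly the component mean, which the full formula only approaches asymptotically; a tiny jump like that is one way such a threshold can make results differ from the unthresholded code without any visible change in plots.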
I have optimized the new w_up_in_cloud and w_down_in_cloud code by only doing repeated operations one time and then saving them as a local variable.
Since the numerical order of operations changes for the argument to the EXP term, the results will differ at the level of numerical round-off, making this revision not bit-for-bit with the last revision. This only affects the w_up_in_cloud and w_down_in_cloud statistical output variables, which in turn are not output as part of standard_stats.in.
Refactored fill_holes_vertical to make GPUization simple. This is BIT_CHANGING, but results are bit-for-bit when using -O0 optimization, thus it is not answer changing. The first pass over each grid column will not parallelize well; the k-loop needs to be done in serial. Maximum parallelization has been exposed for the global hole-filling though, at the cost of occasionally doing unnecessary calculations. larson-group/clubb#972.
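As a rough illustration of why a global hole-filling step parallelizes well while a per-column first pass does not, here is a simplified Python sketch of global, mass-conserving hole filling. It is not CLUBB's fill_holes_vertical algorithm: the function name `fill_holes_global`, the single proportional scaling factor, and the generic per-level `weights` (standing in for something like density times layer thickness) are all assumptions. Values below `threshold` are raised to it, and the excess above `threshold` at every other level is scaled down so the weighted column integral is unchanged.

```python
from typing import List


def fill_holes_global(x: List[float], weights: List[float],
                      threshold: float) -> List[float]:
    """Hypothetical global hole filler (not CLUBB's fill_holes_vertical).

    Raises values below `threshold` up to it, then scales the excess above
    `threshold` at every level by one common factor so that the weighted
    column integral sum(weights[k] * x[k]) is unchanged.
    """
    clipped = [max(v, threshold) for v in x]

    # Reduction 1: amount added to the column by filling the holes.
    added = sum(w * (c - v) for w, c, v in zip(weights, clipped, x))
    if added == 0.0:
        return list(x)  # no holes in this column

    # Reduction 2: amount available above the threshold to take it from.
    available = sum(w * (c - threshold) for w, c in zip(weights, clipped))
    if available < added:
        raise ValueError("column integral is too small to fill the holes")

    factor = 1.0 - added / available
    # Each level depends only on its own value and the two sums, so this
    # loop has no k dependency and could run fully in parallel.
    return [threshold + (c - threshold) * factor for c in clipped]


# Level 1 is a hole; the weighted integral (1.5) is preserved.
print(fill_holes_global([1.0, -0.5, 2.0], [0.5, 2.0, 1.0], 0.0))
```

Apart from the two sums, which are reductions, every level is updated independently, so the update loop carries no k dependency.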
Removing elementalness from sat_vapor_press_liq and making internal procedures subroutines rather than functions to prevent unnecessary data copies. Doing the same for thlm2T_in_K since it is often used in conjunction with sat_vapor_press_liq. Bit-for-bit confirmed with O0 using all single column cases with or without l_diag_Lscale_from_tau, and with cam_coarse_res. larson-group/clubb#972
Cleaning up new subroutine calc_liquid_cloud_frac_component, and making sat_mixrat_ice a subroutine that works the same way as sat_mixrat_liq. larson-group/clubb#972
Removing these parentheses is BIT_CHANGING since it modifies the order of operations, but allows the multiplication and subtraction to be done in parallel in complex pipelines.
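Why regrouping operations is BIT_CHANGING: floating-point arithmetic is not associative, so grouping the same operands differently can change the last bit of the result. The Python sketch below uses illustrative values (not taken from CLUBB) to show this for addition in IEEE 754 double precision; the same rounding behavior applies to double-precision Fortran.

```python
# Illustrative operands only; any values whose intermediate sums round
# differently would show the same effect.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # a + b is rounded first
right = a + (b + c)   # b + c is rounded first

print(repr(left))     # 0.6000000000000001
print(repr(right))    # 0.6
print(left == right)  # False
```

The two results differ by one unit in the last place, which is the kind of difference these commits flag as BIT_CHANGING while still expecting answers to be effectively unchanged.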
Adding line to indicate BIT_CHANGING:0f8ff1baa73dd122911de7978d0067ad1fcc348b. This is because of actual bit changing commits, and is expected. See https://github.com/larson-group/clubb/pull/1034.
I am adding cloudy_updraft_frac and cloudy_downdraft_frac as outputs to the calculate_w_up_in_cloud code.
These fields are non-interactive for the standard set of cases. Thus, all cases are bit-for-bit identical.
However, for the w_up_in_cloud and w_down_in_cloud results themselves, results might not be bit-for-bit since the location of the max(eps, ...) clipping in the denominator is changed. However, results should not be appreciably different.
A different way of dealing with monotonic flux limiter spikes (#1046)
* A different way of dealing with monotonic flux limiter spikes in CAM, by increasing the value of thl_tol_mfl. Also reverts the earlier fix. BIT_CHANGING.
See https://github.com/NCAR/amwg_dev/discussions/134#discussioncomment-4165447.
* Clubb ticket #1025: Implemented changes dealing with pdf_params%thl1/2 and wp2 floating point errors occurring in tuning runs. BIT_CHANGING
  - Added command-line option -t/--tuner to compile.bash, which enables the -DTUNER compiler flag.
  - Added a line to the gfortran compilation config file to easily disable OpenMP.
  - Added a couple of error messages and cleaned up some instances of error handling in src/error.F90, src/clubb_driver.F90, and src/CLUBB_core/advance_clubb_core_module.F90.
  - Added global constant wp2_max in src/CLUBB_core/constants_clubb.F90, which sets the upper bound for wp2.
  - In pdf_closure, added sanity checks for pdf_params%thl1/2 (>=190K, <=1000K).
  - Added debug warning in src/CLUBB_core/advance_wp2_wp3_module.F90 when wp2 is clipped.
  - Added wp2_sfc clipping in src/CLUBB_core/sfc_varnce_module.F90.
  - Added debug_level_check to NaN check in clubb_driver.F90.
  - Added mention of the new compiler option to the README.
I added "smooth" max clipping for invrs_tau_shear, which is a variable that is supposed to be positive definite, yet was obtaining negative values at the model lower boundary owing to linear extension at the boundaries as part of the linear interpolation call.
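The smooth max clipping mentioned above can be sketched as follows. The functional form used here, 0.5 * ((a + b) ± sqrt((a - b)**2 + smth_mag**2)), is one common differentiable approximation of max/min and is an assumption: it has not been checked against the smooth_max/smooth_min routines in advance_helper_module.F90, and the argument name `smth_mag` is hypothetical. Because sqrt((a - b)**2 + smth_mag**2) >= |a - b|, smooth_max(a, b, smth_mag) >= max(a, b) up to rounding, so clipping with `smooth_max(x, 0.0, smth_mag)` keeps a positive-definite quantity at or above 0 without the kink of a hard max. With `smth_mag = 0` the expression reduces algebraically to the ordinary max/min.

```python
import math


def smooth_max(a: float, b: float, smth_mag: float) -> float:
    """Differentiable stand-in for max(a, b); the formula is an assumed form."""
    # sqrt((a - b)**2 + smth_mag**2) >= |a - b|, so the result is >= max(a, b)
    # mathematically; a larger smth_mag rounds off the corner at a == b more.
    return 0.5 * ((a + b) + math.sqrt((a - b) ** 2 + smth_mag ** 2))


def smooth_min(a: float, b: float, smth_mag: float) -> float:
    """Differentiable stand-in for min(a, b); the formula is an assumed form."""
    return 0.5 * ((a + b) - math.sqrt((a - b) ** 2 + smth_mag ** 2))


# Clipping a negative value of a quantity that should be positive
# definite, e.g. after linear extension at a boundary.
print(smooth_max(-1.0, 0.0, 0.1))  # small positive number instead of -1.0
print(smooth_max(1.0, 2.0, 0.0))   # 2.0: smth_mag = 0 gives the plain max
```

For example, a value of -1.0 with smth_mag = 0.1 is mapped to about 0.0025 rather than to exactly 0, so the clipped field stays smooth where the clipping switches on.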
Added OpenACC-related flags in linux_x86_64_nvhpc_casper.bash. You can enable/disable OpenACC compilation using OPENACC=true/false. Added OpenACC directives in ADG1_pdf_driver subroutine.
Replacing old elemental ADG1_w_closure with new GPUized one. Making G_unit tests work with new version. Also making mixt_frac_max_mag a scalar since it was only being used as such. larson-group/clubb#1049
I can now safely remove all the "ifdef E3SM" statements from CLUBB's parameters_tunable.F90. This code is now located in the clubb_intr.F90 portion of E3SM.
Restructuring and Porting of Compute_mixing_length subroutine (Phase 1) (#1052)
* Restructuring and Porting of Compute_mixing_length subroutine (Phase 1)
Restructure: compute_mixing_length is one of the most expensive routines, taking 35-50% of the total time in a single timestep. The subroutine has been restructured to push the i-loop further down to expose vectorization and parallelism. The restructuring also introduces sat_mixrat_liq_acc routines to extract parallelism when called inside an OpenACC parallel region.
Porting: OpenACC directives are inserted to port the restructured compute_mixing_length code on to the GPUs. This port is currently unoptimized and there is still room for improvement.
NOTE: Currently, only the l_sat_mixrat_lookup = false and saturation_formula = saturation_flatau (Earthworks config options) case is supported in the OpenACC build. Any other options work on CPUs as usual. OpenACC declare create directives are inserted in model_flags and constants_clubb, as these module variables are used inside the saturation routines.
* Added debug message about only supporting l_sat_mixrat_lookup = false and saturation_formula = saturation_flatau on GPUs. Answers are Bit for Bit with arm-multicolumn case + nvhpc compiler.
* Changing CLUBB debug level 1 to 0 for the saturation formula support running on GPUs
* Changing indentation to make gfortran happy, it wants ifdefs to start at the beginning of the line.
* Adding use statements for error checks and printouts, also making the errors set err_code to clubb_fatal_error.
Restructuring and Porting of Compute_mixing_length subroutine (Phase 2) (#1054)
* Restructuring and Porting of Compute_mixing_length subroutine (Phase 2)
Restructure: sat_mixrat_liq_2D_acc is now called directly instead of calling the 1D version inside the column loop. sat_mixrat_liq_2D_acc is changed from a function to a subroutine, with an output array and start_index as additional arguments. This is a workaround for passing sub-arrays: OpenACC doesn't handle sub-arrays being passed, and validation fails.
Porting: OpenACC directives are added inside sat_mixrat_liq_2D_acc for porting.
Validation: Answers are Bit for Bit with arm-multicolumn case + nvhpc compiler
* Fix for compilation issues
Issue 1: Missed declaring 'start_index' while integrating the change
Issue 2: The use of error_code module and the procedures inside it causes OpenACC compilation issues when run on the device.
* Removing the sat_mixrat_liq_acc and sat_mixrat_liq_2D_acc, making the normal sat_mixrat_liq work for all current use cases, and making the other versions of sat_mixrat_liq (bolton,gfdl,lookup) functional with OPENACC.
Breaking up column loop in mono_flux_limiter. This may not be the final form for GPUization, but it's definitely a start; we no longer have to copy single column variables to multicolumn ones anywhere. (#1051)
Adding OpenACC data directives for mixing length and adg routines
OpenACC structured data regions are added to optimize the data transfers between CPU and GPU. These data regions will be converted to unstructured data regions in the later optimization phase. Results are bit for bit.
Fixing error causing GPU code not to run. Some variables to be copied were labelled as (ngrdcol,nz) when it should be (:ngrdcol,:nz). I've just removed the data length specifiers completely since they are not necessary in general.
Removing update_pressure since it is no longer called anywhere in clubb or host models. The addition of this subroutine was discussed in larson-group/e3sm#6 and the removal of the call to it was discussed in larson-group/clubb#926.
Porting pdf_closure subroutine with OpenACC (#1059)
* Porting pdf_closure subroutine with OpenACC
OpenACC directives are added to pdf_closure subroutine. The necessary structured data region is also added for optimizing data movement across kernels. There is opportunity to task parallelize using streams and will be explored in the future.
Clubb ticket #1025: Implemented way to make esa tuner reproducible, h… (#1068)
* Clubb ticket #1025: Implemented way to make esa tuner reproducible, hid error output of optional diagnostic variables behind check, fixed parallelization issue with tuner, esa max_iters parameter is now in stats namelist, fixed issue with TUNER compiler directive, some small fixes.
  - New namelist variables prescribed_rand_seed and l_use_prescribed_rand_seed determine whether the esa tuner uses a random or a fixed value as its random seed. Added descriptions to README.
  - Added max_iters to stats namelist to make it more modifiable.
  - Renamed stp_adjst_intercept_in and stp_adjst_slope_in to stp_adjst_shift_in and stp_adjst_factor_in, respectively, to better reflect their influence on step size.
  - The error output in src/CLUBB_core/pdf_closure_module.F90 for the diagnostic variables wprtp2, wpthlp2, wprtpthlp, and rcp2 is now hidden behind existence checks for these variables. A clarification was also added to the "#ifdef TUNER" directive.
  - NetCDF file access caused the tuner to crash in parallel mode (-fopenmp flag in config file and multiple cases). Adding an $OMP CRITICAL structure around the call to stats_init in clubb_driver.F90 fixed that.
  - Fixed compile/README. Config files are specified with the -c option.
  - The -t option in run_scripts/run_tuner.bash interfered with the previous usage of the TUNER compiler directive. Renamed the old TUNER directive to NR_SP, short for "numerical recipes, single precision". TUNER is now the option to "turn on" code changes required to run the tuner.
* Fixing bug. This was only triggered when l_input_fields=.true., which I am only testing because it needs to be true so that I can test ADG2_driver.
* Removing usage of gr from pdf_closure. It was only ever used for nz, which is now fed in directly.
* Making openacc statements more consistent. Ensuring all statements on double loops have specified gang and vector, and that all parallel loops have an end parallel loop statement at the end of them. Everything BFB on CPUs and GPUs.
* Pushing acc data region to outermost parts of mixing_length.
* Removing pdf_implicit_coefs_terms from acc copyin and copyout. It is only used when iiPDF_type == iiPDF_new .or. iiPDF_type == iiPDF_new_hybrid, so we do not need to do any copying with it. The inclusion of it also caused the data statement to copy unallocated arrays, which are just garbage pointers, and that was causing random occasional crashes (either segfaults or gpu out of memory).
* The update device clauses for return variables seem to only be required for arrays contained in types. See https://github.com/larson-group/clubb/issues/1049#issuecomment-1440624778
* Moving acc end data to end of pdf_closure. This required removing any conditional return statements that appear before the final return, since we're not allowed to branch out of an acc region early. I also moved a large printout statement outside of a loop. The only reason it was in the loop to begin with was that pdf_params used to be an array of types, but now it is a type of arrays, allowing us to print the full arrays directly.
* Making loop an acc loop. If we weren't outputting w_[up/down]_in_cloud (iw_up_in_cloud <= 0 .or. iw_down_in_cloud <= 0), then these arrays were only being zeroed out on the CPU and would have been overwritten by the uninitialized GPU data at the end of the data statement. This change causes the arrays to be correctly zeroed out on the GPU when needed.
* Update VariableGroupNondimMoments.py
Fixed a typo
* Merging new changes from master
* Removing the need for -gpu=deepcopy, pushing some acc data statements up the call tree, and replacing some acc data statements with acc declare statements so that return statements can be added back in.
* Adding back an acc loop that was accidentally removed during a merge.
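A minimal Python sketch of the reproducible-seed idea from ticket #1025 above, assuming a toy annealer: make_rng and toy_anneal are hypothetical stand-ins, not the esa tuner's code. Only the names prescribed_rand_seed and l_use_prescribed_rand_seed come from the ticket. With the prescribed seed switched on, two runs draw the same random sequence and therefore return the same result.

```python
import math
import random


def make_rng(l_use_prescribed_rand_seed: bool, prescribed_rand_seed: int) -> random.Random:
    """Hypothetical helper mirroring the namelist switch: fixed seed or OS/time seed."""
    if l_use_prescribed_rand_seed:
        return random.Random(prescribed_rand_seed)  # same sequence every run
    return random.Random()  # seeded from system entropy, differs per run


def toy_anneal(rng: random.Random, max_iters: int = 200, t0: float = 1.0) -> float:
    """Toy simulated annealing minimizing f(x) = (x - 3)^2 (illustration only)."""
    f = lambda x: (x - 3.0) ** 2
    x = 0.0
    fx = f(x)
    for i in range(max_iters):
        # Linearly decreasing temperature, kept strictly positive
        temp = t0 * (1.0 - i / max_iters) + 1e-9
        cand = x + rng.uniform(-0.5, 0.5)
        fc = f(cand)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability
        if fc < fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
    return x
```

With l_use_prescribed_rand_seed = .false. the two calls would generally disagree, which is the non-reproducibility the ticket addresses.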
---------
Co-authored-by: Brian Griffin <31553422+bmg929@users.noreply.github.com>
This code change is expected to be BIT_CHANGING for cases in which `l_predict_upwp_vpwp = T`, `l_mono_flux_lim_um = T` or `l_mono_flux_lim_vm = T`, and the monotonic flux limiter is triggered.
This bug fix prevents non-conservation of momentum when the vertical integral of either of the wind components `um` or `vm` is negative.
Moving compute_cloud_cover outside of the if ( l_use_cloud_cover ) then statement. The cloud_cover and rcm_in_layer variables it computes aren't output in clubb_standalone but are output in CAM, so skipping the call caused CAM bit-diff tests to break.
Adding Skthl_zm to the update host list, I missed this in the last PR. I noticed this by comparing results with and without managed memory, now I've checked BFBness with arm, mpace_b, mc3e, and gabls2.
Code changes to implement modifications of wp3 clipping. This commit contains code changes to implement modifications of skewness clipping on wp3 in src/CLUBB_core/clip_explicit.F90. The default method attempts to apply smaller (larger) clipping below (above) the 100m AGL level, which can cause a discontinuity around 100m AGL. This clipping was found to trigger sawtooth oscillations in wp3 when linear diffusion is used. Such sawtooth oscillations are eliminated if a smoothed Heaviside function is introduced to obtain a smooth transition of the clipping around 100m AGL. The change is necessary to obtain first-order convergence in CLUBB-SCM when linear diffusion is used.
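A minimal sketch of blending two clipping limits with a smoothed Heaviside function, assuming a tanh-shaped step centred at 100 m. The width and the names heaviside_smooth and blended_clip_limit are hypothetical, and the smoothing function in clip_explicit.F90 may take a different form. The point is that the limit changes continuously with height instead of jumping at 100 m.

```python
import math


def heaviside_smooth(z: float, z0: float = 100.0, width: float = 10.0) -> float:
    """Smooth step (assumed tanh form): ~0 well below z0, exactly 0.5 at z0, ~1 well above."""
    return 0.5 * (1.0 + math.tanh((z - z0) / width))


def blended_clip_limit(z: float, limit_below: float, limit_above: float,
                       z0: float = 100.0, width: float = 10.0) -> float:
    """Hypothetical helper: weight the below/above-z0 limits by the smooth step."""
    h = heaviside_smooth(z, z0, width)
    return (1.0 - h) * limit_below + h * limit_above
```

A sharp step (width going to 0) recovers the original discontinuous behaviour; a larger width spreads the transition over more grid levels.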
This commit contains code changes to implement modifications on limiters in three places:
1. Remove the limiters in the denominator of the equation for brunt_vaisala_freq_sqd_smth, which affects the computed eddy dissipation time scale in turbulent fluxes (wpxp). (in mixing_length.F90)
2. Reduce the threshold values of the limiters in the equation for the Richardson number (sqrt_Ri_zm). (in mixing_length.F90)
3. Introduce smoothed max/min functions for the limiters in the equation for Cx_fnc_Richardson. (in advance_helper_module.F90)
After the modification, we also apply a zt2zm(zm2zt) smoothing to the calculated quantities. These modifications were found to be beneficial for improving solution convergence in CLUBB-SCM.
The code changes are controlled by a newly introduced flag named "l_modify_limiters_for_cnvg_test", which is set to .false. by default (meaning that the modifications to the limiters are turned off).
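One common smooth max/min form, sketched in Python. It is assumed here for illustration and may differ from the functions in advance_helper_module.F90. The parameter mag (a hypothetical name) controls how strongly the corner of max/min is rounded off; with mag = 0 the functions reduce to the ordinary max and min, up to round-off.

```python
import math


def smooth_max(a: float, b: float, mag: float) -> float:
    """Smooth approximation to max(a, b); never smaller than max(a, b)."""
    return 0.5 * (a + b + math.sqrt((a - b) ** 2 + mag ** 2))


def smooth_min(a: float, b: float, mag: float) -> float:
    """Smooth approximation to min(a, b); never larger than min(a, b)."""
    return 0.5 * (a + b - math.sqrt((a - b) ** 2 + mag ** 2))
```

Unlike max/min, both functions have continuous derivatives where a = b, which is the property that helps convergence when they are used as limiters.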
* Adding Skthl_zm to the update host list, I missed this in the last PR. I noticed this by comparing results with and without managed memory, now I've checked BFBness with arm, mpace_b, mc3e, and gabls2.
* Small GPU fixes (#1076)
* Fixing small things that I caught by adding the default(present) onto acc loops.
* Moving default(present) to the end because it looks nicer there.
* Adding default(present) to all acc loop statements. Also adding azt to a copyin statement, which was missed previously. All BFB.
* Incremental update, not well tested yet.
* Removing some copies and making the sclr_dim change.
* Fixing a bug that only seemed detectable with astex_a209. We need to pass only single arrays to functions: calling ddzt( nz, ngrdcol, gr, rho_ds_zt * K_zt_nu ) resulted in rho_ds_zt * K_zt_nu being evaluated on the CPU, but the values were only valid on the GPU. So we need to evaluate that expression on the GPU, save it into an array (currently K_zt_nu_tmp), and then pass that to ddzt.
* GPUizing calc_turb_adv_range
* GPUizing mono_flux_limiter
* Cleaning up data statements and a couple other things.
* Updated for some different options.
* More updates needed for various options.
* Reverting accidental flag change
* Should be the final changes, all options tested now.
* Replacing some comments in monoflux limiter, and also modifying it to make it BFB on CPUs. Also changing incorrect error conditions on tridiag.
* Adding max_x_allowable to the update host statement, missed previously.
* Properly naming tmp variables; variables calculated from ddzt and ddzm now start with ddzt_ and ddzm_.
* Replacing constants with named ones from constants_clubb.
* Replacing hard coded numbers in lhs variables representing the number of bands they contain with fortran parameters.
Small tweaks to fix some GPU bugs. Some variables were uninitialized on the CPU while we were saving them. This could only have been caught by comparing consecutive runs and checking the _zt and _zm files, and even then only a few cases had problems.
Fixing a labelling error in redirect_interpolated_azt_2D and similar procedures, since this interpolates to zt the input should be zm. I think this was my fault, so I cleaned all the zt2zm and zm2zt things up to make it a little nicer. Also ordered the routines _k _1D _2D to make it easier to jump around, it was a bit confusing as they were out of order and the typo really made it hard.
Making it so sclr_tol is set to 0 before being set to the specified sclr_tol_in. This ensures it is initialized to 0 when sclr_dim = 0: since it now has a minimum allocation size of 1, it would otherwise hold a garbage value. This is what broke the clubb_openmp_gfortran_test.
* Making 2 new functions, zm2zt2zm and zt2zm2zt, to handle smoothing by interpolation. Replaced the spots in clubb that I know use this to smooth things. This is just a nice-to-have and could allow for easy optimizations in the future by inlining the interpolations. All cases BFB on CPU and GPU, checked all relevant options too.
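The initialization order described above can be sketched in Python; init_sclr_tol is a hypothetical helper, and the list stands in for the Fortran allocatable array. Allocating at least one element keeps the array non-empty when sclr_dim = 0, and zero-filling first means that extra element never holds a garbage value.

```python
def init_sclr_tol(sclr_dim: int, sclr_tol_in: list) -> list:
    """Hypothetical helper: allocate max(sclr_dim, 1) slots, zero them, then copy inputs."""
    sclr_tol = [0.0] * max(sclr_dim, 1)  # minimum allocation size of 1, zero-filled
    sclr_tol[:sclr_dim] = sclr_tol_in[:sclr_dim]  # copy only the user-specified entries
    return sclr_tol
```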
* GPUizing diagnose_Lscale_from_tau
* Removing some unused variables.
* Moving acc data statements from calc_Lscale_directly up to advance_clubb_core.
* Removing an unused variable.
* GPUizing the l_smooth_min_max option.
* GPUizing l_avg_Lscale
* Changes to variable names to avoid gross long names only used once.
* GPUizing pvertinterp even though I don't think we care about the l_do_expldiff_rtm_thlm flag
* Fixing a bug. Setting l_do_expldiff_rtm_thlm causes us to use edsclrm, so we also need to ensure that edsclr_dim > 1 (1 because it uses an edsclr_dim-1 index).
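A sketch of what a zm2zt2zm-style round trip does to a profile, assuming a uniform grid with zt levels halfway between zm levels and simple averaging as the interpolation. The function names are simplified, hypothetical stand-ins (not CLUBB's zm2zt/zt2zm signatures), and keeping the end values is an assumed boundary treatment. Under these assumptions the round trip acts as a 1-2-1 filter on interior points: each becomes 0.25*x[k-1] + 0.5*x[k] + 0.25*x[k+1].

```python
def zm2zt_uniform(xm: list) -> list:
    """Interpolate to midpoints by averaging neighbours (uniform grid): n-1 values."""
    return [0.5 * (xm[k] + xm[k + 1]) for k in range(len(xm) - 1)]


def zt2zm_uniform(xt: list, bottom: float, top: float) -> list:
    """Interior zm values are averages of adjacent zt values; end values supplied."""
    interior = [0.5 * (xt[k - 1] + xt[k]) for k in range(1, len(xt))]
    return [bottom] + interior + [top]


def zm2zt2zm_uniform(xm: list) -> list:
    """Round-trip smoothing; keeps the original boundary values (assumption)."""
    return zt2zm_uniform(zm2zt_uniform(xm), xm[0], xm[-1])
```

A constant profile passes through unchanged, while a single-level spike is spread over its neighbours.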
This commit is a commit that changes absolutely nothing. It is meant to trigger a change in the git update scripts, so that I can start the commit message logging in the autocommit updates larson-group/sys_admin#797
* BIT_CHANGING:3b086a40085284aa49c71d32c001d20153a8ddb4 The last commit is bit changing only for some cases and only when using optimization higher than -O2. uf min seems to be the first calculation that starts to differ bitwise. The check_multicol script confirms the differences are small.
* Adding a tweak to surface values in the extra columns. This helped me check calc_sfc_varnce, since we were not changing any arrays that would have affected calculations there.
* Small optimization, making wstar and ustar2 scalars.
* GPUizing calc_sfc_varnce
* Removing the conditional around some stats calls. Now we will always save sfc values to stats. Because this will change stats files when gr%zm(i,1) > sfc_elevation, this is potentially BIT_CHANGING.
* Merging with latest clubb changes and making work on GPUs again.
This contained 2 commits that are BIT_CHANGING in some situations.
GPUizing Lscale_width_vert_avg. Loops have been restructured for simplicity, and algorithm has a different starting value to avoid k dependency. Results are BFB. (#1083)