Skip to content
Success

Changes

Summary

  1. Gpu updates (#1070) (details)
Commit 61ea0db56546ea2946cb3c0ee368fd57752c1451 by noreply
Gpu updates (#1070)

* Fixing bug. This was only triggered when l_input_fields=.true., which I am only testing because it needs to be true so that I can test ADG2_driver.

* Removing usage of gr from pdf_closure. It was only ever used for nz, which is now fed in directly.

* Making openacc statements more consistent. Ensuring all statments on double loops have specified gang and vector, and that all parallel loops have an end parallel loop statment at the end of them. Everything BFB on CPUs and GPUs.

* Pushing acc data region to outermost parts of mixing_length.

* Removing pdf_implicit_coefs_terms from acc copyin and copyout. It is only used when iiPDF_type == iiPDF_new .or. iiPDF_type == iiPDF_new_hybrid, so we do not need to do any copying with it. The inclusion of it also caused the data statement to copy unallocated arrays, which are just garbage pointers, and that was causing random occasional crashes (either segfaults or gpu out of memory).

* The update device clauses for return variables seems to only be requried for arrays contained in types. See https://github.com/larson-group/clubb/issues/1049\#issuecomment-1440624778

* Moving acc end data to end of pdf_closure. This reuqired removing any conditional return statements that appear before the final return, since we're not allowed to branch out of an acc region early. I also moved a large printout statement outside of a loop. The only reason it was in the loop to begin with was because pdf_params used to be an array of types, but now is a type of arrays, allowing us to print the full arrays directly.

* Making loop an acc loop. If we weren't outputting w_[up/down]_in_cloud (iw_up_in_cloud <= 0 .or. iw_down_in_cloud <= 0, then these arrays were only being zerod out on the CPU and would've getting overwritten by the uninitialized GPU data at the end of the data statement. This change causes the arrays to get correctly zerod out on the GPU when we need.

* Update VariableGroupNondimMoments.py

Fixed a typo

* Merging new changes from master

* Removing need for -gpu=deepcopy, pushing some acc data statements up call tree, and replacing some acc data statements with acc delare statements so that return statements can be added back in.

* Adding back an acc loop that was accidentally removed during a merge.

---------

Co-authored-by: Brian Griffin <31553422+bmg929@users.noreply.github.com>
The file was modified pdf_utilities.F90 (diff)
The file was modified saturation.F90 (diff)
The file was modified mixing_length.F90 (diff)
The file was modified pdf_closure_module.F90 (diff)
The file was modified advance_clubb_core_module.F90 (diff)
The file was modified turbulent_adv_pdf.F90 (diff)
The file was modified adg1_adg2_3d_luhar_pdf.F90 (diff)
The file was modified grid_class.F90 (diff)
The file was modified mean_adv.F90 (diff)