Monoflux GPU optimization (#1221)

* Making a fast version of the serial loop to check if we even need to perform the slow version at all.
* Updates
* Final touches to make the new version logically the same.
* Improving comments and variable names