A record of tuning the performance of a water shader

Posted by duclet on Thu, 09 Sep 2021 22:26:58 +0200

The plug-in I debugged this time is Stylized Water 2 1.0.9. It is a top-selling plug-in, so it is worth using and learning from, which is why I am writing this down. You can find it by searching the Asset Store. The whole process took about a day.

More than half of that time went into preparing the project: finding the attributes that control the material, and wiring those material attributes up to a UI.

Here are the approach, the process, and what I learned while searching. First I printed the FPS and tested the main features one by one. Most of them made a difference of only one or two frames. With waves, however, there was a difference of about 15 frames between on and off. I then looked for the responsible code and tested it by commenting out parts more or less at random, because writing UI to tune the parameters costs too much. Here are the test results:

  1. To test a shader switch, turn the switch off so that the shader variants you do not need are removed
    multi_compile: all branches are packaged, whether or not they are used
    shader_feature and shader_feature_local: variants that are not used are removed from the package by Unity
    With the latter two, all the shader_feature switches are turned on or off together

  2. Do not use WeChat to send the APK to your phone. WeChat only installs the first version it received, so every later version you send is still the first one

  3. Packaging with every branch set to multi_compile takes too long. Enable one switch at a time for testing and keep the others off
    With several switches on at once, the test time grows exponentially

  4. Not every pass declares #pragma multi_compile _ _WAVES. In the other passes, either remove #pragma shader_feature_local _WAVES or change it to #pragma multi_compile _ _WAVES

  5. Once I knew that waves were the most expensive feature, I planned to comment out part of the wave-related calculation code more or less at random. Apart from adjusting the wave parameters through UI, this is the fastest method, and I was lucky enough to find the relevant code and test it successfully. Guessing only works if you already understand the code and know where the bottleneck is likely to be. Otherwise the usual approach is to comment out code by bisection until you find the expensive part of the shader

  6. Testing showed that BlendNormalWorldspaceRNM is related to the cost. Using this function on its own has no effect, but this piece of code does affect the fragment function
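
To make points 1 and 4 concrete, here is a minimal sketch of how the _WAVES keyword could be declared the same way in two passes while testing. It is a fragment, not a complete shader, and the pass names in the comments are assumptions for illustration only, not taken from the plug-in.

```hlsl
// Hypothetical pass "ForwardLit" (name is illustrative)
// multi_compile: both the "off" (_) and "on" (_WAVES) variants are
// always compiled and included in the build.
#pragma multi_compile _ _WAVES

// Hypothetical pass "DepthOnly" (name is illustrative)
// With shader_feature_local, the _WAVES variant is stripped from the
// build when no material included in the build enables the keyword:
// #pragma shader_feature_local _WAVES
// While testing, declare it the same way as the first pass instead:
#pragma multi_compile _ _WAVES
```

Declaring the keyword consistently means the switch behaves the same in every pass when it is toggled from script on the device, at the cost of a longer build.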

The following are the results of this test. Each macro definition represents one feature

#pragma multi_compile _ _NORMALMAP

#pragma multi_compile _ _DISTANCE_NORMALS

//Waves: 35 frames when on, 20 frames when off
//In other cases: 30 frames when on, 40 frames when off
#pragma multi_compile _ _WAVES

//No visible effect on the frame rate
#pragma multi_compile _ _FOAM

//About one frame of difference
#pragma shader_feature_local _UNLIT

#pragma shader_feature_local _TRANSLUCENCY

//About one frame of difference
#pragma shader_feature_local _CAUSTICS


//On 31.5, off 33
#pragma shader_feature_local _DISABLE_DEPTH_TEX

//On 27.5, off 29.5
#pragma shader_feature_local _SHARP_INERSECTION
//Not tested yet
#pragma shader_feature_local _SMOOTH_INTERSECTION
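
As a rough illustration of why point 3 matters: each of the ten lines above is an independent on/off choice, so the number of variants per pass doubles with every keyword that is declared with multi_compile. The estimate below ignores Unity's built-in keywords and any variant stripping, so it is an approximation only.

```latex
\[
N_{\text{variants}} = 2^{k}, \qquad
k = 4 \;\Rightarrow\; N = 16, \qquad
k = 10 \;\Rightarrow\; N = 1024
\]
```

Here k = 4 corresponds to the four keywords already declared with multi_compile in the list, and k = 10 to switching all ten of them to multi_compile.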

The following is the problem code, which is in the fragment function

#if _WAVES
	WaveInfo waves = GetWaveInfo(uv, TIME * _WaveSpeed, _WaveFadeDistance.x, _WaveFadeDistance.y);
	// #if !_FLAT_SHADING
		//Flatten by blue vertex color weight
		waves.normal = lerp(waves.normal, normalWS, lerp(0, 1, vertexColor.b));
		//Blend wave/vertex normals in world-space
		waveNormal = BlendNormalWorldspaceRNM(waves.normal, normalWS, UP_VECTOR);
	// #endif
	//Debug output, left disabled so the code below still runs
	//return float4(waveNormal.xyz, 1);
	height = waves.position.y * 0.5 + 0.5;
	height *= lerp(1, 0, vertexColor.b);
	//return float4(height, height, height, 1);
	
	//vertices are already displaced on XZ, in this case the world-space UV needs the same treatment
	if(_WorldSpaceUV == 1) uv.xy -= waves.position.xz * HORIZONTAL_DISPLACEMENT_SCALAR * _WaveHeight;
	//return float4(frac(uv.xy), 0, 1);
#endif
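
The bisection commenting described in point 5 can also be driven by a preprocessor switch instead of editing comments by hand. The following is only a sketch under stated assumptions: BISECT_LEVEL and SuspectNormal are hypothetical names that are not part of the plug-in, and the function bodies are placeholders for the two halves of whatever code is under suspicion.

```hlsl
// Hypothetical bisection switch, not part of the plug-in.
// 0 = suspect code disabled, 1 = first half enabled, 2 = both halves enabled.
#define BISECT_LEVEL 1

// Hypothetical stand-in for a block of suspect shader code.
float3 SuspectNormal(float3 normalWS, float3 waveNormal)
{
    float3 result = normalWS;

#if BISECT_LEVEL >= 1
    // First half of the suspect code (placeholder work)
    result = normalize(result + waveNormal);
#endif

#if BISECT_LEVEL >= 2
    // Second half of the suspect code (placeholder work)
    result = normalize(lerp(result, float3(0, 1, 0), 0.5));
#endif

    return result;
}
```

Measure the FPS at each level. If the frame rate drops between two levels, the expensive code is in the half that was just enabled, and that half can be split again in the same way.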
