Using CodeXLAnalyzer CLI’s live register analysis report, you can better understand the register usage of your HLSL shaders and OpenCL kernels throughout their execution. Live register analysis is a beta feature of CodeXLAnalyzer CLI. It currently offers full support for HLSL shaders and partial support for OpenCL kernels.
Generating a live register analysis report for your kernel or shader:
As mentioned in the “Details of available commands” section above, in order to generate a live register analysis report, you need to make sure that your invocation command includes the following command line switches:
1. --isa <arg> which instructs CodeXLAnalyzer to generate ISA disassembly for your kernel/shader
2. --livereg <arg> which instructs CodeXLAnalyzer to perform a live register analysis of the generated ISA disassembly
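Usage examples:
CodeXLAnalyzer.exe -s cl -c Fiji --kernel DCT --isa c:\output\.isa --livereg c:\output\livereg.txt DCT_Kernels.cl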
Let’s break down the above command to understand its structure:
1. “-s cl” instructs CodeXLAnalyzer to work in OpenCL mode
2. “-c Fiji” sets Fiji as the target ASIC
3. “--kernel DCT” sets DCT as the target kernel (this is the kernel to be analyzed; it is defined in DCT_Kernels.cl, which is the last argument in the above command)
4. “--isa c:\output\.isa” instructs CodeXLAnalyzer to generate an ISA disassembly file and save it in c:\output with a “.isa” file extension. The output file name is generated automatically.
5. “--livereg c:\output\livereg.txt” instructs CodeXLAnalyzer to perform live register analysis, save the report in c:\output, and use “livereg.txt” as the report file name’s suffix and extension.
After running the above command, we see the following output files in c:\output (our destination folder):
Fiji_DCT.isa
Fiji_DCT_livereg.txt
The live register analysis report file is Fiji_DCT_livereg.txt.
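The naming pattern in this example can be sketched in a few lines of Python. This is only an inference from the file names shown above, not a documented rule: the helper expected_output_path is hypothetical, and it assumes the tool prefixes the file name with "<ASIC>_<kernel>" while keeping the directory, the optional suffix, and the extension of the argument.

```python
from pathlib import PureWindowsPath


def expected_output_path(template: str, asic: str, kernel: str) -> str:
    """Guess the file written for an --isa or --livereg argument.

    Assumption (inferred from the example, not from a specification):
    the output name is "<ASIC>_<kernel>", followed by "_<suffix>" when the
    argument carries a name before the extension, then the extension.
    """
    path = PureWindowsPath(template)
    suffix, dot, extension = path.name.rpartition(".")
    if not dot:
        # No extension in the argument: treat the whole name as the suffix.
        suffix, extension = path.name, ""
    base = f"{asic}_{kernel}"
    if suffix:
        base += f"_{suffix}"
    return str(path.parent / f"{base}{dot}{extension}")


isa_file = expected_output_path(r"c:\output\.isa", "Fiji", "DCT")
report_file = expected_output_path(r"c:\output\livereg.txt", "Fiji", "DCT")
```

With the arguments from the example, isa_file and report_file come out as the two file names listed above, placed in c:\output.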
For HLSL, the usage is similar:
CodeXLAnalyzer.exe -s hlsl -c Fiji -f VSMain -p vs_5_0 --isa c:\temp\.txt --livereg c:\temp\lreg.txt c:\temp\dx\BasicHLSL11_VS.hlsl
1. “-s hlsl” instructs CodeXLAnalyzer to work in HLSL mode
2. “-c Fiji” sets Fiji as the target ASIC
3. “-f VSMain” sets VSMain as the target shader (this is the shader to be analyzed; it is defined in c:\temp\dx\BasicHLSL11_VS.hlsl, which is the last argument in the above command)
4. “-p vs_5_0” sets vs_5_0 as the shader profile to compile with
5. “--isa c:\temp\.txt” instructs CodeXLAnalyzer to generate an ISA disassembly file and save it in c:\temp with a “.txt” file extension. The output file name is generated automatically.
6. “--livereg c:\temp\lreg.txt” instructs CodeXLAnalyzer to perform live register analysis, save the report in c:\temp, and use “lreg.txt” as the report file name’s suffix and extension.
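If you generate reports for many kernels or shaders from a script, the command lines described above can be assembled programmatically. The following is a minimal sketch that uses only the switches discussed in this section. The function build_livereg_command and its parameter names are hypothetical, and the executable name is taken from the HLSL example.

```python
def build_livereg_command(mode, asic, entry_point, isa_path, livereg_path,
                          source_file, profile=None,
                          executable="CodeXLAnalyzer.exe"):
    """Return the argument list for a live register analysis run.

    mode is "cl" (OpenCL) or "hlsl". entry_point is the kernel name in
    OpenCL mode and the shader function name in HLSL mode.
    """
    command = [executable, "-s", mode, "-c", asic]
    if mode == "cl":
        command += ["--kernel", entry_point]
    else:
        command += ["-f", entry_point]
        if profile is not None:
            command += ["-p", profile]
    # Both switches are needed: --livereg analyzes the generated disassembly.
    command += ["--isa", isa_path, "--livereg", livereg_path, source_file]
    return command


hlsl_command = build_livereg_command(
    "hlsl", "Fiji", "VSMain",
    r"c:\temp\.txt", r"c:\temp\lreg.txt",
    r"c:\temp\dx\BasicHLSL11_VS.hlsl",
    profile="vs_5_0",
)
# To launch the tool: subprocess.run(hlsl_command, check=True)
```

Passing the arguments as a list instead of a single string avoids quoting problems when a path contains spaces.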
Report structure:
If you open the live register analysis report file, you will see that it is a plain text file. Each line in the file gives a snapshot of the register usage when the program counter (PC) is at that specific ISA line. Each line in the report has the following format:
<line number> | <number of live registers> | <list of registers + access type> | <ISA instruction>
Where:
1. <line number> is the number of the current ISA disassembly line
2. <number of live registers> is the number of live registers when the PC is at that ISA line
3. <list of registers + access type> is a list of n columns. Each column (except for the first one) refers to a register:
a. ‘^’ indicates a register is written to
b. ‘v’ indicates a register is read
c. ‘x’ indicates a register that is both written and read
d. ‘:’ indicates a live register, that is, a register whose contents must be preserved across this instruction
e. A blank means that the register is not used
4. <ISA instruction> is the ISA disassembly of the relevant instruction
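The line format described above is simple enough to parse with a short script. The sketch below is one possible way to do it, not part of CodeXLAnalyzer. The names ReportLine, ACCESS and parse_report_line are hypothetical, and the code assumes that each “|” separator is padded with exactly one space on either side.

```python
from typing import NamedTuple

# Meaning of each symbol in the register columns, as listed above.
ACCESS = {"^": "write", "v": "read", "x": "read+write", ":": "live", " ": "unused"}


class ReportLine(NamedTuple):
    line_number: int
    live_count: int
    registers: str      # one character per register column
    instruction: str


def parse_report_line(line: str) -> ReportLine:
    """Split one report line into its four fields."""
    # maxsplit=3 keeps any "|" inside the instruction text intact.
    number, count, registers, instruction = line.split("|", 3)
    # Assumption: one padding space follows and precedes each separator.
    # Only that padding is removed, because blanks inside the field
    # are meaningful (they mark unused registers).
    if registers.startswith(" "):
        registers = registers[1:]
    if registers.endswith(" "):
        registers = registers[:-1]
    return ReportLine(int(number), int(count), registers, instruction.strip())


parsed = parse_report_line("14 | 12 | x:::::v ::: :: | v_mac_f32 v0, s6, v6")
accesses = [ACCESS[symbol] for symbol in parsed.registers]
in_use = sum(symbol != " " for symbol in parsed.registers)
```

For this line, in_use equals the reported number of live registers (12), which is a convenient sanity check when parsing a whole report.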
At the end of the report, you will find a summary in the following format:
Maximum # VGPR used <Max VGPR used>, # VGPR allocated: <Number of VGPR allocated>
Where:
1. <Max VGPR used> is the number of VGPRs actually used throughout the code
2. <Number of VGPR allocated> is the number of VGPRs that were allocated
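To pick up the two summary values in a script, a regular expression is sufficient. This is a small sketch that assumes the summary line is worded exactly as shown above. The names SUMMARY_PATTERN and parse_summary are hypothetical.

```python
import re

# Matches the summary line shown above. A colon after either label is optional.
SUMMARY_PATTERN = re.compile(
    r"Maximum # VGPR used\s*:?\s*(\d+),\s*# VGPR allocated\s*:?\s*(\d+)"
)


def parse_summary(line: str):
    """Return (max_used, allocated), or None if the line is not a summary."""
    match = SUMMARY_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


summary = parse_summary("Maximum # VGPR used 13, # VGPR allocated: 14")
```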
Two things to remember when inspecting the live register analysis report are:
1. If the number of live registers is lower than the number of allocated registers, the shader compiler (SC) could reduce the number of VGPRs without spilling by introducing moves.
2. If registers have a very long liveness range without read/write access, those registers could likely be spilled at low cost.
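The second point can be checked mechanically. The sketch below takes the register columns of consecutive report lines and returns, for each column, the longest run of lines in which the register is live (‘:’) but neither read nor written. The function longest_idle_runs is hypothetical, and it assumes that the columns keep their original alignment, so that a given position refers to the same register on every line.

```python
def longest_idle_runs(register_rows):
    """For each register column, find the longest stretch of consecutive
    lines where the register is live (':') but not accessed."""
    width = max((len(row) for row in register_rows), default=0)
    longest = [0] * width
    current = [0] * width
    for row in register_rows:
        # Pad short rows so that missing trailing columns count as unused.
        for column, symbol in enumerate(row.ljust(width)):
            if symbol == ":":
                current[column] += 1
                longest[column] = max(longest[column], current[column])
            else:
                # Any access or an unused slot ends the idle stretch.
                current[column] = 0
    return longest


# Three hypothetical registers over four instructions.
rows = [
    "^ :",
    ": :",
    ":^:",
    "v::",
]
idle = longest_idle_runs(rows)
```

Here the third register stays live for all four lines without being touched, which makes it the most attractive spill candidate of the three.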
Here is a sample live register analysis report:
 1 |  9 |     ::::::: :: | label_basic_block_1: s_swappc_b64 s[2:3], s[2:3]
 2 |  9 |     ::::::: :: | s_andn2_b32 s0, s9, 0x3fff0000
 3 |  9 |     ::::::: :: | s_mov_b32 s1, s0
 4 |  9 |     ::::::: :: | s_mov_b32 s2, s10
 5 |  9 |     ::::::: :: | s_mov_b32 s3, s11
 6 |  9 |     ::::::: :: | s_mov_b32 s0, s8
 7 |  9 |     ::::::: :: | s_buffer_load_dwordx8 s[4:11], s[0:3], 0x00
 8 |  9 |     ::::::: :: | s_buffer_load_dwordx8 s[12:19], s[0:3], 0x20
 9 |  9 |     ::::::: :: | s_waitcnt lgkmcnt(0)
10 | 10 | ^   :::v::: :: | v_mul_f32 v0, s7, v7
11 | 11 | :^  :::v::: :: | v_mul_f32 v1, s11, v7
12 | 12 | ::^ :::v::: :: | v_mul_f32 v2, s15, v7
13 | 13 | :::^:::v::: :: | v_mul_f32 v3, s19, v7
14 | 12 | x:::::v ::: :: | v_mac_f32 v0, s6, v6
15 | 12 | :x::::v ::: :: | v_mac_f32 v1, s10, v6
16 | 12 | ::x:::v ::: :: | v_mac_f32 v2, s14, v6
17 | 12 | :::x::v ::: :: | v_mac_f32 v3, s18, v6
18 | 11 | x::::v  ::: :: | v_mac_f32 v0, s5, v5
19 | 11 | :x:::v  ::: :: | v_mac_f32 v1, s9, v5
20 | 11 | ::x::v  ::: :: | v_mac_f32 v2, s13, v5
21 | 11 | :::x:v  ::: :: | v_mac_f32 v3, s17, v5
22 | 10 | x:::v   ::: :: | v_mac_f32 v0, s4, v4
23 | 10 | :x::v   ::: :: | v_mac_f32 v1, s8, v4
24 | 10 | ::x:v   ::: :: | v_mac_f32 v2, s12, v4
25 | 10 | :::xv   ::: :: | v_mac_f32 v3, s16, v4
26 |  9 | vvvv    ::: :: | exp pos0, v0, v1, v2, v3
27 |  5 |         ::: :: | s_buffer_load_dwordx4 s[4:7], s[0:3], 0x40
28 |  5 |         ::: :: | s_buffer_load_dwordx4 s[8:11], s[0:3], 0x50
29 |  5 |         ::: :: | s_buffer_load_dwordx4 s[0:3], s[0:3], 0x60
30 |  5 |         ::: :: | s_waitcnt expcnt(0)
31 |  6 | ^       ::v :: | v_mul_f32 v0, s6, v10
32 |  7 | :^      ::v :: | v_mul_f32 v1, s10, v10
33 |  8 | ::^     ::v :: | v_mul_f32 v2, s2, v10
34 |  7 | x::     :v  :: | v_mac_f32 v0, s5, v9
35 |  7 | :x:     :v  :: | v_mac_f32 v1, s9, v9
36 |  7 | ::x     :v  :: | v_mac_f32 v2, s1, v9
37 |  6 | x::     v   :: | v_mac_f32 v0, s4, v8
38 |  6 | :x:     v   :: | v_mac_f32 v1, s8, v8
39 |  6 | ::x     v   :: | v_mac_f32 v2, s0, v8
40 |  6 | :::^        :: | v_mov_b32 v3, 1.0
41 |  7 | ::::^       :: | v_mov_b32 v4, 0
42 |  7 | vvvv:       :: | exp param0, v0, v1, v2, v3
43 |  4 |    vv       vv | exp param1, v12, v13, v4, v3
44 |  0 |                | s_endpgm
Maximum # VGPR used 13, # VGPR allocated: 14
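As a quick cross-check of the sample, the peak of the live register count column can be compared with the summary line. The sketch below reads only the first two fields of each report line, so it does not depend on column alignment. The function peak_live_registers is hypothetical, and the excerpt holds three lines copied from the sample report.

```python
def peak_live_registers(report_lines):
    """Return (live_count, line_number) for the report line with the most
    live registers. Lines that do not match the report format are skipped."""
    best = (0, 0)
    for line in report_lines:
        fields = line.split("|", 3)
        if len(fields) != 4:
            continue  # for example, the summary line
        try:
            number, count = int(fields[0]), int(fields[1])
        except ValueError:
            continue
        if count > best[0]:
            best = (count, number)
    return best


excerpt = [
    "12 | 12 | ::^ :::v::: :: | v_mul_f32 v2, s15, v7",
    "13 | 13 | :::^:::v::: :: | v_mul_f32 v3, s19, v7",
    "14 | 12 | x:::::v ::: :: | v_mac_f32 v0, s6, v6",
    "Maximum # VGPR used 13, # VGPR allocated: 14",
]
peak, line_number = peak_live_registers(excerpt)
allocated = 14                   # taken from the summary line of the sample
headroom = allocated - peak      # allocated registers that are never live at the peak
```

In the sample, the peak of 13 live registers is reached at line 13 and matches the "Maximum # VGPR used" value in the summary, one less than the 14 allocated VGPRs.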