Generating and Interpreting CodeXLAnalyzer CLI’s Live Register Analysis Report

CodeXL

CodeXL User Guide

Using CodeXLAnalyzer CLI’s live register analysis report, you can better understand how your HLSL shaders and OpenCL kernels use registers throughout their execution. Live register analysis is a beta feature of CodeXLAnalyzer CLI: it currently fully supports HLSL shaders and only partially supports OpenCL kernels.

 

Generating a live register analysis report for your kernel or shader:

As described in the “Details of available commands” section above, to generate a live register analysis report your invocation command must include both of the following command line switches:

1.  --isa <arg>, which instructs CodeXLAnalyzer to generate ISA disassembly for your kernel/shader

2.  --livereg <arg>, which instructs CodeXLAnalyzer to perform a live register analysis of the generated ISA disassembly

 

Usage examples:

CodeXLAnalyzer.exe -s cl -c Fiji --kernel DCT --isa c:\output\.isa --livereg c:\output\livereg.txt DCT_Kernels.cl

Let’s break down the above command to understand its structure:

1.       “-s cl” instructs CodeXLAnalyzer to work in OpenCL mode

2.       “-c Fiji” sets Fiji as the target ASIC

3.       “--kernel DCT” sets DCT as the target kernel (this is the kernel to be analyzed; it is defined in DCT_Kernels.cl, which is the last argument in the above command)

4.       “--isa c:\output\.isa” instructs CodeXLAnalyzer to generate an ISA disassembly file and save it in c:\output with a “.isa” file extension. The output file name is generated automatically.

5.       “--livereg c:\output\livereg.txt” instructs CodeXLAnalyzer to perform live register analysis, save the report in c:\output, and use “livereg.txt” as the report file name’s suffix and extension.

 

After running the above command, we see the following output files in c:\output (our destination folder):

Fiji_DCT.isa

Fiji_DCT_livereg.txt

 

The live register analysis report file is Fiji_DCT_livereg.txt.
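If a script needs to locate the report after a run, the file name can be predicted from the command. Judging only from this one OpenCL example, the tool appears to prefix the name given to --isa or --livereg with “<ASIC>_<kernel>”, adding an underscore unless the given name is just an extension. The Python sketch below encodes that inferred rule; it is an assumption based on a single example, not documented behavior, and expected_output_path is a hypothetical helper name.

```python
from pathlib import PureWindowsPath


def expected_output_path(option_value: str, asic: str, name: str) -> PureWindowsPath:
    """Predict the file written for an --isa or --livereg value.

    ASSUMPTION inferred from a single OpenCL example: the tool prefixes
    the given file name with "<ASIC>_<kernel>" and inserts "_" unless the
    given name is only an extension such as ".isa".
    """
    given = PureWindowsPath(option_value)
    prefix = f"{asic}_{name}"
    if given.name.startswith("."):
        file_name = prefix + given.name        # ".isa" -> "Fiji_DCT.isa"
    else:
        file_name = f"{prefix}_{given.name}"   # "livereg.txt" -> "Fiji_DCT_livereg.txt"
    return given.parent / file_name


print(expected_output_path(r"c:\output\livereg.txt", "Fiji", "DCT"))  # prints c:\output\Fiji_DCT_livereg.txt
```

Check the prediction against the files the tool actually produces before relying on it, especially in HLSL mode, where the naming has not been confirmed here.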

 

For HLSL, the usage is similar:

CodeXLAnalyzer.exe -s hlsl -c Fiji -f VSMain -p vs_5_0 --isa c:\temp\.txt --livereg c:\temp\lreg.txt c:\temp\dx\BasicHLSL11_VS.hlsl

1.       “-s hlsl” instructs CodeXLAnalyzer to work in HLSL mode

2.       “-c Fiji” sets Fiji as the target ASIC

3.       “-f VSMain” sets VSMain, the shader’s entry-point function, as the target shader

4.       “-p vs_5_0” sets vs_5_0 (the Shader Model 5.0 vertex shader profile) as the compilation profile

5.       “--isa c:\temp\.txt” instructs CodeXLAnalyzer to generate an ISA disassembly file and save it in c:\temp with a “.txt” file extension. The output file name is generated automatically.

6.       “--livereg c:\temp\lreg.txt” instructs CodeXLAnalyzer to perform live register analysis, save the report in c:\temp, and use “lreg.txt” as the report file name’s suffix and extension.

7.       c:\temp\dx\BasicHLSL11_VS.hlsl, the last argument, is the HLSL source file that contains the shader
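If you drive CodeXLAnalyzer from a build or test script, the OpenCL and HLSL invocations shown in this section can be assembled programmatically. The Python sketch below builds the argument list using only the switches that appear in those examples. The function name build_command and its parameters are hypothetical, and running the result assumes CodeXLAnalyzer.exe is on your PATH.

```python
def build_command(mode: str, asic: str, source: str, isa_out: str,
                  livereg_out: str, kernel: str = "", entry: str = "",
                  profile: str = "") -> list[str]:
    """Assemble a CodeXLAnalyzer invocation with live register analysis.

    Only switches that appear in the usage examples are used.
    """
    cmd = ["CodeXLAnalyzer.exe", "-s", mode, "-c", asic]
    if mode == "cl":
        if not kernel:
            raise ValueError("OpenCL mode needs a kernel name")
        cmd += ["--kernel", kernel]
    elif mode == "hlsl":
        if not (entry and profile):
            raise ValueError("HLSL mode needs an entry point and a profile")
        cmd += ["-f", entry, "-p", profile]
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    # --isa supplies the disassembly that --livereg analyzes.
    cmd += ["--isa", isa_out, "--livereg", livereg_out, source]
    return cmd


hlsl_cmd = build_command("hlsl", "Fiji", r"c:\temp\dx\BasicHLSL11_VS.hlsl",
                         r"c:\temp\.txt", r"c:\temp\lreg.txt",
                         entry="VSMain", profile="vs_5_0")
```

To run it, you could pass the list to subprocess.run(hlsl_cmd, check=True). Passing a list rather than one command string avoids quoting problems with Windows paths.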

 

Report structure:

The live register analysis report is a plain text file. Each line gives a snapshot of register usage when the program counter (PC) is at the corresponding ISA line, in the following format:

<line number> | <number of live registers> | <list of registers + access type> | <ISA instruction>

Where:

1.       <line number> is the number of the current ISA disassembly line

2.       <number of live registers> is the number of live registers when the PC is at that ISA line

3.       <list of registers + access type> is a list of n columns. Each column (except for the first one) refers to one VGPR, in ascending register order (v0, v1, ...), and contains one of the following markers:

a.       ‘^’ indicates that the register is written

b.      ‘v’ indicates that the register is read

c.       ‘x’ indicates that the register is both read and written

d.      ‘:’ indicates that the register is live: its contents must be preserved across this instruction

e.      A blank means that the register is not used

4.       <ISA instruction> is the ISA disassembly of the relevant instruction
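To post-process a report, each line can be split on its ‘|’ separators. The Python sketch below is one way to do this. It assumes, as in the sample report below, that exactly one separator character sits on each side of the register columns and that the remaining columns map to v0, v1, and so on in order. The names ReportLine, ACCESS_TYPES, and parse_report_line are hypothetical.

```python
from dataclasses import dataclass

# Marker meanings as listed above.
ACCESS_TYPES = {
    "^": "write",
    "v": "read",
    "x": "read_write",
    ":": "live",        # contents preserved across the instruction
    " ": "unused",
}


@dataclass
class ReportLine:
    line_number: int
    live_count: int
    access: list[str]   # access[i] describes VGPR vi (assumed column order)
    instruction: str


def parse_report_line(text: str) -> ReportLine:
    """Parse one '<line> | <live> | <registers> | <ISA>' row of the report."""
    # maxsplit=3 keeps the ISA text intact even if it contained a '|'.
    number, live, columns, instruction = text.split("|", 3)
    # ASSUMPTION: one separator character on each side of the register columns.
    markers = columns[1:-1]
    return ReportLine(
        line_number=int(number),   # int() ignores surrounding spaces
        live_count=int(live),
        access=[ACCESS_TYPES[m] for m in markers],  # KeyError on unknown markers
        instruction=instruction.strip(),
    )


row = parse_report_line("   10 |  10 | ^   :::v::: :: | v_mul_f32 v0, s7, v7")
```

Here row.access[0] is "write" (v0 is the destination) and row.access[7] is "read" (v7 is a source). In the sample report, the number of non-blank columns also equals the <number of live registers> field, which makes a convenient sanity check.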

At the end of the report, you will find a summary in the following format:

Maximum # VGPR used  <Max VGPR used>, # VGPR allocated:  <Number of VGPR allocated>

Where:

1.       <Max VGPR used> is the maximum number of VGPRs that are live at the same time anywhere in the code

2.       <Number of VGPR allocated> is the number of VGPRs that were allocated
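The summary line can be extracted with a regular expression. This Python sketch assumes the exact wording shown above (with flexible whitespace). SUMMARY_RE and parse_summary are hypothetical names.

```python
import re

# Matches e.g. "Maximum # VGPR used  13, # VGPR allocated:  14".
SUMMARY_RE = re.compile(r"Maximum # VGPR used\s+(\d+),\s*# VGPR allocated:\s+(\d+)")


def parse_summary(report_text: str) -> tuple[int, int]:
    """Return (max VGPRs used, VGPRs allocated) from a report's text."""
    match = SUMMARY_RE.search(report_text)
    if match is None:
        raise ValueError("summary line not found")
    return int(match.group(1)), int(match.group(2))


used, allocated = parse_summary("Maximum # VGPR used  13, # VGPR allocated:  14")
print(f"{allocated - used} allocated VGPR(s) above the peak")
```

For the sample report below, this yields (13, 14): one more VGPR is allocated than is ever live at once.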

Two things to remember when inspecting the live register analysis report are:

1.     If the maximum number of live registers is lower than the number of allocated registers, the shader compiler (SC) could reduce VGPR usage without spilling by introducing moves.

2.     If registers have a very long liveness range without any read or write access, those registers are likely candidates for spilling at low cost.
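Both checks can be automated. The Python sketch below does two things. First, it computes the peak of the live-register column, which you can compare against the allocated count. Second, for each VGPR, it finds the longest run of consecutive lines marked ‘:’, that is, lines where the register is live but neither read nor written. It assumes the same line layout as the sample report below. The function names and the min_run threshold are hypothetical; what counts as a “very long” range is your call.

```python
def instruction_rows(report_lines: list[str]) -> list[str]:
    """Keep only per-instruction rows (four '|'-separated fields)."""
    return [line for line in report_lines if line.count("|") >= 3]


def peak_live(report_lines: list[str]) -> int:
    """Largest value in the <number of live registers> column."""
    return max(int(line.split("|", 3)[1]) for line in instruction_rows(report_lines))


def longest_idle_runs(report_lines: list[str]) -> dict[int, int]:
    """Map each VGPR index to its longest run of consecutive rows marked ':'
    (live, but neither read nor written by that instruction)."""
    best: dict[int, int] = {}
    current: dict[int, int] = {}
    for line in instruction_rows(report_lines):
        # Drop the separator character on each side of the register columns.
        markers = line.split("|", 3)[2][1:-1]
        for reg, marker in enumerate(markers):
            if marker == ":":
                current[reg] = current.get(reg, 0) + 1
                best[reg] = max(best.get(reg, 0), current[reg])
            else:
                current[reg] = 0
    return best


def spill_candidates(report_lines: list[str], min_run: int) -> list[int]:
    """Hypothetical threshold: VGPRs idle for at least min_run consecutive rows."""
    return sorted(reg for reg, run in longest_idle_runs(report_lines).items()
                  if run >= min_run)
```

On the full sample report below, peak_live returns 13, one less than the 14 allocated VGPRs, matching the summary line. v12 and v13 stand out as spill candidates: they stay live without access from line 1 until they are read by the final exp at line 43.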

Here is a sample live register analysis report:

 

    1 |   9 |     ::::::: :: | label_basic_block_1: s_swappc_b64 s[2:3], s[2:3]
    2 |   9 |     ::::::: :: | s_andn2_b32 s0, s9, 0x3fff0000
    3 |   9 |     ::::::: :: | s_mov_b32 s1, s0
    4 |   9 |     ::::::: :: | s_mov_b32 s2, s10
    5 |   9 |     ::::::: :: | s_mov_b32 s3, s11
    6 |   9 |     ::::::: :: | s_mov_b32 s0, s8
    7 |   9 |     ::::::: :: | s_buffer_load_dwordx8 s[4:11], s[0:3], 0x00
    8 |   9 |     ::::::: :: | s_buffer_load_dwordx8 s[12:19], s[0:3], 0x20
    9 |   9 |     ::::::: :: | s_waitcnt lgkmcnt(0)
   10 |  10 | ^   :::v::: :: | v_mul_f32 v0, s7, v7
   11 |  11 | :^  :::v::: :: | v_mul_f32 v1, s11, v7
   12 |  12 | ::^ :::v::: :: | v_mul_f32 v2, s15, v7
   13 |  13 | :::^:::v::: :: | v_mul_f32 v3, s19, v7
   14 |  12 | x:::::v ::: :: | v_mac_f32 v0, s6, v6
   15 |  12 | :x::::v ::: :: | v_mac_f32 v1, s10, v6
   16 |  12 | ::x:::v ::: :: | v_mac_f32 v2, s14, v6
   17 |  12 | :::x::v ::: :: | v_mac_f32 v3, s18, v6
   18 |  11 | x::::v  ::: :: | v_mac_f32 v0, s5, v5
   19 |  11 | :x:::v  ::: :: | v_mac_f32 v1, s9, v5
   20 |  11 | ::x::v  ::: :: | v_mac_f32 v2, s13, v5
   21 |  11 | :::x:v  ::: :: | v_mac_f32 v3, s17, v5
   22 |  10 | x:::v   ::: :: | v_mac_f32 v0, s4, v4
   23 |  10 | :x::v   ::: :: | v_mac_f32 v1, s8, v4
   24 |  10 | ::x:v   ::: :: | v_mac_f32 v2, s12, v4
   25 |  10 | :::xv   ::: :: | v_mac_f32 v3, s16, v4
   26 |   9 | vvvv    ::: :: | exp pos0, v0, v1, v2, v3
   27 |   5 |         ::: :: | s_buffer_load_dwordx4 s[4:7], s[0:3], 0x40
   28 |   5 |         ::: :: | s_buffer_load_dwordx4 s[8:11], s[0:3], 0x50
   29 |   5 |         ::: :: | s_buffer_load_dwordx4 s[0:3], s[0:3], 0x60
   30 |   5 |         ::: :: | s_waitcnt expcnt(0)
   31 |   6 | ^       ::v :: | v_mul_f32 v0, s6, v10
   32 |   7 | :^      ::v :: | v_mul_f32 v1, s10, v10
   33 |   8 | ::^     ::v :: | v_mul_f32 v2, s2, v10
   34 |   7 | x::     :v  :: | v_mac_f32 v0, s5, v9
   35 |   7 | :x:     :v  :: | v_mac_f32 v1, s9, v9
   36 |   7 | ::x     :v  :: | v_mac_f32 v2, s1, v9
   37 |   6 | x::     v   :: | v_mac_f32 v0, s4, v8
   38 |   6 | :x:     v   :: | v_mac_f32 v1, s8, v8
   39 |   6 | ::x     v   :: | v_mac_f32 v2, s0, v8
   40 |   6 | :::^        :: | v_mov_b32 v3, 1.0
   41 |   7 | ::::^       :: | v_mov_b32 v4, 0
   42 |   7 | vvvv:       :: | exp param0, v0, v1, v2, v3
   43 |   4 |    vv       vv | exp param1, v12, v13, v4, v3
   44 |   0 |                | s_endpgm

 

Maximum # VGPR used  13, # VGPR allocated:  14