Cache Line Utilization

This feature is a first step toward providing data-centric application profiling capabilities. It models the behavior of the processor's L1 data cache and uses Load and Store Instruction-Based Sampling (IBS) records to measure how efficiently an application utilizes the L1 data cache.

A cache is a relatively small amount of on-chip memory that is extremely fast compared to main memory. When the processor needs to access a location in main memory, it first checks whether a copy of that data is in the cache. If the data is present in the cache, it is called a cache hit, and the processor accesses the data from the cache immediately. If the data is not in the cache, it is called a cache miss; in this case, the processor has to wait for the data to be fetched from main memory before it can continue to execute. All of the data required by all of the processes running on a processor cannot fit in the cache simultaneously, so when new data is needed and the cache is full, the processor removes, or evicts, data from the cache. Data is transferred between memory and cache in blocks of fixed size, called cache lines.
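
To make the hit, miss, and eviction sequence concrete, the following sketch models a simple direct-mapped cache lookup. The line size, number of lines, and replacement behavior are assumptions chosen for illustration; they do not describe any particular AMD cache.

    #include <stdbool.h>
    #include <stdint.h>

    /* Minimal direct-mapped cache model; sizes are illustrative assumptions. */
    #define LINE_SIZE  64u
    #define NUM_LINES  512u

    struct cache_line {
        bool     valid;
        uint64_t tag;
    };

    static struct cache_line cache[NUM_LINES];

    /* Returns true on a cache hit; on a miss, the line currently occupying
     * the slot is evicted (overwritten) and the requested line is installed. */
    bool cache_access(uint64_t address)
    {
        uint64_t line_addr = address / LINE_SIZE;
        uint64_t index     = line_addr % NUM_LINES;
        uint64_t tag       = line_addr / NUM_LINES;

        if (cache[index].valid && cache[index].tag == tag)
            return true;            /* hit: data served from the cache       */

        cache[index].valid = true;  /* miss: fetch from memory, evicting the */
        cache[index].tag   = tag;   /* previous occupant of this cache slot  */
        return false;
    }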

Cache misses directly influence application performance. Having the data in the cache when the processor needs it is one way to optimize an application's performance. Additionally, because the cache is small, it is desirable to fill it with data that will actually be used before it is evicted.

AMD processors have separate instruction and data caches per core (the L1 (Level 1) instruction and L1 data caches), as well as a unified L2 cache (per module) and L3 cache (per chip). However, CLU models the L1 data cache only. CLU measures how much of a cache line is used (read or written) before it is evicted from the cache. The cache line utilization percentage is defined as the percentage of bytes in the cache line that were accessed before the cache line was evicted.
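
As a worked illustration of this definition (not of CodeXL's internal computation), the sketch below converts the number of distinct bytes touched in a line before its eviction into a utilization percentage. The 64-byte line size and the function name are assumptions for the example.

    #include <stdio.h>

    /* Illustrative sketch only: the 64-byte line size and this helper are
     * assumptions for the example, not CodeXL internals. */
    #define CACHE_LINE_SIZE 64

    /* Utilization of one cache line: distinct bytes accessed before the
     * line was evicted, expressed as a percentage of the line size. */
    static double line_utilization_pct(unsigned bytes_accessed_before_eviction)
    {
        return 100.0 * (double)bytes_accessed_before_eviction / CACHE_LINE_SIZE;
    }

    int main(void)
    {
        /* An access pattern that touches only an 8-byte field of each line
         * before it is evicted yields 8/64 = 12.5% utilization. */
        printf("CLU = %.1f%%\n", line_utilization_pct(8));
        return 0;
    }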

A low CLU value implies that the cache is being filled with data that is rarely or never accessed before it is evicted. This indicates cache capacity pressure as well as main memory bandwidth pressure (data is read from main memory but not accessed before being evicted).

A high usage percentage (CLU) means that the application is properly exploiting the spatial and temporal locality of its data. Ideally, the CLU would be 100 percent. In practice, however, a good CLU is about 20 to 30 percent, primarily because the core collects the load and store data by sampling.
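
The difference between low and high utilization often comes down to data layout. The sketch below contrasts two layouts of the same data; the struct sizes are assumptions chosen so that a 64-byte line holds either one 64-byte record or sixteen 4-byte keys. Profiling the two loops over a large array would typically show a much higher cache line utilization for the contiguous layout.

    #include <stddef.h>

    /* Illustrative sketch: layouts and sizes are assumptions for the example. */

    /* Poor utilization: each 64-byte record occupies a full cache line, but
     * the loop reads only the 4-byte 'key' field, so roughly 4 of every 64
     * bytes brought into the L1 data cache are used before eviction. */
    struct record {
        int  key;
        char payload[60];
    };

    long sum_keys_aos(const struct record *r, size_t n)
    {
        long sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += r[i].key;
        return sum;
    }

    /* Better utilization: the keys are stored contiguously, so every byte of
     * each fetched cache line is consumed (16 ints per 64-byte line). */
    long sum_keys_soa(const int *keys, size_t n)
    {
        long sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += keys[i];
        return sum;
    }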

Note: See the BIOS and Kernel Developer's Guide for AMD Family 10h Processors (order #31116) for detailed information about caches.

The following table describes the data that can be shown for each module, function, source, and disassembly view:

Cache Line Utilization Percentage
    The cache line utilization percentage for all cache lines, on all cores, accessed by this instruction / function / module.

Line Boundary Crossings
    The number of accesses to the cache line that spanned two cache lines. This happens when an unaligned access touches two cache lines.

Bytes/L1 Eviction
    The number of bytes accessed between cache line evictions.

Accesses/L1 Eviction
    The number of accesses (loads plus stores) to a cache line between evictions.

L1 Evictions
    The number of times a cache line was evicted while this instruction depended on the data in that cache line.

Accesses
    The total number of load and store samples for this instruction / function / module.

Bytes Accessed
    The total number of bytes accessed by this instruction / function / module.
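
As a rough illustration of how these columns relate to one another, the sketch below derives the per-eviction averages and an approximate CLU percentage from the raw counts. The 64-byte line size, the sample numbers, and the arithmetic are assumptions for the example; they are not a statement of how CodeXL computes these values.

    #include <stdio.h>

    /* Illustrative sketch only: the 64-byte line size and this arithmetic
     * are assumptions for the example, not a description of CodeXL. */
    #define ASSUMED_LINE_SIZE 64

    struct clu_counts {
        unsigned long accesses;       /* load + store samples           */
        unsigned long bytes_accessed; /* bytes touched by those samples */
        unsigned long l1_evictions;   /* observed L1 line evictions     */
    };

    int main(void)
    {
        struct clu_counts c = { .accesses = 400, .bytes_accessed = 3200,
                                .l1_evictions = 200 };

        double bytes_per_eviction    = (double)c.bytes_accessed / c.l1_evictions;
        double accesses_per_eviction = (double)c.accesses / c.l1_evictions;
        double clu_pct               = 100.0 * bytes_per_eviction / ASSUMED_LINE_SIZE;

        printf("Bytes/L1 Eviction:    %.1f\n", bytes_per_eviction);    /* 16.0  */
        printf("Accesses/L1 Eviction: %.1f\n", accesses_per_eviction); /* 2.0   */
        printf("CLU (approx):         %.1f%%\n", clu_pct);             /* 25.0% */
        return 0;
    }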