Processor cache: types and principle of operation

The processor is one of the main components of a computer; without it nothing works. Its job is to read instructions and data and exchange them with the other components connected to the motherboard.

A processor consists of several elements, and the cache is one of them.

Cache memory

The processor cache is one of the components that determine performance; what matters is its size, its number of levels, and its speed. Caches have been built into processors for a long time, which is itself evidence of their usefulness.

In programming terms, the cache is memory with an ultra-fast exchange rate whose task is to store and hand over temporary data. That, in simple words, is what a cache is.

The processor cache is built from flip-flops, which in turn consist of transistors. Flip-flops take up far more space than the capacitors that make up RAM, and this severely limits how much cache memory can fit on the chip. Despite its small size, the cache is a rather expensive component. But the same structure gives it its defining quality: speed. Flip-flops need no refresh cycle, so they switch from one state to another with minimal delay, and it is this property that lets the cache run at the processor's own frequency.

Originally, cache memory was placed on the motherboard itself. Today the cache sits on the CPU die, which significantly reduces access time.

Purpose

As described above, the main task of the processor cache is to buffer data and store it temporarily. This is what gives good performance in applications that need it.

To explain in simple words what a cache is and how it works, we can draw an analogy with an office. RAM plays the role of a filing rack that an accountant periodically visits to collect the files he needs, and the accountant's desk is the cache.

On the desk lie the things he refers to repeatedly; they stay on the table because he needs quick access to them. From time to time, files taken from the rack are added to the desk, and when a file is no longer needed it goes back on the shelf. This manipulation clears the cache, preparing it for new data.

Thus, before requesting data from RAM, the processor first checks whether it is already present in the cache. That, in simple terms, is how the cache works.
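The lookup order described above can be sketched in a few lines of Python. This is a simplified illustration, not a model of any real CPU: the `Cache` class, its capacity, the eviction rule, and the sample addresses are all invented for the example.

```python
class Cache:
    """Toy cache: check here first, fall back to RAM on a miss."""

    def __init__(self, capacity):
        self.capacity = capacity   # how many entries fit "on the desk"
        self.store = {}            # address -> value

    def read(self, address, ram):
        if address in self.store:                   # hit: the fast path
            return self.store[address]
        value = ram[address]                        # miss: go to slow RAM
        if len(self.store) >= self.capacity:        # desk is full:
            self.store.pop(next(iter(self.store)))  # put the oldest file back
        self.store[address] = value                 # keep the new file nearby
        return value

ram = {0x10: "invoice", 0x20: "ledger", 0x30: "report"}
cache = Cache(capacity=2)
cache.read(0x10, ram)   # miss: fetched from RAM, kept in the cache
cache.read(0x10, ram)   # hit: served from the cache
```

The second `read` never touches `ram` at all, which is exactly the effect the office analogy describes.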

Memory levels

Most modern processors have several cache levels, most often two or three: L1, L2, and L3.

The first level (L1) sits closest to the processor core and operates at the same frequency as the processor. It also acts as a buffer between the core and the second level.

The L2 cache holds more data, which unfortunately makes it slower. Its task is to bridge the first and third levels.

Since speed drops with each level, the third-level cache is slower still; however, its access time is far better than that of ordinary RAM. In earlier designs, each core had its own caches at every level, but the L3 cache is typically shared by the entire processor.

Independent caches

A cache consists of several levels and categories. Microprocessors for servers and desktop computers carry three independent caches: an instruction cache, a data cache, and an associative translation buffer (TLB). This is why this super-fast memory is divided into three kinds.

Instruction set

The instruction cache is needed to load machine code, but what is that? Machine code is the command system of a particular computing machine, interpreted directly by its central processor. Any program written in machine language executes as binary code, and machine instructions are made of it; the binary encoding of an individual operation is called an opcode.

What does the instruction cache do? It holds the instructions the processor is about to execute, each of which does its own "work": a calculation, a jump from one piece of data to another, or a copy.

Machine instructions come in two kinds, simple and complex. When an instruction is executed, it is decoded into an ordered sequence of operations for the devices it is intended for.
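As a toy illustration of machine code and opcodes, here is a hypothetical 8-bit instruction format, invented purely for this sketch (real instruction sets are far more complex): the top two bits select the operation and the remaining six carry an operand.

```python
# Invented toy format: bits 7-6 = opcode, bits 5-0 = operand.
OPCODES = {0b00: "load", 0b01: "add", 0b10: "jump", 0b11: "copy"}

def decode(instruction):
    """Split one 8-bit instruction into its operation and operand."""
    opcode = instruction >> 6          # top two bits pick the operation
    operand = instruction & 0b0011_1111  # bottom six bits are the operand
    return OPCODES[opcode], operand

decode(0b01_000101)   # ('add', 5)
```

Decoding is exactly this kind of bit-slicing, repeated for every instruction, which is why caching decoded results (see the trace cache below) can pay off.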

Data cache

The data cache stores the information the central processor requests much more often than the rest of what sits in RAM. Because the cache is small, only frequently requested data is kept there; but its location on the processor die cuts the request time to a minimum.

Most modern desktop processors carry a cache of up to 16 megabytes, while in processors designed for servers the cache reaches 20 megabytes or more.
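The value of keeping frequently requested data close by can be shown with a small simulation. The LRU replacement policy, the cache capacity, and the two access patterns below are assumptions chosen for illustration, not a model of any specific processor.

```python
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Simulate an LRU data cache and return the fraction of hits."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)          # mark as recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)    # evict the least recently used
            cache[addr] = None
    return hits / len(accesses)

# A loop that reuses the same few addresses hits almost always...
looping = [0, 1, 2, 3] * 100
# ...while a long scan with no reuse never hits.
scan = list(range(400))

loop_rate = hit_rate(looping, capacity=8)   # close to 1.0: only 4 cold misses
scan_rate = hit_rate(scan, capacity=8)      # 0.0: nothing is ever requested twice
```

The same small cache serves one workload almost perfectly and the other not at all; everything depends on whether the program reuses its data.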

Associative translation buffer

This type of cache, also known as the translation lookaside buffer (TLB), speeds up the translation of virtual memory addresses into physical ones.

The associative memory holds a fixed set of records, each storing a mapping from a virtual address to a physical one. When no such record exists, the processor walks the translation path itself and saves the result, which takes much longer than reusing an already saved entry.
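That record-or-walk behavior can be sketched as follows. The 4 KB page size is a common choice but still an assumption here, and `page_table` is a stand-in for the real (much slower) page-table walk.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch

tlb = {}  # the saved records: virtual page number -> physical frame number

def translate(vaddr, page_table):
    """Translate a virtual address, consulting the TLB first."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # split into page number + offset
    if vpn in tlb:                # TLB hit: reuse the saved translation
        frame = tlb[vpn]
    else:                         # TLB miss: walk the (slow) page table...
        frame = page_table[vpn]
        tlb[vpn] = frame          # ...and leave a record for next time
    return frame * PAGE_SIZE + offset

page_table = {0: 7, 1: 3}
translate(0x0123, page_table)  # miss: fills the TLB
translate(0x0456, page_table)  # hit: same page, record already saved
```

Only the page number is cached; the offset within the page passes through unchanged.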

Misses at work

Like the cache types, misses fall into three categories.

The first type is a miss on reading an instruction from the cache. It causes a long delay, because the processor needs considerable time to load the required instruction from main memory.

The data cache has read misses too. Unlike the first case, a data-read miss hurts less, because instructions that do not depend on the requested data can continue executing while the request is served from main memory.

Writing to the data cache is not free of misses either. A write miss costs little time, because pending writes can be placed in a queue, which lets the processor work on other instructions without disturbing the overall flow. Only a write miss with a full queue actually stalls the central processor.
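A minimal sketch of that write queue, under invented parameters (the depth, the return values, and the `WriteBuffer` name are all illustrative):

```python
from collections import deque

class WriteBuffer:
    """Pending writes are queued so the CPU can keep executing;
    only a full queue forces a stall (a sketch, not a real model)."""

    def __init__(self, depth):
        self.depth = depth
        self.pending = deque()

    def write(self, address, value):
        if len(self.pending) == self.depth:
            return "stall"               # the only case that blocks the CPU
        self.pending.append((address, value))
        return "queued"                  # CPU moves on immediately

    def drain_one(self, memory):
        """Memory retires one queued write in the background."""
        addr, val = self.pending.popleft()
        memory[addr] = val

buf = WriteBuffer(depth=2)
buf.write(0x10, 1)   # 'queued'
buf.write(0x20, 2)   # 'queued'
buf.write(0x30, 3)   # 'stall': the queue is full
```

As soon as memory drains an entry, writes are accepted again, which is why a deep enough buffer makes write misses nearly invisible.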

Varieties of misses

The first kind, called compulsory misses, occurs only when an address is requested for the first time. It is mitigated by prefetching, which can be done in hardware or software.

Capacity misses are caused by the finite size of the cache and do not depend on its associativity or line size. There is no notion of the cache being "full" or "nearly full": its lines are simply all busy, and a new cache line can be allocated only when a busy one is evicted.

Conflict misses are, as the name implies, caused by a conflict. They happen when the processor requests data that the cache has already crowded out, because another address mapped to the same line.
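A direct-mapped toy cache shows the conflict directly. The four-line size and the modulo indexing are assumptions for the sketch; the point is that two addresses sharing an index evict each other even when the cache has plenty of free lines.

```python
NUM_LINES = 4  # a tiny direct-mapped cache, for illustration only

cache_lines = [None] * NUM_LINES  # each line remembers which address it holds

def read(address):
    """Return True on a hit; on a miss the newcomer evicts the old line."""
    idx = address % NUM_LINES      # addresses with the same index conflict
    hit = cache_lines[idx] == address
    cache_lines[idx] = address     # the new address crowds out the old one
    return hit

read(0)   # miss (cold)
read(4)   # conflict miss: 4 maps to the same line as 0 and evicts it
read(0)   # miss again, even though 0 was used just two accesses ago
```

With even two-way associativity, addresses 0 and 4 could coexist and the third access would hit; that is exactly the trade-off associativity buys.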

Address translation and its variations

Most processors installed in computers rely on some form of virtual memory: every program running on the machine sees its own simplified address space, containing code and data belonging exclusively to that program. The virtual address space lets every program use addresses independently of where its data actually resides in physical memory.

Thanks to hardware support for translating virtual addresses into physical (RAM) addresses, these manipulations are carried out at very high speed.

Address translation process:

  • The address generator delivers the physical address to the memory management unit only after several cycles. This lag is called the delay (latency).
  • The "overlay effect" (aliasing) occurs when several virtual addresses map to one physical address. The processor handles them in the order the program dictates, but to do so it must check that only one copy of the data is present in the cache.
  • The virtual address space is divided into fixed-size blocks of memory whose starting addresses are aligned to their size. This feature is called the "mapping unit" (the page).
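The aliasing point above can be made concrete with a tiny example. The page size and the page-table contents are invented; the mapping of two virtual pages onto one physical frame is the "overlay effect" itself.

```python
PAGE_SIZE = 4096                 # assumed page size for the sketch
page_table = {0: 5, 9: 5}        # two virtual pages share physical frame 5

def to_physical(vaddr):
    """Split the virtual address into page number + offset, then map it."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset

a = to_physical(0 * PAGE_SIZE + 0x40)
b = to_physical(9 * PAGE_SIZE + 0x40)
# a == b: two different virtual addresses land on one physical address,
# so the cache must make sure it holds only one copy of that data.
```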

Caches and their hierarchy

Having several interacting caches is characteristic of most modern processors.

Processors that support instruction-level parallelism access information in a pipeline: fetching instructions, translating virtual addresses to physical ones, and reading data. Pipelining makes it natural to split these tasks among three separate caches, which avoids access conflicts. This place in the hierarchy is called specialized caches, and processors with this feature have a Harvard architecture.

Hit rate and latency are the central trade-off of this super-fast memory: the larger the cache, and the higher its hit rate, the greater the latency. To resolve this conflict and optimize performance, multiple cache levels are used, buffering between one another.

The advantage of a level system is that it works in ascending order. First the first level, fast but small, answers at the processor's own frequency. On a first-level miss, the processor turns to the second-level cache, which has more volume but less speed. This continues until the processor finally receives an answer from RAM. This position in the hierarchy is called multi-level caching.
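The ascending lookup can be sketched directly. The latencies below are made-up round numbers in CPU cycles (real figures vary between processors), and each level's contents are given as plain sets.

```python
# Illustrative latencies in cycles; real values differ between CPUs.
CACHE_LEVELS = [("L1", 4), ("L2", 12), ("L3", 40)]
RAM_LATENCY = 200

def lookup(address, contents):
    """Probe each level in ascending order; cost accumulates with
    every miss until some level (or finally RAM) answers."""
    cost = 0
    for name, latency in CACHE_LEVELS:
        cost += latency
        if address in contents.get(name, set()):
            return name, cost          # hit: stop descending
    return "RAM", cost + RAM_LATENCY   # RAM always answers

contents = {"L1": {1}, "L2": {1, 2}, "L3": {1, 2, 3}}
lookup(1, contents)   # ('L1', 4)
lookup(3, contents)   # ('L3', 56): paid for the L1 and L2 misses too
lookup(9, contents)   # ('RAM', 256)
```

Note that a deep miss pays for every level it probed on the way down, which is why a high L1 hit rate matters so much.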

Exclusive caches store a given piece of data at only one level, while an inclusive design may keep copies of the same data at several levels of the hierarchy.

The hierarchy level called the "trace cache" removes the decoder from the critical path, speeding up instruction loading and reducing the processor's heat output. Its main feature is the ability to store already decoded instructions. The stored instructions are divided into two groups: basic blocks and dynamic traces. A dynamic trace can be built from several basic blocks joined into a group, so it stores the data of blocks that have already been processed.

