Subin-h

The swappiness value in Linux has nothing to do with how much RAM is in use before swapping starts. This is a widespread and persistent mistake. We explain what it really does.

Debunking the myths about swap

Swapping is a method in which data in random access memory (RAM) is written to a special location on the hard drive—the swap partition or swap file—to free up RAM.

Linux has a setting called the swappiness value. There is a lot of confusion about what this setting controls. The most common misdescription of swappiness is that it sets a threshold for RAM usage, and that when the amount of RAM in use reaches that threshold, swapping starts.

This is a fallacy that has been repeated so often that it has become received wisdom. If (almost) everyone else tells you that this is how swappiness works, why should you believe us when we say it isn’t?

Simple. We’re going to prove it.

Your RAM is divided into zones

Linux doesn’t think of your RAM as one big homogeneous pool of memory. It considers it to be divided into a number of different regions called zones. Which zones are present on your computer depends on whether it is 32-bit or 64-bit. Here is a simplified description of the possible zones on an x86 computer.

  • Direct Memory Access (DMA): This is the low 16 MB of memory. The zone gets its name because, a long time ago, there were computers that could only do direct memory access into this area of physical memory.
  • DMA32: Despite its name, Direct Memory Access 32 (DMA32) is a zone found only in 64-bit Linux. It is the low 4 GB of memory. Linux running on 32-bit computers can only do DMA to this amount of RAM (unless they are using a physical address extension (PAE) kernel), which is how the zone got its name. On 32-bit computers, though, this range is called HighMem.
  • Normal: On 64-bit computers, normal memory is all of the RAM above 4 GB (roughly). On 32-bit machines, it is the RAM between 16 MB and 896 MB.
  • HighMem: This only exists on 32-bit Linux computers. It is all RAM above 896 MB, including RAM above 4 GB on sufficiently large machines.

Meaning of PAGESIZE

RAM is allocated in pages, which are of a fixed size. That size is determined by the kernel at boot time by detecting the architecture of the computer. A typical page size on a Linux computer is 4 KB.

You can see your page size with the getconf command:

  getconf PAGESIZE 

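If you want the same number from inside a program, the POSIX sysconf() call exposes it. This is a minimal sketch; the function name page_size_bytes is our own, not a kernel or library API.

```c
#include <unistd.h>

/* Return the page size in bytes, the same figure that
 * `getconf PAGESIZE` prints, or -1 if sysconf() cannot determine it. */
long page_size_bytes(void)
{
    return sysconf(_SC_PAGESIZE);
}
```

On a typical x86 Linux machine this returns 4096.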

Zones tied to nodes

Zones are tied to nodes, and each node is associated with a central processing unit (CPU). The kernel will try to allocate memory for a process running on a CPU from the node associated with that CPU.

The concept of nodes being tied to CPUs allows mixed memory types to be installed in specialist multi-CPU computers that use the Non-Uniform Memory Access (NUMA) architecture.

That’s all very high-end. The average Linux computer will have a single node, called node zero, and all zones will belong to that node. To see the nodes and zones on your computer, look inside the /proc/buddyinfo file. We’ll use less to do so:

  less /proc/buddyinfo


This is the output of the 64-bit computer that this article was researched on:

  Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
  Node 0, zone    DMA32      2     67     58     19      8      3      3      1      1      1     17

There is a single node, node zero. This computer has only 2 GB of RAM, so there is no "Normal" zone. There are only two zones, DMA and DMA32.

Each column gives the number of free blocks of a particular size, where column n counts blocks of 2^n contiguous pages. For example, for the DMA32 zone, reading from the left:

  • 2: there are 2 free chunks of 2^0 × PAGESIZE (single pages).
  • 67: there are 67 free chunks of 2^1 × PAGESIZE.
  • 58: there are 58 free chunks of 2^2 × PAGESIZE.
  • And so on, all the way up to…
  • 17: there are 17 free chunks of 2^10 × PAGESIZE (1,024 pages each).
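To show how the columns combine, here is a sketch that parses one /proc/buddyinfo line and adds up the free pages it describes, weighting column n by 2^n pages. The function name buddyinfo_free_pages and the 16-column limit are our own assumptions for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_ORDER_COLS 16  /* assumption: more columns than any kernel prints */

/* Parse a line such as "Node 0, zone   DMA32      2     67     58 ...".
 * Column n holds the number of free blocks of 2^n contiguous pages.
 * Copies the zone name into `zone` and returns the total number of
 * free pages, or -1 if the line does not look like a buddyinfo row. */
long buddyinfo_free_pages(const char *line, char zone[32])
{
    int node, consumed = 0;
    if (sscanf(line, "Node %d, zone %31s%n", &node, zone, &consumed) != 2)
        return -1;

    const char *p = line + consumed;
    long total = 0;
    for (int order = 0; order < MAX_ORDER_COLS; order++) {
        char *end;
        unsigned long count = strtoul(p, &end, 10);
        if (end == p)                        /* no more numbers on the line */
            break;
        total += (long)(count << order);     /* count blocks of 2^order pages */
        p = end;
    }
    return total;
}
```

Fed the DMA32 row shown above, it returns 19,240 free pages, which is roughly 75 MB with 4 KB pages.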

But really, the only reason we look at this information is to see the relationship between nodes and zones.

File pages and anonymous pages

Memory mapping uses page table entries to record which memory pages are being used and for what.

Memory mappings can be:

  • File-backed: File-backed mappings contain data that has been read from a file. It can be any kind of file. The important thing to note is that if the system freed this memory and needed that data again, it could simply read it from the file again. But if the data has been changed in memory, those changes must be written to the file on the hard drive before the memory can be freed. If that didn’t happen, the changes would be lost.
  • Anonymous: Anonymous memory is a memory mapping with no file or device backing it. These pages may contain memory requested by programs to hold data, or for things such as the stack and the heap. Because there is no file behind this type of data, a special place must be set aside for storing anonymous data. That place is the swap partition or swap file. Anonymous data is written to swap before anonymous pages are freed.
  • Device-backed: Devices are addressed through block device files, which can be treated as though they were files. Data can be read from and written to them. A device-backed memory mapping holds data from the device.
  • Shared: Multiple page table entries can map to the same page of RAM. Accessing the memory locations through any of the mappings will show the same data. Different processes can communicate with each other very efficiently by changing the data in these shared memory locations. Shared writable mappings are a common means of achieving high-performance inter-process communication.
  • Copy on write: Copy-on-write is a lazy allocation technique. If a copy of a resource already in memory is requested, the request is satisfied by returning a mapping to the original resource. If one of the processes "sharing" the resource tries to write to it, the resource must truly be replicated in memory so that the changes can be made to the new copy. So the memory allocation only takes place on the first write.
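To make the difference between shared mappings and copy-on-write concrete, here is a small sketch using the POSIX mmap() and fork() calls. It maps one anonymous page, lets a child process write to it, and reports what the parent sees afterwards. The helper name value_seen_by_parent is hypothetical, and MAP_ANONYMOUS is a Linux (and BSD) extension rather than plain POSIX.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one anonymous page with `sharing` set to MAP_SHARED or MAP_PRIVATE,
 * store 1 in it, let a child process store 2, and return the value the
 * parent sees afterwards. Returns -1 on error. */
int value_seen_by_parent(int sharing)
{
    int *page = mmap(NULL, sizeof *page, PROT_READ | PROT_WRITE,
                     sharing | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    *page = 1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(page, sizeof *page);
        return -1;
    }
    if (pid == 0) {
        /* Child. MAP_SHARED: writes to the page the parent also maps.
         * MAP_PRIVATE: the write triggers copy-on-write of the page. */
        *page = 2;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    int seen = *page;
    munmap(page, sizeof *page);
    return seen;
}
```

With MAP_SHARED the parent sees the child’s 2, because both processes map the same physical page. With MAP_PRIVATE it still sees 1: the child’s write caused its page to be copied, and only that private copy changed.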

For swappiness, we only need to concern ourselves with the first two: file pages and anonymous pages.


Here is the description of swappiness from the Linux documentation on GitHub:

"This control is used to define how aggressive (sic) the kernel will swap memory pages. Higher values will increase aggressiveness, lower values decrease the amount of swap. A value of 0 instructs the kernel not to initiate swap until the amount of free and file-backed pages is less than the high water mark in a zone.

The default value is 60."

That makes it sound as though swappiness controls how intensely the system swaps. Interestingly, it states that setting swappiness to zero doesn’t turn off swap. It instructs the kernel not to swap until certain conditions are met. But swapping can still occur.

Let’s dig deeper. Here is the definition and default value of vm_swappiness in the kernel source file vmscan.c:

/*
 * From 0 .. 100.  Higher means more swappy.
 */
int vm_swappiness = 60;

The swappiness value can range from 0 to 100. Again, the comment certainly makes it sound as though swappiness governs how much swapping takes place, with a higher figure leading to more swapping.

Further on in the source file, we see that a new variable called swappiness is assigned the value returned by the function mem_cgroup_swappiness(). Tracing further through the source code shows that this function returns vm_swappiness (unless a memory cgroup has been given its own swappiness setting). So now the variable swappiness is set to whatever value vm_swappiness holds.

int swappiness = mem_cgroup_swappiness(memcg);

And a little further in the same source code file, we see this:

/*
 * With swappiness at 100, anonymous and file have the same priority.
 * This scanning priority is essentially the inverse of IO cost.
 */
anon_prio = swappiness;
file_prio = 200 - anon_prio;

This is interesting. Two distinct values are derived from swappiness, and they are held in anon_prio and file_prio. As one increases, the other decreases, and vice versa.

The Linux swappiness value actually sets the ratio between two values.
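Here is a sketch that reproduces those two lines from vmscan.c and turns them into the share of scanning aimed at anonymous pages. This is a simplification: the struct and function names are our own, and the kernel also scales each priority by how recently scanned pages on each list turned out to still be in use, which this sketch ignores.

```c
/* Mirror of the two quoted lines from vmscan.c: anonymous and file
 * pages get complementary scan priorities that always sum to 200. */
struct scan_prio {
    int anon_prio;
    int file_prio;
};

struct scan_prio scan_priorities(int swappiness)
{
    struct scan_prio p;
    p.anon_prio = swappiness;          /* anon_prio = swappiness;      */
    p.file_prio = 200 - p.anon_prio;   /* file_prio = 200 - anon_prio; */
    return p;
}

/* Percentage of scanning aimed at anonymous pages if the priorities
 * were the only factor (a simplifying assumption, see above). */
int anon_share_percent(int swappiness)
{
    struct scan_prio p = scan_priorities(swappiness);
    return p.anon_prio * 100 / (p.anon_prio + p.file_prio);
}
```

With the default of 60, anon_prio is 60 and file_prio is 140, so roughly 30% of the scanning pressure falls on anonymous pages. At 100 the split is 50/50; at 0 it all falls on file pages. Nothing here says *when* reclaim starts, only where it looks first.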

The golden ratio

File pages hold data that can be easily retrieved if that memory is freed. Linux can simply read the file again. As we have seen, if the file data has been changed in RAM, those changes must be written to the file before the file page can be freed. But, either way, the file page in RAM can be repopulated by reading the data from the file. So why bother adding these pages to the swap partition or swap file? If you need that data again, you might as well read it back from the original file instead of keeping a redundant copy in swap space. So file pages are not stored in swap. They are "stored" back in the original file.

With anonymous pages, there is no underlying file associated with the values in memory. The values in those pages were arrived at dynamically. You can’t simply read them back in from a file. The only way anonymous page values can be recovered is to store the data somewhere before freeing the memory. And that’s what swap holds: anonymous pages that you will need to refer to again.

But note that for both file pages and anonymous pages, freeing memory may require a write to the hard drive. If the file page data or the anonymous page data has changed since it was last written to the file or to swap, a write is required. Retrieving the data again requires a read from the file system or from swap. Both types of page reclaim are costly. Trying to reduce hard drive I/O by minimizing the swapping of anonymous pages only increases the amount of hard drive I/O needed to deal with file pages being written to and read from files.

As you can see from the last code snippet, there are two variables. One is called file_prio, for "file priority," and the other is called anon_prio, for "anonymous priority."

These variables hold values that work in tandem. If they are both 100, they are equal. For any other values, as anon_prio decreases from 100 to 0, file_prio increases from 100 to 200. The two values feed into a complicated algorithm that determines whether the Linux kernel runs with a preference for reclaiming (freeing) file pages or anonymous pages.

You can think of file_prio as the system’s willingness to free file pages, and of anon_prio as its willingness to free anonymous pages. What these values don’t do is set any kind of trigger or threshold for when swap is going to be used. That is decided elsewhere.

But when memory needs to be freed, these two variables, and the ratio between them, are taken into account by the reclaim and swap algorithms to determine which types of pages are preferentially considered for freeing. That in turn dictates whether the associated hard drive activity will be processing files, for file pages, or swap space, for anonymous pages.

When does swapping actually kick in?

We have established that the Linux swappiness value sets a preference for the type of memory pages that will be scanned for potential reclaim. That’s fine, but something must decide when swapping is going to kick in.

Each memory zone has a high water mark and a low water mark. These are system-derived values, calculated as proportions of the RAM in each zone. It is these values that are used as the swap trigger thresholds.

To check what your high and low water marks are, look inside the /proc/zoneinfo file with this command:

  less /proc/zoneinfo


Each of the zones will have a set of memory values measured in pages. On the test machine, the DMA32 zone has a low water mark of 13966 pages and a high water mark of 16759 pages.

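Here is a heavily simplified model of how the low and high water marks act as triggers: the kernel swap daemon (kswapd) is woken when free pages in a zone drop below the low water mark, and it keeps reclaiming until free pages climb back to the high water mark. This is a sketch, not kernel code; the struct and function names are hypothetical, and it leaves out the zone’s min water mark and other refinements.

```c
#include <stdbool.h>

/* Water marks for one zone, in pages, as listed in /proc/zoneinfo. */
struct zone_watermarks {
    long low;
    long high;
};

/* Simplified model: should kswapd be reclaiming after this check?
 * `running` says whether it was already awake. */
bool kswapd_should_run(const struct zone_watermarks *wm,
                       long free_pages, bool running)
{
    if (free_pages < wm->low)
        return true;                 /* below the low mark: wake kswapd */
    if (running && free_pages < wm->high)
        return true;                 /* already awake: carry on up to high */
    return false;                    /* at or above high, or never woken */
}

/* Convert a page count to KiB for a given page size in bytes. */
long pages_to_kib(long pages, long page_size)
{
    return pages * page_size / 1024;
}
```

With 4 KB pages, the test machine’s low mark of 13966 pages is 55,864 KiB and its high mark of 16759 pages is 67,036 KiB. Note that swappiness appears nowhere in this decision.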

So you can see that you cannot use swappiness to influence swap behavior with regard to RAM usage. It just doesn’t work that way.

What should swappiness be set to?

That depends on your hardware, your workload, your hard drive type, and whether your computer is a desktop or a server. Obviously, this is not a one-size-fits-all setting.

And you should keep in mind that swap is not just used as a mechanism to free up RAM when you run out of free memory space. Swap is an important part of a well-functioning system, and without it, it becomes very difficult for Linux to achieve reasonable memory management.

Changing swappiness takes effect immediately; you don’t need to reboot. So you can make small adjustments and monitor the effects. Ideally, you would do this over a period of days, with a range of activities on your computer, to find the best setting for you.

Here are some points to consider:

How to set the swappiness value in Linux

Before you change the swappiness value, you need to know its current value. If you want to reduce it a little, the question is: a little less than what? You can find out with this command:

  cat /proc/sys/vm/swappiness

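The cat command works because the value lives in an ordinary-looking procfs file. Here is a minimal sketch of reading it from C; the function name read_swappiness is our own, and it takes the path as a parameter so it can be pointed at a test file.

```c
#include <stdio.h>

/* Read a single integer such as the current swappiness. Pass
 * "/proc/sys/vm/swappiness" for the live value. Returns the value,
 * or -1 if the file cannot be opened or does not start with a number. */
int read_swappiness(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int value;
    int ok = fscanf(f, "%d", &value) == 1;
    fclose(f);
    return ok ? value : -1;
}
```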

To adjust the swappiness value, use the sysctl command:

  sudo sysctl vm.swappiness=45


The new value is used right away; no reboot is required.
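Under the hood, sysctl applies vm.swappiness by writing the number to /proc/sys/vm/swappiness, which requires root. Here is a sketch of the same write from C. The function name write_swappiness is hypothetical, and the 0 to 100 check reflects the range in the vmscan.c comment quoted earlier; the path is a parameter so the function can be tried on an ordinary file.

```c
#include <stdio.h>

/* Write a swappiness value to `path` (use "/proc/sys/vm/swappiness",
 * as root, for the real setting). Returns 0 on success, -1 if the
 * value is out of range or the write fails. */
int write_swappiness(const char *path, int value)
{
    if (value < 0 || value > 100)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)      /* buffered write errors surface on close */
        ok = 0;
    return ok ? 0 : -1;
}
```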

In fact, if you restart your computer, the swappiness value reverts to its default of 60. When you have finished experimenting and have settled on the new value you want to use, you can make it persistent across reboots by adding it to the /etc/sysctl.conf file. You can use whichever editor you prefer. Use the following command to edit the file with the nano editor:

  sudo nano /etc/sysctl.conf 


When nano opens, scroll to the bottom of the file and add this line. We’re using 35 as our permanent swappiness value; substitute the value you wish to use.

  vm.swappiness = 35 
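In /etc/sysctl.conf, whitespace around the equals sign is fine, and lines starting with # or ; are comments. Here is a sketch of a parser for a single line that picks out a vm.swappiness setting, which can be handy for checking what a config will apply. The function name and its return convention are our own; real sysctl parsing handles more cases (such as slash-separated keys) than this.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 and stores the value in *out if `line` sets vm.swappiness,
 * 0 for blank lines, comments, and other keys, and -1 if the
 * vm.swappiness value is not a plain integer. */
int parse_swappiness_line(const char *line, int *out)
{
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '#' || *line == ';')
        return 0;                              /* blank or comment */

    const char *key = "vm.swappiness";
    size_t len = strlen(key);
    if (strncmp(line, key, len) != 0)
        return 0;                              /* some other key */
    line += len;
    while (isspace((unsigned char)*line))
        line++;
    if (*line != '=')
        return 0;                              /* e.g. vm.swappiness_x */
    line++;

    char *end;
    long v = strtol(line, &end, 10);           /* skips leading spaces */
    if (end == line)
        return -1;                             /* no digits */
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return -1;                             /* trailing junk */
    *out = (int)v;
    return 1;
}
```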


To save your changes and exit nano, press Ctrl+O, press Enter, and then press Ctrl+X.

Memory management is complex

Memory management is tricky. And that’s why it’s usually better for the average user to leave it to the kernel.

It’s easy to think you are using more RAM than you are. Utilities such as top and free can give the wrong impression. Linux will use free RAM for a variety of purposes, such as disk caching. This artificially inflates the "used" memory figure and shrinks the "free" memory figure. In reality, RAM used as disk cache is flagged as both "used" and "available" because it can be reclaimed at any time, very quickly.
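The kernel publishes both views in /proc/meminfo: MemFree is memory that is genuinely unused, while MemAvailable is the kernel’s estimate of how much memory can be handed to new applications without swapping, including reclaimable cache. Here is a sketch that pulls either field out of meminfo-style text; the function name meminfo_kb is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Find `field` (e.g. "MemFree" or "MemAvailable") in text laid out
 * like /proc/meminfo, one "Name:   value kB" entry per line.
 * Returns the value in kB, or -1 if the field is missing. */
long meminfo_kb(const char *text, const char *field)
{
    size_t len = strlen(field);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            long kb;
            return sscanf(line + len + 1, "%ld", &kb) == 1 ? kb : -1;
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

On a machine with a large disk cache, MemAvailable will typically be far larger than MemFree, which is exactly the gap that makes memory look more "used" than it really is.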

To the uninitiated, it might look as though swap isn’t working, or that the swappiness value needs changing.

As always, the devil is in the details. Or, in this case, a daemon: the kernel swap daemon (kswapd).
