When you use the Linux du command, you get both the actual disk usage and the true size of a file or directory. We will explain why these values do not match.
Actual disk usage and true size
The size of a file and the space it takes up on your hard drive are rarely the same. Disk space is allocated in blocks. If the file is smaller than a block, it is still allocated a whole block because the filesystem does not have a smaller unit of real estate to use.
If the file’s size is not an exact multiple of the block size, the space it uses on the hard drive must always be rounded up to the next whole block. For example, if a file is larger than two blocks but smaller than three, it still requires three blocks to store.
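The rounding rule can be written as a small piece of shell arithmetic. This is only a sketch: blocks_needed is a hypothetical helper name, and the 4096-byte block size used in the calls is an assumed value, since the real block size depends on the file system.

```shell
# Hypothetical helper: number of whole blocks needed to hold a file.
# $1 = file size in bytes, $2 = block size in bytes (assumed, e.g. 4096)
blocks_needed() {
    # Integer ceiling division: round up to the next whole block.
    echo $(( ($1 + $2 - 1) / $2 ))
}

blocks_needed 2 4096       # a two-byte file still needs 1 block
blocks_needed 8192 4096    # exactly two blocks, so 2
blocks_needed 10000 4096   # more than two blocks, so 3
```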
Two measurements are used in relation to file size. The first is the actual size of the file, which is the number of bytes of content that make up the file. The second is the effective size of the file on the hard drive. This is the number of file system blocks needed to store the file.
Example
Let’s look at a simple example. We will redirect one character to a file to create a small file:
echo "1" > geek.txt
Now we will use the long format listing, ls -l, to see the length of the file:
ls -l geek.txt
The length is the numeric value that follows the dave dave entries, which is two bytes. Why is it two bytes when we only sent one character to the file? Let’s take a look at what’s going on inside the file.
We will use the hexdump command, which gives us an exact byte count and lets us "see" non-printing characters as hexadecimal values. We will also use the -C (canonical) option so that the body of the output shows the hexadecimal values alongside their alphanumeric character equivalents:
hexdump -C geek.txt
The output shows us that starting at offset 00000000 in the file, there is a byte that contains the hex value 31 and one that contains the hex value 0A. The right side of the output displays these values as alphanumeric characters where possible.
The hexadecimal value 31 represents the digit one. The hexadecimal value 0A represents the newline character, which cannot be shown as an alphanumeric character, so it is displayed as a period (.) instead. The newline character is added by echo. By default, echo starts a new line after it writes its text.
This matches the output from ls and agrees with the file length of two bytes.
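A quick way to confirm that the second byte comes from echo is to count bytes with wc -c. The sketch below compares echo with printf, which writes a newline only when the format string asks for one. The variable names are illustrative, and the data is piped rather than written to a file.

```shell
# echo appends a newline, so one character becomes two bytes.
with_newline=$(echo "1" | wc -c)

# printf writes only what the format string contains: a single byte here.
without_newline=$(printf '%s' "1" | wc -c)

# $(( ... )) strips any padding that some wc implementations add.
echo "echo:   $((with_newline)) bytes"
echo "printf: $((without_newline)) bytes"
```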
Now we will use the du command to look at the file size:
du geek.txt
It says size four, but four of what?
There are blocks and then there are blocks
When du reports file sizes in blocks, the block size it uses depends on several factors. You can specify which block size it should use on the command line. If you don’t force du to use a particular block size, it follows a set of rules to decide which one to use.
It first checks the following environment variables:
- DU_BLOCK_SIZE
- BLOCK_SIZE
- BLOCKSIZE
If any of these exist, the block size is set accordingly and du stops checking. If none of them are set, du defaults to a block size of 1024 bytes, unless the environment variable POSIXLY_CORRECT is set. In that case, du defaults to a block size of 512 bytes.
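The lookup order described above can be sketched as a shell function. This is an illustration of the rules as stated here, not du’s actual source code: du_default_block_size is a hypothetical name, and the function treats an empty block-size variable as unset, which is a simplifying assumption. It also echoes the variable’s value as-is rather than interpreting it.

```shell
# Sketch of the lookup order described in the text; not du's real code.
# Prints the block size that would apply when no size option is given.
du_default_block_size() {
    if [ -n "${DU_BLOCK_SIZE:-}" ]; then
        echo "$DU_BLOCK_SIZE"        # checked first
    elif [ -n "${BLOCK_SIZE:-}" ]; then
        echo "$BLOCK_SIZE"           # checked second
    elif [ -n "${BLOCKSIZE:-}" ]; then
        echo "$BLOCKSIZE"            # checked third
    elif [ -n "${POSIXLY_CORRECT+set}" ]; then
        echo 512                     # POSIXLY_CORRECT is set, even if empty
    else
        echo 1024                    # the usual default
    fi
}

# Show the value that applies in the current environment.
du_default_block_size
```

Running the function in a shell where none of the four variables is set should print 1024, which is the case the rest of this example relies on.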
So how do we know which one is being used? You can check every environment variable to solve it, but there is a faster way. Let’s compare the results with the block size that the file system uses.
To discover the block size that the file system is using, we will use the tune2fs program. We will use the -l (list superblock) option, pipe the output through grep, and print the lines that contain the word "Block". In this example, we will look at the file system on the first partition of the first hard drive, sda1, and we need to use sudo:
sudo tune2fs -l /dev/sda1 | grep Block
The file system block size is 4096 bytes. If we divide this by the result we got from du (four), we find that the default du block size is 1024 bytes. Now we know a few important things.
First, we know that the smallest amount of file system resources that can be allocated to store a file is 4096 bytes. This means that even our tiny two-byte file takes up 4 KB on the hard drive.
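The arithmetic behind these figures can be checked in the shell. The variable names below are illustrative, and the values are simply the ones observed in this example. The division only works here because the file occupies exactly one file system block.

```shell
fs_block_size=4096   # bytes, as reported by tune2fs in this example
du_blocks=4          # the figure du reported for geek.txt
file_length=2        # bytes, as reported by ls -l

# Size of one du block: file system block size divided by du's count.
du_block_size=$(( fs_block_size / du_blocks ))
echo "du block size: $du_block_size bytes"

# Space taken on disk versus the length of the content.
disk_usage=$(( du_blocks * du_block_size ))
echo "disk usage: $disk_usage bytes for $file_length bytes of content"
```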