Understanding how dd works

Dan Track dan.track at gmail.com
Wed Jun 25 13:49:56 UTC 2008


On Wed, Jun 25, 2008 at 2:19 PM, Patrick O'Callaghan
<pocallaghan at gmail.com> wrote:
> On Wed, 2008-06-25 at 13:31 +0100, Dan Track wrote:
>> Thanks for the heads up on this. If the data blocks don't have
>> anything written into them, then what data is written into them when
>> using dd? If I restore the dd image, will the blocks then be in the
>> same state, i.e. unwritten to?
>>
>> Also, following on from this: if I create a file using dd, let's say 2GB,
>> how does the filesystem know that all these blocks belong to the file
>> myfile.img, and where is the information stored to say that a block
>> has data written into it or not?
>
> It's important to understand that this has nothing to do with 'dd', it's
> simply how the Unix filesystem works, and since Linux is "culturally
> derived" from Unix, it does the same thing. You would see the same
> effect just by using 'cp' or even 'cat'.
>
> The basic points are these (I'm skating over a lot for clarity):
>
> 1) The system maintains a list of every physical disk block assigned to
> the file (thus one of the things the 'fsck' command checks is that every
> block in the filesystem is either assigned to a file or is on the free
> list).
>
> 2) When a process writes to a file it need not do so sequentially
> because the lseek(2) operation allows it to move its "current position"
> in the file. Furthermore, it's permissible to move the pointer beyond
> the current end of the file. If a process does this by a large enough
> amount and then writes data, the intervening space may have no disk
> blocks assigned to it (depending on the distance moved and block
> alignment). This is called a 'hole'. Files with holes in them are called
> 'sparse'.
>
> 3) The system keeps a separate count of the logical size of the file.
> Because of the holes the logical size may be different from the physical
> size. "ls -l" shows the logical size. "du" shows the real physical size
> and may be different.
>
> 4) When a process tries to read from a hole, the system simply returns
> nulls for the corresponding bytes. However if a process writes nulls
> into a file, the system does *not* make any effort to detect them as a
> special case, so they are simply written as any other data and the
> system will allocate blocks to them. This happens when 'dd' (or 'cp' or
> 'cat') copies a file, so the resulting file can be larger than the
> original.
>
> Note that 'rsync --sparse' will preserve holes when it can.
>
> Note also that if you're not careful you can back up a file or even a
> filesystem that you can't restore because it's too big, especially if
> copying it to some medium (e.g. a tape drive or non-UNIX disk system)
> that can't handle sparse files.
>
> Hope this helps.
>
> poc
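
Points 2 to 4 above can be tried out with a minimal Python sketch, assuming Linux and a filesystem that supports sparse files (ext3/ext4 or tmpfs, for example). It moves the file offset past the end of a new, empty file with os.lseek() before writing, then compares the logical size (what 'ls -l' shows) with the allocated size (roughly what 'du' reports). The helper names make_sparse_file and sizes are hypothetical, made up for this illustration.

```python
import os
import tempfile


def make_sparse_file(path: str, hole_size: int, payload: bytes) -> None:
    # Hypothetical helper: create `path` as a hole of `hole_size` bytes
    # followed by `payload`.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Move the file offset beyond the current end of file (which is 0).
        # Nothing is written or allocated for the skipped range.
        os.lseek(fd, hole_size, os.SEEK_SET)
        # Writing at the new offset extends the file; only the blocks that
        # hold `payload` need to be allocated.
        os.write(fd, payload)
    finally:
        os.close(fd)


def sizes(path: str) -> tuple[int, int]:
    # Return (logical size, allocated bytes), roughly 'ls -l' versus 'du'.
    st = os.stat(path)
    # On Linux st_blocks is counted in 512-byte units.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.img")
        make_sparse_file(path, 100 * 1024 * 1024, b"end")  # 100 MiB hole
        logical, physical = sizes(path)
        print(f"logical size: {logical} bytes, allocated: {physical} bytes")
        with open(path, "rb") as f:
            # Reading from inside the hole returns null bytes.
            assert f.read(4096) == bytes(4096)
```

On a sparse-capable filesystem the logical size should be just over 100 MiB while the allocated size stays at a few kilobytes.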

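Point 4 and the 'rsync --sparse' note can be sketched the same way. naive_copy writes back every byte it reads, the way 'cat' does, so null bytes read from a hole become ordinary data in the copy. sparse_copy instead seeks over chunks that consist entirely of null bytes, so they stay holes. This only illustrates the idea behind 'rsync --sparse'; it is not how rsync or cp is actually implemented, and the function names and the 4096-byte chunk size are assumptions.

```python
import os
import tempfile

CHUNK = 4096  # illustrative chunk size (assumption); real tools choose their own


def naive_copy(src: str, dst: str) -> None:
    # Copy byte for byte: nulls read from holes are written as real data,
    # so the filesystem allocates blocks for them in the copy.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(CHUNK):
            fout.write(chunk)


def sparse_copy(src: str, dst: str) -> None:
    # Copy src to dst, turning all-null chunks into holes instead of writing them.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(CHUNK):
            if chunk.count(0) == len(chunk):
                fout.seek(len(chunk), os.SEEK_CUR)  # skip ahead, allocate nothing
            else:
                fout.write(chunk)
        # A trailing hole only moves the file position; truncate() sets the
        # logical size so that it matches the source.
        fout.truncate(fin.tell())


def allocated(path: str) -> int:
    # Bytes actually allocated (st_blocks is in 512-byte units on Linux).
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.img")
        with open(src, "wb") as f:
            f.seek(16 * 1024 * 1024)  # leave a 16 MiB hole
            f.write(b"data")
        for name, copy in (("naive", naive_copy), ("sparse", sparse_copy)):
            dst = os.path.join(d, name + ".img")
            copy(src, dst)
            print(name, os.path.getsize(dst), allocated(dst))
```

Both copies should report the same logical size, but on a typical sparse-capable filesystem only the sparse copy keeps a small allocation. One side effect of this approach is that runs of nulls which really were written to the original also become holes, so the copy can end up sparser than the source; reads return the same bytes either way.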

Hi Patrick,

Really appreciate the detailed explanation. It's a real eye opener.
Can you point me to any docs that I could read around this subject?

Thanks
Dan




More information about the fedora-list mailing list