Creating a large file without zeroing it: update

Given the large number of comments I got (26!), I feel obliged to post a summary of what was said.

First, the problem:
I want to create a large file (let’s say 10 GB) to use as swap space. This file can’t be a sparse file (a file with holes, see Wikipedia if you don’t know about sparse files).
Since I’m going to mkswap it, I don’t care about the data that is actually in that file after creating it. The stupid way (but the only solution on ext3) to create it is to fill it with zeroes, which is very inefficient.
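
For reference, the zero-filling baseline can be sketched in C roughly as follows. This is a minimal sketch, not a tuned implementation: the helper name zero_fill and the 1 MiB buffer size are assumptions made for illustration, and interrupted writes (EINTR) are simply treated as errors.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t, so a 10 GB size fits */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` and fill it with `len` zero bytes.
 * Every byte is really written, so the file is not created with holes.
 * Returns 0 on success, -1 on failure. */
int zero_fill(const char *path, off_t len)
{
    assert(path != NULL && len >= 0);

    static char buf[1 << 20];          /* 1 MiB of zeros (static storage) */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    off_t left = len;
    while (left > 0) {
        size_t chunk = left < (off_t)sizeof buf ? (size_t)left : sizeof buf;
        ssize_t n = write(fd, buf, chunk);
        if (n <= 0) {                  /* error, e.g. the disk is full */
            close(fd);
            return -1;
        }
        left -= n;                     /* write() may be short; loop again */
    }

    /* Push the data to disk before reporting success. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling zero_fill("/tmp/tmp", (off_t)10 * 1024 * 1024 * 1024) would produce the 10 GB file from the example, at the cost of writing all 10 GB.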

Theodore Tso provided more information in a comment, which I’m copying here:

Yes, it will work on ext4. A convenient program which makes this easy to use can be found at http://sandeen.fedorapeople.org/utilities/fallocate.c. It was written by Eric Sandeen, a former XFS developer who now works for Red Hat, who has been a big help making sure ext4 will be ready for Fedora and Red Hat Enterprise Linux. (Well, I guess I shouldn’t call him a former XFS developer since he still contributes patches to XFS now and then, but he’s spending rather more time on ext4 these days.)

One warning about the program: it calls the fallocate system call directly, and it doesn’t quite have the right architecture-specific magic for certain architectures which have various restrictions on how arguments need to be passed to system calls. In particular, IIRC, there will be issues on the s390 and powerpc architectures. The real right answer is to get fallocate into glibc; folks with the pull to make glibc do the right thing, please talk to me.

Glibc does have posix_fallocate(), which implements the POSIX interface. posix_fallocate() is wired to use the fallocate system call, for sufficiently modern versions of glibc.

However, posix_fallocate() is problematic for some applications; the problem is that for filesystems that don’t support fallocate(), posix_fallocate() will simulate it by writing all zeros to the file. However, this is not necessarily the right thing to do; there are some applications that want fallocate() for speed reasons, but if the filesystem doesn’t support it, they want to receive the ENOSPC error message, so they can try some other fallback, which might or might not involve writing all zeros to the file.

The other shortcoming with posix_fallocate() is that it doesn’t support the FALLOC_FL_KEEP_SIZE flag. What this flag allows you to do is to allocate disk blocks to the file, but not to modify the i_size parameter. This allows you to allocate space for files such as log files and mail spool files so they will be contiguous on disk, but since i_size is not modified, programs that append to the file won’t get confused, and tail -f will continue to work. For example, if you know that your log files are normally approximately 10 megs a day, you can fallocate 10 megabytes, and then the log file will be contiguous on disk, and the space is guaranteed to be there (since it is already allocated). When you compress the log file at the end of the day, if the log file ended up being slightly smaller than 10 megs, the extra blocks will be discarded when you compress the file, or if you like, you can explicitly trim away the excess using ftruncate().
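
The two points in the quoted comment (no silent zero-filling, and FALLOC_FL_KEEP_SIZE) can be sketched in C roughly as follows. This is only an illustration under stated assumptions: it assumes Linux with a glibc that provides the fallocate() wrapper (glibc 2.10 or later, which postdates the comment above), and the helper names preallocate and reserve_keep_size are hypothetical. Both helpers return a negative errno value on failure and leave the choice of fallback to the caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc declares fallocate() only with this */
#endif
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t, so sizes above 2 GB work */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>      /* FALLOC_FL_KEEP_SIZE */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: allocate `len` bytes for `fd` and grow the file to
 * that size. There is no zero-writing fallback: on failure the negative
 * errno is returned and the caller chooses what to do next. */
int preallocate(int fd, off_t len)
{
    assert(fd >= 0 && len > 0);
    if (fallocate(fd, 0, 0, len) != 0)
        return -errno;   /* fallocate(2) lists EOPNOTSUPP for unsupported filesystems */
    return 0;
}

/* Hypothetical helper: reserve `len` bytes of disk space but leave the
 * apparent file size (i_size) alone. On success, *size_after receives the
 * size reported by fstat() after the call. */
int reserve_keep_size(int fd, off_t len, off_t *size_after)
{
    assert(fd >= 0 && len > 0 && size_after != NULL);
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len) != 0)
        return -errno;

    struct stat st;
    if (fstat(fd, &st) != 0)
        return -errno;
    *size_after = st.st_size;
    return 0;
}
```

For the 10 MB daily log from the example, the second helper would be called as reserve_keep_size(fd, 10 * 1024 * 1024, &size) right after opening the log file. A caller that gets a negative return from preallocate() can then decide between zero-filling and giving up.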

fallocate works fine: creating a 20 GB file is almost immediate. Syncing or umounting the filesystem is also immediate, and reading the file returns only zeros. I’m not sure how it is implemented, but it looks nice :-). However, it still doesn’t solve my initial problem: mkswap works, but swapon doesn’t:

:/tmp# touch tmp
:/tmp# /root/fallocate -l 10g tmp
:/tmp# ls -lh tmp 
-rw-r--r-- 1 root root 10G Mar  3 11:01 tmp
:/tmp# du tmp 
10485764	tmp
:/tmp# mkswap tmp 
Setting up swapspace version 1, size = 10737414 kB
no label, UUID=a316ce8e-cf33-412b-8dc0-e10d9f2ebdbb
:/tmp# strace swapon tmp
[...]
swapon("/tmp/tmp")                      = -1 EINVAL (Invalid argument)
write(2, "swapon: tmp: Invalid argument\n"..., 30swapon: tmp: Invalid argument
) = 30
exit_group(-1)

(swapon works fine if the file is created normally — without using fallocate()).

Any other ideas?

10 thoughts on “Creating a large file without zeroing it: update”

  1. this should work on ext3:


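    #define _FILE_OFFSET_BITS 64   /* make off_t 64-bit so fseeko() can seek past 2 GB */
    #include <stdio.h>
    #include <sys/types.h>

    int
    main(void) {
        FILE *f = fopen("/tmp/tmp", "w+");
        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        off_t o = 10;
        o *= 1024 * 1024 * 1024;   /* 10 GB */
        /* seek to the 10 GB mark and write one byte; everything before it is a hole */
        if (fseeko(f, o, SEEK_SET) != 0 || fputc('0', f) == EOF) {
            perror("fseeko/fputc");
            fclose(f);
            return 1;
        }
        return fclose(f) == 0 ? 0 : 1;
    }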

  2. @Sylvain: yes, /tmp is my ext4 partition.
    @glandium: You can use files to swap. swapon is supposed to work on files.

  3. I don’t know if it changes anything, but you can use “ftruncate” directly with your size. However, I think it is less efficient than fallocate.

    Maybe the problem you have with swapon is that fallocate on ext4 does too much black magic to allow using it as a block device.

    You can try:
    – ftruncate + size (you can also extend a file with ftruncate)
    – posix_fallocate on ext3 (this ends up being something close to ftruncate)
    – posix_fallocate on ext4

    Maybe one of these combinations can prevent the swapon problem…

  4. Sorry, I misunderstood the problem and was in a hurry to reply.

    btw, ftruncate/truncate also create a sparse file when used to enlarge a file.

  5. Hm.

    You did state that reading the fallocated file returns
    NUL bytes. Maybe there is a special magic in the filesystem
    driver to do so to prevent you from gaining access
    to the previous disc content? (Considering the non-root
    fallocate case.)

    I think a kernel patch to ask the VFS for the block numbers
    and use them for swapping, overriding the above access
    control, is in order.

  6. The filesystem does indeed keep track of which blocks have actually been written to (separate from which are allocated) in order to ensure that it can hide old data from userspace. However, this also means that writing to the file will require a filesystem metadata update, which is against the rules for swap files.

    Your options are basically one of:
    a) Just zero out the file
    b) Patch the kernel to allow root to allocate without zeroing
    c) On an unmounted fs, use libext2fs calls to allocate the file
    d) (maybe) use a loopback device over the file – but I’m not sure if this is safe from deadlocks under low memory conditions
