Creating a large file without zeroing it

Posted on February 25, 2009 by lucas

Dear readers,

I’d like to use a large file (> 10 GB) as swap space.
The required steps are:

Create a file
mkswap $file
swapon $file

Now, how can I create the file? The obvious and fast solution is to create a file with holes:
dd if=/dev/zero of=foo bs=1M count=1 seek=10239
mkswap works, but swapon complains:
# swapon /tmp/foo
swapon: Skipping file /tmp/foo - it appears to have holes.

Of course, I could just run dd if=/dev/zero of=foo bs=1M count=10240, but that takes too long for me.

So, question: is there a way to tell the system: create a file that is 10GB big, but don’t fill it with zeros?

29 thoughts on “Creating a large file without zeroing it”

zozo says:

February 25, 2009 at 1:22 am

if=/dev/random
or /dev/mice (and move your pointer for long enough) or /dev/kmem or /dev/sda or whatever that is not zeros.
Jeff Schroeder says:

February 25, 2009 at 1:52 am

dd if=/dev/urandom of=foo bs=1M count=10240

/dev/random blocks when there is no more high-quality entropy. /dev/urandom is “pseudo-random” and never blocks.
klop says:

February 25, 2009 at 1:57 am

So /dev/urandom is faster than /dev/zero ? LOL :)
nikos says:

February 25, 2009 at 2:01 am

Depending on the filesystem where the file will reside, there may be fs-specific methods for what you want. For example, on xfs you would use xfs_mkfile -n for what you want, I think.

OTOH, when you run mkswap it is possible that the file will actually be written out to its full extent as the swapping mechanism may require a file that is static in terms of block allocation, during normal operation…
Joe Buck says:

February 25, 2009 at 2:43 am

Usually there’s no way for non-root to do what you want. Extending a file without zeroing it would give you access to whatever was written on the disk before.
stew says:

February 25, 2009 at 6:40 am

Joe Buck,

I’m sure he’d be fine with a root only way. If he’s planning on running swapon, I think we can assume he has root.
Tester says:

February 25, 2009 at 6:47 am

If you want to allocate a bunch of blocks, you either need to write something in there or to call the appropriate syscall (that at least XFS and ext4 implement).
Tincho says:

February 25, 2009 at 6:50 am

This might not be what you’re looking for, but when I want to create a big file quickly (or when I actually need it to be sparse) I just create a sparse file by truncating to the desired size:

$ perl -e '$file = shift; $size = shift; open(FOO, ">", $file) or die $!; truncate(FOO, $size) or die $!' foo 1024000
$ ls -ls foo
0 -rw-r--r-- 1 martin martin 1024000 2009-02-25 02:48 foo

As you see, it’s in fact using zero blocks until something is written.
Tincho says:

February 25, 2009 at 6:52 am

My bad, I didn’t read the post correctly. What you did with dd is more or less the same, and this obviously has holes in it.
Ken Bloom says:

February 25, 2009 at 6:59 am

Is there a reason why you can’t make a swap partition? That’s the fastest way.
Tim Bosse says:

February 25, 2009 at 8:40 am

I have been messing around with a few different ways to do what I think you are looking for. You want to create a sparse file would be my guess (something like qcow files in kvm).

Here is the way we all know:

$ time dd if=/dev/zero of=test-nosparse bs=1024k count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 29.3526 s, 36.6 MB/s
real 0m29.745s
user 0m0.000s
sys 0m5.272s

Here is the speed I think you were looking for.

$ time dd if=/dev/zero of=test-sparse bs=1024k count=0 seek=1024
0+0 records in
0+0 records out
0 bytes (0 B) copied, 3.5619e-05 s, 0.0 kB/s
real 0m0.005s
user 0m0.004s
sys 0m0.000s

So far the files look the same.

$ ls -l test-nosparse
-rw-r--r-- 1 taim taim 1073741824 2009-02-25 01:24 test-nosparse
$ ls -l test-sparse
-rw-r--r-- 1 taim taim 1073741824 2009-02-25 01:25 test-sparse

Here is where things are a bit weird. My guess is that ext3, on my machine at least, uses something other than 1M for block allocation, meaning that I actually need slightly more than 1G of disk for 1G of actual data.

$ du -k test-nosparse
1049604 test-nosparse
$ du -k --apparent-size test-nosparse
1048576 test-nosparse

What’s this? It’s our sparse file. It uses no disk blocks at all, because we wrote out 0 bytes of actual data, while its apparent size is still exactly 1G (plus some filesystem metadata).

$ du -k test-sparse
0 test-sparse
$ du -k --apparent-size test-sparse
1048576 test-sparse

I think it’s best to read up on the advantages and disadvantages of sparse files. I have run up against it a few times and if not cleaned up, you could have a heck of a time finding it later on when you need the space.
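
The gap between the apparent size and the allocated size that du reports above can also be read directly from stat(). Here is a minimal Python sketch, assuming Linux and a filesystem that supports sparse files. The helper names `sizes` and `make_sparse` and the file name are made up for illustration; they are not part of any tool mentioned in this thread.

```python
import os
import tempfile

def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for path."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units, whatever the filesystem block size is.
    return st.st_size, st.st_blocks * 512

def make_sparse(path, size):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as fh:
        fh.truncate(size)  # extends the file; the new range is a hole

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test-sparse")
        make_sparse(path, 64 * 1024 * 1024)
        apparent, allocated = sizes(path)
        print(f"apparent={apparent} allocated={allocated}")
```

A file whose allocated size is well below its apparent size is the kind of file that swapon rejects in the original post.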
Lucas says:

February 25, 2009 at 9:03 am

Thank you all for your comments.

First, yes, /dev/urandom is not what I want. What I want is to avoid writing the whole file to disk.

Second, file with holes == sparse file. That’s what I initially tried, but swapon doesn’t like it.

Third, I’m root anyway. What are the appropriate syscalls I could use to do that with xfs and ext4?

Fourth, creating a swap partition might be a solution, but, hey, it would kill all the fun. And for other reasons, I’d like to avoid changing the partitioning.
Lucas says:

February 25, 2009 at 9:05 am

Mmmh, xfs_mkfile doesn’t seem to do what I need: with -n, according to the description in the manpage, it just creates a sparse file.
Laurent Go says:

February 25, 2009 at 9:11 am

I tried using posix_fallocate function but it took 3 minutes to create the file. Maybe the fallocate syscall is what you’re looking for but seems I’m missing some header.
xaiki says:

February 25, 2009 at 9:19 am

There is a reason for not wanting a sparse file for swap. The best you can do is fallocate (or RESVSPACE, or something like that, in XFS terms). This will tell the filesystem to reserve blocks but not write anything in them (when paged it will zero out, though). It will save you time on creation, but any sync() or umount will (IIRC) fill the file with 0s.
Lucas says:

February 25, 2009 at 9:26 am

Apparently, the needed syscall is fallocate(), but that syscall isn’t supported by ext3 (haven’t checked with ext4, but it’s likely to be according to http://lwn.net/Articles/317787/). When calling posix_fallocate, the libc emulates it using short writes at the end of each block:
pwrite(3, ""..., 1, 4095) = 1
pwrite(3, ""..., 1, 8191) = 1
pwrite(3, ""..., 1, 12287) = 1
pwrite(3, ""..., 1, 16383) = 1
pwrite(3, ""..., 1, 20479) = 1
pwrite(3, ""..., 1, 24575) = 1
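
For comparison, Python exposes the same libc call as os.posix_fallocate(). What follows is only a sketch, assuming Linux and Python 3.3 or later; the helper name `preallocate` is hypothetical. On a filesystem without native support, the C library may still fall back to the slow per-block writes shown in the trace above, so this is not guaranteed to be fast.

```python
import os
import tempfile

def preallocate(path, size):
    """Create path and ask the C library to allocate `size` bytes for it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Allocates the byte range [0, size); the file size becomes `size`.
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "prealloc")
        preallocate(path, 16 * 1024 * 1024)
        st = os.stat(path)
        print(f"size={st.st_size} allocated={st.st_blocks * 512}")
```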
ssam says:

February 25, 2009 at 11:25 am

#!/usr/bin/env python
# Seek 10 GiB into a new file and write a few bytes; everything before them is a hole.
fh = open("bigfile", "w")
fh.seek(10 * 1024 * 1024 * 1024)
fh.write("hello")
fh.close()

This makes a file that ls says is 10GB, though it does not seem to take up that much space according to df. It runs instantly, so I am pretty sure it’s not doing much disk IO.
Lucas says:

February 25, 2009 at 11:31 am

@ssam: yes, but it creates a sparse file, aka a file with holes. This file will be rejected by swapon.
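
One way to look for holes from user space is lseek() with SEEK_HOLE. The sketch below is only an illustration and does not claim to be how swapon performs its own check. It assumes Linux and a filesystem that reports holes (where it does not, the whole file is reported as data), and the function name `first_hole` is hypothetical.

```python
import os

def first_hole(path):
    """Return the offset of the first hole inside path, or None if there is none."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return None
        # SEEK_HOLE returns the offset of the first hole at or after offset 0.
        # Every file has an implicit hole at end of file, so a result equal
        # to the file size means no hole was found inside the file.
        offset = os.lseek(fd, 0, os.SEEK_HOLE)
        return offset if offset < size else None
    finally:
        os.close(fd)
```

Calling `first_hole("bigfile")` on the file created by the script above would be expected to return 0 where holes are reported, since the hole starts at the very beginning of the file.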
nikos says:

February 25, 2009 at 1:15 pm

If you don’t want a holey file then how about xfs_mkfile without the -n switch?
Lucas says:

February 25, 2009 at 1:48 pm

It requires an XFS filesystem, which I don’t have currently. I’ll make some tests (using XFS and ext4) later this week, or next week, and post a follow-up. (I’m too busy currently, unfortunately)
nikos says:

February 25, 2009 at 3:20 pm

Apparently ext4 supports fallocate, otherwise there is also posix_fallocate.
Wouter Verhelst says:

February 25, 2009 at 6:57 pm

It’s not surprising that using a sparse file as a swap device is impossible. There are several reasons for that: first, you can make a sparse file of 1TB on a 1GB disk; this would get the memory manager in serious trouble when it allows an application to get 300GB of memory, but then figures out that it doesn’t actually have that. Second, allocating disk space for sparse files takes memory, something you don’t want on a swapout operation.

The way space allocation is done on most file systems is on a per-block basis (512 bytes). Most of the time spent in writing the 10GB of zeroes to disk is actually spent in calling the block allocator.

Filesystems which support extents (such as the already-mentioned XFS and ext4, but there are others) will allow you to allocate much space in a single system call. But you’ll still have to make those calls, and spend some time doing that.
Franklin says:

February 25, 2009 at 11:15 pm

#1 LVM is your friend.
#2 Don’t allocate all your disk space until you actually need it (like on a server).
Aigars Mahinovs says:

February 26, 2009 at 7:10 am

The thing that swapon is complaining about is not about having a sparse file, but about having a fragmented file. To try to fix that, make one file, try to swapon, if that fails, make another file (without removing the first one) and try swapon the second one.
glandium says:

February 26, 2009 at 8:41 am

Why do you want to have 10+GB swap space anyways? Your machine is going to be severely slow and unusable before reaching that much swap space use (except if you have an application that eats tons of memory but doesn’t actually use it, i.e. leaking it).
Theodore Tso says:

February 27, 2009 at 4:23 pm

Yes, it will work on ext4. A convenient utility which makes this easy to use can be found at http://sandeen.fedorapeople.org/utilities/fallocate.c. It was written by Eric Sandeen, a former XFS developer who now works for Red Hat, who has been a big help making sure ext4 will be ready for Fedora and Red Hat Enterprise Linux. (Well, I guess I shouldn’t call him a former XFS developer since he still contributes patches to XFS now and then, but he’s spending rather more time on ext4 these days.)

One warning about the program: it calls the fallocate system call directly, and it doesn’t quite have the right architecture-specific magic for certain architectures which have various restrictions on how arguments need to be passed to system calls. In particular, IIRC, there will be issues on the s390 and powerpc architectures. The real right answer is to get fallocate into glibc; folks with pull in getting glibc to do the right thing, please talk to me.

Glibc does have posix_fallocate(), which implements the POSIX interface. posix_fallocate() is wired to use the fallocate system call, for sufficiently modern versions of glibc.

However, posix_fallocate() is problematic for some applications; the problem is that for filesystems that don’t support fallocate(), posix_fallocate() will simulate it by writing all zeros to the file. However, this is not necessarily the right thing to do; there are some applications that want fallocate() for speed reasons, but if the filesystem doesn’t support it, they want to receive the EOPNOTSUPP error, so they can try some other fallback, which might or might not involve writing all zeros to the file.

The other shortcoming with posix_fallocate() is that it doesn’t support the FALLOC_FL_KEEP_SIZE flag. What this flag allows you to do is to allocate disk blocks to the file, but not to modify the i_size parameter. This allows you to allocate space for files such as log files and mail spool files so they will be contiguous on disk, but since i_size is not modified, programs that append to the file won’t get confused, and tail -f will continue to work. For example, if you know that your log files are normally approximately 10 megs a day, you can fallocate 10 megabytes, and then the log file will be contiguous on disk, and the space is guaranteed to be there (since it is already allocated). When you compress the log file at the end of the day, if the log file ended up being slightly smaller than 10 megs, the extra blocks will be discarded when you compress the file, or if you like, you can explicitly trim away the excess using ftruncate().
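
The two points above (getting an error instead of a silent fallback, and keeping i_size unchanged) can be sketched by calling the fallocate wrapper in libc through ctypes. This is an illustration under stated assumptions, not a reference implementation: it assumes 64-bit Linux with a C library that exports fallocate(), where off_t is 64 bits wide. The flag value 0x01 for FALLOC_FL_KEEP_SIZE comes from <linux/falloc.h>, and the helper name `reserve_keep_size` is hypothetical.

```python
import ctypes
import os

FALLOC_FL_KEEP_SIZE = 0x01  # value from <linux/falloc.h>

# Assumption: 64-bit Linux, so off_t can be passed as a 64-bit integer.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int

def reserve_keep_size(fd, length):
    """Reserve `length` bytes for fd, starting at offset 0, without changing its size.

    Raises OSError (for example EOPNOTSUPP) when the filesystem cannot do
    this, instead of falling back to writing zeros.
    """
    if _libc.fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

For the log file example, something like `reserve_keep_size(log.fileno(), 10 * 1024 * 1024)` on a freshly opened log would reserve the blocks while os.fstat() keeps reporting the real length of the data written so far.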
lucas says:

March 3, 2009 at 12:09 pm

I just wrote a followup:
http://www.lucas-nussbaum.net/blog/?p=332

Summary: fallocate() works fine, but swapon() doesn’t if the file is created using fallocate().
Luca says:

March 6, 2009 at 12:23 pm

Just passed by here while searching for something entirely different.
I’m very amused by how no one seems to actually read your post before replying, at least in the beginning.
I guess it made you feel really special ;)

Sorry, no relevant input :)

Best,
Luca
Chad D. Kersey says:

June 21, 2011 at 1:10 am

Stumbled across this while looking for a solution to the exact same problem about 52 fortnights later. I never did figure out how to make a non-sparse file quickly, but I did figure out how to swap on a sparse file. My solution was swapping through a loopback device instead of using the file directly. Swapping is already slow, so I’m not concerned with the additional overhead:

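# Create an 8 GB sparse file (count=0 means no data is read or written)
dd bs=1M if=/dev/random of=swapfile seek=8192 count=0
mkswap swapfile
# Attach the file to a loopback device; /dev/loop0 is assumed to be free here
# ("losetup -f" prints the first free loopback device)
sudo losetup /dev/loop0 swapfile
sudo swapon /dev/loop0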

I haven’t actually seen what happens under heavy swap (like whether this can somehow lead to a freeze). Write a program to leak a lot of memory (remember, you have to write to a page before it’s actually allocated) and see how this performs. Then you’ll have written to all of your swap space and the file will no longer be sparse anyway.
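
A test program of the kind suggested here could look roughly like the following Python sketch. It assumes Linux, where anonymous memory is only backed by real pages once it is written to; the function name `touch_pages` is hypothetical. Be careful with the size: asking for more than the machine's RAM is exactly what pushes it into swap.

```python
import mmap

def touch_pages(size):
    """Map `size` bytes of anonymous memory and write to every page."""
    buf = mmap.mmap(-1, size)  # anonymous mapping; pages are not backed yet
    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1  # the first write to a page forces its allocation
    return buf  # keep a reference, or the memory is released again

if __name__ == "__main__":
    held = touch_pages(64 * 1024 * 1024)  # 64 MiB; raise this to provoke swapping
    print(f"touched {len(held) // mmap.PAGESIZE} pages")
```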

-Chad
