Why don’t cp and tar use fsync()?

September 13th, 2011 by lucas

I must admit that I’m a bit lost about the conclusions of the Don’t fear the fsync()! (lf.org down; google cache) debate. My understanding was that using fsync() was the right thing to do when we cared about data being written to disk.

When using cp or tar, I usually care about my data being written to the disk, so why don’t they use fsync()? Shouldn’t they?
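For concreteness, here is a minimal sketch (Python, purely illustrative, not what cp actually does) of what a copy that fsync()s its output would look like — cp just writes into the buffer cache and returns, while this version only returns once the kernel has been asked to push the data to the device:

```python
import os

def copy_with_fsync(src, dst, bufsize=64 * 1024):
    """Copy src to dst, then fsync() so the data is on disk when we return.

    A sketch of the behaviour discussed above, not cp's actual
    implementation: cp relies on the kernel's buffer cache and may return
    before the data has reached the platter.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(bufsize)
            if not buf:
                break
            fout.write(buf)
        fout.flush()              # flush Python's userspace buffer
        os.fsync(fout.fileno())   # ask the kernel to push the data to the device
```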

13 Responses to “Why don’t cp and tar use fsync()?”

  1. Aigars Mahinovs wrote on 09/13/11 at 8:07 pm :

    I thought it was more about caring about ‘when’ the data will actually be saved to disk. 99% of the time I don’t really care whether the cp or tar output is on disk or in the cache on its way to the disk eventually. In fact, if tar and cp could do some vfs callback hookup magic and make the files appear to be copied over and/or extracted, returning while the actual work is still transparently happening in the background, I would tout that as a superbly wonderful time-saving feature.

  2. lucas wrote on 09/13/11 at 8:21 pm :

    @Aigars: note that this is what is actually happening. ;)

  3. Jon D wrote on 09/13/11 at 8:21 pm :

    I would argue that, the vast majority of the time, one doesn’t care about the data being immediately written to disk. Why do I care if my “cp” write is lazy and happens in the background? As long as the file data is available for my immediate use, I don’t generally care if it is on disk or going to be there in a few seconds. The only time that matters to me is during synchronization (two processes using a single file), shutdown, critical backups, etc. In those rare cases I can force a disk sync as part of my script, app, etc.

    If letting writes be lazy buys me some (even if small) performance improvements (not blocking queued reads, for instance), why do I care if it takes a few seconds for data to hit the disk?

  4. Bob Proulx wrote on 09/13/11 at 8:32 pm :

    Use of fsync is not about whether you care about your data. That would be like asking you to choose which child you love and which you do not. You care about all of your data, right? Using fsync is all about bypassing the filesystem buffer cache. It should only be used in those very unusual circumstances where a power loss would leave your machine unable to reach a working state on reboot. A normal cp or tar doesn’t need this, as yanking the power cord out of the machine during the copy won’t prevent the machine from rebooting and operating normally afterward. Also, if every command were to use fsync and bypass the filesystem buffer cache, then all systems would slow to disk-drive speeds and crawl. That would be very bad. Even SSDs are not as fast as the RAM in the filesystem buffer cache. Somehow we survived all these years acceptably well using the filesystem buffer cache before this fsync debate started and programs began disabling it. According to the current fervor, that shouldn’t have been possible.

  5. Lucas wrote on 09/13/11 at 8:56 pm :

    @Bob: I can see where you draw the line. On the other hand, there are cases where I’d really like to make sure that the data written using tar is on disk when tar returns, and using sync() for that is way more expensive than fsync() if there are other things going on on the system.

    In the “Don’t fear the fsync!” article, Ted Ts’o wrote:

    All that is necessary is a kernel patch to allow laptop_mode to disable fsync() calls, since the kernel knows that it is in laptop_mode, and it notices that the disk has spun up, it will sync out everything to disk, since once the energy has been spent to spin up the hard drive, we might as well write everything in memory that needs to be written out right away. Hence, a patch which allows fsync() calls to be disabled while in laptop_mode should do pretty much everything Nate has asked. I need to check to see if laptop_mode does this already, but if it doesn’t force a file system commit when it detects that the hard drive has been spun up, it should obviously do this as well.

    Does someone know if this has been implemented?

  6. Michael wrote on 09/13/11 at 8:56 pm :

    It’s funny because if you want to use fsync on a file, you should also use fsync on the parent director(ies):
    – Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.
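    That warning (quoted from the fsync(2) man page) can be sketched as follows — a hypothetical helper, assuming Linux semantics where a directory can be opened read-only and fsync()ed:

    ```python
    import os

    def fsync_file_and_dir(path):
        """Durably persist both a file's contents and its directory entry.

        fsync() on the file alone does not guarantee that the name in the
        containing directory has reached disk; an explicit fsync() on a
        file descriptor for the directory is needed for that. A sketch,
        not taken from any real tool.
        """
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)           # flush the file's data and metadata
        finally:
            os.close(fd)
        dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
        try:
            os.fsync(dfd)          # flush the directory entry itself
        finally:
            os.close(dfd)
    ```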

  7. Lucas wrote on 09/13/11 at 9:02 pm :

    Ah, there was a thread about that on lkml. https://lkml.org/lkml/2011/5/19/228 (not implemented)

  8. Sami Liedes wrote on 09/13/11 at 11:08 pm :

    If you absolutely need to care about all your data that much, you can mount the entire filesystem in a synchronous mode. It’s a viable solution if you are OK with 1980ish performance but your data needs to be safe no matter what. Or if your application needs such safety guarantees AND performance, expect to be spending money in the 6-digit range on a mainframe.

    For us mere mortals, some pieces of software do sync their files by default. IIRC, changes to /etc/passwd are fsynced by the tools that modify it, any reasonable mail server does it (that’s a core reason why servers handling mere small pieces of mail often seem incomprehensibly slow given the hardware they run on), and I think emacs by default fsyncs a file when you save it.

    Just having fundamental tools like cp do an fsync() on every file would probably drop your system’s performance by a factor of 20. Disk seeks are very expensive, and in most cases you need at least two per file. If you don’t believe me, try mounting your filesystems with -o sync to see what it does to performance :)
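    The per-file cost Sami describes can be observed directly with a rough sketch like the one below (illustrative only; the exact slowdown depends entirely on the hardware and filesystem, and the “factor of 20” above is his estimate):

    ```python
    import os
    import time

    def write_files(dirname, n, data, do_fsync):
        """Write n small files, optionally fsync()ing each one, and
        return the elapsed time in seconds.

        With do_fsync=True, every file forces the device to complete the
        write before the loop moves on, so it runs at disk-latency speed
        rather than buffer-cache (RAM) speed.
        """
        t0 = time.monotonic()
        for i in range(n):
            path = os.path.join(dirname, "f%04d" % i)
            with open(path, "wb") as f:
                f.write(data)
                if do_fsync:
                    f.flush()
                    os.fsync(f.fileno())
        return time.monotonic() - t0
    ```

    Comparing `write_files(d, 200, b"x" * 4096, False)` against the same call with `True` on a rotating disk should make the gap obvious.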

  9. Daniel wrote on 09/14/11 at 8:11 am :

    There is http://packages.debian.org/sid/eatmydata, which disables fsync and friends.

  10. David Schmitt wrote on 09/14/11 at 8:12 am :

    A much more interesting case than cp is mv. Especially across filesystems.

  11. Julian Andres Klode wrote on 09/14/11 at 10:23 am :

    Cp creates a copy, so I’d say that it does not make much sense to ensure that the copy is safe as you still have the source, and can just copy again in case of power loss.

  12. mirabilos wrote on 09/14/11 at 12:35 pm :

    Just call sync(1) afterwards. There’s no close-rename in here.

  13. LGB wrote on 09/15/11 at 1:35 pm :

    They shouldn’t, in my opinion. If you think that tar and cp need to sync to the disk because you expect the data to be written, then any software that writes to files should do so too, as the expectation is the same: they write data, so they should sync. But in that case, the write cache should be dropped from the kernel entirely, since some people would expect written data to be physically on the disk, so there would be no need for a write cache in the kernel at all. I think it’s much better not to do this unless you really need it, as with databases and the like: they can sync if they want, or the user can sync, etc. Why would it be useful to do this _always_? I guess it can be done anyway by mounting the filesystem in sync mode or so: problem solved, if a user wants that.