Jay Taylor's notes

Make disk/disk copy slower

Original source (unix.stackexchange.com)
Tags: linux command-line pv control-file-read-data-rate unix.stackexchange.com
Clipped on: 2019-08-06

If you are seeking to limit disk-to-disk copy speed in an effort to be "nice" to other I/O-bound processes in the system, you are probably better off taking advantage of the kernel's ability to tune I/O scheduling instead. Specifically, ionice can be used to ensure that your disk-to-disk copy has its I/O scheduled at a lower priority than that of regular processes (see the sketch after these comments). – Steven Monday Mar 1 '14 at 15:13
  • This is a classic XY problem question. You should instead ask about why your desktop becomes unresponsive when you copy files to a USB device. – Michael Hampton Mar 1 '14 at 18:50
  • Linux actually has ridiculously large I/O buffers these days. RAM sizes have grown faster than mass storage speeds. Maybe you could perform the copy using dd(1) and sync so that it would actually be synced periodically instead of being buffered? And pipe viewer (pv) has a rate limiting option. Something like cat file | pv -L 3k > outfile. Neither is the same as using cp(1), though. – ptman Mar 1 '14 at 18:56
  • @MichaelHampton, there are several unresolved topics on this issue on ArchLinux's forum, so I figured I'd try to cope with it in a different way, just to make it work. – antonone Mar 1 '14 at 22:02
  • @antonone But Unix.SE is not ArchLinux's forum. Someone here might have a solution. – Izkata Mar 2 '14 at 4:51
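
A minimal sketch of the ionice approach from the first comment, assuming the copy can tolerate the idle I/O class and that the active I/O scheduler honours I/O priorities (the paths and PID are illustrative):

    # Run the copy in the idle I/O scheduling class (class 3), so other
    # processes' disk requests are served first:
    ionice -c 3 cp -a /source/dir /media/usb/

    # Or lower the I/O priority of an already-running copy by PID
    # (class 2 = best-effort, level 7 = lowest priority; 12345 is illustrative):
    ionice -c 2 -n 7 -p 12345
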
Answer (score 21):

    You can throttle a pipe with pv -qL (or cstream -t, which provides similar functionality):

    tar -cf - . | pv -q -L 8192 | tar -C /your/usb -xvf -
    

    -q removes stderr progress reporting.

    The -L limit is in bytes.

    More about the --rate-limit/-L flag, from the pv man page:

    -L RATE, --rate-limit RATE
    
        Limit the transfer to a maximum of RATE bytes per second.
        A suffix of "k", "m", "g", or "t" can be added to denote
        kilobytes (*1024), megabytes, and so on.
    

    This answer originally pointed to throttle, but that project is no longer available, so it has slipped out of some package systems.
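
    For the cstream alternative mentioned at the top of this answer, a roughly equivalent sketch (assuming cstream is installed; its -t option takes a throughput limit in bytes per second):

    tar -cf - . | cstream -t 8192 | tar -C /your/usb -xvf -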

    answered Mar 1 '14 at 22:07
    If cp can't be slowed down, then using a custom command is the only option I guess. – antonone Mar 2 '14 at 9:33
  • Sounds too complicated in comparison with rsync. – LinuxSecurityFreak Dec 15 '16 at 8:03
  • Looks more complicated, but more usable to me. I need to test a file locking mechanism and need to slow copying down to a few bytes/s, which does not seem possible with rsync. I'll give it a try and 'cat' a file through the throttled pipe. – cljk Jul 18 at 11:56
  • @cljk updated to pv. thanks. – Matt Jul 31 at 4:49
Answer (score 7):

    If the ionice solution is not enough (for whatever reason) and you really want to limit I/O to an absolute value, there are several possibilities:

    1. probably the easiest: ssh, which has a built-in bandwidth limit. You would use e.g. tar (instead of cp) or scp (if that's good enough; I don't know how it handles symlinks and hard links) or rsync. These commands can pipe their data over ssh. In the case of tar you write to /dev/stdout (or -) and pipe that into the ssh client, which executes another tar on the "remote" side.

    2. elegant but not in the vanilla kernel (AFAIK): the device mapper target ioband. This, of course, works only if you can unmount either the source or the target volume.

    3. some self-written fun: grep "^write_bytes: " /proc/$PID/io gives you the amount of data a process has written. You could write a script which starts cp in the background, sleeps for e.g. 1/10th of a second, stops the background cp process (kill -STOP $PID), checks the amount which has been written (and read? about the same value in this case), calculates for how long cp must pause in order to take the average transfer rate down to the intended value, sleeps for that time, wakes cp up again (kill -CONT $PID), and so on. A rough sketch follows this list.
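
    A rough, untested sketch of option 3 under those assumptions; it samples once per second rather than every 1/10th of a second to keep the arithmetic to whole numbers, and the 1 MiB/s target and file names are illustrative:

    #!/bin/bash
    # Throttle a running cp by alternately stopping and resuming it, based on
    # the write_bytes counter in /proc/$PID/io.
    LIMIT=$((1024 * 1024))            # illustrative target: ~1 MiB/s on average

    cp /source/bigfile /target/ &     # start the copy in the background
    PID=$!

    prev=0
    while :; do
        # Stop sampling once the copy has exited (it lingers as a zombie child).
        state=$(awk '{print $3}' "/proc/$PID/stat" 2>/dev/null) || break
        [ "$state" = "Z" ] && break

        sleep 1                                 # let cp run for one second
        kill -STOP "$PID" 2>/dev/null || break  # freeze it while we measure
        written=$(awk '/^write_bytes:/ {print $2}' "/proc/$PID/io" 2>/dev/null)
        delta=$((written - prev))
        prev=$written

        # If more than LIMIT bytes were written this round, keep cp stopped long
        # enough to bring the average rate back down to roughly LIMIT bytes/s.
        if [ "$delta" -gt "$LIMIT" ]; then
            sleep $((delta / LIMIT - 1))
        fi
        kill -CONT "$PID" 2>/dev/null
    done
    wait "$PID"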

    Yes, normally I'm just using lftp to connect to localhost via scp, and limit the bandwidth from there. – antonone Mar 1 '14 at 22:06
Answer (score 21):

    Instead of cp -a /foo /bar you can also use rsync and limit the bandwidth as you need.

    From the rsync manual:

    --bwlimit=KBPS
    

    limit I/O bandwidth; KBytes per second

    So the actual command, also showing progress, would look like this:

    rsync -av --bwlimit=100 --progress /foo /bar
    
    This sounds like a nice idea for copying old drives I don't want to beat up. – jeremyjjbrown Jun 7 '15 at 22:13
  • Doesn't work for reading from /dev/zero or /dev/random – cdosborn Jan 25 '16 at 21:23
  • rsync -a --bwlimit=1500 /source /destination works perfectly to copy giant folders at 1.5 MB/s (which is a good trade-off between avoiding any server slowdown and not taking too much time) – lucaferrario Jul 28 '17 at 10:06
  • Side note: even though the man page might say that you can use letters for units, e.g. 20m, it is not supported on all platforms, so better stick to the KBytes notation. – Hubert Grzeskowiak Aug 22 '17 at 2:15
  • Saved my day! cgroup cgexec -g ... cp /in /out was not working all the time (from a terminal it worked sometimes, from a script never) and I have no idea why... – Aquarius Power Oct 29 '18 at 22:56
Answer (score 2):

    Lower the dirty page limit. The default limit is insane.

    Create /etc/sysctl.d/99-sysctl.conf with:

    vm.dirty_background_ratio = 3
    vm.dirty_ratio = 10
    

    Then run sysctl -p or reboot.

    What's happening is that data is being read faster than it can be written to the destination disk. When Linux copies files, it reads them into RAM and then marks the pages as dirty for writing to the destination. Dirty pages cannot be swapped out. So if the source disk is faster than the destination disk and you're copying more data than you have free RAM, the copy operation will eat up all available RAM (or at least up to the dirty page limit, which could be more than the available RAM) and cause starvation, because the dirty pages cannot be swapped out while clean pages get used and marked dirty as they are freed.

    Note that this will not completely solve the problem... what Linux really needs is some way to arbitrate the creation of dirty pages so that a single large transfer does not eat up all available RAM / all allowed dirty pages.
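
    To see this in action, you can check the current thresholds and watch the kernel's dirty-page counters while a copy is running (a sketch; the 1-second interval is arbitrary):

    # Show the current dirty-page thresholds
    sysctl vm.dirty_background_ratio vm.dirty_ratio

    # Watch how much dirty data is queued and how much is being written back
    watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'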

    answered Dec 1 '16 at 1:41
    After enabling -o sync, my Internet is faster than the write speed to this USB drive. What I don't understand is why the kernel doesn't track how quickly cache pages are getting flushed and schedule future flushes based on that. It's like it always goes full speed, even if this poor drive can't keep up. But that's a topic for another question, I guess. – antonone Mar 4 '14 at 11:07
Answer (score 5):

    Your problem is probably not with your computer, per se; it's probably fine. But the USB flash translation layer has a processor of its own that has to map out all of your writes to compensate for what could be as much as a 90% faulty flash chip, who knows? You flood it, then you flood your buffers, then you flood the whole bus, and then you're stuck, man - after all, that's where all your stuff is. It may sound counter-intuitive, but what you really need is blocking I/O - you need to let the FTL set the pace and then just keep up.

    (On hacking FTL microcontrollers: http://www.bunniestudios.com/blog/?p=3554)

    All of the above answers should work so this is more a "me too!" than anything else: I've totally been there, man. I solved my own issues with rsync's --bwlimit arg (2.5mbs seemed to be the sweet spot for a single, error-free run - anything more and I'd wind up with write-protect errors). rsync was especially suited to my purpose because I was working with entire filesystems - so there were a lot of files - and simply running rsync a second time would fix all of the first run's problems (which was necessary when I'd get impatient and try to ramp past 2.5mbs).

    Still, I guess that's not quite as practical for a single file. In your case you could just pipe to dd set to raw-write - you can handle any input that way, but only one target file at a time (though that single file could be an entire block device, of course).

    ## OBTAIN OPTIMAL IO VALUE FOR TARGET HOST DEV ##
    ## IT'S IMPORTANT THAT YOUR "bs" VALUE IS A MULTIPLE ##
    ## OF YOUR TARGET DEV'S SECTOR SIZE (USUALLY 512b) ##
    % bs=$(blockdev --getioopt /local/target/dev)

    ## START LISTENING; PIPE INPUT THROUGH THE DECOMPRESSOR ##
    ## TO DD, WHOSE FLAGS DECLARE RAW IO ##
    % nc -l -p $PORT | lz4 -d | \
    >     dd bs=$bs of=/mnt/local/target.file \
    >         conv=fsync oflag=direct,sync,nocache

    ## OUR RECEIVER'S WAITING; DIAL REMOTE TO BEGIN ##
    ## JUST REVERSED; NO RAW IO FLAGS NEEDED HERE, THOUGH ##
    % ssh user@remote.host <<-REMOTECMD
    >     dd if=/remote/source.file bs=$bs | \
    >     lz4 -9 | nc local.target.domain $PORT
    > REMOTECMD


    You might find netcat to be a little faster than ssh for the data transport if you give it a shot. Anyway, the other ideas were already taken, so why not?

    [EDIT]: I noticed the mentions of lftp, scp, and ssh in the other post and thought we were talking about a remote copy. Local's a lot easier:

    % bs=$(blockdev --getioopt /local/target/dev)
    % dd if=/src/fi.le bs=$bs iflag=fullblock of=/tgt/fi.le \
    >    conv=fsync oflag=direct,sync,nocache
    

    [EDIT2]: Credit where it's due: just noticed ptman beat me to this by like five hours in the comments.

    Definitely you could tune $bs for performance here with a multiplier - but some filesystems might require it to be a multiple of the target fs's sector size, so keep that in mind.

    On my machine, the flag is --getioopt, not --getoptio – Michael Mior May 9 '17 at 17:46
Answer (score 2):

    The problem is that the copy is filling up your memory with blocks "in flight," crowding out "useful" data. This is a known (and very hard to fix) problem in the Linux kernel's handling of I/O to slow devices (USB in this case).

    Perhaps you can try to parcel out the copying, e.g. by a script like the following (proof-of-concept sketch, totally untested!):

    while true; do
      dd if=infile of=outfile bs=4096 count=... seek=... skip=...
      sleep 5
    done
    

    adjusting seek and skip by count each round. You need to tune count so that it doesn't fill up (too much) memory, and the 5 so that the destination has time to drain. A slightly fleshed-out sketch follows.
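
    A slightly fleshed-out, still untested version of that sketch; the file names and the 100 MiB chunk size are illustrative, and GNU stat is assumed for the size check:

    #!/bin/bash
    # Copy infile to outfile in chunks, pausing between chunks so the
    # destination's buffers have time to drain.
    bs=4096
    count=25600                  # 100 MiB per round at bs=4096; tune to your RAM
    size=$(stat -c %s infile)    # total size of the source file (GNU stat)

    i=0
    while [ $((i * count * bs)) -lt "$size" ]; do
        dd if=infile of=outfile bs=$bs count=$count \
           seek=$((i * count)) skip=$((i * count)) conv=notrunc
        i=$((i + 1))
        sleep 5                  # give the destination time to catch up
    done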

    answered Mar 2 '14 at 1:55
    Does ionice work well in Linux? I read that it just "emulates" work and there is no real difference. +1 for the links – Nick Nov 30 '15 at 16:51
  • @Nick When I've used it, it has behaved as expected. The process to which I applied ionice slowed significantly, and the other processes that needed I/O were able to perform as expected. With a moderate I/O load from other processes, I was able to effectively suspend a high-I/O process by applying maximal 'niceness', as expected. Once there was no competing I/O, the ioniced process performed as normal. – BillThor Dec 2 '15 at 0:47
  • With the 400 MB file I was copying from an HD to an SSD, it worked perfectly for the initial 10 s, then suddenly I saw a high I/O load and had to wait for about a minute with the machine frozen :/. I have the same problem with the cgroup write I/O throttle, where it works sometimes and other times it won't work at all. – Aquarius Power Oct 29 '18 at 23:08