Connection reset by peer


I want to download data with wget. I submitted a job (Slurm Job_id=27011396) using the following script.

#! /bin/bash
#SBATCH -o download.out
#SBATCH -e download.err
#SBATCH --mail-type END
#SBATCH --partition long
#SBATCH --mem 250G
#SBATCH --cpus-per-task 10

wget -r --user='' --password=''**** --no-parent --auth-no-challenge

But it failed after running for 15 hours, with the error "Read error at byte 4160870916/8269259363 (Connection reset by peer)".
Is it because of the --mem setting, given that I'm downloading a huge dataset? Should I use the bigmem partition? Thanks a lot.


Dear Rui,

Requesting 250 GB of memory is clearly overkill...
Requesting 10 CPUs for a single wget command is also clearly overkill...

Once your job has finished, you can use the seff <jobid> command to see what it actually used:

$ seff 27011396
Job ID: 27011396
Cluster: core
User/Group: rzhang/rzhang
State: FAILED (exit code 1)
Nodes: 1
Cores per node: 10
CPU Utilized: 02:39:58
CPU Efficiency: 1.75% of 6-08:07:00 core-walltime
Job Wall-clock time: 15:12:42
Memory Utilized: 6.81 MB
Memory Efficiency: 0.00% of 250.00 GB
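
To make the "CPU Efficiency" line concrete, here is a small sketch that recomputes it from the other values in this seff output. It assumes that efficiency is CPU time divided by (wall-clock time x cores), which matches the numbers shown here. The helper name to_seconds is hypothetical and not part of seff.

```shell
#!/bin/bash
# Convert an HH:MM:SS duration (as printed by seff) to seconds.
# to_seconds is a hypothetical helper, not a Slurm command.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so values like "08" are not parsed as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

cpu_used=$(to_seconds "02:39:58")   # "CPU Utilized"
wall=$(to_seconds "15:12:42")       # "Job Wall-clock time"
cores=10                            # "Cores per node"

# Core-walltime: what the job reserved (6-08:07:00 in the output above).
core_walltime=$(( wall * cores ))

# Efficiency in hundredths of a percent, using integer arithmetic only.
eff=$(( cpu_used * 10000 / core_walltime ))
printf 'CPU efficiency: %d.%02d%%\n' $(( eff / 100 )) $(( eff % 100 ))
```

With one core instead of ten, the same CPU time would be measured against a tenth of the reserved core-walltime.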

Here you used less than 7 MB of memory (you requested 250,000 MB) and only a small part of one CPU (1.75% of the 10 CPUs requested).
wget is a simple download tool and does not use many resources: the CPU mostly waits for data to arrive from the Internet, and memory is barely used. So the failure is not related to the --mem setting, and the bigmem partition will not help.
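
Based on the usage reported above, a smaller request should be enough. The header below is only a sketch: the values 1G and 1 CPU are assumptions chosen to leave a comfortable margin over the 6.81 MB measured, not site recommendations, and the partition is kept from the original script.

```shell
#!/bin/bash
#SBATCH -o download.out
#SBATCH -e download.err
#SBATCH --partition long
# Assumed values: far below the original request, still well above measured usage.
#SBATCH --mem 1G
#SBATCH --cpus-per-task 1

# The wget command from the original script goes here, unchanged.
```

You can check the result with seff again after the next run and adjust if needed.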

When you have to download a large amount of data, it can take a long time, and it is highly probable that you will hit a network issue (like "Connection reset by peer").
So we highly recommend using tools that can resume after an error, such as rsync.

wget has an option (-c) to continue downloading a file without restarting from the beginning:

‘-c’ ‘--continue’
Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program. 

I think you should retry with this option.
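
As an illustration, the -c option can be combined with a simple retry loop so that the job resumes by itself after a network error. This is a minimal sketch, not a tested recipe for this server: the function name retry_download and the variables MAX_ATTEMPTS and WAIT_SECONDS are hypothetical, and the URL and credentials are left elided as in the original post.

```shell
#!/bin/bash
# Hypothetical settings: how many times to try, and how long to wait between tries.
MAX_ATTEMPTS="${MAX_ATTEMPTS:-20}"
WAIT_SECONDS="${WAIT_SECONDS:-60}"

# Run the given command until it succeeds or the attempt limit is reached.
retry_download() {
    local attempt=1
    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $attempt of $MAX_ATTEMPTS failed" >&2
        # Wait only if another attempt will follow.
        if [ "$attempt" -lt "$MAX_ATTEMPTS" ]; then
            sleep "$WAIT_SECONDS"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Usage (fill in the same URL and credentials as in the original command):
# retry_download wget -c -r --no-parent --auth-no-challenge --user='...' --password='...' <URL>
```

Because of -c, each new attempt continues the partially downloaded files instead of starting over. wget also has its own --tries and --waitretry options, which may be enough on their own in some cases.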

curl (another download tool) has a similar option, -C.
You could also try other tools (depending on which protocols the server supports) or parallelize the transfer (but that is not always possible or simple).

Best regards

OK, thanks a lot