wget: a command-line download tool

Posted by kiwi_uk on Sun, 13 Feb 2022 05:13:14 +0100

1, Syntax

onlylove@ubuntu:~$ wget --help
GNU Wget 1.20.3, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version                   display the version of Wget and exit
  -h,  --help                      print this help
  -b,  --background                go to background after startup
  -e,  --execute=COMMAND           execute a `.wgetrc'-style command

Logging and input file:
  -o,  --output-file=FILE          log messages to FILE
  -a,  --append-output=FILE        append messages to FILE
  -d,  --debug                     print lots of debugging information
  -q,  --quiet                     quiet (no output)
  -v,  --verbose                   be verbose (this is the default)
  -nv, --no-verbose                turn off verboseness, without being quiet
       --report-speed=TYPE         output bandwidth as TYPE.  TYPE can be bits
  -i,  --input-file=FILE           download URLs found in local or external FILE
  -F,  --force-html                treat input file as HTML
  -B,  --base=URL                  resolves HTML input-file links (-i -F)
                                     relative to URL
       --config=FILE               specify config file to use
       --no-config                 do not read any config file
       --rejected-log=FILE         log reasons for URL rejection to FILE

Download:
  -t,  --tries=NUMBER              set number of retries to NUMBER (0 unlimits)
       --retry-connrefused         retry even if connection is refused
       --retry-on-http-error=ERRORS    comma-separated list of HTTP errors to retry
  -O,  --output-document=FILE      write documents to FILE
  -nc, --no-clobber                skip downloads that would download to
                                     existing files (overwriting them)
       --no-netrc                  don't try to obtain credentials from .netrc
  -c,  --continue                  resume getting a partially-downloaded file
       --start-pos=OFFSET          start downloading from zero-based position OFFSET
       --progress=TYPE             select progress gauge type
       --show-progress             display the progress bar in any verbosity mode
  -N,  --timestamping              don't re-retrieve files unless newer than
                                     local
       --no-if-modified-since      don't use conditional if-modified-since get
                                     requests in timestamping mode
       --no-use-server-timestamps  don't set the local file's timestamp by
                                     the one on the server
  -S,  --server-response           print server response
       --spider                    don't download anything
  -T,  --timeout=SECONDS           set all timeout values to SECONDS
       --dns-timeout=SECS          set the DNS lookup timeout to SECS
       --connect-timeout=SECS      set the connect timeout to SECS
       --read-timeout=SECS         set the read timeout to SECS
  -w,  --wait=SECONDS              wait SECONDS between retrievals
       --waitretry=SECONDS         wait 1..SECONDS between retries of a retrieval
       --random-wait               wait from 0.5*WAIT...1.5*WAIT secs between retrievals
       --no-proxy                  explicitly turn off proxy
  -Q,  --quota=NUMBER              set retrieval quota to NUMBER
       --bind-address=ADDRESS      bind to ADDRESS (hostname or IP) on local host
       --limit-rate=RATE           limit download rate to RATE
       --no-dns-cache              disable caching DNS lookups
       --restrict-file-names=OS    restrict chars in file names to ones OS allows
       --ignore-case               ignore case when matching files/directories
  -4,  --inet4-only                connect only to IPv4 addresses
  -6,  --inet6-only                connect only to IPv6 addresses
       --prefer-family=FAMILY      connect first to addresses of specified family,
                                     one of IPv6, IPv4, or none
       --user=USER                 set both ftp and http user to USER
       --password=PASS             set both ftp and http password to PASS
       --ask-password              prompt for passwords
       --use-askpass=COMMAND       specify credential handler for requesting 
                                     username and password.  If no COMMAND is 
                                     specified the WGET_ASKPASS or the SSH_ASKPASS 
                                     environment variable is used.
       --no-iri                    turn off IRI support
       --local-encoding=ENC        use ENC as the local encoding for IRIs
       --remote-encoding=ENC       use ENC as the default remote encoding
       --unlink                    remove file before clobber
       --xattr                     turn on storage of metadata in extended file attributes

Directories:
  -nd, --no-directories            don't create directories
  -x,  --force-directories         force creation of directories
  -nH, --no-host-directories       don't create host directories
       --protocol-directories      use protocol name in directories
  -P,  --directory-prefix=PREFIX   save files to PREFIX/..
       --cut-dirs=NUMBER           ignore NUMBER remote directory components

HTTP options:
       --http-user=USER            set http user to USER
       --http-password=PASS        set http password to PASS
       --no-cache                  disallow server-cached data
       --default-page=NAME         change the default page name (normally
                                     this is 'index.html'.)
  -E,  --adjust-extension          save HTML/CSS documents with proper extensions
       --ignore-length             ignore 'Content-Length' header field
       --header=STRING             insert STRING among the headers
       --compression=TYPE          choose compression, one of auto, gzip and none. (default: none)
       --max-redirect              maximum redirections allowed per page
       --proxy-user=USER           set USER as proxy username
       --proxy-password=PASS       set PASS as proxy password
       --referer=URL               include 'Referer: URL' header in HTTP request
       --save-headers              save the HTTP headers to file
  -U,  --user-agent=AGENT          identify as AGENT instead of Wget/VERSION
       --no-http-keep-alive        disable HTTP keep-alive (persistent connections)
       --no-cookies                don't use cookies
       --load-cookies=FILE         load cookies from FILE before session
       --save-cookies=FILE         save cookies to FILE after session
       --keep-session-cookies      load and save session (non-permanent) cookies
       --post-data=STRING          use the POST method; send STRING as the data
       --post-file=FILE            use the POST method; send contents of FILE
       --method=HTTPMethod         use method "HTTPMethod" in the request
       --body-data=STRING          send STRING as data. --method MUST be set
       --body-file=FILE            send contents of FILE. --method MUST be set
       --content-disposition       honor the Content-Disposition header when
                                     choosing local file names (EXPERIMENTAL)
       --content-on-error          output the received content on server errors
       --auth-no-challenge         send Basic HTTP authentication information
                                     without first waiting for the server's
                                     challenge

HTTPS (SSL/TLS) options:
       --secure-protocol=PR        choose secure protocol, one of auto, SSLv2,
                                     SSLv3, TLSv1, TLSv1_1, TLSv1_2 and PFS
       --https-only                only follow secure HTTPS links
       --no-check-certificate      don't validate the server's certificate
       --certificate=FILE          client certificate file
       --certificate-type=TYPE     client certificate type, PEM or DER
       --private-key=FILE          private key file
       --private-key-type=TYPE     private key type, PEM or DER
       --ca-certificate=FILE       file with the bundle of CAs
       --ca-directory=DIR          directory where hash list of CAs is stored
       --crl-file=FILE             file with bundle of CRLs
       --pinnedpubkey=FILE/HASHES  Public key (PEM/DER) file, or any number
                                   of base64 encoded sha256 hashes preceded by
                                   'sha256//' and separated by ';', to verify
                                   peer against
       --random-file=FILE          file with random data for seeding the SSL PRNG

       --ciphers=STR           Set the priority string (GnuTLS) or cipher list string (OpenSSL) directly.
                                   Use with care. This option overrides --secure-protocol.
                                   The format and syntax of this string depend on the specific SSL/TLS engine.
HSTS options:
       --no-hsts                   disable HSTS
       --hsts-file                 path of HSTS database (will override default)

FTP options:
       --ftp-user=USER             set ftp user to USER
       --ftp-password=PASS         set ftp password to PASS
       --no-remove-listing         don't remove '.listing' files
       --no-glob                   turn off FTP file name globbing
       --no-passive-ftp            disable the "passive" transfer mode
       --preserve-permissions      preserve remote file permissions
       --retr-symlinks             when recursing, get linked-to files (not dir)

FTPS options:
       --ftps-implicit                 use implicit FTPS (default port is 990)
       --ftps-resume-ssl               resume the SSL/TLS session started in the control connection when
                                         opening a data connection
       --ftps-clear-data-connection    cipher the control channel only; all the data will be in plaintext
       --ftps-fallback-to-ftp          fall back to FTP if FTPS is not supported in the target server
WARC options:
       --warc-file=FILENAME        save request/response data to a .warc.gz file
       --warc-header=STRING        insert STRING into the warcinfo record
       --warc-max-size=NUMBER      set maximum size of WARC files to NUMBER
       --warc-cdx                  write CDX index files
       --warc-dedup=FILENAME       do not store records listed in this CDX file
       --no-warc-compression       do not compress WARC files with GZIP
       --no-warc-digests           do not calculate SHA1 digests
       --no-warc-keep-log          do not store the log file in a WARC record
       --warc-tempdir=DIRECTORY    location for temporary files created by the
                                     WARC writer

Recursive download:
  -r,  --recursive                 specify recursive download
  -l,  --level=NUMBER              maximum recursion depth (inf or 0 for infinite)
       --delete-after              delete files locally after downloading them
  -k,  --convert-links             make links in downloaded HTML or CSS point to
                                     local files
       --convert-file-only         convert the file part of the URLs only (usually known as the basename)
       --backups=N                 before writing file X, rotate up to N backup files
  -K,  --backup-converted          before converting file X, back up as X.orig
  -m,  --mirror                    shortcut for -N -r -l inf --no-remove-listing
  -p,  --page-requisites           get all images, etc. needed to display HTML page
       --strict-comments           turn on strict (SGML) handling of HTML comments

Recursive accept/reject:
  -A,  --accept=LIST               comma-separated list of accepted extensions
  -R,  --reject=LIST               comma-separated list of rejected extensions
       --accept-regex=REGEX        regex matching accepted URLs
       --reject-regex=REGEX        regex matching rejected URLs
       --regex-type=TYPE           regex type (posix|pcre)
  -D,  --domains=LIST              comma-separated list of accepted domains
       --exclude-domains=LIST      comma-separated list of rejected domains
       --follow-ftp                follow FTP links from HTML documents
       --follow-tags=LIST          comma-separated list of followed HTML tags
       --ignore-tags=LIST          comma-separated list of ignored HTML tags
  -H,  --span-hosts                go to foreign hosts when recursive
  -L,  --relative                  follow relative links only
  -I,  --include-directories=LIST  list of allowed directories
       --trust-server-names        use the name specified by the redirection
                                     URL's last component
  -X,  --exclude-directories=LIST  list of excluded directories
  -np, --no-parent                 don't ascend to the parent directory

Email bug reports, questions, discussions to <bug-wget@gnu.org>
and/or open issues at https://savannah.gnu.org/bugs/?func=additem&group=wget.
onlylove@ubuntu:~$

2, Parameter description

1, Startup

Parameter                        Explanation
-V, --version                    Display the version of Wget and exit
-h, --help                       Print this help
-b, --background                 Go to the background after startup
-e, --execute=COMMAND            Execute a '.wgetrc'-style command
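
A quick sketch of a few startup options used together (the URL is just a placeholder):

  # run in the background (output goes to wget-log by default) and apply a .wgetrc-style setting on the fly
  wget -b -e robots=off https://example.com/big-file.iso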

2, Logging and input file

Parameter                        Explanation
-o, --output-file=FILE           Log messages to FILE
-a, --append-output=FILE         Append messages to FILE
-d, --debug                      Print lots of debugging information
-q, --quiet                      Quiet (no output)
-v, --verbose                    Be verbose (this is the default)
-nv, --no-verbose                Turn off verboseness without being quiet
--report-speed=TYPE              Output bandwidth as TYPE (TYPE can be bits)
-i, --input-file=FILE            Download the URLs found in a local or external FILE
-F, --force-html                 Treat the input file as HTML
-B, --base=URL                   Resolve links in an HTML input file (-i -F) relative to URL
--config=FILE                    Specify the configuration file to use
--no-config                      Do not read any configuration file
--rejected-log=FILE              Log the reasons for URL rejection to FILE
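
For example, the logging and input-file options are often combined like this (urls.txt and wget.log are hypothetical file names):

  # fetch every URL listed in urls.txt, keeping a concise log in wget.log instead of printing to the terminal
  wget -nv -i urls.txt -o wget.log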

3, Download

Parameter                        Explanation
-t, --tries=NUMBER               Set the number of retries to NUMBER (0 means unlimited)
--retry-connrefused              Retry even if the connection is refused
--retry-on-http-error=ERRORS     Comma-separated list of HTTP errors to retry on
-O, --output-document=FILE       Write documents to FILE
-nc, --no-clobber                Skip downloads that would overwrite existing files
--no-netrc                       Do not try to obtain credentials from .netrc
-c, --continue                   Resume getting a partially downloaded file
--start-pos=OFFSET               Start downloading from the zero-based position OFFSET
--progress=TYPE                  Select the progress gauge type
--show-progress                  Display the progress bar in any verbosity mode
-N, --timestamping               Do not re-retrieve files unless they are newer than the local copy
--no-if-modified-since           Do not use conditional If-Modified-Since GET requests in timestamping mode
--no-use-server-timestamps       Do not set the local file's timestamp from the one on the server
-S, --server-response            Print the server response
--spider                         Do not download anything
-T, --timeout=SECONDS            Set all timeout values to SECONDS
--dns-timeout=SECS               Set the DNS lookup timeout to SECS
--connect-timeout=SECS           Set the connect timeout to SECS
--read-timeout=SECS              Set the read timeout to SECS
-w, --wait=SECONDS               Wait SECONDS between retrievals
--waitretry=SECONDS              Wait 1..SECONDS between retries of a retrieval
--random-wait                    Wait from 0.5*WAIT to 1.5*WAIT seconds between retrievals
--no-proxy                       Explicitly turn off the proxy
-Q, --quota=NUMBER               Set the retrieval quota to NUMBER
--bind-address=ADDRESS           Bind to ADDRESS (hostname or IP) on the local host
--limit-rate=RATE                Limit the download rate to RATE
--no-dns-cache                   Disable caching of DNS lookups
--restrict-file-names=OS         Restrict the characters in file names to those the given OS allows
--ignore-case                    Ignore case when matching files/directories
-4, --inet4-only                 Connect only to IPv4 addresses
-6, --inet6-only                 Connect only to IPv6 addresses
--prefer-family=FAMILY           Connect first to addresses of the specified family: IPv6, IPv4, or none
--user=USER                      Set both the FTP and HTTP user to USER
--password=PASS                  Set both the FTP and HTTP password to PASS
--ask-password                   Prompt for passwords
--use-askpass=COMMAND            Specify the credential handler for requesting a username and password;
                                   if no COMMAND is given, the WGET_ASKPASS or SSH_ASKPASS environment variable is used
--no-iri                         Turn off IRI support
--local-encoding=ENC             Use ENC as the local encoding for IRIs
--remote-encoding=ENC            Use ENC as the default remote encoding
--unlink                         Remove the file before clobbering it
--xattr                          Turn on storage of metadata in extended file attributes
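
A small, illustrative example of the most common download options (the URL and file names are placeholders):

  # resume a partial download, retry up to 5 times, limit bandwidth, and wait 2 seconds between retrievals
  wget -c -t 5 --limit-rate=500k -w 2 https://example.com/ubuntu.iso

  # save the response under a different local name
  wget -O homepage.html https://example.com/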

4, Directories

Parameter                        Explanation
-nd, --no-directories            Do not create directories
-x, --force-directories          Force creation of directories
-nH, --no-host-directories       Do not create host directories
--protocol-directories           Use the protocol name in directory names
-P, --directory-prefix=PREFIX    Save files to PREFIX/..
--cut-dirs=NUMBER                Ignore NUMBER remote directory components
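
As a sketch, two ways to control where files land locally (URL and directory names are placeholders):

  # force the remote host/path hierarchy to be recreated locally (example.com/docs/manual.pdf)
  wget -x https://example.com/docs/manual.pdf

  # or flatten everything into a single local directory
  wget -r -nd -P downloads https://example.com/docs/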

5, HTTP options

Parameter                        Explanation
--http-user=USER                 Set the HTTP user to USER
--http-password=PASS             Set the HTTP password to PASS
--no-cache                       Disallow server-cached data
--default-page=NAME              Change the default page name (normally 'index.html')
-E, --adjust-extension           Save HTML/CSS documents with the proper extensions
--ignore-length                  Ignore the 'Content-Length' header field
--header=STRING                  Insert STRING among the request headers
--compression=TYPE               Choose compression: one of auto, gzip, or none (default: none)
--max-redirect                   Maximum number of redirections allowed per page
--proxy-user=USER                Set USER as the proxy username
--proxy-password=PASS            Set PASS as the proxy password
--referer=URL                    Include a 'Referer: URL' header in the HTTP request
--save-headers                   Save the HTTP headers to the file
-U, --user-agent=AGENT           Identify as AGENT instead of Wget/VERSION
--no-http-keep-alive             Disable HTTP keep-alive (persistent connections)
--no-cookies                     Do not use cookies
--load-cookies=FILE              Load cookies from FILE before the session
--save-cookies=FILE              Save cookies to FILE after the session
--keep-session-cookies           Load and save session (non-permanent) cookies
--post-data=STRING               Use the POST method; send STRING as the data
--post-file=FILE                 Use the POST method; send the contents of FILE
--method=HTTPMethod              Use the method HTTPMethod in the request
--body-data=STRING               Send STRING as the request body; --method must be set
--body-file=FILE                 Send the contents of FILE as the request body; --method must be set
--content-disposition            Honor the Content-Disposition header when choosing local file names (experimental)
--content-on-error               Output the received content on server errors
--auth-no-challenge              Send Basic HTTP authentication information without first waiting for
                                   the server's challenge
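
Two hedged examples of the HTTP options (the URL, header value, and form data are placeholders):

  # send a custom header and a browser-like User-Agent string
  wget --header='Accept-Language: en' -U 'Mozilla/5.0' https://example.com/page.html

  # submit form data with the POST method
  wget --post-data='user=alice&lang=en' https://example.com/login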

6, HTTPS (SSL/TLS) options

Parameter                        Explanation
--secure-protocol=PR             Choose the secure protocol, one of auto, SSLv2, SSLv3, TLSv1,
                                   TLSv1_1, TLSv1_2 and PFS
--https-only                     Only follow secure HTTPS links
--no-check-certificate           Do not validate the server's certificate
--certificate=FILE               Client certificate file
--certificate-type=TYPE          Client certificate type: PEM or DER
--private-key=FILE               Private key file
--private-key-type=TYPE          Private key type: PEM or DER
--ca-certificate=FILE            File with the bundle of CAs
--ca-directory=DIR               Directory where the hash list of CAs is stored
--crl-file=FILE                  File with a bundle of CRLs
--pinnedpubkey=FILE/HASHES       Public key (PEM/DER) file, or any number of base64-encoded sha256
                                   hashes preceded by 'sha256//' and separated by ';', to verify the peer against
--random-file=FILE               File with random data for seeding the SSL PRNG
--ciphers=STR                    Set the priority string (GnuTLS) or cipher list string (OpenSSL) directly;
                                   use with care, as this option overrides --secure-protocol. The format and
                                   syntax of this string depend on the specific SSL/TLS engine
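
For illustration only (the hosts are placeholders):

  # refuse to follow anything that is not HTTPS
  wget --https-only https://example.com/archive.tar.gz

  # skip certificate validation, e.g. for a self-signed certificate on a test server (use with care)
  wget --no-check-certificate https://test-server.example.internal/file.txt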

7, HSTS options

Parameter                        Explanation
--no-hsts                        Disable HSTS
--hsts-file                      Path of the HSTS database (overrides the default)
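
A minimal sketch (the path and URL are hypothetical):

  # keep the HSTS database in a non-default location
  wget --hsts-file=/tmp/wget-hsts https://example.com/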

8, FTP options

Parameter                        Explanation
--ftp-user=USER                  Set the FTP user to USER
--ftp-password=PASS              Set the FTP password to PASS
--no-remove-listing              Do not remove '.listing' files
--no-glob                        Turn off FTP file name globbing
--no-passive-ftp                 Disable the "passive" transfer mode
--preserve-permissions           Preserve remote file permissions
--retr-symlinks                  When recursing, retrieve the linked-to files (not directories)
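
An illustrative FTP retrieval (server, credentials, and path are placeholders):

  # log in with explicit credentials and disable file name globbing so the URL is taken literally
  wget --ftp-user=anonymous --ftp-password=guest --no-glob 'ftp://ftp.example.com/pub/file-1.0.tar.gz'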

9, FTPS options

Parameter                        Explanation
--ftps-implicit                  Use implicit FTPS (the default port is 990)
--ftps-resume-ssl                Resume the SSL/TLS session started in the control connection when
                                   opening a data connection
--ftps-clear-data-connection     Encrypt the control channel only; all data will be in plaintext
--ftps-fallback-to-ftp           Fall back to FTP if FTPS is not supported by the target server
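
A sketch of an FTPS retrieval (the server and path are placeholders):

  # use implicit FTPS (port 990) and fall back to plain FTP if the server does not support FTPS
  wget --ftps-implicit --ftps-fallback-to-ftp ftps://ftp.example.com/pub/readme.txt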

10, WARC options

Parameter                        Explanation
--warc-file=FILENAME             Save request/response data to a .warc.gz file
--warc-header=STRING             Insert STRING into the warcinfo record
--warc-max-size=NUMBER           Set the maximum size of WARC files to NUMBER
--warc-cdx                       Write CDX index files
--warc-dedup=FILENAME            Do not store records listed in this CDX file
--no-warc-compression            Do not compress WARC files with GZIP
--no-warc-digests                Do not calculate SHA1 digests
--no-warc-keep-log               Do not store the log file in a WARC record
--warc-tempdir=DIRECTORY         Location for temporary files created by the WARC writer
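
A minimal WARC example (the URL and file name are placeholders):

  # archive the request/response traffic into crawl.warc.gz and write a CDX index alongside it
  wget --warc-file=crawl --warc-cdx -p https://example.com/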

11, Recursive download

Parameter                        Explanation
-r, --recursive                  Specify recursive download
-l, --level=NUMBER               Maximum recursion depth (inf or 0 for infinite)
--delete-after                   Delete files locally after downloading them
-k, --convert-links              Make links in downloaded HTML or CSS point to local files
--convert-file-only              Convert only the file part of the URLs (usually known as the basename)
--backups=N                      Before writing file X, rotate up to N backup files
-K, --backup-converted           Before converting file X, back it up as X.orig
-m, --mirror                     Shortcut for -N -r -l inf --no-remove-listing
-p, --page-requisites            Get all images, etc. needed to display the HTML page
--strict-comments                Turn on strict (SGML) handling of HTML comments
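
Two hedged examples of recursive retrieval (the URLs are placeholders):

  # mirror two levels deep, rewriting links and fetching page requisites for offline viewing
  wget -r -l 2 -k -p https://example.com/docs/

  # the -m shortcut: equivalent to -N -r -l inf --no-remove-listing
  wget -m https://example.com/docs/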

12, Recursive accept/reject

Parameter                        Explanation
-A, --accept=LIST                Comma-separated list of accepted extensions
-R, --reject=LIST                Comma-separated list of rejected extensions
--accept-regex=REGEX             Regex matching accepted URLs
--reject-regex=REGEX             Regex matching rejected URLs
--regex-type=TYPE                Regex type (posix|pcre)
-D, --domains=LIST               Comma-separated list of accepted domains
--exclude-domains=LIST           Comma-separated list of rejected domains
--follow-ftp                     Follow FTP links from HTML documents
--follow-tags=LIST               Comma-separated list of HTML tags to follow
--ignore-tags=LIST               Comma-separated list of HTML tags to ignore
-H, --span-hosts                 Go to foreign hosts when recursing
-L, --relative                   Follow relative links only
-I, --include-directories=LIST   List of allowed directories
--trust-server-names             Use the name specified by the last component of the redirection URL
-X, --exclude-directories=LIST   List of excluded directories
-np, --no-parent                 Do not ascend to the parent directory
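
For example, a recursive fetch restricted by the accept/reject filters (the URL and pattern are placeholders):

  # grab only PDF files, never ascending above the starting directory
  wget -r -np -A pdf https://example.com/papers/

  # skip any URL containing a query string
  wget -r --reject-regex='.*\?.*' https://example.com/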

3, man wget

To be completed
