I saw that, to crawl an entire website, this command should work:
wget --spider -r http://wikipedia.org/
But my question is: why does that same command not crawl the whole site when it is pointed at Wikipedia?
My goal is not to crawl all of Wikipedia, but to understand the difference.
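One thing I considered, though this is just a guess on my part and not something I have verified against this site, is that wget honors robots.txt during recursive retrieval, so the site's robots rules could stop the recursion. The file can be inspected directly:

# fetch the robots rules for the host (curl -s just suppresses the progress meter)
curl -s https://www.wikipedia.org/robots.txt

wget also accepts -e robots=off to ignore those rules, but I have not tested whether that changes anything here.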
This is the output of the command:
Spider mode enabled. Check if remote file exists.
--2016-08-31 17:53:56-- http://wikipedia.org/
Resolving wikipedia.org (wikipedia.org)... 91.198.174.192, 2620:0:862:ed1a::1
Connecting to wikipedia.org (wikipedia.org)|91.198.174.192|:80... connected.
HTTP request sent, awaiting response... 301 TLS Redirect
Location: https://wikipedia.org/ [following]
Spider mode enabled. Check if remote file exists.
--2016-08-31 17:53:56-- https://wikipedia.org/
Connecting to wikipedia.org (wikipedia.org)|91.198.174.192|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.wikipedia.org/ [following]
Spider mode enabled. Check if remote file exists.
--2016-08-31 17:53:56-- https://www.wikipedia.org/
Resolving www.wikipedia.org (www.wikipedia.org)... 91.198.174.192, 2620:0:862:ed1a::1
Connecting to www.wikipedia.org (www.wikipedia.org)|91.198.174.192|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Remote file exists and could contain links to other resources -- retrieving.
--2016-08-31 17:53:56-- https://www.wikipedia.org/
Reusing existing connection to www.wikipedia.org:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘wikipedia.org/index.html’
[ <=> ] 81 292 --.-K/s in 0,03s
2016-08-31 17:53:57 (2,44 MB/s) - ‘wikipedia.org/index.html’ saved [81292]
Removing wikipedia.org/index.html.
Found no broken links.
FINISHED --2016-08-31 17:53:57--
Total wall clock time: 0,2s
Downloaded: 1 files, 79K in 0,03s (2,44 MB/s)
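For reference, the redirect chain shown in the log (http to https, then to www) can be traced with curl as well, assuming curl is installed; -I sends HEAD requests, -L follows the Location headers, and -s hides the progress meter:

# print the headers of every hop: the two 301s and the final 200 OK
curl -sIL http://wikipedia.org/

The hops it prints match the wget log above, which may help pinpoint whether the problem is in the redirects or in the recursion step itself.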