I tried using wget --mirror http://tshepang.net/, but it only retrieves one page, "tshepang.net/index.html". Is this a bug in wget?
Here is the output, using the --debug option:
DEBUG output created by Wget 1.12 on linux-gnu.
Enqueuing http://tshepang.net/ at depth 0
Queue count 1, maxcount 1.
[IRI Enqueuing 'http://tshepang.net/' with None
Dequeuing http://tshepang.net/ at depth 0
Queue count 0, maxcount 1.
--2011-01-15 12:32:51-- http://tshepang.net/
Resolving tshepang.net... 66.216.125.32
Caching tshepang.net => 66.216.125.32
Connecting to tshepang.net|66.216.125.32|:80... connected.
Created socket 4.
Releasing 0x089e2be0 (new refcount 1).
---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: tshepang.net
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 Found
Server: nginx/0.7.65
Date: Sat, 15 Jan 2011 10:33:45 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Status: 302 Found
Location: http://posterous.com/sso/verify/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F
X-Runtime: 3
Set-Cookie: cookies_enabled=true; path=/
Cache-Control: no-cache
Content-Length: 141
X-Varnish: 419207385
Age: 0
Via: 1.1 varnish
X-Cache: MISS
---response end---
302 Found
Stored cookie tshepang.net -1 (ANY) / <session> <insecure> [expiry none] cookies_enabled true
Registered socket 4 for persistent reuse.
Location: http://posterous.com/sso/verify/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F [following]
Skipping 141 bytes of body: [<html><body>You are being <a href="http://posterous.com/sso/verify/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F">redirected</a>.</body></html>] done.
--2011-01-15 12:32:52-- http://posterous.com/sso/verify/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F
conaddr is: 66.216.125.32
Resolving posterous.com... 184.106.20.99
Caching posterous.com => 184.106.20.99
Releasing 0x089e3e20 (new refcount 1).
Found posterous.com in host_name_addresses_map (0x89e3e20)
Connecting to posterous.com|184.106.20.99|:80... connected.
Created socket 5.
Releasing 0x089e3e20 (new refcount 1).
---request begin---
GET /sso/verify/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: posterous.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 Found
Server: nginx/0.7.65
Date: Sat, 15 Jan 2011 10:33:46 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Status: 302 Found
Location: http://tshepang.net/sso/recovery/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F
X-Runtime: 7
Set-Cookie: _sharebymail_session_id=296a636c8ed3cb6e4e7cabb10256008a; domain=.posterous.com; path=/; HttpOnly
Cache-Control: no-cache
Content-Length: 142
X-Varnish: 2019529137
Age: 0
Via: 1.1 varnish
X-Cache: MISS
---response end---
302 Found
cdm: 1 2
Stored cookie posterous.com -1 (ANY) / <session> <insecure> [expiry none] _sharebymail_session_id 296a636c8ed3cb6e4e7cabb10256008a
Location: http://tshepang.net/sso/recovery/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F [following]
Closed fd 5
--2011-01-15 12:32:53-- http://tshepang.net/sso/recovery/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F
Reusing existing connection to tshepang.net:80.
Reusing fd 4.
---request begin---
GET /sso/recovery/2d35d71b1e728dc99f3c153eaf6f8fa0?jumpto=%2F HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: tshepang.net
Connection: Keep-Alive
Cookie: cookies_enabled=true
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 Found
Server: nginx/0.7.65
Date: Sat, 15 Jan 2011 10:33:46 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Status: 302 Found
Location: http://tshepang.net/
X-Runtime: 5
Set-Cookie: _sharebymail_session_id=cab0227db8c38f17e572984ee188dc5e; domain=tshepang.net; path=/; HttpOnly
Cache-Control: no-cache
Content-Length: 86
X-Varnish: 419207606
Age: 0
Via: 1.1 varnish
X-Cache: MISS
---response end---
302 Found
cdm: 1 2
Stored cookie tshepang.net -1 (ANY) / <session> <insecure> [expiry none] _sharebymail_session_id cab0227db8c38f17e572984ee188dc5e
Location: http://tshepang.net/ [following]
Skipping 86 bytes of body: [<html><body>You are being <a href="http://tshepang.net/">redirected</a>.</body></html>] done.
--2011-01-15 12:32:54-- http://tshepang.net/
Reusing existing connection to tshepang.net:80.
Reusing fd 4.
---request begin---
GET / HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: tshepang.net
Connection: Keep-Alive
Cookie: _sharebymail_session_id=cab0227db8c38f17e572984ee188dc5e; cookies_enabled=true
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Server: nginx/0.7.65
Date: Sat, 15 Jan 2011 10:33:49 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Status: 200 OK
ETag: "6ec7aeb4e15e3a80e733f7c2b5e00d6f"
X-Runtime: 1680
Cache-Control: private, max-age=0, must-revalidate
Content-Length: 66513
X-Varnish: 419207692
Age: 0
Via: 1.1 varnish
X-Cache: MISS
---response end---
200 OK
Length: 66513 (65K) [text/html]
Saving to: 'tshepang.net/index.html'
0K .......... .......... .......... .......... .......... 76% 25.7K 1s
50K .......... .... 100% 39.3K=2.3s
2011-01-15 12:32:58 (27.9 KB/s) - 'tshepang.net/index.html' saved [66513/66513]
Deciding whether to enqueue "http://tshepang.net/".
Already on the black list.
Decided NOT to load it.
Redirection "http://tshepang.net/" failed the test.
FINISHED --2011-01-15 12:32:58--
Downloaded: 1 files, 65K in 2.3s (27.9 KB/s)
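A log this long is easier to read once the redirect hops and wget's final enqueue decision are pulled out. The following is only a sketch: it assumes the debug output was saved to a file (the name wget-debug.log is hypothetical), and the helper name redirect_summary is made up for illustration.

```shell
#!/bin/sh
# Summarize a saved "wget --debug" log: every redirect that wget followed,
# plus the lines that explain why the recursive crawl stopped.
# redirect_summary is a hypothetical helper name, not part of wget.
redirect_summary() {
    log=$1
    awk '
        # Lines of the form "Location: <url> [following]" mark a followed redirect.
        $1 == "Location:" && $NF == "[following]" { print "redirect -> " $2 }
        # These two messages show that wget refused to enqueue the final URL.
        $0 == "Already on the black list." || $0 == "Decided NOT to load it." {
            print "decision: " $0
        }
    ' "$log"
}
```

Calling redirect_summary wget-debug.log on the output above should list the three hops (posterous.com/sso/verify, tshepang.net/sso/recovery, then back to tshepang.net/) followed by the two decision lines.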
The --no-cookies option helped (thanks to wag):
"It seems like all the redirection caused wget to interrupt the request. Try with --no-cookies."
This was determined from reading the attached log.
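As a rough sketch of that suggestion, the mirror can be retried with cookie handling switched off. --mirror and --no-cookies are real wget options; the wrapper function and the DRY_RUN variable are assumptions added here so the command can be inspected before it touches the network.

```shell
#!/bin/sh
# Retry the mirror with cookies disabled.
# mirror_without_cookies and DRY_RUN are hypothetical names used for illustration.
mirror_without_cookies() {
    url=$1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command, so it can be checked first.
        echo "wget --mirror --no-cookies $url"
    else
        wget --mirror --no-cookies "$url"
    fi
}
```

For the site in the question this would be invoked as mirror_without_cookies http://tshepang.net/. Whether it helps on another site depends on how that site uses cookies in its redirects.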
Assuming wget is on your PATH (if it is not, you will need to type its full path), run the following commands:
mkdir wget_files
cd wget_files
wget --mirror --wait=2 --page-requisites --html-extension --convert-links --directory-prefix wget_files/example1 http://www.yourdomain.com
If you are not using --mirror (which already turns on -r with unlimited depth), you also need to set -r for recursion and -l X for the link depth, where X is an integer. It is also a good idea to set -A to define the list of acceptable file types to keep (otherwise you only get HTML files).
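Put together, those options might look like the sketch below. The depth of 2 and the suffix list are illustrative values, not taken from the answer, and recursive_fetch_cmd is a hypothetical helper that only prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Build a recursive wget command limited to a given depth and a
# comma-separated list of accepted file suffixes.
# recursive_fetch_cmd is a hypothetical helper name; -r, -l and -A are real wget options.
recursive_fetch_cmd() {
    url=$1 depth=$2 accept=$3
    printf 'wget -r -l %s -A %s %s\n' "$depth" "$accept" "$url"
}

# Example values (assumptions): depth 2, keep HTML, CSS and common image types.
recursive_fetch_cmd http://www.yourdomain.com 2 html,css,png,jpg
```

The printed line can be pasted into a terminal once the depth and suffix list look right for the site being copied.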