
Re: How to mirror web sites



> > > Have you tried GNU Wget (should be version 1.4.x) ? I think you might find
> > > it in RPM form in any RedHat mirror site.
> > > 
> > 
> > I have tried this, but I find that sometimes it fails to follow certain of
> > the links on a page despite them being at the same site and at the same
> > recursion level, and it's not because it is halting due to having downloaded
> > too much data.
> 
> What kind of links ? Could they have something to do with the robots.txt
> file ?
> 

No, I have robots.txt recognition turned off.  The problem seems to be isolated
to relative HREF links, i.e. ones that don't specify the full http://... address
and instead just say something like HREF=file.html.  I've also noticed it
occasionally with SRC= items for images.
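
For reference, this is how I would expect such relative links to be resolved
against the page they appear on.  It is only a minimal sketch in Python using
the standard library's urljoin, not a description of how Wget does it; the
example.com URLs and the resolve_links helper are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_links(page_url, links):
    """Resolve each HREF/SRC value against the URL of the page it came from."""
    return [urljoin(page_url, link) for link in links]


if __name__ == "__main__":
    # Hypothetical page URL and link values, for illustration only.
    page = "http://www.example.com/docs/index.html"
    links = [
        "file.html",                          # relative to the page's directory
        "images/logo.gif",                    # relative, in a subdirectory
        "/top.html",                          # relative to the server root
        "http://www.example.com/other.html",  # already a full address
    ]
    for link, full in zip(links, resolve_links(page, links)):
        print(f"{link} -> {full}")
```

A mirroring tool that resolved links this way would treat HREF=file.html on
that page as a same-site link in the same directory, so it should be followed
like any fully specified one.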

-- 
+------------------------------------+----------------------------------------+
| Jeff Richards                      | Telephone: (604) 231-2667              |
| MacDonald Dettwiler Ltd.           +----------------------------------------+
| 13800 Commerce Parkway             | Email: jrichard mda ca                 |
| Richmond, BC  CANADA  V6V 2J3      |                                        |
+------------------------------------+----------------------------------------+


