Re: OT: autosave of google alert sites?
- From: James Wilkinson <james westexe demon co uk>
- To: For users of Fedora Core releases <fedora-list redhat com>
- Subject: Re: OT: autosave of google alert sites?
- Date: Wed, 13 Oct 2004 23:24:07 +0100
Dave Stevens wrote:
> I get google news alerts
> In an ideal world, I would be able to have a daily program (script?) run
> that would examine that day's alerts, resolve the URLs and save the pages.
Alan Peery suggested:
> 1. Use wget to retrieve the google news alert page to a file
> 2. parse the file with PERL, gaining URLs
> 3. wget those URLs, putting them into subdirectories based on the day
> your script is running
I don't think stage 2 is necessary: man wget suggests:
    -i file
    --input-file=file
        Read URLs from file, in which case no URLs need to be on the
        command line. If there are URLs both on the command line and in
        an input file, those on the command lines will be the first ones
        to be retrieved. The file need not be an HTML document (but no
        harm if it is)---it is enough if the URLs are just listed
        sequentially.
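A rough, untested sketch of that idea is below. It adds wget's --force-html option, which tells wget to treat the input file as HTML and pull the links out of it. The file name alerts.html, the fetch_links helper and the WGET override are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: let wget extract and fetch the links in a saved alert page.
# Set WGET=echo to preview the command line instead of running it.
WGET=${WGET:-wget}

# fetch_links FILE DIR
#   FILE : saved alert page (hypothetical name, e.g. alerts.html)
#   DIR  : directory to save the fetched pages into
fetch_links() {
    # --force-html       : parse FILE as HTML rather than a plain URL list
    # --input-file       : read the URLs to fetch from FILE
    # --directory-prefix : put everything that is downloaded under DIR
    $WGET --force-html --input-file="$1" --directory-prefix="$2"
}
```

Something like `fetch_links alerts.html "$(date +%Y-%m-%d)"` would then put each day's pages in a directory named after the date.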
But then I'm not sure stage 3 is necessary, either: wget supports
recursive retrieval:
    -r
    --recursive
        Turn on recursive retrieving.

    -l depth
    --level=depth
        Specify recursion maximum depth level depth. The default maximum
        depth is 5.
Take a good look at the options in the wget man page, especially the
examples under --page-requisites. You may need --span-hosts.
(I must admit that I've never really tried using these options, so
you'll need to experiment.)
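For what it's worth, here is how the recursive approach might be put together. This is an untested sketch: the save_alerts helper, the WGET override and the example URL are assumptions, and you would have to supply the real address of your alert page.

```shell
#!/bin/sh
# Sketch only, not tested against real alert pages.
# Set WGET=echo to preview the command line instead of running it.
WGET=${WGET:-wget}

# save_alerts URL [DATE]
#   URL  : address of the alert page (hypothetical; supply your own)
#   DATE : directory name; defaults to today's date as YYYY-MM-DD
save_alerts() {
    day=${2:-$(date +%Y-%m-%d)}
    # --level=1         : follow the links on the alert page, no deeper
    # --span-hosts      : the linked stories live on other hosts
    # --page-requisites : also fetch images and stylesheets for each page
    # --convert-links   : rewrite links so the saved copies work offline
    $WGET --recursive --level=1 --span-hosts \
          --page-requisites --convert-links \
          --directory-prefix="$day" "$1"
}
```

Saved as a script that ends with a call such as `save_alerts http://example.com/alerts.html`, it could be run once a day from cron. Expect to tune the depth, and possibly restrict hosts with --domains, once you see how much it pulls in.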
James.
--
E-mail address: james | A: Because people don't normally read bottom to top.
@westexe.demon.co.uk  | Q: Why is top-posting such a bad thing?
                      | A: Top-posting.
                      | Q: What is the most annoying thing in e-mail?