MirrorManager crawler patch

Ricky Zhou ricky at fedoraproject.org
Mon Jul 20 15:32:51 UTC 2009


On 2009-07-20 09:34:47 AM, Bruno Wolff III wrote:
> > 2) MirrorManager currently doesn't check timestamps, and the solution to
> >    this isn't trivial, especially with FTP, which returns directory
> >    listing data as just the text of the output.  This is
> >    almost impossible to parse accurately, especially when time zones are
> >    involved, and when time zone data isn't even returned by FTP.
> 
> Maybe you could check a hash of the repomd.xml file? You shouldn't have to
> track too many different hashes.
For what it's worth, this hash checking already happens on repomd.xml
files for mirrors that are crawled via HTTP, and my patch added that
check to FTP mirrors as well.  
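
As a rough illustration of that check (a sketch only, not the actual
crawler code): the function names are made up, and SHA-256 is an
assumption, since the digest the crawler really uses isn't stated here.
The idea is just to compare a digest of the master's repomd.xml with a
digest of the copy fetched from the mirror, whether it came over HTTP
or FTP.

```python
import hashlib


def repomd_digest(data: bytes) -> str:
    """Return a hex digest of the raw repomd.xml bytes.

    SHA-256 is an assumption made for this sketch.
    """
    return hashlib.sha256(data).hexdigest()


def repomd_is_current(master_copy: bytes, mirror_copy: bytes) -> bool:
    """True if the mirror's repomd.xml is byte-identical to the master's."""
    return repomd_digest(master_copy) == repomd_digest(mirror_copy)
```

Because only the file contents are compared, this sidesteps the FTP
timestamp and time zone parsing problem entirely.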

When talking on IRC with Matt, we realized that the check shouldn't be
necessary at all though, since the other files in the repodata are
successfully getting the repodata directory marked outdated (and we did
confirm that this was happening with the last bu mirror crawl).

Overall, I think the crawling has been working fine even without the
timestamp checking (apart from some issues caused by the timestamp
problem we recently saw).  I just wanted to mention why that check is
currently disabled.

As another side note, MirrorManager already knows which directories
are repositories:

< mdomsch> sure
< mdomsch> so, MM does know that that dir is a repository
< mdomsch> class Directory:  repository = SingleJoin('Repository')
< mdomsch> but the crawler doesn't do anything special with that knowledge
< mdomsch> perhaps it should
< mdomsch> by definition, a Repository is a Directory that has a child directory named 'repodata'
< mdomsch> but the whole directory tree starting at that Directory down, is part of the repository

So all of the framework should be in place for marking an entire
repository out of date if the repodata is out of date.  
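
To make the idea concrete, here is a minimal sketch of how that
propagation might look.  It uses a plain dict of path -> up-to-date
flag rather than MirrorManager's real model classes, and the function
name and data layout are hypothetical.  It follows the definition
above: a repository is a directory with a 'repodata' child, and
everything from that directory down belongs to it.

```python
from typing import Dict, Iterable


def propagate_outdated(status: Dict[str, bool],
                       repo_roots: Iterable[str]) -> Dict[str, bool]:
    """Return a copy of status with whole repositories marked outdated.

    status maps a directory path to True (up to date) or False.
    repo_roots lists the directories known to be repositories.
    """
    result = dict(status)
    for root in repo_roots:
        root = root.rstrip("/")
        # If repodata is missing from status, assume it is up to date.
        if result.get(root + "/repodata", True):
            continue
        prefix = root + "/"
        # repodata is stale: mark the repository root and everything
        # below it as out of date.
        for path in result:
            if path == root or path.startswith(prefix):
                result[path] = False
    return result
```

For example, with a status of {"pub/f11/os": True,
"pub/f11/os/repodata": False, "pub/f11/os/Packages": True} and
"pub/f11/os" as the only repository root, all three entries come back
marked out of date, while directories outside that tree are untouched.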

Thanks,
Ricky


More information about the Fedora-infrastructure-list mailing list