yum differential updates

Rudolf Kastl che666 at gmail.com
Mon Apr 10 22:50:45 UTC 2006


2006/4/11, Rudolf Kastl <che666 at gmail.com>:
> 2006/4/10, Jon Burgess <jburgess at uklinux.net>:
> > On Mon, 2006-04-10 at 13:35 -0400, Jesse Keating wrote:
> > > On Mon, 2006-04-10 at 19:19 +0200, Rudolf Kastl wrote:
> > > >
> > > > well the reason for being able to mirror the update repos with a
> > > > permanently updated torrent would be simply that people are able to
> > > > share the bandwidth they have open. rsync causes lots of server load
> > > > afaik. mirroring is useful for a variety of reasons.
> > >
> > > But does torrent offer the ability that rsync does, to only grab the
> > > differences?  If you're re-torrenting the whole thing every day that
> > > seems less than optimal.
> >
> > It only grabs the differences, but my understanding is that every
> > operation is done in units of one "piece". The pieces are all of a fixed
> > size which is set when the torrent file is created, e.g. 256kB in
> > bordeaux-DVD-i386.torrent.
> >
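To make the piece granularity concrete, here is a minimal Python sketch that maps fixed-size pieces onto a list of files laid end to end, which is how a multi-file torrent is hashed. The name piece_spans and the example file sizes are made up for illustration; only the 256kB piece size comes from the message above.

```python
def piece_spans(file_lengths, piece_len):
    """Map each piece to the (file_index, offset_in_file, length) chunks it covers.

    The files are treated as one concatenated byte stream, in torrent order.
    """
    # Starting offset of each file within the concatenated stream.
    starts = []
    pos = 0
    for n in file_lengths:
        starts.append(pos)
        pos += n
    total = pos

    spans = []
    for begin in range(0, total, piece_len):
        end = min(begin + piece_len, total)
        chunks = []
        for i, (s, n) in enumerate(zip(starts, file_lengths)):
            # Overlap between this piece and file i, in stream coordinates.
            lo, hi = max(begin, s), min(end, s + n)
            if lo < hi:
                chunks.append((i, lo - s, hi - lo))
        spans.append(chunks)
    return spans


# Three hypothetical RPMs of 300000, 100000 and 200000 bytes, 256kB pieces.
for index, chunks in enumerate(piece_spans([300000, 100000, 200000], 256 * 1024)):
    print(index, chunks)
```

With these sizes the middle piece covers the tail of the first file, all of the second and the head of the third, so if the second file is new that whole piece has to be fetched even though most of its bytes are already on disk.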
> > I think it would work as follows:-
> >
> > 1) RH create a torrent with all current updates and publish on tracker
> > and start seeding.
> >
> > 2) A user starts off with no updates and downloads the torrent. The
> > client fetches all the update files from the seed and from other users
> > who have been doing the same thing.
> >
> > 3) Some time later, RH publish a new torrent which has a mixture of some
> > of the old files, with some added and some removed.
> >
> > 4) User downloads new torrent. The user adds this to his torrent
> > program, making sure to select the same location as the previous
> > download (this is key).
> >
> > 5) The torrent software will go through every file listed in the new
> > torrent, some of which will be found and some will not.
> >
> > 6) Every "piece" in the new torrent will be part of one or more files.
> > If the user has all the bytes contained in the piece then the software
> > will checksum them to ensure they are correct and then note that this
> > piece is already downloaded.
> >
> > 7) For pieces which have missing data, e.g. the piece contains data
> > from a file which the user doesn't have, the software will ignore the
> > current contents of the piece and put it in the list of pieces which
> > need to be downloaded.
> >
> > 8) The software proceeds to exchange pieces with other users and the
> > seeds to collect all pieces of the torrent. As each is received it
> > verifies the checksum and writes the contents out to the appropriate
> > files.
> >
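Steps 5 to 7 can be sketched roughly as follows. This is not how any particular client implements it: pieces_to_download and read_range are hypothetical names, and the sketch assumes the SHA-1 piece hashes and the (path, length) list have already been parsed out of the .torrent file.

```python
import hashlib
import os


def read_range(root, files, begin, end):
    """Read bytes [begin, end) of the concatenated stream.

    Returns None if any file covering that range is missing or too short.
    """
    out = bytearray()
    start = 0  # offset of the current file within the stream
    for rel, length in files:
        lo, hi = max(begin, start), min(end, start + length)
        if lo < hi:
            path = os.path.join(root, rel)
            if not os.path.isfile(path) or os.path.getsize(path) < hi - start:
                return None
            with open(path, "rb") as fh:
                fh.seek(lo - start)
                out += fh.read(hi - lo)
        start += length
    return bytes(out)


def pieces_to_download(root, files, piece_len, piece_hashes):
    """Return the indices of pieces that are missing or fail their SHA-1 check.

    files: list of (relative_path, length) in torrent order.
    piece_hashes: list of 20-byte SHA-1 digests, one per piece.
    """
    total = sum(length for _, length in files)
    needed = []
    for index, begin in enumerate(range(0, total, piece_len)):
        end = min(begin + piece_len, total)
        data = read_range(root, files, begin, end)
        if data is None or hashlib.sha1(data).digest() != piece_hashes[index]:
            needed.append(index)
    return needed
```

A piece that touches a file the user doesn't have ends up in the returned list, and so does a piece whose bytes are on disk but wrong (for example a preallocated, still empty region).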
> > A long list of observations and thoughts:-
> >
> > - The user must keep downloading to the same location to gain the
> > benefit (/var/cache/yum/update/packages might be good).
> >
> > - The downloads are not as efficient as a delta-RPM since the torrent
> > will still need to download the complete contents of any new RPM. It
> > does, however, reduce the load on the mirror system.
> >
> > - The torrent will only exchange data with users running exactly the
> > same torrent file, so if you are the first one to download a new RH
> > torrent then there will be no-one else to get data from (except the
> > initial seed).
> >
> > - Due to the problem above, it probably makes sense to only update the
> > torrent infrequently (maybe once per week). The user should probably
> > rely on using the normal yum mechanisms to download the very latest
> > updates. Provided these get done to the same location as the torrent
> > download and are cached then they won't be downloaded again once the
> > torrent is updated (the user will immediately act as a seed for these
> > once he gets the updated torrent).
> >
> > - Nothing will automatically remove old files from the user's download
> > location. "yum clean packages" would remove the downloaded files, but
> > the torrent would then have to download all the current updates again.
> >
> > - The user may need to make available several GB of storage to hold all
> > the updates even though he might never install some of these RPMs on his
> > system.
> >
> > - It would probably make sense to create separate torrents for the
> > normal and debug RPMs. I guess there should be 2 torrents per
> > ARCH/Release pair, plus maybe a SRPM torrent.
> >
> > - Some users will be unable to use the torrent since they are behind
> > corporate firewalls which block it, so it isn't a replacement for yum.
> >
> > - The new "LAN peer mode" in Azureus may enable clients to exchange
> > pieces on a local network at high speeds minimising the need to download
> > from the Internet. This would be a useful addition to the current yum
> > behaviour.
> >
> > - Yum may need to be a little smarter about making certain that RPMs
> > have the right checksum before using them. I know RPM does verify
> > checksums, but I don't think yum does right now. The torrent will
> > typically create all the new RPMs with 0 length and then use sparse
> > writes to reconstitute each file one piece at a time whenever it
> > receives some data. If the torrent download is interrupted then many
> > files will not contain the complete data (even though the length may be
> > correct). Yum might want to delete such a file and re-download it.
> >
> > - There might be scope for a specialised torrent client to automate some
> > of the behaviour above, e.g. pro-actively downloading new torrents,
> > perhaps only downloading "pieces" which contain data relevant to
> > updates of RPMs currently installed (a client doesn't have to download
> > and store the complete torrent). It could also delete files which are
> > no longer present in the latest torrent.
> >
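Two of the points in the list above (checking a cached RPM before trusting it, and cleaning out files that have dropped out of the torrent) are easy to sketch in Python. This is only an illustration: file_matches and stale_files are hypothetical helpers, not part of yum, and the sketch assumes an expected digest for each package is available from somewhere such as the repository metadata, with SHA-1 as an assumed digest type.

```python
import hashlib
import os


def file_matches(path, expected_hex, algorithm="sha1", chunk_size=1 << 20):
    """Return True if the file's digest equals expected_hex.

    A file that was preallocated to the right length but only partly
    written will have the right size and still fail this check.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex


def stale_files(download_dir, torrent_names):
    """List .rpm files in download_dir that the latest torrent no longer names."""
    wanted = set(torrent_names)
    return sorted(
        name
        for name in os.listdir(download_dir)
        if name.endswith(".rpm") and name not in wanted
    )
```

A caller would re-download any file for which file_matches returns False, and could remove whatever stale_files reports once the new torrent has finished checking.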
> >         Jon
> >
> >
> > --
> > fedora-test-list mailing list
> > fedora-test-list at redhat.com
> > To unsubscribe:
> > https://www.redhat.com/mailman/listinfo/fedora-test-list
> >
>
> a few thoughts on how one could seriously implement this:
>
> a small commandline utility (python?) that fetches the latest torrent
> from a location and checks whether it's signed with the correct fedora
> key, and after that looks whether an update seed service is running. if
> it's running, restart it. if it's dead, start it if configured so.
>
> it (the commandline util to fetch the torrent) is invoked by a
> crontabbed task (implementation similar to nightly yum update)
>
> after that it would start downloading the torrent and seed it.

additionally... on the serverside the torrents could be created at
every repo generation time. (to be discussed)
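
The decision the cron-driven utility has to make after fetching a torrent could look something like this in Python. It is only a sketch of the control flow: plan_seed_action and its return values are made-up names, and the actual fetching, signature check and service handling are left out because they depend on tooling that hasn't been decided.

```python
def plan_seed_action(signature_ok, service_running, start_if_dead):
    """Decide what to do with a freshly fetched torrent.

    signature_ok:    the torrent verified against the expected key
    service_running: the local update seed service is currently up
    start_if_dead:   configuration switch, start the service if it is down
    """
    if not signature_ok:
        # Never hand an unverified torrent to the seed service.
        return "reject"
    if service_running:
        # Restart so the service picks up the new torrent.
        return "restart"
    return "start" if start_if_dead else "skip"


print(plan_seed_action(signature_ok=True, service_running=False, start_if_dead=True))
```

Keeping the decision separate from the side effects would make the utility easy to test without a tracker or a running service.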

>
> regards,
> rudolf kastl
>



