Yum, Proxy Cache Safety, Storage Backend

Les Mikesell lesmikesell at gmail.com
Thu Jan 24 16:50:48 UTC 2008


James Antill wrote:

>> I think you are missing my point, which is that it would be a huge win 
>> if yum automatically used typical existing caching proxies with no extra 
>> setup on anyone's part, so that any number of people behind them would 
>> get the cached packages without knowing about each other or that they 
>> need to do something special to defeat the random URLs.
> 
>  HTTP doesn't define a way to do this, much like the Pragma header
> suggestion is a pretty bad abuse of HTTP ...

It's worked for years in browsers: if you have a stale copy in an 
intermediate cache, a Ctrl-refresh pulls a fresh one.
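
For what it's worth, that forced-refresh behavior is just a pair of 
request headers, and a downloader could send the same thing when it 
actually needs to bypass a stale copy. A rough Python sketch of the idea 
(the URL is a made-up placeholder, and this is not how yum itself is 
written):

import urllib.request

# Roughly what a browser's Ctrl-refresh sends: ask intermediate caches to
# revalidate with the origin instead of serving a possibly stale copy.
# The URL below is a placeholder, not a real mirror path.
url = "http://example.org/fedora/updates/foo-1.0-1.noarch.rpm"
req = urllib.request.Request(url, headers={
    "Cache-Control": "no-cache",   # HTTP/1.1 caches
    "Pragma": "no-cache",          # older HTTP/1.0 proxies
})
with urllib.request.urlopen(req, timeout=30) as resp:
    data = resp.read()
print("fetched %d bytes" % len(data))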

> suddenly the origin server
> going down means you can't get the data from the proxy.

Only if the DNS for the origin server points to a single dead IP or the 
client is too dumb to retry the alternates.  Even IE can handle this, so 
it can't be all that difficult...  And yum could have some other retry 
strategy too.
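
Retrying the alternates really is about this simple. A rough sketch in 
Python (host and path are placeholders, and yum's downloader is not 
necessarily written this way):

import socket
import http.client

# Resolve every A record for the origin and try each address until one
# answers: the "retry the alternates" behavior described above.
# Host and path are placeholders.
host, path = "download.example.org", "/pub/fedora/repodata/repomd.xml"

addrs = [info[4][0] for info in
         socket.getaddrinfo(host, 80, family=socket.AF_INET,
                            proto=socket.IPPROTO_TCP)]
body = None
for addr in addrs:
    try:
        conn = http.client.HTTPConnection(addr, 80, timeout=10)
        # Keep the original Host header so name-based virtual hosts still work.
        conn.request("GET", path, headers={"Host": host})
        resp = conn.getresponse()
        if resp.status == 200:
            body = resp.read()
            break
    except OSError:
        continue     # dead address, try the next one
if body is None:
    raise RuntimeError("all addresses for %s failed" % host)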

>  Please don't assume half of a solution; that never works well. What you
> _actually want_ is:
> 
>  "On my group my machines X, try not to download data more than once."

Add in "by default, with no prearrangement other than configuring a 
proxy cache to handle large files" and you are close.  The other piece 
is that the same solution should work for all duplicate http downloads.
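
For a concrete idea of what "configuring a proxy cache to handle large 
files" amounts to, here is roughly what it could look like in squid.conf 
(Squid is just one choice, and the sizes and paths are only examples):

# Raise the per-object ceiling so RPM-sized files are cacheable at all,
# give the cache enough disk, and let .rpm objects stay fresh for a long
# time, since the payload behind a given package filename never changes.
maximum_object_size 512 MB
cache_dir ufs /var/spool/squid 20000 16 256
refresh_pattern -i \.rpm$ 10080 90% 43200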

>  ...at the ISP level this is solved by getting your mirror into
> MirrorManager directly.

Per-distro, per-location setup just isn't going to happen.

>  At the personal level this is likely better solved by having something
> like "Have a zero-conf-like service to share packages across the local
> network". There has even been some work to do this.

What does 'local' mean in this context?  Many of the machines sharing 
local proxies would not be in broadcast or multicast range of each other.

>  In neither case is "work around HTTP's design in yum" a good solution,
> IMNSHO.

I'd rather call it "using existing infrastructure and protocols 
intelligently" instead of cluttering everyone's caches with randomized 
URLs that fetch duplicate files.
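
The caching point is easy to demonstrate: a URL-keyed proxy (Squid, for 
one, stores objects under a hash derived from the request URL) sees the 
same package coming from two mirrors as two unrelated objects. A toy 
Python illustration, with made-up mirror URLs and an approximated key 
derivation:

import hashlib

# Two mirrors serving the identical RPM produce two different cache keys,
# so the second client behind the proxy re-downloads the whole payload.
mirror_a = "http://mirror-a.example.net/fedora/8/updates/foo-1.0-1.noarch.rpm"
mirror_b = "http://mirror-b.example.org/pub/fedora/8/updates/foo-1.0-1.noarch.rpm"

key_a = hashlib.md5(("GET " + mirror_a).encode()).hexdigest()
key_b = hashlib.md5(("GET " + mirror_b).encode()).hexdigest()
print(key_a == key_b)   # False: one payload, two cache entries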

-- 
   Les Mikesell
    lesmikesell at gmail.com



