[Spacewalk-list] Spacewalk not delivering comps.xml during kickstart

Michael Guidero mg at sococo.com
Tue Oct 1 16:00:16 UTC 2013


Hi,

We have recently experienced a problem with our Spacewalk 1.9 installation.  We maintain a series of channels, Dev, Staging, and Production, with Dev being either synced from public repositories (Scientific Linux, EPEL, etc.) or populated with rhnpush.  We follow the pattern of scheduling patches by cloning the Dev channel into staging, then staging into production.

Following this pattern, for kickstarts we have dev, staging, and production kickstarts for each type of setup that use the matching channels.  This has worked perfectly until recently.

Recently we started deploying a new server type, and last week it was time to start deploying the production systems.  We cloned our staging kickstart, created a new activation key, and assigned the appropriate channels (all of which had already existed for a long time).  When we went to kickstart the system, we began seeing "retrying download" for comps.xml (and at times for repomd.xml).

In the log on vt3 of the installing system, we saw "WARNING : Try 1/10 for http://<sw-proxy-ip>/ks/dist/child/prod-clone-sl-6.2-x86_64-epel/SL-62-x86_64-2012-02-06/repodata/comps.xml failed: [Errno -1] Metadata file does not match checksum" and then progressing through try 10/10.  
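One way to reproduce the checksum comparison outside of anaconda is to read the expected digests from repomd.xml and hash the downloaded comps.xml by hand.  The following is only a minimal sketch: it assumes the standard repomd.xml layout and namespace, the function names are hypothetical, and it is not how anaconda or Spacewalk implement the check.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace used by standard yum repomd.xml files.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def expected_checksums(repomd_xml):
    """Map each metadata file's href to (algorithm, hex digest) as listed in repomd.xml.

    repomd_xml is the raw bytes of repomd.xml.
    """
    root = ET.fromstring(repomd_xml)
    result = {}
    for data in root.findall("repo:data", REPO_NS):
        location = data.find("repo:location", REPO_NS)
        checksum = data.find("repo:checksum", REPO_NS)
        if location is None or checksum is None:
            continue
        result[location.get("href")] = (checksum.get("type"), checksum.text.strip())
    return result


def matches(body, algorithm, digest):
    """Return True if the downloaded bytes hash to the digest from repomd.xml."""
    # repomd.xml commonly labels SHA-1 as "sha"; hashlib expects "sha1".
    name = "sha1" if algorithm == "sha" else algorithm
    return hashlib.new(name, body).hexdigest() == digest.lower()
```

Feeding this the repomd.xml and comps.xml bodies as actually served for the failing URL would show whether the mismatch comes from stale metadata or from a different document (for example an error page) being returned in place of comps.xml.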

On the proxy we saw in squid access.log "1380214098.913    166 127.0.0.1 TCP_MISS/200 4126 GET http://<sw-proxy-hostname>/ks/dist/child/prod-clone-sl-6.2-x86_64-epel/SL-62-x86_64-2012-02-06/repodata/comps.xml - DIRECT/<sw-proxy-ip> text/html"

On the main SW server we saw messages such as "<sw-proxy-ip> - - [26/Sep/2013:11:24:32 -0700] "GET /ks/dist/child/prod-clone-sl-6.2-x86_64-epel/SL-62-x86_64-2012-02-06/repodata/comps.xml HTTP/1.1" 200 3724 "-" "Scientific Linux (anaconda)/6.2""

We were not able to reproduce this with the kickstart we cloned from.  We verified that we see the same results when kickstarting through the proxy and directly through the main Spacewalk server.  We also tried creating a new kickstart clone and see the same thing.

When I try to load the URL in a browser through the proxy, I get "file download failed."  If I try in a browser against the main SW server, I get a pop-up that says "A server error has occurred."  However, since the resulting log messages differ from those we see when the request comes from a kickstart, I am not convinced that this is a proper way to check.  Is it?
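A check that is closer to what the installer does might be to request the file from a script that sends the same User-Agent that appears in the access log.  This is a sketch only: whether Spacewalk treats requests differently based on the User-Agent is an assumption that has not been verified, the helper names are hypothetical, and the "starts with <?xml" test is just a rough heuristic for telling real metadata from an HTML error page.

```python
import urllib.error
import urllib.request

# User-Agent string copied from the Spacewalk access log entry for the kickstart request.
ANACONDA_UA = "Scientific Linux (anaconda)/6.2"


def build_request(url, user_agent=ANACONDA_UA):
    """Build a GET request that identifies itself the way the installer did."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def looks_like_xml(body):
    """Heuristic: real repository metadata normally begins with an XML declaration."""
    return body.lstrip().startswith(b"<?xml")


def probe(url):
    """Fetch url and return (status, content type, body length, looks_like_xml)."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=30) as resp:
            body = resp.read()
            return resp.status, resp.headers.get("Content-Type"), len(body), looks_like_xml(body)
    except urllib.error.HTTPError as err:
        # Error responses still carry a body and headers worth inspecting.
        body = err.read()
        return err.code, err.headers.get("Content-Type"), len(body), looks_like_xml(body)
```

Calling probe() with the comps.xml URL, once through the proxy and once against the main server, would give the status, content type, and size to compare with the squid and Apache log entries.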

We can eliminate the problem by changing the new kickstart to use our dev channels.  If we use the production channels, or the staging channels (which apparently still work with the staging kickstart that we cloned from), then the problem reappears.

I am unsure whether I've run into a bug or there has been some sort of corruption somewhere.  We haven't had any hardware events of note, and the PostgreSQL database doesn't appear to have any bad blocks or other issues.  One possible additional symptom is that we had Spacewalk monitoring stop working before we started working on the new kickstarts, and restarting all Spacewalk services did not clear the issue.  We ended up rebooting the Spacewalk server and that did clear up the monitoring issue.

I've already tried adding and removing packages from our channels that are populated with rhnpush.  For the others, I have tried resyncing our dev channels and then deleting and re-adding packages to the clone channels, but for some of them I can't fully do this because we are running into https://bugzilla.redhat.com/show_bug.cgi?id=970315.  I had planned an upgrade to SW 2.0 to address that bug once we had completed the production release on which we were working.  However, I am loath to upgrade to 2.0 with this issue unresolved, unless it's a bug in 1.9 that 2.0 fixes, of course.  Searching this list and the larger internet hasn't indicated to me that others have experienced this.

Everything else in Spacewalk appears to be working fine.  Even the channels that fail during kickstart can be assigned to systems, and after a "yum clean all" we can "yum install" a package from them with no complaints from yum.

Does anyone have any ideas about what might be going on here?  Other ideas for troubleshooting?  And of course, any ideas on how to fix it are most welcome.

Thanks,  
Michael Guidero 
Sococo IT 



