[Pulp-list] Software Content Management (Introducing Pulp)

Chris Murphy chris at castlebranch.com
Wed Mar 19 18:56:45 UTC 2008


Hey all,

This is Chris Murphy from Castle Branch.  Glad to see some traffic on 
this after FUDCon.  As I said there, I have very little experience with 
the Python side of things, but I do manage a couple of sets of repos for 
our different distributions.  I thought I would fill this out as a way 
of getting back into the project, and I look forward to seeing it progress.

Just a quick overview of the repo management we currently have in 
place.  This example is for our desktops, which run Fedora and can use 
all public mirrors.  When I started here they were using yam from Dag's 
repos, which I moved to a newer server and upgraded to its successor, 
mrepo.  The biggest problem I had with it was the whole symlinking 
setup.  I didn't see the need for soft links from /var/mrepo to 
/var/www/html/mrepo, so I had our intern strip out all of the symlink 
code so that it would just sync directly to a directory defined in the 
configuration file.  We then added a few mirroring defaults, such as 
bandwidth limits and exclude dirs for rsync and lftp, and set up a few 
extra options:

    def version(self):
        print MYNAME+' '+MYVERSION
        print 'Written by Dag Wieers <dag at wieers.com>'
        print 'Rewritten by Tyler Gates <tjgates at castlebranch.com>'
        print
        print 'platform '+os.name+'/'+sys.platform
        print 'python '+sys.version
        print

    def usage(self):
        print 'usage: '+MYNAME+' [options] [--repo=dist1,[dist2-arch ..]]'

    def help(self):
        print '''Set up a distribution server

options:
  -h, --help                      show help message and exit
  -c, --config=file               specify alternative configfile
  -s, --sync                      sync against mirrors
  -g, --generate                  generate metadata from synced mirrors
  -e, --regenerate                regenerate metadata from synced mirrors
  -u, --update                    update metadata from synced mirrors
  -r, --repo=repo1,repo2          target repos
  -i, --include=mirror1,mirror2   include only mirror(s)
  -e, --exclude=mirror1,mirror2   exclude mirror(s)
  -d, --dry                       do a dry run
  -v, --version                   show version and exit
'''
The reason was so that we could specify the createrepo options more 
easily using just one flag, i.e.:
-g ->    default_generate_metadatacmd    = 'createrepo -d -v -p REPO_DIR'
-e ->    default_update_metadatacmd      = 'createrepo -d -v --update -p REPO_DIR'
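
A minimal sketch of how such a default could be turned into an actual 
command, assuming REPO_DIR is simply a placeholder for the repository 
path.  The names DEFAULT_COMMANDS and build_metadata_command are made up 
for illustration and are not taken from mrepo or from our rewrite; 
--update is createrepo's own spelling of the flag.

```python
import shlex

# Command templates modelled on the defaults quoted above.
# REPO_DIR is a placeholder that gets replaced with the real path.
DEFAULT_COMMANDS = {
    "generate": "createrepo -d -v -p REPO_DIR",
    "update": "createrepo -d -v --update -p REPO_DIR",
}


def build_metadata_command(action, repo_dir):
    """Return the createrepo argument list for an action and a repo path."""
    template = DEFAULT_COMMANDS[action]
    # Split first, then substitute, so a path with spaces stays one argument.
    return [repo_dir if arg == "REPO_DIR" else arg
            for arg in shlex.split(template)]


if __name__ == "__main__":
    print(build_metadata_command("update", "/repodata/fedora/7/i386/unstable"))
```

The resulting list can be handed to subprocess.run() without going 
through a shell.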

I think mrepo had this originally, but the createrepo options in 
/etc/mrepo.conf were getting overwritten somewhere, because I couldn't 
get it to create useful metadata after Fedora 4 (we jumped from FC4 to 
F7, F8...).  Lastly, we added a place in the configuration file (taken 
from the mrepo.conf) for the path to the comps file (e.g. 
/svn/workstations/fedora/7/i386/comps/comps-f7-desktops.chris.02-25.xml) 
that we want to associate with that repo; it gets appended to the 
createrepo call:

[fedora/7/i386/unstable]
comps_file = /svn/workstations/fedora/7/i386/comps/comps-f7-desktops.chris.02-25.xml
everything = rsync://mirror.anl.gov/fedora/linux/releases/7/Fedora/i386/os/Fedora/
updates = rsync://fedora.mirror.iweb.ca/fedora/updates/7/i386/
livna = rsync://livna.cat.pdx.edu/rpm.livna.org-fedora/7/i386/
custom = file:///repodata/fedora/7/i386/stable/custom/
custom-testing = file:///repodata/fedora/7/i386/stable/custom-testing/
remi = http://iut-info.ens.univ-reims.fr/remirpms/fc7.i386
dries = ftp://ftp.pbone.net/mirror/dries.studentenweb.org/apt/fedora/fc7/i386/RPMS.dries/
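
For anyone who wants to play with this layout, here is a rough sketch of 
reading such a section and appending the comps file to the createrepo 
call.  It assumes the file is plain INI with each value on one line, and 
it uses Python's standard configparser rather than whatever mrepo does 
internally.  load_repo and createrepo_args are hypothetical helper names; 
-g is createrepo's --groupfile option.

```python
import configparser

# Shortened copy of the section above (values on a single line).
SAMPLE = """
[fedora/7/i386/unstable]
comps_file = /svn/workstations/fedora/7/i386/comps/comps-f7-desktops.chris.02-25.xml
updates = rsync://fedora.mirror.iweb.ca/fedora/updates/7/i386/
custom = file:///repodata/fedora/7/i386/stable/custom/
"""


def load_repo(text, section):
    """Split one repo section into its comps file and its mirror URLs."""
    # Only '=' is a delimiter, so the ':' inside URLs is left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    options = dict(parser[section])
    comps_file = options.pop("comps_file", None)
    return comps_file, options


def createrepo_args(repo_dir, comps_file=None):
    """Build a createrepo argument list, adding the group file if set."""
    args = ["createrepo", "-d", "-v", "-p"]
    if comps_file:
        args += ["-g", comps_file]  # -g / --groupfile attaches comps data
    return args + [repo_dir]


if __name__ == "__main__":
    comps, mirrors = load_repo(SAMPLE, "fedora/7/i386/unstable")
    print(sorted(mirrors))
    print(createrepo_args("/repodata/fedora/7/i386/unstable", comps))
```

Everything in the section other than comps_file is treated as a mirror 
name and URL pair, which matches how the section above is laid out.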

So that's a basic outline of our very KISS style setup.

Here are the answers to the questions:
> Hi folks,
>
> As some of you who have participated in them already know, over the 
> past couple of years or so Red Hat has been conducting some studies on 
> how folks manage their systems using the Red Hat Network and Satellite 
> products. We've learned a lot about the processes many of you have 
> established for managing your systems and the strengths and weaknesses 
> of the RHN products in supporting those processes. In addition to 
> this, it is also clear that the free and open source management tools 
> available for Fedora, RHEL, and CentOS (as well as other *nixes) don't 
> sufficiently cover some of the areas of need that the current 
> Satellite product addresses.
>
> Over the past few weeks some Red Hat folks and Fedora community 
> members have been working on a free and open source project that will 
> not only attempt to fill one of the gaps in free & open source systems 
> management tools, but also to take some of the things we've learned 
> from talking with Satellite and RHN customers and improve upon how we 
> could address one area of systems management. This project is called 
> 'Pulp', and its scope is centered around the management of software 
> content. From the Pulp Fedora project website [1]:
>
> "Pulp is an application for managing the software installed on your 
> systems. Suppose you want to control what machines on your network get 
> what software updates, to establish testing/stage repositories, to 
> mirror 3rd party content, to create your own repositories, or to add 
> new content to existing repositories. Pulp will provide an easy web, 
> web-services, and command line interface for managing all of this."
>
> REPOSITORY CREATION AND MIRRORING MANAGEMENT
>
> To start, we were thinking that Pulp could be a way of improving upon 
> the custom channel management capabilities of Satellite, using yum 
> repositories instead of RHN channels. Last month Michael DeHaan hosted 
> a discussion introducing Pulp at FUDCon in Raleigh [2]. What we have 
> taken away from the participants of that session is that they would 
> like to see less emphasis on system <=> content mapping, and would 
> like tools that focus on mirroring contents from many different 
> 'upstream' sources and organizing them neatly in one place. It could 
> be kind of like mirror manager, except instead of managing the 
> mirroring of a particular set of content across many sites, it would 
> manage the mirroring of MANY different sets of content at ONE 
> particular site. On top of this it could greatly simplify the creation 
> and management of yum repositories from this mirrored content as well 
> as from other local content sources. (This today has some annoying 
> manual process involved.)
>
> CONTENT INVENTORY, ACCESS CONTROL, AND DELIVERY
>
> We have also thought about Pulp as a way of managing which content 
> gets to which systems and maintaining an inventory of which content is 
> on which systems. For example, maybe using Pulp to get a list of which 
> systems are allowed to connect to which repositories, and maybe on a 
> more granular level, using Pulp to store black or whitelists of 
> packages that the system is allowed to access. Or maybe using it to 
> create a system whereby using some logical/policy statements you can 
> create virtual yum repositories that compose content from many sources 
> in a particular way and then control access to those.
>
> The group at FUDCon seemed to care less about content access control 
> and delivery, seeming to prefer letting their configuration management 
> systems (eg cfengine) handle content access and delivery to systems 
> and having Pulp stop at providing yum repos for these configuration 
> management tools to access. I do think, from talking with several 
> different types of Satellite and RHN users, that some folks may still 
> be interested in content access control, but at this point it seems 
> that repository creation and mirroring management is one area that 
> both groups of people would find great value in.
>
> DISCUSSION
>
> Many of the folks subscribed to these lists are seasoned Linux system 
> engineers, system administrators, and/or release engineers for 
> software content, so we would love to hear some of your thoughts on 
> what problem areas you'd like to see addressed by free and open 
> source management tools like Pulp. If you have any thoughts on the 
> following topics or others that are related but maybe not mentioned 
> here, please let's discuss them here and see if we figure out the best 
> way to make Pulp useful for you!:
>
> - Do you host internal mirrors of external content? What kind of 
> content? How many mirrors? Do you have mirrors available for multiple 
> geographic locations within your organization?
>
Yes, all synced from public mirrors, and the local mirrors are set up 
for two separate offices.
> - How many different 'upstream' sources of content need to be made 
> available for systems at your organization? Hardware drivers from 
> hardware vendors? Operating systems from OS vendors or from FOSS 
> repos? Non-FOSS proprietary applications from application vendors? 
> In-house application/software development teams?
>
N/A
> - How often do you pull down content ('sync' maybe could be a term) 
> from these different upstream content sources?
>
Monthly, into the unstable yum repos.  Once unstable is ready to be used, 
it's hard linked to stable.  I forgot one part in the intro: mrepo had a 
tendency to randomly delete content not on the source during rsyncs, 
which would cause havoc with the hard-linked rpms.  (Just for 
clarification, and so that I don't cause any static: I didn't spend a 
lot of time looking at the mrepo code, so the "problems" I've cited are 
most likely RTFM errors on my part.)
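
The promotion step could look roughly like the sketch below.  This is 
not our actual script; promote is a hypothetical helper, and it assumes 
both trees live on the same filesystem, since hard links cannot cross 
filesystems.  It only covers the linking, not the metadata regeneration.

```python
import os


def promote(unstable_dir, stable_dir):
    """Hard-link every .rpm under unstable_dir into stable_dir.

    Returns the relative paths of the files that were newly linked.
    """
    linked = []
    for root, _dirs, files in os.walk(unstable_dir):
        rel_root = os.path.relpath(root, unstable_dir)
        target_root = os.path.normpath(os.path.join(stable_dir, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in sorted(files):
            if not name.endswith(".rpm"):
                continue  # skip repodata and anything else that is not a package
            target = os.path.join(target_root, name)
            if os.path.exists(target):
                continue  # already promoted on an earlier run
            os.link(os.path.join(root, name), target)
            linked.append(os.path.normpath(os.path.join(rel_root, name)))
    return linked
```

Running it a second time links nothing new, so it is safe to call after 
every sync.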
> - How do you organize all of the software content that is delivered to 
> your systems right now? What are the strengths you've found to your 
> approach today? What are the weaknesses you'd like to address?
>
Mostly through comps.xml, kickstart, and yum.conf.
> - How much customization/general 'mucking' do you do with the content 
> you pull down from various sources? Are you more interested in simply 
> making all the content available or do you have requirements for 
> modifying/customizing it as well?
>
Almost none at all.  If we do, it goes into the "custom" repo, and we 
maintain it and pull down updates manually.
> - If you do customize the content, to what extent do you need to do 
> this? Branding? Localization? Etc.?
>
Mostly these are packages we create with cpan2rpm, plus a few purchased RPMs.
> - How strict are your policies for which systems have access to which 
> kind of content? Is access completely open, is access constrained by 
> which system owners have purchased licenses/entitlements to which 
> content? Is access constrained by security concerns? Is access 
> constrained by stability concerns (e.g., production systems must never 
> be able to have development level content deployed to them?)
None
>
> - What kind of requirements do you have for producing data about which 
> systems had which content installed when, if any?
This is a definite area of weakness.  I'm using OCS Inventory right now 
because it's the easiest and it can also be used to update the few 
Windows boxes we have.  It's not perfect, but since we have the comps 
groups clearly defined and OCS has good filtering for deployment, it's 
as easy as selecting, say, SalesDept and deploying the command 
yum -y groupupdate <group>.  That is why I wanted to be able to easily 
toggle the --update flag to regenerate groups and add packages often, 
which is especially critical in the development repos.  OCS generates a 
graph of which hosts have completed the updates, which is consistently 
around 98-99% of them; the rest I do by hand.
>
> - How many different environments do you manage content for? Do you 
> manage content for development / qa / production environments?
>
All three
> - How do you prefer to deploy content to systems? Do you prefer to 
> have a software management tool to do that or do you prefer to tie 
> this into a configuration management tool?
>
Tied in with answer above.
> - At what level of granularity do you perform software-management 
> related tasks on your systems? For example, do you find yourself most 
> often:
>   - automatically selecting and deploying content to many systems at 
> once in a uniform fashion
>   - automatically selecting and deploying content to smaller groupings 
> of systems with carefully defined templates
>   - manually selecting and deploying content to many systems at once
Yes, exclusively
>
>   - manually selecting and deploying content to individual systems 
> one-by-one
>   What level of importance does each of these abilities have to you?
>
> SHAMELESS PLUG
>
> Pulp is an open project, stop by the mailing list (cc'ed :) ) to say 
> hi! Feedback, bug reports, ideas, and patches are always welcome. :)
>
> Thanks,
> ~m and the Pulp Team :)
>
> [1] https://fedorahosted.org/pulp
>
> [2] Notes available here: 
> https://fedorahosted.org/pulp/wiki/FudConOhEightNotes
>
> _______________________________________________
> Pulp-list mailing list
> Pulp-list at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-list
>
>



