
[Pulp-list] grinder yum filters



Hi list, I just discovered pulp the other day while looking for a more convenient way to sync yum repos than my crazy homebrew bash scripts with 'plugins' and per-repo config files.

Mainly, rsyncing whole repos is fine for the small sites I manage. However, a number of custom RPMs depend on just a few packages from 3rd-party repos. Pulp initially looked right for syncing partial repos with its 'whitelist' feature, but it didn't take long to trace the lack of this feature for remote repos down to its absence in the grinder package.

For the small shop use case, grinder is fine, so I added a quick filtering function to it. The filters have their own class similar to pulp's, so maybe it'll be easy for someone to plumb it into pulp down the road. However, the attached patch can probably be called hackish, so maybe by the time pulp has filtering for remote repos, it'll be a complete rewrite. I don't feel too embarrassed about the hackishness, though, since grinder appears to be a bit of a patchwork itself; I'm guessing that the RHN and yum parts were two separate scripts that were quickly thrown together into a single library?
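
For readers who want the gist without reading the patch, here is a minimal standalone sketch of the whitelist/blacklist semantics. It is not the patch's Filter class: the function name apply_filter and the package strings are made up for illustration, and it works on plain strings rather than yum package objects. Like the patch, it uses re.match(), which only matches at the start of the string.

```python
import re


def apply_filter(names, filter_type, regex_list):
    """Return the names that pass a whitelist or blacklist of regexes.

    Hypothetical helper for illustration; the patch itself wraps this
    logic in a Filter class and tests str(pkg) of yum package objects.
    """
    if filter_type not in ("whitelist", "blacklist"):
        raise ValueError("filter_type must be 'whitelist' or 'blacklist'")
    compiled = [re.compile(regex) for regex in regex_list]
    kept = []
    for name in names:
        # re.match() anchors at the beginning of the string
        matched = any(rx.match(name) for rx in compiled)
        # a whitelist keeps matches; a blacklist keeps non-matches
        if matched == (filter_type == "whitelist"):
            kept.append(name)
    return kept


# Made-up package strings, for illustration only
names = [
    "cool-package-1.0-1.noarch",
    "cool-package-devel-1.0-1.noarch",
    "other-package-2.3-4.x86_64",
]
whitelisted = apply_filter(names, "whitelist", ["cool-package.*"])
blacklisted = apply_filter(names, "blacklist", ["cool-package.*"])
```

One consequence of the anchoring: a regex such as 'package' matches none of the names above, so a pattern meant to match mid-string needs a leading '.*'.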

The patch is pretty well self-documenting. Testing was done with something like the following command line:

PYTHONPATH=src python bin/grinder yum --label somerepo \
	-b /tmp -U 'http://www.somesite.com/somerepo' \
	--filter=whitelist --filter_regex='cool-package.*' --debug
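
The two options can be given like this because --filter_regex is declared with optparse's action="append", so repeating it builds up a list. A small self-contained sketch of that parsing and validation follows; the function name parse_filter_options is hypothetical, and it reports errors through parser.error() rather than the print/sys.exit pair used in the patch.

```python
from optparse import OptionParser


def parse_filter_options(argv):
    """Parse --filter / --filter_regex from a list of arguments.

    Hypothetical standalone helper; in the patch these options are
    added to grinder's existing RepoDriver parser instead.
    """
    parser = OptionParser()
    parser.add_option("--filter", action="store",
                      help="add a filter, either whitelist or blacklist")
    # action="append" collects every occurrence into a list
    parser.add_option("--filter_regex", action="append",
                      help="add a filter regex; may be used multiple times")
    options, _args = parser.parse_args(argv)
    if options.filter:
        if options.filter not in ("whitelist", "blacklist"):
            parser.error("--filter should be 'whitelist' or 'blacklist'")
        if not options.filter_regex:
            parser.error("please provide a --filter_regex with --filter")
    return options.filter, options.filter_regex or []


ftype, regexes = parse_filter_options([
    "--filter=whitelist",
    "--filter_regex=cool-package.*",
    "--filter_regex=other-lib.*",   # made-up second pattern
])
```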

BTW, 'run-tests.py' will go through its routine once the line 'sys.path.append("itests/")' is added, but there were lots of failures, so I didn't bother with unit tests.

This patch doesn't deal with filtering __getDRPMs. It would be easy to add, but my sites don't need it just yet, and this submission is partly just to test the warmth of the waters.

This patch doesn't deal with running createrepo. Pulp has its own utilities to take care of that. With this patch, you'll end up with the original repodata but of course missing any packages excluded by the filters, so repodata must be regenerated some other way.
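
One way to regenerate the repodata by hand is to run createrepo on the synced directory afterwards. A rough sketch, assuming createrepo is installed and on the PATH; the helper names are hypothetical, and the directory /tmp/somerepo is only a guess at where the sync from the example command ends up.

```python
import subprocess


def createrepo_command(repo_dir):
    """Build the argument list for a plain 'createrepo <dir>' run."""
    return ["createrepo", repo_dir]


def regenerate_repodata(repo_dir):
    """Run createrepo on repo_dir; raises CalledProcessError on failure."""
    subprocess.check_call(createrepo_command(repo_dir))


# Assumed location of the filtered repo; adjust to the real sync directory.
cmd = createrepo_command("/tmp/somerepo")
# regenerate_repodata("/tmp/somerepo")  # uncomment to actually run createrepo
```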

More disclaimers: I'm not a programmer. I'm not a python programmer. It's the first time I've touched python in three years (last shop ran Perl, whew it's nice to be doing python again!). I'm too lazy to read and understand the whole grinder code, so this patch probably makes no sense in its greater design. I ramble too much.

Future projects: maybe add /etc/grinder/yum.yml config (already have a good start), and maybe add createrepo functionality.

	John
diff --git a/src/grinder/Filter.py b/src/grinder/Filter.py
new file mode 100644
index 0000000..79d570f
--- /dev/null
+++ b/src/grinder/Filter.py
@@ -0,0 +1,104 @@
+# This software is licensed to you under the GNU General Public
+# License as published by the Free Software Foundation; either version
+# 2 of the License (GPLv2) or (at your option) any later version.
+# There is NO WARRANTY for this software, express or implied,
+# including the implied warranties of MERCHANTABILITY,
+# NON-INFRINGEMENT, or FITNESS FOR A PARTICULAR PURPOSE. You should
+# have received a copy of GPLv2 along with this software; if not, see
+# http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt.
+
+# Filters for syncing remote package repos
+# Grabbed from Pulp by John Morris <john zultron com>
+#
+# This is the main feature I wanted Pulp for; my use case is downloading
+# a limited set of packages (whitelist) from a repo without having
+# to sync the whole thing, but still retain yum's smarts for grabbing new
+# versions and removing old ones
+
+
+import re
+import logging
+
+LOG = logging.getLogger("grinder.Filter")
+
+class Filter(object):
+    """
+    Class represents a 'blacklist' or 'whitelist' filter type that can be
+    applied when syncing a local repository
+
+    regex_list is a list of regex strings to be applied to package 'filename'
+    (see below); if any regex matches, the Filter.test() will be true for
+    whitelists or false for blacklists
+
+    use the set_regex_list method to change the regex list after object
+    creation; this ensures that the regexes are compiled
+
+    (actually, using 'filename' seems hackish, but it's easy to do from
+    the command line and with simple regexes)
+    (more hackish still: the closest thing a yum.packages.PackageObject
+    appears to have to a 'filename' is its '__str__()', which is used
+    instead of an actual RPM filename)
+    """
+    def __init__(self, type, description=None, regex_list=None):
+        self.description = description
+        self.type = type
+        self.set_regex_list(regex_list)
+
+    def set_regex_list(self,regex_list):
+        """
+        Set the list of regexes & list of compiled regexes
+        """
+        self.regex_list = []
+        self.regex_obj_list = []
+        if not regex_list:
+            return
+        for regex in regex_list:
+            self.regex_list.append(regex)
+            self.regex_obj_list.append(re.compile(regex))
+
+    def iswhitelist(self):
+        """
+        return true if self is a whitelist
+        """
+        return self.type == "whitelist"
+
+    def isblacklist(self):
+        """
+        return true if self is a blacklist
+        """
+        return self.type == "blacklist"
+
+    def test(self, pkg):
+        """
+        return pkg if pkg passes through the filter, else None
+
+        pkg is a yum package object
+        """
+
+        # the string we match against
+        pkg_filename = str(pkg)
+
+        # compare pkg to each regex & break if there's a match
+        match_result = None
+        for regex_obj in self.regex_obj_list:
+            if regex_obj.match(pkg_filename):
+                match_result = regex_obj.pattern
+                break
+
+        # return result based on match and filter type
+        if self.iswhitelist():
+            if match_result:
+                LOG.debug ("package %s:  passed whitelist, matched %s" %
+                           (pkg_filename, match_result))
+                return pkg
+            else:
+                LOG.debug ("package %s:  blocked by whitelist" % pkg_filename)
+                return None
+        else:
+            if match_result:
+                LOG.debug ("package %s:  blocked by blacklist, matched %s" %
+                           (pkg_filename, match_result))
+                return None
+            else:
+                LOG.debug ("package %s:  passed blacklist" % pkg_filename)
+                return pkg
diff --git a/src/grinder/GrinderCLI.py b/src/grinder/GrinderCLI.py
index 5c7f58e..8c57b9f 100755
--- a/src/grinder/GrinderCLI.py
+++ b/src/grinder/GrinderCLI.py
@@ -22,6 +22,7 @@ from grinder.RepoFetch import YumRepoGrinder
 from grinder.RHNSync import RHNSync
 from grinder.GrinderExceptions import *
 from grinder.FileFetch import FileGrinder
+from grinder.Filter import Filter
 
 LOG = logging.getLogger("grinder.GrinderCLI")
 
@@ -228,6 +229,10 @@ class RepoDriver(CliDriver):
                           help="skip verify size of existing packages")
         self.parser.add_option('--skip_verify_checksum', action="store_true",
                           help="skip verify checksum of existing packages")
+        self.parser.add_option('--filter', action="store",
+                          help="add a filter, either whitelist or blacklist")
+        self.parser.add_option('--filter_regex', action="append",
+                          help="add a filter regex; may be used multiple times")
 
     def _validate_options(self):
         if not self.options.label:
@@ -241,6 +246,19 @@ class RepoDriver(CliDriver):
         if self.options.parallel:
             self.parallel = self.options.parallel
 
+        if self.options.filter:
+            if ((self.options.filter != "whitelist") and 
+                (self.options.filter != "blacklist")):
+                print("--filter=<type> should be either " +
+                      "'whitelist' or 'blacklist'")
+                sys.exit(-1)
+            if not self.options.filter_regex:
+                print("please provide a --filter_regex when using --filter")
+                sys.exit(-1)
+            LOG.debug("--filter=%s --filter_regex=%s" % 
+                      (self.options.filter, 
+                       self.options.filter_regex))
+
     def _do_command(self):
         """
         Executes the command.
@@ -257,14 +275,20 @@ class RepoDriver(CliDriver):
             verify_options["size"] = False
         if self.options.skip_verify_checksum:
             verify_options["checksum"] = False
-        self.yfetch = YumRepoGrinder(self.options.label, self.options.url, \
-                                self.parallel, cacert=self.options.cacert, \
-                                clicert=self.options.clicert, clikey=self.options.clikey, \
-                                proxy_url=self.options.proxy_url, 
-                                proxy_port=self.options.proxy_port, \
-                                proxy_user=self.options.proxy_user, \
-                                proxy_pass=self.options.proxy_pass,
-                                sslverify=sslverify, max_speed=limit)
+        if self.options.filter:
+            self.options.filter = Filter(self.options.filter, 
+                                         regex_list=self.options.filter_regex)
+        self.yfetch = YumRepoGrinder(
+            self.options.label, self.options.url,
+            self.parallel, cacert=self.options.cacert,
+            clicert=self.options.clicert,
+            clikey=self.options.clikey,
+            proxy_url=self.options.proxy_url, 
+            proxy_port=self.options.proxy_port,
+            proxy_user=self.options.proxy_user,
+            proxy_pass=self.options.proxy_pass,
+            sslverify=sslverify, max_speed=limit,
+            filter=self.options.filter)
         if self.options.basepath:
             self.yfetch.fetchYumRepo(self.options.basepath, verify_options=verify_options)
         else:
diff --git a/src/grinder/RepoFetch.py b/src/grinder/RepoFetch.py
index 292a1f3..2243b21 100644
--- a/src/grinder/RepoFetch.py
+++ b/src/grinder/RepoFetch.py
@@ -55,12 +55,13 @@ class YumRepoGrinder(object):
     """
       Driver class to fetch content from a Yum Repository
     """
-    def __init__(self, repo_label, repo_url, parallel=10, mirrors=None, \
-                       newest=False, cacert=None, clicert=None, clikey=None, \
-                       proxy_url=None, proxy_port=None, proxy_user=None, \
-                       proxy_pass=None, sslverify=1, packages_location=None, \
-                       remove_old=False, numOldPackages=2, skip=None, max_speed=None, \
-                       purge_orphaned=True, distro_location=None, tmp_path=None):
+    def __init__(self, repo_label, repo_url, parallel=10, mirrors=None,
+                 newest=False, cacert=None, clicert=None, clikey=None,
+                 proxy_url=None, proxy_port=None, proxy_user=None,
+                 proxy_pass=None, sslverify=1, packages_location=None,
+                 remove_old=False, numOldPackages=2, skip=None, max_speed=None,
+                 purge_orphaned=True, distro_location=None, tmp_path=None,
+                 filter=None):
         self.repo_label = repo_label
         self.repo_url = repo_url
         self.repo_dir = None
@@ -96,6 +97,7 @@ class YumRepoGrinder(object):
         self.rpmlist = []
         self.drpmlist = []
         self.tmp_path = tmp_path
+        self.filter = filter
 
     def getRPMItems(self):
         return self.rpmlist
@@ -132,13 +134,16 @@ class YumRepoGrinder(object):
         self.fetchPkgs = ParallelFetch(self.repoFetch, self.numThreads, callback=callback)
         self.fetchPkgs.processCallback(ProgressReport.DownloadMetadata)
 
-        info = YumInfo(repo_label=self.repo_label, repo_url=self.repo_url, mirrors = self.mirrors,
-                        repo_dir=self.repo_dir, packages_location=self.pkgpath,
-                        newest=self.newest, remove_old=self.remove_old, numOldPackages=self.numOldPackages,
-                        cacert=self.sslcacert, clicert=self.sslclientcert, clikey=self.sslclientkey,
-                        proxy_url=self.proxy_url, proxy_port=self.proxy_port,
-                        proxy_user=self.proxy_user, proxy_pass=self.proxy_pass,
-                        sslverify=self.sslverify, skip=self.skip, tmp_path=self.tmp_path)
+        info = YumInfo(
+            repo_label=self.repo_label, repo_url=self.repo_url, 
+            mirrors = self.mirrors, repo_dir=self.repo_dir, 
+            packages_location=self.pkgpath, newest=self.newest,
+            remove_old=self.remove_old, numOldPackages=self.numOldPackages,
+            cacert=self.sslcacert, clicert=self.sslclientcert, 
+            clikey=self.sslclientkey, proxy_url=self.proxy_url, 
+            proxy_port=self.proxy_port, proxy_user=self.proxy_user, 
+            proxy_pass=self.proxy_pass, sslverify=self.sslverify, skip=self.skip,
+            tmp_path=self.tmp_path, filter=self.filter)
         info.setUp()
         self.rpmlist = info.rpms
         self.drpmlist = info.drpms
diff --git a/src/grinder/YumInfo.py b/src/grinder/YumInfo.py
index 3604d8c..118ce8a 100644
--- a/src/grinder/YumInfo.py
+++ b/src/grinder/YumInfo.py
@@ -43,7 +43,7 @@ class YumMetadataObj(object):
                  mirrorlist=None,
                  proxy_url=None, proxy_port=None,
                  proxy_user=None, proxy_pass=None,
-                 sslverify=1, tmp_path=None):
+                 sslverify=1, tmp_path=None, filter=None):
         self.repo = None
         self.repo_label = repo_label
         self.repo_url = repo_url.encode('ascii', 'ignore')
@@ -59,6 +59,7 @@ class YumMetadataObj(object):
         self.proxy_pass = proxy_pass
         self.sslverify  = sslverify
         self.tmp_path = tmp_path
+        self.filter = filter
 
     def getDownloadItems(self, repo_dir="./", packages_location=None,
                          skip=None, newest=False, remove_old=False, numOldPackages=None):
@@ -197,6 +198,8 @@ class YumMetadataObj(object):
         pkglist = self.__getPackageList(newest)
         if remove_old and not newest:
             pkglist = self._prune_package_list(pkglist, numOldPackages)
+        if self.filter:
+            pkglist = self._filter_package_list(pkglist)
         for pkg in pkglist:
             info = {}
             #urljoin doesnt like epoch in rpm name so using string concat
@@ -287,6 +290,22 @@ class YumMetadataObj(object):
         LOG.debug("_prune_package_list() returning %s pruned package list" % (len(pkglist)))
         return pkglist
 
+    def _filter_package_list(self, pkglist):
+        """
+        run pkglist through self.filter
+        pkglist: list of packages as returned from yum's package sack
+        """
+        if pkglist:
+            LOG.debug("YumInfo._filter_package_list(pkglist=<%s packages>)" 
+                      % (len(pkglist)))
+        if not self.filter:
+            LOG.debug("_filter_package_list() called with no filter")
+            return pkglist
+        pkglist_filtered = [ pkg for pkg in pkglist if self.filter.test(pkg) ]
+        LOG.debug("_filter_package_list():  %s packages after filtering" % 
+                  (len(pkglist_filtered)))
+        return pkglist_filtered
+
     def __getstate__(self):
         """
         Get the object state for pickling.
@@ -299,13 +318,14 @@ class YumMetadataObj(object):
 
 class YumInfo(object):
     def __init__(self, repo_label, repo_url, repo_dir="./",
-                packages_location=None,
-                mirrors=None, newest=False,
-                cacert=None, clicert=None, clikey=None,
-                proxy_url=None, proxy_port=None, proxy_user=None,
-                proxy_pass=None, sslverify=1,
-                remove_old=False, numOldPackages=2, skip=None, max_speed=None,
-                purge_orphaned=True, distro_location=None, tmp_path=None):
+                 packages_location=None,
+                 mirrors=None, newest=False,
+                 cacert=None, clicert=None, clikey=None,
+                 proxy_url=None, proxy_port=None, proxy_user=None,
+                 proxy_pass=None, sslverify=1,
+                 remove_old=False, numOldPackages=2, skip=None, max_speed=None,
+                 purge_orphaned=True, distro_location=None, tmp_path=None,
+                 filter=None):
         self.rpms = []
         self.drpms = []
         self.repo_label = repo_label
@@ -335,14 +355,18 @@ class YumInfo(object):
         self.stopped = False
         self.distropath = distro_location
         self.tmp_path = tmp_path
+        self.filter = filter
 
     def setUp(self):
-        yum_metadata_obj = YumMetadataObj(repo_label=self.repo_label, repo_url=self.repo_url,
-                                          mirrorlist=self.mirrors,
-                                          cacert=self.sslcacert, clicert=self.sslclientcert, clikey=self.sslclientkey,
-                                          proxy_url=self.proxy_url, proxy_port=self.proxy_port,
-                                          proxy_user=self.proxy_user, proxy_pass=self.proxy_pass,
-                                          sslverify=self.sslverify, tmp_path=self.tmp_path)
+        yum_metadata_obj = YumMetadataObj(
+            repo_label=self.repo_label, repo_url=self.repo_url,
+            mirrorlist=self.mirrors,
+            cacert=self.sslcacert, clicert=self.sslclientcert, 
+            clikey=self.sslclientkey,
+            proxy_url=self.proxy_url, proxy_port=self.proxy_port,
+            proxy_user=self.proxy_user, proxy_pass=self.proxy_pass,
+            sslverify=self.sslverify, tmp_path=self.tmp_path,
+            filter=self.filter)
         yumAO = None
         try:
             yumAO = ActiveObject(yum_metadata_obj)
