fix u-m-d-l

Matt Domsch Matt_Domsch at dell.com
Sat Apr 25 04:59:14 UTC 2009


On Fri, Apr 24, 2009 at 11:48:16PM -0500, Matt Domsch wrote:
> On Fri, Apr 24, 2009 at 07:44:50PM -0500, Matt Domsch wrote:
> > If you see me monkey with u-m-d-l on bapp1, that's what I'm trying to
> > figure out...
> 
> Found it...
> 
> update-master-directory-list was trying to be smart and failed.  If it
> saw that a directory's ctime hadn't changed, it skipped it and moved
> on.  But, a directory's ctime won't change if one of its _subdirectories' ctime_
> changes.  Because u-m-d-l runs every 30 minutes or so, it appears to
> catch tree updates mid-flight.  In one run it sees updates/10/x86_64/
> has changed, but that repodata/ under that has not (yet).  So it
> marks updates/10/x86_64 as changed and moves on.  On the next pass,
> updates/10/x86_64 of course _has not changed_, but it's repodata
> subdir has.  This is what it was missing...  It would skip processing
> the repodata subdir.
> 
> (and yes, this would throw off the crawler too, which people have been
> complaining about being added and removed from the list somewhat
> randomly...)
> 
> I'm working on a fix, which will involve changing
> update-master-directory-list.  But that should be the only change.

This is the patch I want to apply on bapp1 to
update-master-directory-list.  It ensures that changes in repodata/
directories are handled, even if the parent directories don't appear
to have changed.  It still tries to be smart by not stat()ing files in
a directory which hasn't changed it's ctime.

Oh what I would give if inotify/dnotify worked on NFS...

Can I get some +1s?


--- update-master-directory-list	2009-04-07 03:53:55.000000000 +0000
+++ /home/fedora/mdomsch/update-master-directory-list	2009-04-25 04:50:18.000000000 +0000
@@ -168,8 +168,9 @@
     
 
 def make_repomd_file_details(dir):
-    repodataDir = dir.name + '/repodata'
-    repomd_fname = os.path.join(rootdir, dir.name, 'repodata', 'repomd.xml')
+    if not dir.name.endswith('/repodata'):
+        return
+    repomd_fname = os.path.join(rootdir, dir.name, 'repomd.xml')
     if not os.path.exists(repomd_fname):
         return
     try:
@@ -267,7 +268,7 @@
         try:
             category_directories[parent_dname]['isRepository'] = True
         except KeyError:
-            category_directories[parent_dname] = {'files':{}, 'isRepository':True, 'readable':readable}
+            category_directories[parent_dname] = {'files':{}, 'isRepository':True, 'readable':readable, 'ctime':ctime}
             
     return dname, category_directories
 
@@ -328,16 +329,17 @@
         except SQLObjectNotFound:
             dir = Directory(name=dirpath,readable=value['readable'], ctime=value['ctime'])
             dir.addCategory(category)
-        if dir.files != short_filelist(value['files']):
-            dir.files = short_filelist(value['files'])
+        if value['changed']:
+            if dir.files != short_filelist(value['files']):
+                dir.files = short_filelist(value['files'])
         make_file_details_from_checksums(dir)
 
     # this has to be a second pass to be sure the child repodata/ dir is created in the db first
     for dirpath, value in category_directories.iteritems():
+        dir = Directory.byName(dirpath)
         if value['isRepository']:
-            dir = Directory.byName(dirpath)
             make_repository(dir, category)
-            make_repomd_file_details(dir)
+        make_repomd_file_details(dir)
     ageFileDetails()
 
 def parse_rsync_listing(cname, f):
@@ -417,27 +419,31 @@
         dname = dname.rstrip('/')
         try:
             d = Directory.byName(dname)
-            if d.ctime == ctime:
-                # break out here because nothing has changed
-                continue
+            d_ctime = d.ctime
         except SQLObjectNotFound:
             # we'll need to create it
-            pass
+            d_ctime = 0
 
-        print "%s has changed" % dname
         mode = s.st_mode
         readable = (mode & stat.S_IRWXO & (stat.S_IROTH|stat.S_IXOTH))
         if not readable:
             unreadable_dirs[dname] = True
         isRepo = 'repodata' in dirnames
-        category_directories[dname] = {'files':{}, 'isRepository':isRepo, 'readable':readable, 'ctime':ctime}
-        for f in filenames:
-            try:
-                s = os.stat(os.path.join(dirpath, f))
-            except OSError:
-                continue
-            category_directories[dname]['files'][f] = {'size':str(s.st_size),
-                                                       'stat':s[stat.ST_CTIME]}
+
+        changed = (d_ctime != ctime)
+        if changed:
+            print "%s has changed" % dname
+        category_directories[dname] = {'files':{}, 'isRepository':isRepo, 'readable':readable, 'ctime':ctime, 'changed':changed}
+
+        # skip per-file stat()s if the directory hasn't changed
+        if changed:
+            for f in filenames:
+                try:
+                    s = os.stat(os.path.join(dirpath, f))
+                except OSError:
+                    continue
+                category_directories[dname]['files'][f] = {'size':str(s.st_size),
+                                                           'stat':s[stat.ST_CTIME]}
 
     sync_category_directories(category, category_directories)
 



-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux




More information about the Fedora-infrastructure-list mailing list