[libvirt] [PATCH 03/11] util: Add a util to traverse directory tree

Osier Yang jyang at redhat.com
Wed Jun 19 13:56:14 UTC 2013


On 19/06/13 21:44, John Ferlan wrote:
> On 06/07/2013 01:03 PM, Osier Yang wrote:
>> There is POSIX calls to walk through direcotry tree, nftw(3), but
> s/is/are
> s/direcotry/directory
> s/tree/trees  or s/tree/a tree/

hm... I reviewed it times before posting it. :(

>
>> there is no way to allow one to pass user data to the callback:
>>
>> int nftw(const char *dirpath,
>>           int (*fn) (const char *fpath, const struct stat *sb,
>>                      int typeflag, struct FTW *ftwbuf),
>>           int nopenfd, int flags);
>>
>> Assuming a simple requirement: one wants to simply find out all the
>> files which have name "vendor" and it has a specific value, under a
>> directory, It's not possible to achieve with nftw(3). To solve the
>> requirements like this, this new util function comes.
> Not sure the last sentence is necessary
>
>> int virTraverseDirectory(const char *dirpath,
>>                           int depth,
>>                           unsigned int flags,
>>                           virTraverseDirectoryCallback cb,
>>                           virTraverseDirectorySkipCallback skipcb,
>>                           void *opaque,
>>                           char ***filepaths);
>>
>>   * Which allow to traverse a directory limited to specified @depth
>> (relative to the @dirpath)
>>
>>   * Two callbacks are provided, callback @cb is to handle the file path
>> passed to it, with the user data @opaque; callback @skipcb is to
>> allow one to make the rules to skip the passed file path. For example,
>> one can define regex patterns in @skipcb to skip all the file path
>> which matches it.
>>
>>   * @cb should return 0 to indicate the file path is successfully
>> handled, otherwise -1 should be returned.
>>
>>   * Like @cb, @skipcb also should return 0 to indiate the file path
>> will be skipped to handle, otherwise -1 should be returned.
>>
>>   * If passed @filepath is not NULL, it will be filled with the file
>> paths which are successfully handled by @cb.
> What use would a NULL filepath be?  The whole purpose of the code is to
> generate a list of files with a specific pattern.  The end result may be
> an empty list, but passing NULL

I think there are some use cases won't care about the file paths it
processed, it's a future consideration though, I don't use it in that
way currently.

>>   * @flags can be used to control the general behaviour of the
>> traversing. E.g. Don't follow symbol links.
>>
>> A simple example:
>>
>> /*
>>   * utiltest.c: Prints all the file paths whose basename is
>>   *             "vendor" under directory "./testdir".
>>   */
>> static int
>> traverseCallback(const char *path,
>>                   void *opaque)
>> {
>>      const char *name = opaque;
>>      char *p = NULL;
>>
>>      if ((p = strrchr(path, '/')))
>>          p++;
>>
>>      if (STRNEQ(p, name))
>>          return -1;
>>
>>      return 0;
>> }
>>
>> static int
>> findVendor(void)
>> {
>>      char **filepaths = NULL;
>>      unsigned int flags = 0;
>>      const char *name = "vendor";
>>      int n;
>>      int i;
>>
>>      flags |= VIR_TRAVERSE_DIRECTORY_FALL_THROUGH |
>>               VIR_TRAVERSE_DIRECTORY_IGNORE_HIDDEN_FILES ;
>>
>>      if ((n = virTraverseDirectory("./testdir", 2, flags,
>>                                    traverseCallback, NULL,
>>                                    (void *)name, &filepaths)) < 0)
>>          return -1;
>>
>>      for (i = 0; i < n; i++)
>>          printf("Match%d: %s\n", i, filepaths[i]);
>>
>>      for (i = 0; i < n; i++)
>>          VIR_FREE(filepaths[i]);
>>      VIR_FREE(filepaths);
>>
>>      return 0;
>> }
>>
>> Tree view of "./testdir":
>> % tree -a ./testdir/
>> ./testdir/
>> |-- device
>> |-- dir1
>> |   |-- device
>> |   |-- dir2
>> |   |   |-- device
>> |   |   |-- dir3
>> |   |   |   `-- vendor
>> |   |   `-- vendor
>> |   |-- .dir7
>> |   |   `-- vendor
>> |   |-- dir8 -> ../dir4
>> |   `-- vendor
>> |-- dir4
>> |   |-- device
>> |   |-- dir5
>> |   |   |-- device
>> |   |   `-- vendor
>> |   `-- vendor
>> `-- vendor
>>
>> 7 directories, 12 files
>>
>> % ./utiltest
>> Match0: ./testdir/dir1/dir2/vendor
>> Match1: ./testdir/dir4/dir5/vendor
>>
>> Only "vendor" in second depth are returned (flag *_FALL_THROUGH took
>> effect), and the symbol link is not followed.
>>
>> "virTraverseDirectory" is implemented using recursion method, which
>> is generally not good for performance and memory use, and even may
>> eats up the file descriptors. But since we allow to specify the
>> tree "depth", and I don't think libvirt has use cases which need to
>> traverse a very deep directory tree. And on the other hand,
>> Implementing it using either iteration or stack will be much less
>> readable.
>>
>> Later patches will take use of virTraverseDirectory, and tests
>> will be added.
>> ---
>>   src/libvirt_private.syms |   1 +
>>   src/util/virutil.c       | 151 ++++++++++++++++++++++++++++++++++++++++++++++-
>>   src/util/virutil.h       |  52 ++++++++++++++++
>>   3 files changed, 203 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
>> index 1ea7467..fe182e8 100644
>> --- a/src/libvirt_private.syms
>> +++ b/src/libvirt_private.syms
>> @@ -1965,6 +1965,7 @@ virSetNonBlock;
>>   virSetUIDGID;
>>   virSetUIDGIDWithCaps;
>>   virStrIsPrint;
>> +virTraverseDirectory;
>>   virValidateWWN;
>>   
>>   
>> diff --git a/src/util/virutil.c b/src/util/virutil.c
>> index c5246bc..c1938f9 100644
>> --- a/src/util/virutil.c
>> +++ b/src/util/virutil.c
>> @@ -2048,7 +2048,6 @@ virFindFCHostCapableVport(const char *sysfs_prefix ATTRIBUTE_UNUSED)
>>       virReportSystemError(ENOSYS, "%s", _("Not supported on this platform"));
>>       return NULL;
>>   }
>> -
>>   #endif /* __linux__ */
>>   
>>   /**
>> @@ -2070,3 +2069,153 @@ virCompareLimitUlong(unsigned long long a, unsigned long b)
>>   
>>       return -1;
>>   }
>> +
>> +/*
>> + * virTraverseDirectory:
>> + * @dirpath: Directory path, (relative or absolute)
>> + * @depth: The max directory tree depth for traversing
>> + * @cb: Callback to handle file (not directory) during traversing
>> + * @skibcb: Callback to define rules to skip entry
> s/skibcb/skipcb
>
>> + * @opaque: User data to pass to @cb
>> + * @filepaths: Pointer to array to store the file paths, the file
>> + *             path passed to @cb is stored into @filepaths as long
>> + *             as @cb returns 0.
>> + *
>> + * Traverse a directory tree into specified @depth, handing the file
> s/handing/handling
>
>> + * with callback in the process. Caller must free @filepaths and
>> + * its elements after use.
>> + *
>> + * Returns the number of file paths successfully handled by @cb on
>> + * success, with @fillpaths (if passed @fillpath is not NULL) filled,
> s/@fillpaths/@filepaths  (there's 2 of them)
>
>> + * or -1 on failure.
>> + */
> NOTE: If you had run 'make -C tests valgrind' as part of your process
> you would have seen there's a memory leak here and the 'utiltest' fails.
> The output from my run has the following footprints repeated a few times:
>
> ==29216== 8 bytes in 1 blocks are definitely lost in loss record 2 of 39
> ==29216==    at 0x4A0887C: malloc (vg_replace_malloc.c:270)
> ==29216==    by 0x4A089F0: realloc (vg_replace_malloc.c:662)
> ==29216==    by 0x4C6C7EE: virReallocN (viralloc.c:184)
> ==29216==    by 0x4CBC251: virTraverseDirectory (virutil.c:2592)
> ==29216==    by 0x4CBC384: virTraverseDirectory (virutil.c:2562)
> ==29216==    by 0x4CBC384: virTraverseDirectory (virutil.c:2562)
> ==29216==    by 0x4CBCC7D: virFindPCIDeviceByVPD (virutil.c:2084)
> ==29216==    by 0x40202D: testFindPCIDeviceByVPD (utiltest.c:175)
> ==29216==    by 0x40326F: virtTestRun (testutils.c:158)
> ==29216==    by 0x401D20: mymain (utiltest.c:305)
> ==29216==    by 0x4038AA: virtTestMain (testutils.c:722)
> ==29216==    by 0x37C1021A04: (below main) (libc-start.c:225)
> ==29216==
> ==29216== 8 bytes in 1 blocks are definitely lost in loss record 3 of 39
> ==29216==    at 0x4A0887C: malloc (vg_replace_malloc.c:270)
> ==29216==    by 0x4A089F0: realloc (vg_replace_malloc.c:662)
> ==29216==    by 0x4C6C7EE: virReallocN (viralloc.c:184)
> ==29216==    by 0x4CBC3B4: virTraverseDirectory (virutil.c:2569)
> ==29216==    by 0x4CBC384: virTraverseDirectory (virutil.c:2562)
> ==29216==    by 0x4CBCC7D: virFindPCIDeviceByVPD (virutil.c:2084)
> ==29216==    by 0x40202D: testFindPCIDeviceByVPD (utiltest.c:175)
> ==29216==    by 0x40326F: virtTestRun (testutils.c:158)
> ==29216==    by 0x401D20: mymain (utiltest.c:305)
> ==29216==    by 0x4038AA: virtTestMain (testutils.c:722)
> ==29216==    by 0x37C1021A04: (below main) (libc-start.c:225)
>
>
> Beyond that the traversal seems OK to me - I think you're covering the
> basics with hidden files and links. I do remember a recent discussion
> regarding virFileDeleteTree() which may be useful to revisit to ensure
> nothing else is forgotten.

Hm, what virFileDeleteTree does should be able to achieve with

virTraverseDirectory, that's the use case which doesn't have to care
about the returned file paths I think.


>
>> +int
>> +virTraverseDirectory(const char *dirpath,
>> +                     int depth,
>> +                     unsigned int flags,
>> +                     virTraverseDirectoryCallback cb,
>> +                     virTraverseDirectorySkipCallback skipcb,
>> +                     void *opaque,
>> +                     char ***filepaths)
>> +{
>> +    struct dirent *entry = NULL;
>> +    DIR *dir = NULL;
>> +    char *fpath = NULL;
>> +    struct stat st;
>> +    int nfilepaths = 0;
>> +    int ret = -1;
>> +    int i;
>> +
>> +    if (depth < 0)
>> +        return 0;
>> +
>> +    if (filepaths)
>> +        *filepaths = NULL;
>> +
>> +    if (!(dir = opendir(dirpath))) {
>> +        virReportSystemError(errno,
>> +                             _("Failed to open directory '%s'"),
>> +                             dirpath);
>> +        return -1;
>> +    }
>> +
>> +    while ((entry = readdir(dir))) {
>> +        /* Always Ignore "." and ".." */
>> +        if (STREQ(entry->d_name, ".") || STREQ(entry->d_name, ".."))
>> +            continue;
>> +
>> +        /* Ignore hidden files if it's required */
>> +        if (flags & VIR_TRAVERSE_DIRECTORY_IGNORE_HIDDEN_FILES &&
>> +            entry->d_name[0] == '.')
>> +            continue;
>> +
>> +        if (virAsprintf(&fpath, "%s/%s", dirpath, entry->d_name) < 0) {
>> +            virReportOOMError();
>> +            goto error;
>> +        }
>> +
>> +        /* Skip to handle the entry as long as @skipcb returns 0 for it */
>> +        if (skipcb && skipcb(fpath, opaque) == 0) {
>> +            VIR_FREE(fpath);
>> +            continue;
>> +        }
>> +
>> +        if (lstat(fpath, &st) < 0) {
>> +            virReportSystemError(errno,
>> +                                 _("Failed to stat '%s'"),
> technically failed to 'lstat'
>
>> +                                 fpath);
>> +            goto error;
>> +        }
>> +
>> +        /* Don't follow symbol link unless it's explicitly required */
>> +        if (S_ISLNK(st.st_mode) &&
>> +            !(flags & VIR_TRAVERSE_DIRECTORY_FOLLOW_LINK)) {
>> +                VIR_FREE(fpath);
>> +                continue;
>> +        }
>> +
>> +        if (S_ISDIR(st.st_mode)) {
>> +            char **tmp = NULL;
>> +            int rc;
>> +
>> +            if ((rc = virTraverseDirectory(fpath, depth - 1, flags, cb, skipcb,
>> +                                           opaque, filepaths ? &tmp : NULL)) < 0)
>> +                goto error;
>> +
>> +            if (rc > 0) {
>> +                /* Copy the filepaths returned by recursive call */
>> +                if (filepaths) {
>> +                    if (VIR_REALLOC_N((*filepaths), nfilepaths + rc) < 0) {
>> +                        for (i = 0; i < rc; i++)
>> +                            VIR_FREE(tmp[i]);
>> +                        VIR_FREE(tmp);
>> +                        virReportOOMError();
>> +                        goto error;
>> +                    }
>> +
>> +                    for (i = 0; i < rc; i++)
>> +                        (*filepaths)[nfilepaths + i] = tmp[i];
>> +                }
>> +                nfilepaths += rc;
>> +            }
> Perhaps the valgrind error is that the else clause here or the one for
> "if (filepaths)" in the (rc > 0) condition.
>
> In either case, tmp wouldn't be free()'d properly.
>
> If you required filepaths to be non NULL - I believe the whole
> mess/issue would be irrelevant...
>
>> +        } else {
>> +             /* Don't handle the file unless it's the last depth */
>> +             if ((flags & VIR_TRAVERSE_DIRECTORY_FALL_THROUGH) &&
>> +                 depth != 0) {
>> +                 VIR_FREE(fpath);
>> +                 continue;
>> +             }
>> +
>> +             if (cb(fpath, opaque) == 0) {
>> +                if (filepaths) {
>> +                    if (VIR_REALLOC_N((*filepaths), nfilepaths + 1) < 0) {
>> +                        virReportOOMError();
>> +                        goto error;
>> +                    }
>> +
>> +                    if (VIR_STRDUP((*filepaths)[nfilepaths], fpath) < 0)
>> +                        goto error;
>> +
>> +                }
>> +                nfilepaths++;
>> +            }
>> +        }
>> +
>> +        VIR_FREE(fpath);
>> +    }
>> +
>> +    ret = nfilepaths;
>> +
>> +cleanup:
>> +    closedir(dir);
>> +    return ret;
>> +
>> +error:
>> +    VIR_FREE(fpath);
>> +    if (filepaths) {
>> +        for (i = 0; i < nfilepaths; i++)
>> +            VIR_FREE((*filepaths)[i]);
>> +        VIR_FREE(*filepaths);
>> +    }
>> +    goto cleanup;
>> +}
>> diff --git a/src/util/virutil.h b/src/util/virutil.h
>> index 280a18d..6c46f23 100644
>> --- a/src/util/virutil.h
>> +++ b/src/util/virutil.h
>> @@ -166,4 +166,56 @@ char *virFindFCHostCapableVport(const char *sysfs_prefix);
>>   
>>   int virCompareLimitUlong(unsigned long long a, unsigned long b);
>>   
>> +/**
>> + * virTraverseDirectoryCallback:
>> + * @fpath: file path to handle
>> + * @opaque: User data to pass to the callback
>> + *
>> + * Callback for "virTraverseDirectory".
>> + *
>> + * Returns 0 on success, -1 on failure.
>> + */
>> +typedef int (*virTraverseDirectoryCallback)(const char *fpath,
>> +                                            void *opaque);
>> +
>> +/**
>> + * virTraverseDirectorySkipCallback:
>> + * @fpath: file path to handle
>> + * @opaque: User data to pass to the callback
>> + *
>> + * Callback to control whether virTraverseDirectory should ship
>> + * @fpath E.g. Skipping files whose name match a regex pattern.
>> + *
>> + * Return 0 to let "virTraverseDirectory" skip handling the @fpath,
>> + * -1 to not skip.
>> + */
>> +typedef int (*virTraverseDirectorySkipCallback)(const char *fpath,
>> +                                                void *opaque);
>> +
>> +/**
>> + * virTraverseDirectoryFlags:
>> + *
>> + * Except virTraverseDirectorySkipCallback, one can use these flags to
>> + * control the general behaviour of virTraverseDirectory
>> + */
>> +enum {
>> +    /* Follow symbol links */
>> +    VIR_TRAVERSE_DIRECTORY_FOLLOW_LINK = 1 << 0,
>> +
>> +    /* Ignore hidden files or directory */
>> +    VIR_TRAVERSE_DIRECTORY_IGNORE_HIDDEN_FILES = 1 << 1,
>> +
>> +    /* Don't handle files unless it's in the last depth */
>> +    VIR_TRAVERSE_DIRECTORY_FALL_THROUGH = 1 << 2,
> Better named "MATCH_DEPTH_ONLY"?  FALL_THROUGH just has other

Not sure, I know FALL_THROUGH is not a good name, but I don't see
MATCH_DEPTH_ONLY makes it better either.. :-), since whether it's
matching or anything else, completely depends on what @cb does.


>
>> +} virTraverseDirectoryFlags;
>> +
>> +int virTraverseDirectory(const char *dirpath,
>> +                         int depth,
>> +                         unsigned int flags,
>> +                         virTraverseDirectoryCallback cb,
>> +                         virTraverseDirectorySkipCallback skipcb,
>> +                         void *opaque,
>> +                         char ***filepaths);
>> +    ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3)
> Not sure "3" is what you mean - perhaps you meant 4 (eg, cb).

yes.. I made too many typos..

>
> It also stands to reason filepaths shouldn't be NULL.  I guess I just
> don't see the reason to return just a count, but perhaps there's a use
> case I'm not thinking about or will be apparent in future patches.
>
> John
>
>> +
>>   #endif /* __VIR_UTIL_H__ */
>>




More information about the libvir-list mailing list