[libvirt] [PATCH v3 05/22] build-aux: rewrite whitespace checker in Python

Daniel P. Berrangé berrange at redhat.com
Fri Sep 27 09:33:45 UTC 2019


On Thu, Sep 26, 2019 at 06:08:14PM +0200, Ján Tomko wrote:
> On Tue, Sep 24, 2019 at 03:58:46PM +0100, Daniel P. Berrangé wrote:
> > As part of an goal to eliminate Perl from libvirt build tools,
> > rewrite the check-spacing.pl tool in Python.
> > 
> > This was a straight conversion, manually going line-by-line to
> > change the syntax from Perl to Python. Thus the overall structure
> > of the file and approach is the same.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange at redhat.com>
> > ---
> > Makefile.am                |   2 +-
> > build-aux/check-spacing.pl | 198 --------------------------------
> > build-aux/check-spacing.py | 229 +++++++++++++++++++++++++++++++++++++
> > cfg.mk                     |   4 +-
> > 4 files changed, 232 insertions(+), 201 deletions(-)
> > delete mode 100755 build-aux/check-spacing.pl
> > create mode 100755 build-aux/check-spacing.py
> > 
> > diff --git a/build-aux/check-spacing.py b/build-aux/check-spacing.py
> > new file mode 100755
> > index 0000000000..6b9f3ec1ba
> > --- /dev/null
> > +++ b/build-aux/check-spacing.py
> > @@ -0,0 +1,229 @@
> > +#!/usr/bin/env python
> > +#
> > +# Copyright (C) 2012-2019 Red Hat, Inc.
> > +#
> > +# check-spacing.pl: Report any usage of 'function (..args..)'
> > +# Also check for other syntax issues, such as correct use of ';'
> > +#
> > +# This library is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU Lesser General Public
> > +# License as published by the Free Software Foundation; either
> > +# version 2.1 of the License, or (at your option) any later version.
> > +#
> > +# This library is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +# Lesser General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU Lesser General Public
> > +# License along with this library.  If not, see
> > +# <http://www.gnu.org/licenses/>.
> > +
> > +from __future__ import print_function
> > +
> > +import re
> > +import sys
> > +
> > +
> > +def check_whitespace(filename):
> > +    errs = False
> > +    with open(filename, 'r') as fh:
> > +        quotedmetaprog = re.compile(r"""'[";,=]'""")
> > +        quotedstringprog = re.compile(r'''"(?:[^\\\"]|\\.)*"''')
> > +        commentstartprog = re.compile(r'''^(.*)/\*.*$''')
> > +        commentendprog = re.compile(r'''^.*\*/(.*)$''')
> > +        commentprog = re.compile(r'''^(.*)/\*.*\*/(.*)''')
> > +        funcprog = re.compile(r'''(\w+)\s\((?!\*)''')
> > +        keywordprog = re.compile(
> > +            r'''^.*\b(?:if|for|while|switch|return)\(.*$''')
> > +        functypedefprog = re.compile(r'''^.*\(\*\w+\)\s+\(.*$''')
> > +        whitespaceprog1 = re.compile(r'''^.*\s\).*$''')
> > +        whitespaceprog2 = re.compile(r'''^\s+\);?$''')
> > +        whitespaceprog3 = re.compile(r'''^.*\((?!$)\s.*''')
> > +        commasemiprog1 = re.compile(r'''.*\s[;,].*''')
> > +        commasemiprog2 = re.compile(r'''.*\S; ; .*''')
> > +        commasemiprog3 = re.compile(r'''^\s+;''')
> > +        semicolonprog = re.compile(r'''.*;[^	 \\\n;)].*''')
> > +        commaprog = re.compile(r'''.*,[^ \\\n)}].*''')
> > +        assignprog1 = re.compile(r'''[^ ]\b[!<>&|\-+*\/%\^=]?=''')
> > +        assignprog2 = re.compile(r'''=[^= \\\n]''')
> > +        condstartprog = re.compile(r'''^\s*(if|while|for)\b.*\{$''')
> > +        statementprog = re.compile(r'''^[^;]*;[^;]*$''')
> > +        condendprog = re.compile(r'''^\s*}\s*$''')
> > +
> 
> I think even with descriptive names for the regexes and the Python
> rewrite, this script is hard to read and comes too close to be a C
> parser.

Yeah, trying to parse C code using regular expressions was not
our most sensible decision.

> Also, the execution time goes from 1.5s to 6.5s, which is longer than
> all the other checks combined run on my 8-core machine.

Thanks for pointing that out. I'll see if there's any low hanging
fruit to optimize but I'm doubtful.

I dont mind postponing this patch, as all patches in this series
are independant of each other.

> Can we get rid of it completely in favor of some proper formatting tool?

I think that would be a good idea because this script only handles
a tiny subset of formatting rules we like.

> I have played with clang-format to try to match our style, the main
> problems seem to be:
> * pre-processor directives are indented by the same offset as code
>  (I see that cppi is specifically intended to add just one space)

That's an interesting approach. I wouldn't object to such indentation
style myself.

> * function calls sometimes leave an empty opening parenthesis

> * always rewrapping function arguments might create unnecessary churn
> * parameters wrapping might not be smart enough, e.g. we like to do
>  virReportError(VIR_ERR_CODE, "%s",
>                 _("string"));
>  in a lot of places.

Yeah these last two points are the ones I struggled with too when I
looked at clang-format 6 months back.

The churn does worry me as it would make cherry-picking patches a
big pain for downstreams.

In the long term I think we'd win by having an explicit code formatting 
tool that everyone is expected to comply with, even if it isn't quite the
same style that we currently use. 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list