
Re: [libvirt] [PATCH 2/3] Improve tokenizing of linkable terms

On Thursday, 11 August 2011 21:45:18, Eric Blake wrote:
> On 08/11/2011 06:44 AM, Philipp Hahn wrote:
> > Currently only tabs and blanks are used for tokenizing the description,
> > which breaks when a term is at the end of a line or has () appended to
> > it.
> > 1. Use also other white space characters such as new-lines and carriage
> >     return for splitting.
> > 2. Remove some common non-word characters from the token before lookup.
> >
> > Signed-off-by: Philipp Hahn<hahn univention de>
> > ---
> >   docs/newapi.xsl |    9 ++++++---
> >   1 files changed, 6 insertions(+), 3 deletions(-)
> I am not fluent in reading xsl files.  How would I go about testing if
> the output is more visually appealing?  I can ack on the basis of
> comparison on before vs. after appearance, but only if I figure out
> which files are affected to compare a view in my browser of the
> generated html.

Go to <http://libvirt.org/html/libvirt-libvirt.html> and search for "()" or
strings starting with "VIR_": you'll notice lots of them that aren't links,
while many others are. This happens because only blanks and tabs were used to
separate words, so the lookup searched for "virDomainGetVcpus()" instead of
just "virDomainGetVcpus". The other case was keywords at the end of a
line, where the "\n" wasn't used for word breaking, so the lookup went for
"VIR_DOMAIN_MEMORY_HARD_LIMIT\nsomething" instead of the bare term.

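To illustrate the two cases described above, here is a rough Python sketch.
It is not the XSLT from the patch: the set of known terms and the exact
characters stripped from each token are assumptions made up for the example.

```python
import re

# Hypothetical set of linkable terms; the real one comes from the API docs.
KNOWN_TERMS = {"virDomainGetVcpus", "VIR_DOMAIN_MEMORY_HARD_LIMIT"}


def naive_terms(description):
    """Old behaviour: split on blanks and tabs only, look tokens up as-is."""
    tokens = re.split(r"[ \t]+", description)
    return [t for t in tokens if t in KNOWN_TERMS]


def improved_terms(description):
    """New behaviour: split on any whitespace, strip non-word characters."""
    found = []
    # str.split() without arguments also breaks on "\n" and "\r".
    for token in description.split():
        # Assumed set of "common non-word characters" to drop before lookup.
        term = token.strip("().,;:")
        if term in KNOWN_TERMS:
            found.append(term)
    return found


text = "See virDomainGetVcpus() and VIR_DOMAIN_MEMORY_HARD_LIMIT\nsomething."
print(naive_terms(text))     # nothing matches
print(improved_terms(text))  # both terms are found
```
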
It's still not perfect, because only "_function_()" is clickable (the name
without the parentheses) and not the whole "function()", but at least it is
a link now. It would have been easier with RegExps
<http://www.exslt.org/regexp/> for stemming, but that isn't supported by

Philipp Hahn
Philipp Hahn           Open Source Software Engineer      hahn univention de
Univention GmbH        Linux for Your Business        fon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen                 fax: +49 421 22 232-99

