[augeas-devel] Basic support for multipart key

Mon Oct 18 23:39:09 UTC 2010

On Mon, 2010-10-18 at 16:19 -0400, Francis Giraldeau wrote:
> module Multipart =
> 
> let dir = [ label "@" . key /[a]+/ . del "x" "x" ]
> let sec = [ label "#" . key /[a]+/ . del "y" "y" ]
> let foo = (dir|sec)*
> test foo get "axaay" = ?
> test foo put "ax" after clear "/#aa" = ?
> 
> $ augparse -I lenses/ --nostdinc multipart.aug
> Test result: multipart.aug:6.0-.24:
>    { "@a" }
>    { "#aa" }
> Test result: multipart.aug:7.0-.40:
> "axaay"
> 
> Then the put ambiguity is avoided and the tree structure is more  
> natural, because the key is on the tree path.

I like this extension a lot; I do have a few comments on the proposed
patch:

> >From 5113df2f28cfdbacba57c1755c565fdcc2d73a4e Mon Sep 17 00:00:00 2001
> From: Francis Giraldeau <francis.giraldeau at usherbrooke.ca>
> Date: Mon, 18 Oct 2010 15:38:56 -0400
> Subject: [PATCH] Basic support for multipart keys
> 
> diff --git a/src/get.c b/src/get.c
> index 11ea5de..1bea57b 100644
> --- a/src/get.c
> +++ b/src/get.c
> @@ -433,10 +433,26 @@ static struct skel *parse_value(struct lens *lens,
>  
>  static struct tree *get_key(struct lens *lens, struct state *state) {
>      ensure0(lens->tag == L_KEY, state->info);
> -    if (! REG_MATCHED(state))
> +    char *tok, *key;
> +    int len = 0;
> +    if (! REG_MATCHED(state)) {
>          no_match_error(state, lens);
> -    else
> -        state->key = token(state);
> +        return NULL;
> +    }
> +    tok = token(state);
> +    if (state->key != NULL) {
> +        len = strlen(state->key) + strlen(tok) + 1;
> +        key = malloc(len);
> +        // FIXME: handle no memory error
> +        //if (key < 0)
> +        //    return ENOMEM;
> +        snprintf(key, len, "%s%s", state->key, tok);
> +        free(state->key);
> +        free(tok);
> +        state->key = key;

Don't use malloc/calloc/realloc directly - they have all kinds of
problems; instead use the wrappers from memory.h. Using them, get_key
will look something like:

        static struct tree *get_key(struct lens *lens, struct state *state) {
            ensure0(lens->tag == L_KEY, state->info);
            int r;

            if (! REG_MATCHED(state)) {
                no_match_error(state, lens);
            } else {
                char *tok = token(state);
                if (tok == NULL) {
                   // FIXME: get_error will likely crash and burn on OOM
                   get_error(state, lens, "Out of memory");
                   return NULL;
                }
                r = REALLOC_N(state->key, strlen(state->key) + strlen(tok) + 1)
                if (r < 0) {
                  get_error(state, lens, "Out of memory");
                  return NULL;
                }
                strcat(state->key, tok);
            }
            return NULL;
        }

Doing the concatenation in get_key seems a little unclean to me, though
I think it's ok. Conceptually, concatenation belongs in get_concat and
get_quant_star - I can't think of a case where doing concatenation in
the get_key/get_label functions will cause trouble though.

> diff --git a/src/lens.c b/src/lens.c
> index 29859f5..d73ac7e 100644
> --- a/src/lens.c
> +++ b/src/lens.c
> @@ -198,6 +198,7 @@ static struct lens *make_lens_binop(enum lens_tag tag, struct info *info,
>      for (int i=0; i < lens->nchildren; i++) {
>          lens->value = lens->value || lens->children[i]->value;
>          lens->key = lens->key || lens->children[i]->key;
> +        lens->label = lens->label || lens->children[i]->label;
>      }
>  
>      if (ALLOC_N(types, lens->nchildren) < 0)
> @@ -269,9 +270,14 @@ struct value *lns_make_concat(struct info *info,
>          return make_exn_value(info, "Multiple stores in concat");
>      }
>      if (l1->key && l2->key) {
> -        return make_exn_value(info, "Multiple keys/labels in concat");
> +        return make_exn_value(info, "Multiple keys in concat");
> +    }
> +    if (l1->label && l2->label) {
> +        return make_exn_value(info, "Multiple labels in concat");
> +    }
> +    if (l1->key && l2->label) {
> +        return make_exn_value(info, "Label lens following a key lens is not allowed");
>      }

I don't think we need to flag this as an error anymore; shouldn't it be
possible now to say something like

        let l = [ label "a" . (label "b" . del "x" "x" | label "c" . del "y" "y") ]

IOW, it should be possible to have concatenations of any number of keys
and labels, assuming the lenses are unambiguously concatenable.

As far as I can see, we should be able to get rid of the key flag in
struct lens altogether.

I'd also expect that the typechecker now needs to get involved with
building a ktype from the concatenation/starring of lenses. Before this
change, for the lens l = l1 . l2, the ktype was either the ktype of l1
or the ktype of l2, since we required that only one of them could have a
non-empty ktype.

Now, we say that ktype(l) = ktype(l1) . ktype(l2); as a consequence, we
need to check that ktype(l1) and ktype(l2) are unambiguously
concatenable. For the put direction, we can decompose the label of a
tree node into the l1 and l2 bits by properly parenthesizing the two
regular expressions for hte ktypes of l1 and l2, and then looking into
the right match registers, completely analogous to how put_concat does
its thing.

I'd have to dig a little deeper, but I believe that this might just work
if the put/create for key and label use the matched substring belonging
to 'their' group in the overall atype; it would require that the
combinators all track the current register similar to what their cousins
in the get_* functions do.

Another way to think about this is: the ktype of a subtree lens gives
you a properly parenthesized regular expression that you can match
against the label of the current tree node. As you process the lens
structure inside the subtree lens, track what group in the overall ktype
regexp corresponds to the current lens. When you reach a key lens,
output the substring corresponding to the group for the current lens;
for a label lens you should check that the string for the current group
is the lens->string, otherwise there's a problem with the register/group
tracking.

You can think of the registers/groups in the regular expression as a
parse tree for the entire lens structure inside a subtree lens (where
further nested subtree lenses are considered leaves with an empty ktype)

> diff --git a/tests/modules/fail_multi_label_concat.aug b/tests/modules/fail_multi_label_concat.aug
> new file mode 100644
> index 0000000..2b782bf
> --- /dev/null
> +++ b/tests/modules/fail_multi_label_concat.aug
> @@ -0,0 +1,3 @@
> +module Fail_multi_label_concat =
> +
> +  let lns = label "a" . label "b"

This should really be a passing test.

> diff --git a/tests/modules/fail_multipart_order.aug b/tests/modules/fail_multipart_order.aug
> new file mode 100644
> index 0000000..97ba828
> --- /dev/null
> +++ b/tests/modules/fail_multipart_order.aug
> @@ -0,0 +1,3 @@
> +module Fail_multipart_order = 
> +
> +    let k1 = [ key /[a]*/ . label "b" ]

This should also be a passing test.

David