[augeas-devel] Some ideas about how to use Augeas with IPA

Dmitri Pal dpal at redhat.com
Tue May 13 15:39:29 UTC 2008


Hi David,

Thank you for your responses.
I think there is a conceptual issue here, and maybe I do not understand
the concept. Here is how I view things.
The internal representation is the tree. The tree is abstracted from the
lenses. The tree has nodes, labels and values.
I can write a lens that will save this tree into the configuration
file. Using the same lens I would be able to restore the tree from the
file.
Now I can create another lens that will store the same tree in an XML
file and restore the tree from that file.
With these two lenses I can convert the data from XML to the config file
format and back.
The library can't do this at the moment since the tree is associated
with only one lens. To achieve what I need I would have to manually go
through the tree and copy nodes from one tree to another using the set
command. I view that as overhead. It seems much easier to have a
"copy tree" command in the library, and/or to allow associating a tree
with a different lens. This is only possible, and would only work, if
the developer knows that the two lenses create the same tree.
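
To make the idea concrete, here is a minimal sketch in plain C. It is
not Augeas code and it does not use real lenses: struct node, put_conf
and put_xml are hypothetical names, and only the tree-to-text direction
is shown. The only point is that one ordered list of label/value pairs
can be written out in two different formats.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the tree: an ordered list of label/value pairs. */
struct node {
    const char *label;
    const char *value;
};

/* Hypothetical writer for a config-file format: one "label=value" per line.
 * Returns the number of bytes written, or -1 if BUF is too small. */
int put_conf(const struct node *tree, size_t n, char *buf, size_t size)
{
    size_t used = 0;
    if (size > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + used, size - used, "%s=%s\n",
                         tree[i].label, tree[i].value);
        if (w < 0 || (size_t) w >= size - used)
            return -1;
        used += (size_t) w;
    }
    return (int) used;
}

/* Hypothetical writer for an XML-like format: one element per node.
 * No escaping is done; this is a sketch, not a real XML serializer. */
int put_xml(const struct node *tree, size_t n, char *buf, size_t size)
{
    size_t used = 0;
    if (size > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + used, size - used, "<%s>%s</%s>\n",
                         tree[i].label, tree[i].value, tree[i].label);
        if (w < 0 || (size_t) w >= size - used)
            return -1;
        used += (size_t) w;
    }
    return (int) used;
}
```

Calling both functions on the same array gives the same data in two
formats; the matching readers would be the other half of each "lens".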

Another point is that the library works only with files. For our
purposes we would need to feed the library a buffer. The library would
behave the same way: it would parse the data using the specified lens,
just not from a file. This would allow us to accomplish the format
conversion that we have to do.
If this is not a native part of the library it would be much more work
for us. We would have to store the buffer in a file and give that file
to the library. This is not good practice. Alternatively we would have
to implement functionality similar to what the library provides, but
outside it. That is duplication of effort. I think it would be
beneficial for the library to be able to read data from a memory buffer
as well as from a file.
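
A rough sketch of the layering suggested here, in plain C. The names
parse_buffer and parse_stream are hypothetical and are not Augeas
functions, and the "parser" only counts non-empty lines where a real
one would build the tree. The assumption being illustrated is that the
memory-based entry point is the primitive and the file-based one is a
thin wrapper around it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical memory-only entry point. Counts non-empty lines in BUF;
 * a real implementation would run the lens and build the tree here. */
size_t parse_buffer(const char *buf, size_t len)
{
    size_t entries = 0;
    size_t line_len = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            if (line_len > 0)
                entries++;
            line_len = 0;
        } else {
            line_len++;
        }
    }
    if (line_len > 0)          /* last line without a trailing newline */
        entries++;
    return entries;
}

/* File variant as a wrapper: read the whole stream into memory, then
 * delegate to parse_buffer. Returns -1 on I/O or allocation failure. */
long parse_stream(FILE *fp)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return -1;
    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)
            break;             /* short read: end of file or error */
        cap *= 2;
        char *tmp = realloc(buf, cap);
        if (tmp == NULL) {
            free(buf);
            return -1;
        }
        buf = tmp;
    }
    if (ferror(fp)) {
        free(buf);
        return -1;
    }
    long entries = (long) parse_buffer(buf, len);
    free(buf);
    return entries;
}
```

With this split, a caller that already holds the data in memory never
has to write a temporary file just to have it parsed.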

I do not understand how the regular expressions can change when nodes
are inserted or deleted, so I do not understand the explanation of why
the validation can't be performed at the time of the set call.
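
For illustration, the per-value check being asked for could look
roughly like this, under the assumption that each label has one fixed
regular expression. That assumption is exactly what David questions, so
this is a sketch of the request and not of how Augeas works.
value_matches is a hypothetical helper built on the POSIX
regcomp/regexec calls; the caller anchors the pattern with ^ and $.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if VALUE matches PATTERN (POSIX extended syntax),
 * 0 if it does not, and -1 if PATTERN does not compile. */
int value_matches(const char *pattern, const char *value)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, value, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With the "word" definition from the example (anything that is not a
space or a comma), the pattern would be "^[^ ,]+$", and a set call
could be rejected whenever value_matches returns 0 for the new value.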

I am trying to optimize things. I do not see a reason why an
application would have to create a parallel in-memory representation of
the tree outside the Augeas library. The library already creates a
tree. It would be much more valuable to be able to manipulate it
directly (which to some extent is already possible with the commands)
by copying the tree, sorting the tree, changing the order and so on.
There are configuration files where order matters but in many others it
does not. It is up to the developer to understand the ordering
requirements of the data and the file and to act on them accordingly.
The library should not restrict or enforce the ordering.

The notion that the files are small is completely wrong. I have seen a
hosts file with 10000 lines. We are building centralized management
for thousands of machines and we have to assume that the volumes of
information we are going to deal with will be substantial. Our client
application, which will do a lot of policy merging and conversion,
can't spend time doing n-squared lookups. It will not fly.
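
A small sketch of the sort-then-traverse-once approach in plain C, to
show where the saving comes from. count_common is a hypothetical
function, not something the library offers. It sorts private copies of
the two label lists, so the callers' original order is not disturbed,
and it assumes that no label occurs twice within one list.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Counts the labels present in both lists with two sorts and a single
 * lockstep pass: O(n log n + m log m) instead of O(n * m) nested lookups.
 * Returns -1 on allocation failure. */
long count_common(const char *const *a, size_t n,
                  const char *const *b, size_t m)
{
    const char **sa = malloc((n ? n : 1) * sizeof *sa);
    const char **sb = malloc((m ? m : 1) * sizeof *sb);
    long common = -1;

    if (sa != NULL && sb != NULL) {
        memcpy(sa, a, n * sizeof *sa);   /* sort copies, keep input order */
        memcpy(sb, b, m * sizeof *sb);
        qsort(sa, n, sizeof *sa, cmp_str);
        qsort(sb, m, sizeof *sb, cmp_str);

        common = 0;
        size_t i = 0, j = 0;
        while (i < n && j < m) {         /* walk both lists in lockstep */
            int c = strcmp(sa[i], sb[j]);
            if (c == 0) {
                common++;
                i++;
                j++;
            } else if (c < 0) {
                i++;
            } else {
                j++;
            }
        }
    }
    free(sa);
    free(sb);
    return common;
}
```

For two lists of 10000 entries each, the nested-lookup version does on
the order of 10^8 string comparisons, while this one does a few hundred
thousand.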

I hope this explains my reasoning.

I understand that you are trying to keep the library very file-focused.
Without extending the library in the direction I am talking about we
would have to duplicate in our code a lot of what the library already
does. I think that is not efficient.
I am not saying that we would not help. I think we need to come to an
agreement on what can/would be done and, depending on that, what our
involvement can be.

Thanks
Dmitri

David Lutterkort wrote:
> On Mon, 2008-05-12 at 19:20 -0400, Dmitri Pal wrote:
>   
>>>> 1) When setting the value validate the provided data against the regular 
>>>> expression.
>>>> Let us say that we have a file that reads several comma separated values 
>>>> per rule.
>>>> Then the lenses will contain a definition of the "word" as any character 
>>>> that is not a space or a comma. The defined "word" will be used in the 
>>>> parsing rule. If we then use the set command to update the values in the 
>>>> entry and provide a value that contains a comma, the set command should fail 
>>>> since the data, when saved, will violate the lens grammar and the library 
>>>> will fail to parse it back.
>>>>     
>>>>         
>>> In general, it's not possible to perform those checks at any other time
>>> than when aug_save is called (and the caller says implicitly "the tree
>>> should be ok now"). The regular expression that a value must match can
>>> change when other nodes are inserted/deleted into the tree. Though it
>>> will be rare, it would be very hard to detect whether the regular
>>> expression can change or not.
>>>
>>> Augeas does refuse to modify files if those modifications would result
>>> in a file that is not parseable. (Though there are probably a good
>>> number of bugs in that area ;)
>>>
>>>   
>>>       
>> I am talking about checking the value at the moment of the set command 
>> or function.
>> The label and the token that go after it in the file (when read from 
>> file) are matched with the regular expression.
>> Use the same regular expression to check that the data that is currently 
>> set for this label matches the regular expression defined for this label 
>> in lenses.
>>     
>
> I understood that - did my explanation of why that is not possible (or
> at the very least, very difficult) make sense ?
>
>   
>>> But you are right, more powerful path expressions is certainly something
>>> that would be interesting.
>>>
>>>   
>>>       
>> I am talking about the regular expression in the <value> field not in 
>> the <path> field.
>>     
>
> Ahh ... why can't you do what augtool does ? It does:
>
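>             /* needs <regex.h> and <stdlib.h>; re is a regex_t set up with regcomp() */
>             char **matches = NULL;
>             int cnt = aug_match(aug, pattern, &matches);
>             for (int i=0; i < cnt; i++) {
>                 const char *val = NULL;
>                 /* aug_get returns 1 when exactly one node matches the path */
>                 if (aug_get(aug, matches[i], &val) == 1 && val != NULL
>                     && regexec(&re, val, 0, NULL, 0) == 0) { /* <--- Filter by regexp */
>                     /* .. act on VAL .. */
>                 }
>                 free(matches[i]);
>             }
>             free(matches);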
>         
> At some point, there probably needs to be an iterator-like interface to
> avoid allocating all the paths for matches at the same time. I don't see
> the point of pushing the logic of selecting based on value into the
> library for aug_match.
>
>   
>>> Are you talking about the file being stored in its entirety in a DB or
>>> LDAP or just some parts of it, e.g. certain key values that you want to
>>> be able to control from your application ?
>>>   
>>>       
>> File can be stored in some form for example each node on the server can 
>> be a database record or LDAP attribute or XML file.
>> The records or attributes or files that only make sense in the context 
>> of the specific machine will be selected and sent over to the client to 
>> be stored in the actual file. Allowing library to spit the data for a 
>> node of the whole tree into the memory buffer using a provided lenses 
>> would solve the problem. Same is with reading data from buffer.
>>     
>
> I am not quite sure I understand .. are you talking about having Augeas
> pull things from a network-wide store that contains settings for many
> hosts and select the settings that apply to the local host ? If that's
> the case, that will never be part of Augeas' functionality - instead
> that logic has to live one layer higher in a separate tool that uses
> Augeas. 
>
> Most config mgmt systems (e.g., puppet) have functionality to generate
> the config for one host from a sitewide description and they all differ
> in how that is expressed - adding something like that to Augeas would
> limit its usefulness rather than enhance it.
>
>   
>> The tree that you have is just an ordered list of the value pairs. If I 
>> have two different lenses that can save one and the same data in 
>> different formats and read it back my problem is solved.
>>     
>
> Keep in mind that a lens is a pair of functions: one going from the
> file/string to the tree and one going back from the tree (plus original
> file/string) back to a modified file/string.
>
> There are no guarantees on the overall behavior if you use one direction
> of one lens and then the other direction of another lens - the whole
> point of lenses is that by bundling the two directions you can give
> certain guarantees on how the two behave when you do a roundtrip.
>
>   
>>  This would 
>> allow reading data from the database/LDAP/XML; constructing a new tree 
>> from pieces; saving the tree into the buffer using a data transit 
>> lenses; receiving data on the client; restoring tree using data transit 
>> lenses and saving the data into the file using the format specific to 
>> the file.
>>     
>
> There are projects that have tried that (e.g., Elektra, there have been
> others, too) and the sad experience is that this process is so complex
> that you get nowhere fast. None of the config tools that tried what you
> describe have gotten anywhere.
>
> The most promising approach to the above is probably Harmony[1] which
> focuses on synchronizing data between different data formats; the
> general approach is to transform each input format (parsed into a tree)
> to some common format (also a tree) using a lens, sync between trees in
> the common format, and then transform back using that same lens. With
> that power comes also a good deal of complexity, and when you write your
> input format -> common format lenses you need to think carefully about
> the sync semantics you ultimately want.
>
> Having said all that, the overall architecture I have in mind is that
> puppet be used as the sitewide config management system (which also
> deals with a host of issues that Augeas does not deal with, like package
> management, enabling and starting services etc.) and that Puppet be
> enabled to do its low-level changes on the client using Augeas where
> that makes sense.
>
> For things like installers, UI's etc. that inherently only deal with one
> system, Augeas is useful as the low-level config file editor.
> Eventually, I hope that things like UI's are supported through a dbus
> service with PolicyKit - Harald Hoyer did a very nice experiment where
> he hooked system-config-boot up to Augeas in this manner.
>
>   
>> But you have to use the number larger than the number you already have 
>> in the tree, right? This adds the node to the end of the list and 
>> I do not see a way to add things in the middle. Am I missing something, 
>> and will an insert with an existing number add a new entry and shift the others?
>> I have not tried that.
>>     
>
> Nodes are not sorted by their label; they are strictly kept in the order
> in which they were created. Have a look at tests/rec-hosts-add.rb from
> the tarball or the source checkout.
>
>   
>> When you have to merge two lists the best way is to sort them and 
>> traverse them once.
>>     
>
> But you lose the initial order; and almost all of the lists we are
> talking about here are very short, so the O(n^2) complexity of going
> through one list and looking each entry up in the other vs. the O(n log
> n) complexity of sorting and then walking the lists in lockstep won't
> make much of a difference in practice.
>
>   
>> I am trying to discourage the tool from being that file centric. It will 
>> be very convenient to use the library for the format transformation in 
>> the memory. The data being stored after some filtering, sorting, merging, 
>> serializing, deserializing ends up in the file in the order and format 
>> we need. With very minor changes (I think) but related to the 
>> disconnecting library from the file we can make it applicable to a much 
>> broader group of tasks that it was not originally intended for (IPA). 
>> Otherwise we would have to re-implement a lot of logic and functionality 
>> that library currently does but does not expose to the outside world.  
>>     
>
> Again, this would be much easier for me to understand with some very
> concrete examples (what data gets pulled from where, what is modified
> etc.); keep in mind that Augeas is not a generic data synchronization
> tool. It aims at making configuration data stored in local files more
> easily accessible and safely modifiable - because that data is used in
> so many different ways, more complex tasks should be built on top of it.
>
> David
>
> [1] http://alliance.seas.upenn.edu/~harmony/
>
>
>   


-- 
Dmitri Pal
Engineering Manager
Red Hat Inc. 



