[Freeipa-devel] [PATCH] 0015 Only split CSV strings once

Rob Crittenden rcritten at redhat.com
Tue Mar 20 21:08:17 UTC 2012


Petr Viktorin wrote:
> On 03/16/2012 12:55 PM, Petr Viktorin wrote:
>> On 03/15/2012 08:55 PM, Rob Crittenden wrote:
>>> Petr Viktorin wrote:
>>>> https://fedorahosted.org/freeipa/ticket/2227 (Unable to add certain
>>>> sudo
>>>> commands to groups). What an interesting bug to get :)
>>>>
>>>>
>>>> One problem with our CSV splitting is that it's not idempotent
>>>> (baskslashes are "eaten" when there are escaped commas), but when we
>>>> forward a call it gets done on both the client and the server. The
>>>> attached patch fixes that in a somewhat hacky way. It's not a complete
>>>> fix for the issue though, for that I need to ask what to do.
>>>>
>>>>
>>>> Some Params use the CSV format when we just need a list of
>>>> comma-separated values. Our flavor of CSV does escaping in two
>>>> different
>>>> ways. This is pretty bad for predictability, least-surprise,
>>>> documentation, ...
>>>>
>>>> To recap, the two ways are escaping (escaped commas are ignored,
>>>> backslashes also need to be escaped) and quoting (values are optionally
>>>> enclosed in double quotes, to include a '"' in a quoted string, use
>>>> '""').
>>>> Escaping is good because users are used to it, but currently literal
>>>> backslashes need to be doubled, making multivalue syntax different from
>>>> single-value syntax, even if there are no commas in the value.
>>>> Quoting is weird, but complete by itself so it doesn't also need
>>>> escaping.
>>>>
>>>> Do we need to use both? Is this documented somewhere? Some of our tests
>>>> check only for one, some assume the other. Users are probably even more
>>>> confused.
>>>>
>>>>
>>>> If we only keep one of those, the fix for #2227 should be quite easy.
>>>> If not (backwards compatibility), we need to document this properly,
>>>> test all the corner cases, and fix the UI to handle CSV escaping.
>>>> Since it's subtly broken in lots of places, maybe it's best to break
>>>> backwards compatibility and go for a simple solution now.
>>>>
>>>> Anyway, if I want to do a proper fix, CSV handling should be only done
>>>> on the client. Unfortunately turning off CSV handling in the server
>>>> will
>>>> break the UI, which at places uses `join(',')` (no escaping commas, no
>>>> single place to change it), even though it can just send a list.
>>>
>>> I'm with Honza, not too keen on the _forwarded_call part. I'd rather you
>>> do a mix of comparing self.env.in_server and whether the value is a
>>> basestring to determine if we need to split it.
>>
>> This was a hack necessary because the WebUI relied on server-side
>> splitting (in a few cases, and it did not escape correctly). Now that
>> Petr Vobornik's patch 99 is in, it is unnecessary.
>> This discussion thread has an updated patch (0015-03); what I'm
>> attaching now builds on top of that to add several new changes in tests.
>> To summarize it: the splitting is only done in the client; from the RPC
>> call on no CSV magic is involved at all.
>>
>> Please tell me your reasons for basestring checking. I think it's
>> entirely unnecessary and just adds complexity.
>> In the rest of the framework a string, as a parameter value, has the
>> same effect as a list of that one string. Why break this?
>> We are concerned about the API here, not the command line. Shortcuts to
>> save the user's keystrokes in the common case should *not* win over
>> predictability and making sure the easier way is the correct, robust way
>> to do it. Compare -- with basestring checking:
>>
>> some_command(arg=escape_commas(some_string))
>> - is correct
>>
>> some_command(arg=[some_string])
>> - is also correct
>>
>> some_command(arg=some_string)
>> - is WRONG because the string is not escaped
>> - but is the easiest solution!
>>
>> The problem is much worse because it is subtle: it only affects values
>> with commas and backslashes, so *the wrong solution would work* in most
>> of the cases.
>> And I can assure you plugin writers *would* pick the wrong way if it
>> would usually work. Our own web UI had examples.
>>
>> Contrast to my proposed behavior:
>> some_command(arg=some_string)
>> - is correct
>>
>> some_command(arg=[some_string])
>> - is also correct
>>
>> some_command(arg=escape_commas(some_string))
>> - is wrong, but doesn't make much sense because at this level you don't
>> have to worry about CSV at all
>>
>> Basestring checking would add unnecessary complexity for both us *and*
>> the plugin writer. The only case it would make it easier for the plugin
>> writer is if the plugin took CSV values ­-- but why force a random
>> format, tailored for our specific frontend, into RPC clients?
>>
>> The client should translate the command line into a RPC call. There's no
>> reason to arbitrarily expect the server to do part of the job.
>>
>>
>>> I'm not sure why this is broken out of normalize. Isn't this normalizing
>>> the incoming values into something more sane?
>>
>> "Normalization" implies it's an idempotent operation, which CSV
>> splitting is not (at least its current incarnation, without the
>> basestring checking):
>> r"a,b\,c" splits to ["a", "b,c"] which splits to ["a", "b", "c"]
>>
>>
>
> Just to make things clear: freeipa-pviktori-0015-04 is the latest
> version of this patch.
> It's complemented by freeipa-pviktori-0021-03 which adds the tests for
> this (and other things) (but contains one more refactor).
>
>
> If and when that's all reviewed and pushed, I'd like to discuss
> freeipa-pviktori-0018 (Simplify CSV escaping syntax). I'm not saying my
> solution there is perfect, but the flavor of Python's CSV parsing we're
> using now is quite quirky. It would be nice to get the syntax right, or
> at least better, while we're fixing this from a totally-broken state.
> Later there'll be backwards compatibility problems.
> But, don't let that hold back this patch.
>

I'm still not completely convinced. This patch looks ok and I'm sold on 
client side only parsing but this doesn't actually fix any of the CSV 
bugs we have open.

I'd like to evaluate this in more context.

I think the simplest fix is to drop escapechar processing in the 
csvreader and have users quote strings with commas instead, this is how 
I originally intended it to work. The UI won't have to do anything 
special since it will already be split and it isn't a lot to ask to put 
quotes around a value on the CLI.

This is my proposed fix on top of your patch:

diff --git a/ipalib/parameters.py b/ipalib/parameters.py
index 80b175d..ec6020e 100644
--- a/ipalib/parameters.py
+++ b/ipalib/parameters.py
@@ -702,7 +702,7 @@ class Param(ReadOnly):
          # csv.py doesn't do Unicode; encode temporarily as UTF-8:
          csv_reader = csv.reader(self.__utf_8_encoder(unicode_csv_data),
                                  dialect=dialect,
-                                delimiter=self.csv_separator, 
escapechar='\\',
+                                delimiter=self.csv_separator, 
quotechar='"',
                                  skipinitialspace=self.csv_skipspace,
                                  **kwargs)
          for row in csv_reader:

With this change:
- all the self-tests pass
- https://fedorahosted.org/freeipa/ticket/2417 is fixed
- https://fedorahosted.org/freeipa/ticket/2227 is fixed

I tested with:

$ ipa sudocmdgroup-add-member test --sudocmd='/bin/chown -R 
apache\:developers /var/www/*/shared/log'

$ ipa role-add-privilege --privileges='"u,r stuff"' test

rob




More information about the Freeipa-devel mailing list