[libvirt] [PATCH] esx: Handle name escaping properly
Matthias Bolte
matthias.bolte at googlemail.com
Wed Oct 13 20:47:57 UTC 2010
2010/10/13 Daniel Veillard <veillard at redhat.com>:
> On Wed, Oct 13, 2010 at 11:06:44AM +0200, Matthias Bolte wrote:
>> VMware uses a mix of percent-, pipe- and base64-encoding in
>> different combinations in different places.
>>
>> Add a testcase for this.
>> ---
>> src/esx/README | 25 ++++
>> src/esx/esx_driver.c | 72 ++++++-----
>> src/esx/esx_storage_driver.c | 42 ++++++-
>> src/esx/esx_util.c | 198 ++++++++++++++++++++++++++++++
>> src/esx/esx_util.h | 18 +++
>> src/esx/esx_vi.c | 6 +
>> src/esx/esx_vmx.c | 88 +++++---------
>> tests/esxutilstest.c | 51 ++++++++
>> tests/xml2vmxdata/xml2vmx-annotation.vmx | 2 +-
>> 9 files changed, 405 insertions(+), 97 deletions(-)
>
> That sounds vaguely familiar, I think I reviewed such a patch last
> month, right ?
I've sworn about this on IRC last week, so that's probably why it
sounds familiar to you :)
> [...]
>>
>> - virBufferURIEncodeString(&buffer, def->name);
>> + escapedName = esxUtil_EscapeDatastoreItem(def->name);
>> +
>
> okay, we really need to make sure the escaping is XML compatible
> though
ESX in general accepts UTF-8 and the escaping as it is doesn't add XML
incompatible characters like <. But I'm not sure if the SOAP binding
in the driver uses virBufferEscapeString in all critical places.
During testing this I also used domain names that contained < and I
didn't hit any obvious problems.
I'll have to review the SOAP bindings for proper XML escaping. I think
there are some places that don't escape properly, but didn't surface
as problems yet.
>> @@ -621,3 +622,200 @@ esxUtil_ReformatUuid(const char *input, char *output)
> [...]
>> +char *
>> +esxUtil_EscapeDatastoreItem(const char *string)
>> +{
>> + char *replaced = strdup(string);
>> + char *escaped1;
>> + char *escaped2 = NULL;
>> +
>> + if (replaced == NULL) {
>> + virReportOOMError();
>> + return NULL;
>> + }
>> +
>> + esxUtil_ReplaceSpecialWindowsPathChars(replaced);
>> +
>> + escaped1 = esxUtil_EscapeHexPercent(replaced);
>> +
>> + if (escaped1 == NULL) {
>> + goto cleanup;
>> + }
>> +
>> + escaped2 = esxUtil_EscapeBase64(escaped1);
>> +
>
> Okay none of those may lead to a problem for XML ...
As I said the newly added escaping doesn't create new XML related problems.
> [...]
>> --- a/tests/esxutilstest.c
>> +++ b/tests/esxutilstest.c
>> @@ -227,6 +227,56 @@ testConvertDateTimeToCalendarTime(const void *data ATTRIBUTE_UNUSED)
>>
>>
>>
>> +struct testDatastoreItem {
>> + const char *string;
>> + const char *escaped;
>> +};
>> +
>> +static struct testDatastoreItem datastoreItems[] = {
>> + { "normal", "normal" },
>> + { "Aä1ö2ü3ß4#5~6!7§8/9%Z",
>> + "A+w6Q-1+w7Y-2+w7w-3+w58-4+Iw-5+fg-6+IQ-7+wqc-8+JQ-2f9+JQ-25Z" },
>> + { "Z~6!7§8/9%0#1\"2'3`4&A",
>> + "Z+fg-6+IQ-7+wqc-8+JQ-2f9+JQ-250+Iw-1_2'3+YA-4+Jg-A" },
>> + { "標準語", "+5qiZ5rqW6Kqe" },
>> + { "!\"#$%&'()*+,-./0123456789:;<=>?",
>> + "+IQ-_+IyQl-25+Jg-'()_+Kw-,-.+JQ-2f0123456789_+Ow-_+PQ-__" },
>> + { "A Z[\\]^_B", "A Z+WyU-5c+XV4-_B" },
>> + { "A`B@{|}~DEL", "A+YA-B+QHs-_+fX4-DEL" },
>> + { "hÀÁÂÃÄÅH", "h+w4DDgcOCw4PDhMOF-H" },
>> + { "A쿀Z", "A+7L+A-Z" },
>> + { "!쿀A", "+Iey,gA-A" },
>> + { "~~~", "+fn5+" },
>> + { "~~~A", "+fn5+-A" },
>> + { "K%U/H\\Z", "K+JQ-25U+JQ-2fH+JQ-5cZ" },
>> +};
>
> Ouch :-)
> One question for the C purists. How do we know how characters outside of
> the ASCII range may be interpreted by a compiler ? The mail as received is
> made of one mime part, not multipart, it's defined as
>
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: base64
>
> so my mail reader probably assumes that this is utf8 characters, carried
> as a base64 encoded chunk of ASCII. Since my terminal is displaying
> in UTF-8 I apparently see your patch as it's expected, but as soon
> as it's saved on disk, most Unix/Linux tools will interpret those
> character using the locale of the user, i.e. if I were to compile this
> in a terminal using a non-UTF-8 locale, the compiler may very well
> interpret this slightly differently, okay this is data so maybe it's
> just bytes being copied without risk, but I'm still a bit worried there.
>
> I like the idea of testing this API seriously, but at the C level
> shouldn't we encode those in an ASCII way to be sure ?
>
> Daniel
>
I didn't thought of this issue.
As discussed on IRC, I replaced all non-ASCII chars with their UTF-8
encoding in octal, for example ä -> \303\244.
So here is v2 attached that fixes the test case encoding issue.
Matthias
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-esx-Handle-name-escaping-properly.diff
Type: text/x-diff
Size: 25052 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20101013/6c464450/attachment-0001.bin>
More information about the libvir-list
mailing list