[Libguestfs] [PATCH 3/3] lib: Add support for creating nodes (keys) and values with UTF-16LE-encoded names
Richard W.M. Jones
rjones at redhat.com
Mon Nov 25 22:35:08 UTC 2013
On Mon, Nov 25, 2013 at 10:52:30PM +0100, Hilko Bengen wrote:
> * Richard W.M. Jones:
>
> >> - nk->name_len = htole16 (strlen (name));
> >> - strcpy (nk->name, name);
> >> + nk->name_len = htole16 (recoded_name_len);
> >> + memcpy (nk->name, recoded_name, recoded_name_len);
> >> + free(recoded_name);
> >
> > Please put spaces after function names! It improves readability:
>
> Sorry, I'll fix those. I also forgot to add a free() in
> hivex_node_set_values.
>
> >> /* Update max_subkey_name_len in parent nk. */
> >> - uint16_t max = le16toh (parent_nk->max_subkey_name_len);
> >> - if (max < strlen (name) * 2) /* *2 because "recoded" in UTF16-LE. */
> >> - parent_nk->max_subkey_name_len = htole16 (strlen (name) * 2);
> >> + size_t utf16_len = use_utf16 ? recoded_name_len : recoded_name_len * 2;
> >
> > * 2 is probably wrong here for non-BMP characters, but the original
> > code makes the same mistake ... Could we get the true length from the
> > hivex_encode_string function?
[Note: I read this email after my reply to version 2 of this patch]
> Are there any non-BMP characters that can be encoded in Latin1 -- or
> whatever 1-byte encoding one is supposed to use there?
OK I guess the original code is correct.
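To spell out why: every Latin-1 byte maps to a code point in U+0000..U+00FF, which is inside the BMP, so each input byte becomes exactly one 16-bit unit. Below is a minimal sketch, assuming the input really is Latin-1. The helper name latin1_to_utf16le is made up for illustration and is not how hivex performs the conversion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not part of hivex).
 * Converts a NUL-terminated Latin-1 string to UTF-16LE bytes.
 * 'out' must have room for strlen (in) * 2 bytes.
 * Returns the number of bytes written, which is always strlen (in) * 2.
 */
static size_t
latin1_to_utf16le (const char *in, uint8_t *out)
{
  size_t n = strlen (in);
  size_t i;

  for (i = 0; i < n; i++) {
    out[2*i]   = (uint8_t) in[i];  /* low byte: code point U+0000..U+00FF */
    out[2*i+1] = 0;                /* high byte is always zero */
  }
  return n * 2;
}
```

So for this one input encoding, strlen (name) * 2 is the exact UTF-16LE length, not an approximation.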
> Peter Norris' master's thesis[1] suggests that
>
> recoded_name_len : recoded_name_len * 2
>
> is probably right.
However I still think *2 is incorrect, despite what the thesis says.
(The thesis is -- how shall I put this -- "unclear" in the way he uses
the word "Unicode", never mentioning "UTF" at all).
For example, the encoding of U+13057 (a rather elegant Egyptian
hieroglyph 𓁗, if you can find a font that can render it) is
{ 0xD80C, 0xDC57 } (two 16-bit code units, so 4 bytes) in UTF-16.
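The arithmetic behind that pair can be checked with a few lines of C. This is only a sketch of the standard UTF-16 surrogate calculation. The function name utf16_encode is hypothetical and does not exist in hivex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only (not part of hivex).
 * Encodes one Unicode code point as UTF-16 code units.
 * Returns the number of 16-bit units written to 'out' (1 or 2).
 */
static size_t
utf16_encode (uint32_t cp, uint16_t out[2])
{
  assert (cp <= 0x10FFFF);
  assert (cp < 0xD800 || cp > 0xDFFF);  /* surrogates are not characters */

  if (cp < 0x10000) {                   /* BMP: a single code unit */
    out[0] = (uint16_t) cp;
    return 1;
  }

  cp -= 0x10000;                        /* leaves a 20-bit value */
  out[0] = (uint16_t) (0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
  out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
  return 2;
}
```

For U+13057: 0x13057 - 0x10000 = 0x3057, the top ten bits are 0x0C and the bottom ten are 0x057, which gives 0xD80C and 0xDC57.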
However, I also doubt that Windows works correctly with non-BMP
UTF-16, i.e. when Windows says "Unicode" here it probably means UCS-2.
> Cheers,
> -Hilko
>
> [1] http://amnesia.gtisc.gatech.edu/~moyix/suzibandit.ltd.uk/MSc/Registry%20Structure%20-%20Main%20V4.pdf, p.79
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)