[Libguestfs] [PATCH 3/3] lib: Add support for creating nodes (keys) and values with UTF-16LE-encoded names

Richard W.M. Jones rjones at redhat.com
Mon Nov 25 22:35:08 UTC 2013


On Mon, Nov 25, 2013 at 10:52:30PM +0100, Hilko Bengen wrote:
> * Richard W.M. Jones:
> 
> >> -  nk->name_len = htole16 (strlen (name));
> >> -  strcpy (nk->name, name);
> >> +  nk->name_len = htole16 (recoded_name_len);
> >> +  memcpy (nk->name, recoded_name, recoded_name_len);
> >> +  free(recoded_name);
> >
> > Please put spaces after function names!  It improves readability:
> 
> Sorry, I'll fix those. I also forgot to add a free() in
> hivex_node_set_values.
> 
> >>    /* Update max_subkey_name_len in parent nk. */
> >> -  uint16_t max = le16toh (parent_nk->max_subkey_name_len);
> >> -  if (max < strlen (name) * 2)  /* *2 because "recoded" in UTF16-LE. */
> >> -    parent_nk->max_subkey_name_len = htole16 (strlen (name) * 2);
> >> +  size_t utf16_len = use_utf16 ? recoded_name_len : recoded_name_len * 2;
> >
> > * 2 is probably wrong here for non-BMP characters, but the original
> > code makes the same mistake ...  Could we get the true length from the
> > hivex_encode_string function?

[Note: I read this email after my reply to version 2 of this patch]

> Are there any non-BMP characters that can be encoded in Latin1 -- or
> whatever 1-byte encoding one is supposed to use there?

OK, I guess the original code is correct.
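
To spell out why * 2 holds for the 1-byte case: every Latin-1 byte maps
to a BMP code point of the same value, so each input byte becomes exactly
one 16-bit unit.  A minimal sketch, assuming the 1-byte encoding really is
Latin-1 (latin1_to_utf16le is a made-up helper for illustration, not a
hivex function):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of hivex: widen Latin-1 bytes to
 * UTF-16LE.  'out' must have room for 2 * len bytes.  Returns the
 * number of bytes written, which is always exactly 2 * len. */
static size_t
latin1_to_utf16le (const unsigned char *in, size_t len, unsigned char *out)
{
  assert (in != NULL && out != NULL);

  for (size_t i = 0; i < len; i++) {
    out[2 * i] = in[i];     /* low byte: the Latin-1 value itself */
    out[2 * i + 1] = 0;     /* high byte: always zero for U+0000..U+00FF */
  }
  return 2 * len;
}
```

So for input that started out as Latin-1, the recoded length can never be
anything other than twice the original length.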

> Peter Norris' master's thesis[1] suggests that
> 
>     recoded_name_len : recoded_name_len * 2
> 
> is probably right.

However, I still think *2 is incorrect, despite what the thesis says.
(The thesis is -- how shall I put this -- "unclear" in the way it uses
the word "Unicode", never mentioning "UTF" at all).

For example, the encoding of U+13057 (a rather elegant Egyptian
hieroglyph 𓁗, if you can find a font that can render it) is
{ 0xD80C, 0xDC57 } (4 bytes) in UTF-16.
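
The arithmetic behind that pair, as a minimal sketch (utf16_encode is a
made-up helper for illustration, not a hivex function, and it only
covers the standard UTF-16 surrogate rule):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of hivex: encode one Unicode code
 * point as UTF-16 code units.  Returns the number of 16-bit units
 * written: 1 for a BMP character, 2 for a surrogate pair. */
static size_t
utf16_encode (uint32_t cp, uint16_t out[2])
{
  assert (cp <= 0x10FFFF);              /* valid Unicode range */
  assert (cp < 0xD800 || cp > 0xDFFF);  /* surrogates are not characters */

  if (cp < 0x10000) {
    out[0] = (uint16_t) cp;
    return 1;
  }

  cp -= 0x10000;                                /* 20 bits remain */
  out[0] = (uint16_t) (0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
  out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
  return 2;
}
```

Calling it with 0x13057 returns 2 and fills in 0xD80C, 0xDC57, so that
single character occupies 4 bytes where a "characters * 2" estimate
would allow only 2.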

However, I also doubt that Windows works correctly with non-BMP
UTF-16, i.e. Windows probably means UCS-2.

> Cheers,
> -Hilko
> 
> [1] http://amnesia.gtisc.gatech.edu/~moyix/suzibandit.ltd.uk/MSc/Registry%20Structure%20-%20Main%20V4.pdf, p.79

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)



