[libvirt] [PATCH 1/4] virsh-domain: use correct base for virStrToLong_ui

Laine Stump laine at laine.org
Fri Oct 23 16:48:07 UTC 2015


On 10/23/2015 03:59 AM, Pavel Hrdina wrote:
> On Fri, Oct 23, 2015 at 09:33:49AM +0200, Peter Krempa wrote:
>> On Thu, Oct 22, 2015 at 11:24:02 -0400, Laine Stump wrote:
>>> On 10/21/2015 08:22 AM, Pavel Hrdina wrote:
>>>> While parsing device addresses we should use correct base and don't
>>>> count on auto-detect.  For example, PCI address uses hex numbers, but
>>>> each number starting with 0 will be auto-detected as octal number and
>>>> that's wrong.  Another wrong use-case is for PCI address if for example
>>>> bus is 10, than it's incorrectly parsed as decimal number.
>>>>
>>>> PCI and CCW addresses have all values as hex numbers, IDE and SCSI
>>>> addresses are in decimal numbers.
>>> I've seen examples for PCI that use decimal a number for the slot (which
>>> is the one item that's likely to have a value > 7 unless you have a
>>> system with a whole bunch of PCI controllers)[*], and those that use hex
>>> numbers always prefix the number with "0x". libvirt itself always a
>> lspci doesn't really use 0x prefix:

My statement above was for any libvirt documentation, not "any program 
anywhere displaying a PCI address". The standard notation for a "unified 
PCI address" (coining my own term :-) is as they are displayed in lspci 
(and in all the entries in sysfs).


>>
>> 00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03)
>> 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-LM (rev 03)
>> 00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03)
>> 00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #6 (rev e3)
>>
>> Btw that's a laptop, not a super special system.
>>
>>> prefixes the domain, bus, and slot with 0x, so an auto-detected base
>>> will always get those right.
>> Well, not entirely:
>>
>> nodedev identificators use hex, possibly padded with zeroes:
>>
>> pci_0000_00_03_0
>> pci_0000_00_14_0
>> pci_0000_00_16_0
>> pci_0000_00_19_0
>> pci_0000_00_1b_0
>>
>> Which may result also in some octal 'fun' on boxes with 16 buses ;).

Sigh. Yes, I was only considering the <address> element, not the stuff 
in node device XML and virsh nodedev-* output :-(. What you have pointed 
out makes me even more wary of changing libvirt's parsing behavior.


>>
>>
>> nodedev XML uses decimal in the parsed address:
>>
>> $ virsh nodedev-dumpxml pci_0000_00_19_0
>> <device>
>>    <name>pci_0000_00_19_0</name>
>>    <path>/sys/devices/pci0000:00/0000:00:19.0</path>
>>    <parent>computer</parent>
>>    <driver>
>>      <name>e1000e</name>
>>    </driver>
>>    <capability type='pci'>
>>      <domain>0</domain>
>>      <bus>0</bus>
>>      <slot>25</slot>
>>      <function>0</function>
>>      <product id='0x15a2'>Ethernet Connection (3) I218-LM</product>
>>      <vendor id='0x8086'>Intel Corporation</vendor>
>>    </capability>
>> </device>
>>
>> (Uhhh, so 19 ... or 25? ... or perhaps 31?)
>>
>>
>> And for hostdevs we are promoting the use of 0x prefixed hex:
>>
>>      <address domain='0x0000' bus='0x06' slot='0x02' function='0x0'/>
>>
>> As well as with target addresses. Fortunately.
>>
>> <sarcasm>Well that's very consistent.</sarcasm>


Yes, the inconsistency is actually part of my point. If we silently 
change the parsing to now recognize "25" (no "0x" prefix) as a hex 
number, it's going to give the wrong results for any XML written using a 
decimal slot# that worked properly on "old" libvirt, and it's going to 
make everything confusing in a different way since everyone will need to 
keep straight which version of libvirt interprets it which way - you'll 
*never* be able to write examples for documentation that omit the "0x" 
anyway, since you can't be sure the reader of the documentation has a 
new enough version of libvirt to force parsing as hex.

For example, if someone has a script that uses XML they created "a long 
time ago" by copy/pasting the slot# from the nodedev-dumpxml output (or 
maybe just hand-writing it) into XML that they use for "virsh create" or 
"virsh attach-device" (i.e. the XML is stored/generated external to 
libvirt in its original form, *not* normalized by the libvirt 
parse/format cycle), and upgraded libvirt will magically/silently begin 
interpreting any slot# > 9 differently, leading to anything from a 
different layout of devices on the guest PCI bus, to failure to attach a 
device because it wasn't found (or even worse, silently/erroneously 
attaching the *wrong* host device to the guest). I don't think that is 
acceptable.


If we want to get rid of the inconsistency, without creating the 
possibility of *silently* performing incorrect operations, we should:

* require the input in the <address> element to have a "0x" prefix to be 
sure that there is no assumption on the part of the user that they are 
entering a decimal number.

* change the node device XML (nodedev-dumpxml) to output the domain, 
bus, and slot elements as hex, including the 0x (function is irrelevant, 
since by definition it can never be > 7)

Even doing that would carry the danger of causing existing functioning 
systems to fail though (although they would at least fail in a vocal 
manner).

(BTW, I looked for any places in libvirt that currently parse a number 
*in config* as hex without the 0x prefix - the only one I can find is this:

   * <address type='spapr-vio' reg='0x3000'/> <- reg is forced base16

(and a deprecated "devaddr" in the status (so probably unused in many 
years) that parses a "unified" address). Anything else that is read from 
config is either forced to base 10, or is base 0. So there isn't much 
precedence for interpreting numbers in libvirt config with no "0x" 
prefix as hex.)

>>
>>> So I I think the existing code is correct, and don't see an upside to
>>> making this restriction/invisible change in semantics, and it could
>>> potentially lead to incorrect results if someone is thinking that
>>> they're entering decimal numbers.
>> The existing code is correct only perhaps from a historical point of
>> view. From a functional it's more than flawed. ...
>>
>>> [*] There was one particular document that even went to the trouble of
>>> explaining how to convert the hex value of slot into a decimal number to
>>> use in the libvirt config!
>> Well, this hints that we do actually something wrong here. The ambiguity
>> in parsing of numbers with virStrToLong_ui has already bitten us (I
>> won't bother looking up the commit though).

It's not ambiguous to the parser. Any given string is parsed to exactly 
one result. The problem is that the users expect the parser to behave 
differently than it does. Well, *some* of the users do. Maybe even 
*most* of them. But not *all* of them :-).

As for previous problems with differing bases, this reminded me of a 
very troublesome problem that I spent the time to look up just to fill 
the blank in my memory - commit 8efebd176. This is a case where libvirt 
was formatting the bus/device numbers of a USB device on the qemu 
commandline with %.3d (forcing leading 0's) and qemu was then 
interpreting the numbers as octal, so silently/magically attaching the 
wrong USB device. This problem wasn't solved by making qemu always parse 
USB bus/device as decimal though; it was instead solved by making 
libvirt aware of / follow the rules of qemu's parser. (I realize it's a 
bit more difficult to "patch the user", but that is also an option :-)


>>
>> The problem is that if you use 0 as a base argument it's not really
>> clear what you've parsed and that will result in situations like this.
>> The developer may be lucky in trying numbers that were parsed correctly
>> misleading him that he used the correct base.
>>
>> I think that we shouldn't be using 0 as base in ANY place in our code.

I think that it is dangerous to have a mix of base 16 and base 10 
numbers in a file while not requiring a clear indicator on every number 
of what base was used to encode them. Sure there are some numbers that 
are obviously hex (an address in memory) and there are some that are 
obviously decimal (timeout in seconds), but there are some where it's 
just not clear to the user. The case of PCI addresses is one of those 
(mostly due to the node device XML. But simply changing that won't 
eliminate all the existing examples out on the internet (nor will it 
immediately upgrade all the existing installations of libvirt).


(BTW, I also think there is absolutely *no* place for octal 
representations of numbers anywhere in anything libvirt does. This is 
not 1975, and we're not all (waiting in line for our turn at) sitting in 
front of a TTY connected to a PDP-11 :-P)


>>
>> As of this particular case: If we format it in base 16, we should parse
>> it in base 16. Having ambiguity in the parser code will only end up in
>> problems.

Okay, then to take that to its logical conclusion - if we format it with 
a leading 0x, then we should require the leading 0x when parsing.


>>
>>>> Signed-off-by: Pavel Hrdina <phrdina at redhat.com>
>>>> ---
>>>>    tools/virsh-domain.c | 30 +++++++++++++++---------------
>>>>    1 file changed, 15 insertions(+), 15 deletions(-)
>>>>
>> Peter
> A agree with Peter, as there is no reasonable point to represent PCI address in
> decimal numbers, only libvirt does that and as said, it's wrong.  I don't know
> any other place, where the PCI address is printed as decimal number.
>
> I've been motivated to update this behavior after I've used '0000:05:10.1' as
> an argument for the '--srouce' and it parsed the slot as decimal number and
> printed in to the XML as '0x0a'.

Yeah I can understand that happening (and your annoyance when it did), 
and if all pci address info everywhere in libvirt and its documentation 
was (and always had been) output in hex (*without* the leading "0x") I 
would probably agree with this patch. But due to the history (and 
current state) I think it's just going to create as many problems as it 
solves. I *might* agree with it if it was done by requiring a leading 
'0x' (having the effect of causing failure for any existing XML that had 
been using decimal - I'm not 100% sure that would be acceptable either, 
but at least it is a *visible* failure, rather than a silent one), and I 
would be more prone to agree if the node device XML was modified to 
output the bus/slot/port in hex (with leading 0x - again, I can't say 
for sure that even that wouldn't screw up some existing management 
application though, so I'm *still* hesitant).






More information about the libvir-list mailing list