[Freeipa-devel] JSON problems (the woes of binary data)

Dmitri Pal dpal at redhat.com
Fri Feb 26 21:28:39 UTC 2010


John Dennis wrote:
> The Problem:
> ------------
>
> I've been looking at the encoding exception which is being thrown when
> you click on the "Services" menu item in our current implementation.
> By default we seem to be using JSON as our RPC mechanism. The
> exception is being thrown when the JSON encoder hits a certificate.
> Recall that we store certificates in LDAP as binary data and in our
> implementation we distinguish binary data from text by Python object
> type: text is *always* a unicode object and binary data is *always* a
> str object. However, in Python 2.x str objects are commonly assumed to
> be text and are subject to encoding/decoding in many parts of the
> Python world.
>
> Unlike XML-RPC, JSON does *not* have a binary type. In JSON there are
> *only* unicode strings. So what is happening is that when the JSON
> encoder sees our certificate data in a str object it says "str
> objects are text and we have to produce a UTF-8 unicode encoding from
> that str object". There's the problem! It's completely nonsensical to
> try to encode binary data to UTF-8.
>
> The right way to handle this is to encode the binary data to base64
> ASCII text and then hand it to JSON. FWIW our XML-RPC handler does
> this already because XML-RPC knows about binary data and elects to
> encode/decode it to base64 as it's marshaled and unmarshaled. But JSON
> can't do this during marshaling and unmarshaling because the JSON
> protocol has no concept of binary data.
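
A minimal sketch of the failure and the base64 workaround described above. It is written for Python 3, where binary data is a bytes object, so the error differs from the Python 2 exception in the report. The field name "usercertificate" and the byte values are illustrative assumptions, not taken from the IPA code.

```python
import base64
import json

# Stand-in for DER certificate bytes; this is not a real certificate.
binary_value = b"\x30\x82\x01\x0a\xff\xfe"

# Handing raw binary to the JSON encoder fails: TypeError on Python 3,
# a decode error on Python 2 where the str was treated as text.
try:
    json.dumps({"usercertificate": binary_value})
    encoded_directly = True
except (TypeError, UnicodeDecodeError):
    encoded_directly = False

# Base64 first, then JSON: the payload is now plain ASCII text.
b64_text = base64.b64encode(binary_value).decode("ascii")
payload = json.dumps({"usercertificate": b64_text})

# The receiver has to know in advance that this field is base64.
restored = base64.b64decode(json.loads(payload)["usercertificate"])
```

The last step is the catch: the decode only works because the caller knows which field to decode.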
>
> The python JSON encoder class does give us the option to hook into the
> encoder and check if the object is a str object and then base64
> encode. But that doesn't help us at the opposite end. How would we
> know when unmarshaling that a given string is supposed to be base64
> decoded back into binary data? We could prepend a special string and
> hope that string never gets used by normal text (yuck). Keeping a list
> of what needs base64 decoding is not an option within JSON because at
> the time of decoding we have no information available about the
> context of the JSON objects.
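
To illustrate the asymmetry, here is a rough sketch of such an encoder hook in Python 3 terms, where bytes values reach the `default` method of `json.JSONEncoder`. The class name `Base64Encoder` and the sample data are hypothetical.

```python
import base64
import json


class Base64Encoder(json.JSONEncoder):
    """Hypothetical encoder that base64-encodes any bytes value."""

    def default(self, obj):
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode("ascii")
        return super().default(obj)


# "cert" is binary; "note" is ordinary text that happens to be valid base64.
original = {"cert": b"\x30\x82\xff", "note": "aGVsbG8="}

wire = json.dumps(original, cls=Base64Encoder)
decoded = json.loads(wire)

# Both values come back as text. Nothing on the wire says that "cert"
# used to be binary and "note" did not.
types = {key: type(value).__name__ for key, value in decoded.items()}
```

Encoding is easy to hook; decoding has nothing to key on, which is the point made above.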
>
> That means if we want to use JSON we really should push the base64
> encode/decode to the parts of the code which have a priori knowledge
> about the objects they're pushing through the command interface. This
> would mean any command which passes a certificate should base64 encode
> it prior to sending it and base64 decode it after it comes back from a
> command result. Actually it would be preferable to use PEM encoding,
> and by the way, the whole reason why the PEM encoding for certificates
> was developed was exactly for this scenario: transporting a
> certificate through a text-based interchange mechanism!
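
As a sketch of what the PEM convention could look like at the edges, the Python standard library `ssl` module has helpers for converting between DER bytes and PEM text. The placeholder bytes below are not a real certificate (these helpers do not parse the DER), and the "certificate" key is an assumption for illustration.

```python
import json
import ssl

# Placeholder bytes standing in for DER data read from LDAP.
der_bytes = bytes(range(256)) * 3

# Convert to PEM as soon as the value is read; PEM is plain ASCII text.
pem_text = ssl.DER_cert_to_PEM_cert(der_bytes)

# PEM passes through JSON like any other string.
wire = json.dumps({"certificate": pem_text})
received = json.loads(wire)["certificate"]

# Convert back to DER only where a consumer demands binary.
round_tripped = ssl.PEM_cert_to_DER_cert(received)
```

Unlike bare base64, the BEGIN/END lines make a PEM value self-describing, so a reader can tell what it is looking at.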
>
> Possible Solutions:
> -------------------
>
> As I see it we have these options in front of us for how to deal with
> this problem:
>
> * Drop support for JSON, only use XML-RPC
>
> * Once we read a certificate from LDAP immediately convert it to PEM
> format. Adopt the convention that anytime we exchange certificates it
> will be in PEM format. Only convert from PEM format when the target
> demands binary (e.g. storing it in LDAP, passing it to a library
> expecting DER encoded data, etc.).
>
> * Come up with some hacky protocol on top of JSON which signals "this
> string is really binary" and check for it on every JSON encode/decode
> and cross our fingers no one tries to send a legitimate string which
> would trigger the encode/decode.
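
For the third option, a small sketch of why an in-band marker is fragile. The marker string and both helper functions are hypothetical, written in Python 3 terms.

```python
import base64

MARKER = "__base64__:"  # hypothetical in-band marker


def encode_value(value):
    """Prefix binary values with the marker; pass text through unchanged."""
    if isinstance(value, bytes):
        return MARKER + base64.b64encode(value).decode("ascii")
    return value


def decode_value(value):
    """Treat any string that starts with the marker as base64 binary."""
    if isinstance(value, str) and value.startswith(MARKER):
        return base64.b64decode(value[len(MARKER):])
    return value


# Binary data round-trips as intended.
binary_ok = decode_value(encode_value(b"\x00\x01"))

# Legitimate text that happens to start with the marker is silently
# turned into bytes on the way back.
text_in = "__base64__:aGVsbG8="
text_out = decode_value(encode_value(text_in))
```

Escaping the marker in ordinary text would close the hole, but then every encode and decode has to apply the escaping consistently.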
>
> Question: Are certificates the one and only example of binary data we
> exchange?
>
> Recommendation:
> ---------------
>
> My personal recommendation is we adopt the convention that
> certificates are always PEM encoded. We've already run into many
> problems trying to deduce what format a certificate is in (e.g. binary,
> base64, PEM). I think it would be good if we just put a stake in the
> ground and said "certificates are always PEM encoded" and be done with
> all these problems we keep having with the data type of certificates.
>
> As an aside I'm also skeptical of the robustness of allowing binary
> data at all in our implementation. Trying to support binary data has
> been nothing but a headache and a source of many many bugs. Do we
> really need it?
>
Yeah, a good Friday afternoon problem to solve...
+1 to your recommendations. I am not a specialist, but the suggestion
seems logical.

-- 
Thank you,
Dmitri Pal

Engineering Manager IPA project,
Red Hat Inc.

