
Re: stylesheet fun



On Thu, 17 Apr 2003, Jeff Johnson wrote:

> On Thu, Apr 17, 2003 at 04:01:20PM -0400, Joe Shaw wrote:
> > 
> > Yikes.  This maps well to the tags within an RPM header, but it
> > makes parsing this info efficiently using a SAX parser impossible,
> > because you have to keep around state for every file in a package
> > through the entire parsing context.
>
> Any reason the above can't be transformed into what you want using
> XSLT?

I think Joe was saying that the XML output Jeff originally posted,

  <rpmTag name="Filesizes">
    <integer>10576</integer>
  </rpmTag>
  <rpmTag name="Basenames">
    <string>time</string>
  </rpmTag>
  <rpmTag name="Dirnames">
    <string>/usr/bin/</string>
  </rpmTag>

is solidly in the xml-as-document camp. That makes things hard on
folks who like to do SAX parsing, since they've got to read and store
a fairly large bitstream before they can merge all the tags related to
a single file, e.g.,

  rpmHeader/rpmTag[@name='Filesizes']/integer[1]
  rpmHeader/rpmTag[@name='Basenames']/string[1]
  rpmHeader/rpmTag[@name='Dirnames']/string[1]
  rpmHeader/rpmTag[@name='Filegroupname']/string[1]
  rpmHeader/rpmTag[@name='Fileusername']/string[1]
  ...
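
As a rough sketch of that buffering problem, here is a Python SAX
handler for the parallel-array layout. The two-file sample data and
the class name are made up for illustration, and it assumes one entry
per file in each rpmTag, as in the example above. Nothing about a file
can be emitted until the last array has been read:

```python
import xml.sax

# Assumed sample: two files, using only the three tags shown above.
SAMPLE = b"""<rpmHeader>
  <rpmTag name="Filesizes">
    <integer>10576</integer>
    <integer>2048</integer>
  </rpmTag>
  <rpmTag name="Basenames">
    <string>time</string>
    <string>true</string>
  </rpmTag>
  <rpmTag name="Dirnames">
    <string>/usr/bin/</string>
    <string>/bin/</string>
  </rpmTag>
</rpmHeader>"""


class ParallelArrayHandler(xml.sax.ContentHandler):
    """Hypothetical handler: stores every value of every rpmTag."""

    def __init__(self):
        super().__init__()
        self.arrays = {}     # tag name -> list of values, kept until the end
        self.current = None  # name of the rpmTag being read
        self.text = None     # character buffer while inside a value element

    def startElement(self, name, attrs):
        if name == "rpmTag":
            self.current = attrs.get("name")
            self.arrays[self.current] = []
        elif name in ("integer", "string") and self.current is not None:
            self.text = []

    def characters(self, content):
        # Ignore whitespace between elements; keep text inside values.
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        if name in ("integer", "string") and self.text is not None:
            self.arrays[self.current].append("".join(self.text))
            self.text = None
        elif name == "rpmTag":
            self.current = None


handler = ParallelArrayHandler()
xml.sax.parseString(SAMPLE, handler)

# Only after the whole header is parsed can per-file records be assembled.
files = list(zip(handler.arrays["Dirnames"],
                 handler.arrays["Basenames"],
                 handler.arrays["Filesizes"]))
```

The state held in `handler.arrays` grows with the number of files in
the package, which is the cost Joe was pointing at.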

Joe's markup,

 <file>
   <basename><string>time</string></basename>
   <dirname><string>/usr/bin/</string></dirname>
   <filesize><integer>10576</integer></filesize>
   ...
 </file>

is more in the xml-as-data-stream vein. You'd only have to store the
stream from <file> to </file> to get all the info about a single file.
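
A minimal sketch of what that buys a SAX consumer, again in Python.
The <files> root element, the sample data, and the handler and
callback names are assumptions for illustration; only the <file>
layout comes from Joe's markup. Each record is handed off at </file>
and then dropped:

```python
import xml.sax

# Assumed sample; the <files> wrapper is hypothetical.
SAMPLE = b"""<files>
  <file>
    <basename><string>time</string></basename>
    <dirname><string>/usr/bin/</string></dirname>
    <filesize><integer>10576</integer></filesize>
  </file>
  <file>
    <basename><string>true</string></basename>
    <dirname><string>/bin/</string></dirname>
    <filesize><integer>2048</integer></filesize>
  </file>
</files>"""


class FileRecordHandler(xml.sax.ContentHandler):
    """Hypothetical handler: holds state for one <file> at a time."""

    def __init__(self, on_file):
        super().__init__()
        self.on_file = on_file  # callback invoked once per complete file
        self.record = None      # dict for the <file> currently being read
        self.field = None       # wrapper element name, e.g. "basename"
        self.text = None        # character buffer inside a value element

    def startElement(self, name, attrs):
        if name == "file":
            self.record = {}
        elif self.record is not None and name in ("string", "integer"):
            self.text = []
        elif self.record is not None:
            self.field = name

    def characters(self, content):
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "file":
            # The record is complete; pass it on and forget it.
            self.on_file(self.record)
            self.record = None
        elif name in ("string", "integer") and self.text is not None:
            self.record[self.field] = "".join(self.text)
            self.text = None
        elif name == self.field:
            self.field = None


seen = []
xml.sax.parseString(SAMPLE, FileRecordHandler(seen.append))
```

Here the callback just appends to a list, but it could as easily
write each record out and keep nothing, so memory use stays flat no
matter how many files the package contains.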

At least, I think that's what Joe is saying. :-)

I do mostly DOM-based parsing, so I'm not terribly sensitive to SAX
concerns, but I think I'd second his suggestion for some other 
reasons:

* in Jeff's markup, you've got to know which of the many rpmTag
  elements are going to contain arrays related to files, whereas 
  Joe's suggestion keeps all file-related info together in a <file> 
  wrapper

* Joe's suggestion would allow you to do some "strong typing" of the 
  tags, if you used XML Schema rather than a DTD to define the 
  doctype. You could do away with, e.g., the <integer> tags like

    <filesize><integer>10576</integer></filesize>

  and just define <filesize> as an element that can contain only 
  non-negative integers:

    <xs:element name="filesize">
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minInclusive value="0"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

  Doing that with attributes is pretty hacky.
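
To illustrate the first point, here is a DOM-style sketch (Python's
ElementTree) that regroups Jeff's markup into per-file records. The
FILE_TAGS table, the <files> wrapper, the regroup() function, and the
sample header are all assumptions for illustration. The thing to
notice is that the table of file-related tag names has to be
maintained by hand, because nothing in the markup marks those tags as
per-file arrays:

```python
import xml.etree.ElementTree as ET

# Assumption: these rpmTag names hold one entry per file. The markup
# itself gives no hint, so the list is hard-coded.
FILE_TAGS = {
    "Basenames": "basename",
    "Dirnames": "dirname",
    "Filesizes": "filesize",
}

# Assumed sample header with one file and one non-file tag.
HEADER = """<rpmHeader>
  <rpmTag name="Name"><string>time</string></rpmTag>
  <rpmTag name="Filesizes"><integer>10576</integer></rpmTag>
  <rpmTag name="Basenames"><string>time</string></rpmTag>
  <rpmTag name="Dirnames"><string>/usr/bin/</string></rpmTag>
</rpmHeader>"""


def regroup(header_xml):
    """Turn parallel rpmTag arrays into <file> records (hypothetical)."""
    root = ET.fromstring(header_xml)

    # Collect the value elements of each file-related tag.
    columns = {}
    for tag in root.findall("rpmTag"):
        field = FILE_TAGS.get(tag.get("name"))
        if field is not None:
            columns[field] = list(tag)

    # Assumes every collected array has the same length as Basenames.
    files = ET.Element("files")
    for i in range(len(columns.get("basename", []))):
        file_el = ET.SubElement(files, "file")
        for field in sorted(columns):
            wrapper = ET.SubElement(file_el, field)
            wrapper.append(columns[field][i])
    return files


result = regroup(HEADER)
print(ET.tostring(result, encoding="unicode"))
```

With Joe's layout this step, and the hand-kept table behind it,
would not be needed at all.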

The downside, of course, is that the DTD/Schema would have to be kept 
in line with whatever headers RPM is storing -- which may be more work 
than it's worth.

--Paul Heinlein <heinlein@cse.ogi.edu>




