
Re: Anyone know of a tasteful LGPL HTML parser in C?

On Wed, Nov 24, 2004 at 12:33:58PM -0500, Jeff Johnson wrote:
> I'd like to attempt to support
>    rpm -qp http://download.fedora.redhat.com/.../*.rpm
> within rpm by applying fnmatch(3) against parsed HTML hrefs.
> So I'm questing existing HTML parser implementations before hacking up
> something myself.

  libxml2 HTML parser

> The constraints on my rpm problem/implementation space are:
>   a) must be LGPL


>   b) must be in C.


>   c) must be reasonably small and reliable.

  if you link against the shared lib and use demand paging it's not too
  big, otherwise it won't fit

>   d) should work on a significant variety of HTML dialects without problem.

  people have been using it to build commercial grade Web indexing software


Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
