[Pulp-list] Pulp 3.0 Technology Stack Justifications

Sean Myers sean.myers at redhat.com
Thu May 12 14:51:58 UTC 2016


Early planning for Pulp 3.0 is building up some steam, and it's
a good time to go over the technology stack that we're currently
proposing to build on. For all of these choices, once Pulp's basic
needs are met, the deciding factors in choosing a library are "meta"
factors, like community support, release processes, etc. Special
thanks to Jeff Ortel for making sure my assumptions about these
tools got challenged so the right choices get made.

We're using postgres as the DB for 3.0. Since we're going
relational, the next thing we'd want is a good ORM. Several team
members have experience with the Django ORM, and Pulp is actually
already using it in its views. It has a fantastic community, is
well documented, and comes with a vast multitude of third-party
plugins to help us fill in any gaps in functionality that may be
found. Our current tasking system is built on Celery[0], which is
among the third-party projects with excellent Django support, so
using Django with a relational DB could help us get rid of code
that duplicates functionality provided by django-celery.

Other ORM options were considered, but only SQLAlchemy (another
very good ORM) stood out as something we could use if there were
a compelling reason to switch from Django. At this time there is
no such reason; Django does the job well. Most other ORMs are
either not robust enough in their feature set or apparently not
being actively maintained, and were rejected as alternatives.
Also rejected outright was not using an ORM (or other form of
data mapper) at all, since my sense is that we all agree that
we don't want to be writing SQL by hand. :)

This leads to the next big building block, which is the tool we
should use to build our REST APIs. I've used django-tastypie in
the past, as have a few other team members, and it was my
front-runner for this job. After looking around, though, it looks
like django-rest-framework (DRF) is currently dominating this space
in the Django community[1]. Having gone through some of DRF's
tutorials and examples, it's looking like tastypie is out of the
running and DRF is the winner. Both would be adequate for Pulp's needs
when it comes to putting a REST API on top of our data model, so
it makes sense to go with the more "popular" option. In addition,
I think its documentation and API are easier to work with than
tastypie's, so it's simultaneously easier to use and easier to
*learn how* to use.

Finally, we're looking at bringing in a search engine for the
search views in the API. We're currently doing search using
MongoDB, with Mongo-specific search criteria, but will be
decoupling the search API from the search engine. As with Django,
a few team members have experience using Elasticsearch (myself
included). Elasticsearch is Java-based, running on top of the
Lucene indexer, with a simple REST API on top of it, and so at
the moment it's my preferred search engine.

I looked at a few other search engines in recent testing, including
the pure-Python engine "Whoosh", Solr (also uses Lucene), Xapian,
and Sphinx (the search engine, not the document builder). Of these,
only Whoosh and Elasticsearch have first-party support by the
django-haystack project[2], which is both my preferred and the most
commonly used Django search plugin[3]. Given my previous positive
experience with Elasticsearch, I think it's probably the best choice
for a search indexer at this time.

The Whoosh plugin for Haystack currently doesn't support a very
useful feature that Whoosh itself does support, which is faceting.
This feature gap is something that would need to be closed (likely
by us) to get feature parity between the Elasticsearch and Whoosh
backends.
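
For anyone not familiar with the term: faceting means returning, next to
the search hits, a count of how many hits fall under each value of a
given field. Here is a minimal sketch in plain Python. It is not the
Haystack or Whoosh API, and the sample records, field names, and the
facet_counts helper are made up for illustration.

```python
from collections import Counter

# Hypothetical search hits; the fields and values are illustrative only.
hits = [
    {"name": "bash", "type": "rpm", "arch": "x86_64"},
    {"name": "zsh", "type": "rpm", "arch": "noarch"},
    {"name": "puppetlabs-apache", "type": "puppet_module", "arch": "noarch"},
]


def facet_counts(results, field):
    """Count how many results carry each distinct value of `field`."""
    return Counter(r[field] for r in results if field in r)


# One facet per field of interest, computed over the same set of hits.
type_facet = facet_counts(hits, "type")
arch_facet = facet_counts(hits, "arch")
```

A search backend with faceting support does this counting inside the
index rather than over the returned results, which is the part the
Whoosh backend for Haystack is missing.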

While there are other libraries that appear to live in the same space
as Haystack (integrating a search indexer with Django models and
providing Django QuerySet/Model results), none of them have the robust
features and community support seen in Haystack. Again, though, decoupling the
search interface from the search implementation means that this piece
is likely to be easy to change out if we find better options in the
future (especially if we write it with this in mind).
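
As a rough illustration of what "decoupling the search interface from
the search implementation" could look like, here is a sketch in plain
Python. Everything in it (SearchBackend, InMemoryBackend, find_units,
and the field names) is hypothetical, not existing Pulp or Haystack
code. The point is only that calling code talks to the interface, so
the engine behind it can be swapped.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical interface that the search views would depend on."""

    @abstractmethod
    def index(self, doc_id, fields):
        """Add or replace a document in the index."""

    @abstractmethod
    def search(self, **criteria):
        """Return ids of documents whose fields equal all criteria."""


class InMemoryBackend(SearchBackend):
    """Toy stand-in for a real engine such as Elasticsearch or Whoosh."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, fields):
        self._docs[doc_id] = dict(fields)

    def search(self, **criteria):
        return sorted(
            doc_id
            for doc_id, fields in self._docs.items()
            if all(fields.get(k) == v for k, v in criteria.items())
        )


def find_units(backend, **criteria):
    """Calling code only sees the SearchBackend interface."""
    return backend.search(**criteria)


backend = InMemoryBackend()
backend.index("bash", {"type": "rpm", "arch": "x86_64"})
backend.index("zsh", {"type": "rpm", "arch": "noarch"})
backend.index("apache", {"type": "puppet_module", "arch": "noarch"})
rpm_ids = find_units(backend, type="rpm")
```

Swapping engines would then mean writing another SearchBackend
subclass, with no change to find_units or its callers.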

Summary:
- Django ORM on postgres
- django-rest-framework to build API views
- django-haystack to provide search capabilities, using Elasticsearch
  to start, possibly switching to Whoosh after some development -- this
  switch should occur before any release of 3.0

[0]: http://docs.celeryproject.org/en/latest/django/
[1]: https://www.djangopackages.com/grids/g/rest/
[2]: http://django-haystack.readthedocs.io/en/stable/backend_support.html
[3]: https://www.djangopackages.com/grids/g/search/
