[katello-devel] search over rest api - interface design

Amos Benari abenari at redhat.com
Thu Jul 14 10:01:53 UTC 2011


Hi All,
I want to start a discussion on our REST search interface. I prefer to do that by email because there are a lot of details involved.

Background:
-----------
The search in the Katello UI is built on top of a Rails package called scoped_search.
scoped_search enables free-text search as well as search on specific keys. It supports a powerful query language that includes
logical operators, negation, brackets and more, and it helps users become familiar with the syntax by offering a syntax auto-completer.
scoped_search translates the user's query into an SQL query; it relies on Rails ActiveRecord to be database agnostic.

Since Katello is built out of several entities (Pulp, Candlepin and Foreman), a large portion of the data is managed by those other entities.
Katello doesn't have direct access to their respective databases. This raises the question: how do we support search on remote
entities?

To understand the issues involved I suggest looking at an example:
Let's say that a user is looking for a package that was updated yesterday and is called something like pulp.
  The user can type the following in the GUI:
  Package search: "updated = yesterday and name ~ pulp*"
  If the data were in the Katello db, scoped_search could parse it and create the following query:
  "select * from packages where updated_at >= 'July-13-2011' and updated_at < 'July-14-2011' and name like 'pulp%'"

The query is processed by scoped_search in two steps:
1. The user query is validated and parsed into an abstract syntax tree.
2. An SQL query is built, using the language definition, the application model, and the SQL dialect of the specific database.
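
As a rough illustration of these two steps (this is not scoped_search's actual code, which is Ruby and far more complete), here is a minimal Python sketch. The names FIELDS, parse and to_sql are hypothetical, only "and"-joined conditions are handled, and a flat list of tuples stands in for the abstract tree:

```python
import datetime

# Hypothetical mapping from user-facing keys to database columns.
FIELDS = {"updated": "updated_at", "name": "name"}


def parse(query):
    """Step 1: validate the query and split it into (key, op, value) tuples."""
    conditions = []
    for part in query.split(" and "):
        key, op, value = part.split(None, 2)
        if key not in FIELDS or op not in ("=", "~"):
            raise ValueError("invalid condition: %r" % part)
        conditions.append((key, op, value))
    return conditions


def to_sql(conditions, today):
    """Step 2: build a parameterized WHERE clause from the parsed conditions."""
    clauses, params = [], []
    for key, op, value in conditions:
        column = FIELDS[key]
        if op == "=" and value == "yesterday":
            # "yesterday" becomes a half-open date range.
            start = today - datetime.timedelta(days=1)
            clauses.append("%s >= ? AND %s < ?" % (column, column))
            params += [start.isoformat(), today.isoformat()]
        elif op == "~":
            # The user's "*" wildcard becomes the SQL "%" wildcard.
            clauses.append("%s LIKE ?" % column)
            params.append(value.replace("*", "%"))
        else:
            clauses.append("%s = ?" % column)
            params.append(value)
    return " AND ".join(clauses), params


where, params = to_sql(parse("updated = yesterday and name ~ pulp*"),
                       datetime.date(2011, 7, 14))
```

Passing the values as parameters rather than pasting them into the SQL string is a choice made in this sketch; the dialect-specific part of step 2 is left out entirely.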

To learn more about scoped_search:
---------------------------------
source: https://github.com/wvanbergen/scoped_search
documentation: https://github.com/wvanbergen/scoped_search/wiki
blog: http://scopedsearch.wordpress.com/


To the point:
-------------
Three options come to mind when looking at where to put the interface boundary:
1. At the user-language end: make all participating parties understand the user query and convert it into a query on the data source that they are using.
Essentially, in our case, this means porting scoped_search to Java and Python.
2. Use an intermediate language.
3. Use SQL or MongoDB JSON as the interface.

The third option was rejected in a previous discussion because it is very insecure and it exposes internal structure that we probably don't want as a stable API.
The first option, porting scoped_search to Python and Java, will take about 6 weeks per port, and we will then need to maintain all three projects. This might make sense in the long run, depending on my ability to form an active community around the original project.

That leaves us with the second option: an intermediate interface.

Now is this going to get to the point?
--------------------------------------
I started my search for an interface by looking at the vast field of existing query languages, such as those listed here: http://en.wikipedia.org/wiki/Query_language
I did find some interesting ideas (YQL), but found them to be mostly overkill for our purpose.
To make the search interface simple enough to be easily translated into a query on the receiving end, and yet powerful enough to be useful, I suggest the following guidelines and limitations on the search interface:
a. It will be made of a number of conditions.
b. A condition will always describe a single infix term (for example: "arch = i386") and will always refer to a single property (name, arch, create_date, etc.).
c. A logical AND will be used between conditions. This means that every additional condition will further limit the result set.
   If a user wants to expand the result set, he will not be able to use a logical OR; he will be forced to run yet another query.
   This limitation simplifies the interface a lot because it also eliminates the need for brackets.
d. No prefix negation. The interface will not allow writing "NOT updated = yesterday".
   It can be proved that the combination of limitations (c) and (d) means that some searches cannot be expressed in a single query.
   However, it greatly simplifies the interface.
e. An exception to (b) and (c) is the free-text element. The interface will allow a query such as "free-text ~ pulp*", which will be translated
   into a query where each "string/text" property of the searched element is matched against the value.
   For example, the result of the above query could be something like "name like pulp% OR description like pulp%".
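
To show how little the receiving end would need under guidelines (a) to (e), here is a minimal Python sketch of the translation into a WHERE clause. Everything named here (KNOWN_PROPERTIES, TEXT_PROPERTIES, OPERATORS, build_where) is hypothetical, and the operator set is just the one used in the examples in this mail:

```python
# Hypothetical description of the searched element (not taken from Pulp).
KNOWN_PROPERTIES = {"name", "arch", "description", "updated_at"}
TEXT_PROPERTIES = ["name", "description"]

# Assumed operator set: interface operator -> SQL operator.
OPERATORS = {"=": "=", ">=": ">=", "<": "<", "~": "LIKE"}


def build_where(query):
    """Translate {property: ["op value", ...]} into (where_clause, params)."""
    clauses, params = [], []
    for prop in sorted(query):
        if prop != "free-text" and prop not in KNOWN_PROPERTIES:
            raise ValueError("unknown property: %r" % prop)
        for condition in query[prop]:
            # Guideline (b): one infix term per condition.
            op, value = condition.strip().split(None, 1)
            if op not in OPERATORS:
                raise ValueError("unknown operator: %r" % op)
            if op == "~":
                value = value.replace("*", "%")
            if prop == "free-text":
                # Guideline (e): match every string/text property, joined by OR.
                alternatives = ["%s %s ?" % (p, OPERATORS[op])
                                for p in TEXT_PROPERTIES]
                clauses.append("(" + " OR ".join(alternatives) + ")")
                params += [value] * len(TEXT_PROPERTIES)
            else:
                clauses.append("%s %s ?" % (prop, OPERATORS[op]))
                params.append(value)
    # Guideline (c): conditions are always combined with AND.
    return " AND ".join(clauses), params


where, params = build_where({"free-text": ["~ pulp*"], "arch": ["= i386"]})
```

There are no brackets, no OR between conditions and no negation to handle, so the translation is a single loop; the only nested expression is the one generated for free-text.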

So what will the suggested interface look like?
Going back to the example, the user typed:
"updated = yesterday and name ~ pulp*"

query = [name =       [" ~ pulp*"]
         updated_at = [" >= July-13-2011", " < July-14-2011"]
        ] 
Pulp.get_package_by_repo(repoId, query)

The REST call will look like:
http-get: /pulp/api/repos/repo_id/packages?query[]=&name[]= ~ pulp*&updated_at[]= >= July-13-2011&updated_at[]= < July-14-2011

I haven't encoded the url for obvious reasons :) 
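
For completeness, here is a sketch of how a client could produce the encoded form with the Python standard library. It assumes the path and parameter names from the call above, and it leaves out the empty query[] parameter and the leading spaces in the values, which is my simplification:

```python
from urllib.parse import parse_qs, urlencode

# Conditions as (parameter, value) pairs; repeated keys are allowed.
conditions = [
    ("name[]", "~ pulp*"),
    ("updated_at[]", ">= July-13-2011"),
    ("updated_at[]", "< July-14-2011"),
]

# urlencode percent-encodes brackets, spaces and operators in keys and values.
url = "/pulp/api/repos/repo_id/packages?" + urlencode(conditions)

# The receiving end can recover the original conditions from the query string.
decoded = parse_qs(url.split("?", 1)[1])
```

After decoding, each property maps to its list of conditions, which is the same shape as the query structure shown earlier.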


I hope this mail is not too long. I am waiting for your comments on the interface suggestion.
Thanks,
 Amos.