[katello-devel] initial stories for non-AR-search

Wed Jun 29 15:41:17 UTC 2011

----- Original Message -----
> From: "Bryan Kearney" <bkearney at redhat.com>
> To: "Amos Benari" <abenari at redhat.com>
> Cc: "Mike McCune" <mmccune at redhat.com>, katello-devel at redhat.com
> Sent: Wednesday, June 29, 2011 3:17:07 PM
> Subject: Re: [katello-devel] initial stories for non-AR-search
> On 06/29/2011 06:41 AM, Amos Benari wrote:
> >
> >
> > ----- Original Message -----
> >> From: "Mike McCune"<mmccune at redhat.com>
> >> To: "Partha Aji"<paji at redhat.com>
> >> Cc: katello-devel at redhat.com
> >> Sent: Tuesday, June 28, 2011 11:12:46 PM
> >> Subject: Re: [katello-devel] initial stories for non-AR-search
> >> On 06/28/2011 11:19 AM, Partha Aji wrote:
> >>> On 06/28/2011 01:11 PM, Justin Sherrill wrote:
> >>>> 1. As a user I want to find a package in a specific repository.
> >>>> 2. As a user I want to find all the repos that contain a package.
> >>>> 3. As a user I want to find all the products that use a package.
> >>>> 4. As a user I want to find all the environments that have a
> >>>> package
> >>> Same as 1,2 3,4 + errata story but add "for a system" instead of
> >>> repo
> >>>
> >>> Also for erratum based searches we want the search to be based on
> >>> errata
> >>> type (as in
> >>> Bug/Enhancement/Security).
> >>>
> >>> *. As a user I want to find a package that is installed on a
> >>> system.
> >>> *. As a user I want to find a package that is not installed but
> >>> available to be installed on a system.
> >>> *. As a user I want to find updatable packages on a system.
> >>
> >> to me these are less of a search query but more just a standard API
> >> query in itself. We should show the lists of:
> >>
> >> * updatable packages on a system.
> >> * package that is not installed but available to be installed on a
> >> system.
> >> * packages that are installed on a system.
> >>
> >> but then allow search *within* the above results. I don't think it
> >> is
> >> search's job to drive the initial set of results, just to filter
> >> them
> >> after the fact.
> >
> > Having search capabilities on (almost) every entity and entity
> > relation, as we have in Katello and Foreman,
> > can result in a small and simple API, for example for the Package we
> > could have:
> >   * Package.search
> > instead of:
> >   * Package.get_by_repo
> >   * Package.get_by_errata
> >   * Package.get_by_system
> >   * Package.get_by_available_for_system
> >
> > This definition makes the search mechanism more complicated to
> > implement and define but I think it can be useful in both UI and
> > API.
> > I think I can implement such a search for Pulp in Katello by adding
> > support for MongoDB in the search plugin and modeling the Pulp
> > entities in Katello (without replicating the data itself).
> > The API can then be something like
> > /package?search={MongoDB-JSON-where-clause}.
> >
> > The main feature I would like to avoid in the search is joining
> > several data-sources in a single query,
> > because it will add considerable code and performance complexity, so
> > I assume that at least for the moment it is not needed.
> >
> > Thank you all for adding user stories.
> > Comments on my API design ideas are also welcome :)
> > Amos.
> 
> 
> So... what would this look like for Candlepin then? How would I show
> systems with a fact X = Y?

My line of thinking on the interface definition is this:

The search language is a "humanized language"; the search plugin translates it into the relevant database SQL statement.
If I define an interface language, the result will be "humanized" translated to "interface" translated to "SQL", which seems wrong to me.
Two translations on different platforms just double the pain.

This leads me to two alternatives:
1. Send "humanized" query to the remote system
2. Send "SQL where clause" / "MongoDB-JSON" as an interface. To do that we need to:
    a. Model the remote database in Katello.
    b. Sanitize the SQL on the receiving side to prevent SQL injections.

Option (1) works fine for Foreman integration because its API already exposes the relevant "humanized" search. We might need to solve an issue with the different terms used in Foreman and Katello (hostgroups in Foreman are templates in Katello, etc.), but this is probably not that difficult.
Option (2) will be a possible solution for Pulp search once I extend the search plugin to support MongoDB.
Option (2) can be a solution for Candlepin if "SQL where clause" is OK as an API.
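
A minimal sketch of what option (2) could look like for a MongoDB-backed service, assuming a tiny hypothetical query grammar ("field = value" and "field ~ glob" terms joined by "and"). The field whitelist, the function names and the grammar are illustrative assumptions, not part of the search plugin or of Pulp; only the "$regex" operator is standard MongoDB query syntax.

```python
import re

# Hypothetical whitelist of searchable fields (an assumption, not Pulp's real schema).
# Rejecting unknown fields is one simple way to limit what a caller can inject.
ALLOWED_FIELDS = {"name", "arch", "version"}

# One term of the assumed grammar: <field> <op> <value>, where op is '=' or '~'.
TERM = re.compile(r"^\s*(\w+)\s*(=|~)\s*(\S+)\s*$")


def glob_to_regex(pattern):
    """Turn a glob such as 'foo*' into an anchored regex such as '^foo.*$'."""
    # Escape every regex metacharacter, then turn the escaped '*' back into '.*'.
    return "^" + re.escape(pattern).replace(r"\*", ".*") + "$"


def to_mongo_filter(query):
    """Translate 'name ~ foo* and arch = x86_64' into a MongoDB-style filter dict."""
    result = {}
    for term in re.split(r"\s+and\s+", query.strip()):
        match = TERM.match(term)
        if not match:
            raise ValueError("cannot parse term: %r" % term)
        field, op, value = match.groups()
        if field not in ALLOWED_FIELDS:
            raise ValueError("unknown field: %r" % field)
        # '~' becomes a regex match, '=' becomes plain equality.
        result[field] = {"$regex": glob_to_regex(value)} if op == "~" else value
    return result


mongo_filter = to_mongo_filter("name ~ foo* and arch = x86_64")
```

The resulting dict could be serialized to JSON and sent as the search parameter. The single translation step happens on the Katello side, which is why the remote field names have to be modeled there.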

The obvious drawbacks of (2) are:
A. It exposes the database structure to the API.
B. Non-Ruby clients will have a hard time using this API.

If solution (2) is not acceptable, we can go with one of the following alternatives:
3. Have limited search capabilities embedded into the API. (I think this is more or less what Mike was referring to.)
   e.g. have an API Package.get_by_repo(repo_id, filter, order_by, skip, limit)
   - filter is a condition over package.
   - order_by is used for sorting.
   - skip and limit are used for paging.
 (in REST it would look something like "/repos/repo_id/packages?filter=[name~'foo*',arch='x86_64']&order_by=['name','asc']&skip=0&limit=20")

In this solution the API will be larger, the search capabilities will be somewhat limited, and the search needs to be implemented in Pulp and Candlepin for each API,
but it is simple to implement.
On the Katello side we are left with a parser and auto-completer to implement.
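
To make option (3) concrete, here is a rough sketch of how a service might implement get_by_repo with the filter, order_by, skip and limit parameters described above. The in-memory REPOS data, the (field, op, value) tuple form of the filter and the default values are assumptions made for illustration; a real implementation would query the service's own datastore.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory stand-in for the packages of one repository.
REPOS = {
    "repo_id": [
        {"name": "foobar", "arch": "x86_64"},
        {"name": "foo", "arch": "x86_64"},
        {"name": "foo", "arch": "i386"},
        {"name": "bar", "arch": "x86_64"},
    ]
}


def matches(package, conditions):
    """Return True if the package satisfies every (field, op, value) condition."""
    for field, op, value in conditions:
        actual = package.get(field, "")
        # '~' is a glob match (as in name~'foo*'), '=' is exact equality.
        if op == "~" and not fnmatchcase(actual, value):
            return False
        if op == "=" and actual != value:
            return False
    return True


# The parameter is called 'filter' to mirror the proposed API; it shadows the builtin.
def get_by_repo(repo_id, filter=(), order_by=("name", "asc"), skip=0, limit=20):
    """Filter, sort and page the packages of one repository."""
    packages = [p for p in REPOS[repo_id] if matches(p, filter)]
    field, direction = order_by
    packages.sort(key=lambda p: p[field], reverse=(direction == "desc"))
    return packages[skip:skip + limit]


page = get_by_repo(
    "repo_id",
    filter=[("name", "~", "foo*"), ("arch", "=", "x86_64")],
    order_by=("name", "asc"),
    skip=0,
    limit=20,
)
```

With the sample data, the call returns the two x86_64 packages whose names start with "foo", sorted by name. Every service would need an equivalent of matches() over its own storage, which is the per-API implementation cost mentioned above.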

4. Implement "humanized" search in Java and Python. This could be fun; if we are interested in this direction, I'll need to evaluate the cost.

Thanks,
  Amos.

P.S. After writing this email I changed my vote and joined Mike on (3).
Amos.

> 
> -- bk