
Re: wiki madness



Matt Domsch wrote:
On Fri, Nov 02, 2007 at 11:06:11AM -0700, Toshio Kuratomi wrote:
Chuck Anderson wrote:

Won't there be performance problems with a TurboGears-based wiki? I thought MirrorManager was having issues with TG performance and had to enable form-data caching to get acceptable performance, at the cost of possibly stale data. I don't know the details behind it, but that was the reason I was given for why editing forms in MM sometimes returns old, pre-edit field values.

We might have performance issues but I'm confident they'll be different performance issues than we're currently experiencing ;-)

The issues we're running into with Moin right now are largely caused by Moin's philosophy of having to run off the filesystem, not a db. This means 1) we're unable to spread the load among multiple app servers, so we are constrained to a single server's memory and CPU resources, and 2) it makes multiple views of the data much harder than they need to be. In the subscription list case, Moin has to walk the filesystem, find each user's prefs file, parse it for a watchlist, check (if the watchlist exists) whether the page and page categories are in that watchlist, and only then send the notification. With a db, we'd have a separate table for the watchlist, with indexes on the userid and the pagename. Searching for a page wouldn't have to open a file for every single one of our users; instead it would access a single table and pull out the users who have that page in their watchlist.
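As a rough illustration of the watchlist table described above, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, index names, and sample users are all assumptions made up for the example; nothing here reflects an actual Moin or Fedora schema.

```python
import sqlite3


def build_watchlist_db():
    """Create an in-memory db with a hypothetical watchlist table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE watchlist ("
        " userid TEXT NOT NULL,"
        " pagename TEXT NOT NULL)"
    )
    # Indexes on both columns, as suggested in the text above.
    conn.execute("CREATE INDEX watchlist_userid_idx ON watchlist (userid)")
    conn.execute("CREATE INDEX watchlist_pagename_idx ON watchlist (pagename)")
    return conn


def subscribers_for(conn, pagename):
    """Return the users watching a page with one indexed query."""
    rows = conn.execute(
        "SELECT userid FROM watchlist WHERE pagename = ? ORDER BY userid",
        (pagename,),
    )
    return [row[0] for row in rows]


conn = build_watchlist_db()
conn.executemany(
    "INSERT INTO watchlist (userid, pagename) VALUES (?, ?)",
    [
        ("alice", "FrontPage"),
        ("bob", "Docs"),
        ("carol", "FrontPage"),
    ],
)
watchers = subscribers_for(conn, "FrontPage")
```

The point of the sketch is only that finding a page's subscribers becomes a single lookup on one table, rather than opening one prefs file per user.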

With MirrorManager I know we've had memory and db query speed issues trying to serve the mirrorlist directly from the TG app. I wasn't aware that MirrorManager was having trouble keeping up with its management functionality. Matt, is that still true, or is caching a leftover from when the two functions were combined?

I'm sure it's still true, it predated having any mirrorlist
functionality at all.

The short story is, TG (well, SQLObject) either caches data very
aggressively, so you can see stale data on changes, or not at all, so
each field read in each row results in a DB query.  Even with
object.sync() calls scattered through the UI actions like I did,
with caching left enabled we still see stale data on occasion.
With caching disabled, generating the UI pages or certainly the
publiclist pages takes _forever_: hundreds of thousands of small DB
queries.

Maybe SQLAlchemy has a better caching mechanism, I don't know.

I've just taken an extremely quick look at this and I don't know where the stale data problem is coming from, but it does look like SQLObject could make more db calls than SQLAlchemy even with caching on. The first part of this is okay::

  In [30]: import model

  In [31]: sites = model.Site.select(orderBy='name')

  In [32]: for site in sites:
     ....:     pass
     ....:
   1/Select  :  SELECT site.id, site.name, site.password, site.org_url, site.private, site.admin_active, site.user_active, site.created_at, site.created_by, site.all_sites_can_pull_from_me, site.downstream_comments FROM site WHERE 1 = 1 ORDER BY name
   1/QueryR  :  SELECT site.id, site.name, site.password, site.org_url, site.private, site.admin_active, site.user_active, site.created_at, site.created_by, site.all_sites_can_pull_from_me, site.downstream_comments FROM site WHERE 1 = 1 ORDER BY name
   1/COMMIT  :  auto

This second part is inefficient::

  In [33]: site.hosts
   1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
   1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
   1/COMMIT  :  auto
   1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
   1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
   1/COMMIT  :  auto
  Out[33]:

[Snip values of site.hosts]

  In [34]: site.hosts
   1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
   1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
   1/COMMIT  :  auto
   1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
   1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
   1/COMMIT  :  auto
  Out[34]:

The list of hosts is retrieved from the db each time the attribute is accessed, even though caching is enabled. This will make a difference if you access the attribute more than once, for instance, printing the name of every host in site.hosts in a menu of links at the top of the page and then looping through site.hosts again to print out a complete record for each.
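To make the cost concrete, here is a small sketch that does not use SQLObject at all. FakeSite is a hypothetical stand-in whose hosts property counts one "query" per access (the real log above shows two query rounds per access), and render_naive and render_cached are made-up names for the two ways a page might be built.

```python
class FakeSite:
    """Hypothetical stand-in for an ORM object; not SQLObject itself."""

    def __init__(self, host_names):
        self._host_names = list(host_names)
        self.query_count = 0

    @property
    def hosts(self):
        # Simulate a db round trip on every attribute access.
        self.query_count += 1
        return list(self._host_names)


def render_naive(site):
    """Access site.hosts twice: once for the menu, once for the records."""
    menu = [name for name in site.hosts]
    records = ["record for %s" % name for name in site.hosts]
    return menu, records


def render_cached(site):
    """Fetch the list once into a local variable and reuse it."""
    hosts = site.hosts
    menu = [name for name in hosts]
    records = ["record for %s" % name for name in hosts]
    return menu, records


naive_site = FakeSite(["mirror1", "mirror2"])
render_naive(naive_site)

cached_site = FakeSite(["mirror1", "mirror2"])
render_cached(cached_site)
```

Holding the result in a local variable is only a workaround in the calling code, under the assumption that the list does not need to change while the page is rendered; it does nothing about the stale data question.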

For the stale data problem I'd have to know how to reproduce it. Is the data stale when two people are editing the same information? Is it stale on a page refresh? Etc.

-Toshio

