Performance Tuning

The Persistence system does all it can reasonably do to be efficient. However, since developers know more about the system they are developing than the generic persistence code does, in some cases they can further increase the system's performance. There are several ways that developers can give the system "hints" about how to access information that can increase its speed.

The first hint that was discussed dealt with Filters and the OPTIONS block. By default, the system wraps all queries that use filters but provides a way for developers to signal that they do not want this wrapping. Performance can also be increased in cases where standard, automatically generated events are not efficient enough. In such cases, the system lets you override events. Using the same motivation, the system provides developers with ways to specify how object attributes should be loaded as well as ways to make sure that resources are being used efficiently.

Lazy Loading of Attributes

When developing an application, developers normally only need access to a few of the attributes of any given object type. Normally, loading all of the attributes from the database does not cause any extra overhead and is therefore a desirable feature. However, in some cases the developer may not want all of the attributes loaded, for example, if the program needs access to a database LOB but does not necessarily need it for every object. If the PDL is defined in a normal manner, whenever the object is accessed, the LOB is read into memory from the database and thus uses valuable time and resources.

One way to avoid the unnecessary loading of LOBs is to use lazy loading. Lazy Loading allows the developer to specify certain attributes that should not be loaded by default. Instead, they are only loaded when they are explicitly requested. Using lazy loading is easy if you follow three steps:

The following PDL definition for Author could be used in place of this definition in order to lazy load the Blob attribute portrait.

object type Author {
    BigInteger[1..1] id = authors.author_id INTEGER;
    String[1..1] firstName = author.first_name VARCHAR(700);
    String[1..1] lastName = author.last_name VARCHAR(700);
    Blob[0..1] portrait = authors.portrait;
    ScreenName[0..1] screenName = 
         join authors.screen_name_id to screen_names.name_id;

    object key(id);

    retrieve {
        do {
            select first_name, last_name
              from authors
             where author_id = :id
        } map {
            firstName = authors.first_name;
            lastName = authors.last_name;
        }
    }

    retrieve portrait {
        do {
            select portrait
            from authors
            where author_id = :id
        } map {
            portrait = authors.portrait;
        }
    }

    // we can leave the rest of the definition blank because MDSQL
    // will generate the rest of the of the events for us.
}

Given the above PDL, the persistence layer will load only the firstName and lastName when an Author is retrieved, and fetch the portrait the first time the portrait property is requested. Currently, there is no way to tell the MDSQL generator to produce SQL like this; such a mechanism may be added in the future. Meanwhile, it is possible to use MDSQL for the majority of the events and simply override with static SQL when necessary.

Finally, please note that the system performs an independent database query for each event. Therefore, a separate database query is used for each attribute that is lazy loaded. In the above example, whenever the portrait is retrieved, there is a database hit for all of the other attributes and a database hit for the portrait. Therefore, if the portrait is used most of the time, it may be more efficient to not lazy load it.

Aggressive Loading of Attributes

While it is useful to be able to specify that an attribute not be loaded until requested through lazy loading, it is sometimes just as useful to be able to aggressively load attributes. That is, it can be useful to be able to load all information for an object in a single database query instead of multiple queries. Suppose, for instance, that whenever you want to retrieve an author, you also want to load the associated email address. If an author has a one-to-one association with an Email object and you want the address attribute to be loaded whenever an author is loaded, you can specify this with the aggressive load keyword. For instance:

object type Author {
    BigInteger[1..1] id = authors.author_id INTEGER;
    String[1..1] firstName = author.first_name VARCHAR(700);
    String[1..1] lastName = author.last_name VARCHAR(700);
    Blob[0..1] portrait = authors.portrait;
    Email[0..1] email = join authors.author_id to emails.author_id;

    object key(id);

    aggressive load (email.address);

By specifying the aggressive load, the retrieve event for the Author that is generated by the system will be similar to the following example:

retrieve {
    do {
      select authors.author_id, authors.first_name, 
             authors.last_name, authors.portrait, 
             email.email_id, email.address
      from authors, email
      where authors.author_id = email.author_id(+)
    } map {
      id = authors.author_id;
      firstName = authors.first_name;
      lastName = authors.last_name;
      portrait = authors.portrait;
      email.id = emails.email_id;
      email.address = emails.address;
    }
}

You should note a few points about this example. First, because it is a 0..1 association, it uses an outer-join to join the two tables. Second, it automatically retrieves the object key for the associated object (in this case, it is email_id). To access the aggressively loaded attribute, your Java code will look exactly as it would if the attribute was not aggressively loaded (retrieve the associated data object with get("email") and then call get("address")). The only difference is that it will not need to hit the database a second time.

Closing Result Sets

The way that Java accesses the database requires the use of ResultSets. These ResultSets are essentially database cursors. Having too many of them open at the same time can cause problems for other threads that are trying to access the database. In an effort to help you determine when ResultSets are not being cleaned up by the code (and are actually being closed by the garbage collector), the Persistence system includes a warning message in the log, similar to the following:

ResultSet was not closed by programmer: <Result Set toString here>, 
closing in garbage collection.  Lots of these messages can indicate the 
cause of an out of cursors error.

In standard use, the Persistence system closes ResultSets for the developer. However, there are certain ways that the system may be used that would leave ResultSets opened. Specifically, if a developer does not iterate through all of the rows returned by a ResultSet, the system will not close them. For instance, if the developer executes a DataQuery that returns 20 rows and then only iterates through the first 10 rows, the system does not automatically close the ResultSet, in case the code needs to access it later. Then, when the garbage collector cleans up the ResultSet, it prints out the above error. If you iterate through all 20 rows (until calling next() returns false), the persistence layer is smart enough to close the ResultSet for you.

It is important to minimize the number of open ResultSets because each open ResultSet is equivalent to an open cursor in the database. Since database cursors are a valuable resource, it is imperative to use them only when they are needed and return them to the pool when you are done with them. More than likely, this will not cause a problem, but it can cause problems on high-volume sites where the number of open cursors at any given time can exceed the maximum number allowed by the database. When this happens, the database becomes inaccessible and the site goes down. Thus, by manually closing Result Sets that are not fully iterated through (by calling close()), you are saving resources.

Data Objects versus Data Queries: Reducing the number of database hits

When developers create pages with the persistence system, they tend to use Data Objects and Domain Objects for almost everything they do. However, this is not always the most efficient implementation. Data Objects and Domain Objects are designed to be used when inserting, updating, and deleting information from the database, as well as when a small number of objects are being retrieved. However, when many items are being used, it is not always a good idea to use DataObjects because of the memory overhead that is created. In such a situation, one of the following solutions may be of use:

The important thing to remember is to try to use the standard Associations and DataCollections as much as possible before using DataQueries so that you can maintain abstraction within your PDL. However, if the Association is performing too many database hits, a DataQuery may be able to substantially improve this performance.