| Red Hat Docs > Manuals > Red Hat Web Application Framework > |
The Metadata-Driven SQL Persistence subsystem provides application developers with the ability to:
Define object types and their associated table schemas in PDL files, and have the system generate all SQL to create, retrieve, update, and delete objects of that type. The Persistence layer will also allow the application developer to specify associations for the pre-defined object types.
At run time, create new object types that are subtypes of existing object types and allow the application code to interact with these objects in a manner similar to predefined object types.
Create associations between dynamically created object types and other object types (both dynamically created and predefined). The system must allow for "0..1", "0..n", and "1..n" associations.
The Metadata-Driven SQL Persistence subsystem is not designed to replace standard SQL or PDL files. It can be used to reduce the complexity of the PDL file through its auto-generation of events but it does not replace the SQL files as it does not automatically generate DDL statements for statically defined object types.
The Metadata-Driven SQL Persistence subsystem meets the requirements set forth in the requirements document by adding additional functionality to PDL and extending the functionality of both the Operational Persistence layer and the Metadata Persistence layer. The public interface for dynamic subtyping can be found in the Operational Persistence while the code for dynamically generating create, retrieve, update, and delete statements for statically defined object types can be found in the Metadata Persistence.
The Metadata-Driven SQL system is intended for use by application developers within their code. There is no user interface. The following use case assumes that the application has a user interface to allow a site administrator to enter information.
Suppose a Content Management System developer wants to allow a site administrator to create arbitrary new content types. The following is a list of events that demonstrate how this system can be used.
The developer decides to create a base Object Type called content_item which contains fields for an id and a name, and that can be extended at run time. He defines how to create objects of that type in a PDL file.
Through an interface built by the application developer, a site administrator specifies that he wants to store hotel information. He indicates that he would like to store a description (a string), the address (a string), and the number of rooms (an integer). The application then uses the public persistence API to create a dynamic object type (a sub-type of content_item) named "Hotel" with the attributes "name", "id", "description", "address", and "number of rooms". "description", "address", and "number of rooms" must be stored within a different table. So, the developer calls the save() method and the system automatically generates the appropriate DDL statements to create the table. The system also creates an internal mapping of the Attributes to table columns and outputs the appropriate PDL file. Note: it is up to the application to keep track of the attribute names so that they can access the information in the persistence layer.
The site administrator decides that he would like to create a list of possible amenities for the hotels (for example pool, spa, gym, etc). He goes through the same process as above and creates an object type to represent the amenities. He can then go and map the hotels to their amenities. The application uses the public persistence API to create this mapping.
The site administrator decides that he wants to be able to add a description to each Hotel-Amenity mapping. Using the application provided interface, he does this and the application uses the persistence API to create a Link Attribute on the Association.
A year later, one of the hotels in the system has undergone some renovation and no longer has one of the Amenities. The administrator can specify this through the application user interface and the application uses the persistence API to remove a single association in the exact same way it would remove an association for a static object type.
Another year goes by and the site administrator decides that it is easier for hotel amenities to just be listed in the description instead of as separate associations. He deletes the content type "Amenity" through the user interface provided by the application and the application uses the public persistence API to disable the "Amenity" object type. The persistence system does not delete object types after they are created so that no information is lost.
When an application had the requirement to be able to store unknown data on the fly in previous version of Red Hat Web Application Framework (CCM), there were two distinct approaches used.
The first approach was to use "skinny tables." In this approach, the application would have a table that contained an "attribute name" and an "attribute value" as well as some other identifying information (for instance an id or a foreign key to reference the row to the owning table). This approach has the problem of not scaling well. That is, when the number of objects using the same "skinny table" became large, accessing the table became slow. And, it was often the case that a single query would have to join with the table many times or would have to be broken in to multiple queries.
The second approach was to build up DDL SQL statements on the fly and then execute them using the web server. This included "create", "alter", and "drop" statements for tables and "create or replace" and "drop" statements for views. This approach has proven to be the most flexible approach available. The major downside of this approach is that the developer must be careful to not allow the code to accidentally delete any information (especially tables) from the database.
When a developer wants to subtype an existing object type, there must be a way for the system to store the information about the new objects in the database. There are four ways that can be done and any combination of these can be used.
One approach is to use several central tables, each with many columns. These tables would be specified by the persistence system and would be designated as flexfield tables. Each table would represent a column type. For instance, there could be one table with 10 columns that represent all String columns in subtypes. Another table may have 10 BigDecimal columns and yet another table could have 10 Boolean columns. This has the advantage of allowing any object type in the system to be subclassed. However, it has several disadvantages. First, it may have a large overhead if there are many applications using the system. We found with the acs_objects table that having a large number of rows in a single generic table does not scale well. Another disadvantage is that this approach could potentially cause a large number of joins and a lot of empty space taken up by unused columns. Combining the separate tables into a single table would eliminate the joins but could potentially take up more space and would not be flexible on a case-by-case basis. A third disadvantage is that it would require "special case" code when a particular object type has more attributes of a given type than there are columns available (for instance, if an object type needs 11 String attributes and there are only 10 String columns available). In this case, there must be extra overhead to store this extra information.
If the "wide tables" approach is taken by the persistence system then, in addition to a table for each attribute type, the system must also provide a generic mapping table to allow subtypes to be associated with other object types. This has the disadvantage of lacking foreign key constraints and having the potential for major scalability issues.
The major advantage to the "wide tables" approach is that any data object type could be subtyped at any time. The disadvantage of this is that there could potentially be scalability issues and it is difficult to design a set of generic tables to meet the needs of all developers.
A skinny table is a table that has three columns containing a reference to the owning object as well as the attribute name and the value. This solves the problem faced by the "wide tables" of having a maximum number of values for a given object type and it also solves the problem of having empty storage. However, it has the significant problem that a join must occur for every single attribute. Thus, if an object has 5 attributes, the same table must be joined 5 times. This approach has serious scalability problems that would have to be solved in order to be viable.
Creating database tables to store dynamically created subtypes through dynamically generated DDL statements is a solution that will solve all of the problems presented by wide tables, skinny tables, and FlexFields. It allows developers to tune any dynamically defined tables, it does not have a "hard" limit on the number of a given type of column, and it will create columns of the correct type for the correct attribute. In addition, when an attribute of the object type is removed, it does not have to remove any information from the table. Rather, it can simply ignore the column thus keeping around historical information.
The one disadvantage that auto-generation of DDL statements has over all of the other methods is that misuse of the DDL code could have serious consequences. With the other approaches, a bug or misuse may mean that some data from a particular column is lost. With the use of DDL, it could mean that columns on arbitrary tables are added or removed and a really bad mistake could cause random database tables are removed. However, careful programming and the proper precautions can greatly reduce this risk.
A final approach to this problem is to require each object type that can be subtyped to declare available FlexFields to use for storing the information about objects of the subtype. This approach has three main advantages over the "wide table" approach. First, each application generally knows the type of information that subtypes will want to store. By allowing the application to specify the columns to use, the application can take advantage of specific needs. This also means that if an application needs an extremely large number of columns that it does not have to derive a workaround for the artificial limitation created by the single, generic table. The second advantage of having application specific tables is that the application developer will be able to tune the table appropriately and may be able to take care of specific foreign key constraints (for example the ID column of the table can now reference a specific table instead of being a stand-alone integer). The final advantage is that since there will be tables for each statically defined object type, there will not be as much information stored in each table and thus joining against the tables will not be as expensive.
The flexfield approach has a significant advantage over the automatic generation of DDL statements in that it does not alter the database schema in any way and thus it is more difficult to unintentionally destroy any data.
FlexFields, however, do have several disadvantages. First, it requires much more complex code to handle all of the possibilities for the different subtypes and still faces the problem of running out of columns of a given type. Second, the FlexFields approach requires a developer to know at compile time which object should be allowed to be subtypes. While the developer will have an idea of which object types will probably be cloned, there is no way for them to know all possible types. This means that if a user wants to subclass something that is not available then the developer must intervene. This is one of the large disadvantages of using FlexFields instead of dynamic DDL.
When developers create PDL files they are parsed into an abstract syntax tree and undefined events are created. This transformation can happen at compile time or server startup time.
Compiling the PDL and generating the SQL events at compile time allows for compile time syntax checking as well as a faster server startup because all events have already been generated and the PDL files have been expanded. The disadvantage of this approach is that it introduces another step into the build process and adds a slight bit of complexity to the overall system.
The other option is to have the server read the PDL and generate the SQL CRUD statements every time the server startups. This has the disadvantage of needing to create the CRUD statements either at server startup or when they are first needed by the code. The advantage of this approach, however, is that it would only force the application to store information input by the user and not queries generated by the system and it would allow for a simpler system.
Error checking at compile time is nice for the developer and dynamic SQL Event generation at compile time is desirable so that the server will start up faster. Thus, the system copies the PDL files to the web server's WEB-INF directory, generates dynamic events, and checks for errors at compile time. At server startup, it reads the PDL files and generates the events again. The improved performance of not having to generate events at server start up is not enough to justify the increased complexity that would be introduced if we wrote and read in the files with events.
When a user dynamically creates a new object type that is a subtype of an existing object type, the system needs to store this information so that it will persist between server restarts. There are two possible locations to store the information and there are pros and cons to each of them.
Storing subtype information as a PDL file in the file system has the advantage of providing the developer with the ability to easily change the underlying schema that is used to store information for dynamically created subtypes. It provides them with a standard file format and a familiar syntax. The disadvantage of this is that writing the file to the filesystem is not an ACID task. Therefore, if the server were to crash while the file is being written, information about the new (or updated) object type could be lost.
Storing subtype information in the database solves the main problem of store the information in the file system. That is, the database is ACID compliant. It also provides the system with the ability to lock files so that only one process can update it at a time. Storing information in a file system does not provide this functionality. However, storing information in the database makes it difficult for a developer to change the underlying schema. If the information is only in the database then the developer will have to make the updates via a web form which has proven to be cumbersome.
In order to support automatic generation of SQL, several new classes must be added to the com.arsdigita.persistence.metadata package.
JoinElement is a simple class that encapsulates the relationship between two columns, used to specify a join order. JoinElement is used by JoinPath to specify a complete path from one object type or table to another.
JoinPath defines a complete path from one ObjectType or table to another, allowing metadata-driven SQL to create queries across multiple tables on the fly. A path is composed of 1 or more JoinElements, specifying the particular columns to join, and in what order.
MDSQLGenerator is an interface that defines an API to automatically generate SQL queries based on the metadata provided in the PDL files. This class provides a method, generateObjectEvent that actually generates the SQL to be executed. Different implementations may be needed for different databases.
MDSQLGeneratorFactory instantiates the correct MDSQLGenerator implementation and then returns a pointer to that class.
DDLGenerator is an interface that is implemented for each different database. The implementations of the interface create the SQL that is used to create an modify tables used to store specific object types.
DDLGeneratorFactory instantiates the correct DDLGenerator implementation and then returns a pointer to that class.
In order to provide users with the ability to dynamically subtype existing object types, the persistence system uses a single new Java class, DynamicObjectType. This class allows the developer to subtype an existing object type, add and remove attributes, and add and remove associations to these new object types.
One of the goals of Metadata-Driven SQL is to automatically generate the SQL Events for pre-defined object types. In order to do this several additions have been made to PDL. The first addition involves declaring attributes. Typically, when an Attribute for an object type is declared, it is of the form
<attribute type> <optional multiplicity> <attribute name> |
With Metadata-Driven SQL, it is now possible to map the attribute to a specific column in the database. So, object types that use automatic event generation will now declare their attributes with the following signature:
<attribute type> <optional multiplicity> <attribute name> =
<table name>.<column name> <column type> |
If object types do not want to take advantage of automatic SQL generation then they can continue using the standard way of declaring Attributes. This provides for backwards compatibility.
The second addition is the concept of a reference key. The reference key is used within object types that extend other object types and it dictates how the child table can be joined with the parent table. The reference key definition appears where the object key would appear for other object types. Its signature is:
reference key (<table_name>.<column_name>) |
With these new additions, the com.arsdigita.categorization.Category object type definition goes from
model example;
object type Category extends ACSObject {
String[0..1] description;
String name;
Boolean isEnabled;
<all events and associations here>
} |
model example;
object type Category extends ACSObject {
String[0..1] description = cat_categories.description varchar(4000);
String name = cat_categories.name varchar(200);
Boolean isEnabled = cat_categories.enabled_p char(1);
<associations here as before>
reference key (cat_categories.category_id);
} |
The final syntax addition to PDL are the addition of they keywords join and to They are used together to specify how to join one table to another. For generating CRUD, they are useful when an attribute is stored in a different table. The signature of this is
join <table name>.<column name> to <table name>.<column name> |
As an example, supposed the user's password is a String stored in a passwords table. The following shows the PDL that can be used to generate the appropriate CRUD statements for a user with a name, email, and password:
model example;
object type User extends ACSObject {
String name = users.name;
String email = users.email;
String password = passwords.my_password;
reference key (users.user_id);
join users.user_id to passwords.user_id;
} |
The join ... to ... syntax will also be used for defining how associations are declared. That is, the definition of associations and role references will expand in a way similar to how attributes expended. Specifically, the current definition for a role reference is
<object type> <optional multiplicity> <role reference name> |
For role references with dynamically created events, the signature will change to be
<object type> <optional multiplicity> <role reference name> =
join <table_name>.<column_name>
to <table_name>.<column_name>
[, join <table_name>.<column_name>
to <table_name>.<column_name>]* |
With this additional syntax in place, we will now have the ability to automatically generate most SQL needed by the system. In addition, the dynamic subtyping will have a way to represent and generate the map section of the retrieve and retrieve all events.
To allow dynamic subtyping, the persistence layer will use a single custom table with the help of the versioning service to store the PDL files for dynamically created object types. The object type definition for this table is below.
object type DynamicObjectType extends ACSObject {
String[1..1] pdlFile = persistence_dynamic_ot.pdl_file;
// objectTypeName will hold something like
// "com.arsdigita.cms.myNewContentItem"
String[1..1] dynamicType = persistence_dynamic_ot.dynamic_object_type;
reference key (persistence_dynamic_ot.pdl_id);
} |
An admin interface for managing the dynamically generated SQ L should be created. It would be nice to have pages to:
View all object types defined in the system. This includes all statically and dynamically defined types.
Provide a user interface to allow developers that want to change the database schema the ability to change the PDL representation as well (so that if information is already being stored the information will not be lost when the column is renamed/moved).
General data queries that would be nice to have for object types. For instance, it would be nice to show how many instances of each object type exist, the object type hierarchy, etc.