Deployment Guide
Netscape Directory Server


Chapter 6       Designing the Replication Process


Replicating your directory contents increases the availability and performance of your directory. In Chapter 4 and Chapter 5, you made decisions about the design of your directory tree and your directory topology. This chapter addresses the physical and geographical location of your data, and specifically, how to use replication to ensure your data is available when and where you need it.

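This chapter discusses uses for replication and offers advice on designing a replication strategy for your directory environment. It contains the following sections: Introduction to Replication, Common Replication Scenarios, Defining a Replication Strategy, and Using Replication with other Directory Features.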

Introduction to Replication


Replication is the mechanism that automatically copies directory data from one Netscape Directory Server (Directory Server) to another. Using replication, you can copy any directory tree or subtree (stored in its own database) between servers. The Directory Server that holds the master copy of the information automatically copies any updates to all replicas.

Replication enables you to provide a highly available directory service, and to geographically distribute your data. In practical terms, replication brings the following benefits:

Before defining a replication strategy for your directory information, you should understand how replication works. This section describes:

Replication Concepts

When you consider replication, you always start by making the following fundamental decisions:

These decisions cannot be made effectively without an understanding of how the Directory Server handles these concepts. For example, when you decide what information you want to replicate, you need to know the smallest replication unit that the Directory Server can handle. The following sections define the concepts used by the Directory Server and provide a framework for thinking about the global decisions you need to make.

Unit of Replication

In Directory Server 6.1, the smallest unit of replication is a database. This means that you can replicate an entire database, but not a subtree within a database. Therefore, when you create your directory tree, you must take your replication plans into consideration. For more information on how to set up your directory tree, refer to Chapter 5 "Designing the Directory Topology."

The replication mechanism also requires that one database correspond to one suffix. This means that you cannot replicate a suffix (or namespace) that is distributed over two or more databases.

Read-Write Replica/Read-Only Replica

A database that participates in replication is defined as a replica. There are two kinds of replicas: read-write and read-only. Read-write replicas contain master copies of directory information and can be updated. Read-only replicas refer all update operations to read-write replicas.

Supplier/Consumer

A server that holds a replica that is copied to a replica on a different server is called a supplier for that replica. A server that holds a replica that is copied from a different server is called a consumer for that replica. Generally, the replica on the supplier server is a read-write replica, and the one on the consumer server is a read-only replica. There are exceptions to this statement:

In Directory Server 6.1, replication is always initiated by the supplier server, never by the consumer, in contrast to earlier versions of the Directory Server that allowed consumer-initiated replication (where you configure consumer servers to pull data from a supplier server).

For any particular replica, the supplier server must:

A consumer server must:

In the special case of cascading replication, the hub supplier must:

For more information on cascading replication, refer to "Cascading Replication".

Change Log

Every supplier server maintains a change log. A change log is a record that describes the modifications that have occurred on a replica. The supplier server then replays these modifications on the replicas stored on consumer servers, or on other masters in the case of multi-master replication.

When an entry is modified, a change record describing the LDAP operation that was performed is recorded in the change log.
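The relationship between a change log and the replicas it feeds can be sketched as follows. This is a simplified illustration only, not the server's actual implementation: the `ChangeRecord` and `Replica` classes, the `seq` ordering number, and the `replay` function are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRecord:
    """One change-log entry describing an LDAP operation (hypothetical layout)."""
    seq: int                 # assumed ordering number, not a real server field
    op: str                  # "add", "modify", or "delete"
    dn: str
    attrs: dict = field(default_factory=dict)


class Replica:
    def __init__(self):
        self.entries = {}      # dn -> attribute dict
        self.change_log = []   # only meaningful on a supplier
        self.last_applied = 0  # highest seq already replayed on a consumer

    def apply(self, rec):
        """Apply one change record to the local data."""
        if rec.op == "delete":
            self.entries.pop(rec.dn, None)
        elif rec.op == "add":
            self.entries[rec.dn] = dict(rec.attrs)
        else:  # "modify"
            self.entries.setdefault(rec.dn, {}).update(rec.attrs)

    def update(self, op, dn, attrs=None):
        """Supplier side: apply a client update and record it in the change log."""
        rec = ChangeRecord(len(self.change_log) + 1, op, dn, attrs or {})
        self.apply(rec)
        self.change_log.append(rec)
        return rec


def replay(supplier, consumer):
    """Supplier-initiated push of every change the consumer has not yet seen."""
    sent = 0
    for rec in supplier.change_log:
        if rec.seq > consumer.last_applied:
            consumer.apply(rec)
            consumer.last_applied = rec.seq
            sent += 1
    return sent
```

After `replay(supplier, consumer)` returns, the consumer holds the same entries as the supplier, and a second call sends nothing because no new change records exist.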

Replication Agreement

Directory Servers use replication agreements to define replication. A replication agreement describes replication between one supplier and one consumer. The agreement is configured on the supplier server. It identifies:

Data Consistency

Consistency refers to how closely the contents of replicated databases match each other at a given point in time. When you set up replication between two servers, part of the configuration is to schedule updates. With Directory Server 6.1, it is always the supplier server that determines when consumer servers need to be updated, and initiates replication.

Directory Server offers the option of keeping replicas always synchronized, or of scheduling updates for a particular time of day or day of the week. The advantage of keeping replicas always in sync is that it provides better data consistency. The cost is the network traffic resulting from the frequent update operations. This solution is the best in cases where:

In cases where you can afford to have looser consistency in data, you can choose the frequency of updates that best suits your needs or lowers the effect on network traffic. This solution is the best in cases where:

In the case of multi-master replication, the replicas on each master are said to be loosely consistent because at any given time, there can be differences in the data stored on each master. This is true even when you have selected to always keep replicas in sync, because:

Common Replication Scenarios


You need to decide how the updates flow from server to server and how the servers interact when propagating updates. There are three basic scenarios: single-master replication, multi-master replication, and cascading replication.

The following sections describe these methods and provide strategies for deciding which method is appropriate for your environment. You can also combine these basic scenarios to build the replication topology that best suits your needs.

Single-Master Replication

In the most basic replication configuration, a supplier server copies a replica directly to one or more consumer servers. In this configuration, all directory modifications occur on the read-write replica on the supplier server, and the consumer servers contain read-only replicas of the data.

The supplier server must perform all modifications to the read-only replicas stored on the consumer servers. Figure 6-1 shows this simple configuration.

Figure 6-1    Single-Master Replication

The supplier server can replicate a read-write replica to several consumer servers. The total number of consumer servers that a single supplier server can manage depends on the speed of your networks and the total number of entries that are modified on a daily basis. However, you can reasonably expect a supplier server to maintain several consumer servers.

Multi-Master Replication

In a multi-master replication environment, master copies of the same information can exist on two servers. This means that data can be updated simultaneously in two different locations. The changes that occur on each server are replicated to the other. This means that each server plays both roles of supplier and consumer.

When the same data is modified on both servers, there is a conflict resolution procedure to determine which change is kept. The Directory Server considers the valid change to be the most recent one.
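The "most recent change wins" rule can be illustrated with a minimal sketch. The tuple layout `(timestamp, server_id, value)` and the tie-break on `server_id` are assumptions made for this example; the text above states only that the most recent change is kept, and does not describe how the server orders changes internally.

```python
def resolve(change_a, change_b):
    """Return the change to keep when two masters modified the same data.

    Each change is a (timestamp, server_id, value) tuple. The most recent
    timestamp wins; comparing server_id on a tie is an assumption added
    here so that both masters reach the same answer.
    """
    return max(change_a, change_b, key=lambda c: (c[0], c[1]))
```

For example, `resolve((100, "A", "x"), (105, "B", "y"))` keeps the change made on server B, because it carries the later timestamp.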

Although two separate servers can have master copies of the same data, within the scope of a single replication agreement, there is only one supplier server and one consumer. So, to create a multi-master environment between two supplier servers that share responsibility for the same data, you need to create more than one replication agreement. The following figure shows this configuration:

Figure 6-2    Multi-Master Replication Configuration (Two Masters)

In this illustration, Supplier A and Supplier B each hold a read-write replica of the same data.

The number of masters or suppliers you can have in any replication environment is limited to two. However, the number of consumer servers that hold read-only replicas is not limited. Figure 6-3 shows the replication traffic in an environment with two masters (read-write replicas in the illustration), and two consumers (read-only replicas in the illustration). This figure shows that the consumers can be updated by both masters. The master servers ensure that the changes do not collide.
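Because each replication agreement covers exactly one supplier and one consumer, you can count the agreements a topology needs by enumerating supplier/consumer pairs. The following sketch does this for the kind of environment described above. The function name and the server labels are hypothetical, and the sketch assumes that every master updates every consumer, as in the two-master, two-consumer example.

```python
def agreements(masters, consumers):
    """List the (supplier, consumer) pairs that each need one agreement."""
    if len(masters) > 2:
        # The text limits a replication environment to two masters.
        raise ValueError("at most two masters are supported")
    pairs = []
    for master in masters:
        # Each master supplies the other master ...
        for other in masters:
            if other != master:
                pairs.append((master, other))
        # ... and every read-only consumer.
        for consumer in consumers:
            pairs.append((master, consumer))
    return pairs
```

With masters `["A", "B"]` and consumers `["C1", "C2"]`, the function returns six pairs: one in each direction between the masters, and one from each master to each consumer.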

Figure 6-3    Replication Traffic in a Multi-Master Environment

Cascading Replication

In a cascading replication scenario, a hub supplier receives updates from a supplier server, and replays those updates on consumer servers. The hub supplier is a hybrid: it holds a read-only replica, like a typical consumer server, and it maintains a change log, like a typical supplier server.

Hub suppliers do not hold master copies of the data themselves; they only pass the data on as they receive it from the original master. For the same reason, when a hub supplier receives an update request from a directory client, it refers the client to the master server.

Cascading replication is useful, for example, if some network connections between various locations in your organization are better than others. For example, suppose the master copy of your directory data is in Minneapolis, and you have consumer servers in Saint Cloud as well as Duluth. Suppose, too, that your network connection between Minneapolis and Saint Cloud is very good, but your network connection between Minneapolis and Duluth is of poor quality. Then, if your network between Saint Cloud and Duluth is of acceptable quality, you can use cascading replication to move directory data from Minneapolis to Saint Cloud to Duluth.

This cascading replication scenario is illustrated as follows:

Figure 6-4    Cascading Replication Scenario

The same scenario is illustrated below from a different perspective. It shows how the replicas are configured on each server (read-write or read-only) and which servers maintain a change log.

Figure 6-5    Replication Traffic and Change logs in Cascading Replication

Mixed Environments

You can combine any of the scenarios outlined in the previous sections to best fit your needs. For example, you could combine a multi-master configuration with a cascading configuration to produce something similar to the scenario illustrated in Figure 6-6.

Figure 6-6    Combined Multi-Master and Cascading Replication

Defining a Replication Strategy


The replication strategy that you define is determined by the service you want to provide:

To determine your replication strategy, start by performing a survey of your network, your users, your applications, and how they use the directory service you can provide. For guidelines on performing this survey, refer to the following section, "Replication Survey."

Once you understand your replication strategy, you can start deploying your directory. This is a case where deploying your service in stages will pay large dividends. By placing your directory into production in stages, you can get a better sense of the loads that your enterprise places on your directory. Unless you can base your load analysis on an already operating directory, be prepared to alter your directory as you develop a better understanding of how your directory is used.

The following sections describe in more detail the factors affecting your replication strategy:

Replication Survey

The type of information you need to gather from your survey to help you define your replication strategy includes:

For example, a site that manages human resource databases or financial information is likely to put a heavier load on your directory than a site containing engineering staff that uses the directory for simple telephone book purposes.

Replication Resource Requirements

Using replication requires more resources. Consider the following resource requirements when defining your replication strategy:

Using Replication for High Availability

Use replication to prevent the loss of a single server from causing your directory to become unavailable. At a minimum you should replicate the local directory tree to at least one backup server.

Some directory architects argue that you should replicate three times per physical location for maximum data reliability. How much you use replication for fault tolerance is up to you, but you should base this decision on the quality of the hardware and networks used by your directory. Unreliable hardware needs more backup servers.


Note  

You should not use replication as a replacement for a regular data backup policy. For information on backing up your directory data, refer to the Netscape Directory Server Administrator's Guide.




If you need to guarantee write-failover for all your directory clients, you should use a multi-master replication scenario. If read-failover is sufficient, you can use single-master replication.

LDAP client applications can usually be configured to search only one LDAP server. That is, unless you have written a custom client application to rotate through LDAP servers located at different DNS hostnames, you can only configure your LDAP client application to look at a single DNS hostname for a Directory Server. Therefore, you will probably need to use either DNS round robins or network sorts to provide failover to your backup Directory Servers. For information on setting up and using DNS round robins or network sorts, see your DNS documentation.
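A custom client that rotates through several hostnames, as mentioned above, might follow the pattern sketched here. This is an illustration only: `connect_with_failover` is a hypothetical helper, and the `connect` argument stands in for whatever connection routine your LDAP client library provides. It is assumed to raise `OSError` when a server cannot be reached.

```python
def connect_with_failover(hosts, connect):
    """Try each Directory Server hostname in order and return the first that answers.

    `connect` is a caller-supplied function (an assumption of this sketch)
    that takes a hostname and returns a connection, raising OSError when
    the server is unreachable.
    """
    errors = []
    for host in hosts:
        try:
            return host, connect(host)
        except OSError as exc:
            errors.append((host, exc))  # remember the failure, try the next host
    raise ConnectionError(f"no Directory Server reachable: {errors}")
```

The caller passes the primary server first and the backup servers after it, so a backup is contacted only when the servers listed before it fail.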

Using Replication for Local Availability

Your need to replicate for local availability is determined by the quality of your network as well as the activities of your site. In addition, you should carefully consider the nature of the data contained in your directory and the consequences to your enterprise in the event that the data becomes temporarily unavailable. The more mission critical this data is, the less tolerant you can be of outages caused by poor network connections.

You should use replication for local availability for the following reasons:

Using Replication for Load Balancing

Replication can balance the load on your Directory Servers in several ways:

One of the more important reasons to replicate directory data is to balance the work load of your network. When possible, you should move data to servers that can be accessed using a reasonably fast and reliable network connection. The most important considerations are the speed and reliability of the network connection between your server and your directory users.

Directory entries generally average around one KB in size. Therefore, every directory lookup adds about one KB to your network load. If your directory users perform around ten directory lookups per day, then for every directory user you will see an increased network load of around 10,000 bytes per day. Given a slow, heavily loaded, or unreliable WAN, you may need to replicate your directory tree to a local server.

You must carefully consider whether the benefit of locally available data is worth the cost of the increased network load because of replication. For example, if you are replicating an entire directory tree to a remote site, you are potentially adding a large strain on your network in comparison to the traffic caused by your users' directory lookups. This is especially true if your directory tree is changing frequently, yet you have only a few users at the remote site performing a few directory lookups per day.

For example, consider that your directory tree on average includes in excess of 1,000,000 entries and that it is not unusual for about ten percent of those entries to change every day. If your average directory entry is only one KB in size, this means you could be increasing your network load by 100 MB per day. However, if your remote site has only a few employees, say 100, and they are performing an average of ten directory lookups a day, then the network load caused by their directory access is only one MB per day.
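The comparison in this example can be reproduced with a short calculation. The sketch below assumes, as the text does, that one KB is roughly 1,000 bytes and that every changed entry or lookup moves one full entry across the network; the function names are hypothetical.

```python
ENTRY_BYTES = 1000  # the text treats one KB as roughly 1,000 bytes


def replication_bytes_per_day(entries, changed_percent, entry_bytes=ENTRY_BYTES):
    """Approximate daily traffic caused by replicating changed entries."""
    return entries * changed_percent // 100 * entry_bytes


def lookup_bytes_per_day(users, lookups_per_user, entry_bytes=ENTRY_BYTES):
    """Approximate daily traffic caused by users' directory lookups."""
    return users * lookups_per_user * entry_bytes


replication_load = replication_bytes_per_day(1_000_000, 10)  # 100,000,000 bytes
lookup_load = lookup_bytes_per_day(100, 10)                  # 1,000,000 bytes
```

With these figures, replication moves about 100 times more data to the remote site than the site's own lookups would generate.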

Given the difference in loads caused by replication versus that caused by normal directory usage, you may decide that replication for network load-balancing purposes is not desirable. On the other hand, you may find that the benefits of locally available directory data far outweigh any considerations you may have regarding network loads.

A good compromise between making data available to local sites and not overloading the network is to use scheduled replication. For more information on data consistency and replication schedules, refer to "Data Consistency".

Example of Network Load Balancing

Suppose your enterprise has offices in two cities. Each office has specific subtrees that they manage as follows:





Each office contains a high-speed network, but you are using a dial-up connection to link the two cities. To balance your network load:

Example of Load Balancing for Improved Performance

Suppose that your directory must include 1,500,000 entries in support of 1,000,000 users, and each user performs ten directory lookups a day. Also assume that you are using a messaging server that handles 25,000,000 mail messages a day, and that performs five directory lookups for every mail message that it handles. Therefore, you can expect 125,000,000 directory lookups per day just as a result of mail. Your total combined traffic is, therefore, 135,000,000 directory lookups per day.

Assuming an eight-hour business day, and that your 1,000,000 directory users are clustered in four time zones, your business day (or peak usage) across four time zones is 12 hours long. Therefore you must support 135,000,000 directory lookups in a 12-hour day. This equates to 3,125 lookups per second (135,000,000 / (60*60*12)). That is:

1,000,000 users x 10 lookups per user = 10,000,000 reads/day

25,000,000 messages x 5 lookups per message = 125,000,000 reads/day

Total reads/day = 135,000,000

12-hour day includes 43,200 seconds

Total reads/second = 3,125



Now, assume that you are using a combination of CPU and RAM with your Directory Servers that allows you to support 500 reads per second. Simple division (3,125 / 500 = 6.25) indicates that you need at least seven Directory Servers to support this load. However, for enterprises with 1,000,000 directory users, you should add more Directory Servers for local availability purposes.
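The sizing arithmetic in this example can be checked with a short calculation. The function names below are hypothetical, and the sketch assumes that load is spread evenly across the peak period and across servers, which a real deployment will not achieve exactly.

```python
import math


def reads_per_second(users, lookups_per_user, messages, lookups_per_message, peak_hours):
    """Average read rate over the peak period, from daily lookup totals."""
    total_reads = users * lookups_per_user + messages * lookups_per_message
    return total_reads / (peak_hours * 60 * 60)


def servers_needed(read_rate, reads_per_server):
    """Smallest whole number of servers that covers the read rate."""
    return math.ceil(read_rate / reads_per_server)


rate = reads_per_second(1_000_000, 10, 25_000_000, 5, 12)  # 3125.0
minimum_servers = servers_needed(rate, 500)                 # 7
```

The result is a lower bound for read capacity only. It does not include the additional servers you may want for local availability or failover.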

You could, therefore, replicate as follows:

Example Replication Strategy for a Small Site

Suppose your entire enterprise is contained within a single building. This building has a very fast (100 MB per second) and lightly used network. The network is very stable and you are reasonably confident of the reliability of your server hardware and OS platforms. Also, you are sure that a single server's performance will easily handle your site's load.

In this case, you should replicate at least once to ensure availability in the event your primary server is shut down for maintenance or hardware upgrades. Also, set up a DNS round robin to improve LDAP connection performance in the event that one of your Directory Servers becomes unavailable.

Example Replication Strategy for a Large Site

Suppose your entire enterprise is contained within two buildings. Each building has a very fast (100 MB per second) and lightly used network. The network is very stable and you are reasonably confident of the reliability of your server hardware and OS platforms. Also, you are sure that a single server's performance will easily handle the load placed on a server within each building.

Also assume that you have a slow (ISDN) connection between the buildings, and that this connection is very busy during normal business hours.

Your replication strategy follows:

Using Replication with other Directory Features


Replication interacts with other Directory Server features to provide advanced replication features. The following sections describe feature interactions to help you better design your replication strategy.

Replication and Access Control

The directory stores ACIs as attributes of entries. This means that the ACI is replicated along with other directory content. This is important because Directory Server evaluates ACIs locally.

For more information about designing access control for your directory, refer to Chapter 7, "Designing a Secure Directory".

Replication and Directory Server Plug-ins

You can use replication with most of the plug-ins delivered with Directory Server. There are some exceptions and limitations in the case of multi-master replication with the following plug-ins:

You cannot use multi-master replication with the attribute uniqueness plug-in at all, because this plug-in can validate attribute values only on the same server, not on both servers in the multi-master set.

You can use the referential integrity plug-in with multi-master replication, provided that this plug-in is enabled on just one master in the multi-master set. This ensures that referential integrity updates are made on just one of the master servers, and propagated to the other.


Note  

By default, these plug-ins are disabled. You need to use the Directory Server Console or the command line to enable them.




Replication and Database Links

When you distribute entries using chaining, the server containing the database link points to a remote server that contains the actual data. In this environment, you cannot replicate the database link itself. You can, however, replicate the database that contains the actual data on the remote server.

You must not use the replication process as a backup for database links. You must back up database links manually. For more information about chaining and entry distribution, refer to Chapter 5, "Designing the Directory Topology".

Figure 6-7    Replicating Chained Databases

Schema Replication

In all replication scenarios, before pushing data to consumer servers, the supplier server checks whether its own version of the schema is in sync with the version of the schema held on consumer servers.

If the schema entries on both supplier and consumers are the same, the replication operation proceeds.

If the version of the schema on the supplier server is more recent than the version stored on the consumer, the supplier server replicates its schema to the consumer before proceeding with the data replication.

If the version of the schema on the supplier server is older than the version stored on the consumer, you will probably witness a lot of errors during replication because the schema on the consumer cannot support the new data.
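The three cases above amount to a simple decision made by the supplier before each replication session. The sketch below illustrates that decision only. The function name is hypothetical, and representing schema versions as comparable integers is an assumption; the text does not say how the server encodes schema versions.

```python
def schema_action(supplier_version, consumer_version):
    """Describe what the supplier does before pushing data to a consumer.

    Versions are modeled as comparable integers, which is an assumption
    made for this sketch.
    """
    if supplier_version == consumer_version:
        return "replicate data"
    if supplier_version > consumer_version:
        return "replicate schema, then replicate data"
    # Supplier schema is older than the consumer's: expect replication errors.
    return "replicate data, expect errors"
```

For example, `schema_action(3, 2)` returns `"replicate schema, then replicate data"`, matching the case where the supplier holds the more recent schema.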

A consumer might contain replicated data from two suppliers, each with different schema. Whichever supplier was updated last will "win" and its schema will be propagated to the consumer.


Note  

You must never update the schema on a consumer server because the supplier server is unable to resolve the conflicts that will occur and replication will fail.

Schema should be maintained on a master supplier server in a replicated topology. If using the standard 99user.ldif file, these changes will be replicated to all consumers. When using custom schema files, ensure that these files are copied to all servers after making changes on the master supplier. After copying files, the server must be restarted. Refer to "Creating Custom Schema Files" for more information.




In Directory Server 6.x, the same server can hold read-write replicas for which it acts as a supplier, and read-only replicas for which it acts as a consumer. Therefore, you should always identify the server that will act as a supplier for the schema, and set up replication agreements between this master and all other servers in your replication environment, which then act as consumers for the schema information.


Note  

Special replication agreements are not required to replicate the schema. If replication has been configured between a supplier and a consumer, schema replication will happen by default.




Changes made to custom schema files are only replicated if the schema is updated using LDAP or the Directory Server Console. These custom schema files should be copied to each server in order to maintain the information in the same schema file on all servers. For more information, refer to "Creating Custom Schema Files".

For more information on schema design, refer to Chapter 3 "How to Design the Schema."




© 2001 Sun Microsystems, Inc. Portions copyright 1999, 2002 Netscape Communications Corporation. All rights reserved.


Last Updated August 16, 2002