Managing Identity Values

SQL Replication

Replication

Managing Identity Values

You can manage identity values by:

  • Allowing Microsoft® SQL Server™ 2000 replication to automatically manage identity columns by dynamically allocating ranges of identity values to the Publisher and all the Subscribers.

  • Using the Transact-SQL NOT FOR REPLICATION option when defining the identity column.

  • Using a primary key other than the identity column (for example, a composite key or a rowguid column), if an identity column is not necessary. This strategy eliminates the overhead of managing identity columns on the replicated data.
Automatic Identity Range Handling

The simplest way of handling identity ranges across replicas is to allow SQL Server 2000 to manage identity range handling for you. To use automatic identity range handling, you must first enable the feature at the time the publication is created, assign a set of initial Publisher and Subscriber identity range values, and then assign a threshold value that determines when a new identity range is created.

For example, assigning an identity range from 1000 through 2000 to a Publisher, and a range from 2001 through 3000 to an initial Subscriber, works as follows when combined with a threshold value of 80 percent:

  • Newly inserted Publisher rows are assigned identity values from 1000 through 2000. Newly inserted rows on the initial Subscriber will sequence from 2001 through 3000.

  • When 80 percent of either the Publisher identity values or the Subscriber identity values are used, a new identity range is created for forthcoming inserts. In this example, if rows from 1001 through 1800 are used on the Publisher, the threshold has been reached. A new identity range, from 3001 through 4000, is created on the Publisher, and the next inserted row at the Publisher is assigned an identity value of 3001. After the Subscriber reaches the threshold (assuming the Subscriber reached threshold after the Publisher), a new identity range is created on the Subscriber, from 4001 through 5000, and the next inserted row at the Subscriber is assigned an identity value of 4001. The process is repeated as identity ranges are used.

  • As each Subscriber is added, an identity range that is the same size as the initial Subscriber range is added, using the next available starting point.

The threshold setting avoids situations where the Subscribers run out of identity values and become unable to insert new rows until the Distribution Agent or Merge Agent synchronizes with the Subscriber. However, setting the threshold value too low can generate large numbers of unused identity values. The threshold value should be set carefully by evaluating the update frequency at the Subscriber and the synchronization schedule.

For transactional articles enabled for identity range management, the identity ranges at both the Publisher and Subscriber need to be checked and adjusted periodically. The Log Reader Agent does this at the Publisher and the Distribution Agent does this at the Subscriber.

If a Log Reader Agent or Distribution Agent is not running in continuous mode, the check and possible adjustment will be done after all the commands have been processed. When one of the agents is in continuous mode, the check and possible adjustment will be done in a time interval of 10 times the polling interval of the agent after all the commands have been processed. After the agent is started, the first check will be done as soon as the commands have been processed.

Run the Log Reader Agent or the Distribution Agent to adjust the Publisher or Subscriber when the server is out of its identity range. If the agent is running in continuous mode, you may need to restart it for the identity range to be adjusted immediately.

You can also execute sp_adjustpublisheridentityrange to explicitly adjust the identity range at the Publisher based on threshold value for either transactional or merge publications.

You enable automatic identity range handling:

  • In SQL Server Enterprise Manager, in the Publication Properties dialog box.

  • By setting the following options in the sp_addmergearticle stored procedure.
Parameter Values Description
@auto_identity_range TRUE or FALSE Enable (TRUE) or disable (FALSE) automatic identity range handling.
@pub_identity_range Integer values of range (for example, from 1001 through 2000) Identity range for the Publisher.
@identity_range Integer values of range (for example, from 2001 through 3000) Identity range for the initial Subscriber; length of range used for additional Subscribers.
@threshold Integer value for percent threshold (for example, 90 is equivalent to 90 percent) Percent of total identity values used on replica that trigger creation of new identity range.

Manual Identity Range Handling

You can also manage identity values using a check constraint and the NOT FOR REPLICATION option on the IDENTITY property of a Transact-SQL CREATE TABLE statement. Use the NOT FOR REPLICATION option to specify identity ranges programmatically, or if you are upgrading an existing instance of SQL Server where identity ranges are already being managed through Transact-SQL statements.

Using the NOT FOR REPLICATION statement informs SQL Server 2000 that the replication process gets a waiver when supplying an explicit value and that the local identity value should not be reseeded. Each Publisher using this option receives a reseeding waiver.

The following code example illustrates how to implement identities with different ranges at each Publisher:

  • At Publisher A, start at 1 and increment by 1.
    CREATE TABLE authors ( COL1 INT IDENTITY (1, 1) NOT FOR REPLICATION PRIMARY KEY ) 
    
  • At Publisher B, start at 1001 and increment by 1.
    CREATE TABLE authors ( COL1 INT IDENTITY (1001, 1) NOT FOR REPLICATION PRIMARY KEY ) 
    

After activating the NOT FOR REPLICATION option, connections from replication agents to Publisher A insert rows with values such as 1, 2, 3, 4. These are replicated to Publisher B without being changed (that is, 1, 2, 3, 4). Connections from replication agents at Publisher B receive values 1001, 1002, 1003, and 1004. Those are replicated to A without being changed. When all data is distributed or merged, both Publishers have values 1, 2, 3, 4, 1001, 1002, 1003, and 1004. The next locally inserted value at Publisher A is 5. The next locally inserted value at Publisher B is 1005.

It is recommended that you always use the NOT FOR REPLICATION option along with the CHECK constraint to ensure that the identity values being assigned are within the allowed range. For example:

CREATE TABLE sales

(sale_id INT IDENTITY(100001,1)

NOT FOR REPLICATION

CHECK NOT FOR REPLICATION (sale_id <= 200000),

sales_region CHAR(2),

CONSTRAINT id_pk PRIMARY KEY (sale_id)

)

Even if someone used SET IDENTITY INSERT, all values inserted locally must obey the range. However, a replication process is still exempt from the check.

Using Other Columns as Primary Keys

If using an identity column is not a requirement, you can eliminate the overhead of managing the uniqueness of identity values in replicated data by using another column as the primary key, or using combinations of columns as the primary key.

For example, you can define a primary key, consisting of an identity column whose values are not unique and a second column, that when combined with the identity column guarantees uniqueness (for example, a site ID column, pk_id_plus_site). In this example, the composite key pk_id_plus_site is a combination of the identity and site columns. In replication, duplicate identity values can be created only at different sites; therefore, each primary key value in this case will always be unique.

ROWGUIDCOL is a property you can assign to a column with uniqueidentifier values, a SQL Server 2000 data type that defines a 128-bit integer guaranteed to be unique. As such, using a rowguid column as a primary key is a safe alternative to using an identity column to guarantee uniqueness.