Data Needs and Characteristics

SQL Replication

Replication

Data Needs and Characteristics

While examining the data that you are replicating, consider the following:

  • Collation (defines code page or character set and data sorting)

  • Data types
Character Sets

If replication is implemented between servers using different character sets, Microsoft® SQL Server™ 2000 does not convert any of the replicated data and may mistranslate the data as it is replicated because it is impossible to map all characters between character sets.

If you can guarantee that all characters you use will have identical codes on all code pages, replication would be successful, but it would not be guaranteed. Similarly, the comparison style specified by the collation you select can affect the accuracy of replicated transactions. To guarantee successful data replication, servers are best when configured using the same code pages and comparison styles.

Generally, if you have an environment where you have different character sets, you should consider using Unicode data types for which no conversion is necessary.

Data Types

When determining data to replicate, consider the data type. You should understand the following:

  • timestamp columns. For merge replication or transactional replication with the queued updating option, when articles contain a timestamp column, the timestamp column is replicated, but the literal timestamp values are not. The timestamp values are regenerated when applying the initial snapshot rows at the Subscriber. This allows timestamp to continue using optimistic concurrency control (a frequent usage). For snapshot and transactional publications, and publications that allow immediate updating, the literal values for a timestamp column are replicated, but the data type for the replicated values is changed to binary (8) on the Subscriber. For more information, see Replication Data Considerations.

  • uniqueidentifier columns. If you are using merge replication, or if you are using snapshot replication or transactional replication with queued updating and the table that is being replicated does not have a uniqueidentifier column, SQL Server 2000 will add one when you create a publication. In merge replication, this occurs when the initial snapshot is generated. In snapshot replication or transactional replication using the queued updating option, this occurs when the publication is created. In the case of queued updating, a predefined uniqueidentifier column will be added for row versioning irrespective of the presence of a globally uniqueidentifier (GUID) column in the table. The use of the uniqueidentifier in queued updating is conceptually like a global timestamp. To ensure that merge replication will reuse an existing uniqueidentifier column to uniquely identify replicated rows, make sure that your uniqueidentifier column is created with the column property ROWGUIDCOL. The use of the GUID in merge replication is conceptually like the use of a global primary key.

  • Columns with text or image data types. These columns can take longer to replicate because they can be very large. When using snapshot replication or transactional replication with the immediate updating or queued updating options, updates made at the Subscriber to replicated data with text or image data types are not supported. However, replication and updating of these columns is fully supported when not using updatable subscriptions. Publishing text and image data types is also supported in merge replication.

  • Case sensitivity. Generally, you should choose the same collation scheme (as the most common setting is case sensitivity) at the Publisher and at the Subscriber. For more information, see Specifying Collations.

For example, suppose you are publishing data about customers and you do not choose the same collation scheme at the Publisher that is at the Subscriber. Data is then filtered based on state="Ca" for a particular Subscriber. The data that is published to the Subscriber may not be the data that you intended because of differences in collation. Choosing the same collation scheme is not required, and depending on your application requirements, you may want to choose a different collation scheme (for example, a Publisher might have data that is case-sensitive, but a Subscriber that is a reporting server may have data that is case-insensitive).

  • Triggers. Consider triggers that reside on the publishing table. By default, the triggers will be published with data from that table. If you do not want triggers on the publishing table to be published with data, you can change an option in the properties for a specific publication. For more information, see Publishing Data and Database Objects and Using NOT FOR REPLICATION.

  • Row size. Is the row size greater than the maximum of 6,000 characters for merge replication and 8,000 characters for transactional replication? (Size limits exclude columns with text and image data types.)

  • Data type mapping. Do you need to support Subscribers running on an instance of SQL Server 7.0 or earlier, or Subscribers that are not running on a version of SQL Server? SQL Server 2000 has new data types that servers running earlier versions of SQL Server cannot replicate. If so, you should know how the data types map between the different databases. For more information, see Data Type Mapping.

  • Column-level or database collations. Depending on which collation you use, retrieving the data may be different at different Subscribers.