Distributed Update Factors

SQL Replication

Replication

Distributed Update Factors

If distributed data does not need to be updated at more than one site, data can easily maintain the ACID properties of transactions. However, when you need to update data at multiple sites, you should consider how the ACID properties of transactions and site autonomy are going to be affected.

ACID Properties

To qualify as a transaction, a single unit of work must adhere to the ACID properties of atomicity, consistency, isolation, and durability.

  • Atomic. For a transaction to be atomic, all of its data modifications are performed or none of them are performed.

  • Consistent. To be consistent, a completed transaction must leave all data in a consistent, logically correct state.

  • Isolation. To meet the isolation property, a transaction reads data in the state it was in before another concurrent transaction modified it (without yet committing the transaction). Concurrent modifications that are in progress do not affect the transaction.

  • Durable. To meet the durability property, the modifications of a transaction will persist (for example, remain in the database, even if there is a system failure). After a commit is acknowledged, the system must guarantee that the transaction persists.

Your needs for strict adherence to ACID properties are significant when planning for replication because when data modifications are made at multiple Subscribers independently, conflicts can occur. If conflicts are allowed, strict ACID characteristics cannot be guaranteed even with conflict detection and resolution. If you are considering merge replication or transactional replication with the queued updating option, you need to prepare for how to handle transactions that do not meet these properties.

Two-phase commit protocol (2PC) is required to guarantee ACID properties in a distributed, multiple-update environment. However, this means that the sites are dependent on one another for completion of an update, and they will give up site autonomy.

For more information about ACID properties, see Transactions.

Questions relating to ACID properties include:

  • Do multiple Subscribers need to make updates? If replicated data is going to be read-only, ACID properties will not be affected.

  • If updates need to be made at multiple sites, can you allow conflicts? Is the data filtered into different partitions for different sites? If you need to preserve transaction isolation and durability, you must avoid conflicts.

  • Is it acceptable for a committed transaction to be undone to resolve a conflict?

  • Is it acceptable that subsequent transactions are changed based on the value of a transaction that was undone due to a conflict?

If ACID properties must be preserved, you can use 2PC so that the Publisher accepts any changes before a conflict could exist, execute all updates at one site, or filter data so sites can update unique subsets of data and avoid conflicts with other sites.

ACID Properties and Replication

When designing replication, determine whether ACID properties need to be maintained and how much autonomy is required by your application.

When thinking about ACID properties in regards to replication, consider whether data at any participating site must be the same data that would have resulted had all transactions been performed at only one site. If you made all data modifications at one site, your transactions would typically be consistent, isolated, and durable. Consider if you also have those needs in your distributed environment.

Latency refers to the period of time between when data is updated at one site (the Publisher) and when those changes appear at another site (the Subscriber). The latency can vary from a few seconds to hours, days, or longer.

Questions relating to ACID properties in your replication application include:

  • Does data need to be updated at Subscribers?

  • How much latency is acceptable?

To maintain strict ACID properties, you will often have to give up site autonomy because servers must be continuously and reliably connected. That is the only way to guarantee you avoid conflicts. If you allow conflicts, some transactions must be altered or undone to resolve the conflict. Therefore, at least some transactions were not durable, and perhaps other transactions that read the values of the non-durable transaction were not isolated.

Autonomy

Autonomy is the degree of dependence one site has on another. Complete autonomy occurs when one site does not depend on any other site to complete its work, and it is independent of the operations at any other site.

2PC is an example of a nonautonomous process because every data change is dependent on every other participating site being able to accept the transaction successfully and immediately. But in replication, 2PC is optimized to be dependent on only two servers in the replication topology: the Publisher and the Subscriber making the update, with the Publisher as the arbiter.

Merge replication or transactional replication with queued updating is often used when sites need to modify data autonomously and then later merge changes with changes made at the Publisher and at other Subscribers.

With merge replication, data converges and all sites end up with the same values; however, because conflicts can occur and are resolved, the values are not necessarily the ones that would have resulted had all the work been done at only one site. All sites may work offline and when all sites have synchronized data, all sites will eventually have the same data. However, because the same data is being changed at multiple locations, conflicts can occur and some transactions from one site will be committed while others will be rejected and resolved. Those transactions by definition are not durable.

Example

Northwind Traders has a Publisher with local Distributor at the corporate headquarters that publishes customer and orders information using merge replication to four regional sales offices. Data is not filtered and published in partitions, so data modifications to the same rows can occur at multiple sites.

A sales representative in the Northwest office changes the customer information for the company named White Clover Markets by changing the value of the customer phone number to (206) 554-2341. A sales representative in the Southwest office uses replicated data at that site and changes the fax number for White Clover Markets to (206) 555-8314. Another sales representative at the publishing site in the corporate headquarters changes the phone number of White Clover Markets to (206) 554-2241. When the Subscribers merge with the Publisher, the conflicts will be detected and depending on the conflict resolution policy, it will be resolved. It is possible that White Clover Markets ends up with the new phone number that was entered at the corporate office, (206) 554-2241, and that the new fax number entered at the Southwest office is rolled back and the original fax number is maintained in the database.

If sites are autonomous, ACID properties cannot be assured. For example, merge replication allows sites to be autonomous and to update replicated data whether online or offline. It does not, however, guarantee durability. If conflicts are to be resolved, then a committed transaction must be altered in order to resolve the conflict. Instead, it focuses on data convergence, the merging of changes made at various sites into a new result set.

Questions relating to autonomy include:

  • How independent do the various sites need to be?

  • Are sites continuously and reliably connected, or are they disconnected for periods of time?

  • Is preserving ACID properties more important than autonomy?