Partitioning Data

SQL Server Architecture

SQL Server Architecture

Partitioning Data

The first step in building a set of federated database servers is to horizontally partition the data in a set of tables across multiple servers. Horizontally partitioning a table refers to dividing a table into multiple smaller tables, called member tables. Each member table has the same format as the original table, but only part of the rows. Each table is placed on a separate resource (files or servers) to spread the processing load across the resources. For example, a company assigns customer identifiers (IDs) from 1 through 9999999. The Customers table may be partitioned into three member tables, with each member table having an equal customer ID range.

If used without views, horizontal partitioning would require applications to have logic to determine which member tables have the data requested by the user and dynamically build SQL statements referencing the tables. The application would require complex queries joining the member tables. Changing the member tables would also involve recoding the application. Views solve the problem by making the member tables look like one table. The SQL UNION operator combines result sets with identical formats into one. Because all the member tables have the same format, the result of SELECT * statements for each table have the same format, and can be combined using the UNION clause to form a single result set that operates similarly to the original table. For example, the Customers table has been partitioned across three servers (Server1, Server2, and Server3). The distributed partitioned view defined on Server1 is:

CREATE VIEW Customers
AS
SELECT * FROM Customers_33
   UNION ALL
SELECT * FROM Server2.CustomerDB.dbo.Customers_66
   UNION ALL
SELECT * FROM Server3.CustomerDB.dbo.Customers_99

This view makes the actual location of the data transparent to an application. When a SQL statement is executed on Server1 that references the Customers partitioned view, the application has no visibility to where the data is located. If some of the rows required to complete the SQL statement reside on Server2 or Server3, the instance of SQL Server on Server1 automatically generates a distributed query that pulls in the required rows from the other servers. This transparency allows database administrators to repartition tables without recoding applications. If the Customers view is updatable, the behavior of the view is the same as a table named Customers.

Local partitioned views reference member tables on one server. Distributed partitioned views reference member tables on multiple servers. A server containing a member table is called a member server, and a database containing a member table is called a member database. Each member server contains one member table and a distributed partitioned view. An application that references the partitioned view on any of the servers gets the same results as if a complete copy of the original table were present on each server.

Microsoft SQL Server 2000 and Microsoft SQL Server version 7.0 support partitioned views; however, SQL Server 2000 introduces key features that allow the views to scale out and form federations of database servers:

  • SQL Server 2000 partitioned views are updatable. This is crucial for distributing data so that the location of the data is transparent to the application. Updatable views support the full behavior of the original table; nonupdatable views are like read-only copies.

  • The SQL Server 2000 query optimizer supports new optimizations that minimize the amount of distributed data that has to be transferred. The distributed execution plans generated by SQL Server 2000 result in good performance for a larger set of queries than the plans generated by SQL Server version 7.0.

SQL Server 2000 partitioned views are best suited for the types of SQL statements generated by Web sites and online transaction processing (OLTP) systems.

Partitioning a Database

To build an effective federation of database servers:

  • Create multiple databases, each on a different member server running an instance of SQL Server 2000.

  • Partition the individual tables in the original database so that most related data is placed together on a member server. This may require different methods of distributing the data in the various tables across all the member databases; partitioning some tables; making complete copies of other tables in each member database; and leaving some tables intact on the original server.

  • Devise data routing rules that can be incorporated in the business services tier, so that applications can send each SQL statement to the member server that stores most of the data required by the statement.

The most important goal is to minimize distributed processing in such a system. You must be able to collocate related data on the same member server, and then route each SQL statement to a member server that contains most, if not all, of the data required to process the statement. For example, you may find that all the sales, customer, sales personnel, and inventory tables in a database can be partitioned by sales region, and that most SQL statements only reference data in a single region. You can then create member servers where each server has the horizontally partitioned data for one or more regions. If applications can identify the region currently referenced in the user's input, the application can submit any generated SQL statement to the member server containing the data for that region. The only SQL statements that will generate distributed queries are those that reference data from multiple regions.