Failover Cluster Troubleshooting
This topic provides information about:
- Resolving the most common Microsoft® SQL Server™ 2000 failover clustering usage issues.
- Optimizing failover cluster performance.
- Using failover clustering with extended stored procedures that use COM objects.
Resolving Common Usage Issues
The following list describes common usage issues and explains how to resolve them:
- SQL Server 2000 cannot log on to the network after it migrates to another node.
SQL Server service account passwords must be identical on all nodes or else the node cannot restart a SQL Server service that has migrated from a failed node.
If you change the SQL Server service account passwords on one node, you must change the passwords on all other nodes. However, if you change the account using SQL Server Enterprise Manager, this task will be done automatically.
- SQL Server cannot access the cluster disks.
A node cannot recover cluster disks that have migrated from a failed node if the shared cluster disks use a different letter drive. The disk drive letters for the cluster disks must be the same on both servers. If they are not, review your original installation of the operating system and Microsoft Cluster Service (MSCS). For more information, see the Microsoft Windows NT® 4.0, Enterprise Edition, Windows® 2000 Advanced Server, or Windows 2000 Datacenter Server documentation.
- You do not want a failure of a service, such as full-text search or SQL Server Agent, to cause a failover.
To prevent the failure of specific services from causing the SQL Server group to fail over, configure those services using Cluster Administrator in Windows NT 4.0 or Windows 2000. For example, to prevent the failure of the Full-Text Search service from causing a failover of SQL Server, clear the Affect the Group check box on the Advanced tab of the Full Text Properties dialog box. However, if SQL Server causes a failover, the full-text search service will restart.
- SQL Server will not start automatically.
You cannot start a failover cluster automatically using SQL Server. You must use Cluster Administrator in MSCS to automatically start a failover cluster.
- The error message "No compatible resource groups found" is displayed during SQL Server Setup.
This error is caused by the Microsoft Distributed Transaction Coordinator (MS DTC) setup on Windows NT 4.0, Enterprise Edition. MS DTC requires a group containing a network name, IP address, and shared cluster disk to be owned by the local node when the Setup program is run. If this error is displayed, open Cluster Administrator and make certain there is a group that meets these requirements owned by the local node. The easiest way to do this is to move a disk into the cluster group that already contains a network name and IP address. After you have this group on the local node, click Retry.
- The error message "All cluster disks available to this virtual server are owned by other node(s)" is displayed during Setup.
This message is displayed when you select the drive and path for installing data files, and the drive you selected is not owned by the local node. Move the disk to the local node using Cluster Administrator.
- The error message "Unable to delete SQL Server resources. They must be manually removed. Uninstallation will continue." is displayed during SQL Server Setup.
This message is displayed if SQL Server Setup cannot delete all of the SQL Server resources. You must go into Control Panel and uninstall the instance you were trying to remove on every node.
- You cannot enable the clustering operating system error log.
The operating system cluster error log is used by MSCS to record information about the cluster. Use this error log to debug cluster configuration issues. To enable the cluster error log, set the system environment variable CLUSTERLOG=<path to file> (for example, CLUSTERLOG=c:\winnt\cluster\cluster.log). This error log is on by default in Windows 2000.
- If the Network Name is offline and you cannot connect using TCP/IP, you must use Named Pipes.
To connect using Named Pipes, create an alias using the Client Network Utility to connect to the appropriate computer. For example, if you have a cluster with two nodes (Node A and Node B), and a virtual server (Virtsql) with a default instance, you can connect to the server that has the Network Name resource offline by doing the following:
- Determine on which node the group containing the instance of SQL Server is running by using the Cluster Administrator. For this example, it will be Node A.
- Start the SQL Server service on that computer using net start. For more information about using net start, see Starting SQL Server Manually.
- Start the SQL Server Network Utility on Node A. View the pipe name on which the server is listening. It should be similar to \\.\$$\VIRTSQL\pipe\sql\query.
- On the client computer, start the Client Network Utility.
- Create an alias SQLTEST1 to connect via Named Pipes to this pipe name. To do this, put Node A as the server name and edit the pipe to be \\.\pipe\$$\VIRTSQL\sql\query. Connect to this instance using the alias SQLTEST1 as the server name.
For more information, see Client Net-Libraries and Network Protocols.
- Determine on which node the group containing the instance of SQL Server is running by using the Cluster Administrator. For this example, it will be Node A.
Optimizing Failover Clustering Performance
To optimize performance when using failover clustering, consider the following:
- If your disk controller is not external to your clustered computer, you must turn off write-caching within the controller to prevent data loss during a failover.
- Write-back caching cannot be used on host controllers in a cluster without hindering performance. However, if you use external controllers, you continue to provide performance benefits. External disk arrays are not affected by failover clustering and can sync the cache correctly, even across a SCSI bus.
- It is recommended that you do not use the cluster drive for file shares. Using these drives impacts recovery times and can cause a failover of the cluster group due to resource failures.
Using Extended Stored Procedures and COM Objects
When you use extended stored procedures with a failover clustering configuration, all extended stored procedures need to be installed on the shared cluster disk. This is to ensure that when a node fails over, the extended stored procedures can still be used.
If the extended stored procedures use COM components, the administrator needs to register the COM components on each node of the cluster. The information for loading and executing COM components must be in the registry of the active node in order for the components to be created. Otherwise, the information will remain in the registry of the computer on which the COM components were first registered. For more information, see Extended Stored Procedure Architecture.