Determining When SQL Server Causes a Windows NT Blue Screen

On rare occasions, Microsoft Windows NT® may halt with a STOP screen, commonly called a blue screen, or may hard hang, leaving the console completely frozen and unresponsive. This can happen on a computer running Microsoft® SQL Server™, and it may coincide with a particular SQL Server operation, such as a bcp utility run or a long-running query.

Most of the time, either symptom indicates an operating system, device-driver, or hardware problem and should be pursued as such. Windows NT isolates user mode processes from kernel mode, which ensures that a problem in a user mode application does not cause the operating system to stop responding. This section describes the exceptions to this rule and how to determine whether to troubleshoot the problem at the system layer or the application layer.

Sometimes the cause of a hard hang or blue screen is a nonmaskable interrupt (NMI) error. This may be visible as an error message mentioning NMI, parity check, or I/O parity check. NMI errors almost always indicate a hardware problem. Usually they are caused by a memory failure, but they can also originate in other hardware subsystems, such as video boards. Even if the NMI error happens only during certain SQL Server operations, and even if the system passes initial hardware diagnostics, it should still be treated as a hardware problem and pursued as such. It may be necessary to use a dedicated SIMM memory testing device, which can often find a transient memory error that eludes software-based diagnostics.
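The decision rules in the preceding paragraphs can be summarized as a small triage function. This is a minimal Python sketch, not a Microsoft tool: the `Incident` fields, the `triage` function, and the matching on STOP screen text are hypothetical, and the sample message strings are illustrative rather than exact Windows NT output.

```python
from dataclasses import dataclass

# Text fragments that identify a nonmaskable interrupt (NMI) error.
# "PARITY CHECK" also matches "I/O PARITY CHECK".
NMI_MARKERS = ("NMI", "PARITY CHECK")


@dataclass
class Incident:
    stop_screen: bool                   # Windows NT halted with a STOP (blue) screen
    hard_hang: bool                     # console completely frozen and unresponsive
    stop_message: str = ""              # error text shown on the STOP screen, if any
    during_sql_operation: bool = False  # e.g. coincided with bcp or a long-running query


def triage(incident: Incident) -> str:
    """Return where to start troubleshooting: 'hardware', 'system', or 'application'."""
    message = incident.stop_message.upper()
    # NMI errors are treated as hardware even if diagnostics pass and even if
    # they occur only during certain SQL Server operations.
    if any(marker in message for marker in NMI_MARKERS):
        return "hardware"
    # A user mode application generally cannot cause a blue screen or hard hang,
    # so these go to the system layer. during_sql_operation is deliberately ignored.
    if incident.stop_screen or incident.hard_hang:
        return "system"
    # Only the application is affected; an operating system fault is still possible.
    return "application"


print(triage(Incident(stop_screen=True, hard_hang=False, stop_message="NMI: Parity Check")))  # hardware
print(triage(Incident(stop_screen=False, hard_hang=True, during_sql_operation=True)))  # system
```

Note that the SQL Server coincidence flag never changes the result, which mirrors the point that timing alone does not move the problem to the application layer.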

Processes on Windows NT run in either user mode or kernel mode (sometimes called supervisor or privileged mode). On the Intel® x86 architecture, user mode maps to ring 3 and kernel mode to ring 0 of the four-ring protection model. This protection model has been carried forward with little change in all Intel and compatible processors to date, including the Pentium Pro and Pentium II. Other processors, such as the Alpha AXP, likewise provide unprivileged and privileged modes.
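On x86 processors, the current privilege level (ring) is held in the low two bits of the CS (code segment) register. The following Python sketch decodes a CS selector value into a Windows NT processor mode. The function names are hypothetical, and the sample selectors 0x1B and 0x08 are assumed to be the values typically seen on 32-bit Windows NT in user mode and kernel mode; treat them as illustrative.

```python
def current_privilege_level(cs_selector: int) -> int:
    """Return the x86 ring (0-3) encoded in the low two bits of a CS selector."""
    return cs_selector & 0b11


def processor_mode(cs_selector: int) -> str:
    # Windows NT uses only two of the four rings: ring 0 for kernel mode
    # and ring 3 for user mode. Rings 1 and 2 are left unused.
    ring = current_privilege_level(cs_selector)
    if ring == 0:
        return "kernel"
    if ring == 3:
        return "user"
    return f"ring {ring} (not used by Windows NT)"


# Assumed example selectors for 32-bit Windows NT.
print(processor_mode(0x1B))  # user
print(processor_mode(0x08))  # kernel
```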

Kernel mode is a privileged processor mode in which a thread has access to system-wide memory (including that of all user mode processes) and to hardware. By contrast, user mode is a nonprivileged processor mode in which a thread can access system resources only by calling system services.

A user mode process cannot access kernel mode memory or the memory of another user mode process. Processor hardware enforces this restriction, in conjunction with kernel mode data structures such as page tables.
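The page-table enforcement described above can be modeled with the x86 page table entry flags: the present bit (bit 0), the read/write bit (bit 1), and the user/supervisor bit (bit 2). This Python sketch is a simplification under stated assumptions: it checks a single entry rather than both the page directory entry and the page table entry, ignores the CR0.WP flag and later processor features such as SMEP and SMAP, and uses hypothetical function names and dictionary-based page tables.

```python
PTE_PRESENT = 1 << 0   # P: the page is mapped
PTE_WRITABLE = 1 << 1  # R/W: writes are permitted
PTE_USER = 1 << 2      # U/S: accessible from ring 3; clear means supervisor-only
PAGE_SHIFT = 12        # 4 KB pages


def access_allowed(pte, ring, write=False):
    """Simplified x86 check: may code running at `ring` access the page `pte` describes?"""
    if not pte & PTE_PRESENT:
        return False  # not-present page: page fault in either mode
    if ring == 3:
        if not pte & PTE_USER:
            return False  # supervisor-only page, such as kernel memory
        if write and not pte & PTE_WRITABLE:
            return False  # read-only page
        return True
    # Ring 0 may access user and supervisor pages alike (CR0.WP, SMEP, SMAP ignored).
    return True


def can_access(page_table, address, ring, write=False):
    """Look up `address` in one process's page table; unmapped pages act as not present."""
    pte = page_table.get(address >> PAGE_SHIFT, 0)
    return access_allowed(pte, ring, write)


kernel_page = PTE_PRESENT | PTE_WRITABLE             # U/S bit clear
user_page = PTE_PRESENT | PTE_WRITABLE | PTE_USER

# Hypothetical page tables: both processes share the kernel mapping at 0x80000000,
# but each has its own private user pages.
process_a = {0x00010: user_page, 0x80000: kernel_page}
process_b = {0x00020: user_page, 0x80000: kernel_page}

print(can_access(process_a, 0x00010000, ring=3))  # True: A's own user page
print(can_access(process_a, 0x80000000, ring=3))  # False: kernel memory
print(can_access(process_a, 0x00020000, ring=3))  # False: B's page is not mapped in A
print(can_access(process_a, 0x00010000, ring=0))  # True: kernel can reach user memory
```

Because each process has its own page tables, another process's private pages are simply absent from the current mapping, while code running in kernel mode can still reach user pages.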

As a result of this protection, a user mode application generally cannot make Windows NT stop responding, cause a blue screen, or otherwise cause an operating system failure. Such problems should be pursued primarily at the system layer, as operating system, device-driver, or hardware issues.

Although an application error cannot cause the operating system to fail, an operating system error can cause an application to stop responding. This follows from a general rule: applications must call inward to kernel mode, but the operating system can reference outward to user mode freely at any time. An architecture influenced by microkernel design, such as Windows NT, may dispatch certain work to a user mode system process rather than perform it in kernel mode. The overall principle remains the same, however: processor hardware enforces isolation between process contexts, which prevents one process from causing a failure in another, even when one or both of them run in user mode.

If a user mode application passes an invalid parameter in a Win32® API call, the operating system is responsible for validating that parameter. In very rare cases, passing an invalid parameter may cause a Windows NT blue screen. However, this is still an operating system issue and should be debugged and pursued as such.
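The validation responsibility described above might look like the following check at the start of a system service: a bad parameter becomes an error return to the application rather than a fault in kernel mode. This Python sketch is hypothetical. `validate_user_buffer` is not a Windows API, the sketch assumes the default 2 GB user address space of 32-bit Windows NT (user addresses below 0x80000000), and it returns the Win32 code ERROR_INVALID_PARAMETER (87) for every failure, whereas real services may report other errors, such as access violations.

```python
ERROR_SUCCESS = 0
ERROR_INVALID_PARAMETER = 87   # Win32 error code value
USER_SPACE_LIMIT = 0x80000000  # assumption: default 2 GB user address space, 32-bit Windows NT


def validate_user_buffer(address, length):
    """Hypothetical system-service check on a buffer passed in from user mode.

    The check runs before the service touches the buffer, so a bad pointer from
    an application is rejected instead of being dereferenced in kernel mode.
    """
    if address < 0 or length < 0:
        return ERROR_INVALID_PARAMETER
    if length == 0:
        return ERROR_SUCCESS            # nothing will be read or written
    if address == 0:
        return ERROR_INVALID_PARAMETER  # NULL pointer with a nonzero length
    # Python integers do not overflow; a C implementation must also reject
    # address + length wrapping around past the top of the address space.
    if address + length > USER_SPACE_LIMIT:
        return ERROR_INVALID_PARAMETER  # buffer reaches into kernel address space
    return ERROR_SUCCESS


print(validate_user_buffer(0x00400000, 4096))    # 0
print(validate_user_buffer(0x7FFFF000, 0x2000))  # 87
```

If a real service omits or gets such a check wrong, the resulting blue screen is a defect in the operating system, not in the calling application.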

See Also

bcp Utility