Full-Text Search Recommendations
Full-text search allows word- or phrase-based indexing of character data in Microsoft® SQL Server™ 2000 tables. Full-text search consists of these basic components:
- Full-text indexing enables the creation and population of the full-text catalogs, which are maintained outside of SQL Server and managed by the Microsoft Search service.
- Full-text search uses the new Transact-SQL predicates (CONTAINS, CONTAINSTABLE, FREETEXT, and FREETEXTTABLE) to query these populated full-text catalogs.
Full-Text Indexing
If you are full-text indexing tables that have less than a million rows, very little performance tuning is required. If you full-text index large SQL Server tables that contain millions of rows that create large full-text catalogs, this will sustain heavy read and write activity, so you must configure SQL Server and the full-text catalogs to maximize disk I/O performance by load balancing across multiple hard disk drives. You will also need to consider hardware configurations, Microsoft Windows® 2000 or Windows NT® 4.0 system configurations, and SQL Server 2000 configurations, as well the actual location of the full-text catalogs and database files.
Hardware Considerations
- Multiple CPUs: One to four 500 MHz Xeon III processors.
- Memory: 1 to 4 GB of physical RAM.
- Multiple disk controllers with several channels or a single disk controller with multiple channels.
- Disk I/O sub-systems: RAID0 (disk striping with no fault-tolerance protection), RAID0+1 and RAID5.
Windows 2000 or Windows NT 4.0 System Configuration Considerations
- If you are installing SQL Server on Windows NT Server 4.0, the pagefile.sys file needs to be sized at 1.5 to 2 times the amount of available physical RAM. This consideration can be avoided by installing SQL Server on Windows 2000 Server with larger amounts of RAM.
- Pagefile.sys files need to be placed on their own drives (RAID0 or RAID0+1), preferably on a separate controller or least a separate channel off a shared controller.
SQL Server Configuration Considerations
- After a full population of a large table (greater than 1 million rows), consider using the new feature Change Tracking along with Update Index in Background and Update Index versus Incremental Population. For more information about when to use Change Tracking versus Timestamp-based incremental populations, see Maintaining Full-Text Indexes.
Full-Text Indexing and Catalog Considerations
- Full-text indexing or populating the full-text catalogs should be done during periods of low system activity, typically during database maintenance windows.
- Place the full-text catalog files on either its own disk controller or off a separate channel on a single disk controller with multiple channels.
- Place the database files on a separate disk controller from the full-text catalog files or off a separate channel on a single disk controller with multiple channels.
- The full-text indexing of SQL tables with 4 million to 20 millions rows can take many hours or days to complete. Consider options offered in Knowledge Base Article Q240867, "INF: How to Move, Copy and Backup SQL 7.0 Full-Text Catalog Folders and Files."
Full-Text Search
- If you are full-text searching tables that have less than a million rows, there is little performance tuning required. (A million rows is just a general break point.) If you are going to be full-text searching tables that have more than a million rows, consider the appropriate full-text search predicate, CONTAINS versus CONTAINSTABLE or FREETEXT versus FREETEXTTABLE, as well as the average number of rows and query timeout considerations.
- Use CONTAINSTABLE or FREETEXTTABLE with the new top_n_by_rank parameter to restrict the number of rows returned. Top_n_by_rank specifies that only the n-highest ranked matches, in descending order, be returned. Applies only when an integer value, n, is specified. In addition, you should consider using the TOP clause to limit the number of rows returned in the result set with CONTAINTSTABLE or FREETEXTTABLE. Review the Knowledge Base Article Q240833, "FIX: Full-Text Search Performance Improved via Support for TOP" for more details.
If you are attempting to limit the results from a full-text query with additional WHERE clauses, the WHERE clauses are applied after the JOIN with the SQL table results, not before. Otherwise, the result set would be incorrect because qualifying rows would be omitted from the result set without any notification to the client. To limit the results from a full-text search query, use the Top_N_Rank parameter from the CONTAINSTABLE or FREETEXTTABLE predicates.
- If you are using SQL Server full-text search by means of a Web or Microsoft Internet Information Services (IIS) interface and searching against large tables (greater than 1 million rows), consider increasing the IIS query timeout default of 20 seconds to 30 seconds if you are using the CONTAINS or FREETEXT predicates.
- If you are using multiple CONTAINS or FREETEXT predicates in your SQL query and are experiencing poor full-text search query performance, reduce the number of CONTAINS or FREETEXT predicates or using "*" to use all full-text indexed columns in your query.
You also may encounter Error 7619, "The query contained only ignored words" when using any of the full-text predicates in a full-text query, such as CONTAINS(pr_info, 'between AND king'). The word "between" is an ignored or noise word and the full-text query parser considers this an error, even with an OR clause. Consider rewriting this query to a phrase-based query, removing the noise word, or options offered in Knowledge Base article Q246800, "INF: Correctly Parsing Quotation Marks in FTS Queries". Also, consider using Windows 2000 Server: there have been some enhancements to the word-breaker files for Indexing Services.
- What is RANK and how is it determined when used with CONTAINSTABLE and FREETEXTTABLE predicates? Full-text RANK values are based on the frequency of rows that contain the unique word. A factor that plays a part in determining the RANK value of the returned row is the frequency of the unique word in the full-text indexed column for that row. Another factor is the total number of unique word occurrences in the table (this serves to normalize the probabilities). The RANK values returned in the result set are relative to each other. Therefore, it is not possible to interpret the RANK value as a percentage or group the RANK values into high/medium/low ranges. Think of RANK as a method to order the results for a specific query and result set.
There are also full-text indexing and searching considerations when determining whether to include multiple SQL tables in one full-text catalog versus one SQL table per full-text catalog. There is a trade-off between performance and maintenance when considering this design question with large SQL tables and you may want to test both options for your environment. If you choose to have multiple SQL tables in one full-text catalog, you incur the overhead of longer-running full-text search queries as well because incremental populations will force the full-text indexing of all other SQL tables in that full-text catalog. If you choose to have a single SQL table per full-text catalog and have multiple SQL tables full-text indexed, you have the overhead of maintaining separate full-text catalogs with a total limit of 256 full-text catalogs per server.