Full-text Querying of File Data

Accessing and Changing Relational Data

Accessing and Changing Relational Data

Full-text Querying of File Data

Microsoft® SQL Server™ supports textual queries against data residing in the file system, as well as SQL Server data. Products and features that support this capability include SQL Server distributed queries, Microsoft Internet Information Services 4.0, and Microsoft Indexing Service version 2.0.

A large portion of digitally stored information is still in the form of unstructured data, primarily text, stored in the file system. This information is often related to data within the database, which requires that queries be run against both sources. However, it is often inappropriate to import this data from the file system. Distributed queries coupled with extensions to the SQL language make it possible to write such queries without the data. This is known as file content search.

There are two major types of textual queries:

Property search

Applies filters to documents to extract properties, such as author, subject, type, word count, printed page count, and time last written, and then issues queries against those properties.

Full-text search

Creates indexes of all nonnoise words in the documents, and then uses these indexes to support linguistic searches and proximity searches.

For example, the following query selects the names, sizes, and authors of all Microsoft Word files on the D drive that contains the phrase "SQL Server" in close proximity to text. It then joins this with the writers table to obtain the author's citizenship.

SELECT Q.FileName, Q.Size, Q.DocAuthor, W.Citizenship 
FROM OpenQuery(MyLinkedServer, 
               'SELECT FileName, Size, DocAuthor
                FROM SCOPE('' "D:\" '')
                WHERE CONTAINS(''"SQL Server"
                                 NEAR() text'')
                AND FileName LIKE ''%.doc%'' '
              ) > 0 AS Q,
     writers AS W
WHERE Q.DocAuthor = W.writer_name

File content search relies upon the Microsoft OLE DB Provider for Microsoft Indexing Service 2.0. It also relies upon Indexing Service for the support of the underlying filters and full-text indexes.

The OLE DB Provider for Indexing Service 2.0 supports the ability to support SQL queries against data in the file system independent of SQL Server. The core extensions to the SQL language to support such queries are the same in both products. However, there are certain extensions that are not relevant to Indexing Service 2.0. For more information about the syntax of full-text queries against data in SQL Server, see Full-text Querying SQL Server Data.

See Also

CONTAINS

WHERE

FREETEXT