UltimateSearch Configuration File (UltimateSearch.config)
Element | Description |
scanDirectoryList | Starts scanning (crawling and indexing) the files under the specified directories and continues until it covers all subdirectories underneath. If you don't specify anything in scanDirectoryList, scanXmlList, or scanUrlList, it scans the files under the current web application by default. Note that if you enter anything in scanDirectoryList, you must also set mapPathList so that each physical path can be mapped to its virtual path and crawled properly. |
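As an illustration, a scanDirectoryList entry paired with its required mapPathList might look like the sketch below. The list names come from this table, but the child element and attribute names (scanDirectory, path, mapPath, physicalPath, virtualPath) are assumptions; check the sample UltimateSearch.config shipped with the product for the exact schema.

```xml
<!-- Sketch only: child element and attribute names are assumed -->
<scanDirectoryList>
  <scanDirectory path="C:\Inetpub\wwwroot\WebApplication2\Docs" />
</scanDirectoryList>
<!-- Required whenever scanDirectoryList is used -->
<mapPathList>
  <mapPath physicalPath="C:\Inetpub\wwwroot\WebApplication2" virtualPath="/WebApplication2" />
</mapPathList>
```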
scanXmlList | Parses the local XML file specified by "filePath" to extract the URLs from the elements or attributes specified by "urlXPath". You can list one or more website navigation files, such as UltimateMenu, UltimatePanel, and UltimateSitemap source XML files, each in a separate "scanXml" element. Note that "urlXPath" is case-sensitive. Also note that "filePath" can be set in three different forms. |
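For instance, to pull URLs out of an UltimateSitemap source file, a scanXml entry might look like the sketch below. The "scanXml" element and its "filePath" and "urlXPath" attributes are named in the description above; the file name and XPath expression are placeholders, and remember that the XPath is case-sensitive.

```xml
<scanXmlList>
  <!-- Extract the url attribute of every item element in the sitemap source -->
  <scanXml filePath="~/UltimateSitemap.xml" urlXPath="//item/@url" />
</scanXmlList>
```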
scanUrlList | Starts scanning (crawling and indexing) with the specified URLs and continues with the URLs inside each page until it covers all URLs within each domain. You can list multiple domains, home pages, sitemap pages, or any other URLs. Note that scanUrl can be set to any URL that opens as a page in your browser window. If you set it to a directory like WebApplication2, you should enable default documents on the Documents tab of the IIS settings. |
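A scanUrlList seeded with a home page and a sitemap page might look like the sketch below. The "scanUrl" element name appears in the description above, but whether the URL is supplied as an attribute or as element text may differ in your version.

```xml
<scanUrlList>
  <scanUrl url="http://www.mydomain.com/" />
  <scanUrl url="http://www.mydomain.com/sitemap.aspx" />
</scanUrlList>
```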
excludePathList | URLs starting with the specified prefixes will be discarded. Note that you can also use the robots.txt file to disallow paths, or robots meta tags to set the noindex and nofollow flags on each page. You may visit http://www.robotstxt.org/wc/exclusion-admin.html to get more familiar with the robots.txt file and meta tags. If you don't specify anything, it excludes the UltimateSearchInclude directory under the current web application by default. |
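If you prefer the robots conventions over excludePathList, a minimal robots.txt placed at the site root can disallow the same kind of paths. The directives below are the standard robots.txt syntax, not UltimateSearch-specific, and the paths are placeholders. Equivalently, an individual page can opt out with a <meta name="robots" content="noindex, nofollow"> tag in its head section.

```
# Standard robots.txt syntax (applies to any compliant crawler)
User-agent: *
Disallow: /WebApplication2/Admin/
Disallow: /WebApplication2/Temp/
```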
Ignore Tags | You can exclude a portion of your pages in three different ways:
1. Use UltimateSearch_IgnoreBegin and UltimateSearch_IgnoreEnd tags to exclude everything between them from indexing.
2. Use UltimateSearch_IgnoreTextBegin and UltimateSearch_IgnoreTextEnd tags to exclude only the text between them from indexing, while still following the links.
3. Use UltimateSearch_IgnoreLinksBegin and UltimateSearch_IgnoreLinksEnd tags to exclude only the links between them from indexing, while still indexing the text.
You can define these ignore tags as HTML comments in your pages:
<!-- UltimateSearch_IgnoreBegin --> Everything here will be ignored <!-- UltimateSearch_IgnoreEnd -->
<!-- UltimateSearch_IgnoreTextBegin --> Text here will be ignored, but links will be followed <!-- UltimateSearch_IgnoreTextEnd -->
<!-- UltimateSearch_IgnoreLinksBegin --> Links here will be ignored, but text will be indexed <!-- UltimateSearch_IgnoreLinksEnd --> |
includeFileTypeList | Only the files with the specified extensions will be scanned. |
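An includeFileTypeList limiting the crawl to ASP.NET pages, static HTML, and PDF documents might look like the sketch below; the "includeFileType" child element name follows the product's list naming pattern but is an assumption.

```xml
<includeFileTypeList>
  <includeFileType>.aspx</includeFileType>
  <includeFileType>.htm</includeFileType>
  <includeFileType>.pdf</includeFileType>
</includeFileTypeList>
```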
mapPathList | Virtual to physical path mappings must be provided if you use scanDirectoryList. |
devProdMapPathList | When you deploy your web application to a production/hosting environment, you may not be able to crawl/index your website there, or you may lack the permissions to save the index files. In that case, build your index on your development/publishing machine and copy the Index directory to the production machine. On the development/publishing machine, provide "devProdMapPathList" so that the URLs in the generated index files point to the production machine instead of the development machine. After copying the Index directory to the production machine, update the config file on that machine to set "saveIndex", "saveEventLog", and "saveSearchLog" to "false", since the application is not allowed to write there. Then, on the production/hosting machine, open the UltimateSearch.admin.aspx page in IE and click "Load Copied Index" to load the copied index. |
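On the development machine, the dev-to-prod mapping might look like the sketch below. The child element and attribute names are assumptions, and the two addresses are placeholders for your own development and production roots.

```xml
<!-- Sketch only: element and attribute names are assumed -->
<devProdMapPathList>
  <devProdMapPath devPath="http://localhost/WebApplication2" prodPath="http://www.mydomain.com" />
</devProdMapPathList>
```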
defaultDocumentList | Default documents under each directory. When you specify this list, it won't index the directory url and the default document url at the same time. |
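A typical defaultDocumentList names the IIS default documents, so that a directory URL and its default document (e.g. /WebApplication2/ and /WebApplication2/default.aspx) are not indexed as two separate pages. The "defaultDocument" child element name is an assumption based on the list naming pattern.

```xml
<defaultDocumentList>
  <defaultDocument>default.aspx</defaultDocument>
  <defaultDocument>index.htm</defaultDocument>
</defaultDocumentList>
```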
stopWordList | These words will not be indexed. Note that you don't need to list words that are shorter than the "minWordLength" attribute setting. |
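A stopWordList might look like the sketch below; note that with the default minWordLength of 3, one- and two-letter words are already skipped and need not be listed. The "stopWord" child element name is an assumption based on the list naming pattern.

```xml
<stopWordList>
  <stopWord>the</stopWord>
  <stopWord>and</stopWord>
  <stopWord>with</stopWord>
</stopWordList>
```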
Configuration Attributes
Attribute | Description | Default Value |
ignoreAllNumericWords | Ignore words that contain only numeric characters such as 1234. | true |
ignoreMixedNumericWords | Ignore words that contain both numeric and alphabetic characters such as ABC123. | true |
indexDirectory | Directory that contains the index files. Give full permission to the ASP.NET user (NETWORK SERVICE in Windows 2003) on the Index directory in order to save the index files properly. | ~/UltimateSearchInclude/Index |
logDirectory | Directory that contains the event and search logs. Give full permission to the local NETWORK SERVICE user (ASP.NET user in Windows XP) on the Log directory in order to save the log files properly. | ~/UltimateSearchInclude/Log |
saveEventLog | Whether to log history of index operations. | true |
saveSearchLog | Whether to log history of search operations. | true |
displayExceptionMessage | Whether to display the exception message on screen when it can't write to event log file. | true |
useIfilterToParsePdf | UltimateSearch has a built-in PDF parser. However, if you experience any issues with it you may set this flag to true, and install Adobe IFilter from http://www.ifilter.org. Remember to reboot your machine after each IFilter installation. | false |
useRobotsFile | Whether to use robots file to disallow paths from crawling. If you want to keep the querystrings as part of the indexed urls you should set this flag to false. | false |
useRobotsMeta | Whether to use meta tags to set noindex and nofollow flags in each page. | false |
removeQueryString | Whether to remove query strings from URLs while crawling. | false
urlCaseSensitive | Whether to treat URLs as case-sensitive. If you set this flag to true, indexed URLs will be case-sensitive, i.e. search results may show both http://www.mydomain.com and http://www.MyDomain.com if both links exist on your pages. This feature is especially useful when the values in query strings need to be case-sensitive. | false
maxPageCount | Maximum number of pages to crawl and index. There is no hard limit on this setting; you can raise it if you have enough memory and disk space to support it. | 1000000
maxPageLength | Maximum number of characters to parse and index on each page. There is no hard limit on this setting; raise it if your pages are very large and you want to index all of their content. Note that this value must be greater than the number of characters displayed on a page, because the page source also contains HTML tags and hidden text. In other words, budget for the character count you see when you view the source of a page. | 1000000
minWordLength | Minimum number of characters a word must have to be indexed. Shorter words won't be indexed. | 3
maxWordLength | Maximum number of characters a word may have to be indexed. Longer words won't be indexed. | 30
requestTimeout | Request timeout in milliseconds. The default is 30000, i.e. 30 seconds. Increase this value if you have large pages or documents to crawl and index. | 30000
scoreUrl | Score assigned to a word found inside the page url. Set it to 0 (zero) if you don't want the words in page url to be indexed. | 16 |
scoreTitle | Score assigned to a word found inside the page title. Set it to 0 (zero) if you don't want the words in page title to be indexed. | 8 |
scoreKeywords | Score assigned to a word found inside the page keywords. Set it to 0 (zero) if you don't want the words in page keywords to be indexed. | 4 |
scoreDescription | Score assigned to a word found inside the page description. Set it to 0 (zero) if you don't want the words in page description to be indexed. | 2 |
scoreText | Score assigned to a word found inside the page text. Set it to 0 (zero) if you don't want the words in page text to be indexed. | 1 |
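The five score attributes weight a match by where the word is found: with the defaults, a word in the page URL counts 16 times as much as the same word in the body text. Shown here as a sketch on a placeholder root element (the attribute names and defaults come from this table; the root element name is an assumption):

```xml
<ultimateSearch scoreUrl="16" scoreTitle="8" scoreKeywords="4"
                scoreDescription="2" scoreText="1" />
```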
userAgent | User-Agent to identify the originator of the HTTP request sent to the web server during crawling. For example, you can set it to "BlackBerry8100/4.2.0 Profile/MIDP-2.0 Configuration/CLDC-1.1" if you want to index your website for people connecting from a mobile device like BlackBerry Pearl. | Karamasoft UltimateSearch Crawler |
useDefaultProxy | Whether to use the default proxy. | true |
proxyAddress | Proxy address to use when the website is behind a proxy server. | None |
proxyUsername | Proxy username to use when the website is behind a proxy server. | None |
proxyPassword | Proxy password to use when the website is behind a proxy server. | None |
proxyDomain | Proxy domain to use when the website is behind a proxy server. | None |
useDefaultCredentials | Whether to use the default network credentials. | true |
networkUsername | Network username to use when the website uses Windows authentication. | None |
networkPassword | Network password to use when the website uses Windows authentication. | None |
networkDomain | Network domain to use when the website uses Windows authentication. | None |
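For a website behind an authenticating proxy that also uses Windows authentication, the relevant attributes might be set as below, again on a placeholder root element; the attribute names come from this table, and all values are placeholders.

```xml
<ultimateSearch useDefaultProxy="false"
                proxyAddress="http://proxy.mydomain.com:8080"
                proxyUsername="crawler" proxyPassword="secret" proxyDomain="MYDOMAIN"
                useDefaultCredentials="false"
                networkUsername="crawler" networkPassword="secret" networkDomain="MYDOMAIN" />
```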
ignoreSslCertificateValidation | Whether to ignore SSL certificate validation. | false |