3. UltimateSearch Configuration


UltimateSearch Configuration File (UltimateSearch.config)
Elements
scanDirectoryList: Starts scanning (crawling and indexing) the files under the specified directories and continues until it covers all subdirectories underneath. If you don't specify anything in scanDirectoryList, scanXmlList, or scanUrlList, it scans the files under the current web application by default. Note that if you enter anything in scanDirectoryList, you also need to set mapPathList so that the physical directories can be mapped to virtual paths for proper crawling.
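
A minimal sketch of a directory scan together with its required mapPathList. The child element names (scanDirectory, mapPath) and their attributes are assumptions for illustration; only the list names come from this documentation:

<scanDirectoryList>
    <!-- Hypothetical child element: scan all files under this physical directory -->
    <scanDirectory path="C:\Inetpub\wwwroot\WebApplication2\Docs" />
</scanDirectoryList>
<mapPathList>
    <!-- Hypothetical child element: map the physical root to its virtual address -->
    <mapPath physicalPath="C:\Inetpub\wwwroot\WebApplication2" virtualPath="http://www.mydomain.com" />
</mapPathList>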
scanXmlList: Parses the local XML file specified by "filePath" to extract the URLs from the elements or attributes specified by "urlXPath". You can list one or more website navigation files, such as UltimateMenu, UltimatePanel, and UltimateSitemap source XML files, each one specified in a separate "scanXml" element. Note that "urlXPath" is case-sensitive. Also note that you can set "filePath" in three different forms.
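
A sketch using the names given above (scanXml, filePath, urlXPath); the file path and XPath values are made-up examples:

<scanXmlList>
    <!-- Extract every url attribute from an UltimateMenu source file (illustrative values) -->
    <scanXml filePath="~/UltimateMenu.xml" urlXPath="//menuItem/@url" />
</scanXmlList>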
scanUrlList: Starts scanning (crawling and indexing) with the specified URLs and continues with the URLs found inside each page until it covers all URLs within each domain. You can list multiple domains, home pages, sitemap pages, or any other URLs. Note that scanUrl can be set to any URL that opens as a page in your browser window. If you set it to a directory such as WebApplication2, you should enable default documents on the Documents tab of your IIS settings.
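
A sketch based on the scanUrl element named above; whether the URL is given as an attribute (as shown) or as element text is an assumption:

<scanUrlList>
    <scanUrl url="http://www.mydomain.com/" />
    <scanUrl url="http://www.mydomain.com/sitemap.aspx" />
</scanUrlList>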
excludePathList: URLs starting with the specified prefixes will be discarded. Note that you can also use the robots.txt file to disallow paths, or robots meta tags to set noindex and nofollow flags in each page. You may visit http://www.robotstxt.org/wc/exclusion-admin.html to get more familiar with the robots.txt file and meta tags. If you don't specify anything, it will exclude the UltimateSearchInclude directory under the current web application by default.
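
A sketch of the exclusion list; the excludePath child element and its path attribute are assumed names:

<excludePathList>
    <!-- Discard any URL that starts with these prefixes (illustrative values) -->
    <excludePath path="~/UltimateSearchInclude" />
    <excludePath path="~/Admin" />
</excludePathList>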
Ignore Tags: You can exclude a portion of your pages in three different ways:
1. Use UltimateSearch_IgnoreBegin and UltimateSearch_IgnoreEnd tags to exclude everything between these tags from indexing.
2. Use UltimateSearch_IgnoreTextBegin and UltimateSearch_IgnoreTextEnd tags to exclude only the text between these tags from indexing, while following the links.
3. Use UltimateSearch_IgnoreLinksBegin and UltimateSearch_IgnoreLinksEnd tags to exclude only the links between these tags from indexing, while indexing the text.

See how you can define these ignore tags below:

<!-- UltimateSearch_IgnoreBegin -->
  Everything here will be ignored
<!-- UltimateSearch_IgnoreEnd -->

<!-- UltimateSearch_IgnoreTextBegin -->
  Text here will be ignored, but links will be followed
<!-- UltimateSearch_IgnoreTextEnd -->

<!-- UltimateSearch_IgnoreLinksBegin -->
  Links here will be ignored, but text will be indexed
<!-- UltimateSearch_IgnoreLinksEnd -->
includeFileTypeList: Only the files with the specified extensions will be scanned.
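
A sketch of the file-type list; the includeFileType child element and its extension attribute are assumed names:

<includeFileTypeList>
    <includeFileType extension=".aspx" />
    <includeFileType extension=".html" />
    <includeFileType extension=".pdf" />
</includeFileTypeList>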
mapPathList: Virtual-to-physical path mappings. These must be provided if you use scanDirectoryList; see the scanDirectoryList sketch above.
devProdMapPathList: When you deploy your web application to a production/hosting environment, you may not have the ability to crawl/index your website, or you may not have the necessary permissions to save your index files. In that case, you can build your index file on your development/publishing machine and then copy the Index directory onto your production machine. On your development/publishing machine, you have to provide "devProdMapPathList" so that the URLs in the generated index files point to the actual production machine instead of your development machine. After copying the Index directory onto the production machine, you will also need to update the config file on that machine to set "saveIndex", "saveEventLog", and "saveSearchLog" to "false", since you're not allowed to write onto that machine. On your production/hosting machine, open the UltimateSearch.admin.aspx page in Internet Explorer and click "Load Copied Index" to load the copied index.
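
A sketch of the development-to-production mapping. The devProdMapPath child element and its attribute names are assumptions; saveIndex, saveEventLog, and saveSearchLog come from the text above, although their exact placement in the config file is also an assumption.

On the development/publishing machine:

<devProdMapPathList>
    <devProdMapPath devPath="http://localhost/WebApplication2" prodPath="http://www.mydomain.com" />
</devProdMapPathList>

In the config file copied to the production machine:

<!-- Root element name is assumed; attribute names are from the text above -->
<ultimateSearch saveIndex="false" saveEventLog="false" saveSearchLog="false" ...>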
defaultDocumentList: Default documents under each directory. When you specify this list, it won't index the directory URL and the default document URL as two separate pages.
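
A sketch of the default document list; the defaultDocument child element and its name attribute are assumed:

<defaultDocumentList>
    <defaultDocument name="default.aspx" />
    <defaultDocument name="index.html" />
</defaultDocumentList>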
stopWordList: These words will not be indexed. Note that you don't need to list words that are shorter than the "minWordLength" attribute setting.
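
A sketch of the stop word list; the stopWord child element and its word attribute are assumed. With the default minWordLength of 3, words of one or two characters never need to appear here:

<stopWordList>
    <stopWord word="and" />
    <stopWord word="the" />
    <stopWord word="for" />
</stopWordList>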
Configuration Attributes
ignoreAllNumericWords: Ignore words that contain only numeric characters, such as 1234. Default: true
ignoreMixedNumericWords: Ignore words that contain both numeric and alphabetic characters, such as ABC123. Default: true
indexDirectory: Directory that contains the index files. Give full permission to the ASP.NET worker process account (ASPNET on Windows XP, NETWORK SERVICE on Windows Server 2003) on the Index directory so that the index files can be saved properly. Default: ~/UltimateSearchInclude/Index
logDirectory: Directory that contains the event and search logs. Give full permission to the ASP.NET worker process account (ASPNET on Windows XP, NETWORK SERVICE on Windows Server 2003) on the Log directory so that the log files can be saved properly. Default: ~/UltimateSearchInclude/Log
saveEventLog: Whether to log the history of index operations. Default: true
saveSearchLog: Whether to log the history of search operations. Default: true
displayExceptionMessage: Whether to display the exception message on screen when the event log file cannot be written. Default: true
useIfilterToParsePdf: UltimateSearch has a built-in PDF parser. However, if you experience any issues with it, you may set this flag to true and install Adobe IFilter from http://www.ifilter.org. Remember to reboot your machine after each IFilter installation. Default: false
useRobotsFile: Whether to use the robots.txt file to disallow paths from crawling. If you want to keep the query strings as part of the indexed URLs, you should set this flag to false. Default: false
useRobotsMeta: Whether to use robots meta tags to set noindex and nofollow flags in each page. Default: false
removeQueryString: Whether to remove the query string from URLs while crawling. Default: false
urlCaseSensitive: Whether to treat URLs as case-sensitive. If you set this flag to true, indexed URLs will be case-sensitive; i.e., search results may show both http://www.mydomain.com and http://www.MyDomain.com if both links exist on your pages. This feature is especially useful if the values in query strings need to be case-sensitive. Default: false
maxPageCount: Maximum number of pages to be crawled and indexed. There is no hard upper limit on this setting; you can set it to a larger number if you have enough memory and disk space. Default: 1000000
maxPageLength: Maximum number of characters to be parsed and indexed on each page. There is no hard upper limit on this setting; you can set it to a larger number if your pages are very large and you want to index all page content. Note that this value needs to be greater than the number of characters displayed on a page, because of the HTML tags and hidden text in the page source. In other words, base it on the number of characters you see when you view the source of a page, not on the rendered text. Default: 1000000
minWordLength: Minimum number of characters in a word for it to be indexed. Shorter words won't be indexed. Default: 3
maxWordLength: Maximum number of characters in a word for it to be indexed. Longer words won't be indexed. Default: 30
requestTimeout: Request timeout period in milliseconds (the default of 30000 equals 30 seconds). Increase this number if you have large pages or documents to be crawled and indexed. Default: 30000
scoreUrl: Score assigned to a word found inside the page URL. Set it to 0 (zero) if you don't want the words in the page URL to be indexed. Default: 16
scoreTitle: Score assigned to a word found inside the page title. Set it to 0 (zero) if you don't want the words in the page title to be indexed. Default: 8
scoreKeywords: Score assigned to a word found inside the page keywords. Set it to 0 (zero) if you don't want the words in the page keywords to be indexed. Default: 4
scoreDescription: Score assigned to a word found inside the page description. Set it to 0 (zero) if you don't want the words in the page description to be indexed. Default: 2
scoreText: Score assigned to a word found inside the page text. Set it to 0 (zero) if you don't want the words in the page text to be indexed. Default: 1
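
For example, with the defaults above, a search term that appears in a page's title and once in its body text would contribute 8 + 1 = 9 to that page's relevance for the term, assuming the per-location scores are summed (the additive behavior is suggested by the power-of-two defaults but is not stated explicitly in this documentation).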
userAgent: User-Agent string that identifies the originator of the HTTP requests sent to the web server during crawling. For example, you can set it to "BlackBerry8100/4.2.0 Profile/MIDP-2.0 Configuration/CLDC-1.1" if you want to index your website for people connecting from a mobile device such as the BlackBerry Pearl. Default: Karamasoft UltimateSearch Crawler
useDefaultProxy: Whether to use the default proxy. Default: true
proxyAddress: Proxy address to use when the website is behind a proxy server. Default: none
proxyUsername: Proxy username to use when the website is behind a proxy server. Default: none
proxyPassword: Proxy password to use when the website is behind a proxy server. Default: none
proxyDomain: Proxy domain to use when the website is behind a proxy server. Default: none
useDefaultCredentials: Whether to use the default network credentials. Default: true
networkUsername: Network username to use when the website uses Windows authentication. Default: none
networkPassword: Network password to use when the website uses Windows authentication. Default: none
networkDomain: Network domain to use when the website uses Windows authentication. Default: none
ignoreSslCertificateValidation: Whether to ignore SSL certificate validation. Default: false
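
Putting the pieces together, a hypothetical UltimateSearch.config might look like the sketch below. The root element name and the split between attributes and child lists are assumptions; the attribute names, list names, and values come from the tables above:

<ultimateSearch
    maxPageCount="1000000"
    minWordLength="3"
    maxWordLength="30"
    requestTimeout="30000"
    saveEventLog="true"
    saveSearchLog="true"
    indexDirectory="~/UltimateSearchInclude/Index"
    logDirectory="~/UltimateSearchInclude/Log">
    <scanUrlList>
        <scanUrl url="http://www.mydomain.com/" />
    </scanUrlList>
    <excludePathList>
        <excludePath path="~/UltimateSearchInclude" />
    </excludePathList>
    <includeFileTypeList>
        <includeFileType extension=".aspx" />
        <includeFileType extension=".pdf" />
    </includeFileTypeList>
    <stopWordList>
        <stopWord word="the" />
        <stopWord word="and" />
    </stopWordList>
</ultimateSearch>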