WebScraper Fields

IronWebScraper

IronWebScraper - The C# Web Scraping Library

The WebScraper type exposes the following members.

Fields

	Name	Description
	AllowedDomains	If not empty, the hostname of every requested URL must match at least one of the AllowedDomains patterns. Patterns may be added as glob wildcard strings or as Regex.
	AllowedUrls	If not empty, every requested URL must match at least one of the AllowedUrls patterns. Patterns may be added as glob wildcard strings or as Regex.
	BannedDomains	If not empty, the hostname of a requested URL may not match any of the BannedDomains patterns. Patterns may be added as glob wildcard strings or as Regex.
	BannedUrls	If not empty, a requested URL may not match any of the BannedUrls patterns. Patterns may be added as glob wildcard strings or as Regex.
	CrawlId	A unique string used to identify a crawl job.
	FilesDownloaded	The total number of files downloaded successfully with the DownloadImage and DownloadFile methods.
	Identities	A list of HTTP identities used to fetch web resources. Each Identity may have a different proxy IP address, user agent, HTTP headers, persistent cookies, username, and password. Best practice is to create Identities in your WebScraper.Init method and add them to this WebScraper.Identities list.
	LoggingLevel	The level of logging written by the WebScraper engine to the Console. LogLevel.Critical is normally the most useful setting, because it leaves the developer free to write their own meaningful, application-relevant messages inside Parse methods. LogLevel.ScrapedData is useful when coding and testing a new WebScraper.
	ObeyRobotsDotTxt	Causes the WebScraper to always obey /robots.txt directives, including URL and path restrictions and crawl rates.
	WorkingDirectory	Path to a local directory where scraped data and state information will be saved.
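
Example

The fields above are typically set once, before the first request is made. The following is a minimal sketch of what that might look like, not a definitive implementation. The class name ExampleScraper, the example.com URLs, the patterns, the directory path, and the user agent string are all placeholders. The sketch also assumes that the pattern lists accept glob strings through an Add method, that the list element type is HttpIdentity with UserAgent and UseCookies properties, and that Init, Parse, and Request have the signatures shown. None of these are stated on this page, so check them against the WebScraper Class reference.

```csharp
using IronWebScraper;

// Hypothetical scraper; the name, URLs, and patterns are placeholders.
public class ExampleScraper : WebScraper
{
    public override void Init()
    {
        // Log only critical engine messages to the Console.
        this.LoggingLevel = WebScraper.LogLevel.Critical;

        // Local directory for scraped data and state information (placeholder path).
        this.WorkingDirectory = @"C:\Scrapes\Example\";

        // Obey /robots.txt restrictions and crawl rates.
        this.ObeyRobotsDotTxt = true;

        // Assumption: each pattern list has an Add method that takes a glob string.
        this.AllowedDomains.Add("*.example.com"); // hostnames must match this
        this.BannedUrls.Add("*.zip");             // URLs must not match these
        this.BannedUrls.Add("*.pdf");

        // Assumption: identities are HttpIdentity objects with these properties.
        this.Identities.Add(new HttpIdentity
        {
            UserAgent = "ExampleBot/1.0", // placeholder user agent
            UseCookies = true
        });

        // Queue the first page; Parse is called with its response.
        this.Request("https://www.example.com/", Parse);
    }

    public override void Parse(Response response)
    {
        // Application-specific extraction and logging goes here.
    }
}
```

Because AllowedDomains is not empty in this sketch, a request to any hostname outside *.example.com would be filtered out, and BannedUrls would additionally exclude .zip and .pdf links on the permitted hosts.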

See Also

Reference

WebScraper Class

IronWebScraper Namespace
