WebScraper Methods

	Name	Description
	AcceptUrl	Decides if the WebScraper will accept a given URL. May be overridden to apply custom middleware logic.
	ChooseIdentityForRequest	Picks a random identity from WebScraper.Identities for each request. In your Init method, create identities with proxy IP addresses, user agents, headers, cookies, username and password, and add them to the WebScraper.Identities list. Override this method to implement your own non-random selection of an HttpIdentity for each request.
	DownloadFile(String, String, Boolean, HttpIdentity)	Requests a file to be downloaded from the given URL to the local file system. Often used for scraping documents, assets and images. Normally called from within a Parse method of IronWebScraper.WebScraper.
	DownloadFile(Uri, String, Boolean, HttpIdentity)	Requests a file to be downloaded from the given URL to the local file system. Often used for scraping documents, assets and images. Normally called from within a Parse method of IronWebScraper.WebScraper.
	DownloadFileUnique	Much like DownloadFile, except that a file which has already been downloaded or exists locally is not re-downloaded. Requests a file to be downloaded from the given URL to the local file system. Often used for scraping documents, assets and images. Normally called from within a Parse method of IronWebScraper.WebScraper.
	DownloadImage(String, String, Int32, Int32, Boolean, HttpIdentity)	Requests a file to be downloaded from the given URL to the local file system. Often used for scraping documents, assets and images. Normally called from within a Parse method of IronWebScraper.WebScraper.
	DownloadImage(Uri, String, Int32, Int32, Boolean, HttpIdentity)	Requests a file to be downloaded from the given URL to the local file system. Often used for scraping documents, assets and images. Normally called from within a Parse method of IronWebScraper.WebScraper.
	EnableWebCache	Caches HTTP responses for reuse. This allows WebScraper classes to be modified and restarted without re-downloading previously scraped URLs.
	EnableWebCache(TimeSpan)	Caches HTTP responses for reuse. This allows WebScraper classes to be modified and restarted without re-downloading previously scraped URLs.
	Equals	(Inherited from Object.)
	FetchUrlContents	A handy shortcut method that fetches the text content from any Url (synchronously).
	FetchUrlContentsBinary	A handy shortcut method that fetches the content of any URL (synchronously) as binary data in a byte array (byte[]).
	GetHashCode	(Inherited from Object.)
	GetType	(Inherited from Object.)
	Init	Override this method to initialize your web scraper. Important tasks include requesting at least one start URL and setting allowed/banned domain or URL patterns.
	Log	Logs the specified message to the console. Logging can be enabled using EnableLogging. This method is exposed and overridable to allow for easy email and Slack notification integration.
	ObeyRobotsDotTxtForHost	Causes the WebScraper to always obey /robots.txt directives, including path restrictions and crawl rates, on a domain-by-domain basis. May be overridden for advanced control.
	Parse	Override this method to create the default Response handler for your web scraper. If you have multiple page types, you can add additional similar methods.
	ParseWebscraperDownload	Internal method to parse binary files downloaded by a WebScraper.
	ParseWebscraperDownloadImage	Internal method to parse images downloaded by a WebScraper.
	PostRequest(String, Action<Response>, Dictionary<String, String>)	Adds a new request to the scrape-job queue using the HTTP POST method.
	PostRequest(Uri, Action<Response>, Dictionary<String, String>)	Adds a new request to the scrape-job queue using the HTTP POST method.
	PostRequest(String, Action<Response>, Dictionary<String, String>, MetaData)	Adds a new request to the scrape-job queue using the HTTP POST method.
	PostRequest(Uri, Action<Response>, Dictionary<String, String>, MetaData)	Adds a new request to the scrape-job queue using the HTTP POST method.
	PostRequest(String, Action<Response>, Dictionary<String, String>, HttpIdentity, MetaData)	Adds a new request to the scrape-job queue using the HTTP POST method.
	PostRequest(Uri, Action<Response>, Dictionary<String, String>, HttpIdentity, MetaData)	Adds a new request to the scrape-job queue using the HTTP POST method.
	Request(IEnumerable<String>, Action<Response>)	A key method called from within the Init and Parse methods. Adds new requests to the scrape-job queue, and decides which method (e.g. Parse) will be used to parse each Response object.
	Request(String, Action<Response>)	A key method called from within the Init and Parse methods. Adds a new request to the scrape-job queue, and decides which method (e.g. Parse) will be used to parse the Response object.
	Request(Uri, Action<Response>)	A key method called from within the Init and Parse methods. Adds a new request to the scrape-job queue, and decides which method (e.g. Parse) will be used to parse the Response object.
	Request(String, Action<Response>, MetaData)	A key method called from within the Init and Parse methods. Adds a new request to the scrape-job queue, and decides which method (e.g. Parse) will be used to parse the Response object.
	Request(Uri, Action<Response>, MetaData)	A key method called from within the Init and Parse methods. Adds a new request to the scrape-job queue, and decides which method (e.g. Parse) will be used to parse the Response object.
	Request(String, Action<Response>, HttpIdentity, MetaData)	A key method called from within the Init and Parse methods. Adds a new request to the scrape-job queue, and decides which method (e.g. Parse) will be used to parse the Response object.
	Request(Uri, Action<Response>, HttpIdentity, MetaData)	A key method called from within the Init and Parse methods. Adds a new request to the scrape-job queue, and decides which method (e.g. Parse) will be used to parse the Response object.
	Retry	Retries a Response. Usually called in a Parse method, this method is useful if a Captcha or error screen was encountered during Html parsing.
	Scrape	Appends any scraped data to a file in the JsonLines format (one JSON object per line). Will save any .NET object of any kind. This method is typically used with IronWebScraper.ScrapedData or developer-defined classes for scraped data items. The default filename follows the pattern "NameSpace.TypeName.jsonl", e.g. IronWebScraper.ScrapedData.jsonl
	ScrapeUnique	Appends scraped data to a file in the JsonLines format (one JSON object per line). Automatically ignores duplicates. Will save any .NET object of any kind. This method is typically used with IronWebScraper.ScrapedData or developer-defined classes for scraped data items. The default filename follows the pattern "WorkingDirectory/NameSpace.TypeName.jsonl", e.g. Scrape/IronWebScraper.ScrapedData.jsonl
	SetSiteSpecificCrawlRateLimit	Sets a throttle limit for a specific domain.
	Start	Starts the WebScraper. Set CrawlId to make this crawl resumable. Will also resume a previous crawl with the same CrawlId if it exists. Giving a CrawlId also causes the WebScraper to auto-save its state every 5 minutes in case of a crash, system failure or power outage. This feature is particularly useful for long-running web-scraping tasks, allowing hours, days or even weeks of work to be recovered effortlessly.
	StartAsync	Starts the WebScraper asynchronously. Set CrawlId to make this crawl resumable. Will resume a previous crawl with the same CrawlId if it exists.
	Stop	Stops this WebScraper instance gracefully. The WebScraper may be restarted later with no loss of data by calling Start(CrawlId) or StartAsync(CrawlId).
	ToString	(Inherited from Object.)
	UnScrape(Boolean)	Retrieves IronWebScraper.ScrapedData objects which were saved using the WebScraper.Scrape method.
	UnScrape(String, Boolean)	Retrieves IronWebScraper.ScrapedData objects which were saved using the WebScraper.Scrape method.
	UnScrape<T>(Boolean)	Retrieves native C# objects which were saved using the WebScraper.Scrape method in the JsonLines format.
	UnScrape<T>(String, Boolean)	Retrieves native C# objects which were saved using the WebScraper.Scrape method in the JsonLines format.
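
As a rough illustration of how Init, Request, Parse, Scrape and Start from the table fit together, a minimal subclass might look like the sketch below. The class name TitleScraper, the start URL and the "h1" selector are hypothetical. Response.Css, the TextContentClean property and the ScrapedData collection initializer are not described on this page and are assumed from the library's tutorials, so check them against the Response and ScrapedData references before relying on them.

```csharp
using IronWebScraper;

// Hypothetical scraper: the class name, URL and CSS selector are illustrative only.
public class TitleScraper : WebScraper
{
    // Init should queue at least one start URL and name the method
    // that will handle its response.
    public override void Init()
    {
        Request("https://example.com/", Parse);
    }

    // Parse is the default response handler.
    public override void Parse(Response response)
    {
        // Assumption: Response.Css returns the nodes matching a CSS selector
        // and each node exposes TextContentClean (neither is listed on this page).
        foreach (var node in response.Css("h1"))
        {
            // Scrape appends one JSON object per line to a .jsonl file.
            Scrape(new ScrapedData() { { "Title", node.TextContentClean } });
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var scraper = new TitleScraper();
        scraper.Start(); // starts the crawl using the requests queued in Init
    }
}
```

If a site has several page types, additional handler methods with the same signature as Parse can be passed to Request in place of Parse.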
