HTMLparser Members

Majestic12

High-performance .NET C# HTMLparser Library

HTMLparser overview

Public Static Methods

CalculateWidth Parses a WIDTH parameter and calculates the width
DecodeEntities Decodes any entities found in a string (not fast!)
IsBiggerFont Checks whether the first font is bigger than the second
IsEqualOrBiggerFont Checks whether the first font is equal to or bigger than the second
ParseFontSize Parses the size parameter of a FONT tag
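The font and width helpers are documented in one line each. The sketch below illustrates the kind of logic involved, assuming the HTML 4 rules for the FONT size attribute (absolute sizes 1 to 7, "+n"/"-n" relative to a base size of 3, results clamped to 1..7) and assuming that a WIDTH value is either a pixel count or a percentage of a known container width. All function names are hypothetical and do not reproduce the real signatures of ParseFontSize, IsBiggerFont, IsEqualOrBiggerFont or CalculateWidth.

```csharp
using System;

// Hypothetical helpers illustrating FONT size and WIDTH handling.
// Assumptions (HTML 4 rules, not taken from HTMLparser): absolute sizes are 1..7,
// "+n"/"-n" are relative to a base size (3 by default), results are clamped to 1..7.
static int ParseFontSizeSketch(string? sizeParam, int baseSize = 3)
{
    if (string.IsNullOrWhiteSpace(sizeParam))
        return baseSize;                       // no size given: use the base size

    string s = sizeParam.Trim();
    bool relative = s[0] == '+' || s[0] == '-';

    // int.TryParse accepts a leading sign, so "+2" and "-1" parse directly
    if (!int.TryParse(s, out int value))
        return baseSize;                       // unparsable value: use the base size

    int size = relative ? baseSize + value : value;
    return Math.Clamp(size, 1, 7);             // keep within the valid 1..7 range
}

static bool IsBiggerFontSketch(string? a, string? b) =>
    ParseFontSizeSketch(a) > ParseFontSizeSketch(b);

static bool IsEqualOrBiggerFontSketch(string? a, string? b) =>
    ParseFontSizeSketch(a) >= ParseFontSizeSketch(b);

// Assumption: WIDTH is either pixels ("200") or a percentage ("50%") of a known
// container width; malformed values yield -1 in this sketch.
static int CalculateWidthSketch(string widthParam, int containerWidth)
{
    string s = widthParam.Trim();
    if (s.EndsWith('%'))
        return int.TryParse(s.TrimEnd('%'), out int percent)
            ? containerWidth * percent / 100
            : -1;
    return int.TryParse(s, out int pixels) ? pixels : -1;
}

Console.WriteLine(ParseFontSizeSketch("+2"));        // 5 (base 3 + 2)
Console.WriteLine(ParseFontSizeSketch("9"));         // 7 (clamped)
Console.WriteLine(IsBiggerFontSketch("4", "-1"));    // True (4 vs 2)
Console.WriteLine(CalculateWidthSketch("50%", 800)); // 400
```

Comparing fonts by first normalising both sizes to an absolute 1..7 value keeps the comparison functions trivial; the real library may resolve relative sizes against a different base.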

Public Instance Constructors

HTMLparser Overloaded. Initializes a new instance of the HTMLparser class.

Public Instance Fields

bAutoExtractBetweenTagsOnly If true (and either bAutoKeepComments or bAutoKeepScripts is true), oHTML is set to the data BETWEEN the tags, excluding the tags themselves; otherwise the FULL HTML is set. For example, for a comment the full HTML is '<!-- comments -->', but if this flag is true only ' comments ' is returned
bAutoKeepComments If true (default), the HTML of the comment tags themselves AND of the content between them is stored in the oHTML variable; otherwise oHTML is left empty, but you can always set it later
bAutoKeepScripts If true (default: false), the HTML of the script tags themselves AND of the content between them is stored in the oHTML variable; otherwise oHTML is left empty, but you can always set it later
bAutoMarkClosedTagsWithParamsAsOpen Long-winded name: by default, if a tag is closed but has parameters, it is treated as an open tag. This is not correct for proper XML parsing
bCompressWhiteSpaceBeforeTag If true (default), all whitespace before the start of a tag is compressed to a single space character (32, or 0x20). This makes the parser run slightly faster; if you need the exact whitespace before tags, set this flag to false
oHE Heuristics engine used by the tag parser to quickly match known tag and attribute names. It can be disabled, or you can add more tags to it to fit your most likely cases; it is currently tuned for HTML
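Two of the fields above describe simple text transformations. The sketch below shows each one in isolation: collapsing a whitespace run that directly precedes a tag into a single space (the bCompressWhiteSpaceBeforeTag behaviour), and returning either the full comment HTML or only the text between "<!--" and "-->" (the bAutoExtractBetweenTagsOnly behaviour). Both helpers are hypothetical illustrations, not HTMLparser code; in particular, whitespace elsewhere in the text is assumed to be left untouched.

```csharp
using System;
using System.Text;

// Hypothetical helper illustrating the bCompressWhiteSpaceBeforeTag idea: each run
// of whitespace that ends right before '<' becomes a single space (char 32, 0x20);
// whitespace anywhere else is copied unchanged.
static string CompressWhiteSpaceBeforeTags(string html)
{
    var sb = new StringBuilder(html.Length);
    int i = 0;
    while (i < html.Length)
    {
        if (!char.IsWhiteSpace(html[i]))
        {
            sb.Append(html[i++]);                  // ordinary character: copy it
            continue;
        }
        int start = i;
        while (i < html.Length && char.IsWhiteSpace(html[i]))
            i++;                                   // find the end of the whitespace run
        if (i < html.Length && html[i] == '<')
            sb.Append(' ');                        // run precedes a tag: collapse it
        else
            sb.Append(html, start, i - start);     // otherwise keep it verbatim
    }
    return sb.ToString();
}

// Hypothetical helper illustrating bAutoExtractBetweenTagsOnly for a comment:
// either the full comment HTML or only the text between "<!--" and "-->".
static string CommentHtml(string comment, bool betweenTagsOnly)
{
    const string open = "<!--", close = "-->";
    bool wellFormed = comment.Length >= open.Length + close.Length
        && comment.StartsWith(open, StringComparison.Ordinal)
        && comment.EndsWith(close, StringComparison.Ordinal);
    if (!betweenTagsOnly || !wellFormed)
        return comment;                            // full HTML, tags included
    return comment.Substring(open.Length, comment.Length - open.Length - close.Length);
}

Console.WriteLine(CompressWhiteSpaceBeforeTags("Hello \n\t <b>bold</b>  text"));
// Hello <b>bold</b>  text
Console.WriteLine("[" + CommentHtml("<!-- comments -->", true) + "]");
// [ comments ]
```

Note that the two spaces before "text" survive because they are not followed by a tag; only whitespace that ends at '<' is collapsed in this sketch.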

Public Instance Properties

bDecodeEntities 
bDecodeMiniEntities 
bEnableHeuristics If true (default), the heuristics engine is used to match tags and attributes faster; new tags can be added to it through oHE
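The bDecodeEntities and bDecodeMiniEntities properties are undocumented here, but InitMiniEntities describes the mini-entities mode: only "nbsp" is converted into a space and all other entities are left as is. The sketch below contrasts that mode with full decoding. Full decoding uses System.Net.WebUtility.HtmlDecode, a standard .NET API, as a stand-in for the library's own DecodeEntities; DecodeAll and DecodeMini are hypothetical names, and DecodeMini only recognises the terminated form "&nbsp;".

```csharp
using System;
using System.Net;

// Full decoding via System.Net.WebUtility.HtmlDecode (a standard .NET API): named
// and numeric entities are decoded. Note that "&nbsp;" becomes U+00A0, not a space.
static string DecodeAll(string text) => WebUtility.HtmlDecode(text);

// Hypothetical helper for the mini-entities mode described under InitMiniEntities:
// only "nbsp" turns into a plain space (char 32) and every other entity is kept.
// Simplification: only the terminated form "&nbsp;" is recognised.
static string DecodeMini(string text) => text.Replace("&nbsp;", " ");

string sample = "Fish&nbsp;&amp;&nbsp;Chips &lt;b&gt;";
Console.WriteLine(DecodeAll(sample));  // Fish & Chips <b> (the first two spaces are U+00A0)
Console.WriteLine(DecodeMini(sample)); // Fish &amp; Chips &lt;b&gt;
```

Mini mode is much cheaper than full decoding because it is a single fixed-string replacement, which fits the "not fast!" remark on DecodeEntities.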

Public Instance Methods

ChangeToEntities Parses a line and changes known entity characters into proper HTML entities
CleanUp Cleans up the parser in preparation for the next parse
Close Closes the object and releases all allocated resources
Dispose 
Equals (inherited from Object) Determines whether the specified Object is equal to the current Object.
GetHashCode (inherited from Object) Serves as a hash function for a particular type, suitable for use in hashing algorithms and data structures like a hash table.
GetType (inherited from Object) Gets the Type of the current instance.
Init Overloaded. Initialises the parser with HTML to be parsed from the provided string
InitMiniEntities Initialises mini-entities mode: only "nbsp" is converted into a space; all other entities are left as is
LoadFromFile Loads HTML from file
ParseNext Parses next chunk and returns it with
ParseNextTag Returns the next tag, or null at the end of the document; text is ignored completely
Reset Resets parsing of the current data back to the start
SetChunkHashMode Sets the chunk parameter hash mode
SetEncoding Overloaded. Sets the encoding
SetRawHTML Sets the oHTML variable in a chunk to the raw HTML that was parsed for that chunk.
ToString (inherited from Object) Returns a String that represents the current Object.
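ParseNextTag and Reset describe a pull model: the caller asks for one tag at a time, text between tags is skipped, null marks the end of the document, and Reset rewinds to the start of the data. The sketch below shows that model in isolation. NextTag and Reset here are hypothetical local functions, not the HTMLparser API (which returns chunk objects rather than strings), and the scanner deliberately ignores comments, scripts, attributes and entities; an attribute value containing '>' would confuse it.

```csharp
using System;

// Hypothetical pull-style scanner: NextTag() returns the next tag name in lower case
// (close tags keep a leading '/') or null at the end; text between tags is skipped.
string html = "<html><body>Hello <b>world</b></body></html>";
int pos = 0;                                        // current read position

string? NextTag()
{
    while (pos < html.Length)
    {
        int lt = html.IndexOf('<', pos);
        if (lt < 0) break;                          // no further '<': end of tags
        int gt = html.IndexOf('>', lt + 1);
        if (gt < 0) break;                          // unterminated tag: stop
        pos = gt + 1;                               // next call resumes after '>'
        string inner = html.Substring(lt + 1, gt - lt - 1).Trim();
        int end = inner.IndexOfAny(new[] { ' ', '\t', '\r', '\n' });
        // tag name = text up to the first whitespace, minus a self-closing '/'
        string name = (end < 0 ? inner : inner.Substring(0, end)).TrimEnd('/');
        if (name.Length > 0)
            return name.ToLowerInvariant();
    }
    pos = html.Length;                              // remember that we reached the end
    return null;
}

void Reset() => pos = 0;                            // rewind to the start of the data

string? tag;
while ((tag = NextTag()) != null)
    Console.Write(tag + " ");                       // html body b /b /body /html
Console.WriteLine();

Reset();
Console.WriteLine(NextTag());                       // html
```

Because the scanner keeps only an integer position, Reset is a single assignment; re-parsing the same data costs nothing extra to set up.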

Protected Instance Methods

Finalize (inherited from Object) Allows an Object to attempt to free resources and perform other cleanup operations before the Object is reclaimed by garbage collection.
MemberwiseClone (inherited from Object) Creates a shallow copy of the current Object.

See Also

HTMLparser Class | Majestic12 Namespace