SolrNet: SolrNet.ExtractParameters Class Reference


Contains parameters that can be specified when extracting a rich document to the index.

Public Member Functions

 ExtractParameters (Stream content, string id, string resourceName)
 Constructs a new ExtractParameters with required values.
 ExtractParameters (FileStream content, string id)
 Constructs a new ExtractParameters with required values.

Properties

string Id [get, set]
 Provides the required unique id for the document being indexed.
string ResourceName [get, set]
 Name of the file. Tika can use it as a hint for detecting the MIME type.
bool AutoCommit [get, set]
 Causes Solr to do a commit after indexing the document, making it immediately searchable.
bool ExtractOnly [get, set]
 If true, return the extracted content from Tika without indexing the document. This literally includes the extracted XHTML as a string in the response.
ExtractFormat ExtractFormat [get, set]
 The format to specify for extraction.
bool CaptureAttributes [get, set]
 Index attributes of the Tika XHTML elements into separate fields, named after the element. For example, when extracting from HTML, Tika can return the href attributes in <a> tags as fields named "a".
string Capture [get, set]
 Name of a Tika XHTML element to capture separately for adding to the Solr document. This is useful for copying chunks of the XHTML into a separate field; for instance, paragraphs (<p>) can be captured and indexed into their own field.
string Prefix [get, set]
 Prefix all fields that are not defined in the schema with the given prefix. This is very useful when combined with dynamic field definitions.
string DefaultField [get, set]
 If Prefix (Solr's uprefix parameter) is not specified and a field cannot be determined, the default field is used.
IEnumerable<ExtractField> Fields [get, set]
 Collection of fields and their specified values.
string XPath [get, set]
 When extracting, only return Tika XHTML content that satisfies the XPath expression. See http://lucene.apache.org/tika/documentation.html for details on the format of Tika XHTML.
bool LowerNames [get, set]
 Map all field names to lowercase with underscores. For example, Content-Type would be mapped to content_type.
string StreamType [get, set]
 MIME type of the file. If provided, Tika does not have to infer it from the ResourceName and content.
Stream Content [get, set]
 The rich document to index.

Detailed Description

Contains parameters that can be specified when extracting a rich document to the index.

See http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters


Constructor & Destructor Documentation

SolrNet.ExtractParameters.ExtractParameters (Stream content, string id, string resourceName)

Constructs a new ExtractParameters with required values.

Parameters:
content
id
resourceName

SolrNet.ExtractParameters.ExtractParameters (FileStream content, string id)

Constructs a new ExtractParameters with required values.

Parameters:
content
id
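
As a minimal sketch of the two constructors listed above, the following builds one instance from an arbitrary Stream and one from a FileStream. The ids, the "hello.txt" name, and the temporary file are illustrative assumptions, not values the library requires.

```csharp
using System;
using System.IO;
using System.Text;
using SolrNet;

public static class ConstructorSketch
{
    public static void Main()
    {
        // Three-argument overload: any Stream plus an explicit resource name.
        using (var memory = new MemoryStream(Encoding.UTF8.GetBytes("hello")))
        {
            var fromStream = new ExtractParameters(memory, "doc-1", "hello.txt");
            Console.WriteLine(fromStream.ResourceName);
        }

        // Two-argument overload: a FileStream and an id only.
        // A temporary file stands in for a real document here.
        var path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "hello");
            using (var file = File.OpenRead(path))
            {
                var fromFile = new ExtractParameters(file, "doc-2");
                Console.WriteLine(fromFile.Id);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```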

Property Documentation

bool SolrNet.ExtractParameters.AutoCommit [get, set]

Causes Solr to do a commit after indexing the document, making it immediately searchable.

For good performance when loading many documents, don't call commit until you are done.
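
The advice above could be followed roughly as sketched here: each document is extracted without AutoCommit, and a single commit is issued at the end. The helper name IndexAll and the use of the file path as the document id are assumptions made for illustration; the sketch also assumes an already configured ISolrOperations<T> instance.

```csharp
using System.Collections.Generic;
using System.IO;
using SolrNet;

public static class BulkExtractSketch
{
    // Hypothetical helper: indexes every file, then commits once.
    public static void IndexAll<T>(ISolrOperations<T> solr, IEnumerable<string> paths)
    {
        foreach (var path in paths)
        {
            using (var file = File.OpenRead(path))
            {
                // AutoCommit stays off so Solr does not commit per document.
                solr.Extract(new ExtractParameters(file, path) { AutoCommit = false });
            }
        }

        // One commit after the whole batch makes all documents searchable.
        solr.Commit();
    }
}
```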

string SolrNet.ExtractParameters.Capture [get, set]

Name of a Tika XHTML element to capture separately for adding to the Solr document. This is useful for copying chunks of the XHTML into a separate field; for instance, paragraphs (<p>) can be captured and indexed into their own field.

Content is also still captured into the overall "content" field.
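
A small sketch of the paragraph example: Capture is set to the element name "p", and CaptureAttributes is switched on alongside it. The helper name ForHtml and the "page.html" resource name are hypothetical.

```csharp
using System.IO;
using SolrNet;

public static class CaptureSketch
{
    // Hypothetical helper that prepares parameters for an HTML document.
    public static ExtractParameters ForHtml(Stream html, string id)
    {
        return new ExtractParameters(html, id, "page.html")
        {
            // Capture <p> elements into their own field.
            Capture = "p",
            // Also index element attributes (for example href on <a>).
            CaptureAttributes = true
        };
    }
}
```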

bool SolrNet.ExtractParameters.CaptureAttributes [get, set]

Index attributes of the Tika XHTML elements into separate fields, named after the element. For example, when extracting from HTML, Tika can return the href attributes in <a> tags as fields named "a".

Stream SolrNet.ExtractParameters.Content [get, set]

The rich document to index.

string SolrNet.ExtractParameters.DefaultField [get, set]

If Prefix (Solr's uprefix parameter) is not specified and a field cannot be determined, the default field is used.

ExtractFormat SolrNet.ExtractParameters.ExtractFormat [get, set]

The format to specify for extraction.

bool SolrNet.ExtractParameters.ExtractOnly [get, set]

If true, return the extracted content from Tika without indexing the document. This literally includes the extracted XHTML as a string in the response.
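
For illustration, an extract-only request might look like the sketch below, which asks for plain text and returns it without indexing anything. It assumes a configured ISolrOperations<T> instance; the helper name ExtractText and the "preview-only" id are made up for the example.

```csharp
using System.IO;
using SolrNet;

public static class ExtractOnlySketch
{
    // Hypothetical helper: returns the text Tika extracted from the stream.
    public static string ExtractText<T>(ISolrOperations<T> solr, Stream content, string resourceName)
    {
        var parameters = new ExtractParameters(content, "preview-only", resourceName)
        {
            ExtractOnly = true,                // do not index the document
            ExtractFormat = ExtractFormat.Text // ask for text rather than XHTML
        };

        var response = solr.Extract(parameters);
        return response.Content;
    }
}
```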

IEnumerable<ExtractField> SolrNet.ExtractParameters.Fields [get, set]

Collection of fields and their specified values.
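
A hedged sketch of supplying explicit field values alongside the extracted content. It assumes ExtractField takes a field name and a value in its constructor; the field names and values shown are invented for the example.

```csharp
using System.IO;
using SolrNet;

public static class FieldsSketch
{
    // Hypothetical helper that attaches fixed field values to the document.
    public static ExtractParameters WithMetadata(Stream content, string id, string resourceName)
    {
        return new ExtractParameters(content, id, resourceName)
        {
            Fields = new[]
            {
                new ExtractField("author", "Jane Doe"),
                new ExtractField("category", "reports")
            }
        };
    }
}
```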

string SolrNet.ExtractParameters.Id [get, set]

Provides the required unique id for the document being indexed.

string SolrNet.ExtractParameters.ResourceName [get, set]

Name of the file. Tika can use it as a hint for detecting the MIME type.

bool SolrNet.ExtractParameters.LowerNames [get, set]

Map all field names to lowercase with underscores. For example, Content-Type would be mapped to content_type.
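
The mapping can be approximated in a few lines. This is not Solr's implementation, only an illustration of the Content-Type example above; treating every non-alphanumeric character as an underscore is an assumption.

```csharp
using System;
using System.Linq;

public static class LowerNamesSketch
{
    // Lowercases letters and digits, and replaces anything else with '_'.
    public static string Normalize(string fieldName)
    {
        var chars = fieldName.Select(c => char.IsLetterOrDigit(c) ? char.ToLowerInvariant(c) : '_');
        return new string(chars.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Normalize("Content-Type")); // prints "content_type"
    }
}
```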

string SolrNet.ExtractParameters.Prefix [get, set]

Prefix all fields that are not defined in the schema with the given prefix. This is very useful when combined with dynamic field definitions.

For example, setting Prefix to "ignored_" would effectively ignore all unknown fields generated by Tika, given that the example schema contains <dynamicField name="ignored_*" type="ignored">.
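
As a sketch, assuming the schema defines the ignored_* dynamic field mentioned above, a prefix of "ignored_" routes every unknown Tika field to it. The helper name IgnoreUnknownFields is hypothetical.

```csharp
using System.IO;
using SolrNet;

public static class PrefixSketch
{
    // Hypothetical helper: unknown fields become ignored_<name>.
    public static ExtractParameters IgnoreUnknownFields(Stream content, string id, string resourceName)
    {
        return new ExtractParameters(content, id, resourceName)
        {
            // Relies on <dynamicField name="ignored_*" type="ignored"> in the schema.
            Prefix = "ignored_"
        };
    }
}
```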

string SolrNet.ExtractParameters.StreamType [get, set]

MIME type of the file. If provided, Tika does not have to infer it from the ResourceName and content.

string SolrNet.ExtractParameters.XPath [get, set]

When extracting, only return Tika XHTML content that satisfies the XPath expression. See http://lucene.apache.org/tika/documentation.html for details on the format of Tika XHTML.


The documentation for this class was generated from the following file:
  • SolrNet/ExtractParameters.cs
Generated on Sun May 3 2015 17:19:05 for SolrNet by  doxygen 1.7.2