xdoc2txt - Text extraction from PDF, Word, Excel, Ichitaro and various other binary documents
Overview
- xdoc2txt is a general-purpose text converter that extracts the text from PDF, Word, Excel, Ichitaro and various other binary documents. It runs from the Windows command line.
- Because xdoc2txt analyzes the structure of each document format directly, it can convert files on its own. * You do not need to install the original application, such as Word or Acrobat.
- Because it runs fast, it is well suited as a filter for full-text search engines.
- The document type is determined from the file extension. Files with the following extensions are supported.

Extension | Document type |
---|---|
.rtf | Rich text |
.docx | Microsoft Word 2007/2010/2013 (OOXML) |
.xlsx | Microsoft Excel 2007/2010/2013 (OOXML) |
.pptx | Microsoft PowerPoint 2007/2010/2013 (OOXML) |
.doc | Microsoft Word ver5.0/95/97/2000/XP/2003 |
.xls | Microsoft Excel ver5.0/95/97/2000/XP/2003 |
.ppt | Microsoft PowerPoint 97/2000/XP/2003 |
.sxw / .sxc / .sxi / .sxd | OpenOffice.org |
.odt / .ods / .odp / .odg | Open Document |
.jaw / .jtw | Ichitaro ver5 |
.jbw / .juw | Ichitaro ver6 |
.jfw / .jvw | Ichitaro ver7 |
.jtd / .jtt | Ichitaro ver8/9/10/11/12 |
.oas / .oa2 / .oa3 | OASYS/Win |
.bun | Shin-Matsu / Matsu 5 / Matsu 6 |
.wj2 / .wj3 / .wk3 / .wk4 / .123 | Lotus 123 |
.wri | Windows 3.1 Write |
.pdf | Adobe PDF |
.mht | Web Archive |
.html | HTML |
.eml | Outlook Express export format |

- Ver2.0 and later support iFilter. * Even for an extension that xdoc2txt does not support natively, text can be extracted if a corresponding iFilter is available. (This feature is available only in the exe version.)
- xdoc2txt is available as an exe version, a Dll version, and a COM component version. Their text extraction capabilities are equivalent.
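Because the document type is chosen from the file extension, a caller such as a search-engine filter can check up front whether a file is worth handing to xdoc2txt. The following is a minimal Python sketch of that check. The names `NATIVE_EXTENSIONS` and `is_natively_supported` are hypothetical, the set is copied by hand from the extension list above, and the case-insensitive comparison is an assumption rather than documented behavior.

```python
from pathlib import PurePath

# Extensions listed above as natively supported.
# Hypothetical constant, copied by hand from the list in this section.
NATIVE_EXTENSIONS = frozenset({
    ".rtf", ".docx", ".xlsx", ".pptx", ".doc", ".xls", ".ppt",
    ".sxw", ".sxc", ".sxi", ".sxd", ".odt", ".ods", ".odp", ".odg",
    ".jaw", ".jtw", ".jbw", ".juw", ".jfw", ".jvw", ".jtd", ".jtt",
    ".oas", ".oa2", ".oa3", ".bun",
    ".wj2", ".wj3", ".wk3", ".wk4", ".123",
    ".wri", ".pdf", ".mht", ".html", ".eml",
})


def is_natively_supported(filename: str) -> bool:
    """Return True if the file's extension appears in the list above.

    The comparison ignores case (an assumption made for this sketch).
    """
    return PurePath(filename).suffix.lower() in NATIVE_EXTENSIONS
```

A file that fails this check may still be convertible through an installed iFilter, so a caller might treat a negative result as "try the iFilter route" rather than as a hard failure.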
Operating environment
xdoc2txt operates in the following environment.
ver | Operating environment |
---|---|
ver1.x (MBCS) | Windows 95/98 / ME / NT4.0 / 2000 / XP / Vista / Windows 7 (32bit / 64bit) / Windows 8 (32bit / 64bit) / Server 2003 / Windows Server 2008 R2 (64bit) / Windows Server 2012 (64bit) / Windows Server 2012 R2 (64bit) |
ver2.x (Unicode) | 2000 / XP / Vista / Windows 7 (32bit / 64bit) / Windows 8 (32bit / 64bit) / Windows 10 (32bit / 64bit) / Server 2003 / Windows Server 2008 R2 (64bit) / Windows Server 2012 (64bit) / Windows Server 2012 R2 (64bit) |
- To run Ver2.0, the following package must be installed.
When running the 32-bit (x86) version of xdoc2txt (on either 32-bit or 64-bit Windows):
Microsoft Visual C++ 2010 Redistributable Package (x86)
When running the 64-bit (x64) version of xdoc2txt:
Microsoft Visual C++ 2010 Redistributable Package (x64)
If you receive an error message such as "The specified program cannot be executed." or "The application could not be started because its side-by-side configuration is incorrect", install the package above from Microsoft's distribution site.
Copyright and Terms and Conditions
- xdoc2txt is free to use for non-commercial purposes. Use by individuals and non-profit organizations, use on the internal intranet of a company or corporation, and use to build your own public Internet server (including the operation of a commercial site) do not count as commercial use and are free of charge.
- If you want to incorporate xdoc2txt into a commercial product and redistribute it, the xdoc2txt commercial license applies, so please contact the author.
xdoc2txt commercial license (2007/5/24 edition)
- The copy of xdoc2txt bundled with Hyper Estraier is exempt from the commercial license when it is distributed together with Hyper Estraier.
- If you wish to redistribute xdoc2txt, please be sure to obtain distribution permission from the author.
In the case of free software, it will not allow the
principle distribution.
If you redistribute xdoc2txt, please distribute all of the files included in the package without modification. * Also, please state that xdoc2txt is used in a place where users will see it, such as the manual.
Note that if the software is designed so that xdoc2txt can be plugged in as an external filter (that is, the user downloads xdoc2txt separately), there is no need to contact the author at all.
- The copyright of xdoc2txt and its accompanying documentation belongs to hishida.
- xdoc2txt is provided as-is with no warranty. The author accepts no responsibility for any damages arising from the use of, or the inability to use, xdoc2txt (including lost profits, business interruption, loss of business information, and other monetary loss).
- Coverage in magazines, inclusion in magazine software collections, and reprinting on the Internet are permitted. * If you publish it in a magazine, please let the author know which magazine it appeared in. Notification after the fact is fine.
Command options
xdoc2txt.exe [options ..] <filename ...>

Option | Description |
---|---|
-h | Display help |
-s | Output encoding: ShiftJIS (default) |
-j | Output encoding: JIS |
-e | Output encoding: EUC |
-u | Output encoding: UTF-16 (LE) * Ver2.0 or later |
-8 | Output encoding: UTF-8 * Ver2.0 or later |
-i | Give priority to iFilter where one can be used * Ver2.0 or later |
-f | Write the conversion result to a file. The default is standard output |
-p | For OLE2 compound documents, display the document properties (effective for Office and Ichitaro) |
-r= | Conversion of ruby in HTML documents |
-r=0 | Delete ruby |
-r=1 | () |
-r=2 | 《》 Aozora Bunko format |
-o= | Other options |
-o=0 | PDF: do not output page numbers of the form -?- |
-o=1 | PDF: delete line breaks (use when vertical writing produces a line break after every character) |
-g=# | Character spacing adjustment (the default value for PDF is 95). When # is greater than zero it is a percentage (60% is specified as -g=60): a gap between characters of character height * (# / 100) or more is treated as a space |
-g=60 | A gap of 60% of the character height or more is treated as a space |
-g=0 | Do not output blanks for character spacing adjustment |
-v | Display the version number |
-x | Display only the cells that exist in EXCEL2007 (xdoc2txt 1.33 or later) |
-z=# | Maximum size of the input file in bytes. The default is 256MB |
-z=512000000 | Set the upper limit of the input file size to 512MB |
-z=0 | Unlimited. No file size check is performed |
<filename> | Name of the source file to convert |

* Wildcard characters (* ?) can be used.
* File names that contain spaces must be enclosed in "".
* The following options have been deprecated since Ver2.0.

Option | Description |
---|---|
-n | Ignore the access permission settings of a PDF document (cryptlib.dll required) |
-c | PDF cache on (default is off) |
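When xdoc2txt is called from another program, it can help to assemble the command line in one place. The following Python sketch builds an argument list from a few of the options described above (`-s`, `-j`, `-u`, `-8`, `-f`, `-i`, `-z=#`) and runs the tool with the standard `subprocess` module. The helper names `build_command` and `extract_text` and the dictionary `ENCODING_FLAGS` are hypothetical. The sketch also assumes that `xdoc2txt.exe` is on the search path and that it returns a non-zero exit code on failure, neither of which is stated in this document.

```python
import subprocess

# Output-encoding flags taken from the option list above.
ENCODING_FLAGS = {"shiftjis": "-s", "jis": "-j", "utf16le": "-u", "utf8": "-8"}


def build_command(files, encoding="shiftjis", to_file=False,
                  prefer_ifilter=False, max_bytes=None, exe="xdoc2txt.exe"):
    """Build an argument list for xdoc2txt (hypothetical helper)."""
    if not files:
        raise ValueError("at least one input file is required")
    cmd = [exe, ENCODING_FLAGS[encoding]]
    if to_file:
        cmd.append("-f")                 # write the result to a file
    if prefer_ifilter:
        cmd.append("-i")                 # prefer an iFilter when available
    if max_bytes is not None:
        cmd.append(f"-z={max_bytes}")    # input size limit in bytes
    cmd.extend(files)
    return cmd


def extract_text(path, exe="xdoc2txt.exe"):
    """Run xdoc2txt on one file and return its standard output as text.

    Assumes a non-zero exit code signals failure (not confirmed here).
    """
    cmd = build_command([path], encoding="utf8", exe=exe)
    result = subprocess.run(cmd, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

Passing the arguments as a list means that file names containing spaces do not need extra quoting on the Python side.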
How to Use
- The following example writes the text contained in the MS-Word document sample.doc to standard output.
xdoc2txt sample.doc
xdoc2txt sample.doc > sample.txt
xdoc2txt -f sample.doc sample.xls
xdoc2txt -f *.xls
xdoc2txt -p manual.doc
* Execution result
* <Title> KWIC Finder manual </Title>
* <Author> hishida </Author>
* <Template> Normal.dot </Template>
* <LastAuthor> hishida </LastAuthor>
* <RevisionNumber> 1 </RevisionNumber>
* <AppName> Microsoft Word 9.0 </AppName>
* <Lastprinted> 2004/03/23 19:39:00 </Lastprinted>
* <Created> 2004/03/23 19:35:00 </Created>
* <LastSaved> 2004/03/23 19:44:00 </LastSaved>
* <PageCount> 1 </PageCount>
* <WordCount> 21 </WordCount>
* <CharCount> 121 </CharCount>
- Password-protected Word / Excel / PowerPoint / Ichitaro documents cannot be displayed.
- As a rule, text is output in the order in which it is stored in the file, so the order may differ from what the original application displays.
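The property lines printed by the -p option have a simple tag-like shape, so a caller can collect them into a dictionary. The following Python sketch does that with a regular expression. The function name `parse_properties` is hypothetical, and the sketch assumes the output looks like the execution result shown above, with one property per line. It tolerates stray spaces inside the tags.

```python
import re

# Matches one property line such as "<Title> KWIC Finder manual </Title>".
# Optional whitespace inside the tags is accepted.
_PROPERTY_RE = re.compile(r"<\s*(\w+)\s*>(.*?)<\s*/\s*\1\s*>")


def parse_properties(output: str) -> dict:
    """Collect <Name> value </Name> lines into a dict (hypothetical helper).

    Lines that do not match the pattern, such as body text, are ignored.
    """
    props = {}
    for line in output.splitlines():
        match = _PROPERTY_RE.search(line)
        if match:
            props[match.group(1)] = match.group(2).strip()
    return props
```

All values are returned as strings. Converting fields such as PageCount to integers or the timestamps to dates is left to the caller.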
Using xdoc2txt with the mouse
If you create a shortcut on the desktop, you can convert files to text using only the mouse.
- In Explorer, right-click xdoc2txt.exe → [Send to (N)] → [Desktop (create shortcut)]
- Right-click the icon created on the desktop → [Properties (R)]
- Add -f to the end of [Target (T)].
Example) "C:\Program Files\kwic\xdoc2txt.exe" -f
- When you drag and drop the file you want to convert onto the icon, a file with the .txt extension is created in the same directory.
Reference article: http://www.forest.impress.co.jp/article/2003/11/19/xdoc2txt.html (Mado no Mori NEWS)
About iFilter
- Ver2.0 and later support iFilter. * With the -i option, if an iFilter is available for the extension, the iFilter takes priority.
- Operation is being verified with the following iFilters.
- Ichitaro IFilter 32 bit for OS
- DocuWorks Content Filter
- Microsoft Office Filter Pack
- iFilter bundled with Adobe Reader 9.5
※ The iFilter bundled with Adobe Reader 10 and later, and the separately distributed "Adobe PDF IFilter v6.0" and "Adobe PDF iFilter 9 for 64-bit platforms", cannot be used.
- iFilter support is available only in the exe version. * iFilter cannot be used in the Dll version.
Download
Ver2.x (Unicode version)
- New! 2017/07/06
- xdoc2txt 2.16.1 (xd2tx2161_x64.zip) - x64 (64 bit) version
- xdoc2txt 2.16.1 (xd2tx2161.zip) - x86 (32 bit) version
Ver1.x (MBCS version)
- Cryptlib.Dll Ver1.00 (Crypt100.Lzh / 37KB) - additional DLL for searching and displaying PDFs that are encrypted but have no password (not required for xdoc2txt 2.0 or later)
Filter Case Study
name | Kind | Genre | URL | Include |
---|---|---|---|---|
GoogleXdoc (incorporates xdoc2txt into a Google Desktop plug-in) | free | Full-text search | http://softfarm.net/ Soft farm | ○ |
Namazu for Win32 | free | Full-text search | Sample document filters using xdoc2txt (by Mr. a.hanai); full-text search system Namazu for Win32 | |
Hyper Estraier | free | Full-text search | http://hyperestraier.sourceforge.net/ | ○ |
Meadow2 | free | Editor | http://www.bookshelf.jp/pukiwiki/pukiwiki.php?Meadow%20memo%20Wiki Meadow memo Wiki | ○ |
MiGrep | free | Search | http://homepage3.nifty.com/m-and-i/freetalk/upload/index.html M & I page | |
VxEditor | free | Editor | http://homepage3.nifty.com/x-labo/ X-Labo WebPage | ○ |
smoopy | free | Vertical-writing text viewer | http://www.vector.co.jp/soft/win95/util/se263229.html | |
Transwise | free | Translation support | http://www6.ocn.ne.jp/~vmel/software/Transwise/Transwise.htm | |
EBView | free | Dictionary / text search | http://ebview.sourceforge.net/ | |
Search Cross | Product | Full-text search | http://www.villagecenter.co.jp/soft/searchx/ Village Center Co., Ltd. | |
KOA Direct Server | free (some fee required) | Content sharing system | http://koaproject.sakura.ne.jp/pages/koadirectserver.html KOA Project | ○ |
HNXgrep | free | Grep search | http://www.vector.co.jp/soft/winnt/util/se494966.html | |
* This list covers the software known to the author that can use xdoc2txt as a filter.
History
Ver2.x (Unicode version)
ver | Release date |
---|---|
2.16.1 | 2017/07/06 |
2.16.1 | 2016/06/28 |
2.16 | 2016/04/26 |
2.15 | 2016/04/07 |
2.14 | 2015/11/19 |
2.13 | 2015/8/25 |
2.12 | 2015/7/18 |
2.11 | 2015/5/29 |
2.10 | 2015/4/15 |
2.09 | 2015/4/09 |
2.08 | 2015/3/11 |
2.07 | 2014/10/28 |
2.06 | 2014/10/09 |
2.05 | 2014/08/31 |
2.04 | 2014/07/29 |
2.03 | 2014/07/16 |
2.02 | 2014/06/14 |
2.02 | 2014/05/04 |
2.01 | 2014/02/16 |
2.00 | 2013/01/23 |
2.00β4 | 2012/12/28 |
2.00β3 | 2012/12/24 |
2.00β2 | 2012/12/19 |
2.00β1 | 2012/12/01 |
2.00β0 | 2012/11/26 |
2.00α3 | 2012/11/17 |
2.00α2 | 2012/11/15 |
2.00α1 | 2012/11/14 |
2.00α0 | 2012/11/13 |
Ver1.x (MBCS version)
Development of the MBCS version (Ver1.x) has ended. Please use Ver2.x from now on.
ver | Release date |
---|---|
1.52 | 2015/11/19 |
1.51 | 2015/8/25 |
1.50 | 2014/10/28 |
1.49 | 2014/10/09 |
1.48 | 2014/05/04 |
1.47 | 2013/11/30 |
1.46 | 2012/12/24 |
1.45 | 2012/11/26 |
1.44 | 2012/11/17 |
1.43 | 2012/10/17 |
1.42 | 2012/05/16 |
1.41 | 2011/07/31 |
1.40 | 2011/05/17 |
1.39 | 2011/04/28 |
1.38 | 2010/12/21 |
1.37 | 2010/05/16 |
1.36 | 2010/01/09 |
1.35 | 2009/08/28 |
1.34 | 2009/06/22 |
1.33 | 2009/06/07 |
1.32 | 2008/12/01 |
1.31 | 2008/11/05 |
1.30 R2 | 2008/08/18 |
1.30 | 2008/05/22 |
1.29 | 2008/05/18 |
1.28 | 2008/03/18 |
1.27 | 2008/01/24 |
1.26a | 2007/10/21 |
1.26 | 2007/05/11 |
1.25 | 2007/08/13 |
1.24 | 2007/02/18 |
1.23 | 2006/08/29 |
1.22 | 2006/05/28 |
- | 2006/05/10 |
1.21 | 2006/05/08 |
1.20 | 2006/02/17 |
1.19 | 2006/02/08 |
1.18 | 2006/02/04 |
1.17 | 2005/09/19 |
1.16 | 2005/05/02 |
1.15 | 2005/04/23 |
1.14 | 2005/01/31 |
1.13 | 2004/05/30 |
1.12 | 2004/05/05 |
1.11 | 2004/04/04 |
1.10 | 2004/03/13 |
1.09 | 2004/02/25 |
1.08 | 2004/01/28 |
1.07 | 2004/01/26 |
- | 2004/01/18 |
1.06 | 2003/11/09 |
1.05 | 2003/07/15 |
1.04 | 2003/03/26 |
1.03 | 2002/11/23 |
1.02 | 2002/10/18 |
1.01 | 2002/9/9 |
1.00 | 2002/7/8 |
© 2002-2012 hishida