DOC -> TXT component


Problem description

Hi all,

I am trying to extract text from an uploaded DOC file so that I can do
some regex on the text in order to fill up some textboxes on the ASP
page. I know that Writely (www.writely.com) does this but I think they
use C# or something special to preserve the formatting of the uploaded
DOC file.

My questions:

(a) Does anybody know of a way to extract text from DOC files? Any
component out there that's cheap?

(b) Is there any way to be able to preview an uploaded DOC file? Maybe
convert it into XML and use some default styles or something.

(c) If DOC->Text isn't possible, is DOC->RTF possible? Since RTF is
ASCII, something could still be done..

Thanks a lot for your time / any response.

Vince
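
Once plain text is available, the regex step described above might look roughly like the sketch below. It is only an illustration: the field labels (`Name:`, `Phone:`), the patterns, and the `extract_fields` helper are hypothetical and would need to match whatever the uploaded documents actually contain.

```python
import re

# Hypothetical patterns; the labels are assumptions about the document layout.
FIELD_PATTERNS = {
    "name": re.compile(r"^Name:[ \t]*(.+)$", re.MULTILINE | re.IGNORECASE),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"^Phone:[ \t]*([\d ()+-]{7,})$", re.MULTILINE | re.IGNORECASE),
}


def extract_fields(text: str) -> dict:
    """Return one value per form field, or an empty string when nothing matches."""
    fields = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            fields[field] = ""
            continue
        # Use the first capture group when the pattern has one, else the whole match.
        value = match.group(1) if pattern.groups else match.group(0)
        fields[field] = value.strip()
    return fields


if __name__ == "__main__":
    sample = "Name: Jane Doe\nEmail: jane.doe@example.com\nPhone: +1 (555) 010-9999\n"
    for field, value in extract_fields(sample).items():
        print(f"{field}: {value}")
```

The returned dictionary could then be used to pre-fill the textboxes on the page; fields that were not found come back empty so the user can type them in.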

Recommended answer

"LtCommander" <Lt*********@gmail.com> wrote in message
news:11*********************@e64g2000cwd.googlegroups.com...

> Hi all,
>
> I am trying to extract text from an uploaded DOC file so that I can do
> some regex on the text in order to fill up some textboxes on the ASP
> page. I know that Writely (www.writely.com) does this but I think they
> use C# or something special to preserve the formatting of the uploaded
> DOC file.
>
> My questions:
>
> (a) Does anybody know of a way to extract text from DOC files? Any
> component out there that's cheap?




Cheapest is the office DLLs. It is also the most perf heavy and there is a
potential of licensing issues. There are libraries to go from text to word,
but I am not certain in the other direction (sure they are out there, but I
do not know of them personally).


> (b) Is there any way to be able to preview an uploaded DOC file? Maybe
> convert it into XML and use some default styles or something.




I am sure there is a component (componentsource.com) or an open source
library (sourceforge.net is a good resource, as is codeplex.com).


> (c) If DOC->Text isn't possible, is DOC->RTF possible? Since RTF is
> ASCII, something could still be done..



It is pulling from DOC that is the issue. DOC -> RTF is easy enough with the
Office libs, but you go back to the weight of having Office components on
your system.

I would look at third party libraries. I know there are components like Word
Writer (not sure if it goes both ways, however). I would also look at the
open source community. You might still have to pay (depending on license),
but you should be able to work out a reasonable deal.

--
Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA
http://gregorybeamer.spaces.live.com

*************************************************
Think outside of the box!
*************************************************
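
On question (c): if an RTF version of the document can be obtained, getting plain text out of it is mostly a matter of dropping control words and metadata groups. The sketch below is a deliberately minimal illustration, not a full RTF parser. It assumes Windows-1252 text, skips a small hard-coded set of destinations, and does not handle Unicode (`\u`) escapes or embedded binary data. The `rtf_to_text` helper and its constants are hypothetical names.

```python
import re

# Destinations whose contents are metadata rather than document text.
SKIPPED_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

# Control words that stand for whitespace in the output.
WHITESPACE_WORDS = {"par": "\n", "line": "\n", "tab": "\t"}

TOKEN = re.compile(
    r"\\(?P<word>[a-zA-Z]+)(?:-?\d+)? ?"  # control word, optional number, optional space
    r"|\\'(?P<hex>[0-9a-fA-F]{2})"        # hex-escaped byte such as \'e9
    r"|\\(?P<symbol>[^a-zA-Z'])"          # control symbol such as \\ \{ \} \*
    r"|(?P<brace>[{}])"                   # group delimiters
    r"|(?P<newline>[\r\n]+)"              # raw line breaks carry no meaning in RTF
    r"|(?P<char>.)",                      # anything else is literal text
    re.DOTALL,
)


def rtf_to_text(rtf: str) -> str:
    """Strip RTF markup and return the visible text (simplified)."""
    out = []
    stack = []        # saved 'ignoring' flag for each open group
    ignoring = False  # True while inside a skipped destination
    for match in TOKEN.finditer(rtf):
        kind = match.lastgroup
        value = match.group(kind)
        if kind == "brace":
            if value == "{":
                stack.append(ignoring)
            else:
                ignoring = stack.pop() if stack else False
        elif kind == "word":
            if value in SKIPPED_DESTINATIONS:
                ignoring = True
            elif not ignoring and value in WHITESPACE_WORDS:
                out.append(WHITESPACE_WORDS[value])
        elif kind == "symbol":
            if value == "*":
                ignoring = True  # \* marks a destination that readers may skip
            elif not ignoring and value in "\\{}":
                out.append(value)
        elif kind == "hex":
            if not ignoring:
                out.append(bytes.fromhex(value).decode("cp1252", errors="replace"))
        elif kind == "char":
            if not ignoring:
                out.append(value)
    return "".join(out)


if __name__ == "__main__":
    sample = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0\fs24 Name: Vince\par Email: caf\'e9\par}"
    print(rtf_to_text(sample))
```

The resulting string can be fed straight into whatever regular expressions are used to fill the form. For real-world documents a maintained RTF library is likely the safer choice, since Word-generated RTF uses many more destinations and escapes than this sketch covers.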

Hi,

Cowboy (Gregory A. Beamer) wrote:

> Cheapest is the office DLLs. It is also the most perf heavy and there is a
> potential of licensing issues.

MS recommends against installing Office on a server. Worth reading in
this context:
http://support.microsoft.com/default...en-us%3B257757

Cheers,
Olaf

--
My .02: www.Resources.IntuiDev.com




Olaf Rabbachin wrote:

> Hi,
>
> Cowboy (Gregory A. Beamer) wrote:
>
>> Cheapest is the office DLLs. It is also the most perf heavy and there is a
>> potential of licensing issues.
>
> MS recommends against installing Office on a server. Worth reading in
> this context:
> http://support.microsoft.com/default...en-us%3B257757
>
> Cheers,
> Olaf

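Thanks Greg and Olaf.

Installing the Office libraries on the server is definitely a last resort.
I don't even think our host provider would allow it. I'll take a look at
componentsource.com and see if I can find something. If anything else
comes to mind, please let me know.

Thanks again for your help.

Vince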

