Read a Word file without using the Interop.Word DLL (without installing Word on the IIS server)


Question


I want to read a Word file without using the Interop.Word DLL, because I do not want to install Word on the IIS server. At the moment I do a keyword search by converting the Word file into a .txt file and reading from it. I tried the Open XML SDK, but it does not read old .doc files correctly. I also found Spire.Doc, but it is a paid product. Please provide a complete code solution as soon as possible.
Code as follows:

private void SearchWord(string[] str1)
      {
          string filename1 = "";
          string randomName = "";
          string fname = "";
          Session["cids"] = "";
          object missingType = Type.Missing;
          object readOnly = true;
          object isVisible = false;
          object documentFormat = 8;
          string s12 = "select id,docfilename from Uploadeddocsmaster";
          dt = cn.viewdatatable(s12);
          int dtcount = dt.Rows.Count * 2;
          string[] ids = new string[dtcount];

          for (int k = 0; k < dt.Rows.Count; k++)
          {
              string id = dt.Rows[k]["id"].ToString();
              filename1 = dt.Rows[k]["docfilename"].ToString();
              string fileName = Server.MapPath("~/UploadedFiles/") + filename1;
              string ext = Path.GetExtension(fileName);
              if (ext == ".doc" || ext == ".docx")
              {
                  RichEditDocumentServer server = new RichEditDocumentServer();
                  server.LoadDocument("document.doc", DocumentFormat.Doc);
                  server.ExportToPdf(memoryStream);

                  Application applicationclass = new Application();
                  string[] crefids = filename1.Split('.');
                  // Use the part of the file name before the first dot as the temp file name
                  randomName = crefids[0];

                  object Source = fileName;
                  object Target = Server.MapPath("~/Temp/" + randomName + ".txt");
                  fname = Target.ToString();
                  // object Target = @"D:\Alex\ResumeManager Dec 6,2012\ResumeManager\Uploaddocs\test1.txt";

                  //Upload the word document and save to Temp folder
                  // FileUpload1.SaveAs(Server.MapPath("~/Temp/") + Path.GetFileName(FileUpload1.PostedFile.FileName));


                  applicationclass.Documents.Open(ref Source,
                                                  ref readOnly,
                                                  ref missingType, ref missingType, ref missingType,
                                                  ref missingType, ref missingType, ref missingType,
                                                  ref missingType, ref missingType, ref isVisible,
                                                  ref missingType, ref missingType, ref missingType,
                                                  ref missingType, ref missingType);
                  applicationclass.Visible = false;
                  Document document = applicationclass.ActiveDocument;
                  object format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatUnicodeText;

                  //Save the Word document as a Unicode text file
                  document.SaveAs(ref Target, ref format, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType, ref missingType, ref missingType,
                                  ref missingType);

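                  //Close the word document
                  document.Close(ref missingType, ref missingType, ref missingType);

                  //Quit Word so that no WINWORD.EXE process is left running on the server
                  applicationclass.Quit(ref missingType, ref missingType, ref missingType);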


                  foreach (string str in str1)
                  {

                      using (StreamReader sr = new StreamReader(fname))
                      {

                          if (string.IsNullOrEmpty(str) == false)
                          {
                              string szReadAll = sr.ReadToEnd().ToLower();
                              if (Regex.IsMatch(szReadAll, str.ToLower()))
                              {
                                  if (!ids.Contains(id))
                                  {
                                      ids[mn] = id;
                                  }
                                  Session["ids"] = ids;
                              }
                          }
                      }

                  }
              }

              else if (ext == ".pdf")
              {
                  string randomName1 = DateTime.Now.Ticks.ToString();
                  string fname1 = "";



                  object Target1 = Server.MapPath("~/Temp/" + randomName1 + ".txt");
                  fname1 = Target1.ToString();

                  PDDocument doc = PDDocument.load(fileName);
                  PDFTextStripper stripper = new PDFTextStripper();
                  string s = stripper.getText(doc).ToLower();
                  System.IO.StreamWriter LogFile = new System.IO.StreamWriter(fname1, true);
                  LogFile.WriteLine(s);
                  LogFile.Close();
                  foreach (string str in str1)
                  {
                      using (StreamReader sr = new StreamReader(fname1))
                      {

                          if (string.IsNullOrEmpty(str) == false)
                          {
                              string szReadAll = sr.ReadToEnd().ToLower();
                              if (Regex.IsMatch(szReadAll, str.ToLower()))
                              {
                                  if (!ids.Contains(id))
                                  {
                                      ids[mn] = id;
                                  }
                                  Session["ids"] = ids;
                              }
                          }
                      }

                  }
              }
              mn++;

          }




          //Upload the word document and save to Temp folder
          // FileUpload1.SaveAs(Server.MapPath("~/Temp/") + Path.GetFileName(FileUpload1.PostedFile.FileName));

      }




Recommended answer

I perfectly understand if you don't want to mess with a Microsoft Office installation and Office interop, but first of all, ask yourself why you should deal with Microsoft Office documents at all: a proprietary product is proprietary. These days, there are a number of other options.

Nevertheless, the latest Office document formats are not so proprietary. You can always learn them, as they are now standardized. Please see:
http://en.wikipedia.org/wiki/Office_Open_XML,
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats,
http://en.wikipedia.org/wiki/Office_Open_XML_file_formats.

(Don't mix them up with OpenDocument, http://en.wikipedia.org/wiki/OpenDocument.)
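
To illustrate what "standardized" buys you: a .docx file is a ZIP package, and the body text normally lives in the part word/document.xml as WordprocessingML. The following is only a minimal sketch, not a full reader. It assumes .NET 4.5 or later (for `ZipArchive`) and that the main part sits at the conventional path word/document.xml. The class name `DocxText` is made up for this example. It ignores headers, footers, footnotes, tabs and line breaks, and it does nothing for old binary .doc files.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class DocxText
{
    // Main WordprocessingML namespace used by word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text of a .docx stream, one line per paragraph.
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, true))
        {
            // Assumption: the main document part is at its conventional path.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found.");

            using (Stream part = entry.Open())
            {
                XDocument xml = XDocument.Load(part, LoadOptions.PreserveWhitespace);
                var sb = new StringBuilder();
                foreach (XElement p in xml.Descendants(W + "p"))
                {
                    // Concatenate the text runs (w:t) of each paragraph (w:p).
                    foreach (XElement t in p.Descendants(W + "t"))
                        sb.Append(t.Value);
                    sb.AppendLine();
                }
                return sb.ToString();
            }
        }
    }
}
```

For the keyword search in the question, the extracted string can be tested directly, for example `text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0`, so no temporary .txt file is needed for .docx uploads.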

Now, there is another approach. There are third-party products that work with Microsoft Office documents. If they can do it, you can, too. You just need to download the source code of an open-source product and find out how it works. The only open-source products I know of are OpenOffice itself (where .odt came from) and its fork LibreOffice. Please see:
http://en.wikipedia.org/wiki/OpenOffice.org,
http://www.openoffice.org/,
http://en.wikipedia.org/wiki/LibreOffice,
http://www.libreoffice.org/.

You can download the source and find the code that works with nearly all versions of Office documents and, of course, with .odt and all other OpenOffice/LibreOffice documents.

Please also see my past answers:
Convert Office-Documents to PDF without interop,
Hi how can i display word file in windows application using c#.net.

—SA


See my comment to Sergey's answer, and read this: http://a.nnotate.com/server-installation-windows.html, section "Adding support for uploading DOC, PPT, XLS etc using OpenOffice".
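
Along the same lines, here is a rough sketch of letting an installed OpenOffice/LibreOffice do the .doc-to-text conversion instead of Word. It is a sketch under assumptions, not a tested server setup. It assumes LibreOffice is installed on the server, that `sofficePath` points at its soffice.exe, and that the `txt:Text` export filter name is accepted by that version. The class `OfficeToText` and its methods are made up for this example. The account running the IIS application pool also needs permission to start the process and write to the output folder.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of any library.
public static class OfficeToText
{
    // Builds the command line for a headless conversion to plain text.
    // A trailing backslash is trimmed so that it cannot escape the closing quote.
    public static string BuildArguments(string inputPath, string outputDir)
    {
        return string.Format(
            "--headless --convert-to txt:Text --outdir \"{0}\" \"{1}\"",
            outputDir.TrimEnd('\\'), inputPath);
    }

    // Converts inputPath to <outputDir>\<name>.txt and returns that path.
    // sofficePath is an assumption: wherever soffice.exe lives on the server.
    public static string Convert(string sofficePath, string inputPath, string outputDir)
    {
        var psi = new ProcessStartInfo(sofficePath, BuildArguments(inputPath, outputDir))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process p = Process.Start(psi))
        {
            // Give up after 60 seconds rather than block the request forever.
            if (!p.WaitForExit(60000))
            {
                p.Kill();
                throw new TimeoutException("Conversion timed out: " + inputPath);
            }
        }

        string target = Path.Combine(outputDir,
            Path.GetFileNameWithoutExtension(inputPath) + ".txt");
        if (!File.Exists(target))
            throw new IOException("Converter produced no output for " + inputPath);
        return target;
    }
}
```

In the question's loop, the Interop `Documents.Open` / `SaveAs` pair would be replaced by one call such as `OfficeToText.Convert(sofficePath, fileName, Server.MapPath("~/Temp"))`, and the resulting .txt file is searched as before.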


http://stackoverflow.com/questions/12426108/word-interop-does-not-save-file-c-sharp



http://www.independentsoft.com/word/tutorial/index.html

