Transferring Word Document Content to a Web Environment on the Server Side Using C# ASP.NET

Problem description

We have some content in MS Word .docx format, prepared by our customers. These documents may contain equations, images, etc.

We want to transfer this content to our web environment.

First, I planned to use the TinyMCE "paste from word" plugin and the fmath editor plugin. That didn't work...

Then I decided to add an upload button that transfers the MS Word content to the server and shows the resulting web content in the TinyMCE editor. In effect, this is like writing a new plugin.
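
Showing the converted document in TinyMCE only needs the markup inside <body>, not the <html>/<head> wrapper that Word's SaveAs writes. Below is a minimal sketch, assuming the generated HTML has already been read into a string; the class and method names are hypothetical. On the client, the returned fragment could then be handed to the editor, for example through TinyMCE's setContent method.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: pulls the editable fragment out of a full HTML page.
public static class WordHtmlBody
{
    // Singleline lets '.' span line breaks; the lazy (.*?) stops at the first </body>.
    private static readonly Regex BodyRegex = new Regex(
        @"<body\b[^>]*>(.*?)</body\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the trimmed inner HTML of <body>, or the input unchanged if no <body> is found.
    public static string ExtractBody(string html)
    {
        Match match = BodyRegex.Match(html);
        return match.Success ? match.Groups[1].Value.Trim() : html;
    }
}
```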

I am using the "SaveAs" method of the Microsoft.Office.Interop.Word.Document class, but I have the following problems:

1) I cannot change the document's resources folder path. Word generates a "..._files" folder next to the generated HTML file. I want to move all resources to appropriate places on the server.

2) I cannot change the image source paths to absolute paths.

3) The generated HTML file contains too many garbage styles and too much extra markup.
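
Problems 1 and 2 can be handled as a post-processing step after SaveAs instead of through Word itself. A minimal sketch, assuming Word wrote "<name>.html" next to a flat "<name>_files" folder (the behaviour described above); the class, method names, and URL prefix are hypothetical. The first method moves the resources, the second rewrites src/href values that point into the old folder so they use an absolute URL prefix.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helpers for problems 1 and 2. Assumes Word saved "<name>.html"
// next to a flat "<name>_files" resource folder.
public static class WordHtmlResources
{
    // Moves every file from "<name>_files" into targetDir, deletes the old folder,
    // and returns the folder name so links to it can be rewritten afterwards.
    public static string MoveResourceFolder(string htmlPath, string targetDir)
    {
        string filesDirName = Path.GetFileNameWithoutExtension(htmlPath) + "_files";
        string filesDir = Path.Combine(Path.GetDirectoryName(htmlPath), filesDirName);

        Directory.CreateDirectory(targetDir);
        if (Directory.Exists(filesDir))
        {
            foreach (string source in Directory.GetFiles(filesDir))
            {
                string dest = Path.Combine(targetDir, Path.GetFileName(source));
                File.Copy(source, dest, true); // overwrite a file left by an earlier upload
            }
            Directory.Delete(filesDir, true);
        }
        return filesDirName;
    }

    // Rewrites src/href values starting with "<filesDirName>/" (quoted or unquoted)
    // so they start with the absolute URL prefix instead, e.g. "/media/doc42/".
    public static string RewritePaths(string html, string filesDirName, string urlPrefix)
    {
        string prefix = urlPrefix.EndsWith("/") ? urlPrefix : urlPrefix + "/";
        var attribute = new Regex(
            "(\\b(?:src|href)\\s*=\\s*[\"']?)" + Regex.Escape(filesDirName) + "/",
            RegexOptions.IgnoreCase);
        // A MatchEvaluator keeps '$' characters in the prefix from being read as group references.
        return attribute.Replace(html, m => m.Groups[1].Value + prefix);
    }
}
```

As an illustrative usage (paths are examples only): call MoveResourceFolder right after oDoc.SaveAs with a target folder obtained from Server.MapPath, then run RewritePaths on the HTML text, reading the file with the charset declared in its <meta> tag.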

I may be going about this in completely the wrong way, so I decided to ask for your advice before continuing in this direction. I am open to any suggestions.

Regards

I am adding a draft version of this code:



    using System.IO;
    using System.Web;
    using word = Microsoft.Office.Interop.Word;

    // ... inside the upload handler (e.g. a Page or .ashx), where Request is available:
    var file = Request.Files[0];
    // Path.GetFileName drops any directory part in the client-supplied name (path traversal).
    var fileName = Path.GetFileName(Request["docfilename"]);
    var root = HttpContext.Current.Server.MapPath(@"~/saveddata/_temp/");
    var path = Path.Combine(root, fileName);

    // HttpPostedFile.SaveAs writes the whole upload; a single Stream.Read call
    // is not guaranteed to fill the buffer.
    file.SaveAs(path);

    // word.Application instead of ApplicationClass, which cannot be used
    // when "Embed Interop Types" is enabled for the interop reference.
    word.Application oWord = new word.Application();
    oWord.Visible = false;
    word.Document oDoc = null;
    try
    {
        // C# 4+ lets COM calls skip "ref missing" and name only the arguments needed.
        oDoc = oWord.Documents.Open(path, ReadOnly: true);

        object path2 = Path.Combine(root, "test.html");
        object fileFormat = word.WdSaveFormat.wdFormatFilteredHTML;
        oDoc.SaveAs(ref path2, ref fileFormat);
    }
    finally
    {
        // Always shut Word down, or orphaned WINWORD.EXE processes pile up on the server.
        // The casts avoid the method/event name clash on Close and Quit.
        if (oDoc != null)
            ((word._Document)oDoc).Close(word.WdSaveOptions.wdDoNotSaveChanges);
        ((word._Application)oWord).Quit();
    }

Recommended answer

This is a delicate matter. I faced the same problem, because Word documents carry a lot of style tags. You may have noticed this when sharing a URL whose page contains Word document content on Facebook: unwanted tags used to show up in the URL's description/summary :) So I guess the issue exists there too. I would suggest going through the basics of Information Retrieval and trying to strip the style tags intelligently. You will need to write most of your stripping code with regular expressions.
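
Following that suggestion, a minimal regex-based cleaner might look like the sketch below. The patterns are assumptions about typical Word output (MsoNormal classes, <o:p> markers, conditional comments) rather than a complete list, and the class name is hypothetical; an HTML parser such as HtmlAgilityPack would be more robust than regexes on arbitrary markup.

```csharp
using System.Text.RegularExpressions;

// A sketch of regex-based cleanup for HTML saved by Word. Regexes are brittle
// on arbitrary HTML; this targets common noise patterns Word emits.
public static class WordHtmlCleaner
{
    private const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Each pattern is replaced with nothing, in this order.
    private static readonly Regex[] Noise =
    {
        new Regex(@"<!--.*?-->", Opts),                    // comments, incl. <!--[if ...]> blocks
        new Regex(@"<!\[(?:if[^\]]*|endif)\]>", Opts),     // <![if ...]> / <![endif]> markers
        new Regex(@"<style\b[^>]*>.*?</style\s*>", Opts),  // embedded style sheets
        new Regex(@"</?o:p\s*>", Opts),                    // Office <o:p> paragraph markers
        new Regex(@"\s(?:class|style|lang)\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)", Opts), // presentational attributes
    };

    // Matches a <span> with no attributes left and no nested tags inside.
    private static readonly Regex BareSpan = new Regex(@"<span>([^<]*)</span>", Opts);

    public static string Clean(string html)
    {
        string result = html;
        foreach (Regex pattern in Noise)
            result = pattern.Replace(result, string.Empty);

        // Unwrap bare spans innermost-first until a pass changes nothing.
        string previous;
        do
        {
            previous = result;
            result = BareSpan.Replace(result, "$1");
        } while (result != previous);

        return result;
    }
}
```

For example, Clean turns `<p class=MsoNormal style='margin:0'><span lang=EN-US>Hi<o:p></o:p></span></p>` into `<p>Hi</p>`.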