docx unbreakable words


Question

I'm trying to replace words in a docx file as described here:

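using System.IO;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;

public static void SearchAndReplace(string document)
{
    // Open the package for editing (second argument: isEditable = true).
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        // Read the raw XML of the main document part as one string.
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        // Plain text replacement on the raw XML; it only matches when the
        // whole phrase sits inside a single <w:t> element.
        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

        // Overwrite the part with the modified XML.
        using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
    }
}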

That works fine, except that sometimes a word such as SomeTest is split across several runs in the document XML, and you get something like:

    <w:t>
        Some
    </w:t>
</w:r>

<w:r w:rsidR="009E5AFA">
    <w:rPr>
        <w:b/>
        <w:color w:val="365F91"/>
        <w:sz w:val="22"/>
    </w:rPr>
    <w:t>
        Test
    </w:t>
</w:r>

And of course the replacement fails. Is there a workaround to make some words unbreakable in docx? Or am I doing the replacement wrong?

Recommended answer

One way to solve this is to normalize the XML of your document before doing any transformations. You can use OpenXml Powertools to do this.

Sample code to normalize the XML:

using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

using (WordprocessingDocument doc =
    WordprocessingDocument.Open("Test.docx", true))
{
    SimplifyMarkupSettings settings = new SimplifyMarkupSettings
    {
        NormalizeXml = true, // Merges runs in a paragraph that have similar formatting
        // Additional settings if required
        AcceptRevisions = true,
        RemoveBookmarks = true,
        RemoveComments = true,
        RemoveGoBackBookmark = true,
        RemoveWebHidden = true,
        RemoveContentControls = true,
        RemoveEndAndFootNotes = true,
        RemoveFieldCodes = true,
        RemoveLastRenderedPageBreak = true,
        RemovePermissions = true,
        RemoveProof = true,
        RemoveRsidInfo = true,
        RemoveSmartTags = true,
        RemoveSoftHyphens = true,
        ReplaceTabsWithSpaces = true
    };
    MarkupSimplifier.SimplifyMarkup(doc, settings);
}

This simplifies the markup of the Open XML document, which makes further programmatic transformations easier. I always run it before working with an Open XML document programmatically.
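To make the idea of merging runs concrete, here is a minimal sketch using only System.Xml.Linq. It is not the PowerTools implementation; the class name RunMergeSketch and its method are hypothetical names invented for illustration. It assumes a run can be merged into the previous one only when both contain a single w:t plus an optional w:rPr, and the two w:rPr elements are identical. Runs with different formatting, as the two runs in the question's fragment may be, are left untouched.

```csharp
using System.Linq;
using System.Xml.Linq;

// Illustrative sketch only: NOT the OpenXml Powertools implementation.
public static class RunMergeSketch
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // A "simple" run holds exactly one w:t and nothing else except an optional w:rPr.
    private static bool IsSimpleRun(XElement e) =>
        e.Name == W + "r"
        && e.Elements(W + "t").Count() == 1
        && e.Elements().All(c => c.Name == W + "t" || c.Name == W + "rPr");

    // Runs are mergeable when their w:rPr subtrees are identical (or both absent).
    // Attributes on w:r itself, such as w:rsidR, are deliberately ignored.
    private static bool SameFormatting(XElement a, XElement b) =>
        XNode.DeepEquals(a.Element(W + "rPr"), b.Element(W + "rPr"));

    // Merges each simple run into the simple run directly before it
    // when both carry the same formatting.
    public static void MergeAdjacentRuns(XElement paragraph)
    {
        XElement previous = null;
        foreach (XElement current in paragraph.Elements().ToList())
        {
            if (previous != null && IsSimpleRun(current) && SameFormatting(previous, current))
            {
                XElement target = previous.Element(W + "t");
                target.Value += current.Element(W + "t").Value;
                // Keep leading/trailing spaces significant after concatenation.
                target.SetAttributeValue(XNamespace.Xml + "space", "preserve");
                current.Remove();
            }
            else
            {
                // Any other element (bookmark, field, complex run) breaks the chain.
                previous = IsSimpleRun(current) ? current : null;
            }
        }
    }
}
```

Calling RunMergeSketch.MergeAdjacentRuns on a w:p element whose two runs contain "Some" and "Test" with the same w:rPr leaves one run whose w:t is "SomeTest", so a plain text search can then find the word. For real documents the PowerTools call above is the safer choice, since it handles many cases this sketch ignores.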

More information about using these tools can be found here, and there is a good blog article here.
