How to access and replace text in certain paragraphs using Open XML PowerTools, case by case

Question

I am trying to redact some Word files using C# and Open XML. I need to do a controlled replacement of numbers with a certain phrase. Each Word file contains a different amount of information. I want to use Open XML PowerTools for this purpose.

I first used the plain Open XML SDK approach, but it is very unreliable and throws seemingly random errors, such as a zero-length error. I then used a regex replace, which seems to work, but it replaces matches throughout the whole document, which is highly undesirable.

Here are some snippets of the code:

    private void redact_Replaceall(string wfile)
    {
        try
        {
            using (WordprocessingDocument doc = WordprocessingDocument.Open(wfile, true))
            {
                var ydoc = doc.MainDocumentPart.GetXDocument();
                IEnumerable<XElement> content = ydoc.Descendants(W.body);

                // Matches numbers such as 12.34 or 5.678
                Regex regex = new Regex(@"\d+\.\d{2,3}");
                int count1 = OpenXmlPowerTools.OpenXmlRegex.Match(content, regex);

                // replace_text is defined elsewhere in the class (not shown)
                int count2 = OpenXmlPowerTools.OpenXmlRegex.Replace(content, regex, replace_text, null);

                statusBar1.Text = "Try 1: Found: " + count1 + ", Replaced: " + count2;

                // Write the modified XDocument back into the part
                doc.MainDocumentPart.PutXDocument();
            }
        }
        catch (Exception e)
        {
            MessageBox.Show("Replace all experienced error: " + e.Message);
        }
    }

Basically, I want to do this redaction based on the content of each paragraph. I am able to get the paragraphs, but not their IDs, using:

    IEnumerable<XElement> content = ydoc.Descendants(W.p);

Here is my approach using the plain Open XML SDK, but I get a lot of errors depending on the file.

    foreach (DocumentFormat.OpenXml.Wordprocessing.Paragraph para in bod.Descendants<DocumentFormat.OpenXml.Wordprocessing.Paragraph>())
    {
        foreach (var run in para.Elements<Run>())
        {
            foreach (var text in run.Elements<Text>())
            {
                string temp = text.Text;
                int firstlength = first.Length + 1;
                int secondlength = second.Length + 1;

                if (text.Text.Contains(first) && !(temp.Length > firstlength))
                {
                    text.Text = text.Text.Replace(first, "DELETED");
                }

                if (text.Text.Contains(second) && !(temp.Length > secondlength))
                {
                    text.Text = text.Text.Replace(second, "DELETED");
                }
            }
        }
    }

Here is my latest approach, but I am stuck on it:

    private void redact_Replacebadones(string wfile)
    {
        try
        {
            using (WordprocessingDocument doc = WordprocessingDocument.Open(wfile, true))
            {
                var ydoc = doc.MainDocumentPart.GetXDocument();
                /*  from XElement xele in ydoc.Root.Elements();
                    List<string> lhsElements = xele.Elements("lhs")
                               .Select(el => el.Attribute("id").Value)
                               .ToList();
                */
                IEnumerable<XElement> content = ydoc.Descendants(W.p);

                foreach (var p in content)
                {
                    if (p.Value.Contains("each") && !p.Value.Contains("DELETED"))
                    {
                        string to_overwrite = p.Value;
                        Regex regexop = new Regex(@"\d+\.\d{2,3}");

                        regexop.Replace(to_overwrite, "Deleted");

                        p.SetValue(to_overwrite);

                        MessageBox.Show("NAME :" + p.GetParagraphInfo() + " VValue:" + to_overwrite);
                    }
                }

                doc.MainDocumentPart.PutXDocument();
            }
        }
        catch (Exception e)
        {
            MessageBox.Show("Replace each experienced error: " + e.Message);
        }
    }

Recommended answer

This may be a bit late. Open XML PowerTools by Eric White has a function, SearchAndReplace, that replaces text content, so you don't have to handle it with a regex. This function also handles text that is split across runs. (If you edit a word in Word, it can end up split across several runs, so you won't find the search phrase directly in a single run.) Maybe this helps somebody.
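To make the suggestion concrete, here is a minimal, untested sketch of two ways it might be applied. The class and method names (ParagraphRedactor, RedactNumbersInMatchingParagraphs, ReplacePhraseEverywhere, ParagraphText) are hypothetical and exist only for illustration. The sketch assumes that TextReplacer.SearchAndReplace matches a literal phrase across the whole document, and that OpenXmlRegex.Replace, which the question already calls on the body, can instead be given just the paragraphs selected by a filter. The filter keyword is an assumption taken from the question's last snippet.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public static class ParagraphRedactor
{
    // Hypothetical helper: the paragraph's text, joined across all of its runs.
    private static string ParagraphText(XElement paragraph) =>
        string.Concat(paragraph.Descendants(W.t).Select(t => t.Value));

    // Hypothetical helper: replaces numbers such as 12.34 only in paragraphs
    // whose text contains `keyword`. Returns the number of replacements made.
    public static int RedactNumbersInMatchingParagraphs(
        string path, string keyword, string replacement)
    {
        var number = new Regex(@"\d+\.\d{2,3}");

        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            XDocument xdoc = doc.MainDocumentPart.GetXDocument();

            // Materialize the selection before replacing, because the
            // replace call rewrites the content of each paragraph.
            List<XElement> targets = xdoc.Descendants(W.p)
                .Where(p => ParagraphText(p).Contains(keyword))
                .ToList();

            // Same call as in the question, scoped to the selected paragraphs.
            int replaced = OpenXmlRegex.Replace(targets, number, replacement, null);

            // Write the modified XDocument back into the part.
            doc.MainDocumentPart.PutXDocument();
            return replaced;
        }
    }

    // Hypothetical helper: literal, case-sensitive replacement of a phrase
    // everywhere in the document, as the answer suggests.
    public static void ReplacePhraseEverywhere(
        string path, string search, string replacement)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            TextReplacer.SearchAndReplace(doc, search, replacement, true);
        }
    }
}
```

A call might look like ParagraphRedactor.RedactNumbersInMatchingParagraphs(wfile, "each", "DELETED"). Two details in the question's last attempt are worth noting alongside this: Regex.Replace returns a new string rather than modifying its argument, so its result has to be assigned, and calling SetValue on a paragraph element replaces its child runs with plain text, which discards the run structure and formatting.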
