Conditional new Break for multi-column docx file, C#

Question

This is a follow-up question for Creating Word file from ObservableCollection with C#.
I have a .docx file with a Body that has 2 columns for its SectionProperties. I have a dictionary of foreign words with their translation. On each line I need [Word] = [Translation] and whenever a new letter starts it should be in its own line, with 2 or 3 line breaks before and after that letter, like this:

A

A-word = translation
A-word = translation

B

B-word = translation
B-word = translation
...

I structured this in a for loop, so that in every iteration I'm creating a new paragraph with a possible Run for the letter (if a new one starts), a Run for the word and a Run for the translation. So the Run with the first letter is in the same Paragraph as the word and translation Run and it appends 2 or 3 Break objects before and after the Text.
In doing so the second column can sometimes start with 1 or 2 empty lines. Or the first column on the next page can start with empty lines.
This is what I want to avoid.

So my question is, can I somehow check if the end of the page is reached, or the text is at the top of the column, so I don't have to add a Break? Or, can I format the Column itself so that it doesn't start with an empty line?

I have tried putting the letter Run in a separate, optional, Paragraph, but again, I find myself having to input line breaks and the problem remains.

Recommended answer

In the spirit of my other answer, you can extend the template capability. Use the Open XML SDK Productivity Tool to generate a single page break object, something like:

private readonly Paragraph PageBreakPara = new Paragraph(new Run(new Break() { Type = BreakValues.Page}));

Make a helper method that finds containers of a text tag:

public IEnumerable<T> FindElements<T>(OpenXmlCompositeElement searchParent, string tagRegex)
  where T : OpenXmlElement
{
  var regex = new Regex(tagRegex);

  // Only leaf (non-composite) elements, such as Text, are tested against the regex.
  return searchParent.Descendants()
    .Where(e => !(e is OpenXmlCompositeElement)
                && regex.IsMatch(e.InnerText))
    // For every matching leaf, return its ancestors of type T,
    // plus the leaf itself if it is a T.
    .SelectMany(e =>
        e.Ancestors()
            .OfType<T>()
            .Union(e is T ? new T[] { (T)e } : new T[] { }))
    .ToList(); // can skip, prevents reevaluations
}

And another one that duplicates a range of the document (a DeleteRange variant would delete the range instead), together with a helper that replaces a tag:

public IEnumerable<OpenXmlElement> DuplicateRange<T>(OpenXmlCompositeElement root, string tagRegex)
  where T : OpenXmlElement
{
  // tagRegex must describe exactly two tags, such as [pageStart] and [pageEnd]
  // or [page] [/page] - or whatever pattern you choose

  var tagElements = FindElements<T>(root, tagRegex);
  var fromEl = tagElements.First();
  var toEl = tagElements.Skip(1).First(); // throws exception if less than 2 el

  // you may want to find a common parent here
  // I'll assume you've prepared the template so the elements are siblings.

  // The copies are siblings of the tag containers and need not be of type T,
  // so they are returned as plain OpenXmlElement.
  var result = new List<OpenXmlElement>();

  OpenXmlElement insertAfter = toEl; // each copy goes after the previous one, keeping the order
  var step = fromEl.NextSibling();
  while (step != null && step != toEl)
  {
    // another method called DeleteRange will instead delete elements in that range within this loop
    var copy = step.CloneNode(true); // deep clone, including children
    insertAfter = insertAfter.InsertAfterSelf(copy);
    result.Add(copy);
    step = step.NextSibling();
  }

  return result;
}


public IEnumerable<OpenXmlElement> ReplaceTag(OpenXmlCompositeElement parent, string tagRegex, string replacement)
{
  // InnerText has no public setter, so edit the Text leaf elements that hold the tag.
  var replaceElements = FindElements<Text>(parent, tagRegex).ToList();
  var regex = new Regex(tagRegex);
  foreach (var el in replaceElements)
  {
    el.Text = regex.Replace(el.Text, replacement);
  }

  return replaceElements;
}
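
The tag arguments passed to these helpers are regular expressions, so the square brackets have to be escaped. An unescaped pattern such as "[TitleLetter]" is a character class that matches any single one of those letters. As a minimal sketch, the two hypothetical helpers below (TagPatterns, PairPattern and LiteralPattern are illustrative names, not part of the original answer or of the Open XML SDK) build safe patterns with the standard Regex.Escape:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagPatterns
{
    // Pattern matching both the opening and the closing form of a tag,
    // e.g. "page" gives a pattern that matches "[page]" and "[/page]".
    public static string PairPattern(string tagName) =>
        "\\[/?" + Regex.Escape(tagName) + "\\]";

    // Pattern matching one literal tag such as "[TitleLetter]".
    public static string LiteralPattern(string tag) => Regex.Escape(tag);
}
```

With these, a call like ReplaceTag(p, TagPatterns.LiteralPattern("[TitleLetter]"), "A") replaces only the literal tag, and TagPatterns.PairPattern("page") does not accidentally match "[pageBreak]".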

Now you can have a document that looks like this:

[page] [TitleLetter]

[WordTemplate][Word]: [Translation] [/WordTemplate]

[pageBreak] [/page]

With that document you can duplicate the [page]..[/page] range, process it per letter and once you're out of letters - delete the template range:

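// letter -> (word -> translation); the generic arguments were lost in the original
// and are inferred from the usage below. Populated elsewhere.
Dictionary<char, Dictionary<string, string>> vocabulary;

// wordDocument is assumed to be the element that directly contains the template paragraphs
foreach (var letter in vocabulary.Keys.OrderByDescending(c => c))
{
  // in reverse order because the copy range comes after the template range
  var pageTemplate = DuplicateRange<Paragraph>(wordDocument, "\\[/?page\\]");

  foreach (var p in pageTemplate.OfType<OpenXmlCompositeElement>())
  {
    // the tags are regex patterns, so the brackets must be escaped
    ReplaceTag(p, "\\[TitleLetter\\]", letter.ToString());

    var pageBr = ReplaceTag(p, "\\[pageBreak\\]", "");
    // put the page break after the paragraph that held the tag
    var breakParas = pageBr
        .Select(e => e as Paragraph ?? e.Ancestors<Paragraph>().FirstOrDefault())
        .Where(para => para != null)
        .Distinct()
        .ToList();
    foreach (var para in breakParas)
    {
      para.InsertAfterSelf(PageBreakPara.CloneNode(true));
    }

    var wordTemplateFound = FindElements<Paragraph>(p, "\\[/?WordTemplate\\]");
    if (wordTemplateFound.Any())
    {
      foreach (var word in vocabulary[letter].Keys)
      {
        var wordTemplate = DuplicateRange<Paragraph>(p, "\\[/?WordTemplate\\]")
            .OfType<OpenXmlCompositeElement>()
            .First(); // since it's a single paragraph template
        ReplaceTag(wordTemplate, "\\[/?WordTemplate\\]", "");
        ReplaceTag(wordTemplate, "\\[Word\\]", word);
        ReplaceTag(wordTemplate, "\\[Translation\\]", vocabulary[letter][word]);
      }
    }
  }
}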

...Or something like it.
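
The loop above reads vocabulary as a dictionary keyed by letter, whose values map each word to its translation. As a minimal sketch, assuming the source data is a flat collection of word/translation pairs, such a structure could be built as follows. VocabularyBuilder and GroupByFirstLetter are hypothetical names used for illustration only, and grouping by the upper-cased first character is an assumption about how letters should be bucketed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VocabularyBuilder
{
    // Groups word/translation pairs by the upper-cased first character of the word.
    // Entries with an empty word are skipped. Duplicate words within one letter
    // would make ToDictionary throw, so the input is assumed to have unique words.
    public static Dictionary<char, Dictionary<string, string>> GroupByFirstLetter(
        IEnumerable<KeyValuePair<string, string>> entries)
    {
        return entries
            .Where(e => !string.IsNullOrEmpty(e.Key))
            .GroupBy(e => char.ToUpperInvariant(e.Key[0]))
            .ToDictionary(
                g => g.Key,
                g => g.ToDictionary(e => e.Key, e => e.Value));
    }
}
```

Since Dictionary<string, string> implements IEnumerable<KeyValuePair<string, string>>, an existing flat dictionary can be passed in directly.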

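  • If things start getting too complex, look into SdtElements.
  • Despite the popularity of that answer, don't use AltChunk: it requires Word to open and process the file, so you can't use a library to make a PDF out of it.
  • Word documents are messy. The solution above should work (untested), but the template must be carefully crafted, so back up your template often.
  • Making a robust document engine isn't easy (because Word is so messy). Do just enough, and rely on the template being under your control (not user-editable).
  • The code above is far from optimized or streamlined; I tried to condense it into the smallest possible footprint. It probably has bugs too :)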
