Conditional new Break for multi-column docx file, C#
Problem description
This is a follow-up question to "Creating Word file from ObservableCollection with C#".
I have a .docx file with a Body whose SectionProperties define 2 columns. I have a dictionary of foreign words with their translations. On each line I need [Word] = [Translation], and whenever a new letter starts, that letter should be on its own line, with 2 or 3 line breaks before and after it, like this:
A
A-word = translation
A-word = translation
B
B-word = translation
B-word = translation
...
I structured this in a for loop, so that every iteration creates a new Paragraph with an optional Run for the letter (if a new one starts), a Run for the word and a Run for the translation. The Run with the first letter is therefore in the same Paragraph as the word and translation Runs, and it appends 2 or 3 Break objects before and after the Text.
As a result, the second column can sometimes start with 1 or 2 empty lines, or the first column on the next page can start with empty lines.
This is what I want to avoid.
So my question is: can I somehow check whether the end of the page has been reached, or whether the text is at the top of a column, so that I don't have to add a Break? Or can I format the Column itself so that it doesn't start with an empty line?
I have tried putting the letter Run in a separate, optional Paragraph, but I still find myself having to insert line breaks, and the problem remains.
Recommended answer
In the spirit of my other answer, you can extend the template capability. Use the Productivity Tool (from the Open XML SDK) to generate a single page break object, something like:
private readonly Paragraph PageBreakPara = new Paragraph(new Run(new Break() { Type = BreakValues.Page}));
Make a helper method that finds the containers of a text tag:
// usings needed by the snippets in this answer
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Returns the elements of type T that contain (or are) a leaf element whose text matches tagRegex.
public IEnumerable<T> FindElements<T>(OpenXmlCompositeElement searchParent, string tagRegex)
    where T : OpenXmlElement
{
    var regex = new Regex(tagRegex);
    return searchParent.Descendants()
        .Where(e => !(e is OpenXmlCompositeElement)
                    && regex.IsMatch(e.InnerText))
        .SelectMany(e =>
            e.Ancestors()
                .OfType<T>()
                .Union(e is T ? new T[] { (T)e } : new T[] { }))
        .ToList(); // can skip, prevents reevaluations
}
And another one that duplicates a range from the document (a DeleteRange counterpart deletes the range instead), followed by one that replaces the text of a tag:
// T is the type of the elements that contain the tags (e.g. Paragraph);
// the copies are returned as plain OpenXmlElements because the range may hold any element type.
public IEnumerable<OpenXmlElement> DuplicateRange<T>(OpenXmlCompositeElement root, string tagRegex)
    where T : OpenXmlElement
{
    // tagRegex must describe exactly two tags, such as [pageStart] and [pageEnd]
    // or [page] [/page] - or whatever pattern you choose
    var tagElements = FindElements<T>(root, tagRegex);
    OpenXmlElement fromEl = tagElements.First();
    OpenXmlElement toEl = tagElements.Skip(1).First(); // throws exception if less than 2 el
    // you may want to find a common parent here
    // I'll assume you've prepared the template so the elements are siblings.
    var result = new List<OpenXmlElement>();
    var insertAfter = toEl; // moves forward so the copies keep their original order
    var step = fromEl.NextSibling();
    while (step != null && step != toEl)
    {
        // another method called DeleteRange will instead delete elements in that range within this loop
        var copy = step.CloneNode(true); // deep clone; CloneNode requires the bool argument
        insertAfter.InsertAfterSelf(copy);
        insertAfter = copy;
        result.Add(copy);
        step = step.NextSibling();
    }
    return result;
}

public IEnumerable<OpenXmlElement> ReplaceTag(OpenXmlCompositeElement parent, string tagRegex, string replacement)
{
    // InnerText is read-only, so the replacement is written to the Text
    // of the leaf text elements that match the tag
    var replaceElements = FindElements<OpenXmlLeafTextElement>(parent, tagRegex);
    var regex = new Regex(tagRegex);
    foreach (var el in replaceElements)
    {
        el.Text = regex.Replace(el.Text, replacement);
    }
    return replaceElements;
}
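Because the helpers above treat each tag as a regular expression, the square brackets in a tag have to be escaped; an unescaped "[TitleLetter]" is a character class that matches single letters rather than the whole tag. A minimal sketch of building such patterns, assuming two hypothetical helpers (TagPatterns.Literal and TagPatterns.OpenOrClose are illustrative names, not part of the answer or of the Open XML SDK):

```csharp
using System.Text.RegularExpressions;

public static class TagPatterns
{
    // Hypothetical helper: returns a pattern that matches the tag text
    // literally, e.g. the whole string "[TitleLetter]".
    public static string Literal(string tag) => Regex.Escape(tag);

    // Hypothetical helper: returns a pattern that matches both the opening
    // and the closing form of a range tag, e.g. "[page]" and "[/page]".
    public static string OpenOrClose(string name) =>
        @"\[/?" + Regex.Escape(name) + @"\]";
}
```

For example, TagPatterns.OpenOrClose("page") could be passed as the tagRegex of DuplicateRange, and TagPatterns.Literal("[TitleLetter]") as the tagRegex of ReplaceTag.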
Now you can have a document that looks like this:
[page] [TitleLetter]
[WordTemplate][Word]: [Translation] [/WordTemplate]
[pageBreak] [/page]
With that document you can duplicate the [page]..[/page] range, process it per letter, and once you're out of letters, delete the template range:
// letter -> (word -> translation); the type arguments are inferred from the usage below
var vocabulary = new Dictionary<char, Dictionary<string, string>>();
// assumes wordDocument is the opened WordprocessingDocument
var body = wordDocument.MainDocumentPart.Document.Body;
foreach (var letter in vocabulary.Keys.OrderByDescending(c => c))
{
    // in reverse order because the copy range comes after the template range
    var pageTemplate = DuplicateRange<Paragraph>(body, @"\[/?page\]");
    foreach (var p in pageTemplate.OfType<OpenXmlCompositeElement>())
    {
        // tags are regexes, so the brackets must be escaped
        ReplaceTag(p, @"\[TitleLetter\]", "" + letter);
        var pageBr = ReplaceTag(p, @"\[pageBreak\]", "");
        foreach (var pbr in pageBr)
        {
            // the page-break paragraph goes after the paragraph that held the tag
            pbr.Ancestors<Paragraph>().FirstOrDefault()?.InsertAfterSelf(PageBreakPara.CloneNode(true));
        }
        var wordTemplateFound = FindElements<Paragraph>(p, @"\[/?WordTemplate\]");
        if (wordTemplateFound.Any())
        {
            foreach (var word in vocabulary[letter].Keys)
            {
                var wordTemplate = DuplicateRange<Paragraph>(p, @"\[/?WordTemplate\]")
                    .OfType<OpenXmlCompositeElement>()
                    .First(); // since it's a single paragraph template
                ReplaceTag(wordTemplate, @"\[/?WordTemplate\]", "");
                ReplaceTag(wordTemplate, @"\[Word\]", word);
                ReplaceTag(wordTemplate, @"\[Translation\]", vocabulary[letter][word]);
            }
        }
    }
}
...Or something like it.
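The loop above reads the vocabulary as a nested map from letter to word to translation. A minimal sketch of building that shape from a flat word list, assuming the words arrive as a word-to-translation dictionary and that grouping by the upper-cased first character is what you want (VocabularyBuilder and GroupByFirstLetter are hypothetical names, not part of the answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VocabularyBuilder
{
    // Hypothetical helper: groups a flat word -> translation map by the
    // upper-cased first character of each word. Empty words are skipped.
    public static Dictionary<char, Dictionary<string, string>> GroupByFirstLetter(
        IDictionary<string, string> words)
    {
        return words
            .Where(pair => !string.IsNullOrEmpty(pair.Key))
            .GroupBy(pair => char.ToUpperInvariant(pair.Key[0]))
            .ToDictionary(
                group => group.Key,
                group => group.ToDictionary(pair => pair.Key, pair => pair.Value));
    }
}
```

The result can be assigned to vocabulary, after which vocabulary.Keys.OrderByDescending(c => c) yields the letters in the reverse order the loop expects.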
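- If things start getting too complex, look into SdtElements.
- Despite the popularity of that answer, don't use AltChunk: it requires Word to open and process the file, so you can't use a library to produce a PDF from it.
- Word documents are messy. The solution above should work (it is untested), but the template must be crafted carefully; back up your template often.
- Making a robust document engine isn't easy (because Word is messy). Do the minimum you need, and rely on the template being under your control (not user-editable).
- The code above is far from optimized or streamlined; I've tried to condense it into the smallest footprint possible, at the cost of presentability. There are probably bugs too :)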