Write data into TextInput elements in docx documents with OpenXML 2.5

Question

I have some docx documents. I read them with the OpenXML 2.5 SDK and search for the TextInput elements in each document.

    byte[] filebytes = System.IO.File.ReadAllBytes("Test.docx");

    using (MemoryStream stream = new MemoryStream(filebytes))
    using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, true))
    {
        IEnumerable<FormFieldData> fields = wordDocument.MainDocumentPart.Document.Descendants<FormFieldData>();
        foreach (var field in fields)
        {
            IEnumerable<TextInput> textInputs = field.Descendants<TextInput>();
            foreach (var ti in textInputs)
            {
                <<HERE>>
            }
        }

        wordDocument.MainDocumentPart.Document.Save();

        stream.Flush();
        ETC...
    }

How can I write a value into each TextInput?

Thanks!

Recommended answer

First, consider one of the software products on the market that provide simple methods for setting the value of formfields (some are quite costly but may still be worth it).

But if one insists on using the OpenXML SDK, here is an approach that works for me (scroll down to see the code). It shows how complex the task is in my experience; I would be very happy if someone could show me an OpenXML SDK method that deals with it.

Given a TextInput object:

Find the first run containing a "separate" fieldchar. This will always be in the same paragraph as the textinput.

Find the first following run containing an "end" fieldchar. This may be in the same paragraph, but if the existing value of the formfield spans several paragraphs it will be in another paragraph.

Find the first run following the run containing the "separate" fieldchar. If this run is the one containing the "end" fieldchar, make a new run and add it after the run containing the "separate" fieldchar.

Remove any text elements in this run (keep any rPr).

Remove all the following runs up to, but not including, the run containing the "end" fieldchar.
(Any intermediate paragraphs must also be removed, except the one containing the "end" fieldchar, which must be merged with the one containing the "separate" fieldchar.)

Now the value of the formfield can be set.

If any lines in the value are intended as paragraphs, make a paragraph "template" by deep cloning the paragraph containing the "separate" fieldchar.
Remove everything from the paragraph template except pPr.

For the first line in the value, simply add a text element to the single run we now have between the run containing the "separate" fieldchar and the run containing the "end" fieldchar.

For each additional line:

If the line is not intended to be a paragraph:

Add a break (<w:br/>).
Deep clone the previous run, set its text element, then add it.

If the line is intended to be a paragraph:

Deep clone the paragraph template and add it after the paragraph holding the previous run.
Deep clone the previous run, set its text element, then add it to the new paragraph.

If any paragraphs were added, move the run containing the "end" fieldchar and the bookmarkEnd element that belongs to the formfield to the end of the last paragraph added.
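
The first two steps (locating the runs that hold the "separate" and "end" fieldchars) can be sketched as below. This is only a sketch, not part of the implementation that follows: the class and helper names (FormFieldRuns, FindFieldCharRun) are hypothetical, and it assumes the formfield sits in the main document body, so that walking the body's runs in document order reaches the "end" run even when it is in a later paragraph.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

static class FormFieldRuns
{
    // Hypothetical helper: returns the first run after 'start' (in document
    // order) that contains a fieldchar of the requested type, or null.
    // Assumption: 'start' is located inside the main document body.
    public static Run FindFieldCharRun(Run start, FieldCharValues type)
    {
        Body body = start.Ancestors<Body>().FirstOrDefault();
        if (body == null) return null;

        return body.Descendants<Run>()      // document order, across paragraphs
            .SkipWhile(r => r != start)     // advance to the starting run
            .Skip(1)                        // and step past it
            .FirstOrDefault(r => r.Elements<FieldChar>().Any(fc =>
                fc.FieldCharType != null && fc.FieldCharType.Value == type));
    }
}
```

Starting from the run that holds the TextInput (textInput.Ancestors<Run>().FirstOrDefault()), one call with FieldCharValues.Separate and one with FieldCharValues.End would give the two boundary runs. The sketch does not account for other fields nested inside the formfield value.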

An implementation of the above, but without support for paragraphs in the input value:

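// Required namespaces:
// using System; using System.Linq;
// using DocumentFormat.OpenXml; using DocumentFormat.OpenXml.Wordprocessing;
private static void SetFormFieldValue(TextInput textInput, string value)
{  // Code for http://stackoverflow.com/a/40081925/3103123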

   if (value == null) // Reset formfield using default if set.
   {
      if (textInput.DefaultTextBoxFormFieldString != null && textInput.DefaultTextBoxFormFieldString.Val.HasValue)
         value = textInput.DefaultTextBoxFormFieldString.Val.Value;
   }

   // Enforce max length.
   short maxLength = 0; // Unlimited
   if (textInput.MaxLength != null && textInput.MaxLength.Val.HasValue)
      maxLength = textInput.MaxLength.Val.Value;
   if (value != null && maxLength > 0 && value.Length > maxLength)
      value = value.Substring(0, maxLength);

   // Not enforcing TextBoxFormFieldType (read documentation...).
   // Just note that Word may modify the value of a formfield when the user leaves it, based on TextBoxFormFieldType and Format.
   // A curious example:
   // Type Number, format "# ##0,00".
   // Set value to "2016 was the warmest year ever, at least since 1999.".
   // Open the document and select the field then tab out of it.
   // Value now is "2 016 tht,tt" (the logic behind this escapes me).

   // Format value. (Only able to handle formfields with textboxformfieldtype regular.)
   if (textInput.TextBoxFormFieldType != null
   && textInput.TextBoxFormFieldType.Val.HasValue
   && textInput.TextBoxFormFieldType.Val.Value != TextBoxFormFieldValues.Regular)
      throw new ApplicationException("SetFormField: Unsupported textboxformfieldtype, only regular is handled.\r\n" + textInput.Parent.OuterXml);
   if (!string.IsNullOrWhiteSpace(value)
   && textInput.Format != null
   && textInput.Format.Val.HasValue)
   {
      switch (textInput.Format.Val.Value)
      {
         case "Uppercase":
            value = value.ToUpperInvariant();
            break;
         case "Lowercase":
            value = value.ToLowerInvariant();
            break;
         case "First capital":
            value = value[0].ToString().ToUpperInvariant() + value.Substring(1);
            break;
         case "Title case":
            value = System.Globalization.CultureInfo.InvariantCulture.TextInfo.ToTitleCase(value);
            break;
         default: // ignoring any other values (not supposed to be any)
            break;
      }
   }

   // Find run containing "separate" fieldchar.
   Run rTextInput = textInput.Ancestors<Run>().FirstOrDefault();
   if (rTextInput == null) throw new ApplicationException("SetFormField: Did not find run containing textinput.\r\n" + textInput.Parent.OuterXml);
   Run rSeparate = rTextInput.ElementsAfter().FirstOrDefault(ru =>
      ru.GetType() == typeof(Run)
      && ru.Elements<FieldChar>().FirstOrDefault(fc =>
         fc.FieldCharType == FieldCharValues.Separate)
         != null) as Run;
   if (rSeparate == null) throw new ApplicationException("SetFormField: Did not find run containing separate.\r\n" + textInput.Parent.OuterXml);

   // Find run containing "end" fieldchar.
   Run rEnd = rTextInput.ElementsAfter().FirstOrDefault(ru =>
      ru.GetType() == typeof(Run)
      && ru.Elements<FieldChar>().FirstOrDefault(fc =>
         fc.FieldCharType == FieldCharValues.End)
         != null) as Run;
   if (rEnd == null) // Formfield value contains paragraph(s)
   {
      Paragraph p = rSeparate.Parent as Paragraph;
      Paragraph pEnd = p.ElementsAfter().FirstOrDefault(pa =>
      pa.GetType() == typeof(Paragraph)
      && pa.Elements<Run>().FirstOrDefault(ru =>
         ru.Elements<FieldChar>().FirstOrDefault(fc =>
            fc.FieldCharType == FieldCharValues.End)
            != null)
         != null) as Paragraph;
      if (pEnd == null) throw new ApplicationException("SetFormField: Did not find paragraph containing end.\r\n" + textInput.Parent.OuterXml);
      rEnd = pEnd.Elements<Run>().FirstOrDefault(ru =>
         ru.Elements<FieldChar>().FirstOrDefault(fc =>
            fc.FieldCharType == FieldCharValues.End)
            != null);
   }

   // Remove any existing value.

   Run rFirst = rSeparate.NextSibling<Run>();
   if (rFirst == null || rFirst == rEnd)
   {
      RunProperties rPr = rTextInput.GetFirstChild<RunProperties>();
      if (rPr != null) rPr = rPr.CloneNode(true) as RunProperties;
      rFirst = rSeparate.InsertAfterSelf<Run>(new Run(new[] { rPr }));
   }
   rFirst.RemoveAllChildren<Text>();

   Run r = rFirst.NextSibling<Run>();
   while(r != rEnd)
   {
      if (r != null)
      {
         r.Remove();
         r = rFirst.NextSibling<Run>();
      }
      else // next paragraph
      {
         Paragraph p = rFirst.Parent.NextSibling<Paragraph>();
         if (p == null) throw new ApplicationException("SetFormField: Did not find next paragraph prior to or containing end.\r\n" + textInput.Parent.OuterXml);
         r = p.GetFirstChild<Run>();
         if (r == null)
         {
            // No runs left in paragraph, move other content to end of paragraph containing "separate" fieldchar.
            p.Remove();
            while (p.FirstChild != null)
            {
               OpenXmlElement oxe = p.FirstChild;
               oxe.Remove();
               if (oxe.GetType() == typeof(ParagraphProperties)) continue;
               rSeparate.Parent.AppendChild(oxe);
            }
         }
      }
   }
   if (rEnd.Parent != rSeparate.Parent)
   {
      // Merge paragraph containing "end" fieldchar with paragraph containing "separate" fieldchar.
      Paragraph p = rEnd.Parent as Paragraph;
      p.Remove();
      while (p.FirstChild != null)
      {
         OpenXmlElement oxe = p.FirstChild;
         oxe.Remove();
         if (oxe.GetType() == typeof(ParagraphProperties)) continue;
         rSeparate.Parent.AppendChild(oxe);
      }
   }

   // Set new value.

   if (value != null)
   {
      // The Word API uses \v internally for a line break and \r for a paragraph. Here \v, \r\n, \r and \n are all treated as a line break (Break).
      string[] lines = value.Replace("\r\n", "\n").Split(new char[]{'\v', '\n', '\r'});
      string line = lines[0];
      Text text = rFirst.AppendChild<Text>(new Text(line));
      // Mark leading/trailing spaces as significant (xml:space="preserve").
      if (line.StartsWith(" ") || line.EndsWith(" ")) text.Space = SpaceProcessingModeValues.Preserve;
      for (int i = 1; i < lines.Length; i++)
      {
         rFirst.AppendChild<Break>(new Break());
         line = lines[i];
         text = rFirst.AppendChild<Text>(new Text(line));
         if (line.StartsWith(" ") || line.EndsWith(" ")) text.Space = SpaceProcessingModeValues.Preserve;
      }
   }
   else
   { // An empty textinput formfield holds five U+2002 (en space) characters, one per run, or maxLength of them if maxLength is in the range 1 to 4.
      short length = maxLength;
      if (length == 0 || length > 5) length = 5;
      rFirst.AppendChild(new Text(((char)8194).ToString()));
      r = rFirst;
      for (int i = 1; i < length; i++) r = r.InsertAfterSelf<Run>(r.CloneNode(true) as Run);
   }
}
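
A possible way to call this from the loop in the question is sketched below. The names FormFieldFiller and FillAll are hypothetical, and the setValue delegate merely stands in for SetFormFieldValue so that the sketch compiles on its own. One point worth noting: a MemoryStream constructed directly from a byte array cannot grow, so the bytes are copied into an expandable stream before the package is opened for editing.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class FormFieldFiller
{
    // 'setValue' stands in for the SetFormFieldValue method above (passed as
    // a delegate only so that this sketch is self-contained).
    public static byte[] FillAll(byte[] docx, string value, Action<TextInput, string> setValue)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            // Copy into an expandable stream; new MemoryStream(byte[]) has a fixed size.
            stream.Write(docx, 0, docx.Length);

            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
                // Snapshot the list first: setting a value adds and removes runs.
                var inputs = doc.MainDocumentPart.Document.Descendants<TextInput>().ToList();
                foreach (TextInput ti in inputs)
                    setValue(ti, value);

                doc.MainDocumentPart.Document.Save();
            } // Disposing the package writes the changes to the stream.

            return stream.ToArray();
        }
    }
}
```

From inside the class that declares SetFormFieldValue, a call could look like FormFieldFiller.FillAll(File.ReadAllBytes("Test.docx"), "Hello", SetFormFieldValue), with the returned bytes written wherever the result is needed.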

NOTE 1: The logic above is not guaranteed to work with all possible variations of textinput formfields. One should read the Open XML documentation for all relevant elements to see if there are any gotchas. A document edited by a user in Word or another editor is one thing; documents created or edited by any number of software products that handle OpenXML are another.

NOTE 2: It is very helpful to simply make some documents in Word.
Each should contain a single textinput formfield with
- no value
- a single line of text
- multiple lines of text
- multiple paragraphs of text
- multiple empty paragraphs
- font and paragraph formatting (e.g. font size 20, triple line spacing)
Then open these in Visual Studio and look at document.xml (use the Format Document feature to get readable XML).
This is quite an eye-opener, as it reveals the complexity of formfields and may cause one to reconsider buying a product that deals with it.
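
If Visual Studio is not at hand, a small console program can print the same XML in readable form. The sketch below uses only the .NET base class library; the class name DumpDocumentXml is hypothetical, and it assumes the main part is stored at the conventional path word/document.xml inside the package.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

static class DumpDocumentXml
{
    static void Main(string[] args)
    {
        // A .docx file is a zip archive. Assumption: the main part is stored
        // at the conventional path word/document.xml.
        using (ZipArchive zip = ZipFile.OpenRead(args[0]))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
            {
                Console.Error.WriteLine("word/document.xml not found.");
                return;
            }
            using (Stream s = entry.Open())
            {
                // XDocument.ToString() returns indented XML by default.
                Console.WriteLine(XDocument.Load(s).ToString());
            }
        }
    }
}
```

Running it against each of the test documents and comparing the output shows how the runs between the "separate" and "end" fieldchars change with the value.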

NOTE 3: There are unresolved issues around formfield type and format.
