OpenXML - InfoPath Rich Text Box to Word document gives formatting errors

Problem description

I've set up a Rich Text Box in an InfoPath form, and my program parses the InfoPath XML as below:

XPathNavigator formNameNode = root.SelectSingleNode("/my:myFields/my:Responses/my:Q1", nsMgr);
string response1 = formNameNode.InnerXml;

The following code is then used to open a Word document and get a Plain Text Content Control called response1:

    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(ms, true))
    {
        MainDocumentPart mainPart = myDoc.MainDocumentPart;

        List<OpenXmlElement> sdtList = InfoPathToWord.GetContentControl(mainPart.Document, "response1");
        InfoPathToWord.AddRichText(0, response1, ref mainPart, ref sdtList);
    }

The code then calls InfoPathToWord.AddRichText, which is shown below:

public static void AddRichText(int id, string rtfValue,
    ref MainDocumentPart mainPart, ref List<OpenXmlElement> sdtList)
{
    if (sdtList.Count != 0)
    {
        id++;
        string altChunkId = "AltChunkId" + id;
        AlternativeFormatImportPart chunk =
            mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.Xhtml, altChunkId);

        using (MemoryStream ms = new MemoryStream(System.Text.Encoding.Default.GetBytes(rtfValue)))
        {
            chunk.FeedData(ms);
            ms.Close();
        }

        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;

        InfoPathToWord.ReplaceContentControl(sdtList, altChunk);
    }
}

Finally, the altChunk replaces the "response1" content control:

    public static void ReplaceContentControl(
      List<OpenXmlElement> sdtList, OpenXmlElement element)
    {
        if (sdtList.Count != 0)
        {
            foreach (OpenXmlElement sdt in sdtList)
            {
                OpenXmlElement parent = sdt.Parent;
                parent.InsertAfter(element, sdt);
                sdt.Remove();
            }
        }
    }

The issue is that the text is replaced, but the formatting is incorrect and "?" characters appear in the output text. I'm not sure whether this is caused by encoding. I've also tried System.Text.Encoding.UTF8.GetBytes(rtfValue) and System.Text.Encoding.ASCII.GetBytes(rtfValue), but neither seems to help.

Could someone please tell me what I'm doing wrong?

Thanks in advance.

Mave

Solution

I'm using a regex to clean the string before saving it.

html = Regex.Replace(html, "[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]", "") ' allows tab and other printable chars

Dim ms As New MemoryStream(System.Text.Encoding.UTF8.GetBytes(html))
' Create alternative format import part.
Dim formatImportPart As AlternativeFormatImportPart = mainDocPart.AddAlternativeFormatImportPart("application/xhtml+xml", altChunkId)
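
Since the question's code is C#, here is a rough C# sketch of the same cleaning step. It assumes the rich text has already been read into a string. The XhtmlCleaner class and its method names are hypothetical, not part of the Open XML SDK. Note that .NET's Regex takes the bare pattern, without the /.../u delimiters used in PHP or JavaScript; inside a .NET pattern those delimiters would be matched as literal characters.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class XhtmlCleaner
{
    // Matches C0 control characters except tab (\x09), LF (\x0A) and CR (\x0D),
    // plus the C1 control range \x80-\x9F.
    private static readonly Regex ControlChars =
        new Regex(@"[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]");

    // Returns the input with the matched control characters removed.
    public static string Clean(string html)
    {
        return ControlChars.Replace(html ?? string.Empty, string.Empty);
    }

    // Cleans the markup and exposes it as a UTF-8 encoded stream,
    // ready to be passed to something like chunk.FeedData(...).
    public static MemoryStream ToUtf8Stream(string html)
    {
        return new MemoryStream(Encoding.UTF8.GetBytes(Clean(html)));
    }
}
```

In AddRichText from the question, the stream passed to chunk.FeedData could then be created with XhtmlCleaner.ToUtf8Stream(rtfValue) instead of Encoding.Default. Whether that removes every stray character depends on what the form data actually contains.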

Related question: "Regex to remove all special characters from string?"

UPDATE... after rigorous testing I've found too many character encoding issues with InfoPath RTF in a docx.
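
As one concrete illustration of how a "?" can end up in the output, the sketch below round-trips a string through an encoding. It is only an illustration: EncodingDemo and RoundTrip are hypothetical names, and the sample text is invented rather than taken from the InfoPath form.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EncodingDemo
{
    // Encodes the text to bytes with the given encoding, then decodes the
    // bytes again, so the result shows what survives the conversion.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

For the sample "caf\u00E9 \u2013 ok" (an accented e and an en dash), RoundTrip with Encoding.ASCII returns "caf? ? ok", because ASCII replaces every character it cannot represent with "?". With Encoding.UTF8 the string comes back unchanged. This only accounts for the ASCII attempt in the question. With UTF-8 the bytes are intact, so any remaining garbling would have to come from how the consumer decodes them, which this sketch does not cover.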
