OOXML SDK illegal character replacements

Problem description

I am having an issue with creating an XLSX document with the Open XML SDK 2.0 from MS.

My issue is that I need to display some characters that are illegal in XML in the Excel sheet, but if I just add them to the sheet, the document will not load.

I am using this function:

    private static string ProcessString(string str)
    {
        return System.Security.SecurityElement.Escape(str);
    }

This gives me Tom&apos;s ball instead of Tom's ball. (Well, I haven't figured out how to get the latter, as the generated Excel file won't open.)

Anybody know how to make the illegal XML characters show using OOXML in an Excel sheet?

EDIT:
The function I am using to create a text cell is:

    private static Cell CreateTextCell(string header, UInt32 index, string text)
    {
        var c = new Cell { DataType = CellValues.String, CellReference = header + index };
        var cellValue = new CellValue(text);
        c.Append(cellValue);
        return c;
    }

I know it has to do with illegal characters because when I didn't include a particular field in my text it worked, then when I included it, Excel would give me a parser error and a blank document.

The text that I deal with also happens to have HTML tags in it.

P.S. I just noticed that the markdown rendering parsed my HTML escapes, making my example look ridiculous.

EDIT 2:

Some example of input:

  • Cancer's Complexity: Are we Looking at the Wrong Levels to Develop Effective Interventions?

  • Prospective study of breast cancer risk in mutation-negative women from <i>BRCA1</i> or <i>BRCA2</i> mutation-positive families in the Kathleen Cuningham Foundation Consortium for Research into Familial Breast Cancer (kConFab).

  • Germline <em>BRCA2</em> mutations correlate with aggressive prostate cancer and adverse outcome.

The HTML formatting is there so the text displays properly on the web page. I should just strip off the basic formatting tags. But more importantly, I want the Excel file to load, and escaping the values is a sure way of doing just that.

Solution

Are you sure this is what is causing the problem? Can you add "normal" strings to the cells and open it?

AFAIK the apostrophe character is not an illegal XML character.
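To make that point concrete, here is a small sketch that uses LINQ to XML (`System.Xml.Linq`) rather than the Open XML SDK itself. The `ApostropheDemo` class and its `Serialize` method are hypothetical names for illustration, and it is only an assumption that the SDK's own serializer treats text content the same way. It shows that an XML writer leaves an apostrophe alone, escapes markup characters on its own, and double-escapes text that was already passed through something like `SecurityElement.Escape`.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of the Open XML SDK.
public static class ApostropheDemo
{
    // Serializes the text as the content of a <v> element.
    // The XML writer escapes '&', '<' and '>' in text content by itself.
    public static string Serialize(string text)
    {
        return new XElement("v", text).ToString();
    }
}
```

Calling `ApostropheDemo.Serialize("Tom's ball")` yields `<v>Tom's ball</v>`, while the pre-escaped input `Tom&apos;s ball` comes out as `<v>Tom&amp;apos;s ball</v>`, which a reader would display as the literal text `Tom&apos;s ball`.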

If you look in the OOXML specification in section 22.9.2.19 ST_Xstring (Escaped String) (the data type for strings in cells) you will see the following explanation:

*22.9.2.19 ST_Xstring (Escaped String) String of characters with support for escaped invalid-XML characters. For all characters which cannot be represented in XML as defined by the XML 1.0 specification, the characters are escaped using the Unicode numerical character representation escape character format _xHHHH_, where H represents a hexadecimal character in the character's value. [Example: The Unicode character 8 is not permitted in an XML 1.0 document, so it must be escaped as _x0008_. end example]*
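The escaping rule quoted above could be applied to cell text with something like the following minimal sketch. `XstringEscaper` and its methods are hypothetical names, not part of the Open XML SDK. The character ranges follow the XML 1.0 `Char` production, and the sketch deliberately does not handle input that already contains a literal `_xHHHH_` sequence.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the Open XML SDK.
public static class XstringEscaper
{
    // True when a single UTF-16 code unit is allowed by the XML 1.0 Char
    // production. Surrogate pairs are handled separately in Escape.
    private static bool IsLegalXmlChar(char ch)
    {
        return ch == '\t' || ch == '\n' || ch == '\r'
            || (ch >= '\u0020' && ch <= '\uD7FF')
            || (ch >= '\uE000' && ch <= '\uFFFD');
    }

    // Replaces every character XML 1.0 cannot represent with _xHHHH_.
    public static string Escape(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char ch = text[i];
            if (char.IsHighSurrogate(ch) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                // A well-formed surrogate pair encodes a legal character
                // above U+FFFF, so copy both code units unchanged.
                sb.Append(ch).Append(text[++i]);
            }
            else if (IsLegalXmlChar(ch))
            {
                sb.Append(ch);
            }
            else
            {
                // Four uppercase hex digits, e.g. U+0008 becomes _x0008_.
                sb.Append("_x").Append(((int)ch).ToString("X4")).Append('_');
            }
        }
        return sb.ToString();
    }
}
```

In the question's `CreateTextCell`, this would be used as `new CellValue(XstringEscaper.Escape(text))`, in place of the `SecurityElement.Escape` call. Ordinary characters such as apostrophes and angle brackets pass through untouched.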
