OpenXML SDK 2.5 (Excel): How to determine if a cell contains a numeric value?


Problem description

I am developing a component that imports data from an MS Excel (2016) file. The component uses the MS OpenXML SDK 2.5 library. The end users' installation of MS Excel uses Dutch country/region settings. Among other things, the file contains a column with financial (numeric) data. The position of this column is not known in advance.

To determine whether a cell contains numeric data, I evaluate the Cell.DataType property (of type CellValues, which is an enum). At first this property seemed the perfect candidate. The possible values of CellValues are Boolean, Number, Error, SharedString, String, InlineString and Date, so I expected Cell.DataType to be set to CellValues.Number. After some debugging I found that Cell.DataType is null when the cell contains numeric data.

While searching the internet for an explanation, I found the following MSDN article: https://msdn.microsoft.com/en-us/library/office/hh298534.aspx

The article describes exactly what I found during debugging:

The Cell type provides a DataType property that indicates the type of the data within the cell. The value of the DataType property is null for numeric and date types.

Does anybody know why Cell.DataType is not initialized with CellValues.Number or CellValues.Date, respectively?

What is the best way to determine if a cell contains a numeric value?

Solution

Does anybody know why Cell.DataType is not initialized with CellValues.Number or CellValues.Date, respectively?

Looking at the ECMA-376 standard, the (abbreviated) XSD for a cell looks like this:

<xsd:complexType name="CT_Cell">
    ...
    <xsd:attribute name="t" type="ST_CellType" use="optional" default="n"/>
    ...
</xsd:complexType>

That attribute represents the type. Note that it is optional, with a default value of "n". Section 18.18.11, ST_CellType (Cell Type), lists the valid values for the type:

b - boolean
d - date
e - error
inlineStr - an inline string
n - number (the default)
s - a shared string
str - a formula string

You can see that "n" represents a number.
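
To illustrate the default, here is a minimal sketch in Python that reads the raw worksheet XML directly with the standard library (it is not the OpenXML SDK). The sample cells and the helper name cell_type are made up for illustration. A cell without a t attribute, which the SDK surfaces as a null Cell.DataType, resolves to "n":

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def cell_type(cell):
    """Return the effective ST_CellType, applying the schema default "n"."""
    return cell.get("t", "n")


# Hypothetical sample data: A1 omits t, B1 is a shared string, C1 spells out t="n".
sheet_data = ET.fromstring(
    f'<sheetData xmlns="{NS}">'
    '<row r="1">'
    '<c r="A1"><v>12.5</v></c>'
    '<c r="B1" t="s"><v>0</v></c>'
    '<c r="C1" t="n"><v>3</v></c>'
    '</row>'
    '</sheetData>'
)

# Map each cell reference to its effective type.
types = {c.get("r"): cell_type(c) for c in sheet_data.iter(f"{{{NS}}}c")}
print(types)
```

A1 and C1 both come out as "n", so an explicit t="n" and a missing attribute mean the same thing.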

What is the best way to determine if a cell contains a numeric value?

From the above, it would seem that you can check for a null Cell.DataType or a Cell.DataType of CellValues.Number to tell whether a cell contains a number, but it's not quite that simple: the big problem is dates.

It would seem that the original storage mechanism for dates was to store a number and rely on the cell's style to tell whether that number is a plain number or represents a date.

Confusingly, the spec has been updated to include the Date type, but not all dates use it. The Date type means the cell contains a date in ISO 8601 format, but it's perfectly valid for a date to be stored as a number with the correct style. For example, the following XML snippet shows the same date (1st Feb 2017) in both Number and Date format:

<sheetData>
    <row r="1" spans="1:1" x14ac:dyDescent="0.25">
        <c r="A1" s="1">
            <v>42767</v>
        </c>
    </row>
    <row r="2" spans="1:1" x14ac:dyDescent="0.25">
        <c r="A2" s="1" t="d">
            <v>2017-02-01</v>
        </c>
    </row>
</sheetData>

Opened in Excel, both cells display as the same date.
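
As a rough check that the two stored values really are the same date, the Python sketch below converts each of them. It assumes the workbook uses Excel's default 1900 date system (not the 1904 one) and ignores any time-of-day fraction. The helper names serial_to_date and iso_to_date are hypothetical.

```python
from datetime import date, datetime, timedelta

# Day zero for serial numbers in the 1900 date system. This offset is valid for
# dates from 1 March 1900 onwards, which sidesteps Excel's fictitious 29 Feb 1900.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial_text):
    """Convert a serial number stored in <v> (cell type "n") to a date."""
    whole_days = int(float(serial_text))  # drop any time-of-day fraction
    return EXCEL_EPOCH + timedelta(days=whole_days)


def iso_to_date(iso_text):
    """Convert an ISO 8601 value stored in <v> (cell type "d") to a date."""
    return datetime.fromisoformat(iso_text).date()


from_number = serial_to_date("42767")    # value of A1
from_iso = iso_to_date("2017-02-01")     # value of A2
print(from_number, from_iso, from_number == from_iso)
```

Both conversions yield 1 February 2017, which is why the type attribute alone cannot tell you whether a numeric cell is meant as a date.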

If you need to differentiate between dates and numbers, you will need to find all numeric cells (a null Cell.DataType or a Cell.DataType of CellValues.Number) and then check the style of those cells to make sure they are genuine numbers and not dates disguised as numbers.
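
The two-step check can be sketched as follows, again in Python over the raw XML and not with the OpenXML SDK. Several things here are assumptions for illustration: the styles part is invented (the answer does not show it), style index 1 is assumed to point at built-in number format 14, and rows 3 and 4 are added sample cells. Only the built-in date/time format ids 14 to 22 and 45 to 47 are recognised; custom formats declared under numFmts would need their format code inspected and are not handled.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}

# Built-in number format ids that ECMA-376 defines as date/time formats.
BUILTIN_DATE_FORMATS = set(range(14, 23)) | {45, 46, 47}

# Hypothetical styles part: style 0 is General, style 1 is a built-in date format.
STYLES_XML = f"""
<styleSheet xmlns="{MAIN}">
  <cellXfs count="2">
    <xf numFmtId="0"/>
    <xf numFmtId="14"/>
  </cellXfs>
</styleSheet>"""

# A1 and A2 mirror the snippet above; A3 and A4 are extra sample cells.
SHEET_XML = f"""
<sheetData xmlns="{MAIN}">
  <row r="1"><c r="A1" s="1"><v>42767</v></c></row>
  <row r="2"><c r="A2" s="1" t="d"><v>2017-02-01</v></c></row>
  <row r="3"><c r="A3"><v>1234.56</v></c></row>
  <row r="4"><c r="A4" t="s"><v>0</v></c></row>
</sheetData>"""


def classify(cell, num_fmt_ids):
    """Classify a <c> element as 'number', 'date', 'empty' or 'other'."""
    if cell.find("m:v", NS) is None:
        return "empty"
    cell_type = cell.get("t", "n")  # a missing t means "n"
    if cell_type == "d":
        return "date"  # ISO 8601 date
    if cell_type != "n":
        return "other"  # string, boolean, error, ...
    # Numeric cell: the style decides whether it is a date in disguise.
    style_index = int(cell.get("s", "0"))
    if num_fmt_ids[style_index] in BUILTIN_DATE_FORMATS:
        return "date"
    return "number"


styles = ET.fromstring(STYLES_XML.strip())
# Position in cellXfs is the style index referenced by the s attribute.
num_fmt_ids = [int(xf.get("numFmtId", "0"))
               for xf in styles.findall("m:cellXfs/m:xf", NS)]

sheet = ET.fromstring(SHEET_XML.strip())
result = {c.get("r"): classify(c, num_fmt_ids)
          for c in sheet.findall("m:row/m:c", NS)}
print(result)
```

With the SDK, the same idea would mean following the cell's style index to the matching cell format in the workbook's styles part and looking at its number format id.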
