Reading Excel Open XML is ignoring blank cells


Question

I am using the accepted solution here to convert an Excel sheet into a DataTable. This works fine if I have "perfect" data, but if there is a blank cell in the middle of my data, the wrong data ends up in each column.

I think this is because, in the code below:

row.Descendants<Cell>().Count()

is the number of populated cells (not the number of columns), and:

GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

seems to find the next populated cell (not necessarily the cell at that column index). So if the first column is empty and I call ElementAt(0), it returns the value from the second column.

Here is the full parsing code.

DataRow tempRow = dt.NewRow();

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
    tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
    if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
    {
        Console.Write(tempRow[i].ToString());
    }
}


Recommended answer

This makes sense, because Excel does not store a cell that has no value. If you open the file in the Open XML SDK 2.0 Productivity Tool and traverse the XML down to the cell level, you will see that only cells containing data are present in the file.

Your options are to insert blank data into the range of cells you are going to traverse, or to detect programmatically that a cell was skipped and adjust your index accordingly.

I made an example Excel document with a string in cells A1 and C1. I then opened it in the Open XML SDK 2.0 Productivity Tool; here is the XML that was stored:

<x:row r="1" spans="1:3" 
   xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:c r="A1" t="s">
    <x:v>0</x:v>
  </x:c>
  <x:c r="C1" t="s">
    <x:v>1</x:v>
  </x:c>
</x:row>

Here you can see that the data corresponds to the first row and that only two cells are saved for that row. The saved cells are A1 and C1; the empty cell B1 is not stored at all.
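
As a quick way to see the effect without the Open XML SDK, the row XML above can be read with System.Xml.Linq. This is only a sketch: the class and method names (SparseRowDemo, StoredCellReferences) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SparseRowDemo
{
    // The row XML from the example above: only A1 and C1 are stored.
    public const string RowXml =
        "<x:row r=\"1\" spans=\"1:3\" " +
        "xmlns:x=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">" +
        "<x:c r=\"A1\" t=\"s\"><x:v>0</x:v></x:c>" +
        "<x:c r=\"C1\" t=\"s\"><x:v>1</x:v></x:c>" +
        "</x:row>";

    // Returns the cell references of the stored cells, in document order.
    public static string[] StoredCellReferences(string rowXml)
    {
        XNamespace x = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        return XElement.Parse(rowXml)
            .Elements(x + "c")                      // only cells present in the XML
            .Select(c => (string)c.Attribute("r"))  // "r" holds the cell reference
            .ToArray();
    }
}
```

Calling SparseRowDemo.StoredCellReferences(SparseRowDemo.RowXml) yields two references, A1 and C1. The element at position 1 is therefore C1, not B1, which is the same shift that ElementAt(i) produces in the question's loop.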

To get the behavior you need, you can iterate over the cells as you do above, but you must check each cell's reference to determine whether any cells were skipped. To do that you need two utility functions: one to get the column name from the cell reference, and one to translate that column name into a zero-based index:

    using System;
    using System.Text.RegularExpressions;

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (e.g. B2)</param>
    /// <returns>Column name (e.g. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Match the letters at the start of the cell reference (the column portion).
        Match match = Regex.Match(cellReference, "[A-Za-z]+");

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index), returns the zero-based column index.
    /// Column names are treated as bijective base 26 (A = 1 ... Z = 26), so names of
    /// any length are handled: A -> 0, Z -> 25, AA -> 26, AB -> 27.
    /// </summary>
    /// <param name="columnName">Column name (e.g. A or AB)</param>
    /// <returns>Zero-based index if the conversion was successful; otherwise null</returns>
    public static int? GetColumnIndexFromName(string columnName)
    {
        if (string.IsNullOrEmpty(columnName))
        {
            return null;
        }

        int oneBasedIndex = 0;
        foreach (char letter in columnName.ToUpperInvariant())
        {
            if (letter < 'A' || letter > 'Z')
            {
                // Not a valid column name
                return null;
            }

            // Shift the letters read so far one place left, then add this letter
            oneBasedIndex = oneBasedIndex * 26 + (letter - 'A' + 1);
        }

        return oneBasedIndex - 1;
    }

Then you can iterate over the cells and compare each cell's column index with the running columnIndex. While columnIndex is less than the cell's column index, add blank data to tempRow; otherwise just read in the value contained in the cell. (Note: I did not test the code below, but the general idea should help):

DataRow tempRow = dt.NewRow();

int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
    // Zero-based column index of the stored cell, taken from its reference (e.g. "C1" -> 2)
    int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));

    // Fill every column that was skipped before this cell with blank data.
    // string.Empty assumes string-typed columns; use DBNull.Value for other column types.
    while (columnIndex < cellColumnIndex)
    {
        tempRow[columnIndex] = string.Empty;
        columnIndex++;
    }

    tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);

    if (tempRow[columnIndex].ToString().IndexOf("Latency issues in") > -1)
    {
        Console.Write(tempRow[columnIndex].ToString());
    }
    columnIndex++;
}
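
The same idea can be tried in isolation, without the SDK or a DataTable. The sketch below is an illustration only, not the code above: DenseRow, ColumnIndexOf and Expand are hypothetical names, cells are represented as plain (reference, value) string pairs, and the column count is assumed to be known in advance (for example from the header row). Instead of tracking a running index, it writes each value directly at the index computed from its reference.

```csharp
using System;
using System.Collections.Generic;

public static class DenseRow
{
    // Hypothetical helper: zero-based column index from a cell reference
    // such as "C1" or "AB12" (A -> 0, Z -> 25, AA -> 26).
    public static int ColumnIndexOf(string cellReference)
    {
        int oneBased = 0;
        foreach (char c in cellReference.ToUpperInvariant())
        {
            if (c < 'A' || c > 'Z') break;           // stop at the row number
            oneBased = oneBased * 26 + (c - 'A' + 1);
        }
        if (oneBased == 0)
            throw new ArgumentException("No column letters in: " + cellReference);
        return oneBased - 1;
    }

    // Expands the stored cells of one row into a fixed-width array.
    // Columns that were not stored stay as empty strings.
    public static string[] Expand(
        IEnumerable<KeyValuePair<string, string>> storedCells, int columnCount)
    {
        var values = new string[columnCount];
        for (int i = 0; i < columnCount; i++) values[i] = string.Empty;

        foreach (var cell in storedCells)
        {
            int column = ColumnIndexOf(cell.Key);
            if (column < columnCount) values[column] = cell.Value;  // ignore cells past the width
        }
        return values;
    }
}
```

For a row that stores only A1 = "x" and C1 = "y", Expand with a column count of 3 gives "x", "", "y", so the value from C1 lands in the third column rather than the second.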
