读取 Excel Open XML 忽略空白单元格 [英] reading Excel Open XML is ignoring blank cells

查看:57
本文介绍了读取 Excel Open XML 忽略空白单元格的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用此处接受的解决方案将excel表转换为数据表.如果我有完美"的数据,这很好用,但如果我的数据中间有一个空白单元格,它似乎在每一列中放置了错误的数据.

I am using the accepted solution here to convert an excel sheet into a datatable. This works fine if I have "perfect" data but if I have a blank cell in the middle of my data it seems to put the wrong data in each column.

我认为这是因为在下面的代码中:

I think this is because in the below code:

row.Descendants<Cell>().Count()

是填充单元格的数量(不是所有列)AND:

is number of populated cells (not all columns) AND:

GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

似乎找到下一个填充的单元格(不一定是该索引中的内容),因此如果第一列为空并且我调用 ElementAt(0),它会返回第二列中的值.

seems to find the next populated cell (not necessarily what is in that index) so if the first column is empty and i call ElementAt(0), it returns the value in the second column.

这里是完整的解析代码.

Here is the full parsing code.

DataRow tempRow = dt.NewRow();

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
    tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
    if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
    {
        Console.Write(tempRow[i].ToString());
    }
}

推荐答案

这是有道理的,因为 Excel 不会为空单元格存储值.如果您使用 Open XML SDK 2.0 Productivity Tool 打开文件并将 XML 向下遍历到单元格级别,您将看到只有具有数据的单元格才会出现在该文件中.

This makes sense since Excel will not store a value for a cell that is null. If you open your file using the Open XML SDK 2.0 Productivity Tool and traverse the XML down to the cell level you will see that only the cells that have data are going to be in that file.

您的选择是在您要遍历的单元格范围内插入空白数据,或者以编程方式找出跳过的单元格并适当调整索引.

Your options are to insert blank data in the range of cells you are going to traverse or programmatically figure out a cell was skipped and adjust your index appropriately.

我用单元格引用 A1 和 C1 中的字符串制作了一个示例 Excel 文档.然后我在 Open XML Productivity Tool 中打开了 excel 文档,这里是存储的 XML:

I made an example excel document with a string in cell reference A1 and C1. I then opened up the excel document in the Open XML Productivity Tool and here is the XML that was stored:

<x:row r="1" spans="1:3" 
   xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:c r="A1" t="s">
    <x:v>0</x:v>
  </x:c>
  <x:c r="C1" t="s">
    <x:v>1</x:v>
  </x:c>
</x:row>

在这里您将看到数据对应于第一行,并且该行只保存了两个单元格的数据.保存的数据对应于 A1 和 C1,并且没有保存空值的单元格.

Here you will see that the data corresponds to the first row and that only two cells worth of data are saved for that row. The data saved corresponds to A1 and C1 and that no cells with null values are saved.

为了获得您需要的功能,您可以像上面那样遍历单元格,但您需要检查单元格引用的值并确定是否跳过了任何单元格.为此,您需要两个实用程序函数来从单元格引用中获取列名,然后将该列名转换为基于零的索引:

To get the functionality that you need, you can traverse over the Cells as you are doing above, but you will need to check what the value the Cell is referencing and determine if any Cells have been skipped. to do that you will need two utility functions to get the Column Name from the cell reference and to then translate that column name into a zero based index:

    private static List<char> Letters = new List<char>() { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' };

    /// <summary>
    /// Given a cell name, parses the specified cell to get the column name.
    /// </summary>
    /// <param name="cellReference">Address of the cell (ie. B2)</param>
    /// <returns>Column Name (ie. B)</returns>
    public static string GetColumnName(string cellReference)
    {
        // Create a regular expression to match the column name portion of the cell name.
        Regex regex = new Regex("[A-Za-z]+");
        Match match = regex.Match(cellReference);

        return match.Value;
    }

    /// <summary>
    /// Given just the column name (no row index), it will return the zero based column index.
    /// Note: This method will only handle columns with a length of up to two (ie. A to Z and AA to ZZ). 
    /// A length of three can be implemented when needed.
    /// </summary>
    /// <param name="columnName">Column Name (ie. A or AB)</param>
    /// <returns>Zero based index if the conversion was successful; otherwise null</returns>
    public static int? GetColumnIndexFromName(string columnName)
    {
        int? columnIndex = null;

        string[] colLetters = Regex.Split(columnName, "([A-Z]+)");
        colLetters = colLetters.Where(s => !string.IsNullOrEmpty(s)).ToArray();

        if (colLetters.Count() <= 2)
        {
            int index = 0;
            foreach (string col in colLetters)
            {
                List<char> col1 = colLetters.ElementAt(index).ToCharArray().ToList();
                int? indexValue = Letters.IndexOf(col1.ElementAt(index));

                if (indexValue != -1)
                {
                    // The first letter of a two digit column needs some extra calculations
                    if (index == 0 && colLetters.Count() == 2)
                    {
                        columnIndex = columnIndex == null ? (indexValue + 1) * 26 : columnIndex + ((indexValue + 1) * 26);
                    }
                    else
                    {
                        columnIndex = columnIndex == null ? indexValue : columnIndex + indexValue;
                    }
                }

                index++;
            }
        }

        return columnIndex;
    }

然后您可以遍历单元格并检查单元格引用与列索引进行比较的内容.如果小于,则将空白数据添加到 tempRow,否则只需读取单元格中包含的值.(注意:我没有测试下面的代码,但总体思路应该会有所帮助):

Then you can iterate over the Cells and check to see what the cell reference is compared to the columnIndex. If it is less than then you add blank data to your tempRow, otherwise just read in the value contained in the cell. (Note: I did not test the code below, but the general idea should help):

DataRow tempRow = dt.NewRow();

int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
   // Gets the column index of the cell with data
   int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));

   if (columnIndex < cellColumnIndex)
   {
      do
      {
         tempRow[columnIndex] = //Insert blank data here;
         columnIndex++;
      }
      while(columnIndex < cellColumnIndex);
    }
    tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);

    if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
    {
       Console.Write(tempRow[i].ToString());
    }
    columnIndex++;
}

这篇关于读取 Excel Open XML 忽略空白单元格的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆