C# Open XML: empty cells are skipped when getting data from Excel into a DataTable


Problem description

Task

Import data from Excel into a DataTable.

Problem

Cells that do not contain any data are skipped, and the next cell in the row that has data is used as the value of the empty column. For example:

A1 is empty and A2 has the value Tom. While importing the data, A1 gets the value of A2 and A2 remains empty.

To make this clear, the original post included two screenshots: the Excel data, and the DataTable after importing the data from Excel.

Code

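using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public class ImportExcelOpenXml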
{
    public static DataTable Fill_dataTable(string fileName)
    {
        DataTable dt = new DataTable();

        using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(fileName, false))
        {

            WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
            IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
            string relationshipId = sheets.First().Id.Value;
            WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
            Worksheet workSheet = worksheetPart.Worksheet;
            SheetData sheetData = workSheet.GetFirstChild<SheetData>();
            IEnumerable<Row> rows = sheetData.Descendants<Row>();

            foreach (Cell cell in rows.ElementAt(0))
            {
                dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
            }

            foreach (Row row in rows) //this will also include your header row...
            {
                DataRow tempRow = dt.NewRow();

                for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
                {
                    tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
                }

                dt.Rows.Add(tempRow);
            }

        }

        dt.Rows.RemoveAt(0); //...so i'm taking it out here.

        return dt;
    }


    public static string GetCellValue(SpreadsheetDocument document, Cell cell)
    {
        SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
        string value = cell.CellValue.InnerXml;

        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
        {
            return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
        }
        else
        {
            return value;
        }
    }
}

My thoughts

I think the problem is with

public IEnumerable<T> Descendants<T>() where T : OpenXmlElement;

In case I want the count of columns using Descendants:

IEnumerable<Row> rows = sheetData.Descendants<Row>();
int colCnt = rows.ElementAt(0).Count();

If I am getting the count of rows using Descendants:

IEnumerable<Row> rows = sheetData.Descendants<Row>();
int rowCnt = rows.Count();

In both cases Descendants is skipping the empty cells.

Is there any alternative to Descendants?

Any suggestions are much appreciated.

P.S.: I have also thought of getting the cell values by using column names like A1, A2, but to do that I would have to get the exact count of columns and rows, which is not possible using the Descendants function.

Recommended answer

If every cell in a row contains data, everything works fine. As soon as a row has even a single empty cell, things go haywire.

Why does this happen in the first place?

This is because in the code below:

row.Descendants<Cell>().Count()

Excel usually does not write empty cells to the worksheet XML at all, so Count() is the number of populated (non-empty) cells, not the number of columns. So, when you pass row.Descendants<Cell>().ElementAt(i) as an argument to the GetCellValue method:

GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

Then it returns the content of the i-th populated cell, which is not necessarily the cell at column index i. For example, if the first column is empty and we call ElementAt(0), it returns the value from the second column instead, and the whole logic gets messed up.
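To illustrate the shift, here is a minimal sketch that does not use the Open XML SDK. It assumes a row is stored as a list of populated cells only, each carrying its own reference, which is how the answer describes the behaviour. The names SparseRowDemo, StoredCells, FillByPosition and FillByReference are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SparseRowDemo
{
    // Hypothetical stand-in for a worksheet row: column A is empty,
    // so only B1 and C1 are stored (assumption for this sketch).
    public static readonly List<(string Reference, string Value)> StoredCells =
        new List<(string Reference, string Value)>
        {
            ("B1", "Tom"),
            ("C1", "42"),
        };

    // Positional fill: the i-th stored cell goes into column i,
    // which shifts every value left past the empty cell.
    public static string[] FillByPosition(int columnCount)
    {
        var result = new string[columnCount];
        for (int i = 0; i < StoredCells.Count; i++)
        {
            result[i] = StoredCells[i].Value;
        }
        return result;
    }

    // Reference-based fill: the column letter decides the target column.
    // Assumes single-letter columns (A to Z) to keep the sketch short.
    public static string[] FillByReference(int columnCount)
    {
        var result = new string[columnCount];
        foreach (var cell in StoredCells)
        {
            int column = cell.Reference[0] - 'A';
            result[column] = cell.Value;
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", FillByPosition(3)));  // prints "Tom|42|"
        Console.WriteLine(string.Join("|", FillByReference(3))); // prints "|Tom|42"
    }
}
```

The positional version puts Tom in the first column even though it belongs in the second. The reference-based version leaves the first column empty.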

Solution: we need to deal with the occurrence of empty cells. Essentially, we need to figure out the original column index of each cell in case there were empty cells before it. So, replace your for loop:

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
    tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}

with:

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
    Cell cell = row.Descendants<Cell>().ElementAt(i);
    int actualCellIndex = CellReferenceToIndex(cell);
    tempRow[actualCellIndex] = GetCellValue(spreadSheetDocument, cell);
}

Then add the method below to your code. The modified snippet above uses it to obtain the original (correct) column index of any cell:

private static int CellReferenceToIndex(Cell cell)
{
    // Converts the column letters of a reference such as "AB12"
    // to a zero-based column index (A = 0, Z = 25, AA = 26, ...).
    int index = 0;
    string reference = cell.CellReference.ToString().ToUpper();
    foreach (char ch in reference)
    {
        if (!Char.IsLetter(ch))
        {
            break; // the row number starts here
        }
        // Column letters are bijective base 26: A = 1 ... Z = 26.
        index = (index * 26) + (ch - 'A' + 1);
    }
    return index - 1;
}
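The column arithmetic can be checked without opening a workbook. The sketch below is a standalone variant that takes the reference as a plain string. The class name ColumnReference and the method ToZeroBasedIndex are hypothetical, and it assumes references of the usual form (letters followed by a row number). Multi-letter columns are the case worth testing: treated as bijective base 26, AA comes right after Z, so it maps to index 26.

```csharp
using System;

public static class ColumnReference
{
    // Converts the letter part of a reference such as "AB12"
    // to a zero-based column index.
    public static int ToZeroBasedIndex(string cellReference)
    {
        int index = 0;
        foreach (char ch in cellReference.ToUpperInvariant())
        {
            if (ch < 'A' || ch > 'Z')
            {
                break; // the row number starts here
            }
            // Bijective base 26: A = 1 ... Z = 26.
            index = (index * 26) + (ch - 'A' + 1);
        }
        return index - 1;
    }

    public static void Main()
    {
        Console.WriteLine(ToZeroBasedIndex("A1"));   // prints 0
        Console.WriteLine(ToZeroBasedIndex("Z9"));   // prints 25
        Console.WriteLine(ToZeroBasedIndex("AA1"));  // prints 26
        Console.WriteLine(ToZeroBasedIndex("BA7"));  // prints 52
    }
}
```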
