C# Open XML: empty cells are skipped when importing data from Excel into a DataTable


Problem description

Task

Import data from Excel into a DataTable.

Problem

Cells that do not contain any data are skipped, and the next cell in the row that does have data is used as the value of the empty column. For example:

If A1 is empty and B1 contains the value Tom, then after the import the first column receives the value of B1 and the second column remains empty.

To make this clear, here are two screenshots.

[Screenshot: the Excel data]

[Screenshot: the DataTable after importing the data from Excel]

Code

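using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public class ImportExcelOpenXml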
{
    public static DataTable Fill_dataTable(string fileName)
    {
        DataTable dt = new DataTable();

        using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(fileName, false))
        {

            WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
            IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
            string relationshipId = sheets.First().Id.Value;
            WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
            Worksheet workSheet = worksheetPart.Worksheet;
            SheetData sheetData = workSheet.GetFirstChild<SheetData>();
            IEnumerable<Row> rows = sheetData.Descendants<Row>();

            foreach (Cell cell in rows.ElementAt(0))
            {
                dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
            }

            foreach (Row row in rows) //this will also include your header row...
            {
                DataRow tempRow = dt.NewRow();

                for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
                {
                    tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
                }

                dt.Rows.Add(tempRow);
            }

        }

        dt.Rows.RemoveAt(0); //...so i'm taking it out here.

        return dt;
    }


    public static string GetCellValue(SpreadsheetDocument document, Cell cell)
    {
        SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
        // CellValue is null for a cell that is present but holds no value.
        string value = cell.CellValue?.InnerXml ?? string.Empty;

        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
        {
            return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
        }
        else
        {
            return value;
        }
    }
}

My thoughts

I think the problem lies with

public IEnumerable<T> Descendants<T>() where T : OpenXmlElement;

If I get the column count using Descendants:

IEnumerable<Row> rows = sheetData.Descendants<Row>();
int colCnt = rows.ElementAt(0).Count();

or

if I get the row count using Descendants:

IEnumerable<Row> rows = sheetData.Descendants<Row>();
int rowCnt = rows.Count();

In both cases Descendants skips the empty cells.

Is there an alternative to Descendants?

Your suggestions are highly appreciated.

P.S.: I have also thought of getting the cell values by using cell references such as A1 and A2, but to do that I would need the exact number of columns and rows, which is not possible using the Descendants function.

Recommended answer

If every cell in a row contains data, everything works. As soon as a row has even a single empty cell, things go haywire.

Why does this happen in the first place?

It is because of the following code:

row.Descendants<Cell>().Count()

Count() is the number of populated (non-empty) cells in the row, not the number of columns. So, when you pass row.Descendants<Cell>().ElementAt(i) as an argument to the GetCellValue method:

GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

Then it returns the content of the i-th populated cell, which is not necessarily the cell at column index i. For example, if the first column is empty and we call ElementAt(0), it returns the value from the second column instead, and the whole logic gets messed up.
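The mismatch can be reproduced without a workbook. The following is a minimal sketch, not the Open XML API: storedCells is a hypothetical stand-in for the cells stored in one row, assuming A1 is empty and therefore has no entry. It uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the cells stored in one worksheet row.
// A1 is empty, so only B1 and C1 have entries.
var storedCells = new List<(string Reference, string Value)>
{
    ("B1", "Tom"),
    ("C1", "42"),
};

// Positional access, as in the question: slot i receives the i-th stored cell.
var byPosition = new string[3];
for (int i = 0; i < storedCells.Count; i++)
{
    byPosition[i] = storedCells[i].Value;
}

// Reference-based access: the column letter decides the slot.
// Handles single-letter columns only, to keep the sketch short.
var byReference = new string[3];
foreach (var (reference, value) in storedCells)
{
    byReference[reference[0] - 'A'] = value;
}

Console.WriteLine(string.Join("|", byPosition));   // Tom|42|
Console.WriteLine(string.Join("|", byReference));  // |Tom|42
```

With positional access Tom lands in the first column. With reference-based access the first column stays empty and Tom lands in the second, which matches the sheet.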

Solution: handle the empty cells. Essentially, we need to work out the original column index of each cell in case there were empty cells before it. To do that, replace your for loop:

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
      tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}

with

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
    Cell cell = row.Descendants<Cell>().ElementAt(i);
    int actualCellIndex = CellReferenceToIndex(cell);
    tempRow[actualCellIndex] = GetCellValue(spreadSheetDocument, cell);
}

Also add the following method to your code. The modified loop above uses it to obtain the original (correct) column index of any cell:

private static int CellReferenceToIndex(Cell cell)
{
    // Convert the column letters of a reference such as "AB12" to a zero-based index.
    // Letters are accumulated as A=1 ... Z=26 so that "AA" maps to 26 rather than 0.
    int index = 0;
    string reference = cell.CellReference.Value.ToUpperInvariant();
    foreach (char ch in reference)
    {
        if (!Char.IsLetter(ch))
        {
            break; // reached the row number
        }
        index = (index * 26) + (ch - 'A' + 1);
    }
    return index - 1;
}
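Any column-index conversion is worth checking against references beyond column Z, where two-letter columns begin. The sketch below is a string-based variant for that purpose. ColumnIndexFromReference is a hypothetical name, and the code assumes well-formed references (letters followed by digits) and C# 9 or later for top-level statements.

```csharp
using System;

// Hypothetical helper: zero-based column index from a reference such as "AB12".
static int ColumnIndexFromReference(string cellReference)
{
    int index = 0;
    foreach (char ch in cellReference.ToUpperInvariant())
    {
        if (ch < 'A' || ch > 'Z')
        {
            break; // reached the row number
        }
        // Accumulate with A=1 ... Z=26, then shift to zero-based at the end.
        index = (index * 26) + (ch - 'A' + 1);
    }
    return index - 1;
}

foreach (string reference in new[] { "A1", "Z9", "AA1", "AB3", "BA1" })
{
    Console.WriteLine($"{reference} -> {ColumnIndexFromReference(reference)}");
}
// A1 -> 0, Z9 -> 25, AA1 -> 26, AB3 -> 27, BA1 -> 52
```

Counting letters from 1 instead of 0 keeps "A" and "AA" distinct, which is what makes the two-letter cases come out right.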
