EPPlus 将 200 万行的 200 多列数据表保存到多个 Excel 文件中 [英] EPPlus save two million rows with 200+ columns datatable to multiple excel files
问题描述
我有使用 EPPlus 将 SQL 表中的所有记录保存到 Excel 工作表的功能.如果我导出少量数据,一切正常,但如果有 200 多列和 500 000 多行,则会出现 OutOfMemory 异常.
I have function that saves all records from SQL table to excel worksheet using EPPlus. If I export small amount of data everything works fine, but with 200+ columns and 500 000+ rows I get OutOfMemory exception.
我想修改我的代码,使每个文件能够保存 50 000 条记录.
I'd like to modify my code in a way to be able to save 50 000 records per file.
这是我的适用于小数据的代码:
Here is my code that works for small data:
private Task SaveAsync(string tableName)
{
return Task.Run(() =>
{
try
{
using (var conn = new SqlConnection(_connectionString))
{
using (var cmd = new SqlCommand(string.Format(DataQuery, tableName), conn))
{
cmd.CommandType = CommandType.Text;
cmd.CommandTimeout = 360;
conn.Open();
using (SqlDataReader sdr = cmd.ExecuteReader())
{
var fileName = string.Format(TargetFile, tableName);
if (File.Exists(fileName))
{
File.Delete(fileName);
}
sdr.Read();
var numberOfRecordsInTable = sdr.GetInt32(0);
sdr.NextResult();
using (ExcelPackage pck = new ExcelPackage(new FileInfo(fileName)))
{
ExcelWorksheet ws = pck.Workbook.Worksheets.Add("Results");
int count = sdr.FieldCount;
int col = 1, row = 1;
for (int i = 0; i < count; i++)
{
ws.SetValue(row, col++, sdr.GetName(i));
}
row++;
col = 1;
while (sdr.Read())
{
for (int i = 0; i < count; i++)
{
var val = sdr.GetValue(i);
ws.SetValue(row, col++, val);
}
row++;
col = 1;
}
//autosize
ws.Cells[ws.Dimension.Address].AutoFitColumns();
//autofiltr
ws.Cells[1, 1, 1, count].AutoFilter = true;
}
}
conn.Close();
}
}
}
catch (Exception e)
{
Debug.WriteLine("Error at: " + Thread.CurrentThread.ManagedThreadId);
Debug.WriteLine(e);
}
});
}
和我修改后的代码,每个文件拆分 50 000 条记录:
and my modified code that splits records 50 000 per file:
private Task SaveAsync2(string tableName)
{
return Task.Run(() =>
{
try
{
using (var conn = new SqlConnection(_connectionString))
{
using (var cmd = new SqlCommand(string.Format(DataQuery, tableName), conn))
{
cmd.CommandType = CommandType.Text;
cmd.CommandTimeout = 360;
conn.Open();
using (SqlDataReader sdr = cmd.ExecuteReader())
{
var fileName = string.Format(TargetFile, tableName,"");
if (File.Exists(fileName))
{
File.Delete(fileName);
}
sdr.Read();
var max = sdr.GetInt32(0);
int filesCount = 1;
if (max > 50000)
{
fileName = string.Format(TargetFile, tableName, filesCount);
}
sdr.NextResult();
ExcelPackage pck = new ExcelPackage(new FileInfo(fileName));
ExcelWorksheet ws = pck.Workbook.Worksheets.Add("RESULTS");
int count = sdr.FieldCount;
int col = 1, row = 1;
for (int i = 0; i < count; i++)
{
ws.SetValue(row, col++, sdr.GetName(i));
}
row++;
col = 1;
while (sdr.Read())
{
for (int i = 0; i < count; i++)
{
var val = sdr.GetValue(i);
ws.SetValue(row, col++, val);
}
row++;
col = 1;
if (row > 50000)
{
pck.Save();
filesCount++;
fileName = string.Format(TargetFile, tableName, filesCount);
pck = new ExcelPackage(new FileInfo(fileName));
ws = pck.Workbook.Worksheets.Add("RESULTS");
count = sdr.FieldCount;
col = 1;
row = 1;
for (int i = 0; i < count; i++)
{
ws.SetValue(row, col++, sdr.GetName(i));
}
row++;
col = 1;
}
}
//autosize
ws.Cells[ws.Dimension.Address].AutoFitColumns();
//autofiltr
ws.Cells[1, 1, 1, count].AutoFilter = true;
pck.Save();
}
}
conn.Close();
}
}
catch (Exception e)
{
Debug.WriteLine("Error at: " + Thread.CurrentThread.ManagedThreadId);
Debug.WriteLine(e);
}
});
}
基本上这很好用,但是在我的代码的第一个版本中,我使用了 using
语句中的所有内容,而在第二个版本中,我调用了两次相同的代码.
basically this works fine, but in first version of my code I was using everything inside using
statement, when in second version I'm calling same code twice.
- 如何修复我的代码以删除重复代码并将所有内容放入 using 中.
- 我可以添加下一组(50 000 条记录)作为新工作表而不是创建新文件吗?
- 将数据保存到文件时 EPPlus 的限制是什么?
行 x 列
?我发现 EPPlus 应该处理超过一百万行的信息,但没有我拥有的那么多列.我认为我可以用单列导出百万行,但对于 200 多列对我来说 50 000 行是限制.我想知道是否有数量(行 x 列)将限制我的导出工作正常.我希望导出功能是通用的,因此当我传递 50 列的数据表时,它将导出例如每个文件 100 000 行,对于 2 列,它将导出每个文件 50 万行.
- How can I fix my code to remove duplicate code and put everything inside using.
- Can I add next set (50 000 records) as new worksheet instead of creating new file?
- What would be EPPlus limit when saving data to file?
rows x columns
? I found information that EPPlus should handle more than million rows, but not so much columns as I have. I thinks that I can export million rows with single column, but for 200+ columns for me 50 000 rows is limit. I'm wondering if there is number (rows x columns) that will be limit to which my export will work fine. I want that export function to be universal, so when I pass datatable with 50 columns it will export for example 100 000 rows per file and for 2 columns it will export half million per file.
推荐答案
不幸的是,用 Epplus 将这么多数据合并到一个文件中并不容易.基本上,整个文件在打开时都会加载到内存中——要么全部要么全无.理论上,您可以生成 XLSX 包含的 XML 文件(它们是重命名的 zip 文件)并手动插入它,因为它会占用较小的内存,但这不是一件小事.
Unfortunately, there is no easy way to merge that much data with Epplus in a single file. Basically, the entire file is loaded into memory when open - its either all or nothing. In theory, you could generate the XML files that XLSX contains (they are zip files renamed) and manually insert it since it would have a smaller memory footprint but that is no small feat.
对于您当前的代码,如果您想避免使用 using 语句,您总是可以手动调用 .dispose()
.但我知道您想避免重复代码.这样的事情怎么样(但在复制所有对象数据时注意内存使用情况):
For you current code, you could always just call .dispose()
manually if you want to avoid the using statement. But I understand you wanting to avoid duplicate code. What about something like this (but watch for memory usage when copying all the object data):
const int max = 10;
var loop = 0;
using (var sdr = cmd.ExecuteReader())
{
var fieldcount = sdr.FieldCount;
var getfi = new Func<int, FileInfo>(i =>
{
var fi = new FileInfo(String.Format(@"c: empMulti_Files{0}.xlsx", i));
if (fi.Exists) fi.Delete();
return fi;
});
var savefile = new Action<FileInfo, List<Object[]>>((info, rows) =>
{
using (var pck = new ExcelPackage(info))
{
var wb = pck.Workbook;
var ws = wb.Worksheets.Add("RESULTS");
for (var row = 0; row < rows.Count; row++)
for (var col = 0; col < fieldcount; col++)
ws.SetValue(row + 1, col + 1, rows[row][col]);
pck.Save();
}
});
var rowlist = new List<Object[]>();
while (sdr.Read())
{
var rowdata = new Object[sdr.FieldCount];
sdr.GetValues(rowdata);
rowlist.Add(rowdata);
if (rowlist.Count == max)
{
savefile(getfi(++loop), rowlist);
rowlist.Clear();
}
}
if (rowlist.Count > 0)
savefile(getfi(++loop), rowlist);
}
这篇关于EPPlus 将 200 万行的 200 多列数据表保存到多个 Excel 文件中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!