以与语言环境无关的方式读取Excel文件 [英] Reading Excel files in a locale independent way

查看:71
本文介绍了以与语言环境无关的方式读取Excel文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用以下代码从各种Excel文件中读取数据:

I am using the following code to read data from various Excel files:

    // IMEX=1 - to force strings on mixed data
    // HDR=NO - to process all the available data
    // Locale 1033 is en-US. This was my first attempt to force en-US locale.
    string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Locale Identifier=1033;Extended Properties=\"{1};READONLY=TRUE;HDR=NO;IMEX=1;\"";

    // source type according to the
    // http://www.microsoft.com/en-us/download/details.aspx?id=13255

    // try determining from extension
    bool isOldFormat =
            Path.GetExtension(sourceExcel).Equals(".xls", StringComparison.OrdinalIgnoreCase);

    bool isBinary =
            Path.GetExtension(sourceExcel).Equals(".xlsb", StringComparison.OrdinalIgnoreCase);

    string sourceType = isOldFormat ? "Excel 8.0" : "Excel 12.0";

    if (!isOldFormat)
        sourceType += " Xml"; // for some reason the new binary xlsb files also need Xml

    connectionString = string.Format(connectionString, sourceExcel, sourceType);

    // this was my second attempt to force Excel to use US culture
    var oldCulture = Thread.CurrentThread.CurrentCulture;
    Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US");

    var dt = new DataTable();
    try
    {
        using (var con = new OleDbConnection(connectionString))
        {
            con.Open();

            // get all the available sheets
            using (DataTable dataSet = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null))
            {
                // this was my third attempt to force Excel to use US culture
                dataSet.Locale = CultureInfo.CreateSpecificCulture("en-US");
                // get the sheet name in the file (will throw if out of range)
                string workSheetName = dataSet.Rows[worksheetIndex]["TABLE_NAME"].ToString();//.Trim(new[] { '$' }).Replace("'", "");

                string sql = String.Format("select * from [{0}]", workSheetName);

                var da = new OleDbDataAdapter(sql, con);
                // this was my fourth attempt to force Excel to use US culture
                dt.Locale = CultureInfo.CreateSpecificCulture("en-US");
                da.Fill(dt);
            }

            con.Close();
        }

如您所见,我非常绝望,试图强制Excel在导入数据时使用en-US兼容的语言环境.我之所以需要这样做,是因为我的代码可能在具有各种语言环境的服务器上执行,但是数据需要进行一些额外的处理,这些数据假定传入的数据是美国/中性语言环境.

As you see, I was pretty desperate, trying to force Excel to use en-US compatible locale when importing data. I need this because my code might be executed on servers with various locales, but the data needs some additional processing which assumes that the incoming data is en-US/neutral locale.

我也尝试了CultureInfo.InvariantCulture而不是CultureInfo.CreateSpecificCulture("en-US").

无论我如何尝试,当服务器语言环境设置为使用.作为千位分隔符和,作为十进制分隔符的其他语言环境时,我的dt DataTable都会得到错误的结果.

No matter how I try, when the server locale is set to some other locale which uses . as thousands separator and , as decimal separator, I get wrong results in my dt DataTable.

要比较货币值的结果-£200000.00 :

To compare the result for a currency value -£200000.00 :

当服务器区域设置与美国区域设置相对应时,我得到"-£200,000.00"

When the server regional settings correspond to US locale, I get "-£200,000.00"

当服务器区域设置与拉脱维亚语言环境相对应时,我得到"-£200 000,00"

When the server regional settings correspond to Latvian locale, I get "-£200 000,00"

我什至不能使用Thread.CurrentThread.CurrentCulture中的当前数字分隔符对数据进行后处理,因为OleDb似乎完全忽略了它.

I cannot even post-process the data using the current numeric separators from Thread.CurrentThread.CurrentCulture, because OleDb seems to ignore it completely.

OleDb从哪里获得当前的文化?我如何告诉OleDbConnection或Microsoft.ACE.OLEDB.12.0提供程序我需要根据en-USInvariant区域性格式化的数据?

Where does OleDb get the current culture from? Ho do I tell the OleDbConnection or Microsoft.ACE.OLEDB.12.0 provider that I need the data formatted according to en-US or Invariant culture?

推荐答案

经过多次试验和错误并且阅读了过时的文章之后 http://support.microsoft.com/default.aspx?scid = KB; EN-US; Q320744 我发现默认情况下,当前版本的OLEDB似乎正在使用HKEY_CURRENT_USER\Control Panel\International中的区域性.不幸的是,我没有找到如何从C#代码中调用SetVarConversionLocaleSetting函数来强制OLEDB使用当前的线程文化,因此我遵循了原理-如果无法为我的代码调整OLEDB,那么我将调整与OLEDB文化兼容的代码.完成后,我可以将所有数据转换为不变文化.

After lots of trials and errors and after reading this outdated article http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q320744 I found that the current version of OLEDB by default seems to be using the culture from HKEY_CURRENT_USER\Control Panel\International. Unfortunately, I did not found how to call the SetVarConversionLocaleSetting function from my C# code to force OLEDB to use the current thread culture, so I went with the principle - if I cannot adjust OLEDB for my code, then I'll adjust my code to be compatible with OLEDB culture. And after I am done with it, I can convert all the data to the invariant culture.

但是有一个棘手的部分.您不能仅从HKEY_CURRENT_USER\Control Panel\International中获取小数点分隔符,因为OLEDB会忽略用户自定义的数字格式设置. OLEDB仅采用该区域性的默认预设值.因此,我必须执行以下操作:

But there is a tricky part. You cannot just grab the decimal separator from HKEY_CURRENT_USER\Control Panel\International, because OLEDB ignores user customized settings for number formats. OLEDB takes only the default preset values for that culture. So I had to do the following:

var oldCulture = Thread.CurrentThread.CurrentCulture;

using (RegistryKey international = 
        Registry.CurrentUser.OpenSubKey("Control Panel\\International", false))
{
    string userDefaultCulture = international.GetValue("LocaleName").ToString();
    // notice: although the user might have customized his decimal/thousand separators,
    // still OLEDB ignores these customizations. That is why I create a culture with default settings.
    cultureToNormalize = new CultureInfo(userDefaultCulture, false);
}

// force both OLEDB and current thread cultures to match for the next ToString() etc. conversions in my function
Thread.CurrentThread.CurrentCulture = cultureToNormalize;

string decSep = cultureToNormalize.NumberFormat.NumberDecimalSeparator;
string groupSep = cultureToNormalize.NumberFormat.NumberGroupSeparator;

现在,我可以根据需要处理数据,也可以安全地调用ToString()-OLEDB和.NET字符串化数字和货币的区域性将匹配.而且,作为一个好男孩,我在工作结束时恢复了以前的文化.

and now I can process the data as needed, and also I can call ToString() safely - the cultures of OLEDB and .NET stringified numbers and currencies will match. And, to be a good boy, I restore the previous culture at the end of my function.

如果有人有更好的解决方案,我将非常感激.但是现在我将保持原样-我的所有单元测试现在都是绿色的.

If anybody has a better solution, I'll be really grateful. But for now I'll keep it as is - all my unit tests are green now.

这篇关于以与语言环境无关的方式读取Excel文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆