Encoding problems with dBase III .dbf files on different machines

Problem description

I'm using C# and .NET 3.5 to import data from old .dbf files through ODBC with the Microsoft dBase Driver.

The .dbf files are in dBase III format and use ibm850 encoding for strings.

When I run my program on my machine, all string data read from OdbcDataReader comes out already converted to UTF-16 or UTF-8 or something (I don't know which). I save it as UTF-8 and everything is OK. But when I try to use this program on an XP box, some characters aren't converted correctly to UTF-8, 'Õ' for example. There may be others too. Characters like 'Ä', 'Ö' and 'Ü' are OK. This is the problem. Maybe ODBC or the driver uses some machine culture info or something similar that messes everything up.

Is it possible to read strings from the database as binary? Maybe with functions like CONVERT or CAST? Or where could I find a reference for the SQL functions and syntax that work with this dBase driver or other drivers? I searched around and couldn't find anything. I feel so blind when using ODBC and SQL.

Right now I'm using a temporary hack that replaces all σ's with Õ's.
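
The single-character replacement can be generalised. If the second machine's driver decoded the ibm850 bytes with codepage 437 (an assumption, though it is consistent with the σ result shown further down), the original bytes can be recovered by re-encoding the text as cp437 and then decoding it as cp850. A minimal Python sketch of that round trip; `repair_mojibake` is a hypothetical helper name, not part of any library:

```python
def repair_mojibake(text, wrong="cp437", right="cp850"):
    """Undo a decode that was done with the wrong codepage.

    Assumes `text` came from decoding `right`-encoded bytes with `wrong`.
    Raises UnicodeEncodeError if `text` holds characters `wrong` cannot encode.
    """
    raw = text.encode(wrong)   # recover the bytes as stored in the file
    return raw.decode(right)   # decode them with the intended codepage


print(repair_mojibake("σ"))  # the mis-decoded character from the question
```

The same idea should carry over to C# with `Encoding.GetEncoding(437)` and `Encoding.GetEncoding(850)`, provided both codepages are installed on the machine.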

Thanks!

Example code:

using System.Data.Odbc;
using System.IO;
using System.Text;

using (OdbcConnection oConn = new OdbcConnection())
{
    oConn.ConnectionString = @"Driver={Microsoft dBase Driver (*.dbf)};DriverID=277;Dbq=" + dbPath + ";";
    oConn.Open();

    OdbcCommand oCmd = oConn.CreateCommand();
    oCmd.CommandText = @"SELECT name FROM " + dbPath + "TABLE.DBF";

    using (OdbcDataReader reader = oCmd.ExecuteReader())
    {
        // Only dump the value if the query returned at least one row.
        if (reader.Read())
        {
            // GetString returns text the driver has already decoded; re-encode it as UTF-8.
            byte[] buf = Encoding.UTF8.GetBytes(reader.GetString(0));
            using (BinaryWriter writer = new BinaryWriter(File.Open(@"C:\DBF\Test.txt", FileMode.Create)))
            {
                writer.Write(buf);  // the using block flushes and closes the file
            }
        }
    }
}

Results:

E5 in dbf (Õ in 850)

Test.txt on pc1: C3 95 (Õ in UTF-8)

Test.txt on pc2: CF 83 (σ in UTF-8)

Recommended answer

If you are still having a problem with these files, I may be able to help you.

What is in the "codepage byte" aka "language driver id" (LDID) at offset 29 (decimal) in the file?
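
Checking that byte takes only a few lines. The sketch below is a rough illustration, not the reader mentioned in this answer: `read_ldid`, `ldid_from_file` and the three-entry `LDID_TO_CODEC` table are hypothetical names, and the table is deliberately partial. A value of 0 may simply mean that no codepage was recorded in the file.

```python
# Partial, illustrative mapping from LDID byte to Python codec name.
LDID_TO_CODEC = {
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
}


def read_ldid(header):
    """Return the language driver ID: the byte at offset 29 of the DBF header."""
    if len(header) < 32:
        raise ValueError("a DBF header is at least 32 bytes long")
    return header[29]


def ldid_from_file(path):
    """Read the fixed 32-byte header of a .dbf file and return its LDID byte."""
    with open(path, "rb") as f:
        return read_ldid(f.read(32))
```

For example, `LDID_TO_CODEC.get(ldid_from_file("TABLE.DBF"), "unknown")` would report the codec the file claims to use, if it is one of the three listed here.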

I have a Python-based DBF reader which can read just about any field data type and just about any codepage -- it has a long list of mappings from codepage byte to codepage number, compiled from various sources. The options are: (1) believe the LDID and deliver Unicode; (2) ignore the LDID and deliver undecoded bytes; (3) override the LDID and decode with a specific codepage into Unicode. The Unicode can of course then be encoded into UTF-8.
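
To make the three options concrete, here is a heavily simplified sketch. It is not the reader described above: `read_dbf`, its `mode` parameter and the partial `LDID_TO_CODEC` table are hypothetical, every field is treated as a character field, and there is no validation beyond the basics.

```python
import struct

# Partial, illustrative mapping from LDID byte to Python codec name.
LDID_TO_CODEC = {0x01: "cp437", 0x02: "cp850", 0x03: "cp1252"}


def read_dbf(data, mode="trust", codec=None):
    """Return the records of a DBF image as a list of {field name: value}.

    mode="trust":    decode with the codec implied by the LDID byte (offset 29)
    mode="raw":      ignore the LDID and return undecoded bytes
    mode="override": ignore the LDID and decode with the given `codec`
    """
    if mode == "trust":
        codec = LDID_TO_CODEC.get(data[29])
        if codec is None:
            raise ValueError("unknown LDID 0x%02X" % data[29])
    elif mode == "override":
        if codec is None:
            raise ValueError("mode='override' needs a codec")
    elif mode != "raw":
        raise ValueError("unknown mode %r" % mode)

    # Header: record count (uint32), header length and record length (uint16 each).
    n_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)

    # Field descriptors: 32 bytes each from offset 32, terminated by 0x0D.
    fields = []
    pos = 32
    while data[pos] != 0x0D:
        name = data[pos:pos + 11].split(b"\x00")[0].decode("ascii")
        fields.append((name, data[pos + 16]))  # byte 16 holds the field length
        pos += 32

    rows = []
    for i in range(n_records):
        start = header_len + i * record_len
        rec = data[start:start + record_len]
        if rec[0:1] == b"*":  # first byte is the deletion flag
            continue
        offset, row = 1, {}
        for name, length in fields:
            raw = rec[offset:offset + length].rstrip(b" ")
            row[name] = raw if mode == "raw" else raw.decode(codec)
            offset += length
        rows.append(row)
    return rows
```

With the file contents loaded as bytes, `read_dbf(data, mode="override", codec="cp850")` would correspond to option (3) for the files in the question.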

The DBF reader also does a whole lot of reasonableness cross-checks, which may help in investigating why VFP thinks the file is corrupt.

How do you know that it's using IBM850? Another piece of Python code that I have is a prototype encoding detector. Unlike detectors such as 'chardet', which are derived from Mozilla code, it is not web-centric and can happily recognise most old DOS codepages -- this may help.

An observation: the Greek letter lowercase sigma (σ) is 0xE5 in codepage 437, which was succeeded by codepage 850 -- "pc2" seems a little outdated ...
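
This is easy to check with Python's built-in `cp437` and `cp850` codecs. The helper name `utf8_hex` is made up for this illustration:

```python
def utf8_hex(raw, codec):
    """Decode `raw` with `codec` and return (text, UTF-8 bytes as hex)."""
    text = raw.decode(codec)
    return text, " ".join("%02X" % b for b in text.encode("utf-8"))


print(utf8_hex(b"\xE5", "cp850"))  # ('Õ', 'C3 95'), as on pc1
print(utf8_hex(b"\xE5", "cp437"))  # ('σ', 'CF 83'), as on pc2

# Ä, Ö and Ü sit at the same byte values in both codepages.
print(b"\x8E\x99\x9A".decode("cp437") == b"\x8E\x99\x9A".decode("cp850"))
```

The last line would also explain why 'Ä', 'Ö' and 'Ü' came out fine on both machines: a cp437 decode and a cp850 decode agree on those bytes.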

If you think I can be of any help, feel free to e-mail me at insert_punctuation("sjmachin", "lexicon", "net")
