Converting a PostgreSQL database from SQL_ASCII, containing mixed encoding types, to UTF-8
Question
I have a PostgreSQL database that I would like to convert to UTF-8.
The problem is that it is currently SQL_ASCII, so it hasn't been doing any kind of encoding conversion on its input, and as a result the tables now hold data in a mix of encodings. One row might contain values encoded as UTF-8, another might be ISO-8859-x, or Windows-125x, etc.
This has made it difficult to dump the database and convert the dump to UTF-8 with the intention of importing it into a fresh UTF-8 database. If the data were all in one encoding, I could just run the dump file through iconv, but I don't think that approach works here.
Does the problem fundamentally come down to knowing how each piece of data is encoded? Here, where that is not known, can it be worked out, or even guessed? Ideally I'd love a script which would take a file, any file, and spit out valid UTF-8.
Recommended answer
This is exactly the problem that Encoding::FixLatin was written to solve [*].
If you install the Perl module then you'll also get the fix_latin command-line utility, which you can use like this:
pg_restore -O dump_file | fix_latin | psql -d database
Read the 'Limitations' section of the documentation to understand how it works.
[*] Note I'm assuming that when you say ISO-8859-x you mean ISO-8859-1 and when you say Windows-125x you mean CP1252, because the mix of ASCII, UTF-8, Latin-1 and WinLatin-1 is a common case. But if you really do have a mixture of eastern and western encodings then sorry, but you're screwed :-(
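
For anyone curious what this kind of repair looks like, here is a minimal Python sketch of the general heuristic under the same assumption as the footnote (only ASCII, UTF-8, Latin-1 and CP1252 are present). It is not the module's actual implementation; the function names `fix_mixed_latin`, `_utf8_char_at` and `_latin_char` are made up for illustration. The idea: walk the bytes, keep anything that is already a well-formed UTF-8 sequence, and treat every other high byte as a single CP1252/Latin-1 character.

```python
def _utf8_char_at(data: bytes, i: int):
    """Return (char, width) if a well-formed multi-byte UTF-8 sequence
    starts at offset i, otherwise None."""
    for width in (2, 3, 4):
        chunk = data[i:i + width]
        if len(chunk) == width:
            try:
                # Strict decoding rejects truncated or malformed sequences.
                return chunk.decode("utf-8"), width
            except UnicodeDecodeError:
                pass
    return None


def _latin_char(byte: int) -> str:
    """Decode one stray high byte as CP1252, falling back to Latin-1 for the
    few byte values (such as 0x81) that CP1252 leaves undefined."""
    raw = bytes([byte])
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def fix_mixed_latin(data: bytes) -> str:
    """Best-effort decode of bytes mixing ASCII, UTF-8, Latin-1 and CP1252.
    (Hypothetical helper; assumes no other encodings are present.)"""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(chr(byte))
            i += 1
            continue
        hit = _utf8_char_at(data, i)
        if hit is not None:
            # Already valid UTF-8: keep it as-is.
            char, width = hit
            out.append(char)
            i += width
        else:
            # Not valid UTF-8 here: assume a single-byte Western character.
            out.append(_latin_char(byte))
            i += 1
    return "".join(out)
```

To use something like this on a plain-text dump, one could read `sys.stdin.buffer` line by line, pass each line through `fix_mixed_latin`, and write the result encoded as UTF-8. Note the inherent ambiguity: Latin-1 text that happens to form a valid UTF-8 sequence (for example the two characters "Ã©") will be read as UTF-8, so the result is a guess and not a guarantee.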