Converting a PostgreSQL database from SQL_ASCII, containing mixed encoding types, to UTF-8

Problem description

I have a PostgreSQL database that I would like to convert to UTF-8.

The problem is that it is currently SQL_ASCII, so it hasn't been doing any kind of encoding conversion on its input, and the tables have ended up holding data in a mix of encodings. One row might contain values encoded as UTF-8, another might be ISO-8859-x, or Windows-125x, etc.

This has made it difficult to dump the database and convert the dump to UTF-8 so that it can be imported into a fresh UTF-8 database. If the data were all in one encoding, I could just run the dump file through iconv, but I don't think that approach works here.

Does the problem fundamentally come down to knowing how each piece of data is encoded? Here, where that is not known, can it be worked out, or even guessed? Ideally I'd love a script which would take a file, any file, and spit out valid UTF-8.

Recommended answer

This is exactly the problem that Encoding::FixLatin was written to solve*.

If you install the Perl module then you'll also get the fix_latin command-line utility which you can use like this:

pg_restore -O dump_file | fix_latin | psql -d database

Read the 'Limitations' section of the documentation to understand how it works.
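
To give a feel for the kind of heuristic involved, here is a minimal Python sketch. It is not the module's code (that is Perl), and the names to_utf8 and _fix are made up for illustration. It assumes, as the footnote below does, that the input is only ever a mix of ASCII, UTF-8, Latin-1 and CP1252: ASCII and well-formed UTF-8 sequences are passed through untouched, and any other high-bit byte is treated as a single CP1252 character (falling back to Latin-1 for the few bytes CP1252 leaves undefined).

```python
import re

# One well-formed UTF-8 multi-byte sequence (2, 3 or 4 bytes),
# or, failing that, any single byte with the high bit set.
# Alternatives are tried left to right, so valid UTF-8 wins.
_PATTERN = re.compile(
    rb"[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
    rb"|[\x80-\xFF]"
)


def _fix(match):
    chunk = match.group(0)
    if len(chunk) > 1:
        # Already a well-formed UTF-8 sequence: leave it alone.
        return chunk
    try:
        # Assumption: a stray high-bit byte is CP1252.
        return chunk.decode("cp1252").encode("utf-8")
    except UnicodeDecodeError:
        # Bytes CP1252 leaves undefined: fall back to Latin-1.
        return chunk.decode("latin-1").encode("utf-8")


def to_utf8(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8; plain ASCII bytes are never touched."""
    return _PATTERN.sub(_fix, raw)


# UTF-8 "é", a Latin-1 "ï" and CP1252 curly quotes in one byte string.
mixed = b"caf\xc3\xa9 na\xefve \x93hi\x94"
print(to_utf8(mixed).decode("utf-8"))
```

The weak spot of any approach like this is that it guesses per byte sequence: Latin-1 text that happens to contain two bytes forming a valid UTF-8 sequence will be taken as UTF-8, so spot-check the result before trusting it.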

[*] Note I'm assuming that when you say ISO-8859-x you mean ISO-8859-1 and when you say CP125x you mean CP1252 - because the mix of ASCII, UTF-8, Latin-1 and WinLatin-1 is a common case. But if you really do have a mixture of eastern and western encodings then sorry but you're screwed :-(
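
A small Python illustration of why that mixed case can't be untangled from the bytes alone. ISO-8859-5 (Cyrillic) is just my stand-in for an "eastern" encoding here; the same thing happens with other single-byte code pages.

```python
raw = b"\xe9"  # one high-bit byte, not valid UTF-8 on its own

as_western = raw.decode("iso-8859-1")   # Western European reading
as_cyrillic = raw.decode("iso-8859-5")  # Cyrillic reading (assumed example)

# Both decodes succeed, and they disagree about which character it is.
print(repr(as_western), repr(as_cyrillic))
```

Nothing in the byte itself says which reading was intended, so a converter would need outside knowledge (which column, which client, which language) to choose.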
