How to convert EBCDIC with Chinese characters to UTF-8 format


Question

I need to convert an EBCDIC file encoded with the IBM937 code page to UTF-8 so that it can be loaded into a multi-byte enabled DB2 database.

I have tried the Unix recode and iconv utilities. Neither of them was able to convert IBM937 to UTF-8. I'm looking for any utility (Java, Perl, Unix) that can do this on a Unix-based system. Can someone help me here?

SL

Recommended answer

Take a look at ICU (International Components for Unicode): http://site.icu-project.org/

It has a converter for IBM-937: http://demo.icu-project.org/icu-bin/convexp?conv=ibm-937_P110-1999&s=ALL
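Since the question also mentions Java, here is a minimal sketch of the conversion itself. Note that it does not use ICU: it relies on the JDK's own extended charset `x-IBM937`, which is an assumption about your runtime (a full JDK normally ships it, a trimmed-down runtime image may not). The class name `Ibm937ToUtf8` and its method names are made up for illustration. It also assumes the file is pure text; mainframe files that mix in binary or packed-decimal fields would need field-level handling that this sketch does not attempt.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of ICU or the JDK.
class Ibm937ToUtf8 {
    // Assumption: the JDK extended charsets are installed, so this name resolves.
    static final String CHARSET_NAME = "x-IBM937";

    // Convert an in-memory IBM937 byte sequence to UTF-8 bytes.
    // new String(...) silently substitutes U+FFFD for malformed input.
    static byte[] toUtf8(byte[] ebcdic) {
        Charset ibm937 = Charset.forName(CHARSET_NAME);
        String text = new String(ebcdic, ibm937);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Stream a whole file through the decoder so large files need not fit in memory.
    // Files.newBufferedReader throws an exception on malformed input instead of replacing it.
    static void convertFile(Path in, Path out) throws IOException {
        Charset ibm937 = Charset.forName(CHARSET_NAME);
        try (BufferedReader reader = Files.newBufferedReader(in, ibm937);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Ibm937ToUtf8 <input-ebcdic-file> <output-utf8-file>");
            return;
        }
        convertFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

If the JDK charset is missing or its mapping differs from what you need, ICU's own `ibm-937` converter is the alternative: ICU4J exposes its converters to `java.nio.charset` through a separate charset jar, and the ICU C distribution includes a `uconv` command-line tool. Whichever route you pick, it is worth checking a sample of the converted Chinese text before loading it into DB2.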

ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software. ICU is released under a nonrestrictive open source license that is suitable for use with both commercial software and with other open source or free software.

Here are a few highlights of the services provided by ICU:

  • Code Page Conversion: Convert text data to or from Unicode and nearly any other character set or encoding. ICU's conversion tables are based on charset data collected by IBM over the course of many decades, and are the most complete available anywhere.

  • Collation: Compare strings according to the conventions and standards of a particular language, region or country. ICU's collation is based on the Unicode Collation Algorithm plus locale-specific comparison rules from the Common Locale Data Repository, a comprehensive source for this type of data.

  • Formatting: Format numbers, dates, times and currency amounts according to the conventions of a chosen locale. This includes translating month and day names into the selected language, choosing appropriate abbreviations, ordering fields correctly, etc. This data also comes from the Common Locale Data Repository.

  • Time Calculations: Multiple types of calendars are provided beyond the traditional Gregorian calendar. A thorough set of timezone calculation APIs is provided.

  • Unicode Support: ICU closely tracks the Unicode standard, providing easy access to all of the many Unicode character properties, Unicode Normalization, Case Folding and other fundamental operations as specified by the Unicode Standard.

  • Regular Expression: ICU's regular expressions fully support Unicode while providing very competitive performance.

  • Bidi: Support for handling text containing a mixture of left to right (English) and right to left (Arabic or Hebrew) data.

  • Text Boundaries: Locate the positions of words, sentences, paragraphs within a range of text, or identify locations that would be suitable for line wrapping when displaying the text.

And much more. Refer to the ICU User Guide for details.
