OdbcConnection returning Chinese Characters as "?"


Problem description

I have an Oracle database that stores some data values in Simplified Chinese. I have created an ASP.NET MVC C# web page that is supposed to display this information. I am using an OdbcConnection to retrieve the data; however, when I run my da.Fill(t) command, the values come back as "?".

        OdbcCommand cmd = new OdbcCommand();
        cmd.CommandText = select;

        OdbcConnection SqlConn = new OdbcConnection("Driver={Oracle in instantclient_11_2};Dbq=Database;Uid=Username;pwd=password;");
        DataTable t = new DataTable();
        cmd.Connection = SqlConn;

        SqlConn.Open();
        OdbcDataAdapter da = new OdbcDataAdapter(cmd);
        SqlConn.Close();
        da.Fill(t);
        return t;

t has the data, but everything that is supposed to be Chinese characters is just a series of "?????".

Recommended answer

Problems with character sets are quite common, so let me try to give some general notes.

In principle, you have to consider four different character set settings.

1 and 2: NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET

Example: AL32UTF8

They are defined only on your database. You can query them with:

    SELECT * 
    FROM V$NLS_PARAMETERS 
    WHERE PARAMETER IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');

These settings define which characters (in which format) can be stored in your database - no more, no less. Changing them on an existing database requires some effort (see Character Set Migration and/or Oracle Database Migration Assistant for Unicode).

3: NLS_LANG

Example: AMERICAN_AMERICA.AL32UTF8

This value is defined only on your client. NLS_LANG has nothing to do with the ability to store characters in a database. It is used to let Oracle know which character set you are using on the client side. When you set the NLS_LANG value (for example to AL32UTF8), you merely tell the Oracle database "my client uses character set AL32UTF8" - it does not necessarily mean that your client really is using AL32UTF8! (see #4 below)

NLS_LANG can be defined by the environment variable NLS_LANG or in the Windows Registry at HKLM\SOFTWARE\Wow6432Node\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for 32-bit) or HKLM\SOFTWARE\ORACLE\KEY_%ORACLE_HOME_NAME%\NLS_LANG (for 64-bit). Depending on your application there might be other ways to specify NLS_LANG, but let's stick to the basics. If no NLS_LANG value is provided, Oracle defaults it to AMERICAN_AMERICA.US7ASCII.

The format of NLS_LANG is NLS_LANG=language_territory.charset. The {charset} part of NLS_LANG is not shown in any system table or view. All components of the NLS_LANG definition are optional, so the following definitions are all valid: NLS_LANG=.WE8ISO8859P1, NLS_LANG=_GERMANY, NLS_LANG=AMERICAN, NLS_LANG=ITALIAN_.WE8MSWIN1252, NLS_LANG=_BELGIUM.US7ASCII.
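The optional components can be made concrete with a small parser. The following Python sketch is only an illustration: `parse_nls_lang`, `NlsLang` and `DEFAULT_NLS_LANG` are hypothetical names, not part of any Oracle tool. It only splits the string and does not check the parts against Oracle's list of valid values.

```python
from typing import NamedTuple, Optional

# Default quoted in the text above for an unset NLS_LANG.
DEFAULT_NLS_LANG = "AMERICAN_AMERICA.US7ASCII"


class NlsLang(NamedTuple):
    language: Optional[str]
    territory: Optional[str]
    charset: Optional[str]


def parse_nls_lang(value: Optional[str]) -> NlsLang:
    """Split 'language_territory.charset'; every component may be missing."""
    if not value:
        value = DEFAULT_NLS_LANG
    # The character set is whatever follows the first dot.
    head, _, charset = value.partition(".")
    # The territory is whatever follows the first underscore before the dot.
    language, _, territory = head.partition("_")
    return NlsLang(language or None, territory or None, charset or None)


for example in (".WE8ISO8859P1", "_GERMANY", "AMERICAN",
                "ITALIAN_.WE8MSWIN1252", "_BELGIUM.US7ASCII"):
    print(example, "->", parse_nls_lang(example))
```

Passing `os.environ.get("NLS_LANG")` to the function would show what the environment variable contributes; the Windows Registry locations are not consulted by this sketch.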

As stated above, the {charset} part of NLS_LANG is not available in the database in any system table/view or through any function. Strictly speaking this is true; however, you can run this query:

SELECT DISTINCT CLIENT_CHARSET
FROM V$SESSION_CONNECT_INFO
WHERE (SID, SERIAL#) = (SELECT SID, SERIAL# FROM v$SESSION WHERE AUDSID = USERENV('SESSIONID'));

It should return the character set from your current NLS_LANG setting. However, in my experience the value is often NULL or Unknown, i.e. not reliable.

Find more very useful information here: NLS_LANG FAQ

Note that some technologies do not use NLS_LANG, so settings there have no effect, for example:

  • ODP.NET Managed Driver is not NLS_LANG sensitive. It is only .NET locale sensitive. (see Data Provider for .NET Developer's Guide)

  • OraOLEDB (from Oracle) always uses UTF-16 (see OraOLEDB Provider Specific Features)

  • Java based JDBC (for example SQL Developer) has its own methods to deal with character sets (see Database JDBC Developer's Guide - Globalization Support for further details)

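4: The "real" character set of your terminal, your application, or the encoding of your .sql files

Example: UTF-8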

If you work in a Windows terminal (e.g. with SQL*Plus), you can query the code page with the command chcp; on Unix/Linux the equivalent is locale charmap or echo $LANG. You can get a list of all Windows code page identifiers from here: Code Page Identifiers. Note that there are some issues with UTF-8 (chcp 65001), see this discussion.
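For applications rather than terminals, the same question ("which character set am I really using?") can be asked from inside the process. As a rough illustration only, this is how a Python program could report it; `terminal_encodings` is a hypothetical helper name, and the values are what Python sees, which is not necessarily what an Oracle client library assumes.

```python
import codecs
import locale
import sys


def terminal_encodings() -> dict:
    """Return the encodings Python sees for this process."""
    return {
        # Encoding used when printing to the terminal (None if not attached).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding the operating system prefers for text data.
        "preferred": locale.getpreferredencoding(False),
    }


def canonical_name(encoding):
    """Normalise aliases such as 'UTF8' and 'utf-8'; 'unknown' if not recognised."""
    if not encoding:
        return "unknown"
    try:
        return codecs.lookup(encoding).name
    except LookupError:
        return "unknown"


for name, encoding in terminal_encodings().items():
    print(f"{name}: {encoding} (codec: {canonical_name(encoding)})")
```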

If you work with .sql files and an editor like TOAD or SQL Developer, you have to check the save options. Usually you can choose values like UTF-8, ANSI, ISO-8859-1, etc. ANSI means the Windows ANSI code page, typically CP1252; you can check it in your Registry at HKLM\SYSTEM\ControlSet001\Control\Nls\CodePage\ACP or here: National Language Support (NLS) API Reference

[Microsoft removed this reference; take it from the web archive: National Language Support (NLS) API Reference]

The most important point is to match NLS_LANG with the "real" character set of your terminal or application, or with the encoding of your .sql files.

Some common pairs are:

  • CP850 -> WE8PC850

  • CP1252 or ANSI (in case of "Western" PC) -> WE8MSWIN1252

  • ISO-8859-1 -> WE8ISO8859P1

  • ISO-8859-15 -> WE8ISO8859P15

  • UTF-8 -> AL32UTF8
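As a rough sketch, the pairs above can be kept in a lookup table so that an application can turn the encoding it actually uses into the matching NLS_LANG character set. The names `ORACLE_CHARSET_BY_ENCODING` and `oracle_charset_for` are made up for this illustration, and the table contains only the five pairs listed here.

```python
import codecs
from typing import Optional

# Pairs copied from the list above.
ORACLE_CHARSET_BY_ENCODING = {
    "cp850": "WE8PC850",
    "cp1252": "WE8MSWIN1252",
    "iso-8859-1": "WE8ISO8859P1",
    "iso-8859-15": "WE8ISO8859P15",
    "utf-8": "AL32UTF8",
}

# Re-key the table by Python's canonical codec name so aliases resolve too.
_BY_CODEC = {
    codecs.lookup(name).name: oracle
    for name, oracle in ORACLE_CHARSET_BY_ENCODING.items()
}


def oracle_charset_for(encoding: str) -> Optional[str]:
    """Return the Oracle character set name for an encoding, if listed."""
    try:
        canonical = codecs.lookup(encoding).name
    except LookupError:
        return None  # Python does not know this encoding at all
    return _BY_CODEC.get(canonical)


for encoding in ("CP1252", "UTF8", "ISO-8859-15", "gb18030"):
    print(encoding, "->", oracle_charset_for(encoding))
```

Encodings missing from the table (such as gb18030 in the loop) yield `None` rather than a guess.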

Or run this query to get some more:

SELECT VALUE AS ORACLE_CHARSET, UTL_I18N.MAP_CHARSET(VALUE) AS IANA_NAME
FROM V$NLS_VALID_VALUES
WHERE PARAMETER = 'CHARACTERSET';

Some technologies make your life easier: for example, ODP.NET (unmanaged driver) and the ODBC driver from Oracle automatically inherit the character set from the NLS_LANG value, so the condition above is always true.

Is it required to set the client NLS_LANG value equal to the database NLS_CHARACTERSET value?

No, not necessarily! For example, if you have the database character set NLS_CHARACTERSET=AL32UTF8 and the client character set NLS_LANG=.ZHS32GB18030, then it will work without any problem (provided your client really uses GB18030), although these character sets are completely different. GB18030 is a character set commonly used for Chinese; like UTF-8, it supports all Unicode characters.
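That two completely different character sets can carry the same text is easy to check outside Oracle. The sketch below uses Python's own `gb18030` and `utf-8` codecs as stand-ins for ZHS32GB18030 and AL32UTF8; the helper name `survives_round_trip` and the sample string are chosen only for this illustration.

```python
def survives_round_trip(text: str, encoding: str) -> bool:
    """True if the text can be encoded and decoded again without any loss."""
    return text.encode(encoding).decode(encoding) == text


sample = "汉字 € العربية"

utf8_bytes = sample.encode("utf-8")
gb18030_bytes = sample.encode("gb18030")

# The byte sequences differ ...
print(utf8_bytes != gb18030_bytes)
# ... but each encoding reproduces exactly the same characters.
print(survives_round_trip(sample, "utf-8"), survives_round_trip(sample, "gb18030"))
```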

If you have, for example, NLS_CHARACTERSET=AL32UTF8 and NLS_LANG=.WE8ISO8859P1, it will also work (again, provided your client really uses ISO-8859-1). However, the database may store characters which your client is not able to display; in that case the client will display a placeholder (e.g. ¿) instead.
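The placeholder effect is the same symptom as the "?????" in the question: text is converted into a character set that has no slot for some characters. A minimal Python analogy follows, with the caveat that Python substitutes "?" while the placeholder Oracle uses depends on the character sets involved; `as_seen_by_client` is a hypothetical helper name.

```python
def as_seen_by_client(text: str, client_encoding: str) -> str:
    """Convert text to the client encoding, substituting what does not fit."""
    # errors="replace" swaps every unencodable character for "?".
    data = text.encode(client_encoding, errors="replace")
    return data.decode(client_encoding)


stored = "Zürich 汉字"

# ISO-8859-1 has "ü" but no Chinese characters.
print(as_seen_by_client(stored, "iso-8859-1"))
# 7-bit ASCII (compare the US7ASCII default) has neither.
print(as_seen_by_client(stored, "ascii"))
# A Unicode-capable client encoding loses nothing.
print(as_seen_by_client(stored, "utf-8"))
```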

Anyway, it is beneficial to have matching NLS_LANG and NLS_CHARACTERSET values, if suitable. If they are equal, you can be sure that any character which may be stored in the database can also be displayed, and that any character you enter in your terminal or write in your .sql file can also be stored in the database and is not substituted by a placeholder.

Very often you can read advice like "The NLS_LANG character set must be the same as your database character set" (also here on SO). This is simply not true; it is a popular myth!

Here is the proof:

C:\>set NLS_LANG=.AL32UTF8

C:\>sqlplus ...

SQL> SET SERVEROUTPUT ON
SQL> DECLARE
  2  CharSet VARCHAR2(20);
  3  BEGIN
  4     SELECT VALUE INTO Charset FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
  5     DBMS_OUTPUT.PUT_LINE('Database NLS_CHARACTERSET is '||Charset);
  6     IF UNISTR('\20AC') = '€' THEN
  7             DBMS_OUTPUT.PUT_LINE ( '"€" is equal to U+20AC' );
  8     ELSE
  9             DBMS_OUTPUT.PUT_LINE ( '"€" is not the same as U+20AC' );
 10     END IF;
 11  END;
 12  /

Database NLS_CHARACTERSET is AL32UTF8
"€" is not the same as U+20AC

PL/SQL procedure successfully completed.

Both the client and the database character set are AL32UTF8, yet the characters do not match. The reason is that my cmd.exe, and thus also SQL*Plus, uses Windows CP1252. Therefore I must set NLS_LANG accordingly:

C:\>chcp
Active code page: 1252

C:\>set NLS_LANG=.WE8MSWIN1252

C:\>sqlplus ...

SQL> SET SERVEROUTPUT ON
SQL> DECLARE
  2  CharSet VARCHAR2(20);
  3  BEGIN
  4     SELECT VALUE INTO Charset FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
  5     DBMS_OUTPUT.PUT_LINE('Database NLS_CHARACTERSET is '||Charset);
  6     IF UNISTR('\20AC') = '€' THEN
  7             DBMS_OUTPUT.PUT_LINE ( '"€" is equal to U+20AC' );
  8     ELSE
  9             DBMS_OUTPUT.PUT_LINE ( '"€" is not the same as U+20AC' );
 10     END IF;
 11  END;
 12  /

Database NLS_CHARACTERSET is AL32UTF8
"€" is equal to U+20AC

PL/SQL procedure successfully completed.
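What happens to the euro sign in the two runs can be imitated at the byte level. This is only a Python analogy of the mismatch, not a description of Oracle's internal processing: a CP1252 terminal sends "€" as the single byte 0x80, and the outcome depends on which character set that byte is declared to be in.

```python
euro = "\u20ac"  # the euro sign, U+20AC

# What a CP1252 terminal actually sends for the euro sign.
sent = euro.encode("cp1252")

# Declared as UTF-8 (compare NLS_LANG=.AL32UTF8): a lone 0x80 is not valid UTF-8.
misread = sent.decode("utf-8", errors="replace")

# Declared as CP1252 (compare NLS_LANG=.WE8MSWIN1252): the byte is read correctly.
correct = sent.decode("cp1252")

print(sent)             # the raw byte sent by the terminal
print(misread == euro)  # False: wrong declaration, wrong character
print(correct == euro)  # True: declaration matches the real encoding
```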

Also consider this example:

CREATE TABLE ARABIC_LANGUAGE (
    LANG_CHAR VARCHAR2(20), 
    LANG_NCHAR NVARCHAR2(20));

INSERT INTO ARABIC_LANGUAGE VALUES ('العربية', 'العربية');

If NLS_LANG really had to match the database character set, you would need to set two different values for NLS_LANG for a single statement (one for the VARCHAR2 column and one for the NVARCHAR2 column) - which is not possible.
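The point can be sketched in Python: the client sends the text once, in its single client character set, and it is then converted separately for each column. Every encoding in this sketch is an assumption made for illustration: `cp1256` stands in for an Arabic Windows client, `utf-8` for an AL32UTF8 database character set, and `utf-16-be` for an AL16UTF16 national character set.

```python
text = "العربية"

# The client sends one byte stream in its own character set (assumed cp1256).
client_bytes = text.encode("cp1256")

# The bytes are interpreted using the declared client character set ...
received = client_bytes.decode("cp1256")

# ... and converted once per column type.
varchar2_bytes = received.encode("utf-8")       # LANG_CHAR (assumed AL32UTF8)
nvarchar2_bytes = received.encode("utf-16-be")  # LANG_NCHAR (assumed AL16UTF16)

# Three different byte sequences, one and the same text.
print(len({client_bytes, varchar2_bytes, nvarchar2_bytes}))
print(varchar2_bytes.decode("utf-8") == nvarchar2_bytes.decode("utf-16-be") == text)
```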
