Postgres upper function on Turkish character does not return expected result


Question

It looks like the Postgres upper/lower functions do not handle certain characters in the Turkish character set.

select upper('Aaı'), lower('Aaİ') from mytable;

returns:

AAı, aaİ

instead of:

AAI, aai

Note that normal English characters are converted correctly, but the Turkish I (lower or upper case) is not.
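For reference, the expected values follow Turkish casing rules, which differ from the default Unicode case mappings. The small Python sketch below is added for illustration only; Python's `str.upper`/`str.lower` always apply the default, locale-independent mappings, so they show where a Turkish-unaware mapping already agrees with the expected result and where it does not.

```python
import unicodedata


def describe(s: str) -> str:
    """List the code points of s, e.g. 'U+0061 LATIN SMALL LETTER A'."""
    return ", ".join(f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s)


# Default (locale-independent) Unicode casing; no Turkish locale is consulted.
upper_result = "Aaı".upper()  # dotless ı maps to plain ASCII I
lower_result = "Aaİ".lower()  # İ maps to i followed by U+0307 COMBINING DOT ABOVE

print(upper_result)  # AAI
print(describe(lower_result))
```

So the default mapping happens to turn ı into I, but lowering İ yields i plus a combining dot rather than the plain i the question expects. Getting aai (and I → ı, i → İ) needs the Turkish-specific rules that the database locale is supposed to supply.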

Postgres version: 9.2, 32-bit

Database encoding (same result with any of these): UTF-8, WIN1254, C

Client encoding: UTF-8, WIN1254, C

Operating system: Windows 7 Enterprise Edition, 64-bit

On a UTF-8 encoded database, the SQL functions lower and upper return ı and İ unchanged, as the following bytes:

\xc4b1    
\xc4b0   

and the following on a WIN1254 (Turkish) encoded database:

\xfd      
\xdd     
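These byte values are simply the stored encodings of ı and İ, which confirms that the functions handed the input back untouched. A quick Python check (added for illustration; `cp1254` is Python's codec name for Windows code page 1254, i.e. WIN1254, and `pg_hex` is a made-up helper) reproduces them in PostgreSQL's `\x` hex style:

```python
def pg_hex(ch: str, encoding: str) -> str:
    """Return the bytes of ch in PostgreSQL's bytea hex style, e.g. '\\xc4b1'."""
    return "\\x" + ch.encode(encoding).hex()


# U+0131 LATIN SMALL LETTER DOTLESS I, U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
for ch in ("\u0131", "\u0130"):
    print(ch, pg_hex(ch, "utf-8"), pg_hex(ch, "cp1254"))
```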

I hope my investigation is wrong and there is something I missed.

Answer

Your problem is 100% Windows (or, to be more precise, Microsoft Visual Studio, which PostgreSQL was built with).

For the record, SQL UPPER ends up calling Windows' LCMapStringW (via towupper, via str_toupper) with almost all the right parameters (locale 1055, Turkish, for a UTF-8 encoded Turkish_Turkey database), but the Visual Studio runtime (towupper) does not set the LCMAP_LINGUISTIC_CASING bit in LCMapStringW's dwMapFlags. (I can confirm that setting it does the trick.) This is not considered a bug at Microsoft; it is by design, and will probably never be "fixed" (oh, the joys of legacy).
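To make the missing bit concrete, here is a small Python sketch. The LCMAP_* values are the ones defined in the Windows SDK header winnls.h (worth double-checking against your own SDK). OR-ing LCMAP_LINGUISTIC_CASING into the lower- or upper-case flag and writing the result little-endian gives exactly the `00 01 00 00` → `00 01 00 01` and `00 02 00 00` → `00 02 00 01` changes in the DLL patch described later in this answer. Reading the preceding `B8` and `68` bytes there as x86 `mov eax, imm32` and `push imm32` is an interpretation, not something the answer states.

```python
import struct

# Flag values as defined in winnls.h.
LCMAP_LOWERCASE = 0x00000100
LCMAP_UPPERCASE = 0x00000200
LCMAP_LINGUISTIC_CASING = 0x01000000


def le_bytes(value: int) -> str:
    """Format a 32-bit value as it is stored in the DLL: little-endian hex bytes."""
    return struct.pack("<I", value).hex(" ").upper()


for name, flag in (("LCMAP_LOWERCASE", LCMAP_LOWERCASE),
                   ("LCMAP_UPPERCASE", LCMAP_UPPERCASE)):
    patched = flag | LCMAP_LINGUISTIC_CASING  # what towupper/towlower would need
    print(f"{name}: {le_bytes(flag)} -> {le_bytes(patched)}")
```

This prints `00 01 00 00 -> 00 01 00 01` for the lower-case flag and `00 02 00 00 -> 00 02 00 01` for the upper-case flag.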

You have three workarounds:

  • implement @Sorrow's wrapper solution (or write your own native replacement function as a DLL);
  • run your PostgreSQL instance on, e.g., Ubuntu, which exhibits the right behaviour for Turkic locales (@Sorrow confirmed that it works for him); this is probably the simplest and cleanest way out;
  • drop a patched 32-bit MSVCR100.DLL into your PostgreSQL bin directory (but although UPPER and LOWER would then work, other things such as collation may continue to fail, again at the Windows level; YMMV).
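@Sorrow's wrapper itself is not reproduced here. As a rough illustration of the general idea only, a Turkish-aware wrapper can pre-map the four variants of the letter I before delegating to the generic case functions. The Python below is a hypothetical sketch (`tr_upper` and `tr_lower` are made-up names); inside PostgreSQL the same idea could be written as an SQL function that combines translate() with upper()/lower().

```python
# Turkish pairs for the letter I: dotted i <-> İ, dotless ı <-> I.
_TR_TO_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TR_TO_LOWER = str.maketrans({"I": "ı", "İ": "i"})


def tr_upper(s: str) -> str:
    """Upper-case s, applying Turkish rules for I first (hypothetical helper)."""
    return s.translate(_TR_TO_UPPER).upper()


def tr_lower(s: str) -> str:
    """Lower-case s, applying Turkish rules for I first (hypothetical helper)."""
    return s.translate(_TR_TO_LOWER).lower()


print(tr_upper("Aaı"), tr_lower("Aaİ"))  # AAI aai
```

Pre-mapping works because the generic functions leave İ, I, ı and i alone once they are already in the target case.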

For completeness (and nostalgic fun) ONLY, here is the procedure to patch a Windows system. Remember, though, that unless you will be managing this PostgreSQL instance from cradle to grave, you may cause a lot of grief to your successor(s): whenever a new test or backup system is deployed from scratch, you or your successor(s) would have to remember to apply the patch again, and if you one day upgrade to, say, PostgreSQL 10, which might use MSVCR120.DLL instead of MSVCR100.DLL, you will have to try your luck patching the new DLL, too. On a test system:

  • use HxD to open C:\WINDOWS\SYSTEM32\MSVCR100.DLL
  • right away, save the DLL under the same name in your PostgreSQL bin directory (do not attempt to copy the file using Explorer or the command line; they might copy the 64-bit version)
  • with the file still open in HxD, go to Search > Replace, pick Datatype: Hexvalues, then
    • search for...... 4E 14 33 DB 3B CB 0F 84 41 12 00 00 B8 00 01 00 00
    • replace with... 4E 14 33 DB 3B CB 0F 84 41 12 00 00 B8 00 01 00 01
    • ...then once more...
    • search for...... FC 51 6A 01 8D 4D 08 51 68 00 02 00 00 50 E8 E2
    • replace with... FC 51 6A 01 8D 4D 08 51 68 00 02 00 01 50 E8 E2
  • save the patched file
  • if your query still does not work (make sure your database is UTF-8 encoded with Turkish_Turkey for both LC_CTYPE and LC_COLLATE), open postgres.exe in the 32-bit Dependency Walker and make sure it indicates that it loads MSVCR100.DLL from the PostgreSQL bin directory.
  • if everything works, copy the patched DLL to the production PostgreSQL bin directory and restart.
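If you would rather script the HxD search-and-replace step, it can be sketched in Python as below. This is a hypothetical helper, not part of the original procedure: it only touches the copy you already saved into the PostgreSQL bin directory, and it refuses to change anything unless each search pattern occurs exactly once, which guards against patching an unexpected DLL build.

```python
from pathlib import Path

# The two search/replace pairs from the HxD procedure above.
PATCHES = [
    (bytes.fromhex("4E 14 33 DB 3B CB 0F 84 41 12 00 00 B8 00 01 00 00"),
     bytes.fromhex("4E 14 33 DB 3B CB 0F 84 41 12 00 00 B8 00 01 00 01")),
    (bytes.fromhex("FC 51 6A 01 8D 4D 08 51 68 00 02 00 00 50 E8 E2"),
     bytes.fromhex("FC 51 6A 01 8D 4D 08 51 68 00 02 00 01 50 E8 E2")),
]


def apply_patches(data: bytes) -> bytes:
    """Return data with each pattern replaced; raise unless each occurs exactly once."""
    for search, replace in PATCHES:
        count = data.count(search)
        if count != 1:
            raise ValueError(f"expected 1 match for {search.hex(' ')}, found {count}")
        data = data.replace(search, replace)
    return data


def patch_file(path: Path) -> None:
    """Patch the DLL copy at path in place (hypothetical helper)."""
    path.write_bytes(apply_patches(path.read_bytes()))
```

For example, `patch_file(Path(r"C:\path\to\PostgreSQL\bin\MSVCR100.DLL"))`, with the placeholder path adjusted to your installation. It replaces only the manual Search > Replace and save steps, not the 32-bit copy or the Dependency Walker check.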

BUT REMEMBER: the moment you move the data off the Ubuntu system, or off the patched Windows system, to an unpatched Windows system, you will have the problem again, and you may be unable to import this data back on Ubuntu if the Windows instance introduced duplicates in a citext field or in an UPPER/LOWER-based function index.
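To see how such duplicates arise, consider a unique index on lower(name). The Python sketch below is illustrative only: `unpatched_windows_lower` models the behaviour reported in the question (ı and İ pass through unchanged while ASCII I still becomes i), and `turkish_lower` models a correctly behaving Turkish locale. Both functions and the sample rows are hypothetical. Two values that get distinct keys under the first rule share one key under the second, so re-importing them would violate the index.

```python
def unpatched_windows_lower(s: str) -> str:
    """Model of the reported behaviour: ı and İ are left unchanged."""
    return "".join(c if c in "ıİ" else c.lower() for c in s)


def turkish_lower(s: str) -> str:
    """Turkish rules: I -> ı and İ -> i, everything else lower-cased normally."""
    return s.translate(str.maketrans({"I": "ı", "İ": "i"})).lower()


def lower_key_collisions(values, lower):
    """Return {key: [values]} for every lower() key shared by more than one value."""
    groups = {}
    for v in values:
        groups.setdefault(lower(v), []).append(v)
    return {k: vs for k, vs in groups.items() if len(vs) > 1}


names = ["KIRMIZI", "kırmızı"]  # hypothetical rows of the indexed column
print(lower_key_collisions(names, unpatched_windows_lower))  # {}
print(lower_key_collisions(names, turkish_lower))
```

Under the unpatched rule the keys are `kirmizi` and `kırmızı`, so both rows are accepted; under Turkish rules both become `kırmızı`.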
