String concatenation containing Arabic and Western characters
Question
I'm trying to concatenate several strings that each contain both Arabic and Western characters (mixed within the same string). The result is most likely semantically correct, but it is displayed differently from what I want, because the Unicode Bidirectional Algorithm alters the order of the characters. Basically, I want to concatenate the strings as if they were all LTR, ignoring the fact that some are RTL: a sort of "agnostic" concatenation.
I'm not sure my explanation is clear, but I don't think I can put it any better.
I hope someone can help me.
Kind regards,
Carlos Ferreira
BTW, the strings are being obtained from the database.
Edit
The first two strings are the ones I want to concatenate, and the third is the result.
Edit 2
Actually, the concatenated string is a little different from the one in the image. It got altered during the copy and paste: the 1 comes after the first A, not immediately before the second A.
Recommended answer
You can embed bidi regions using Unicode format control code points:
- Left-to-right embedding (U+202A)
- Right-to-left embedding (U+202B)
- Pop directional formatting (U+202C)
So in Java, to embed an RTL language like Arabic in an LTR language like English, you would do
myEnglishString + "\u202B" + myArabicString + "\u202C" + moreEnglish
and to do the reverse
myArabicString + "\u202A" + myEnglishString + "\u202C" + moreArabic
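The two concatenations above can be wrapped in small helpers so the control characters are not scattered through the code. This is only a minimal sketch: the class name `BidiConcat`, the methods `embedLtr` and `embedRtl`, and the sample strings are made up for illustration and are not part of any library. Note that these characters only influence how a bidi-aware renderer displays the text; the stored order of the characters in the string is unchanged.

```java
class BidiConcat {
    // Directional formatting characters listed in the answer.
    static final char LRE = '\u202A'; // left-to-right embedding
    static final char RLE = '\u202B'; // right-to-left embedding
    static final char PDF = '\u202C'; // pop directional formatting

    // Hypothetical helper: mark s as an LTR run inside surrounding text.
    static String embedLtr(String s) {
        return LRE + s + PDF;
    }

    // Hypothetical helper: mark s as an RTL run inside surrounding text.
    static String embedRtl(String s) {
        return RLE + s + PDF;
    }

    public static void main(String[] args) {
        String english = "Item ";
        // An Arabic word written with escapes so the source stays ASCII.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";
        String moreEnglish = " A1";

        // Arabic embedded in English text.
        String result = english + embedRtl(arabic) + moreEnglish;

        // Dump the code points to confirm where the controls ended up.
        StringBuilder dump = new StringBuilder();
        for (int i = 0; i < result.length(); i++) {
            dump.append(String.format("U+%04X ", (int) result.charAt(i)));
        }
        System.out.println(dump.toString().trim());
    }
}
```

If the strings are later written back to the database or compared with other values, it may be worth stripping the three control characters first, since they are part of the string's content even though they are invisible.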
See Bidirectional General Formatting for more details, or the Unicode specification chapter on "Directional Formatting Codes" for the source material.