Replacing a Unicode character in a UTF-8 file using Delphi 2010

Question

I am trying to replace a character (decimal value 197) in a UTF-8 file with another character (decimal value 65).

I can load the file and put it in a string (though I may not need to do that):

SS := TStringStream.Create(ParamStr1, TEncoding.UTF8);
SS.LoadFromFile(ParamStr1);
//S:= SS.DataString;
//ShowMessage(S);

However, how do I replace every 197 with a 65 and save the result back out as UTF-8?

 SS.SaveToFile(ParamStr2);
 SS.Free;

-------------- EDIT ----------------

reader:= TStreamReader.Create(ParamStr1, TEncoding.UTF8);
 writer:= TStreamWriter.Create(ParamStr2, False, TEncoding.UTF8);

 while not Reader.EndOfStream do
 begin
  S:= reader.ReadLine;
  for I:= 1 to Length(S)  do
  begin
   if Ord(S[I]) = 350 then
   begin
    Delete(S,I,1);
    Insert('A',S,I);
   end;
  end;
  writer.Write(S + #13#10);
 end;

 writer.Free;
 reader.Free;


Recommended answer

Decimal 197 is hex C5, and decimal 65 is hex 41.

C5 is not a valid UTF-8 sequence by itself (it can only appear as the lead byte of a two-byte sequence), but 41 is. So I have to assume you are actually referring to the Unicode code points U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE and U+0041 LATIN CAPITAL LETTER A instead.
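
As a quick check of the byte values discussed in this answer, the console sketch below dumps the UTF-8 encoding of a few code points. The program name and the DumpUtf8 helper are made up for illustration. It assumes a Unicode-enabled Delphi (2009 or later), where TEncoding is declared in SysUtils.

```pascal
program ShowUtf8Bytes;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: prints the UTF-8 bytes of S as hex pairs.
procedure DumpUtf8(const S: string);
var
  Bytes: TBytes;
  I: Integer;
begin
  Bytes := TEncoding.UTF8.GetBytes(S);
  for I := 0 to High(Bytes) do
    Write(IntToHex(Bytes[I], 2), ' ');
  Writeln;
end;

begin
  DumpUtf8(#$00C5); // U+00C5 (decimal 197) prints: C3 85
  DumpUtf8(#$0041); // U+0041 (decimal 65)  prints: 41
  DumpUtf8(#$015E); // U+015E (decimal 350, the value tested in the question's edit) prints: C5 9E
end.
```

The code points are written as character constants (#$00C5 and so on) so that the output does not depend on how the source file itself is encoded.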

U+00C5 is encoded in UTF-8 as C3 85, and U+0041 is encoded as 41. To do what you are asking, you have to decode the UTF-8, replace the code points, then re-encode the result as UTF-8. StringReplace() will work just fine for that, e.g.:

SS := TStringStream.Create('', TEncoding.UTF8);
SS.LoadFromFile(ParamStr1);

S := StringReplace(SS.DataString, 'Å', 'A', [rfReplaceAll]);

SS2 := TStringStream.Create(S, TEncoding.UTF8);
SS2.SaveToFile(ParamStr2);

SS2.Free;
SS.Free;

Or:

reader := TStreamReader.Create(ParamStr1, TEncoding.UTF8);
writer := TStreamWriter.Create(ParamStr2, False, TEncoding.UTF8);

while not Reader.EndOfStream do
begin
  S := reader.ReadLine;
  S := StringReplace(S, 'Å', 'A', [rfReplaceAll]);
  writer.WriteLine(S);
end;

writer.Free;
reader.Free;

Update: based on other comments, it looks like you are not actually interested in Unicode code point U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE, but rather in U+015E LATIN CAPITAL LETTER S WITH CEDILLA, which is encoded in UTF-8 as C5 9E. If that is true, then simply replace Å with Ş when calling StringReplace() after the UTF-8 data has been decoded:

S := StringReplace(S, 'Ş', 'A', [rfReplaceAll]);
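
Putting the pieces together, a complete console program might look like the sketch below. It is only one possible arrangement of the TStringStream approach shown above, not a definitive implementation. It assumes the input and output paths arrive as the first two command-line parameters; ReplaceCodepoint, ReplaceInUtf8File, OldChar and NewChar are hypothetical names introduced here. The character to replace is written as the constant #$015E rather than as a literal, so the result does not depend on the encoding in which the .pas file is saved.

```pascal
program ReplaceCodepoint;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  OldChar = #$015E; // U+015E LATIN CAPITAL LETTER S WITH CEDILLA (decimal 350)
  NewChar = 'A';    // U+0041 LATIN CAPITAL LETTER A (decimal 65)

// Hypothetical helper: decodes InFile as UTF-8, replaces every OldChar
// with NewChar, and writes the re-encoded text to OutFile.
procedure ReplaceInUtf8File(const InFile, OutFile: string);
var
  Src, Dst: TStringStream;
  S: string;
begin
  Src := TStringStream.Create('', TEncoding.UTF8);
  try
    Src.LoadFromFile(InFile);
    // DataString decodes the loaded bytes using the stream's encoding.
    S := StringReplace(Src.DataString, OldChar, NewChar, [rfReplaceAll]);
  finally
    Src.Free;
  end;

  // Creating the stream from S encodes the string back to UTF-8 bytes.
  Dst := TStringStream.Create(S, TEncoding.UTF8);
  try
    Dst.SaveToFile(OutFile);
  finally
    Dst.Free;
  end;
end;

begin
  if ParamCount < 2 then
  begin
    Writeln('Usage: ReplaceCodepoint <input> <output>');
    Halt(1);
  end;
  ReplaceInUtf8File(ParamStr(1), ParamStr(2));
end.
```

The try/finally blocks make sure both streams are freed even if loading or saving raises an exception. To target U+00C5 instead, change OldChar to #$00C5.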
