Decode &#55357; to real character
Question
When I read data from Twitter's Stream API and write it to an XML file, some special characters such as &#55357; cause an error (I mean that when I open the XML file in Chrome, Chrome reports an error at that character).
I want to convert that encoded sequence (&#55357;) into the real character before writing to the XML file.
How can I do that?
-------------ADDED--------------
This is the XML file content:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<text>@carlyraejepsen would be a dream if you follow me, please follow me?, I love you so much you're my inspiration</text>
<text>someone please bring me a caramel apple and a mocha from black cat. i'll love you forever</text>
<text>"@G_MartinFlyKick: Marry me Juliet.I love you and that's all I really know."����������</text>
<text>"I need to see a picture of him cuz Im trying to imagine you guys making love and all I see is u climbing on top of a big question mark"lmao</text>
<text>@District3music hi, I LOVE YOU follow me please? &lt;3 xx 23</text>
<text>RT @syardley_: So appreciative of my family and people I love, wouldn't be where I am without them. #thankful</text>
<text>#DISTRICT3HALLOWEENFOLLOWSPREE #DISTRICT3HALLOWEENFOLLOWSPREE #3EEKERFROMTHENETHERLANDS love you! Please follow ? @District3music x42</text>
<text>Arguably my favorite electronic music producer @Kluteuk is coming back to Toronto on Dec 22nd. So stoked. Guy has made so many tunes I LOVE.</text>
<text>The stakes are high, the water's rough, but this love is ours.</text>
<text>@NiallOfficial Answer me, I love you very much. Venezuela loves. jhgj</text>
<text>Love this shit http://t.co/qSP79NKx</text>
</root>
And here is the error from Chrome:
This page contains the following errors:
error on line 5 at column 91: xmlParseCharRef: invalid xmlChar value 55357
Below is a rendering of the page up to the first error.
Recommended answer
The character reference &#55357; denotes a surrogate code point (U+D83D), so it would be wrong to try to convert it to a character. It is not a character, not even half a character.
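To see why the parser rejects the reference, it can help to check the number against the ranges XML 1.0 allows. The following is a minimal sketch; the function names are made up for illustration, and the ranges are the ones listed in the `Char` production of the XML 1.0 specification.

```python
def is_surrogate(code_point: int) -> bool:
    """Return True if code_point lies in the UTF-16 surrogate range."""
    return 0xD800 <= code_point <= 0xDFFF


def is_valid_xml_char(code_point: int) -> bool:
    """Check a code point against the Char production of XML 1.0."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


# 55357 is the decimal value from Chrome's error message.
print(hex(55357), is_surrogate(55357), is_valid_xml_char(55357))
```

Because surrogates are excluded from the allowed ranges, a reference to one is invalid no matter how the file itself is encoded.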
You need to trace back to the point where the reference was generated. The cause may be a character encoding mix-up. In UTF-16, surrogate code units may appear, but they must be handled in pairs when the data is interpreted as characters, for example when it is converted to another encoding or turned into character references.
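If the generating code cannot be fixed right away, one possible workaround is to repair the text just before it is written. The sketch below assumes the surrogates arrive as adjacent decimal references (high surrogate, then low surrogate); the helper names and the low surrogate value 56845 in the usage note are hypothetical, since the question only shows 55357. It is not a substitute for fixing the encoding handling upstream.

```python
import re

# Matches decimal character references such as &#55357;
_DECIMAL_REF = re.compile(r"&#(\d+);")


def combine_pair(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def repair_surrogate_refs(text: str) -> str:
    """Replace surrogate-pair references with the character they encode.

    Other references (for example &lt; or &#60;) are left untouched.
    A lone surrogate raises UnicodeDecodeError instead of being guessed at.
    """
    def to_code_unit(match: re.Match) -> str:
        value = int(match.group(1))
        if 0xD800 <= value <= 0xDFFF:
            return chr(value)  # temporary lone surrogate in the str
        return match.group(0)

    with_units = _DECIMAL_REF.sub(to_code_unit, text)
    # A round trip through UTF-16 joins each adjacent pair into one character.
    return with_units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
```

For example, `repair_surrogate_refs("I love you &#55357;&#56845;")` returns the text with the two references replaced by the single character U+1F60D, which a UTF-8 XML file can hold directly.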