Decode a string encoded in UTF-8 format in Android

Problem description

I have a string that arrives via XML and contains German text. The German-specific characters are encoded in UTF-8. Before displaying the string, I need to decode it.

I have tried the following:

try {
    BufferedReader in = new BufferedReader(
            new InputStreamReader(
                    new ByteArrayInputStream(nodevalue.getBytes()), "UTF8"));
    event.attributes.put("title", in.readLine());
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

I have also tried this:

try {
    event.attributes.put("title", URLDecoder.decode(nodevalue, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Neither of them works. How do I decode the German string?

Thank you in advance.

UPDATE:

@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    // TODO Auto-generated method stub
    super.characters(ch, start, length);
    if (nodename != null) {
        String nodevalue = String.copyValueOf(ch, 0, length);
        if (nodename.equals("startdat")) {
            if (event.attributes.get("eventid").equals("187")) {
            }
        }
        if (nodename.equals("startscreen")) {
            imageaddress = nodevalue;
        }
        else {
            if (nodename.equals("title")) {
                // try {
                // BufferedReader in = new BufferedReader(
                // new InputStreamReader(
                // new ByteArrayInputStream(nodevalue.getBytes()), "UTF8"));
                // event.attributes.put("title", in.readLine());
                // } catch (UnsupportedEncodingException e) {
                // // TODO Auto-generated catch block
                // e.printStackTrace();
                // } catch (IOException e) {
                // // TODO Auto-generated catch block
                // e.printStackTrace();
                // }
                // try {
                // event.attributes.put("title",
                // URLDecoder.decode(nodevalue, "UTF-8"));
                // } catch (UnsupportedEncodingException e) {
                // // TODO Auto-generated catch block
                // e.printStackTrace();
                // }
                event.attributes.put("title", StringEscapeUtils
                        .unescapeHtml(new String(ch, start, length).trim()));
            } else
                event.attributes.put(nodename, nodevalue);
        }
    }
}

Solution

You could use the String constructor with the charset parameter:

try
{
    final String s = new String(nodevalue.getBytes(), "UTF-8");
}
catch (UnsupportedEncodingException e)
{
    Log.e("utf8", "conversion", e);
}
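
The charset only matters at the point where raw bytes are turned into a String. Below is a minimal sketch of that step, assuming the original bytes are still available; the class name Utf8DecodeSketch and the sample bytes are illustrative and not taken from the code above.

```java
import java.nio.charset.StandardCharsets;

class Utf8DecodeSketch {
    // Decode raw bytes as UTF-8 (the charset the XML is assumed to use).
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decode the same bytes as ISO-8859-1 to show the typical garbling.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // UTF-8 bytes for the word "Muenchen" written with a u-umlaut (0xC3 0xBC).
        byte[] raw = {0x4D, (byte) 0xC3, (byte) 0xBC, 0x6E, 0x63, 0x68, 0x65, 0x6E};
        System.out.println(decodeUtf8(raw));   // umlaut decoded as one character
        System.out.println(decodeLatin1(raw)); // umlaut shows up as two characters
    }
}
```

Using a StandardCharsets constant instead of the "UTF-8" string name also avoids the checked UnsupportedEncodingException.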

Also, since you get the data from an XML document, which I assume is encoded in UTF-8, the problem is probably in how it is parsed.

You should hand the parser an InputStream or InputSource rather than feeding an XMLReader implementation yourself, because the encoding then travels with the input. So if you're getting this data from an HTTP response, you could either use both an InputStream and an InputSource:

try
{
    HttpEntity entity = response.getEntity();
    final InputStream in = entity.getContent();
    final SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    final XmlHandler handler = new XmlHandler();
    Reader reader = new InputStreamReader(in, "UTF-8");
    InputSource is = new InputSource(reader);
    is.setEncoding("UTF-8");
    parser.parse(is, handler);
    //TODO: get the data from your handler
}
catch (final Exception e)
{
    Log.e("ParseError", "Error parsing xml", e);
}

or just the InputStream:

try
{
    HttpEntity entity = response.getEntity();
    final InputStream in = entity.getContent();
    final SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    final XmlHandler handler = new XmlHandler();
    parser.parse(in, handler);
    //TODO: get the data from your handler
}
catch (final Exception e)
{
    Log.e("ParseError", "Error parsing xml", e);
}
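
The parsing step can be tried without a network connection. The following is a rough sketch, assuming the XML arrives as UTF-8 bytes; the class SaxUtf8Sketch, the method parseText and the sample document are hypothetical and stand in for the HTTP entity and XmlHandler used above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxUtf8Sketch {
    // Parse UTF-8 encoded XML bytes and return all character data found.
    static String parseText(byte[] xmlBytes) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks, so append.
                text.append(ch, start, length);
            }
        };
        try {
            InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
            source.setEncoding("UTF-8"); // tell the parser how to decode the bytes
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(source, handler);
        } catch (Exception e) {
            throw new IllegalStateException("Error parsing xml", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] xml = "<title>M\u00FCller</title>".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseText(xml));
    }
}
```

If the umlaut survives this round trip, the parser is decoding correctly and any remaining garbling comes from the content itself.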

Update 1

Here is a sample of a complete request and response handling:

try
{
    final DefaultHttpClient client = new DefaultHttpClient();
    final HttpPost httppost = new HttpPost("http://example.location.com/myxml");
    final HttpResponse response = client.execute(httppost);
    final HttpEntity entity = response.getEntity();

    final InputStream in = entity.getContent();
    final SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    final XmlHandler handler = new XmlHandler();
    parser.parse(in, handler);
    //TODO: get the data from your handler
}
catch (final Exception e)
{
    Log.e("ParseError", "Error parsing xml", e);
}

Update 2

As the problem is not the encoding but the source XML being escaped to HTML entities, the best solution (besides fixing the PHP so that it does not escape the response) is to use the very handy static StringEscapeUtils class from the Apache Commons Lang library.

After importing the library, put the following in the characters method of your XML handler:

@Override
public void characters(final char[] ch, final int start, final int length) 
    throws SAXException
{
    // This variable will hold the correct unescaped value
    final String elementValue = StringEscapeUtils.
        unescapeHtml(new String(ch, start, length).trim());
    [...]
}
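
To make the unescaping step concrete without the library, here is a hand-rolled sketch that handles only the German named entities. It is an illustrative stand-in, not how StringEscapeUtils.unescapeHtml is implemented; the class EntityUnescapeSketch and the method unescapeGerman are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EntityUnescapeSketch {
    // Only the German-specific named entities; a real unescaper covers many more.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&auml;", "\u00E4");
        ENTITIES.put("&ouml;", "\u00F6");
        ENTITIES.put("&uuml;", "\u00FC");
        ENTITIES.put("&Auml;", "\u00C4");
        ENTITIES.put("&Ouml;", "\u00D6");
        ENTITIES.put("&Uuml;", "\u00DC");
        ENTITIES.put("&szlig;", "\u00DF");
    }

    // Replace each known entity with the character it stands for.
    static String unescapeGerman(String escaped) {
        String result = escaped;
        for (Map.Entry<String, String> entry : ENTITIES.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(unescapeGerman("Stra&szlig;e in M&uuml;nchen"));
    }
}
```

In real code the library call is preferable, since it also covers numeric references and the full set of named entities.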

Update 3

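In your last code sample the problem is the initialization of the nodevalue variable: String.copyValueOf(ch, 0, length) copies from offset 0 of the buffer instead of from start, and the value is never unescaped. It should be: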

String nodevalue = StringEscapeUtils.unescapeHtml(
    new String(ch, start, length).trim());
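
The effect of the offset can be seen with a small sketch. It assumes, for illustration only, that the parser's buffer holds the whole element and that start points at the text; the class name and the sample buffer are made up.

```java
class CharactersOffsetSketch {
    // Mirrors the question's code: ignores start and reads from offset 0.
    static String wrongSlice(char[] ch, int start, int length) {
        return String.copyValueOf(ch, 0, length);
    }

    // Mirrors the corrected code: reads length chars beginning at start.
    static String rightSlice(char[] ch, int start, int length) {
        return new String(ch, start, length).trim();
    }

    public static void main(String[] args) {
        char[] buffer = "<title>Hallo</title>".toCharArray();
        int start = 7;  // index of 'H', just after "<title>"
        int length = 5; // length of "Hallo"
        System.out.println(wrongSlice(buffer, start, length)); // prints "<titl"
        System.out.println(rightSlice(buffer, start, length)); // prints "Hallo"
    }
}
```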
