Spark Stream - 'utf8' codec can't decode bytes


Problem description


I'm fairly new to stream programming. We have a Kafka stream which uses Avro.

I want to connect the Kafka stream to a Spark stream. I used the code below.

from pyspark.streaming.kafka import KafkaUtils

kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})
lines = kvs.map(lambda x: x[1])

I got the error below.

    return s.decode('utf-8')
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 57-58: invalid continuation byte
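The error makes sense once you look at the wire format: Confluent-encoded Avro messages begin with a magic byte (0x00) and a 4-byte schema ID, followed by binary Avro data, so Spark's default UTF-8 string decoder chokes on them. A minimal sketch of what goes wrong (the payload bytes here are made up for illustration):

```python
# Hypothetical raw Kafka message value: magic byte + 4-byte schema ID + binary Avro data.
payload = b"\x00" + b"\x00\x00\x00\x01" + b"\xff\xfe\x02foo"

try:
    # Roughly what the stream's default value decoder does with each message
    payload.decode("utf-8")
except UnicodeDecodeError as err:
    print("decode failed:", err)
```

The bytes after the header are arbitrary binary, so they are not guaranteed to form valid UTF-8 sequences.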

Do I need to specify that Kafka uses Avro? Is that what causes the error above? If so, how can I specify it?

Solution

Right, the problem is with deserialization of the stream. You can use the confluent-kafka-python library and pass a valueDecoder to createDirectStream:

from confluent_kafka.avro.cached_schema_registry_client import CachedSchemaRegistryClient
from confluent_kafka.avro.serializer.message_serializer import MessageSerializer

# decode_message is an instance method, so build a MessageSerializer backed by
# your Schema Registry first (the URL below is environment-specific)
schema_registry = CachedSchemaRegistryClient(url="http://schema-registry:8081")
serializer = MessageSerializer(schema_registry)

kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers},
                                    valueDecoder=serializer.decode_message)

Credits for the solution to https://stackoverflow.com/a/49179186/6336337
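With the decoder in place, each value in the DStream arrives already deserialized; a record typically becomes a plain Python dict, so downstream transforms can access fields directly. A sketch under that assumption, using a hypothetical `user_id` field:

```python
# A record decoded by MessageSerializer typically arrives as a plain dict.
sample = ("some-key", {"user_id": 42, "event": "click"})  # hypothetical (key, value) pair

extract_value = lambda kv: kv[1]                 # same role as kvs.map(lambda x: x[1])
extract_user = lambda rec: rec["user_id"]        # hypothetical field name

print(extract_user(extract_value(sample)))       # → 42
# In the stream itself: kvs.map(extract_value).map(extract_user).pprint()
```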
