Spark Streaming-从Kinesis读取时出错 [英] Spark Streaming - Error when reading from Kinesis

查看:182
本文介绍了Spark Streaming-从Kinesis读取时出错的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是Apache Spark Streaming的新手.试图构建Spark以从Kinesis Stream读取值.这是我的python脚本

I'm new with Apache Spark Streaming. Trying to build Spark to read value from Kinesis Stream. This is my python script

import settings
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils,   InitialPositionInStream
spark_context = SparkContext(master="local[2]", appName=settings.KINESIS_APP_NAME)

streaming_context = StreamingContext(sparkContext=spark_context, batchDuration=settings.BATCH_DURATION)

kinesis_good_stream = KinesisUtils.createStream(
ssc=streaming_context, kinesisAppName=settings.KINESIS_APP_NAME,
streamName=settings.KINESIS_GOOD_STREAM, endpointUrl=settings.KINESIS_ENDPOINT,
awsAccessKeyId=settings.AWS_ACCESS_KEY, awsSecretKey=settings.AWS_SECRET_KEY,
checkpointInterval=settings.KINESIS_CHECKPOINT_INTERVAL, regionName=settings.KINESIS_REGION,
initialPositionInStream=InitialPositionInStream.LATEST)

counts = kinesis_good_stream.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a+b)
counts.pprint()

streaming_context.start()
streaming_context.awaitTermination()

设置文件

# Kinesis Configuration
KINESIS_REGION = 'ap-southeast-1'
KINESIS_ENDPOINT = 'kinesis.ap-southeast-1.amazonaws.com'
KINESIS_GOOD_STREAM = 'GoodStream'
KINESIS_BAD_STREAM = 'BadStream'
KINESIS_CHECKPOINT_INTERVAL = 2000
KINESIS_APP_NAME = 'test-spark'

# Spark context
BATCH_DURATION = 2

# AWS Credential
AWS_ACCESS_KEY = ''
AWS_SECRET_KEY = ''

我使用此命令运行脚本

spark-submit --jars spark-streaming-kinesis-asl-assembly.jar kinesis.py  

在我的django项目中

From my django project

INFO:snowplow_tracker.emitters:GET request finished with status code: 200
INFO:snowplow_tracker.emitters:POST request finished with status code: 200

从我的收藏家那里注意到,成功写入Kinesis

From my collector, noticed that writing to Kinesis is successful

08:00:19.720 [pool-1-thread-9] INFO  c.s.s.c.s.sinks.KinesisSink - Successfully wrote 2 out of 2 records

对于我的Spark Streaming

For my Spark Streaming

-------------------------------------------
Time: 2016-11-25 07:59:25
-------------------------------------------

16/11/25 07:59:30 ERROR Executor: Exception in task 0.0 in stage 345.0 (TID 173)
java.lang.NoSuchMethodError: org.apache.spark.storage.BlockManager.get(Lorg/apache/spark/storage/BlockId;)Lscala/Option;
at org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD.getBlockFromBlockManager$1(KinesisBackedBlockRDD.scala:104)
at org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD.compute(KinesisBackedBlockRDD.scala:117)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:390)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

对于我的Kinesis Stream,我使用1个Shard并设置2个核心的Spark上下文

For my Kinesis Stream, I'm using 1 Shard and set Spark Context with 2 Cores

推荐答案

已成功解决该错误.我正在运行Spark-2.0.2,但我正在使用streaming-kinesis-asl-assembly.2.10-2.0.0.jar,这会导致java.lang.NoSuchMethodError.

Managed to solve the error. I'm running with Spark-2.0.2 but I'm using streaming-kinesis-asl-assembly.2.10-2.0.0.jar which cause the java.lang.NoSuchMethodError.

这篇关于Spark Streaming-从Kinesis读取时出错的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆