Spark Streaming - Error when reading from Kinesis
Problem Description
I'm new to Apache Spark Streaming. I'm trying to build a Spark job that reads values from a Kinesis stream. This is my Python script:
import settings

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

spark_context = SparkContext(master="local[2]", appName=settings.KINESIS_APP_NAME)
streaming_context = StreamingContext(sparkContext=spark_context,
                                     batchDuration=settings.BATCH_DURATION)

kinesis_good_stream = KinesisUtils.createStream(
    ssc=streaming_context, kinesisAppName=settings.KINESIS_APP_NAME,
    streamName=settings.KINESIS_GOOD_STREAM, endpointUrl=settings.KINESIS_ENDPOINT,
    awsAccessKeyId=settings.AWS_ACCESS_KEY, awsSecretKey=settings.AWS_SECRET_KEY,
    checkpointInterval=settings.KINESIS_CHECKPOINT_INTERVAL,
    regionName=settings.KINESIS_REGION,
    initialPositionInStream=InitialPositionInStream.LATEST)

counts = kinesis_good_stream.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
counts.pprint()

streaming_context.start()
streaming_context.awaitTermination()
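For reference, the `flatMap` → `map` → `reduceByKey` pipeline above is a standard word count. A minimal plain-Python sketch of the same logic applied to one sample micro-batch (sample data is made up for illustration; no Spark required) looks like this:

```python
from collections import defaultdict

# Sample micro-batch of records, standing in for what the DStream would deliver.
batch = ["hello spark streaming", "hello kinesis"]

# flatMap: split each record into individual words.
words = [word for line in batch for word in line.split(" ")]

# map + reduceByKey: pair each word with 1, then sum the counts per word.
counts = defaultdict(int)
for word in words:
    counts[word] += 1

print(dict(counts))
```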
The settings file:
# Kinesis Configuration
KINESIS_REGION = 'ap-southeast-1'
KINESIS_ENDPOINT = 'kinesis.ap-southeast-1.amazonaws.com'
KINESIS_GOOD_STREAM = 'GoodStream'
KINESIS_BAD_STREAM = 'BadStream'
KINESIS_CHECKPOINT_INTERVAL = 2000
KINESIS_APP_NAME = 'test-spark'
# Spark context
BATCH_DURATION = 2
# AWS Credential
AWS_ACCESS_KEY = ''
AWS_SECRET_KEY = ''
I run the script with this command:
spark-submit --jars spark-streaming-kinesis-asl-assembly.jar kinesis.py
From my Django project:
INFO:snowplow_tracker.emitters:GET request finished with status code: 200
INFO:snowplow_tracker.emitters:POST request finished with status code: 200
From my collector, I can see that writing to Kinesis succeeds:
08:00:19.720 [pool-1-thread-9] INFO c.s.s.c.s.sinks.KinesisSink - Successfully wrote 2 out of 2 records
From my Spark Streaming job:
-------------------------------------------
Time: 2016-11-25 07:59:25
-------------------------------------------
16/11/25 07:59:30 ERROR Executor: Exception in task 0.0 in stage 345.0 (TID 173)
java.lang.NoSuchMethodError: org.apache.spark.storage.BlockManager.get(Lorg/apache/spark/storage/BlockId;)Lscala/Option;
at org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD.getBlockFromBlockManager$1(KinesisBackedBlockRDD.scala:104)
at org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD.compute(KinesisBackedBlockRDD.scala:117)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:390)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
My Kinesis stream has 1 shard, and the Spark context is set to use 2 cores.
Answer
Managed to solve the error. I was running Spark 2.0.2 but using spark-streaming-kinesis-asl-assembly_2.10-2.0.0.jar, and that version mismatch caused the java.lang.NoSuchMethodError. Using an assembly JAR built for the Spark release I was actually running fixed it.
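A sketch of the fix (the exact versions below are assumptions; substitute the Scala and Spark versions of your own installation). The point is that the version baked into the assembly JAR's file name must agree with the Spark release that runs the job:

```shell
# Assumed versions for illustration; use the ones matching your cluster.
SPARK_VERSION="2.0.2"
SCALA_VERSION="2.11"

# Build the expected assembly JAR name so it matches the running Spark release.
JAR="spark-streaming-kinesis-asl-assembly_${SCALA_VERSION}-${SPARK_VERSION}.jar"
echo "${JAR}"

# Then submit with the matching JAR:
# spark-submit --jars "${JAR}" kinesis.py
```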