AWS Data Pipeline S3 CSV to DynamoDB JSON Error
Problem description
I'm trying to import several CSV files from an S3 directory into DynamoDB with AWS Data Pipeline, but I get this error.
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: com.google.gson.stream.MalformedJsonException: Expected ':' at line 1 column 10
at com.google.gson.stream.JsonReader.syntaxError(JsonReader.java:1505)
at com.google.gson.stream.JsonReader.doPeek(JsonReader.java:519)
at com.google.gson.stream.JsonReader.peek(JsonReader.java:414)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:157)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:187)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:145)
at com.google.gson.Gson.fromJson(Gson.java:803)
... 15 more
Exception in thread "main" java.io.

errorStackTrace
amazonaws.datapipeline.taskrunner.TaskExecutionException: Failed to complete EMR transform.
at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:67)
at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16)
at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:136)
at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:105)
at amazonaws.datapipeline.taskrunner.TaskPoller$1.run(TaskPoller.java:81)
at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53)
at java.lang.Thread.run(Thread.java:748)
Caused by: amazonaws.datapipeline.taskrunner.TaskExecutionException:
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: com.google.gson.stream.MalformedJsonException: Expected ':' at line 1 column 10
at com.google.gson.stream.JsonReader.syntaxError(JsonReader.java:1505)
at com.google.gson.stream.JsonReader.doPeek(JsonReader.java:519)
at com.google.gson.stream.JsonReader.peek(JsonReader.java:414)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:157)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:187)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:145)
at com.google.gson.Gson.fromJson(Gson.java:803)
... 15 more
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at org.apache.hadoop.dynamodb.tools.DynamoDBImport.run(DynamoDBImport.java:81)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.dynamodb.tools.DynamoDBImport.main(DynamoDBImport.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
at amazonaws.datapipeline.cluster.EmrUtil.runSteps(EmrUtil.java:286)
at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:63)
... 7 more
Recommended answer
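This solved my problem: convert the files to the format that AWS Data Pipeline uses. The trace shows the import step (DynamoDBImport) reading each input line with Gson, so a plain CSV row fails with MalformedJsonException. The input needs one JSON object per line, with each attribute wrapped in its DynamoDB type: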
```
{"Name": {"S":"Amazon push"},"Category": {"S":"Amazon Web Services"}}
{"Name": {"S":"Amazon S3"},"Category": {"S":"Amazon Web Services"}}
```
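If the source data only exists as CSV, it can be converted before the pipeline runs. The following is a minimal sketch, not part of the original answer. It assumes the CSV has a header row, that every column should be stored as a DynamoDB string (`"S"`), and that empty cells should be omitted. The function name `csv_to_dynamodb_json_lines` is hypothetical.

```python
import csv
import io
import json


def csv_to_dynamodb_json_lines(csv_text):
    """Convert CSV text (with a header row) into one JSON object per row.

    Assumption: every column is written as a DynamoDB string attribute ("S").
    Numeric columns would need {"N": "..."} instead.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        item = {
            column: {"S": value}
            for column, value in row.items()
            # Skip empty or missing cells, and any overflow fields that
            # DictReader stores under the None key (assumption: omit them).
            if column is not None and value
        }
        lines.append(json.dumps(item))
    return lines


if __name__ == "__main__":
    sample = "Name,Category\nAmazon push,Amazon Web Services\nAmazon S3,Amazon Web Services\n"
    for line in csv_to_dynamodb_json_lines(sample):
        print(line)
```

Each returned string is one line of the output file. Write them newline-separated to a file, upload it to the S3 input directory, and point the pipeline at that instead of the raw CSV.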
References:
https://calorious.wordpress.com/2016/03/18/episode-4-importing-json-into-dynamodb/
https://medium.com/@ashleywnj/appsync-s3-data-pipeline-dynamodb-854f99d70b41