spark-1.4.1 saveAsTextFile to S3 is very slow on emr-4.0.0

Problem description

I run Spark 1.4.1 on Amazon AWS EMR 4.0.0.

For some reason, Spark's saveAsTextFile is very slow on EMR 4.0.0 compared to EMR 3.8 (it took 5 sec before, now it takes 95 sec).

Actually, saveAsTextFile reports that it finished in 4.356 sec, but after that I see lots of INFO messages with 404 errors from the com.amazonaws.latency logger for the next 90 sec:

spark> sc.parallelize(List.range(0, 1600000),160).map(x => x + "	" + "A"*100).saveAsTextFile("s3n://foo-bar/tmp/test40_20")

2015-09-01 21:16:17,637 INFO  [dag-scheduler-event-loop] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - ResultStage 5 (saveAsTextFile at <console>:22) finished in 4.356 s
2015-09-01 21:16:17,637 INFO  [task-result-getter-2] cluster.YarnScheduler (Logging.scala:logInfo(59)) - Removed TaskSet 5.0, whose tasks have all completed, from pool 
2015-09-01 21:16:17,637 INFO  [main] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Job 5 finished: saveAsTextFile at <console>:22, took 4.547829 s
2015-09-01 21:16:17,638 INFO  [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:listStatus(896)) - listStatus s3n://foo-bar/tmp/test40_20/_temporary/0 with recursive false
2015-09-01 21:16:17,651 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 3B2F06FD11682D22), S3 Extended Request ID: C8T3rXVSEIk3swlwkUWJJX3gWuQx3QKC3Yyfxuhs7y0HXn3sEI9+c1a0f7/QK8BZ], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[3B2F06FD11682D22], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[11.923], HttpRequestTime=[11.388], HttpClientReceiveResponseTime=[9.544], RequestSigningTime=[0.274], HttpClientSendRequestTime=[0.129], 
2015-09-01 21:16:17,723 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[E5D513E52B20FF17], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[71.927], HttpRequestTime=[53.517], HttpClientReceiveResponseTime=[51.81], RequestSigningTime=[0.209], ResponseProcessingTime=[17.97], HttpClientSendRequestTime=[0.089], 
2015-09-01 21:16:17,756 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 62C6B413965447FD), S3 Extended Request ID: 4w5rKMWCt9EdeEKzKBXZgWpTcBZCfDikzuRrRrBxmtHYxkZyS4GxQVyADdLkgtZf], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[62C6B413965447FD], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[11.044], HttpRequestTime=[10.543], HttpClientReceiveResponseTime=[8.743], RequestSigningTime=[0.271], HttpClientSendRequestTime=[0.138], 
2015-09-01 21:16:17,774 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[F62B991825042889], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[16.724], HttpRequestTime=[16.292], HttpClientReceiveResponseTime=[14.728], RequestSigningTime=[0.148], ResponseProcessingTime=[0.155], HttpClientSendRequestTime=[0.068], 
2015-09-01 21:16:17,786 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 4846575A1C373BB9), S3 Extended Request ID: aw/MMKxKPmuDuxTj4GKyDbp8hgpQbTjipJBzdjdTgbwPgt5NsZS4z+tRf2bk3I2E], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[4846575A1C373BB9], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[11.531], HttpRequestTime=[11.134], HttpClientReceiveResponseTime=[9.434], RequestSigningTime=[0.206], HttpClientSendRequestTime=[0.13], 
2015-09-01 21:16:17,786 INFO  [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:listStatus(896)) - listStatus s3n://foo-bar/tmp/test40_20/_temporary/0/task_201509012116_0005_m_000000 with recursive false
2015-09-01 21:16:17,798 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 8A91D9A08CE3C1FE), S3 Extended Request ID: u5RLzX1OvlIHBMCggSs3AGR96raYgD/Xu8RmoJuN/B+qZchoF1ZkbWIHRcqbzPNN], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[8A91D9A08CE3C1FE], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[11.472], HttpRequestTime=[11.147], HttpClientReceiveResponseTime=[9.594], RequestSigningTime=[0.168], HttpClientSendRequestTime=[0.088], 
2015-09-01 21:16:17,817 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[006EE9124BA77E28], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[19.185], HttpRequestTime=[16.691], HttpClientReceiveResponseTime=[15.039], RequestSigningTime=[0.17], ResponseProcessingTime=[2.141], HttpClientSendRequestTime=[0.11], 
2015-09-01 21:16:17,830 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 62F097583E42AB48), S3 Extended Request ID: EoJ7XNxQzKAm6yanlrf7ukIJOxYrhr5m8xEROkLc1wjFpPRgjuwY/JzznCshredZ], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[62F097583E42AB48], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[12.004], HttpRequestTime=[11.57], HttpClientReceiveResponseTime=[9.879], RequestSigningTime=[0.218], HttpClientSendRequestTime=[0.089], 
2015-09-01 21:16:17,844 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: A96FDB3E0E0E13FE), S3 Extended Request ID: Y1nnEJAd/wNtW+T2pFvg8HG5fzcjs+ztuLcXwFV3I6Bda4nKU+9rSdbTkoDtNwtu], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[A96FDB3E0E0E13FE], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[13.543], HttpRequestTime=[13.145], HttpClientReceiveResponseTime=[11.505], RequestSigningTime=[0.207], HttpClientSendRequestTime=[0.108], 
2015-09-01 21:16:17,911 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[4C105174ADF12A0B], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[66.408], HttpRequestTime=[63.949], HttpClientReceiveResponseTime=[62.298], RequestSigningTime=[0.211], ResponseProcessingTime=[2.049], HttpClientSendRequestTime=[0.085], 
2015-09-01 21:16:17,912 INFO  [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:rename(1182)) - rename s3n://foo-bar/tmp/test40_20/_temporary/0/task_201509012116_0005_m_000000/part-00000 s3n://foo-bar/tmp/test40_20/part-00000
2015-09-01 21:16:17,927 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 547162454610B1C3), S3 Extended Request ID: VgjjiHVtd/RutYxW3jPAZgos64j7JYfBmaMhkZvmyhkgD5ZuCAMSRMd/TrWQmTci], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[547162454610B1C3], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[15.214], HttpRequestTime=[14.764], HttpClientReceiveResponseTime=[13.047], RequestSigningTime=[0.243], HttpClientSendRequestTime=[0.124], 
2015-09-01 21:16:18,037 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 6F10454BF138C69F), S3 Extended Request ID: HSt8mkimmo9fK5qqTaU6OBGKXTQ1wvyctgMZSBsoIgxEFY+Yu5eq/Bn8fOCSsk3B], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[6F10454BF138C69F], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[108.944], HttpRequestTime=[108.542], HttpClientReceiveResponseTime=[106.874], RequestSigningTime=[0.171], HttpClientSendRequestTime=[0.067], 
2015-09-01 21:16:18,215 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[942D4DFF59A2B262], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[177.058], HttpRequestTime=[174.523], HttpClientReceiveResponseTime=[172.689], RequestSigningTime=[0.263], ResponseProcessingTime=[2.049], HttpClientSendRequestTime=[0.117], 
2015-09-01 21:16:18,235 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 712A1FF2554DDD5D), S3 Extended Request ID: RZZDuIrkdE/cdhAFijZix2juyAfZHyj7Mw2xJuyrEaJR5He0HREB30LATWvMJX3A], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[712A1FF2554DDD5D], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[20.187], HttpRequestTime=[19.728], HttpClientReceiveResponseTime=[18.001], RequestSigningTime=[0.238], HttpClientSendRequestTime=[0.125], 
2015-09-01 21:16:18,248 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[B386866C749DB8E0], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[11.628], HttpRequestTime=[11.091], HttpClientReceiveResponseTime=[9.513], RequestSigningTime=[0.24], ResponseProcessingTime=[0.139], HttpClientSendRequestTime=[0.079], 
2015-09-01 21:16:18,365 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[2621F3858DF8245B], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[117.034], HttpRequestTime=[116.494], HttpClientReceiveResponseTime=[114.81], RequestSigningTime=[0.168], ResponseProcessingTime=[0.202], HttpClientSendRequestTime=[0.1], 
2015-09-01 21:16:18,382 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 595CA0A458D41C97), S3 Extended Request ID: tP+Hh6CER+g31u6GqpWuLttrjUg2oTPCQ9SWVPsSgcD98MvI88eTqSTjIzrSYmu3], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[595CA0A458D41C97], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[16.308], HttpRequestTime=[15.715], HttpClientReceiveResponseTime=[13.752], RequestSigningTime=[0.276], HttpClientSendRequestTime=[0.164], 
2015-09-01 21:16:18,647 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[7785739C9F12EB4A], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[264.11], HttpRequestTime=[261.533], HttpClientReceiveResponseTime=[259.67], RequestSigningTime=[0.309], ResponseProcessingTime=[2.05], HttpClientSendRequestTime=[0.131], 
2015-09-01 21:16:18,674 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[204], ServiceName=[Amazon S3], AWSRequestID=[1F975359BBCA55FD], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[25.921], HttpRequestTime=[25.504], HttpClientReceiveResponseTime=[23.823], RequestSigningTime=[0.238], ResponseProcessingTime=[0.003], HttpClientSendRequestTime=[0.118], 
2015-09-01 21:16:18,706 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[204], ServiceName=[Amazon S3], AWSRequestID=[144CA7E763BB12C6], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[31.69], HttpRequestTime=[31.444], HttpClientReceiveResponseTime=[29.976], RequestSigningTime=[0.139], ResponseProcessingTime=[0.002], HttpClientSendRequestTime=[0.07], 
2015-09-01 21:16:18,718 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 102338387163D94E), S3 Extended Request ID: iFxuOYrjFEWmk/mCTxIa4OlgWqwAFOh3qE4YxlqkcVb3/oeVuW9usRPRS73w9CAg], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[102338387163D94E], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[11.867], HttpRequestTime=[11.606], HttpClientReceiveResponseTime=[10.146], RequestSigningTime=[0.12], HttpClientSendRequestTime=[0.072], 
2015-09-01 21:16:18,732 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 7FF86B27A748C229), S3 Extended Request ID: tgQfRHB+cLoNpNf6lEWVF3v9LwVwheh+/0Gl0Q8JuQDnV/nkZWfxo29W3ZqUB9uA], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[7FF86B27A748C229], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[13.874], HttpRequestTime=[13.622], HttpClientReceiveResponseTime=[12.153], RequestSigningTime=[0.121], HttpClientSendRequestTime=[0.055], 
2015-09-01 21:16:18,733 INFO  [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:listStatus(896)) - listStatus s3n://foo-bar/tmp/test40_20/_temporary/0/task_201509012116_0005_m_000001 with recursive false
2015-09-01 21:16:18,749 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: F850C0C2262580C7), S3 Extended Request ID: Sg4K3l/Q3pd1Cyhr5V6y9pH3nDeInGIxZoJdOi6QyTrgFWggw09+HLy82lm8C6sg], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[F850C0C2262580C7], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[15.981], HttpRequestTime=[15.697], HttpClientReceiveResponseTime=[14.223], RequestSigningTime=[0.145], HttpClientSendRequestTime=[0.076], 
2015-09-01 21:16:18,784 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[33695DA390D1B8DF], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[34.601], HttpRequestTime=[32.989], HttpClientReceiveResponseTime=[31.53], RequestSigningTime=[0.126], ResponseProcessingTime=[1.354], HttpClientSendRequestTime=[0.056], 
2015-09-01 21:16:18,801 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 61A128E7DA02A7B7), S3 Extended Request ID: Qc3EqsJl/Pq/e/MnNQrW7/pgqmPZ700D4hA5sZdo/nWolKm6oq5ZYnERIEEElsOP], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[61A128E7DA02A7B7], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[16.427], HttpRequestTime=[16.181], HttpClientReceiveResponseTime=[14.718], RequestSigningTime=[0.123], HttpClientSendRequestTime=[0.072], 
2015-09-01 21:16:18,813 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: F45035D7D2C5B0C9), S3 Extended Request ID: fYLd2JtGOeI2BeltWzcpObGSQBh8VS92dedQuBSDkZVwjCUAVz4k+cv7k+bmLfGb], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[F45035D7D2C5B0C9], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[12.083], HttpRequestTime=[11.832], HttpClientReceiveResponseTime=[10.379], RequestSigningTime=[0.124], HttpClientSendRequestTime=[0.056], 
2015-09-01 21:16:18,828 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[D5899A9BA4A95E07], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[15.137], HttpRequestTime=[13.767], HttpClientReceiveResponseTime=[12.305], RequestSigningTime=[0.123], ResponseProcessingTime=[1.128], HttpClientSendRequestTime=[0.081], 
2015-09-01 21:16:18,829 INFO  [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:rename(1182)) - rename s3n://foo-bar/tmp/test40_20/_temporary/0/task_201509012116_0005_m_000001/part-00001 s3n://foo-bar/tmp/test40_20/part-00001
...skip 3400 rows and 95 sec...
2015-09-01 21:17:53,821 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[204], ServiceName=[Amazon S3], AWSRequestID=[CEDEF99979579E6E], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=20, ClientExecuteTime=[20.718], HttpRequestTime=[20.288], HttpClientReceiveResponseTime=[18.391], RequestSigningTime=[0.248], ResponseProcessingTime=[0.006], HttpClientSendRequestTime=[0.158], 
2015-09-01 21:17:53,846 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[204], ServiceName=[Amazon S3], AWSRequestID=[80AD0657203B53A6], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=20, ClientExecuteTime=[24.782], HttpRequestTime=[24.353], HttpClientReceiveResponseTime=[22.444], RequestSigningTime=[0.236], ResponseProcessingTime=[0.006], HttpClientSendRequestTime=[0.113], 
2015-09-01 21:17:53,859 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: E271C72B2B91FAE6), S3 Extended Request ID: jRwTxrz/DSmPZTWGscxLuhBzRHL5CcXeyPfzQ/urdL0Tyki2mJrl0x3SIS/yGpC5yOzSksZUuAc=], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[E271C72B2B91FAE6], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=20, ClientExecuteTime=[11.98], HttpRequestTime=[11.566], HttpClientReceiveResponseTime=[9.793], RequestSigningTime=[0.214], HttpClientSendRequestTime=[0.136], 
2015-09-01 21:17:53,870 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 156B6DC4EE7BABA6), S3 Extended Request ID: F/rPjLYwwXHcxJnpsHwHdUoMQf7diS6r0SV66AvfwQ7mv0z4jigD2RpyXYBTvSvZFODW5E1K8q4=], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[156B6DC4EE7BABA6], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=20, ClientExecuteTime=[11.161], HttpRequestTime=[10.893], HttpClientReceiveResponseTime=[9.311], RequestSigningTime=[0.116], HttpClientSendRequestTime=[0.089], 
2015-09-01 21:17:53,889 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[957AFF2AEC49DB6B], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=20, ClientExecuteTime=[17.906], HttpRequestTime=[15.035], HttpClientReceiveResponseTime=[13.306], RequestSigningTime=[0.151], ResponseProcessingTime=[2.521], HttpClientSendRequestTime=[0.125], 
2015-09-01 21:17:53,912 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[7CAEE08C0A6B3D2B], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=20, ClientExecuteTime=[21.727], HttpRequestTime=[21.166], HttpClientReceiveResponseTime=[19.19], RequestSigningTime=[0.225], ResponseProcessingTime=[0.031], HttpClientSendRequestTime=[0.115], 
2015-09-01 21:17:53,913 INFO  [main] s3n.Jets3tNativeFileSystemStore (Jets3tNativeFileSystemStore.java:storeFile(141)) - s3.putObject foo-bar tmp/test40_20/_SUCCESS 0
2015-09-01 21:17:53,926 INFO  [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[404], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 2D8B08BCE0E24AE5), S3 Extended Request ID: f4gTZ9I05s5IzQnwvJP7QieN5eaO3SBgez5ZS9R+f70n9WWWFeTpcg7WoHPa5bf/cIB2U6hQueM=], ServiceName=[Amazon S3], AWSErrorCode=[404 Not Found], AWSRequestID=[2D8B08BCE0E24AE5], ServiceEndpoint=[https://foo-bar.s3.amazonaws.com], Exception=1, HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=20, ClientExecuteTime=[13.082], HttpRequestTime=[12.543], HttpClientReceiveResponseTime=[10.591], RequestSigningTime=[0.265], HttpClientSendRequestTime=[0.14], 
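
To check whether a job shows the same pattern, one option is to tally the S3 responses in the driver log by HTTP status code. The following is only a rough sketch: the file name driver.log is hypothetical, and the sample lines are shortened copies of the entries in the log above.

```shell
#!/bin/sh
# Sketch: count S3 responses per HTTP status code in a driver log.
# driver.log is a hypothetical file name; the sample lines below are
# shortened copies of the log entries shown in the question.
set -eu

cat > driver.log <<'EOF'
2015-09-01 21:16:17,651 INFO  [main] amazonaws.latency - StatusCode=[404], ServiceName=[Amazon S3]
2015-09-01 21:16:17,723 INFO  [main] amazonaws.latency - StatusCode=[200], ServiceName=[Amazon S3]
2015-09-01 21:16:17,756 INFO  [main] amazonaws.latency - StatusCode=[404], ServiceName=[Amazon S3]
2015-09-01 21:16:18,674 INFO  [main] amazonaws.latency - StatusCode=[204], ServiceName=[Amazon S3]
EOF

# Pull out every StatusCode=[NNN] token, then count occurrences per code.
grep -o 'StatusCode=\[[0-9]*\]' driver.log | sort | uniq -c
```

A large share of 404 responses after the "Job ... finished" line would suggest that the time is going into the output commit phase rather than into the Spark tasks themselves.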

Recommended answer

To solve the problem, I added the following settings to mapred-site.xml, as suggested by Neil Jonkers on user@spark.apache.org:

<property>
  <name>mapred.output.direct.EmrFileSystem</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.direct.NativeS3FileSystem</name>
  <value>true</value>
</property>

This can be done by adding the following to the aws command:

classification=mapred-site,properties=[mapred.output.direct.EmrFileSystem=true,mapred.output.direct.NativeS3FileSystem=true]

or by adding the following to the configuration JSON file:

  {
    "Classification": "mapred-site",
    "Properties": {
      "mapred.output.direct.EmrFileSystem": "true",
      "mapred.output.direct.NativeS3FileSystem": "true"
    }
  }
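
As a minimal sketch of the JSON-file route: the object above is one entry in the configuration file, which as a whole is a JSON array of classification objects. The script below writes such a file and prints, without running it, a create-cluster command that would reference the file. The file name emr-config.json is a placeholder, and the remaining cluster options (instance types, roles, key pair and so on) are left out because they depend on your account.

```shell
#!/bin/sh
# Sketch: write the mapred-site classification to a configuration file.
# emr-config.json is a placeholder name, not taken from the original answer.
set -eu

cat > emr-config.json <<'EOF'
[
  {
    "Classification": "mapred-site",
    "Properties": {
      "mapred.output.direct.EmrFileSystem": "true",
      "mapred.output.direct.NativeS3FileSystem": "true"
    }
  }
]
EOF

# Quick sanity check: print how many direct-output properties the file holds.
grep -c 'mapred.output.direct' emr-config.json

# Printed only, not executed: the trailing "..." stands for the cluster
# options that are specific to your setup.
echo "aws emr create-cluster --release-label emr-4.0.0 --applications Name=Spark --configurations file://./emr-config.json ..."
```

Keeping the settings in a file makes it easier to reuse the same configuration across clusters than retyping the inline form each time.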
