ssh into glue dev-endpoint as hadoop user: `File '/var/aws/emr/userData.json' cannot be read`


Problem description

Basically, I am running into the following error while following this tutorial: https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-tutorial-pycharm.html

java.io.IOException: File '/var/aws/emr/userData.json' cannot be read

The above file is owned by hadoop.

[glue@ip-xx.xx.xx.xx ~]$ ls -la /var/aws/emr/
total 32
drwxr-xr-x 4 root   root    4096 Mar 24 19:35 .
drwxr-xr-x 3 root   root    4096 Feb 12  2019 ..
drwxr-xr-x 3 root   root    4096 Feb 12  2019 bigtop-deploy
drwxr-xr-x 3 root   root    4096 Mar 24 19:35 packages
-rw-r--r-- 1 root   root    1713 Feb 12  2019 repoPublicKey.txt
-r--r----- 1 hadoop hadoop 10221 Mar 24 19:34 userData.json

And I am not able to change its permissions as suggested by Eric here. I ssh into my dev endpoint using my public key:

ssh -i ~/.ssh/<my_private_key> glue@ec2-xx.xx.xx.xx.eu-west-1.compute.amazonaws.com

and I cannot change the user to hadoop with `sudo -su hadoop`, because it asks for a password I don't know (`[sudo] password for glue:`). Nor can I ssh into the endpoint as the hadoop user (instead of glue); it says `Permission denied (publickey)`. My question is: how on earth would I know the glue user's password on the dev-endpoint? I was never asked to set one up while creating the dev-endpoint. Or, how can I ssh into the dev-endpoint as the hadoop user?

Recommended answer

So this wasn't the actual problem. I got a review from the AWS team, and they said you will get these rubbish warnings and errors while running Spark scripts on EMR via PyCharm, but that shouldn't affect the actual task of your script. It turned out that the DynamicFrame I was creating,

persons_DyF = glueContext.create_dynamic_frame.from_catalog(database="database", table_name="table")

was not showing any schema when I did persons_DyF.printSchema(), even though I am pretty sure I defined that table schema; it just printed `root`, and persons_DyF.count() returned 0. So I had to use plain pySpark instead:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the catalog table directly with Spark instead of a Glue DynamicFrame
df = spark.read.table("ingestion.login_emr_testing")
df.printSchema()
df.select(df["feed"], df["timestamp_utc"], df["date"], df["hour"]).show()

which gave me the following result:

.
.
a lot of rubbish errors and warnings, including `java.io.IOException: File '/var/aws/emr/userData.json' cannot be read`
.
.
+------+--------------------+----------+----+
| feed |       timestamp_utc|      date|hour|
+------+--------------------+----------+----+
|TWEAKS|19-Mar-2020 18:59...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 18:59...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 19:00...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 18:59...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 19:00...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 19:00...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 19:00...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 19:00...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 19:00...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 19:00...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 19:00...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 19:00...|2020-03-19|  19|
|TWEAKS|19-Mar-2020 19:00...|2020-03-19|  19|
+------+--------------------+----------+----+
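
For reference, here is a minimal sketch of how the same schema/count check can be run on the Glue DynamicFrame side, assuming the standard awsglue boilerplate available on a Glue dev endpoint (the GlueContext setup and the .toDF() comparison are illustrative additions, not part of the original script; the database/table names are the placeholders used above):

# Minimal sketch -- assumes the awsglue libraries present on a Glue dev endpoint.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the catalog table as a Glue DynamicFrame (placeholder database/table names)
persons_DyF = glueContext.create_dynamic_frame.from_catalog(
    database="database", table_name="table")

persons_DyF.printSchema()   # in my case this only printed "root"
print(persons_DyF.count())  # and this returned 0

# Convert to a plain Spark DataFrame to compare against spark.read.table above
persons_DyF.toDF().printSchema()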

