Correct way of writing two floats into a regular txt


Problem description

I am running a big job in cluster mode. However, I am only interested in two float numbers, which I want to read somehow when the job succeeds.

Here is what I am trying:

from pyspark.context import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName='foo')

    f = open('foo.txt', 'w')
    pi = 3.14
    not_pi = 2.79 
    f.write(str(pi) + "\n")
    f.write(str(not_pi) + "\n")
    f.close()

    sc.stop()

However, 'foo.txt' doesn't appear to be written anywhere (probably it gets written on an executor, or something). I tried '/homes/gsamaras/foo.txt', which would be the pwd of the gateway. However, it says: No such file or directory: '/homes/gsamaras/myfile.txt'.

How can I do this?

import os, sys
import socket
print("Current working dir: %s" % os.getcwd())
print(socket.gethostname())

This suggests that the driver is actually a node of the cluster, which is why I don't see the file on my gateway.

Maybe write the file to HDFS somehow?

This doesn't work either:

Traceback (most recent call last):
  File "computeCostAndUnbalancedFactorkMeans.py", line 15, in <module>
    f = open('hdfs://myfile.txt','w')
IOError: [Errno 2] No such file or directory: 'hdfs://myfile.txt'


Answer

At first glance there is nothing particularly wrong with your code (you should use a context manager in a case like this instead of closing the file manually, but that is not the point). If this script is passed to spark-submit, the file will be written to a directory local to the driver code.
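The context-manager version of the write can be sketched as follows (same file name and values as in the question; the SparkContext setup is omitted, since it is irrelevant to the file write):

```python
pi = 3.14
not_pi = 2.79

# A context manager closes the file automatically, even if an
# exception is raised while writing.
with open('foo.txt', 'w') as f:
    f.write(str(pi) + "\n")
    f.write(str(not_pi) + "\n")
```

The file still lands wherever the driver process happens to run; the context manager only makes the cleanup reliable.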

If you submit your code in cluster mode, that will be an arbitrary worker node in your cluster. If you're in doubt, you can always log os.getcwd() and socket.gethostname() to figure out which machine is used and what the working directory is.

Finally, you cannot use standard Python IO tools to write to HDFS. There are a few tools which can achieve that, including the native dask/hdfs3.
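A minimal sketch of the hdfs3 route is shown below. It assumes hdfs3 is installed on the driver node; the NameNode host, port, and target path are placeholder values you would replace with your cluster's own:

```python
from hdfs3 import HDFileSystem

# Connect to the NameNode; 'namenode-host' and 8020 are placeholders.
hdfs = HDFileSystem(host='namenode-host', port=8020)

pi = 3.14
not_pi = 2.79

# hdfs3 file objects work in binary mode, so encode the strings.
with hdfs.open('/user/gsamaras/myfile.txt', 'wb') as f:
    f.write(("%s\n%s\n" % (pi, not_pi)).encode('utf-8'))
```

Since this runs on the driver, it works regardless of which cluster node the driver lands on, as long as that node can reach the NameNode.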

