Accessing an external file in a Python UDF
Question
I am using Hive with a Python UDF. I have a .sql file in which I add the Python UDF and call it. So far so good: I can process my query results with my Python function. Now, however, I need to use an external .txt file inside the UDF. I uploaded that file to my cluster (in the same directory as the .sql and .py files) and added it in my .sql file with this command:
ADD FILE /home/ra/stopWords.txt;
When I open this file in my Python UDF like this:
file = open("/home/ra/stopWords.txt", "r")
I get several errors. I cannot figure out how to add files from nested directories and use them in Hive.
Any ideas?
Recommended answer
All added files are located in the current working directory (./) of the UDF script, so they should be opened by a relative path, not by their original absolute path.
If you add a single file using ADD FILE /dir1/dir2/dir3/myfile.txt, its path will be
./myfile.txt
If you add a directory using ADD FILE /dir1/dir2, the file's path will be
./dir2/dir3/myfile.txt
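Applied to the question, the UDF would open stopWords.txt relative to its working directory. The following is a minimal sketch, not the asker's actual script: it assumes the file holds one lowercase stop word per line and that each input row is a tab-separated (id, text) pair, and the function names load_stop_words, remove_stop_words, and run are hypothetical.

```python
import os

# File name taken from the question's ADD FILE statement; after ADD FILE
# it is expected in the UDF's current working directory.
STOP_WORDS_FILE = "stopWords.txt"


def load_stop_words(path=os.path.join(".", STOP_WORDS_FILE)):
    """Read stop words from a file (assumed format: one word per line)."""
    with open(path, "r") as handle:
        return {line.strip() for line in handle if line.strip()}


def remove_stop_words(text, stop_words):
    """Drop every whitespace-separated word that appears in stop_words."""
    return " ".join(w for w in text.split() if w.lower() not in stop_words)


def run(lines_in, out, stop_words):
    """Filter rows assumed to be tab-separated (id, text) pairs."""
    for line in lines_in:
        row_id, _, text = line.rstrip("\n").partition("\t")
        out.write(row_id + "\t" + remove_stop_words(text, stop_words) + "\n")
```

In a real UDF script, the last line would be something like run(sys.stdin, sys.stdout, load_stop_words()), after importing sys. If the file had been added as part of a directory, the relative path would be passed explicitly, for example load_stop_words("./dir2/dir3/myfile.txt").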