Too many open files: '/home/USER/PATH/SERVICE_ACCOUNT.json' when calling Google's Natural Language API


Problem description

I'm working on a Sentiment Analysis project using the Google Cloud Natural Language API and Python, and this question might be similar to this other question. What I'm doing is the following:

  1. Read a CSV file from Google Cloud Storage; the file has approximately 7000 records.
  2. Convert the CSV into a Pandas DataFrame.
  3. Iterate over the dataframe and call the Natural Language API to perform sentiment analysis on one of the dataframe's columns; in the same for loop, extract the score and magnitude from the result and add those values to new columns on the dataframe.
  4. Store the resulting dataframe back to GCS.

I'll put my code below, but before that I just want to mention that I have tested it with a sample CSV of fewer than 100 records and it works well. I am also aware of the quota limit of 600 requests per minute, which is why I put a delay on each iteration; still, I'm getting the error I mention in the title. I'm also aware of the suggestion to increase the ulimit, but I don't think that's a good solution.
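
For context, the process's current file-descriptor limit can be checked from Python itself before changing anything; a minimal diagnostic sketch using the standard resource module (not part of the original script):

import resource

# Print the soft and hard limits on open file descriptors for this process.
# The soft limit is what `ulimit -n` reports in the shell.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open file descriptor limit: soft={soft}, hard={hard}")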

Here's my code:

from google.cloud import language_v1
from google.cloud.language_v1 import enums
from google.cloud import storage
from time import sleep
import pandas
import sys

pandas.options.mode.chained_assignment = None

def parse_csv_from_gcs(csv_file):
    # Read the CSV from the given GCS path into a DataFrame.
    df = pandas.read_csv(csv_file, encoding = "ISO-8859-1")

    return df

def analyze_sentiment(text_content):
    # A new LanguageServiceClient is constructed on every call.
    client = language_v1.LanguageServiceClient()
    type_ = enums.Document.Type.PLAIN_TEXT
    language = 'es'
    document = {"content": text_content, "type": type_, "language": language}
    encoding_type = enums.EncodingType.UTF8
    response = client.analyze_sentiment(document, encoding_type=encoding_type)

    return response

gcs_path = sys.argv[1]
output_bucket = sys.argv[2]
output_csv_file = sys.argv[3]

dataframe = parse_csv_from_gcs(gcs_path)

for i in dataframe.index:
    print(i)
    response = analyze_sentiment(dataframe.at[i, 'FieldOfInterest'])
    dataframe.at[i, 'Score'] = response.document_sentiment.score
    dataframe.at[i, 'Magnitude'] = response.document_sentiment.magnitude
    sleep(0.5)

print(dataframe)
dataframe.to_csv("results.csv", encoding = 'ISO-8859-1')

gcs = storage.Client()
gcs.get_bucket(output_bucket).blob(output_csv_file).upload_from_filename('results.csv', content_type='text/csv')

The 'analyze_sentiment' function is very similar to what we have in Google's documentation, I just modified it a little, but it does pretty much the same thing.

Now, the program is raising that error and crashing when it reaches a record somewhere between 550 and 700, but I don't see the correlation between the service account JSON and calling the Natural Language API, so I suspect that when I call the API it opens the account credentials JSON file but doesn't close it afterwards.
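
One way to test that hypothesis is to watch how many file descriptors the process is holding as the loop runs; a rough, Linux-only sketch (counting the entries under /proc/self/fd), not from the original post:

import os

def count_open_fds():
    # Linux-specific: each entry under /proc/self/fd is an open file
    # descriptor held by the current process.
    return len(os.listdir('/proc/self/fd'))

# For example, print this before and after each analyze_sentiment() call and
# check whether the count keeps growing across iterations.
print(count_open_fds())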

I'm currently stuck on this issue and have run out of ideas, so any help will be much appreciated. Thanks in advance =)!

[UPDATE]

I've solved this issue by extracting the 'client' out of the 'analyze_sentiment' method and passing it as a parameter, as follows:

def analyze_sentiment(text_content, client):
    <Code>    

Looks like every time it reaches this line:

client = language_v1.LanguageServiceClient()

It opens the account credential JSON file and it doesn't get closed, so extracting it to a global variable made this work =).
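
A minimal sketch of what that refactor might look like, based on the code above (the exact body was elided in the post):

from time import sleep

from google.cloud import language_v1
from google.cloud.language_v1 import enums

# Create the client once and reuse it; constructing a new LanguageServiceClient
# on every call was what kept re-opening the credentials file.
client = language_v1.LanguageServiceClient()

def analyze_sentiment(text_content, client):
    document = {
        "content": text_content,
        "type": enums.Document.Type.PLAIN_TEXT,
        "language": "es",
    }
    return client.analyze_sentiment(document,
                                    encoding_type=enums.EncodingType.UTF8)

# The loop stays the same, except the shared client is passed in
# ('dataframe' as defined earlier):
for i in dataframe.index:
    response = analyze_sentiment(dataframe.at[i, 'FieldOfInterest'], client)
    dataframe.at[i, 'Score'] = response.document_sentiment.score
    dataframe.at[i, 'Magnitude'] = response.document_sentiment.magnitude
    sleep(0.5)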

Solution

I've updated the original post with the solution for this, but in any case, thanks to everyone that saw this and tried to reply =)!
