Silent error handling in Python?
Problem description
I got csv-file with numerous URLs. I read it into a pandas dataframe for convenience. I need to do some statistical work later - and pandas is just handy. It looks a little like this:
import pandas as pd
csv = [{"URLs" : "www.mercedes-benz.de", "electric" : 1}, {"URLs" : "www.audi.de", "electric" : 0}, {"URLs" : "ww.audo.e", "electric" : 0}, {"URLs" : "NaN", "electric" : 0}]
df = pd.DataFrame(csv)
My task is to check if the websites contain certain strings and to add an extra column with 1 if so, and else 0. For example: I want to check whether www.mercedes-benz.de contains the string car. I do the following:
import requests

for i, row in df.iterrows():
    page_content = requests.get(row['URLs'])
    if "car" in page_content.text:
        df.loc[i, 'car'] = '1'
    else:
        df.loc[i, 'car'] = '0'
The problem is: sometimes the URL is wrong/missing, and my little script results in an error.
How can I handle/suppress the error if the URL is wrong/missing? And how can I e.g. use df.loc[i, 'url_wrong'] = '1' in these cases to indicate that the URL is wrong/missing?
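For example, the loop above can be wrapped in try/except so that a bad URL flags the row instead of crashing (a minimal sketch; the url_wrong column name follows the question, and catching requests.exceptions.RequestException is an assumption about which failures should be swallowed):

```python
import pandas as pd
import requests

df = pd.DataFrame([{"URLs": "www.mercedes-benz.de", "electric": 1},
                   {"URLs": "NaN", "electric": 0}])

for i, row in df.iterrows():
    try:
        page_content = requests.get(row['URLs'])  # raises on wrong/missing URLs
        df.loc[i, 'car'] = '1' if "car" in page_content.text else '0'
        df.loc[i, 'url_wrong'] = '0'
    except requests.exceptions.RequestException:
        df.loc[i, 'url_wrong'] = '1'  # flag the row instead of crashing

print(df['url_wrong'].tolist())
```

Note that scheme-less URLs such as www.mercedes-benz.de make requests raise MissingSchema before any network traffic, so with this sample data every row ends up flagged; prefixing https:// would let valid hosts actually be fetched.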
Try defining a function that does the "car" checking first, and then use the .apply method of a pandas Series to get your 1, 0 or "Wrong/Missing URL". The following should help:
import pandas as pd
import requests
data = [{"URLs": "https://www.mercedes-benz.de", "electric": 1},
        {"URLs": "https://www.audi.de", "electric": 0},
        {"URLs": "https://ww.audo.e", "electric": 0},
        {"URLs": "NaN", "electric": 0}]

def contains_car(link):
    try:
        return int('car' in requests.get(link).text)
    except requests.exceptions.RequestException:
        return "Wrong/Missing URL"

df = pd.DataFrame(data)
df['extra_column'] = df.URLs.apply(contains_car)
# URLs electric extra_column
# 0 https://www.mercedes-benz.de 1 1
# 1 https://www.audi.de 0 1
# 2 https://ww.audo.e 0 Wrong/Missing URL
# 3 NaN 0 Wrong/Missing URL
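If you also want the separate url_wrong column from the question, it can be derived from extra_column after the apply (a sketch; the fetch results are hard-coded here so it runs without any HTTP requests):

```python
import pandas as pd

# Hypothetical results of the contains_car check above, hard-coded
# so this sketch runs without making any requests.
df = pd.DataFrame({
    "URLs": ["https://www.mercedes-benz.de", "https://www.audi.de",
             "https://ww.audo.e", "NaN"],
    "extra_column": [1, 1, "Wrong/Missing URL", "Wrong/Missing URL"],
})

# Rows whose check failed get url_wrong = 1, everything else 0.
df["url_wrong"] = (df["extra_column"] == "Wrong/Missing URL").astype(int)
print(df["url_wrong"].tolist())
```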
Edit:
You can search for more than just one keyword in the returned text from your HTTP request. Depending on the condition you set up, this can be done with either the builtin function any or the builtin function all. Using any means that finding any of the keywords should return 1, while using all means that all the keywords have to be matched in order to return 1. In the following example, I am using any with keywords such as 'car', 'automobile', 'vehicle':
import pandas as pd
import requests
data = [{"URLs": "https://www.mercedes-benz.de", "electric": 1},
        {"URLs": "https://www.audi.de", "electric": 0},
        {"URLs": "https://ww.audo.e", "electric": 0},
        {"URLs": "NaN", "electric": 0}]

def contains_keywords(link, keywords):
    try:
        output = requests.get(link).text
        return int(any(x in output for x in keywords))
    except requests.exceptions.RequestException:
        return "Wrong/Missing URL"

df = pd.DataFrame(data)
mykeywords = ('car', 'vehicle', 'automobile')
df['extra_column'] = df.URLs.apply(lambda l: contains_keywords(l, mykeywords))
Should yield:
# URLs electric extra_column
# 0 https://www.mercedes-benz.de 1 1
# 1 https://www.audi.de 0 1
# 2 https://ww.audo.e 0 Wrong/Missing URL
# 3 NaN 0 Wrong/Missing URL
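The any/all distinction can be seen without any network access: with the keywords above, a text mentioning only "car" matches under any but not under all (a pure-string sketch):

```python
# Pure-string demo of the any/all distinction, no HTTP involved.
text = "Find your next car here"
keywords = ('car', 'vehicle', 'automobile')

any_match = int(any(k in text for k in keywords))  # "car" alone is enough
all_match = int(all(k in text for k in keywords))  # needs every keyword

print(any_match, all_match)
```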
I hope this helps.