Silent error handling in Python?


Problem description

I have a CSV file with numerous URLs. I read it into a pandas DataFrame for convenience, because I need to do some statistical work later and pandas is handy for that. It looks a little like this:

import pandas as pd
import requests
csv = [{"URLs" : "www.mercedes-benz.de", "electric" : 1}, {"URLs" : "www.audi.de", "electric" : 0}, {"URLs" : "ww.audo.e", "electric" : 0}, {"URLs" : "NaN", "electric" : 0}]
df = pd.DataFrame(csv)

My task is to check whether the websites contain certain strings and to add an extra column with 1 if so, and 0 otherwise. For example, I want to check whether www.mercedes-benz.de contains the string car. I do the following:

for i, row in df.iterrows():
    page_content = requests.get(row['URLs'])
    if "car" in page_content.text:
        df.loc[i, 'car'] = '1'
    else:
        df.loc[i, 'car'] = '0' 

The problem is that sometimes the URL is wrong or missing, and my little script then results in an error.

How can I handle/suppress the error if the URL is wrong/missing? And how can I, for example, use df.loc[i, 'url_wrong'] = '1' in these cases to indicate that the URL is wrong/missing?

Solution

Try defining a function that does the "car" check first, and then use the .apply method of a pandas Series to get your 1, 0 or Wrong/Missing URL. The following should help:

import pandas as pd
import requests


data = [{"URLs" : "https://www.mercedes-benz.de", "electric" : 1},
        {"URLs" : "https://www.audi.de", "electric" : 0}, 
        {"URLs" : "https://ww.audo.e", "electric" : 0}, 
        {"URLs" : "NaN", "electric" : 0}]


def contains_car(link):
    try:
        return int('car' in requests.get(link).text)
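    except requests.exceptions.RequestException:
        # malformed URL, DNS failure, timeout, ... (all raised by requests)
        return "Wrong/Missing URL"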


df = pd.DataFrame(data)

df['extra_column'] = df.URLs.apply(contains_car)


#                            URLs  electric       extra_column
# 0  https://www.mercedes-benz.de         1                  1
# 1           https://www.audi.de         0                  1
# 2             https://ww.audo.e         0  Wrong/Missing URL
# 3                           NaN         0  Wrong/Missing URL
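
If you would rather have the separate url_wrong flag the question asks about, here is a minimal sketch of one way to do it. It keeps the 1/0 result and the error flag in two columns, so the result column stays numeric. The names fetch_text, check_keyword and fake_fetch, the example.com URL and the 10-second timeout are my own assumptions for illustration, not part of the code above. fake_fetch only exists so the snippet runs without network access. Note that requests.get does not raise for 404 or 500 answers on its own, so the sketch calls raise_for_status() to treat those as wrong URLs too.

```python
import pandas as pd
import requests


def fetch_text(url):
    """Download a page; raise a requests exception on any failure."""
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()  # turn 4xx/5xx answers into HTTPError
    return response.text


def check_keyword(url, keyword="car", fetch=fetch_text):
    """Return (found, url_wrong): found is 1/0, or None if the URL failed."""
    try:
        text = fetch(url)
    except requests.exceptions.RequestException:
        return None, 1  # no answer for this row, flag the URL instead
    return int(keyword in text), 0


def fake_fetch(url):
    """Offline stand-in for fetch_text (hypothetical page, no network)."""
    pages = {"https://example.com/cars": "we sell every car"}
    if url not in pages:
        raise requests.exceptions.ConnectionError(url)
    return pages[url]


df = pd.DataFrame({"URLs": ["https://example.com/cars", "NaN"],
                   "electric": [1, 0]})

# Drop fetch=fake_fetch to hit the real network through fetch_text.
results = [check_keyword(u, fetch=fake_fetch) for u in df["URLs"]]

# Nullable integer dtype keeps 1/0 as integers and failed rows as <NA>.
df["car"] = pd.array([found for found, _ in results], dtype="Int64")
df["url_wrong"] = [wrong for _, wrong in results]
```

With two columns you can still sum or average the car column later, and filter on url_wrong to see which rows need a second look.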

Edit:

You can search for more than just one keyword in the text returned from your HTTP request. Depending on the condition you set up, this can be done with either the built-in function any or the built-in function all. Using any means that finding any one of the keywords returns 1, while using all means that every keyword has to be matched in order to return 1. In the following example, I am using any with keywords such as 'car', 'automobile', 'vehicle':

import pandas as pd
import requests


data = [{"URLs" : "https://www.mercedes-benz.de", "electric" : 1},
        {"URLs" : "https://www.audi.de", "electric" : 0}, 
        {"URLs" : "https://ww.audo.e", "electric" : 0}, 
        {"URLs" : "NaN", "electric" : 0}]


def contains_keywords(link, keywords):
    try:
        output = requests.get(link).text
        return int(any(x in output for x in keywords))
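    except requests.exceptions.RequestException:
        # malformed URL, DNS failure, timeout, ... (all raised by requests)
        return "Wrong/Missing URL"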


df = pd.DataFrame(data)
mykeywords = ('car', 'vehicle', 'automobile')
df['extra_column'] = df.URLs.apply(lambda l: contains_keywords(l, mykeywords))

Should yield:

#                            URLs  electric       extra_column
# 0  https://www.mercedes-benz.de         1                  1
# 1           https://www.audi.de         0                  1
# 2             https://ww.audo.e         0  Wrong/Missing URL
# 3                           NaN         0  Wrong/Missing URL
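
To see the difference between any and all without making any requests, here is a small offline sketch. The helper match_keywords, its require_all switch and the sample sentence are made up for illustration. I also lowercase both sides, which the code above does not do, because the in operator is case-sensitive.

```python
def match_keywords(text, keywords, require_all=False):
    """Return 1 if the keywords are found in text, else 0.

    With require_all=False one hit is enough (any);
    with require_all=True every keyword must occur (all).
    """
    combine = all if require_all else any
    lowered = text.lower()  # make the substring test case-insensitive
    return int(combine(k.lower() in lowered for k in keywords))


sample = "Our new Car is the safest vehicle in its class."
mykeywords = ("car", "vehicle", "automobile")

any_hit = match_keywords(sample, mykeywords)                    # 'car' is enough
all_hit = match_keywords(sample, mykeywords, require_all=True)  # 'automobile' is missing
```

Keep in mind that this is plain substring matching, so 'car' also matches words such as 'carpet'.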

I hope this helps.
