How to fix this type error thrown by a regular expression in Python?


Problem description

I am trying to collect all the internal links from the documentation page of the Requests library for Python and filter out all the external links.

I am using a regular expression to do this, but it throws a TypeError that I am unable to resolve.

My code:

import requests
from bs4 import BeautifulSoup
import re

r = requests.get('https://2.python-requests.org/en/master/')
content = BeautifulSoup(r.text)
[i['href'] for i in content.find_all('a') if not re.match("http", i)]

Error:

TypeError                                 Traceback (most recent call last)
<ipython-input-10-b7d82067fe9c> in <module>
----> 1 [i['href'] for i in content.find_all('a') if not re.match("http", i)]

<ipython-input-10-b7d82067fe9c> in <listcomp>(.0)
----> 1 [i['href'] for i in content.find_all('a') if not re.match("http", i)]

~\Anaconda3\lib\re.py in match(pattern, string, flags)
    171     """Try to apply the pattern at the start of the string, returning
    172     a Match object, or None if no match was found."""
--> 173     return _compile(pattern, flags).match(string)
    174 
    175 def fullmatch(pattern, string, flags=0):

TypeError: expected string or bytes-like object

Recommended answer

You are passing re.match a BeautifulSoup node object (the whole <a> tag), not a string, and re.match only accepts a string or bytes-like object. Pass the tag's href value instead:

[i['href'] for i in content.find_all('a') if not re.match("http", i['href'])]
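Two things may still be worth guarding against. First, i['href'] raises a KeyError for any <a> tag that has no href attribute; with BeautifulSoup you can restrict the search to tags that have one via content.find_all('a', href=True). Second, a prefix test on "http" is only one way to decide what counts as external. Below is a minimal sketch of an alternative that works on plain href strings and uses urllib.parse from the standard library. The sample hrefs list and the is_external helper are made up for illustration and are not part of the original question; the sketch also assumes, as the original check does, that any link carrying its own host is external.

```python
import re
from urllib.parse import urlparse

# Hypothetical sample of href strings. In the question these would be the
# i['href'] values of the <a> tags that actually have an href attribute.
hrefs = [
    "api/",
    "#main-interface",
    "https://github.com/psf/requests",
    "user/install/",
    "http://example.com/",
]


def is_external(href):
    """Return True when the href names its own host (netloc is non-empty)."""
    return bool(urlparse(href).netloc)


# Keep only the links that do not point at another host.
internal = [h for h in hrefs if not is_external(h)]

# For comparison: re.match accepts only str or bytes for its second argument,
# so passing any other object reproduces the TypeError from the question.
try:
    re.match("http", object())
except TypeError as exc:
    error_message = str(exc)
```

Under this assumption an absolute URL that points back to the same site is also treated as external, so adjust is_external if you need to compare against a specific host.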
