regex not working in bs4

Views: 86 | Published: 2020/9/20 7:51:29 | Tags: python regex urllib2 bs4


Question

I am trying to extract some links for a specific filehoster from the watchseriesfree.to website. In the following case I want the rapidvideo links, so I use a regex to select the tags whose text contains rapidvideo:

import re
import urllib2
from bs4 import BeautifulSoup

def gethtml(link):
    req = urllib2.Request(link, headers={'User-Agent': "Magic Browser"})
    con = urllib2.urlopen(req)
    html = con.read()
    return html


def findLatest():
    url = "https://watchseriesfree.to/serie/Madam-Secretary"
    head = "https://watchseriesfree.to"

    soup = BeautifulSoup(gethtml(url), 'html.parser')
    latep = soup.find("a", title=re.compile('Latest Episode'))

    soup = BeautifulSoup(gethtml(head + latep['href']), 'html.parser')
    firstVod = soup.findAll("tr",text=re.compile('rapidvideo'))

    return firstVod

print(findLatest())

However, the above code returns a blank list. What am I doing wrong?

Recommended answer

The problem is here:

firstVod = soup.findAll("tr",text=re.compile('rapidvideo'))

When BeautifulSoup applies your text regex pattern, it matches it against the .string attribute of each tr element. The .string attribute has an important caveat: when an element has multiple children, .string is None:

If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None.

Hence, you get no results.
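
To see the caveat in isolation, here is a minimal sketch. The HTML snippet is made up for illustration and is not taken from the real site. It also assumes Beautiful Soup 4.4.0 or later, where the text argument is spelled string (the older name is the one used in the question).

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the first row has a single cell, the second has two.
html = (
    "<table>"
    "<tr><td>rapidvideo.com</td></tr>"
    "<tr><td>rapidvideo.com</td><td>Watch This Link</td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")
single, multi = soup.find_all("tr")

# One child all the way down: .string resolves to the inner text.
print(repr(single.string))  # 'rapidvideo.com'

# Two children: .string is None, so a regex has nothing to match against.
print(repr(multi.string))   # None

# Only the single-cell row can be found through the string/text filter.
matched = soup.find_all("tr", string=re.compile("rapidvideo"))
print(len(matched))
```

Rows on a real listing page usually have several cells, which would explain why the original call comes back empty.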

What you can do is to check the actual text of the tr elements by using a searching function and calling .get_text():

soup.find_all(lambda tag: tag.name == 'tr' and 'rapidvideo' in tag.get_text())
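
As a rough illustration of how that filter might be used to pull out the links themselves, the sketch below runs it against a made-up table. The markup, the URLs, and the helper name is_rapidvideo_row are assumptions for the example, not the real page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the episode page.
html = (
    "<table>"
    "<tr><td>rapidvideo.com</td>"
    "<td><a href='/open/link/1'>Watch This Link</a></td></tr>"
    "<tr><td>otherhost.example</td>"
    "<td><a href='/open/link/2'>Watch This Link</a></td></tr>"
    "</table>"
)
soup = BeautifulSoup(html, "html.parser")


def is_rapidvideo_row(tag):
    # get_text() joins the text of all descendants, so it works even
    # when the row has several children and .string is None.
    return tag.name == "tr" and "rapidvideo" in tag.get_text()


rows = soup.find_all(is_rapidvideo_row)

# Collect the href of every anchor inside the matching rows.
links = [a["href"] for row in rows for a in row.find_all("a", href=True)]
print(links)  # ['/open/link/1']
```

A named function behaves the same as the lambda; it is used here only so there is room for the comment.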
