Crawler only loads one title

Published: 2021/6/26 20:40:01. Tags: python, python-3.x, python-3.3

Question

I asked a few questions here, and one person gave me this code. But I need help, because it only brings back one result from my websites.txt.

crawler.py

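import urllib.request
import re

regex = "<title>(.+?)</title>"
pattern = re.compile(regex)
txtfl = open('websites.txt')
webpgsinfile = txtfl.readlines()
txtfl.close()
urls = [line.strip() for line in webpgsinfile]  # drop the trailing "\n" on each line
i = 0  # i is never defined in the posted code, so urls[i] would raise NameError
# Bug: there is no loop, so only urls[i] (the first URL) is ever fetched
htmlfile = urllib.request.urlopen(urls[i])
htmltext = htmlfile.read().decode('utf8')
titles = re.findall(pattern, htmltext)

if len(titles) > 0:
    print(titles[0])
    i += 1  # has no effect: nothing goes back and fetches urls[1]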
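For comparison, here is the posted script with only the missing piece added: a loop over every line of the file instead of a single `urls[i]`. This is a minimal sketch. The function name `crawl_titles` is hypothetical and does not come from the question, and a page with no `<title>` match is recorded as `None`.

```python
import re
import urllib.request

# Same non-greedy pattern as in the question
pattern = re.compile("<title>(.+?)</title>")

def crawl_titles(urls):
    """Fetch each URL in turn and collect its first <title> text (or None)."""
    results = []
    for url in urls:  # the loop the original script was missing
        url = url.strip()  # lines read from a file end with "\n"
        if not url:
            continue  # skip blank lines, e.g. an empty last line
        with urllib.request.urlopen(url) as htmlfile:
            htmltext = htmlfile.read().decode('utf8', errors='replace')
        titles = pattern.findall(htmltext)
        results.append(titles[0] if titles else None)
    return results
```

Used as `with open('websites.txt') as f: print(crawl_titles(f))`, it fetches every non-blank line, so both sites in websites.txt are visited.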

websites.txt

http://youtube.com
http://bigsolutions.com.br

Recommended answer

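import re
from urllib.request import urlopen

def get_page(url, encoding='utf-8'):
    # Download the page and decode it; undecodable bytes are dropped
    with urlopen(url) as resp:
        return resp.read().decode(encoding, errors='ignore')

# Non-greedy (.*?) stops at the first </title>; DOTALL lets a title span lines
def get_title(txt, reg=re.compile('<title>(.*?)</title>', re.IGNORECASE | re.DOTALL)):
    match = reg.search(txt)
    if match is None:
        return ''
    return match.group(1).strip()

def main():
    # strip() removes the trailing newline that each line of the file carries
    with open('websites.txt') as inf:
        urls = [line.strip() for line in inf]
    # Visit every URL (skipping blank lines) instead of a single urls[i]
    titles = [get_title(get_page(url)) for url in urls if url]
    print(titles)

if __name__ == "__main__":
    main()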
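Regular expressions over HTML are fragile. A literal `<title>` pattern misses tags that carry attributes (such as `<title lang="en">`), and it can be confused by `<title>` elements inside inline SVG. A swapped-in alternative, not what the answer used, is the standard library's `html.parser.HTMLParser`. The sketch below assumes only the first `<title>` in the document matters. `TitleParser` is a hypothetical helper name, and `get_title` keeps the answer's one-argument call shape so `get_title(get_page(url))` would still work.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self._in_title = False
        self._parts = []
        self.title = None  # stays None if no <title> is found

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TITLE lang="en"> matches too
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            # Collapse newlines and runs of spaces inside the title
            self.title = " ".join("".join(self._parts).split())

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

def get_title(txt):
    parser = TitleParser()
    parser.feed(txt)
    parser.close()  # flush any buffered text
    return parser.title or ''
```

Later `<title>` elements, such as icon titles inside `<svg>`, are ignored because `self.title` is already set by then.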

Result

["LimeCD - Lime's Code Library", 'YouTube', 'Big Solutions - Aqui nós pensamos grande!']
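The printed list does not say which title belongs to which URL. In addition, a single unreachable site raises an exception that aborts the whole list comprehension. The following sketch keeps going past failures, assuming a dict mapping each URL to its title (or to an error string) is an acceptable result. The names `fetch_title` and `crawl`, the 10-second timeout, and the `<error: ...>` format are illustrative choices, not part of the answer. The charset is taken from the response's Content-Type header when the server declares one.

```python
import re
from urllib.request import urlopen

# Non-greedy, case-insensitive; allows attributes such as <title lang="en">
TITLE_RE = re.compile(r"<title(?:\s[^>]*)?>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def fetch_title(url, timeout=10):  # timeout=10 is an arbitrary illustrative default
    """Return the page title, or '' if the page has none."""
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        raw = resp.read()
    try:
        html = raw.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the header
        html = raw.decode("utf-8", errors="replace")
    match = TITLE_RE.search(html)
    return " ".join(match.group(1).split()) if match else ""

def crawl(urls):
    """Map each non-blank URL to its title, or to an error message."""
    results = {}
    for url in (u.strip() for u in urls):
        if not url:
            continue
        try:
            results[url] = fetch_title(url)
        except (OSError, ValueError) as exc:
            # One unreachable or malformed URL should not stop the others
            results[url] = f"<error: {exc}>"
    return results
```

`urllib.error.URLError` and `HTTPError` are subclasses of `OSError`, which is why catching `OSError` covers network failures. `ValueError` covers malformed lines, such as a URL with no scheme.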
