Extracting contents from specific meta tags that are not closed using BeautifulSoup

Views: 704 | Published: 2016/8/5 19:02:46 | Tags: python, beautifulsoup

Problem description

I'm trying to parse the content of specific meta tags. Here's the structure of the meta tags. The first two are self-closed with a trailing slash, but the rest have no closing tag. As soon as I reach the third meta tag, the entire contents between the <head> tags are returned. I've also tried soup.findAll(text=re.compile('keyword')), but that returns nothing, since the keyword is in an attribute of the meta tag.

<meta name="csrf-param" content="authenticity_token"/>
<meta name="csrf-token" content="OrpXIt/y9zdAFHWzJXY2EccDi1zNSucxcCOu8+6Mc9c="/>
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'>
<meta content='en_US' http-equiv='Content-Language'>
<meta content='c2y_K2CiLmGeet7GUQc9e3RVGp_gCOxUC4IdJg_RBVo' name='google-site-verification'>
<meta content='initial-scale=1.0,maximum-scale=1.0,width=device-width' name='viewport'>
<meta content='notranslate' name='google'>
<meta content="Learn about Uber's product, founders, investors and team. Everyone's Private Driver - Request a car from any mobile phone—text message, iPhone and Android apps. Within minutes, a professional driver in a sleek black car will arrive curbside. Automatically charged to your credit card on file, tip included." name='description'>

Here's the code:

import csv
import re
import sys
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

req3 = Request("https://angel.co/uber", headers={'User-Agent': 'Mozilla/5.0'})
page3 = urlopen(req3).read()
soup3 = BeautifulSoup(page3)

## This returns the entire web page since the META tags are not closed
desc = soup3.findAll(attrs={"name":"description"})
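
The text= argument tried above filters on text nodes rather than attributes, which would explain the empty result. As a minimal sketch (the inline HTML and the explicit "html.parser" choice are assumptions for illustration, not the asker's actual setup), an attribute filter with a compiled pattern can match on the name value instead:

```python
import re
from bs4 import BeautifulSoup

# Inline sample standing in for the real page (an assumption for illustration).
html = """
<head>
<meta name="csrf-param" content="authenticity_token"/>
<meta content='notranslate' name='google'>
<meta content="Learn about Uber's product." name='description'>
</head>
"""

soup = BeautifulSoup(html, "html.parser")

# string= (spelled text= in older releases) searches text nodes only, so a
# word that appears only inside an attribute value is never matched.
text_hits = soup.find_all(string=re.compile("description"))

# Attribute filters accept compiled patterns, so the name attribute can be
# matched directly.
attr_hits = soup.find_all("meta", attrs={"name": re.compile("^desc")})

print(len(text_hits))
print(attr_hits[0]["content"])
```

Here the text search finds nothing, while the attribute search finds the description tag.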

Solution

Although I'm not sure it will work for every page:

from bs4 import BeautifulSoup
from urllib.request import urlopen

page3 = urlopen("https://angel.co/uber").read()
soup3 = BeautifulSoup(page3)

# Find every tag whose name attribute is "description", then read its content attribute
desc = soup3.findAll(attrs={"name": "description"})
print(desc[0]['content'])

Yields:

Learn about Uber's product, founders, investors and team. Everyone's Private Driver - Request a car from any mobile phone—text message, iPhone and Android apps. Within minutes, a professional driver in a sleek black car will arrive curbside. Automatically charged to your credit card on file, tip included.
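
Beautiful Soup delegates parsing to whichever parser is available, and different parsers can build different trees from markup that is not strictly well formed, so one possible reason the two snippets behave differently is the parser in use. The following is a minimal sketch, not a confirmed diagnosis: it uses an inline HTML sample rather than the live page, names the built-in "html.parser" explicitly (an assumption, since "lxml" or "html5lib" could be passed instead), and uses find() so that a missing tag yields None instead of an IndexError.

```python
from bs4 import BeautifulSoup

# Inline sample mixing self-closed and unclosed <meta> tags, as in the question.
html = """<html><head>
<meta name="csrf-param" content="authenticity_token"/>
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'>
<meta content='notranslate' name='google'>
<meta content="Everyone's Private Driver" name='description'>
<title>Uber</title>
</head><body><p>Body text</p></body></html>"""

# Naming the parser avoids depending on whichever one happens to be installed.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match or None, so a missing tag does not raise.
tag = soup.find("meta", attrs={"name": "description"})
description = tag.get("content") if tag is not None else None

# Collect every <meta> that has a name attribute into a dict for lookup.
named = {
    m["name"]: m.get("content")
    for m in soup.find_all("meta", attrs={"name": True})
}

print(description)
print(named)
```

Reading the content attribute from the matched tag, rather than printing the tag itself, keeps the result limited to the description string even if a parser nests later elements inside an unclosed tag.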
