Get meta tag content property with BeautifulSoup and Python

Question

I am trying to use python and beautiful soup to extract the content part of the tags below:

<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />

I'm getting BeautifulSoup to load the page just fine and find other stuff (this also grabs the article id from the id tag hidden in the source), but I don't know the correct way to search the html and find these bits, I've tried variations of find and findAll to no avail. The code iterates over a list of urls at present...

#!/usr/bin/env python
# -*- coding: utf-8 -*-

#importing the libraries
from urllib import urlopen
from bs4 import BeautifulSoup

def get_data(page_no):
    webpage = urlopen('http://superfunevents.com/?p=' + str(page_no)).read()
    soup = BeautifulSoup(webpage, "lxml")
    for tag in soup.find_all("article") :
        id = tag.get('id')
        print id
# the hard part that doesn't work - I know this example is well off the mark!        
    title = soup.find("og:title", "content")
    print (title.get_text())
    url = soup.find("og:url", "content")
    print (url.get_text())
# end of problem

for i in range (1,100):
    get_data(i)

If anyone can help me sort the bit to find the og:title and og:content that'd be fantastic!

Recommended answer

Provide the meta tag name as the first argument to find(). Then, use keyword arguments to check the specific attributes:

title = soup.find("meta",  property="og:title")
url = soup.find("meta",  property="og:url")

print(title["content"] if title else "No meta title given")
print(url["content"] if url else "No meta url given")

The if/else checks here would be optional if you know that the title and url meta properties would always be present.
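
To see why the checks matter, here is a minimal, self-contained sketch that runs against an inline HTML string instead of a live page. The helper name get_meta_content, the SAMPLE_HTML string, and the og:image lookup are assumptions made up for illustration; they are not part of the original question or answer. The sketch relies on find() returning None when no tag matches, and it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Sample markup built from the two tags in the question (an inline stand-in
# for a downloaded page; og:image is deliberately absent).
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
</head><body></body></html>
"""


def get_meta_content(soup, prop, default=None):
    """Hypothetical helper: return the content attribute of
    <meta property="prop">, or default if the tag or attribute is missing."""
    tag = soup.find("meta", property=prop)
    if tag is None:
        # find() returns None when nothing matches, so tag["content"]
        # would raise a TypeError here without this check.
        return default
    return tag.get("content", default)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

title = get_meta_content(soup, "og:title", "No meta title given")
url = get_meta_content(soup, "og:url", "No meta url given")
image = get_meta_content(soup, "og:image", "No meta image given")

print(title)
print(url)
print(image)
```

With the sample markup above, the first two lookups return the content values from the question's tags, and the og:image lookup falls back to its default string because no such tag exists.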
