How to find a specific video HTML tag using Beautiful Soup?


Question

Does anyone know how to use BeautifulSoup in Python?

I have a search engine with a list of different URLs.

I want to get only the HTML tag that contains a video embed URL, and then extract the link from it.

For example:

import BeautifulSoup

html = '''https://archive.org/details/20070519_detroit2'''
    #or this.. html = '''http://www.kumby.com/avatar-the-last-airbender-book-3-chapter-5/'''
    #or this... html = '''https://www.youtube.com/watch?v=fI3zBtE_S_k'''

soup = BeautifulSoup.BeautifulSoup(html)

What should I do next to get the HTML tag of the video or object, or the exact link to the video?

I need to put it in my iframe. I will integrate the Python with my PHP: Python gets the link to the video and outputs it, and then I echo it into my iframe.

Recommended answer

You need to get the HTML of the page, not just the URL.

Use the built-in urllib library like this:

from urllib.request import urlopen
from bs4 import BeautifulSoup as BS

url = 'https://archive.org/details/20070519_detroit2'
# open and read the page
page = urlopen(url)
html = page.read()
# create a BeautifulSoup parse-able "soup", naming the parser explicitly
soup = BS(html, "html.parser")
# get the src attribute from the video tag (find returns None if there is no such tag)
video_tag = soup.find("video")
video = video_tag.get("src") if video_tag else None

Also, for the site you are using, I noticed that to get the embed link you just change "details" in the URL to "embed", so it looks like this:

https://archive.org/embed/20070519_detroit2

So if you want to do this for multiple URLs without having to parse anything, just do something like this:

url = '''https://archive.org/details/20070519_detroit2'''
spl = url.split('/')
spl[3] = 'embed'
embed = "/".join(spl)
print(embed)
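
Since the point above is to handle several URLs at once, the same split-and-join idea can be wrapped in a small function and applied in a loop. This is only a sketch: the name to_embed_url and the second URL in the list are made up for illustration, and the "details" to "embed" swap is assumed to hold only for archive.org-style URLs, as observed above.

```python
def to_embed_url(details_url):
    """Swap the 'details' path segment of an archive.org-style URL for 'embed'."""
    parts = details_url.split('/')
    # 'https://archive.org/details/<id>' splits into
    # ['https:', '', 'archive.org', 'details', '<id>'], so index 3 is the segment.
    if len(parts) > 3 and parts[3] == 'details':
        parts[3] = 'embed'
    # URLs that do not match the pattern are returned unchanged.
    return '/'.join(parts)


urls = [
    'https://archive.org/details/20070519_detroit2',
    'https://archive.org/details/hypothetical_item',  # made-up identifier
]
for url in urls:
    print(to_embed_url(url))
```

Checking the segment before replacing it means a URL from another site (for example a YouTube watch link) passes through untouched instead of being mangled.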


Edit

To get the embed link for the other links you provided in your edit, look through the HTML of the page you are parsing until you find the link, then get the tag it is in, and then the attribute that holds it.

For '''http://www.kumby.com/avatar-the-last-airbender-book-3-chapter-5/''' just do:

soup.find("iframe").get("src")

Use "iframe" because the link is in the iframe tag, and .get("src") because the link is the src attribute.
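
Because different sites put the link in different tags, one way to avoid hand-checking each page is to try a few likely tag/attribute pairs in turn. The sketch below is an assumption-laden illustration, not a complete solution: the helper name find_media_links, the CANDIDATES list, and the example.com sample HTML are all invented here, and real pages may build their players with JavaScript, in which case nothing will be found in the static HTML.

```python
from bs4 import BeautifulSoup

# (tag name, attribute that usually carries the media URL).
# This list is an illustrative assumption, not an exhaustive one.
CANDIDATES = [
    ("video", "src"),
    ("source", "src"),
    ("iframe", "src"),
    ("embed", "src"),
    ("object", "data"),
]


def find_media_links(html):
    """Return (tag_name, url) pairs for every candidate tag that carries a URL."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for tag_name, attr in CANDIDATES:
        for tag in soup.find_all(tag_name):
            url = tag.get(attr)  # None when the attribute is missing
            if url:
                links.append((tag_name, url))
    return links


# Made-up sample page used only to demonstrate the helper.
sample = '''
<html><body>
  <iframe src="https://example.com/embed/123"></iframe>
  <video><source src="https://example.com/clip.mp4"></video>
</body></html>
'''
print(find_media_links(sample))
```

In practice the html argument would be the bytes returned by page.read() for each URL in the list, and the first pair returned is a reasonable candidate to echo into the iframe.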

You can try the next one yourself, since you should learn how to do it if you want to be able to do it in the future :)

Good luck!
