Display images and titles in 2 columns in Python
Problem description
I scraped all titles and image source links into a text file, then used the data from the text file to output an HTML file with 2 columns, one for images and one for titles. How do I display clickable images, and display the title and image in a 2-column format? Here is what I have:
import urllib
from bs4 import BeautifulSoup

titles = []
images = []
href = []
r = urllib.urlopen('https://www.open2study.com/courses').read()
soup = BeautifulSoup(r)
for i in soup.find_all('div', {'class': "courses_adblock_rollover"}):
    titles.append(i.h2.text)
for i in soup.find_all('img', {'class': "image-style-course-logo-subjects-block"}):
    images.append(i.get('src'))
with open('test.txt', "w") as f:
    for i in zip(titles, images):
        f.write(i[0].encode('ascii', 'ignore') + '\n'
                + i[1].encode('ascii', 'ignore') +
                '\n\n')
header = '<!doctype html><html><head><title>My page</title></head><body>'
body = '<table><tr><td></td><td></td></tr>'
footer = '</table></body></html>'
with open('test.txt', 'r') as input, open('test.html', 'w') as output:
    output.write(header)
    output.write(body)
    for line in input:
        # ignore blank lines
        if line == '\n':
            continue
        col1 = line.rstrip()
        # read next line
        col2 = next(input).rstrip()
        output.write('<tr><td>{}</td><td><img src="{}" style="width: 160px; height: 100px"></td></tr>\n\n'.format(col1, col2))
    output.write(footer)
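If you keep the intermediate text file, note that the reading loop calls `next(input)`, which raises `StopIteration` if the file ends after a title line that has no image line. A minimal Python 3 sketch, assuming the same layout (title line, image line, blank line), reads whole records instead of pulling lines one at a time. `write_pairs` and `read_pairs` are hypothetical helper names, and the sample titles and URLs are made up.

```python
# Python 3 sketch: store (title, image_url) pairs as two-line records
# separated by a blank line, then read them back as pairs.
import io


def write_pairs(f, pairs):
    """Write each (title, image_url) pair as two lines followed by a blank line."""
    for title, image in pairs:
        f.write(f"{title}\n{image}\n\n")


def read_pairs(f):
    """Read back records written by write_pairs.

    Splitting on blank lines avoids calling next() on the file object,
    which raises StopIteration if a record is missing its second line.
    """
    pairs = []
    for record in f.read().split("\n\n"):
        lines = record.strip().splitlines()
        if len(lines) == 2:  # skip empty or incomplete records
            pairs.append((lines[0], lines[1]))
    return pairs


buf = io.StringIO()  # stands in for open('test.txt', 'w+', encoding='utf-8')
write_pairs(buf, [("Intro to Ecology", "https://example.com/eco.png"),
                  ("Basic Chinese", "https://example.com/zh.png")])
buf.seek(0)
print(read_pairs(buf))
```

An incomplete trailing record is simply skipped rather than crashing the loop.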
I feel like you're making this scraping thing really difficult for yourself. It's easier to start with the largest element first, i.e. the whole course div, and then pull the title, image, and link out of it.
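To see why the container-first approach matters: collecting titles and images in two separate `find_all` passes and then zipping them only works if every course has exactly one of each. If one course has no logo, `zip` pairs titles with the wrong images. The sketch below illustrates the idea with the standard library's `xml.etree.ElementTree` instead of BeautifulSoup, on a small hypothetical snippet that reuses the class names from this thread (the markup, titles, and paths are made up). Real pages are rarely well-formed XML, so BeautifulSoup remains the practical choice for the actual site.

```python
# Container-first extraction, sketched with ElementTree on a hypothetical,
# well-formed snippet. The first course deliberately has no logo.
import xml.etree.ElementTree as ET

SAMPLE = """
<body>
  <a href="/courses/ecology">
    <div class="courses_adblock_start">
      <h2>Intro to Ecology</h2>
    </div>
  </a>
  <a href="/courses/chinese">
    <div class="courses_adblock_start">
      <h2>Basic Chinese</h2>
      <img class="image-style-course-logo-subjects-block" src="/img/zh.png"/>
    </div>
  </a>
</body>
"""


def extract_courses(markup):
    """Return one (title, image_src, href) tuple per course container."""
    root = ET.fromstring(markup.strip())
    courses = []
    # Walk the <a> wrappers so each container's link is at hand;
    # ElementTree elements have no .parent attribute.
    for link in root.iter("a"):
        div = link.find("div[@class='courses_adblock_start']")
        if div is None:
            continue
        title = div.findtext("h2", default="").strip()
        img = div.find("img[@class='image-style-course-logo-subjects-block']")
        src = img.get("src") if img is not None else None  # missing logo -> None
        courses.append((title, src, link.get("href")))
    return courses


for course in extract_courses(SAMPLE):
    print(course)
```

With two separate lists, the same snippet would give `titles == ['Intro to Ecology', 'Basic Chinese']` and `images == ['/img/zh.png']`, so `zip` would attach the second course's logo to the first course. Working per container keeps each title with its own image (or `None`).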
This code gives you the clickable images in the first column and the course titles in the second column.
import urllib
from bs4 import BeautifulSoup

base_url = 'https://www.open2study.com'
r = urllib.urlopen(base_url + '/courses').read()
soup = BeautifulSoup(r, "html.parser")
# Start from the whole course container, then pull details out of each one
courses = soup.find_all('div', {'class': "courses_adblock_start"})
encoding = "utf-8"
page_title = 'Ai Truong'
html_template = '<!doctype html><html><head><title>{}</title><meta charset="{}" /></head><body>{}</body></html>'
table_template = '<table>{}</table>'
table_row_template = '<tr><td>{}</td><td>{}</td></tr>'
# Wrapping the <img> in an <a> is what makes the image clickable
img_template = '<a href="{}"><img src="{}" width="160" alt="{}"></a>'
table_rows = ''
for c in courses:
    title = c.h2.text.encode(encoding)
    image = c.find('img', {'class': 'image-style-course-logo-subjects-block'}).get('src')
    # The course link is the href of the div's parent element
    href = c.parent.get('href')
    img_tag = img_template.format(base_url + href, image, title)
    table_rows += table_row_template.format(img_tag, title)
table_tag = table_template.format(table_rows)
with open('course-scrape.html', 'w') as html_out:
    html_out.write(html_template.format(page_title, encoding, table_tag))
Output
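The answer's code targets Python 2 (`urllib.urlopen`, manual `.encode()`). In Python 3, `urlopen` lives in `urllib.request`, and text is written as `str` through `open(..., encoding="utf-8")`. Below is a minimal sketch of just the table-building step for Python 3, assuming `courses` already holds `(title, image_src, href)` tuples (fetching and parsing are left out); `build_page` is a hypothetical name and the sample course is made up. It also escapes text with `html.escape`, so a title containing `&` or `"` cannot break the markup, and resolves relative links with `urllib.parse.urljoin`.

```python
from html import escape
from urllib.parse import urljoin

BASE_URL = "https://www.open2study.com"

# One table row: clickable logo in the first cell, course title in the second.
ROW = ('<tr><td><a href="{href}"><img src="{src}" width="160" alt="{alt}"></a></td>'
       '<td>{title}</td></tr>')
PAGE = ('<!doctype html><html><head><meta charset="utf-8"><title>{title}</title>'
        '</head><body><table>{rows}</table></body></html>')


def build_page(courses, page_title="Courses"):
    """Return a two-column HTML page from (title, image_src, href) tuples."""
    rows = []
    for title, src, href in courses:
        rows.append(ROW.format(
            href=escape(urljoin(BASE_URL, href or "")),  # relative link -> absolute URL
            src=escape(src or ""),
            alt=escape(title),
            title=escape(title),
        ))
    return PAGE.format(title=escape(page_title), rows="".join(rows))


print(build_page([("Intro to Ecology", "/img/eco.png", "/courses/ecology")]))
```

To save the page, write the returned string with `open('course-scrape.html', 'w', encoding='utf-8')`; no manual `.encode()` is needed.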