How to scrape images from a website and display them in an HTML file?
Problem description
I am scraping images from https://www.open2study.com/courses. I got all the image sources, but I don't know how to display the images themselves (instead of links) in an HTML file, in a table with two columns: one for the title and one for the image. Can an expert help me out?
import urllib
from bs4 import BeautifulSoup
titles = []
images = []
r = urllib.urlopen('https://www.open2study.com/courses').read()
soup = BeautifulSoup(r)
for i in soup.find_all('div', {'class': "courses_adblock_rollover"}):
    titles.append(i.h2.text)
for i in soup.find_all(
        'img', {
            'class': "image-style-course-logo-subjects-block"}):
    images.append(i.get('src'))
with open('test.txt', "w") as f:
    for i in zip(titles, images):
        f.write(i[0].encode('ascii', 'ignore') +
                '\n'+i[1].encode('ascii', 'ignore') +
                '\n\n')
header = '<!doctype html><html><head><title>My Title</title></head><body>'
body = '<table><thead><tr><th></th><th></th></tr>'
footer = '</table></body></html>'
img_tag = '<img src="{}">'
with open('test.txt', 'r') as input, open('test.html', 'w') as output:
    output.write(header)
    output.write(body)
    for line in input:
        col1 = line.rstrip().split()
        col2 = line.rstrip().split()
        output.write('<tr><td>{}</td><td>{}</td></tr>\n'.format(col1, col2))
    output.write(footer)
Solution
It is a pretty simple problem. Try this:
for line in input:
    # ignore blank lines
    if line == '\n':
        continue
    # why were you splitting here? each line already holds one value
    col1 = line.rstrip()
    # read the next line, which holds the image URL
    col2 = next(input).rstrip()
    output.write('<tr><td>{}</td><td><img src="{}" style="width: 160px; height: 100px"></td></tr>\n'.format(col1, col2))
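The pairing logic above can also be pulled out into small functions. This is only a sketch for Python 3, under the assumption that the input follows the same layout as test.txt (a title line, then a URL line, then a blank line). The names pair_lines and build_page, the column headings, and the sample titles and URLs are made up for illustration, and the html.escape calls are an addition that the answer itself does not use.

```python
import html
import io


def pair_lines(lines):
    """Yield (title, image_url) pairs from lines laid out as title, URL, blank."""
    it = iter(lines)
    for line in it:
        title = line.strip()
        if not title:
            continue  # skip the blank separator lines
        # Assumption: the line right after a title holds its image URL.
        url = next(it, "").strip()
        yield title, url


def build_page(pairs, width=160, height=100):
    """Return a complete HTML page with one table row per (title, url) pair."""
    row = ('<tr><td>{}</td><td><img src="{}" '
           'style="width: {}px; height: {}px"></td></tr>')
    rows = [
        # Escape both values so characters like & or " cannot break the markup.
        row.format(html.escape(title), html.escape(url, quote=True), width, height)
        for title, url in pairs
    ]
    return (
        "<!doctype html><html><head><title>My Title</title></head><body>"
        "<table><thead><tr><th>Title</th><th>Image</th></tr></thead><tbody>\n"
        + "\n".join(rows)
        + "\n</tbody></table></body></html>"
    )


# Stand-in for test.txt; these course names and URLs are hypothetical.
sample = io.StringIO(
    "Course A\nhttps://example.com/a.png\n\n"
    "Course B & C\nhttps://example.com/b.png\n\n"
)
page = build_page(pair_lines(sample))
```

With the real data, pair_lines could be given the open test.txt file object, and the returned string written to test.html. Passing zip(titles, images) straight to build_page would skip the intermediate text file altogether.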