How to scrape images from a website and display them in an HTML file?

Question

I am scraping images from https://www.open2study.com/courses. I have all the image sources, but I don't know how to display the images (instead of links) in a two-column table (one column for the title and one for the image) in an HTML file. Can an expert help me out?

import urllib
from bs4 import BeautifulSoup

titles = []
images = []

r = urllib.urlopen('https://www.open2study.com/courses').read()
soup = BeautifulSoup(r)

for i in soup.find_all('div', {'class': "courses_adblock_rollover"}):
    titles.append(i.h2.text)

for i in soup.find_all(
    'img', {
        'class': "image-style-course-logo-subjects-block"}):
    images.append(i.get('src'))

with open('test.txt', "w") as f:
    for i in zip(titles, images):
        f.write(i[0].encode('ascii', 'ignore') +
                '\n'+i[1].encode('ascii', 'ignore') +
                '\n\n')

header = '<!DOCTYPE html><html><head><title>My Title</title></head><body>'
body = '<table><thead><tr><th>Title</th><th>Image</th></tr></thead>'

footer = '</table></body></html>'
img_tag = '<img src="{}">'


with open('test.txt', 'r') as input, open('test.html', 'w') as output:
    output.write(header)
    output.write(body)

    for line in input:
        col1 = line.rstrip().split()
        col2 = line.rstrip().split()
        output.write('<tr><td>{}</td><td>{}</td></tr>\n'.format(col1, col2))

    output.write(footer)
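For reference, this is what the loop above does with a single line of `test.txt` (the sample title is made up): `col1` and `col2` are the same `split()` list, so `format` writes the list's text form into both cells, no `<img>` tag is ever produced, and each blank separator line becomes a row of `[]`. `original_row` is a hypothetical wrapper around the question's loop body, written for Python 3.

```python
def original_row(line):
    # Hypothetical wrapper: reproduces the question's loop body for one line.
    col1 = line.rstrip().split()
    col2 = line.rstrip().split()
    return '<tr><td>{}</td><td>{}</td></tr>\n'.format(col1, col2)


row = original_row('Intro to Python\n')
print(row, end='')
# <tr><td>['Intro', 'to', 'Python']</td><td>['Intro', 'to', 'Python']</td></tr>
print(original_row('\n'), end='')
# <tr><td>[]</td><td>[]</td></tr>
```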

Solution

It is a pretty simple problem. Try this:

for line in input:
    # Ignore blank lines
    if line == '\n':
        continue
    # Why were you splitting here? The whole line is the title.
    col1 = line.rstrip()
    # Read the next line, which holds the image URL
    col2 = next(input).rstrip()
    output.write('<tr><td>{}</td><td><img src="{}" style="width: 160px; height: 100px"></td></tr>\n'.format(col1, col2))
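Because `titles` and `images` are already in memory, the intermediate `test.txt` file is not strictly required. The sketch below assumes Python 3 (where the question's `encode(...) + '\n'` concatenation would raise `TypeError`) and writes the two-column table straight from the two lists. It escapes text with `html.escape` so a title containing `&` or `<` cannot break the markup, and, assuming some scraped `src` values might be relative paths, resolves them against the page URL with `urllib.parse.urljoin`. `build_table`, `write_html` and `BASE_URL` are hypothetical names; the 160px by 100px image size is taken from the answer.

```python
import html
from urllib.parse import urljoin

# Page the images were scraped from (from the question); used to resolve
# relative src values such as '/sites/logo.png'. An assumption, not required
# if every scraped src is already absolute.
BASE_URL = 'https://www.open2study.com/courses'


def build_table(titles, images, base_url=BASE_URL):
    """Return an HTML page with one table row per (title, image src) pair."""
    rows = []
    for title, src in zip(titles, images):
        # urljoin leaves absolute URLs unchanged; a missing src becomes ''.
        url = urljoin(base_url, src) if src else ''
        rows.append(
            '<tr><td>{0}</td><td><img src="{1}" alt="{0}" '
            'style="width: 160px; height: 100px"></td></tr>'.format(
                html.escape(title), html.escape(url)))
    return ('<!DOCTYPE html><html><head><meta charset="utf-8">'
            '<title>My Title</title></head><body><table>'
            '<thead><tr><th>Title</th><th>Image</th></tr></thead><tbody>\n'
            + '\n'.join(rows) +
            '\n</tbody></table></body></html>\n')


def write_html(titles, images, path='test.html'):
    # UTF-8 keeps non-ASCII titles instead of discarding them.
    with open(path, 'w', encoding='utf-8') as out:
        out.write(build_table(titles, images))
```

With the lists built in the question, `write_html(titles, images)` would replace both the `test.txt` step and the reading loop. Escaping both cells with `html.escape` (which also escapes quotes by default) keeps the `src` and `alt` attributes well-formed.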
