Beautiful Soup, Python: Trying to display scraped contents of a for loop on an HTML page in the correct manner

Problem description

Using Beautiful Soup and Python, I have scraped the website shown below to isolate the rank, company name and revenue of each company.

I would like to show the top ten companies from that table in an HTML table that I am rendering with Flask and Jinja2. However, the code I have written just displays the first record five times.

Code in file: webscraper.py

    import requests
    from bs4 import BeautifulSoup

    url = 'https://en.m.wikipedia.org/wiki/List_of_largest_Internet_companies'
    req = requests.get(url)
    bsObj = BeautifulSoup(req.text, 'html.parser')
    data = bsObj.find('table', {'class': 'wikitable sortable mw-collapsible'})

    table_data = []
    trs = bsObj.select('table tr')
    for tr in trs[1:6]:  # the first row comes out empty, so skip it
        row = []
        for t in tr.select('td')[:3]:  # td refers to the columns
            row.append(t.text.strip())
        table_data.append(row)
    data = table_data

    # These three values are taken from the first record only
    rank = data[0][0]
    name = data[0][1]
    revenue = data[0][2]

Relevant code in home.html

    <p>{{data}}</p>
    <table class="table">
      <thead>
        <tr>
          <th scope="col">#</th>
          <th scope="col">Rank</th>
          <th scope="col">Name</th>
          <th scope="col">Revenue</th>
        </tr>
      </thead>
      <tbody>
        {% for element in data %}
        <tr>
          <th scope="row"></th>
          <td>{{rank}}</td>
          <td>{{name}}</td>
          <td>{{revenue}}</td>
        </tr>
        {% endfor %}
      </tbody>
    </table>

The HTML output is shown below. Note that the variable {{data}} shows all five records correctly, but I am not isolating the individual values correctly.

[['1', 'Amazon', '$280.5'], ['2', 'Google', '$161.8'], ['3', 'JD.com', '$82.8'], ['4', 'Facebook', '$70.69'], ['5', 'Alibaba', '$56.152']]

    Rank  Name    Revenue
    1     Amazon  $280.5
    1     Amazon  $280.5
    1     Amazon  $280.5
    1     Amazon  $280.5
    1     Amazon  $280.5

As mentioned, I want ranks 1 to 10, that is, the first ten companies listed, not just Amazon.

Any suggestions as to what I've done wrong in my code? I'd like the most elegant solution that builds on my own code, not a completely new idea or solution.

Please also explain the for loop and the theory behind it.

I know this is wrong:

    rank=data[0][0]
    name=data[0][1]
    revenue=data[0][2]

but I don't understand why, or how to construct it in the most elegant way so that the variables rank, name and revenue contain the respective data elements.

Solution

Thank you to @mmfallacy above, who suggested this answer; I am just fleshing it out.

It works, but I will accept the answer above since he suggested it. Here it is for reference:

    {% for element in data %}
    <tr>
      <th scope="row"></th>
      <td>{{element[0]}}</td>
      <td>{{element[1]}}</td>
      <td>{{element[2]}}</td>
    </tr>
    {% endfor %}
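
To illustrate why this works, here is a minimal plain-Python sketch of the same loop logic, using the five records printed in the question as sample data. The names `first_rank`, `first_name`, `first_revenue`, `broken_rows` and `fixed_rows` are made up for this illustration and do not appear in the original files. In the original template, `rank`, `name` and `revenue` were computed once from `data[0]`, and the loop variable `element` was never used, so every pass printed the same values:

```python
# Sample data copied from the {{data}} output shown in the question.
data = [
    ['1', 'Amazon', '$280.5'],
    ['2', 'Google', '$161.8'],
    ['3', 'JD.com', '$82.8'],
    ['4', 'Facebook', '$70.69'],
    ['5', 'Alibaba', '$56.152'],
]

# Original approach: values are read once, from the first record only.
first_rank, first_name, first_revenue = data[0]
broken_rows = []
for element in data:
    # The loop runs five times, but `element` is ignored,
    # so the same first record is appended on every pass.
    broken_rows.append((first_rank, first_name, first_revenue))

# Fixed approach: read the values from the current record on each pass.
fixed_rows = []
for element in data:
    rank, name, revenue = element  # same as element[0], element[1], element[2]
    fixed_rows.append((rank, name, revenue))

for row in fixed_rows:
    print(*row)
```

A for loop binds its loop variable to the next item of the list on each pass, so anything that should change per row has to be read from that variable. Jinja2's `for` tag also supports unpacking, so, provided every row has exactly three items, `{% for rank, name, revenue in data %}` would let the template keep using `{{rank}}`, `{{name}}` and `{{revenue}}` instead of indexes.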

I simply deleted my attempts to create the rank, name and revenue variables in the .py file.
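
The question also asks for ten companies, while the output only has five. That part appears to come from the slice `trs[1:6]` in webscraper.py, which starts at index 1 and stops before index 6, so it yields five rows. A small sketch with placeholder rows (the list `trs` below is a stand-in made of strings, not real Beautiful Soup tags):

```python
# Stand-in for the scraped rows: index 0 plays the empty first row,
# the rest play company rows. These strings are placeholders only.
trs = ['header'] + [f'company {i}' for i in range(1, 21)]

five = trs[1:6]   # indexes 1 to 5  -> 5 rows, as in the question
ten = trs[1:11]   # indexes 1 to 10 -> 10 rows

print(len(five), len(ten))
print(ten[0], '...', ten[-1])
```

Assuming the first ten company rows directly follow the empty first row in the scraped list, changing the slice to `trs[1:11]` should give ten records for the template to loop over.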
