Python: Creating row breaks in a list for openpyxl to recognise in .xlsx

Question

I am scraping information from a URL. I can successfully get the information into a .xlsx file, but it is not in the format I want.

element_rows = []
for table_row in Elements.findAll('tr'):
    columns = table_row.findAll('td')
    output_row = []
    for column in columns:
        sub_rows = column.findAll('p')
        output_row.append('\r\n'.join(row.text for row in sub_rows))
    element_rows.append(output_row)

I feel it's something simple, but I can't place it.

As it iterates through, I want every 'p' to create a new row.

I've been trying to use the Excel line-break sequence '\r\n', but I feel this just isn't right. I've also tried append(row), but that throws errors at me.

Currently it gives me something along the lines of:

 |A    |B
1|Apple|PearOrangeBanana
2|Grape|MandarinOliveTomato

I would like it to look like this:

 |A    |B
1|Apple|Pear
2|     |Orange
3|     |Banana
4|Grape|Mandarin
5|     |Olive
6|     |Tomato

OK. The full code is below.

from bs4 import BeautifulSoup
import requests
import csv
from subprocess import Popen
import webbrowser
import re
from openpyxl import *
import tkinter as tk
import openpyxl
from itertools import zip_longest


#Variables
#Name of course
CourseName = 'AURAFA008'#input("Input Course Code: ")
#Base URL
TGAURL = 'https://training.gov.au/Training/Details/'
#.csv filename
CourseCSV = CourseName + '.csv'
CourseXLSX = CourseName + '.xlsx'
#Total URL of course
CourseURL = TGAURL + CourseName
#URL get
website_url = requests.get(CourseURL).text
#Beautiful soup work
soup = BeautifulSoup(website_url,'html.parser')
table = soup.table
#Excel Frameworks
# wb = Workbook()
wb = openpyxl.Workbook()
ws = wb.active
output_row = 1

#Open URL in browser
#webbrowser.open(CourseURL, 2)
# Define the tables I want to grab
Elements = (soup.find("h2", string="Elements and Performance Criteria")).find_next('table')
Foundation = (soup.find("h2", string="Foundation Skills")).find_next('table')
#Extract the data
Element_rows = []
for table_row in Elements.findAll('tr'):
    columns = table_row.findAll('td')
    output_row = []
    for column in columns:
        sub_rows = column.findAll('p')
        for row in sub_rows:
            output_row.append(row.get_text(separator=' '))
    Element_rows.append(output_row)

Foundation_rows = []

for table_row in Foundation.findAll('tr'):
    columns = table_row.findAll('td')
    output_row = []
    for column in columns:
        sub_rows = column.findAll('p')
        for row in sub_rows:
            output_row.append(row.get_text(separator=' '))
    Foundation_rows.append(output_row)


# Write the tables to .xlsx
Tab0 = (CourseName + 'Elements')
Tab1 = (CourseName + 'Foundation')
ws1 = wb.create_sheet(Tab0)
ws2 = wb.create_sheet(Tab1)

for row in Element_rows:
    ws1.append(row)
for row in Foundation_rows:
    ws2.append(row)
wb.remove(wb['Sheet'])
wb.save(CourseXLSX)
p = Popen(CourseXLSX, shell=True)

Recommended answer

I would recommend writing to your Excel file as you go along. For each table row, create a list of lists containing the sub rows present in each cell. You can then use Python's zip_longest() function to produce one output row per sub row, with blanks wherever one list is shorter than another, for example:

from itertools import zip_longest
from bs4 import BeautifulSoup
import openpyxl


html = """
<table>
  <tr>
    <td><p>a</p><p>b</p></td>
    <td><p>1</p><p>2</p><p>3</p></td>
    <td><p>d</p></td>
  </tr>
  <tr>
    <td><p>a</p><p>b</p></td>
    <td><p>1</p><p>2</p><p>3</p></td>
    <td><p>d</p></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.table

wb = openpyxl.Workbook()
ws = wb.active
output_row = 1

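for table_row in table.find_all('tr'):
    cells = table_row.find_all('td')
    # One list of <p> texts per <td> cell
    row = [[p.text for p in cell.find_all('p')] for cell in cells]
    # Transpose into spreadsheet rows, padding shorter cells with ""
    sub_rows = list(zip_longest(*row, fillvalue=""))

    for row_number, values in enumerate(sub_rows, start=output_row):
        for col_number, value in enumerate(values, start=1):
            ws.cell(column=col_number, row=row_number, value=value)

    # Advance by the number of spreadsheet rows just written
    output_row += len(sub_rows)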

wb.save('output.xlsx')

This would give you the following output:

 |A|B|C
1|a|1|d
2|b|2|
3| |3|
4|a|1|d
5|b|2|
6| |3|

The enumerate() function gives you an incrementing number for each entry in a list, and its start argument sets the first number. It is used here to produce suitable row and column numbers for openpyxl cells.
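To see how zip_longest() and enumerate() combine without needing a workbook, here is a minimal sketch applied to the fruit data from the question. The helper name expand_sub_rows and the nested-list input shape are assumptions made for illustration; they are not part of openpyxl or of the original answer.

```python
from itertools import zip_longest


def expand_sub_rows(table_rows, fillvalue=""):
    """Yield (row_number, col_number, value) for every spreadsheet cell.

    table_rows is a list of table rows; each row is a list of cells, and
    each cell is a list of sub-row strings (the <p> texts).
    Hypothetical helper, for illustration only.
    """
    output_row = 1
    for row in table_rows:
        # Transpose the cell lists, padding the shorter ones with fillvalue
        sub_rows = list(zip_longest(*row, fillvalue=fillvalue))
        for row_number, values in enumerate(sub_rows, start=output_row):
            for col_number, value in enumerate(values, start=1):
                yield row_number, col_number, value
        # The next table row starts below the sub rows just produced
        output_row += len(sub_rows)


fruit = [
    [["Apple"], ["Pear", "Orange", "Banana"]],
    [["Grape"], ["Mandarin", "Olive", "Tomato"]],
]
cells = list(expand_sub_rows(fruit))
```

Each tuple maps directly onto a call such as ws.cell(row=row_number, column=col_number, value=value), so "Grape" lands in row 4, below the three sub rows produced by the first table row.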
