Split a list into different cells in Google Sheets
Question
I have cells in Google Sheets:
I need to separate the price and the address to get:
Name1 = 1run, Name2 = price, Name3 = address, Name4 = message
I have this code (from the question "Find specific <li> in <div><ul>"):
print(" ".join(c.getText(strip=True) for c in cena))
It prints the price and the address together in one cell (I use it when inserting into Google Sheets), like: price address
How can I print only the price to column Name2, and then print (or insert) the address into Name3?
This is the code I use for running the .py script:
import gspread
import requests
import datetime
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials
from datetime import timedelta
import time
datetime.datetime.now()
stranka = 1
stranka_1 = '/'
scope = [
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/drive'
]
while stranka < 5:
    URL = 'url_address_here' + stranka_1
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'html.parser')
    #headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0'}
    #response = requests.get(URL, headers=headers)
    pocet_bytu = 0
    # Google Sheet
    data = ServiceAccountCredentials.from_json_keyfile_name("data.json", scope)
    client = gspread.authorize(data)
    sheet = client.open("rrfile").worksheet('site_name_sheet')
    data = sheet.get_all_records()
    # log
    sheet2 = client.open("rrfile").worksheet('LOG')
    data = sheet2.get_all_records()
    insertRow = ["site", "START: " + str(datetime.datetime.now().strftime('%d-%m-%Y ve %H:%M:%S'))]
    sheet2.insert_row(insertRow, 2)
    # Scraping web site
    results = soup.find_all('li', attrs={'class':'list-items__item'})
    for job_data in results:
        n = job_data.find('a', attrs={'class':'js-simulate-link-target'})
        n_final = n.text.strip()
        url = job_data.find('a', attrs={"class":"js-simulate-link-target"})
        url_pred_final = url.get('href')
        url_final = "site_url" + url_pred_final
        cena = job_data.select(".list-items__content__in > ul > li")
        pocet_bytu += 1
        # add the row to the sheet
        insertRow = ["site", n_final, '', " ".join(c.getText(strip=True) for c in cena), str(pocet_bytu), url_final]
        print(insertRow)
        sheet.insert_row(insertRow, 2)
    insertRow = ["site", "KONEC: " + str(datetime.datetime.now().strftime('%d-%m-%Y ve %H:%M:%S'))]
    sheet2.insert_row(insertRow, 2)
    stranka += 1
    stranka_1 = '/page-' + str(stranka) + '/'
    print(stranka_1)
    print(URL)
    time.sleep(60)
The output is:
site, n_value, '', price address, 1, url
But I need to split the price and the address so that each goes into its own cell, so I need this output:
site, n_value, '', price, address, 1, url
Is there a way to split it in insertRow when I have only one value (see the question linked above for how the price and address values are obtained)?
Edit 2: There are 10 of these elements on the web page, and more on the following pages. Only the price and the address change; the rest of the markup is the same for each element.
<div class="list-items__content list-items__content__1">
<div class="list-items__content__in">
<a href="#" class="in-heart js-heart " data-tooltip="Přidat do oblíbených" onclick="toggleFavorite(8826547, this)">
<i class="icon icon__heart-grey"></i>
</a>
</div>
<div class="list-items__content__in">
<h2 class="list-items__item__title list-items__item__title__1" itemprop="name">
<a href="url" itemprop="url" class="js-simulate-link-target" onclick="return loadPropertyToModal(8826547);" title="some text">
some another text</a>
</h2>
<!-- -->
<!-- <p>--><!--</p>-->
<ul>
<li>
price1 </li>
<!-- --> <li>
<!-- --><!-- Kč/m<sup>2</sup>-->
Address1</li>
<!-- --> </ul>
</div>
</div>
Trying the answer from @Nikko J.:
When I try your code, it prints the prices and addresses of all listings on the website. I mean:
One block element has price1 ... address1. The second block element has price2 ... address2, and so on.
So the output looks like:
['site', 1, '', price1, address1, price2, address2,..., 456654]
['site', 2, '', price1, address1, price2, address2,..., 456654]
['site', 3, '', price1, address1, price2, address2,..., 456654]
['site', 4, '', price1, address1, price2, address2,..., 456654]
I only need to print the price and address for the current block, not all values for all blocks, like this:
['site', 1, '', price1, address1, 456654]
['site', 2, '', price2, address2, 456654]
['site', 3, '', price3, address3, 456654]
['site', 4, '', price4, address4, 456654]
Recommended answer
Replace " ".join(c.getText(strip=True) for c in cena)
with [c.getText(strip=True) for c in cena[:2]]
and then flatten the list.
Example:
from bs4 import BeautifulSoup
# page is the requests response from the code in the question
cena = BeautifulSoup(page.content, "html.parser").select(".list-items__content__in > ul > li")
insertRow = ["site", 1234, '', [c.getText(strip=True) for c in cena[:2]], 456654, 32452]
def flatten_list(_2d_list):
    # Copy plain values as they are and unpack one level of nested lists
    flat_list = []
    for element in _2d_list:
        if type(element) is list:
            for item in element:
                flat_list.append(item)
        else:
            flat_list.append(element)
    return flat_list
print('insertRow value:', insertRow)
print('Transformed Flat List:', flatten_list(insertRow))
Output:
insertRow value: ['site', 1234, '', ['2 890 000Kč', 'Address'], 456654, 32452]
Transformed Flat List: ['site', 1234, '', '2 890 000Kč', 'Address', 456654, 32452]
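As a possible shortcut, the same flattening can be sketched with iterable unpacking, which avoids the helper function. The values are placeholders copied from the example output above, and `texts` is a hypothetical name standing in for the list built from `cena[:2]`.

```python
# Placeholder for [c.getText(strip=True) for c in cena[:2]]
texts = ["2 890 000Kč", "Address"]

# The * operator unpacks the list in place, so no nested list is created
insert_row = ["site", 1234, "", *texts, 456654, 32452]
print(insert_row)
```

Each element of `texts` becomes its own item in the row, so each one lands in its own cell when the row is inserted.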
From what I understand, [c.getText(strip=True) for c in cena]
can contain multiple sets of price and address.
You can use a for loop that steps by 2 and use the loop index to slice out each set.
Example:
from bs4 import BeautifulSoup
import itertools
import datetime
text = """
<div class="list-items__content list-items__content__1">
<div class="list-items__content__in">
<a href="#" class="in-heart js-heart " data-tooltip="Přidat do oblíbených" onclick="toggleFavorite(8826547, this)">
<i class="icon icon__heart-grey"></i>
</a>
</div>
<div class="list-items__content__in">
<h2 class="list-items__item__title list-items__item__title__1" itemprop="name">
<a href="url" itemprop="url" class="js-simulate-link-target" onclick="return loadPropertyToModal(8826547);" title="some text">
some another text</a>
</h2>
<ul>
<li>
price1
</li>
<li>
Address1
</li>
</ul>
</div>
</div>
<div class="list-items__content list-items__content__1">
<div class="list-items__content__in">
<a href="#" class="in-heart js-heart " data-tooltip="Přidat do oblíbených" onclick="toggleFavorite(8826547, this)">
<i class="icon icon__heart-grey"></i>
</a>
</div>
<div class="list-items__content__in">
<h2 class="list-items__item__title list-items__item__title__1" itemprop="name">
<a href="url" itemprop="url" class="js-simulate-link-target" onclick="return loadPropertyToModal(8826547);" title="some text">
some another text</a>
</h2>
<ul>
<li>
price2
</li>
<li>
Address2
</li>
</ul>
</div>
</div>
"""
cena = BeautifulSoup(text, "html.parser").select(".list-items__content__in > ul > li")
test = [c.getText(strip=True) for c in cena]
site = 1
for i in range(0, len(test), 2):
    data = test[i:i+2]
    insertRow = ["site", site, '', data[0], data[1], 456654]
    site = site + 1
    print('insertRow value:', insertRow)
Output:
insertRow value: ['site', 1, '', 'price1', 'Address1', 456654]
insertRow value: ['site', 2, '', 'price2', 'Address2', 456654]
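Stepping by 2 relies on every listing having exactly two `<li>` items. A minimal alternative sketch, assuming each listing sits inside an `li` with class `list-items__item` as in the question's scraping loop, is to run the selector on each block separately and pad missing values. The markup below is a hypothetical, trimmed version of the question's HTML, and `block_rows` is an illustrative helper name, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two listing blocks, the second one without an address
html = """
<ul>
<li class="list-items__item">
  <div class="list-items__content__in"><ul><li> price1 </li><li> Address1 </li></ul></div>
</li>
<li class="list-items__item">
  <div class="list-items__content__in"><ul><li> price2 </li></ul></div>
</li>
</ul>
"""

def block_rows(markup):
    soup = BeautifulSoup(markup, "html.parser")
    blocks = soup.find_all("li", attrs={"class": "list-items__item"})
    rows = []
    for number, block in enumerate(blocks, start=1):
        # Search only inside this block, so other listings are not picked up
        items = block.select(".list-items__content__in > ul > li")
        texts = [li.get_text(strip=True) for li in items[:2]]
        # Pad with empty strings so price and address always fill two cells
        texts += [""] * (2 - len(texts))
        rows.append(["site", number, "", *texts])
    return rows

rows = block_rows(html)
for row in rows:
    print(row)
```

Because the selector is scoped to one block at a time, a listing with a missing value only leaves its own cell empty instead of shifting the price and address of every later row.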