Split a list into different cells in Google Sheets


Question

I have these cells in Google Sheets:

[image]

I need to separate the price and the address, so that the columns are:

Name1 = 1run, Name2 = price, Name3 = address, Name4 = message

I have this code (from my earlier question, "Find specific <li> in <div><ul>"):

print(" ".join(c.getText(strip=True) for c in cena))

This prints the price and the address together in one cell (I use the same expression in the row I insert into Google Sheets), like: price address

How can I write only the price to column Name2, and then write (or insert) the address into Name3?

This is the code I use to run the .py script:

import gspread
import requests
import datetime 
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials
from datetime import timedelta
import time

datetime.datetime.now()

stranka =1
stranka_1 = '/'

scope = [
'https://www.googleapis.com/auth/spreadsheets',
'https://www.googleapis.com/auth/drive'
]


while stranka < 5:
    URL = 'url_address_here' + stranka_1
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'html.parser')

    #headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0'}
    #response = requests.get(URL, headers=headers)

    pocet_bytu = 0

    #Google Sheet
    data = ServiceAccountCredentials.from_json_keyfile_name("data.json", scope)
    client = gspread.authorize(data)
    sheet = client.open("rrfile").worksheet('site_name_sheet')
    data = sheet.get_all_records()

    #log
    sheet2 = client.open("rrfile").worksheet('LOG')
    data = sheet2.get_all_records()

    insertRow = ["site", "START: " + str(datetime.datetime.now().strftime('%d-%m-%Y ve %H:%M:%S'))]
    sheet2.insert_row(insertRow,2)

    #Scraping web site
    results = soup.find_all('li', attrs={'class':'list-items__item'})
    for job_data in results:
        n = job_data.find('a', attrs={'class':'js-simulate-link-target'})
        n_final = n.text.strip()

        url = job_data.find('a', attrs={"class":"js-simulate-link-target"})
        url_pred_final = url.get('href')
        url_final = "site_url" + url_pred_final

        cena = job_data.select(".list-items__content__in > ul > li")

        pocet_bytu += 1

        #přidání řádku do sheetu
        insertRow = ["site", n_final,'', " ".join(c.getText(strip=True) for c in cena), str(pocet_bytu), url_final]
        
        print(insertRow)
        sheet.insert_row(insertRow,2)

    insertRow = ["site", "KONEC: " + str(datetime.datetime.now().strftime('%d-%m-%Y ve %H:%M:%S'))]
    sheet2.insert_row(insertRow,2)
    stranka +=1
    stranka_1 = '/page-' + str(stranka) + '/'
    print(stranka_1)
    print(URL)
    time.sleep(60)

The output is:

site, n_value, '', price address, 1, url

But I need the price and the address split, each in its own cell, so I need this output:

site, n_value, '', price, address, 1, url

Is there a way to split them in insertRow when I have one value (the question linked above shows how I get the price and address values)?

Edit 2: There are 10 of these elements on the web page, and more on the following pages. Only the price and address change; the markup is otherwise the same for each of them.

<div class="list-items__content list-items__content__1">
            <div class="list-items__content__in">
                <a href="#" class="in-heart js-heart " data-tooltip="Přidat do oblíbených" onclick="toggleFavorite(8826547, this)">
                    <i class="icon icon__heart-grey"></i>
                </a>
            </div>

            <div class="list-items__content__in">
                                    <h2 class="list-items__item__title list-items__item__title__1" itemprop="name">
                        <a href="url" itemprop="url" class="js-simulate-link-target" onclick="return loadPropertyToModal(8826547);" title="some text">
                            some another text</a>
                    </h2>
<!--                -->
<!--                <p>--><!--</p>-->

                <ul>
                    <li>
                        price1                    </li>

<!--                    -->                    <li>
<!--                        --><!-- Kč/m<sup>2</sup>-->
                        Address1</li>
<!--                    -->                </ul>
            </div>
        </div>

After trying the answer from @Nikko J.:

When I try your code, it prints the prices and addresses of all the blocks on the website in every row. I mean:

One block element has price1 ... address1. The second block element has price2 ... address2, and so on.

So the output looks like:

['site', 1, '', price1, address1, price2, address2,..., 456654]
['site', 2, '', price1, address1, price2, address2,..., 456654]
['site', 3, '', price1, address1, price2, address2,..., 456654]
['site', 4, '', price1, address1, price2, address2,..., 456654]

I only need the price and address of the current block in each row, not the values of all blocks, like this:

['site', 1, '', price1, address1, 456654]
['site', 2, '', price2, address2, 456654]
['site', 3, '', price3, address3, 456654]
['site', 4, '', price4, address4, 456654]

Recommended answer

Replace " ".join(c.getText(strip=True) for c in cena) with [c.getText(strip=True) for c in cena[:2]] and then flatten the list.

Example:

cena = BeautifulSoup(page.content, "html.parser").select(".list-items__content__in > ul > li")
insertRow = ["site", 1234,'',[c.getText(strip=True) for c in cena[:2]] , 456654, 32452]

def flatten_list(_2d_list):
    flat_list = []
    for element in _2d_list:
        if type(element) is list:
            for item in element:
                flat_list.append(item)
        else:
            flat_list.append(element)
    return flat_list

print('insertRow value:', insertRow)
print('Transformed Flat List:', flatten_list(insertRow))

Output:

insertRow value: ['site', 1234, '', ['2 890 000Kč', 'Address'], 456654, 32452]
Transformed Flat List: ['site', 1234, '', '2 890 000Kč', 'Address', 456654, 32452]
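
As an alternative to flattening afterwards, the row can be built with the price and address already in separate positions. The sketch below is only an illustration: build_row is a hypothetical helper, not part of gspread or BeautifulSoup, and it assumes cell_texts holds the <li> texts of a single listing block (for example, taken from job_data.select(...) inside the question's loop).

```python
def build_row(site, name, cell_texts, count, url):
    """Build one sheet row with price and address in separate cells.

    cell_texts is assumed to hold the <li> texts of ONE listing block,
    e.g. [c.getText(strip=True) for c in cena] from the question.
    """
    # Keep the first two texts and pad with '' so the row always has the
    # same number of columns, even if a listing has no address.
    price, address = (list(cell_texts[:2]) + ["", ""])[:2]
    return [site, name, "", price, address, str(count), url]


# Hypothetical sample values standing in for one scraped block.
row = build_row("site", "some name", ["2 890 000Kč", "Address"], 1, "site_url/x")
print(row)
```

The resulting list can be passed to sheet.insert_row(row, 2) in the same way as insertRow in the question. Padding with empty strings is a design choice that keeps the later columns aligned when a block is incomplete.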

From what I understand, [c.getText(strip=True) for c in cena] can contain several price and address pairs.

You can use a for loop that steps through the list two items at a time and slices out each pair.

Example:

from bs4 import BeautifulSoup
text = """
<div class="list-items__content list-items__content__1">
   <div class="list-items__content__in">
      <a href="#" class="in-heart js-heart " data-tooltip="Přidat do oblíbených" onclick="toggleFavorite(8826547, this)">
      <i class="icon icon__heart-grey"></i>
      </a>
   </div>
   <div class="list-items__content__in">
      <h2 class="list-items__item__title list-items__item__title__1" itemprop="name">
         <a href="url" itemprop="url" class="js-simulate-link-target" onclick="return loadPropertyToModal(8826547);" title="some text">
         some another text</a>
      </h2>
      <ul>
         <li>
            price1
         </li>
         <li>
            Address1
         </li>
      </ul>
   </div>
</div>
<div class="list-items__content list-items__content__1">
   <div class="list-items__content__in">
      <a href="#" class="in-heart js-heart " data-tooltip="Přidat do oblíbených" onclick="toggleFavorite(8826547, this)">
      <i class="icon icon__heart-grey"></i>
      </a>
   </div>
   <div class="list-items__content__in">
      <h2 class="list-items__item__title list-items__item__title__1" itemprop="name">
         <a href="url" itemprop="url" class="js-simulate-link-target" onclick="return loadPropertyToModal(8826547);" title="some text">
         some another text</a>
      </h2>
      <ul>
         <li>
            price2
         </li>
         <li>
            Address2
         </li>
      </ul>
   </div>
</div>
"""

cena = BeautifulSoup(text, "html.parser").select(".list-items__content__in > ul > li")
test = [c.getText(strip=True) for c in cena]
site = 1
for i in range(0,len(test),2):
   data = test[i:i+2]
   insertRow = ["site", site,'', data[0], data[1] , 456654]
   site = site+1
   print('insertRow value:', insertRow)
   

Output:

insertRow value: ['site', 1, '', 'price1', 'Address1', 456654]
insertRow value: ['site', 2, '', 'price2', 'Address2', 456654]
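
The same pairing can also be written with slicing and zip. This is a minimal sketch under the assumption that the flat list strictly alternates price, address, price, address; pair_up is a hypothetical helper name, and the sample texts stand in for the scraped values.

```python
def pair_up(texts):
    """Group a flat [price1, address1, price2, address2, ...] list into pairs."""
    # texts[0::2] are the prices and texts[1::2] the addresses. zip stops at
    # the shorter sequence, so a trailing unpaired value is dropped instead
    # of raising an IndexError.
    return list(zip(texts[0::2], texts[1::2]))


# Hypothetical sample values standing in for the scraped <li> texts.
texts = ["price1", "Address1", "price2", "Address2"]
rows = [
    ["site", n, "", price, address, 456654]
    for n, (price, address) in enumerate(pair_up(texts), start=1)
]
for row in rows:
    print("insertRow value:", row)
```

Both this version and the loop above depend on every block contributing exactly two <li> items. If that may not hold, selecting the items per block (as the question's job_data.select(...) does) and taking cena[:2] is likely the safer route.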
