Webscraping data from a JSON source: why do I get only 1 row?


Question


I'm trying to get some information from a webshop's website with Python.

I tried this:

import requests
import pandas as pd
from datetime import datetime

def proba():
    my_url = requests.get('https://www.telekom.hu/shop/categoryresults/?N=10994&contractType=list_price&instock_products=1&Ns=sku.sortingPrice%7C0%7C%7Cproduct.displayName%7C0&No=0&Nrpp=9&paymentType=FULL')
    data = my_url.json()
    results = []
    products = data['MainContent'][0]['contents'][0]['productList']['products']
    for product in products:
        name = product['productModel']['displayName']
        try:
            # Use the sale price if there is one, otherwise the base price
            priceGross = product['priceInfo']['priceItemSale']['gross']
        except KeyError:
            priceGross = product['priceInfo']['priceItemToBase']['gross']
        url = product['productModel']['url']
        results.append([name, priceGross, url])
    df = pd.DataFrame(results, columns=['Name', 'Price', 'Url'])
    # print(df)  ## print df
    df.to_csv(r'/usr/src/Python-2.7.13/test.csv', sep=',', encoding='utf-8-sig', index=False)

while True:
    mytime = datetime.now().strftime("%H:%M:%S")
    while mytime < "23:59:59":
        print(mytime)
        proba()
        mytime = datetime.now().strftime("%H:%M:%S")


This webshop lists 9 items, but I see only 1 row in the CSV file.

Recommended answer


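I'm not entirely sure what you intend as the end result. Do you want to update an existing file, or collect the data and write it all out in one go? Note that your proba() calls to_csv on every pass through the loop, and to_csv overwrites the file by default, so the file is rewritten from scratch each time instead of accumulating results. Below is an example of the second option: the function uses a return statement to hand back each new dataframe, each one is added to an overall dataframe, and the combined result is written out once at the end.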

import time
from datetime import datetime

import pandas as pd
import requests

def proba():
    my_url = requests.get('https://www.telekom.hu/shop/categoryresults/?N=10994&contractType=list_price&instock_products=1&Ns=sku.sortingPrice%7C0%7C%7Cproduct.displayName%7C0&No=0&Nrpp=9&paymentType=FULL')
    data = my_url.json()
    results = []
    products = data['MainContent'][0]['contents'][0]['productList']['products']
    for product in products:
        name = product['productModel']['displayName']
        try:
            # Use the sale price if there is one, otherwise the base price
            priceGross = product['priceInfo']['priceItemSale']['gross']
        except KeyError:
            priceGross = product['priceInfo']['priceItemToBase']['gross']
        url = product['productModel']['url']
        results.append([name, priceGross, url])
    # Return the frame instead of writing it, so the caller can accumulate results
    return pd.DataFrame(results, columns=['Name', 'Price', 'Url'])

headers = ['Name', 'Price', 'Url']
frames = []

# A single loop (no outer "while True"): it stops just before midnight,
# so the to_csv call at the bottom is actually reached.
mytime = datetime.now().strftime("%H:%M:%S")
while mytime < "23:59:59":
    print(mytime)
    frames.append(proba())
    time.sleep(60)  # assumption: one request per minute is often enough
    mytime = datetime.now().strftime("%H:%M:%S")

# Concatenate once at the end rather than growing a DataFrame inside the loop
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame(columns=headers)
df.to_csv(r"C:\Users\User\Desktop\test.csv", encoding='utf-8', index=False)
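If the goal is instead to update an existing file on every pass (the first option mentioned in the answer), one approach is to append each snapshot with to_csv(mode='a') and write the header only when the file does not exist yet. The sketch below is a minimal offline illustration under stated assumptions: the sample payload is hypothetical and only mimics the nested keys the scraping code reads, and parse_products / append_snapshot are illustrative helper names, not part of the original answer.

```python
import os
import tempfile

import pandas as pd

COLUMNS = ['Name', 'Price', 'Url']


def parse_products(data):
    """Turn one decoded JSON response (a dict) into a DataFrame, one row per product."""
    products = data['MainContent'][0]['contents'][0]['productList']['products']
    rows = []
    for product in products:
        price_info = product['priceInfo']
        # Use the sale price if there is one, otherwise the base price
        item = price_info.get('priceItemSale') or price_info['priceItemToBase']
        rows.append([product['productModel']['displayName'],
                     item['gross'],
                     product['productModel']['url']])
    return pd.DataFrame(rows, columns=COLUMNS)


def append_snapshot(df, path):
    """Append rows to the CSV; write the header only when the file is new."""
    write_header = not os.path.exists(path)
    df.to_csv(path, mode='a', header=write_header, index=False, encoding='utf-8')


# Hypothetical payload: mimics the nested keys used above, not real shop data
sample = {'MainContent': [{'contents': [{'productList': {'products': [
    {'productModel': {'displayName': 'Phone A', 'url': '/a'},
     'priceInfo': {'priceItemSale': {'gross': 100}}},
    {'productModel': {'displayName': 'Phone B', 'url': '/b'},
     'priceInfo': {'priceItemToBase': {'gross': 200}}},
]}}]}]}

path = os.path.join(tempfile.mkdtemp(), 'test.csv')
for _ in range(3):  # stands in for three iterations of the polling loop
    append_snapshot(parse_products(sample), path)

result = pd.read_csv(path)
print(len(result))  # 6: two products x three appends
```

In the real script you would call append_snapshot(proba(), path) inside the polling loop instead of collecting frames in memory, so the CSV grows by one snapshot per iteration and survives if the script is interrupted.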
