Feedparser to dataframe doesn't output all columns


Problem description

I am parsing a URL with feedparser and trying to get all of the columns, but I do not get all of them in the output, and I am not sure where the issue is. If you run the code below, a few columns come back empty, even though the data does exist in the feed (you can check it in a browser).

My code

import feedparser
import pandas as pd 

xmldoc = feedparser.parse('http://www.ebay.com/rps/feed/v1.1/epnexcluded/EBAY-US')
df_cols = [
    "title", "url", "endsAt", "image225","currency"
    "price", "orginalPrice", "discountPercentage", "quantity", "shippingCost","dealUrl"
]
rows = []

for entry in xmldoc.entries:
    s_title = entry.get("title","")
    s_url = entry.get("url", "")
    s_endsAt = entry.get("endsAt", "")
    s_image225 = entry.get("image225", "")
    s_currency = entry.get("currency", "")
    s_price = entry.get("price","")
    s_orginalPrice = entry.get("orginalPrice","")
    s_discountPercentage = entry.get ("discountPercentage","")
    s_quantity = entry.get("quantity","")
    s_shippingCost = entry.get("shippingCost", "")
    s_dealUrl = entry.get("dealUrl", "")#.replace('YOURUSERIDHERE','2427312')
       
        
    rows.append({"title":s_title, "url": s_url, "endsAt": s_endsAt, 
                 "image225": s_image225,"currency": s_currency,"price":s_price,
                 "orginalPrice": s_orginalPrice,"discountPercentage": s_discountPercentage,"quantity": s_quantity,
                 "shippingCost": s_shippingCost,"dealUrl": s_dealUrl})

out_df = pd.DataFrame(rows, columns=df_cols)

out_df
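
Two things are worth checking here. First, in df_cols there is no comma between "currency" and "price", so Python concatenates the two string literals into a single column name "currencyprice"; the DataFrame therefore ends up with an empty "currencyprice" column and no "currency" or "price" columns at all. Second, feedparser normalizes element names, so the keys it exposes on each entry may not match the raw tag names you see in the browser. A minimal sketch to confirm which fields are actually available (assuming the feed URL still responds):

import feedparser

xmldoc = feedparser.parse('http://www.ebay.com/rps/feed/v1.1/epnexcluded/EBAY-US')
if xmldoc.entries:
    # Show the field names feedparser actually exposes for one entry;
    # any name not listed here will come back as "" from entry.get(...)
    print(sorted(xmldoc.entries[0].keys()))

Any column whose name does not appear in that list needs to be mapped to the key feedparser actually uses.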

I tried this, but it doesn't give me any data, only a few columns (the headers, I suppose):

import lxml.etree as ET
import urllib.request
import pandas as pd

response = urllib.request.urlopen('http://www.ebay.com/rps/feed/v1.1/epnexcluded/EBAY-US')
xml = response.read()

root = ET.fromstring(xml)
for item in root.findall('.*/item'):
    df = pd.DataFrame([{item.tag: item.text if item.text.strip() != "" else item.find("*").text
                        for item in lnk.findall("*") if item is not None}
                       for lnk in root.findall('.//item')])

df
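
If you want to keep the raw-XML route, a minimal sketch along these lines (using the same feed URL as above) collects every child element of each item and tolerates empty or missing values, instead of rebuilding and overwriting df on every pass of the outer loop:

import lxml.etree as ET
import urllib.request
import pandas as pd

response = urllib.request.urlopen('http://www.ebay.com/rps/feed/v1.1/epnexcluded/EBAY-US')
root = ET.fromstring(response.read())

rows = []
for item in root.findall('.//item'):
    # One dict per <item>; child.text can be None, so fall back to ""
    rows.append({child.tag: (child.text or "").strip() for child in item})

df = pd.DataFrame(rows)
df.head()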

Is it possible to iterate over the URL offsets in a list as below and write the result to a DataFrame? When I try this it only partially works: a few elements are missing from some entries, which results in AttributeError: object has no attribute 'price' (or 'shippingcost', and so on). How do I handle an element that is null or absent?

My code

import feedparser
import pandas as pd
#from simplified_scrapy import SimplifiedDoc, utils, req

getdeals = ['http://www.ebay.com/rps/feed/v1.1/epnexcluded/EBAY-US?limit=200',
            'http://www.ebay.com/rps/feed/v1.1/epnexcluded/EBAY-US?limit=200&offset=200',
            'http://www.ebay.com/rps/feed/v1.1/epnexcluded/EBAY-US?limit=200&offset=400']

posts = []
for urls in getdeals:
    feed = feedparser.parse(urls)
    for deals in feed.entries:
        print(deals)
        posts.append((deals.title, deals.endsat, deals.image225, deals.price,
                      deals.originalprice, deals.discountpercentage,
                      deals.shippingcost, deals.dealurl))

df = pd.DataFrame(posts, columns=['title', 'endsat', 'image2255', 'price', 'originalprice',
                                  'discountpercentage', 'shippingcost', 'dealurl'])
df.tail()
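
The AttributeError comes from attribute-style access (deals.price and so on), which raises as soon as an entry lacks that field. feedparser entries also behave like dictionaries, so one hedged way to tolerate missing fields is .get() with a default; the field names below are the ones assumed by the loop above and should be checked against entry.keys():

fields = ['title', 'endsat', 'image225', 'price', 'originalprice',
          'discountpercentage', 'shippingcost', 'dealurl']

posts = []
for urls in getdeals:
    feed = feedparser.parse(urls)
    for deals in feed.entries:
        # .get() returns "" instead of raising when a field is absent
        posts.append([deals.get(f, "") for f in fields])

df = pd.DataFrame(posts, columns=fields)
df.tail()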

Also, similarly, how do I loop over multiple JSON responses?

 url= ["https://merchants.apis.com/v4/publisher/159663/offers?country=US&limit=2000",
"https://merchants.apis.com/v4/publisher/159663/offers?country=US&offset=2001&limit=2000"]
    
    
    response = requests.request("GET", url, headers=headers, params=querystring)
    response = response.json()
    
    
    name = []
    logo = []
    date_added = []
    description = []
    for i in range(len(response['offers'])):
        name.append(response['offers'][i]['merchant_details']['name'])
        logo.append(response['offers'][i]['merchant_details']['metadata']['logo'])
        date_added.append(response['offers'][i]['date_added'])
        description.append(response['offers'][i]['description'])
         try:
            verticals.append(response['offers'][i]['merchant_details']['verticals'][0])
        except IndexError:
            verticals.append('NA')
        pass
        
    data1 = pd.DataFrame({'name':name,'logo':logo,'verticals':verticals, 'date_added':date_added,'description':description})
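
To loop over multiple JSON responses, one minimal sketch is to iterate over the URL list, collect one row per offer, and build a single DataFrame at the end. requests, headers, and querystring are assumed to be defined as in the snippet above, and the response structure is assumed to match it:

all_rows = []
for u in url:
    resp = requests.get(u, headers=headers, params=querystring).json()
    for offer in resp.get('offers', []):
        merchant = offer.get('merchant_details', {})
        verticals = merchant.get('verticals', [])
        all_rows.append({
            'name': merchant.get('name'),
            'logo': merchant.get('metadata', {}).get('logo'),
            # Fall back to 'NA' when the verticals list is empty
            'verticals': verticals[0] if verticals else 'NA',
            'date_added': offer.get('date_added'),
            'description': offer.get('description'),
        })

data1 = pd.DataFrame(all_rows)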


Answer

Here is another way.

import pandas as pd
from simplified_scrapy import SimplifiedDoc, utils, req

getdeals = ['http://www.ebay.com/rps/feed/v1.1/epnexcluded/EBAY-US?limit=200',
            'http://www.ebay.com/rps/feed/v1.1/epnexcluded/EBAY-US?limit=200&offset=200',
            'http://www.ebay.com/rps/feed/v1.1/epnexcluded/EBAY-US?limit=200&offset=400']
    
posts=[]
header = ['title','endsAt','image255','price','originalPrice','discountPercentage','shippingCost','dealUrl']
for url in getdeals:
    try: # It's a good habit to have try and exception in your code.
        feed = SimplifiedDoc(req.get(url))
        for deals in feed.selects('item'):
            row = []
            for h in header: row.append(deals.select(h+">text()")) # Returns None when the element does not exist
            posts.append(row)
    except Exception as e:
        print (e)
        
df=pd.DataFrame(posts,columns=header)
df.tail()
