Python: How to access the elements in a generator object and put them in a Pandas dataframe or in a dictionary?

Question

I am using the scholarly module in Python to search for a keyword. I get back a generator object, as follows:

import pandas as pd
import numpy as np
import scholarly

search_query = scholarly.search_keyword('Python')
print(next(search_query))

{'_filled': False,
 'affiliation': 'Juelich Center for Neutron Science',
 'citedby': 75900,
 'email': '@fz-juelich.de',
 'id': 'zWxqzzAAAAAJ',
 'interests': ['Physics', 'C++', 'Python'],
 'name': 'Gennady Pospelov',
 'url_picture': 'https://scholar.google.com/citations?view_op=medium_photo&user=zWxqzzAAAAAJ'}

I want to access the element 'citedby', but when I try next(search_query)['citedby'] it raises TypeError: 'Author' object is not subscriptable.

My questions are: how can I access elements in the generator object, and how can I convert that object to a Pandas dataframe?

Recommended answer

This is not a generator problem. The objects the generator produces are not dictionaries.

Granted, the scholarly library does not help matters: it gives Author instances a dictionary-like string conversion, yet does not actually document what API that class supports.

Each of the 'keys' in the Author representation is actually an attribute on the object:

author = next(search_query)
print(author.citedby)
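
The difference between item access and attribute access can be reproduced without a network call. The sketch below uses a hypothetical stand-in class, FakeAuthor, and a hypothetical generator, fake_search; neither is part of scholarly. They only mimic an object that stores its data as attributes but prints like a dictionary.

```python
import pprint


class FakeAuthor:
    """Hypothetical stand-in for scholarly's Author; not the real class."""

    def __init__(self, name, citedby, interests):
        self.name = name
        self.citedby = citedby
        self.interests = interests

    def __str__(self):
        # Printing shows a dictionary-like view of the attributes, which is
        # what makes the object look subscriptable when it is not.
        return pprint.pformat(vars(self))


def fake_search():
    """Hypothetical generator that yields stand-in author objects."""
    yield FakeAuthor("Gennady Pospelov", 75900, ["Physics", "C++", "Python"])


author = next(fake_search())

try:
    author["citedby"]              # item access is not defined on the class
except TypeError as exc:
    error_message = str(exc)       # same kind of error as in the question

cited = author.citedby                      # attribute access works
cited_dynamic = getattr(author, "citedby")  # same, with the name as a string
```

getattr is handy when the attribute name is only known at run time, for example when looping over a list of field names.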

The data doesn't necessarily map to a dataframe directly, though. How would the interests list be represented in the dataframe tabular data structure, for example? And you wouldn't want to include the _filled internal attribute either (that's a flag to record if author.fill() has been called yet).

That said, you could just create a dataframe from dictionaries by mapping the vars function over the generator; vars returns each object's attributes as a dictionary:

search_query = scholarly.search_keyword('Python')
df = pd.DataFrame(map(vars, search_query))

and then drop the _filled column if necessary, and convert the interests column into something a bit more structured, such as separate columns with 0 / 1 values or similar.
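
One possible way to do that cleanup is sketched below. It assumes a hypothetical stand-in class, FakeAuthor, and generator, fake_search, in place of the real scholarly results, and it picks one 0/1 column per interest as the target layout; other layouts may suit your data better.

```python
import pandas as pd


class FakeAuthor:
    """Hypothetical stand-in for scholarly's Author; not the real class."""

    def __init__(self, name, citedby, interests):
        self._filled = False
        self.name = name
        self.citedby = citedby
        self.interests = interests


def fake_search():
    """Hypothetical generator that yields stand-in author objects."""
    yield FakeAuthor("A. Author", 120, ["Physics", "Python"])
    yield FakeAuthor("B. Author", 45, ["Python", "Statistics"])


# vars() returns each object's attribute dictionary; list() consumes the generator.
df = pd.DataFrame(list(map(vars, fake_search())))

# Drop the internal flag; errors="ignore" keeps this safe if the column is absent.
df = df.drop(columns=["_filled"], errors="ignore")

# explode() gives one row per (author, interest) pair, get_dummies() turns the
# interests into 0/1 indicator columns, and the groupby collapses the rows
# back to one per author.
indicators = (
    pd.get_dummies(df["interests"].explode(), dtype=int)
    .groupby(level=0)
    .max()
)

df = df.drop(columns=["interests"]).join(indicators)
```

If a plain dictionary is preferred over a dataframe, list(map(vars, search_query)) alone already gives a list of dictionaries, one per author.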

Note that this is going to be slow, because the scholarly library pages through the Google search results sequentially, and the library deliberately delays requests with a random sleep interval of 5-10 seconds each time to avoid Google blocking the requests. So you'll have to be patient as the Python keyword search easily produces nearly 30 pages of results.
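
If only the first few results are needed, the generator does not have to be exhausted. The sketch below uses itertools.islice from the standard library on a hypothetical stand-in generator, slow_results, which is not part of scholarly; the fetched list exists only to show that later items are never requested.

```python
from itertools import islice

fetched = []  # records which items the generator was actually asked to produce


def slow_results():
    """Hypothetical stand-in for a slow, paginated search generator."""
    for i in range(1000):
        fetched.append(i)  # in a real search, the delay would happen here
        yield {"id": i}


# islice stops pulling from the generator after the first 5 items,
# so the remaining 995 are never produced.
first_five = list(islice(slow_results(), 5))
```

The same idea should carry over to the real search, for example pd.DataFrame(map(vars, islice(search_query, 20))), although how many requests that saves depends on how the library pages its results.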
