Pandas: append dataframe to another df

Question

I have a problem appending to a DataFrame. I am trying to run this code:

df_all = pd.read_csv('data.csv', error_bad_lines=False, chunksize=1000000)
urls = pd.read_excel('url_june.xlsx')
substr = urls.url.values.tolist()
df_res = pd.DataFrame()
for df in df_all:
    for i in substr:
        res = df[df['url'].str.contains(i)]
        df_res.append(res)

When I try to save df_res, I get an empty DataFrame. df_all looks like:

ID,"url","used_at","active_seconds"
b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:25,1
b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:31,30
f85ce4b2f8787d48edc8612b2ccaca83,"4pda.ru/forum/index.php?showtopic=634566&view=getnewpost",2015-10-01 00:01:49,2
d3b0ef7d85dbb4dbb75e8a5950bad225,"shop.mts.ru/smartfony/mts/smartfon-smart-sprint-4g-sim-lock-white.html?utm_source=admitad&utm_medium=cpa&utm_content=300&utm_campaign=gde_cpa&uid=3",2015-10-01 00:03:19,34
078d388438ebf1d4142808f58fb66c87,"market.yandex.ru/product/12675734/spec?hid=91491&track=char",2015-10-01 00:03:48,2
d3b0ef7d85dbb4dbb75e8a5950bad225,"avito.ru/yoshkar-ola/telefony/mts",2015-10-01 00:04:21,4
d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:25,1
d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:26,9

urls looks like:

url
shoppingcart.aliexpress.com/order/confirm_order
ozon.ru/?context=order_done&number=
lk.wildberries.ru/basket/orderconfirmed
lamoda.ru/checkout/onepage/success/quick
mvideo.ru/confirmation?_requestid=
eldorado.ru/personal/order.php?step=confirm

When I print res inside the loop, it is not empty. But when I print df_res inside the loop after the append, it is an empty DataFrame. I can't find my error. How can I fix it?

Recommended answer

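If you look at the documentation for pd.DataFrame.append, it says:

"Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns."

The key phrase is "returning a new object" (emphasis mine): append does not modify df_res in place, so the result of df_res.append(res) is discarded unless you assign it.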

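Try assigning the result back (this works on pandas versions before 2.0, where DataFrame.append still exists):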

df_res = df_res.append(res)
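
DataFrame.append was removed in pandas 2.0, so on current versions the same step is written with pd.concat. The sketch below is a minimal illustration, not the original poster's code: the two one-row frames are made-up data. It shows that pd.concat, like the old append, returns a new object and leaves its inputs untouched unless the result is assigned.

```python
import pandas as pd

# Made-up one-row frames standing in for df_res and res from the question.
df_res = pd.DataFrame({"url": ["a.ru/x"]})
res = pd.DataFrame({"url": ["b.ru/y"]})

# Calling concat without using its return value changes nothing,
# which is the same mistake as the bare df_res.append(res) line.
pd.concat([df_res, res])
rows_without_assignment = len(df_res)

# Assigning the returned object back is what actually grows df_res.
df_res = pd.concat([df_res, res], ignore_index=True)
rows_with_assignment = len(df_res)

print(rows_without_assignment, rows_with_assignment)
```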


Incidentally, note that building a DataFrame by successive concatenation is inefficient in pandas, because every append copies all the rows accumulated so far. You might try this instead:

all_res = []
for df in df_all:
    for i in substr:
        res = df[df['url'].str.contains(i)]
        all_res.append(res)

df_res = pd.concat(all_res)

This first builds a list of all the parts, then creates a single DataFrame from them once at the end.
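
As a self-contained sketch of the collect-then-concatenate pattern, the example below replaces the CSV and Excel inputs with small in-memory stand-ins (the chunks list and the shortened substr list are assumptions for illustration; a few of the URLs are taken from the question). It also passes regex=False to str.contains, on the assumption that the URL fragments are meant to match literally. By default str.contains treats the pattern as a regular expression, so a fragment such as "ozon.ru/?context=order_done&number=" would not match its own literal text, because "/?" means "an optional slash".

```python
import pandas as pd

# Hypothetical stand-in for the chunks yielded by pd.read_csv(..., chunksize=...).
chunks = [
    pd.DataFrame({"url": [
        "shoppingcart.aliexpress.com/order/confirm_order",
        "avito.ru/yoshkar-ola/telefony/mts",
    ]}),
    pd.DataFrame({"url": [
        "ozon.ru/?context=order_done&number=123",
        "4pda.ru/forum/index.php?showtopic=634566",
    ]}),
]

# Stand-in for urls.url.values.tolist().
substr = [
    "shoppingcart.aliexpress.com/order/confirm_order",
    "ozon.ru/?context=order_done&number=",
]

all_res = []
for df in chunks:
    for s in substr:
        # regex=False makes "?" and "." match as plain characters.
        matched = df[df["url"].str.contains(s, regex=False)]
        if not matched.empty:
            all_res.append(matched)

# One concatenation at the end instead of one copy per iteration.
# The made-up data guarantees at least one match; pd.concat raises
# ValueError on an empty list, so real code may need to guard for that.
df_res = pd.concat(all_res, ignore_index=True)
print(df_res)
```

Skipping empty matches keeps the list short when most chunks contain no hits, which is likely with million-row chunks and only a handful of target URLs.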
