Split pandas dataframe in two if it has more than 10 rows


Question

I have a huge CSV containing many tables, each with many rows. I would like to split each dataframe into two if it contains more than 10 rows.

If it does, I would like the first dataframe to contain the first 10 rows and the second dataframe to contain the rest.

Is there a convenient function for this? I've looked around but found nothing useful...

split_dataframe(df, 2(if > 10))?

Recommended answer

This returns the two split DataFrames if the condition is met; otherwise it returns the original DataFrame and None, which you then need to handle separately. Note that this assumes each df only needs to be split once, and that it is acceptable for the second part to be longer than 10 rows (which happens when the original has more than 20 rows).

df_new1, df_new2 = (df.iloc[:10, :], df.iloc[10:, :]) if len(df) > 10 else (df, None)
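The one-liner above can be wrapped in a small helper. The following is a minimal sketch, not a pandas built-in: the function name `split_dataframe` is borrowed from the question's pseudocode, and the parameter `n` and the sample frames are assumptions made for illustration.

```python
import pandas as pd

def split_dataframe(df, n=10):
    """Hypothetical helper: return (first n rows, remaining rows),
    or (df, None) when df has n rows or fewer."""
    if len(df) > n:
        # .iloc slices by integer position, independent of index labels
        return df.iloc[:n], df.iloc[n:]
    return df, None

# Illustrative data, not from the original question
big = pd.DataFrame({"x": range(25)})
small = pd.DataFrame({"x": range(4)})

head, rest = split_dataframe(big)       # 10 rows and 15 rows
same, nothing = split_dataframe(small)  # original frame and None
```

Callers need to check for `None` before using the second value, as the answer points out.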

Note that you can also use df.head(10) and df.tail(len(df) - 10) to get the front and back parts, according to your needs. Other indexing approaches work too: for row slicing you can supply only the first dimension, such as df.iloc[:10] (or plain df[:10]) instead of df.iloc[:10, :], though I like to be explicit about the dimensions being taken. A two-dimensional slice such as df[:10, :] without .iloc is not valid for a DataFrame. Older answers also suggest df.ix, but it was deprecated and then removed in pandas 1.0, so use df.iloc for positional indexing.
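As a quick check of the head/tail alternative, the sketch below compares it with positional slicing on a made-up 25-row frame (the data and variable names are assumptions for illustration only).

```python
import pandas as pd

# Illustrative frame with 25 rows
df = pd.DataFrame({"x": range(25), "y": list("abcde") * 5})

front = df.head(10)            # first 10 rows
back = df.tail(len(df) - 10)   # everything after the first 10 rows

# Both forms select the same rows as positional slicing with .iloc
front_matches = front.equals(df.iloc[:10])
back_matches = back.equals(df.iloc[10:])
```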

Be careful with df.loc, however: it is label-based, so the input is never interpreted as an integer position, and label slices include the end label. It only appears to behave like positional indexing when the index labels happen to be integers starting at 0 with no gaps, and even then df.loc[:10] returns 11 rows (labels 0 through 10) rather than 10.
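The difference between label-based and position-based slicing can be seen in a short sketch. The frames below are made up for illustration; the second one uses an index starting at 100 as an assumed example of labels that do not match positions.

```python
import pandas as pd

# Default index: labels 0..14 coincide with positions
df = pd.DataFrame({"x": range(15)})
rows_iloc = len(df.iloc[:10])  # positions 0..9
rows_loc = len(df.loc[:10])    # labels 0..10, end label included

# Same data with labels 100..114: positions and labels now differ
shifted = pd.DataFrame({"x": range(15)}, index=range(100, 115))
shifted_iloc = len(shifted.iloc[:10])  # still the first 10 rows
shifted_loc = len(shifted.loc[:10])    # no label is <= 10, so nothing is selected
```

For a "first 10 rows" split, `.iloc` expresses the intent directly and does not depend on what the index contains.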

You should also consider the options pandas provides for exporting the contents of a DataFrame to HTML, and possibly LaTeX (for example DataFrame.to_html and DataFrame.to_latex), to produce better designed tables for presentation instead of copying and pasting. Searching for how to convert a DataFrame to these formats turns up plenty of tutorials and advice for exactly this application.
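As a rough illustration of the export route, the sketch below renders the first 10 rows of a made-up frame as an HTML table with `DataFrame.to_html`. The data is an assumption for illustration; `DataFrame.to_latex` is used in a similar way but may need an optional dependency depending on the pandas version.

```python
import pandas as pd

# Illustrative data, not from the original question
df = pd.DataFrame({"name": ["a", "b", "c"] * 5, "value": range(15)})

# Render only the first 10 rows as an HTML table, without the index column
html = df.iloc[:10].to_html(index=False)
```

The resulting string can be written to a file or embedded in a page.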
