How do I use Pandas to read in multiple datasets from one file?
Question
I have a file that has multiple sets of data separated by rows. It looks something like:
country1
0.9
1.3
2.9
1.1
...
country2
4.1
3.1
0.2
...
I would like to use Pandas to read the whole file into multiple dataframes, where each dataframe corresponds to one country. Is there an easy way to do this? Each country has a different number of entries.
Recommended answer
You can build a mask with to_numeric and errors='coerce', which turns every value that cannot be parsed as a number (here, the country names) into NaN. Then locate those rows with isnull and turn them into group labels with cumsum:
import pandas as pd
import io
temp=u"""country1
0.9
1.3
2.9
1.1
country2
4.1
3.1
0.2"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), index_col=None, header=None)
print (df)
0
0 country1
1 0.9
2 1.3
3 2.9
4 1.1
5 country2
6 4.1
7 3.1
8 0.2
mask = pd.to_numeric(df.iloc[:,0], errors='coerce').isnull().cumsum()
print (mask)
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 2
Name: 0, dtype: int32
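To make the one-line mask easier to follow, here is a minimal sketch that splits the chain into its three steps on a small toy Series. The variable names (numeric, is_header, group_id) are illustrative and not part of the original answer, and the sketch assumes the labels can never be parsed as numbers.

```python
import pandas as pd

# Toy column mixing label rows and numeric strings (illustrative data)
s = pd.Series(["country1", "0.9", "1.3", "country2", "4.1"])

# Step 1: anything that is not a number becomes NaN
numeric = pd.to_numeric(s, errors="coerce")

# Step 2: True exactly on the label rows
is_header = numeric.isnull()

# Step 3: the running count goes up by one at every label row,
# so each label and the values below it share one group id
group_id = is_header.cumsum()

print(pd.DataFrame({"value": s, "is_header": is_header, "group_id": group_id}))
```

Because the counter only increases on label rows, it does not matter how many entries each country has.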
Finally, use a list comprehension to build the list of dataframes. Each group drops its first row (the country name) and uses it as the column name instead:
dfs = [g[1:].rename(columns={0:g.iloc[0].values[0]}) for i, g in df.groupby(mask)]
print (dfs)
print (dfs[0])
country1
1 0.9
2 1.3
3 2.9
4 1.1
print (dfs[1])
country2
6 4.1
7 3.1
8 0.2
If you need to reset the index of each dataframe:
dfs = [g[1:].rename(columns={0:g.iloc[0].values[0]}).reset_index(drop=True) for i, g in df.groupby(mask)]
print (dfs)
print (dfs[0])
country1
0 0.9
1 1.3
2 2.9
3 1.1
print (dfs[1])
country2
0 4.1
1 3.1
2 0.2
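One thing to keep in mind: since the column is read with names and numbers mixed together, the values in the resulting dataframes appear to stay as strings (object dtype) rather than floats. The following is a sketch of one possible variant, not the original answer's method, that returns a dict keyed by country name with float columns. The helper name read_country_blocks is hypothetical, and it assumes every block starts with a non-numeric label and that labels are not repeated.

```python
import io
import pandas as pd

def read_country_blocks(source):
    """Hypothetical helper: split a one-column file into {label: DataFrame}."""
    # Read everything as strings so labels and numbers are treated uniformly
    raw = pd.read_csv(source, header=None, dtype=str).iloc[:, 0]

    numeric = pd.to_numeric(raw, errors="coerce")  # labels become NaN
    is_header = numeric.isnull()                   # True on label rows

    # Carry each label down to the value rows that follow it
    labels = raw.where(is_header).ffill()

    values = numeric[~is_header]                   # float64 values only
    keys = labels[~is_header]

    # sort=False keeps the countries in file order; rows before the first
    # label have a NaN key and are dropped by groupby
    return {
        name: grp.reset_index(drop=True).to_frame(name)
        for name, grp in values.groupby(keys, sort=False)
    }

sample = io.StringIO("country1\n0.9\n1.3\n2.9\n1.1\ncountry2\n4.1\n3.1\n0.2\n")
blocks = read_country_blocks(sample)
print(blocks["country2"])
print(blocks["country2"].dtypes)
```

A dict makes lookup by country name direct; if positional access is preferred, list(blocks.values()) gives the same frames in file order.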