Different read_csv index_col = None / 0 / False in pandas
Question
I used the following read_csv command:
In [20]:
dataframe = pd.read_csv('D:/UserInterest/output/ENFP_0719/Bookmark.csv', index_col=None)
dataframe.head()
Out[20]:
Unnamed: 0 timestamp url visits
0 0 1.404028e+09 http://m.blog.naver.com/PostView.nhn?blogId=mi... 2
1 1 1.404028e+09 http://m.facebook.com/l.php?u=http%3A%2F%2Fblo... 1
2 2 1.404028e+09 market://details?id=com.kakao.story 1
3 3 1.404028e+09 https://story-api.kakao.com/upgrade/install 4
4 4 1.403889e+09 http://m.cafe.daum.net/WorldcupLove/Knj/173424... 1
The result shows a column named Unnamed: 0, and the result is similar when I use index_col=False. But when I use index_col=0, the result is as follows:
dataframe = pd.read_csv('D:/UserInterest/output/ENFP_0719/Bookmark.csv', index_col=0)
dataframe.head()
Out[21]:
timestamp url visits
0 1.404028e+09 http://m.blog.naver.com/PostView.nhn?blogId=mi... 2
1 1.404028e+09 http://m.facebook.com/l.php?u=http%3A%2F%2Fblo... 1
2 1.404028e+09 market://details?id=com.kakao.story 1
3 1.404028e+09 https://story-api.kakao.com/upgrade/install 4
4 1.403889e+09 http://m.cafe.daum.net/WorldcupLove/Knj/173424... 1
This time the result does not show the Unnamed: 0 column. What I want to ask is: what is the difference between index_col=None, index_col=0, and index_col=False? I have read the documentation on this, but I still do not get the idea.
Recommended answer
Update
I think that since version 0.16.1, pandas raises an error if you try to pass True for index_col, to avoid this ambiguity.
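A quick way to check this on your own installation is sketched below. It assumes a reasonably recent pandas, where I would expect a ValueError; the helper name index_col_true_error is made up for this illustration and is not part of pandas.

```python
import io

import pandas as pd

t = """index,a,b
0,hello,pandas"""


def index_col_true_error(text):
    """Return the exception raised by index_col=True, or None if none was raised.

    Hypothetical helper for illustration only.
    """
    try:
        pd.read_csv(io.StringIO(text), index_col=True)
    except Exception as exc:  # broad on purpose: the exact type may vary by version
        return exc
    return None


err = index_col_true_error(t)
print(type(err).__name__ if err is not None else "no error raised")
```

If this prints "no error raised", the installed version predates the change and the old behaviour described under Original applies.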
Original
A lot of people get confused by this. To specify the ordinal index of your column you should pass the int position, in this case 0.
In [3]:
import io
import pandas as pd
t="""index,a,b
0,hello,pandas"""
pd.read_csv(io.StringIO(t))
Out[3]:
index a b
0 0 hello pandas
The default is index_col=None, which is what the call above uses.
If we set index_col=0, we're explicitly stating that the first column should be treated as the index:
In [4]:
pd.read_csv(io.StringIO(t), index_col=0)
Out[4]:
a b
index
0 hello pandas
If we pass index_col=False, we get the same result as None:
In [5]:
pd.read_csv(io.StringIO(t), index_col=False)
Out[5]:
index a b
0 0 hello pandas
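The two only coincide for well-formed input like the one above. As far as I know, they differ when every data row has one more field than the header, for example because of a trailing delimiter: with None pandas infers that the first column is the index, while False forces it not to. The sketch below uses made-up sample data to show this; treat it as an illustration rather than a full description of the parser.

```python
import io

import pandas as pd

# Header has 3 names, but each data row ends with a trailing comma (4 fields).
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

# index_col=None (the default): the first field of each row is inferred as the index.
df_none = pd.read_csv(io.StringIO(data), index_col=None)

# index_col=False: no column is used as the index; the empty trailing field is dropped.
df_false = pd.read_csv(io.StringIO(data), index_col=False)

print(df_none)
print(df_false)
```

So False is mainly useful as a way to say "never use the first column as the index", even when the shape of the file would otherwise suggest one.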
If we now pass index_col=None explicitly, we get the same behaviour as when we didn't pass this param at all:
In [6]:
pd.read_csv(io.StringIO(t), index_col=None)
Out[6]:
index a b
0 0 hello pandas
There was a bug where, if you passed True, it was erroneously treated as index_col=1, because True was being converted to 1:
In [6]:
pd.read_csv(io.StringIO(t), index_col=True)
Out[6]:
index b
a
0 hello pandas
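To see what that conversion amounted to, here is a small sketch that passes index_col=1 explicitly on the same sample text. It should work on current versions and gives the same frame as the buggy True call shown above.

```python
import io

import pandas as pd

t = """index,a,b
0,hello,pandas"""

# Position 1 is the second column, "a", so "a" becomes the index.
df_second = pd.read_csv(io.StringIO(t), index_col=1)

print(df_second)
print(df_second.index.name)  # name of the column that was used as the index
```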
Edit
For the case where the index column has a blank header, which is what you have:
In [7]:
import io
import pandas as pd
t=""",a,b
0,hello,pandas"""
pd.read_csv(io.StringIO(t))
Out[7]:
Unnamed: 0 a b
0 0 hello pandas
In [8]:
pd.read_csv(io.StringIO(t), index_col=0)
Out[8]:
a b
0 hello pandas
In [9]:
pd.read_csv(io.StringIO(t), index_col=False)
Out[9]:
Unnamed: 0 a b
0 0 hello pandas
In [10]:
pd.read_csv(io.StringIO(t), index_col=None)
Out[10]:
Unnamed: 0 a b
0 0 hello pandas
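The three calls can also be compared side by side. This is only a sketch over the same sample text; the summary dictionary is a name introduced here for the comparison.

```python
import io

import pandas as pd

t = """,a,b
0,hello,pandas"""

# Map each index_col setting to the column labels it produces.
summary = {}
for option in (None, False, 0):
    df = pd.read_csv(io.StringIO(t), index_col=option)
    summary[repr(option)] = list(df.columns)
    print(f"index_col={option!r}: columns={list(df.columns)}, index={list(df.index)}")
```

Only index_col=0 moves the blank-header column out of the columns and into the index, which is why Unnamed: 0 disappears in that case.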