Pandas: read_csv indicating 'space-delimited'
Question
I have the following file.txt (abridged):
SICcode  Catcode  Category                                SICname           MultSIC
0111     A1500    Wheat, corn, soybeans and cash grain    Wheat             X
0112     A1600    Other commodities (incl rice, peanuts)  Rice              X
0115     A1500    Wheat, corn, soybeans and cash grain    Corn              X
0116     A1500    Wheat, corn, soybeans and cash grain    Soybeans          X
0119     A1500    Wheat, corn, soybeans and cash grain    Cash grains, NEC  X
0131     A1100    Cotton                                  Cotton            X
0132     A1300    Tobacco & Tobacco products              Tobacco           X
I'm having some problems reading it into a pandas df. I tried pd.read_csv with the specifications engine='python', sep='Tab', but it returned the file in one column:
SICcode Catcode Category SICname MultSIC
0 0111 A1500 Wheat, corn, soybeans...
1 0112 A1600 Other commodities (in...
2 0115 A1500 Wheat, corn, soybeans...
3 0116 A1500 Wheat, corn, soybeans...
Then I tried to put it into a gnumeric file using 'tab' as a delimiter, but it also read the file as one column. Does anyone have an idea on this?
Recommended answer
If df = pd.read_csv('file.txt', sep='\t') returns a DataFrame with one column, then apparently file.txt is not using tabs as separators. Your data might simply have spaces as separators. In that case you could try
df = pd.read_csv('file.txt', sep=r'\s{2,}', engine='python')
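which uses the regex pattern \s{2,} as the separator. This regex matches 2-or-more whitespace characters, so single spaces inside a value (such as "Cash grains, NEC") are not treated as column breaks. Regex separators like this one are handled by pandas' Python parsing engine.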
In [8]: df
Out[8]:
   SICcode Catcode                                Category           SICname  \
0      111   A1500    Wheat, corn, soybeans and cash grain             Wheat
1      112   A1600  Other commodities (incl rice, peanuts)              Rice
2      115   A1500    Wheat, corn, soybeans and cash grain              Corn
3      116   A1500    Wheat, corn, soybeans and cash grain          Soybeans
4      119   A1500    Wheat, corn, soybeans and cash grain  Cash grains, NEC
5      131   A1100                                  Cotton            Cotton
6      132   A1300              Tobacco & Tobacco products           Tobacco

  MultSIC
0       X
1       X
2       X
3       X
4       X
5       X
6       X
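As a self-contained sketch of the same idea, the snippet below parses an in-memory copy of a few rows instead of the real file. It assumes the columns are separated by two or more spaces; the sample text is a reconstruction for illustration, not the actual contents of file.txt. The dtype argument is an addition not present in the answer above: it keeps SICcode as text so the leading zeros, which the default integer parse drops (0111 becomes 111), are preserved.

```python
import io

import pandas as pd

# Hypothetical sample: columns separated by two or more spaces, while
# values such as "Cash grains, NEC" contain only single spaces.
sample = (
    "SICcode  Catcode  Category  SICname  MultSIC\n"
    "0111  A1500  Wheat, corn, soybeans and cash grain  Wheat  X\n"
    "0119  A1500  Wheat, corn, soybeans and cash grain  Cash grains, NEC  X\n"
    "0131  A1100  Cotton  Cotton  X\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s{2,}",           # split only on runs of 2+ whitespace characters
    engine="python",         # regex separators need the Python engine
    dtype={"SICcode": str},  # assumption: keep codes as strings
)

print(df.shape)
print(df["SICcode"].tolist())
print(df["SICname"].tolist())
```

If the real file turns out to separate columns with single spaces only, this pattern will not split it, and a different approach (for example fixed-width parsing) would be needed.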
If this does not work, please post the output of print(repr(open('file.txt', 'rb').read(100))). This will show us an unambiguous representation of the first 100 bytes of file.txt.
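A rough sketch of that check, for anyone who wants to interpret the bytes themselves: it reads the first 100 bytes and counts tab characters and runs of two or more spaces. The helper name sniff_whitespace is hypothetical, and the demo writes a small temporary file (with made-up contents) so that the snippet runs on its own.

```python
import os
import re
import tempfile


def sniff_whitespace(path, nbytes=100):
    """Return (head bytes, tab count, number of runs of 2+ spaces)."""
    with open(path, "rb") as fh:
        head = fh.read(nbytes)
    tabs = head.count(b"\t")
    space_runs = len(re.findall(rb" {2,}", head))
    return head, tabs, space_runs


# Demo on a throwaway file; point the helper at 'file.txt' for real use.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"SICcode  Catcode  Category\n0111  A1500  Wheat, corn\n")
    demo_path = tmp.name

head, tabs, space_runs = sniff_whitespace(demo_path)
os.remove(demo_path)

print(repr(head))
print("tabs:", tabs, "runs of 2+ spaces:", space_runs)
```

Zero tabs together with several space runs would suggest the \s{2,} separator is a reasonable choice; a nonzero tab count would point back to sep='\t'.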