Convert Outline Format in CSV to Two Columns
Problem description
I have data in a CSV file in the following format (one column in a dataframe). It is essentially an outline, like in a Word document: the letters shown here are the main headers, and the numbered items are subheaders:
- A
- 1
- 2
- 3
- B
- 1
- 2
- C
- 1
- 2
- 3
- 4
I want to convert this to the following format (two columns in a dataframe):
- A 1
- A 2
- A 3
- B 1
- B 2
- C 1
- C 2
- C 3
- C 4
I'm using pandas read_csv to load the data into a dataframe, and I've tried to reshape it with for loops, but I'm having difficulty because the data repeats and gets overwritten. For example, A 3 gets overwritten with C 3 later in the loop, which results in two instances of C 3 when only one is wanted and loses A 3 altogether. What's the best way to do this?
Apologies for the poor formatting; I'm new to the site.
Recommended answer
Use:
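import pandas as pd

# If the CSV has no header row, supply column names with the names parameter
df = pd.read_csv(file, names=['col'])
# Blank out the numeric subheaders, then forward fill so each row carries its header
df.insert(0, 'a', df['col'].mask(df['col'].str.isnumeric()).ffill())
# Drop the header rows themselves, where both columns hold the same value
df = df[df['a'] != df['col']]
print(df)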
a col
1 A 1
2 A 2
3 A 3
5 B 1
6 B 2
8 C 1
9 C 2
10 C 3
11 C 4
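The snippet above reads from a `file` variable that is not defined in the answer. As a minimal sketch for trying it out, the following version builds the sample data in memory; the `csv_text` string and the `result` name are assumptions made for illustration, and `dtype=str` is added so every value is read as a string before `.str.isnumeric()` is applied.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: one value per line, no header row.
csv_text = "A\n1\n2\n3\nB\n1\n2\nC\n1\n2\n3\n4\n"

# dtype=str keeps every value as a string so .str.isnumeric() can be applied.
df = pd.read_csv(io.StringIO(csv_text), names=["col"], dtype=str)

# True for the numeric subheaders, False for the letter headers.
is_sub = df["col"].str.isnumeric()

# Blank out the subheaders, then carry each header down to the rows below it.
headers = df["col"].mask(is_sub).ffill()
df.insert(0, "a", headers)

# Keep only the subheader rows; header rows have the same value in both columns.
result = df[df["a"] != df["col"]].reset_index(drop=True)
print(result)
```

Unlike the answer's output, this sketch resets the index, so the rows are numbered from 0 without gaps.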
Details:
Check which values are numeric with str.isnumeric:
print (df['col'].str.isnumeric())
0 False
1 True
2 True
3 True
4 False
5 True
6 True
7 False
8 True
9 True
10 True
11 True
Name: col, dtype: bool
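One caveat that the answer does not mention: `str.isnumeric` only accepts strings made up entirely of numeric characters, so subheaders such as `1.5` or `-1` would be treated as headers. The sample data contains only whole numbers, so this does not affect the result here. If the real data might include such values, one possible alternative (an assumption, not part of the original answer) is to test with `pd.to_numeric`:

```python
import pandas as pd

# Hypothetical values chosen to show the difference between the two checks.
s = pd.Series(["A", "1", "1.5", "-1"])

# Strict check: every character must be numeric, so "." and "-" fail.
strict = s.str.isnumeric()

# Lenient check: anything that parses as a number counts as numeric.
lenient = pd.to_numeric(s, errors="coerce").notna()

print(pd.DataFrame({"value": s, "isnumeric": strict, "to_numeric": lenient}))
```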
Replace the values flagged True with NaN using mask, then forward fill the missing values:
print (df['col'].mask(df['col'].str.isnumeric()).ffill())
0 A
1 A
2 A
3 A
4 B
5 B
6 B
7 C
8 C
9 C
10 C
11 C
Name: col, dtype: object
df.insert(0, 'a', df['col'].mask(df['col'].str.isnumeric()).ffill())
print (df)
a col
0 A A
1 A 1
2 A 2
3 A 3
4 B B
5 B 1
6 B 2
7 C C
8 C 1
9 C 2
10 C 3
11 C 4
Finally, remove the rows where both columns hold the same value by boolean indexing.
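For comparison with the for-loop attempt described in the question, here is a minimal sketch of a loop that avoids the overwriting problem. It is not the answer's method: it remembers the most recent header and appends a new row for every subheader, so a repeated subheader such as 3 never replaces an earlier row. The function name `outline_to_pairs` and the hard-coded sample list are hypothetical.

```python
import pandas as pd


def outline_to_pairs(values):
    """Pair each numeric subheader with the most recent non-numeric header."""
    rows = []
    current = None  # header seen most recently; None until the first header
    for value in values:
        if value.isnumeric():
            # Append rather than assign by key, so repeated subheaders are kept.
            rows.append((current, value))
        else:
            current = value
    return pd.DataFrame(rows, columns=["a", "col"])


sample = ["A", "1", "2", "3", "B", "1", "2", "C", "1", "2", "3", "4"]
pairs = outline_to_pairs(sample)
print(pairs)
```

The vectorized mask and ffill approach is usually preferable on large data, but the loop makes the logic explicit.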