Convert Outline Format in CSV to Two Columns

Views: 113 Published: 2020/5/24 2:19:59 Tags: python pandas


Question

I have data in a CSV file in the following format (one column in a dataframe). It is essentially like an outline in a Word document: the letters shown here are the main headers, and the numbered items are subheaders:

I want to convert this to the following format (two columns in a dataframe):

I'm using pandas read_csv to load the data into a dataframe, and I'm trying to reformat it with for loops, but I'm having difficulty because the data repeats and gets overwritten. For example, A 3 gets overwritten with C 3 later in the loop, which leaves two instances of C 3 when only one is wanted and loses A 3 altogether. What's the best way to do this?

Apologies for the poor formatting; I'm new to the site.

Recommended answer

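Use:

import pandas as pd

# If the CSV has no header row, name the column with the names parameter.
# `file` is the path to the CSV file.
df = pd.read_csv(file, names=['col'])

# Blank out the numeric rows, then forward fill so every row carries its header.
df.insert(0, 'a', df['col'].mask(df['col'].str.isnumeric()).ffill())

# Drop the header rows, where both columns hold the same value.
df = df[df['a'] != df['col']]
print(df)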
    a col
1   A   1
2   A   2
3   A   3
5   B   1
6   B   2
8   C   1
9   C   2
10  C   3
11  C   4
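
To try the steps above without a file on disk, here is a minimal self-contained sketch. The sample data is an assumption, reconstructed from the printed output in this answer (headers A, B, C, each followed by its numbered items), and io.StringIO stands in for the real CSV path.

```python
import io

import pandas as pd

# Assumed sample input, reconstructed from the output shown in this answer.
csv_text = "A\n1\n2\n3\nB\n1\n2\nC\n1\n2\n3\n4\n"

# dtype=str keeps every value as text so the .str accessor works on all rows.
df = pd.read_csv(io.StringIO(csv_text), names=["col"], dtype=str)

# Numeric rows become NaN, then each NaN takes the last header seen above it.
df.insert(0, "a", df["col"].mask(df["col"].str.isnumeric()).ffill())

# Header rows are the ones where both columns match; drop them.
result = df[df["a"] != df["col"]]
print(result)
```

Because each row looks up its header from the rows above it rather than from a running loop variable, no later group can overwrite an earlier one, which is the problem described in the question.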

Details:

Check which values are numeric with str.isnumeric:

print (df['col'].str.isnumeric())
0     False
1      True
2      True
3      True
4     False
5      True
6      True
7     False
8      True
9      True
10     True
11     True
Name: col, dtype: bool
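
The check works per string and is True only when every character is numeric, so it depends on the subheaders being plain digits. A small sketch with hypothetical values (not taken from the question's data) shows which strings pass:

```python
import pandas as pd

# Hypothetical values chosen to probe the check; not from the question.
s = pd.Series(["A", "1", "12", "1.5", "-3", "B2"])

# True only for strings made up entirely of numeric characters.
flags = s.str.isnumeric()
print(flags.tolist())
```

Here only "1" and "12" count as numeric; "1.5" and "-3" do not, because of the dot and the minus sign. If the subheaders could look like that, a different test such as pd.to_numeric(s, errors='coerce').notna() may be a better fit.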

Replace the values where the check is True with NaN using mask, then forward fill the missing values:

print (df['col'].mask(df['col'].str.isnumeric()).ffill())
0     A
1     A
2     A
3     A
4     B
5     B
6     B
7     C
8     C
9     C
10    C
11    C
Name: col, dtype: object
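
The two calls can be looked at separately. The sketch below uses a shortened, hypothetical outline so the intermediate result is easy to read:

```python
import pandas as pd

# Hypothetical mini outline: header A with two items, header B with one.
s = pd.Series(["A", "1", "2", "B", "1"])

# mask() puts NaN wherever the condition is True, here on the numeric rows.
headers_only = s.mask(s.str.isnumeric())

# ffill() copies the last non-missing value downward into each gap.
filled = headers_only.ffill()
print(headers_only.isna().tolist())
print(filled.tolist())
```

After mask only the header rows still hold a value, and after ffill every row holds the header of the group it belongs to.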

Add the result as a new first column with DataFrame.insert:

df.insert(0, 'a', df['col'].mask(df['col'].str.isnumeric()).ffill())
print (df)
    a col
0   A   A
1   A   1
2   A   2
3   A   3
4   B   B
5   B   1
6   B   2
7   C   C
8   C   1
9   C   2
10  C   3
11  C   4

Finally, remove the rows where both columns hold the same value with boolean indexing.
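
The filtered frame keeps the original row labels (1, 2, 3, 5, ...), as the first output shows. If a clean 0-based index is wanted, reset_index(drop=True) can follow the filter. A sketch, using a short hand-built frame as a hypothetical stand-in for the result of the insert step:

```python
import pandas as pd

# Hypothetical, shortened stand-in for the frame after the insert step.
df = pd.DataFrame({"a": ["A", "A", "A", "B", "B"],
                   "col": ["A", "1", "2", "B", "1"]})

# The comparison yields a boolean Series; indexing with it keeps the True rows.
keep = df["a"] != df["col"]

# drop=True discards the old labels instead of adding them as a column.
out = df[keep].reset_index(drop=True)
print(out)
```

This filter assumes a subheader never has the same text as its header; if that could happen, filtering on the str.isnumeric check directly would be the safer choice.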
