Restructure CSV data with Notepad++, Regex

Published: 2022/3/8 19:24:42 · Tags: regex, notepad++


Problem description

I have a CSV file with the following header and (sample) data:

StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA @ StreetB,1 NameA,DirectionA,Lat,Long
StreetC @ StreetD,1 NameA,DirectionA,Lat,Long
...
StreetE @ StreetF,1 NameA,DirectionB,Lat,Long
StreetG @ StreetH,1 NameA,DirectionB,Lat,Long
...
StreetI @ StreetJ,2 NameB,DirectionC,Lat,Long
StreetK @ StreetL,2 NameB,DirectionC,Lat,Long
...
StreetM @ StreetN,2 NameB,DirectionD,Lat,Long
StreetO @ StreetP,2 NameB,DirectionD,Lat,Long
.
.
.

I would like to use regex (currently in Notepad++) to get the following result:

1 NameA - DirectionA=[[StreetA @ StreetB,[Lat,Long]], [StreetC @ StreetD,[Lat,Long]], ...]
1 NameA - DirectionB=[[StreetE @ StreetF,[Lat,Long]], [StreetG @ StreetH,[Lat,Long]], ...]
2 NameB - DirectionC=[[StreetI @ StreetJ,[Lat,Long]], [StreetK @ StreetL,[Lat,Long]], ...]
2 NameB - DirectionD=[[StreetM @ StreetN,[Lat,Long]], [StreetO @ StreetP,[Lat,Long]], ...]
.
.
.

Using this regex and substitution:

RgX: ^([^,]*),([^,]*),([^,]*),(.*)
Sub: $2 - $3=[$1,[$4]]

Demo: https://regex101.com/r/gS9hD6/1

This is as far as I've got:

1 NameA - DirectionA=[StreetA @ StreetB,[Lat,Long]]
1 NameA - DirectionA=[StreetC @ StreetD,[Lat,Long]]
1 NameA - DirectionB=[StreetE @ StreetF,[Lat,Long]]
1 NameA - DirectionB=[StreetG @ StreetH,[Lat,Long]]
2 NameB - DirectionC=[StreetI @ StreetJ,[Lat,Long]]
2 NameB - DirectionC=[StreetK @ StreetL,[Lat,Long]]
2 NameB - DirectionD=[StreetM @ StreetN,[Lat,Long]]
2 NameB - DirectionD=[StreetO @ StreetP,[Lat,Long]]

With a new regex, I tried splitting the above result on "=", but I don't know where to go from there.

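I think one way to get the desired result is to keep only the first unique instance of the part before "=", replace the newlines with ", ", and enclose the whole thing in [..] to make it an array.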
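The idea above (keep the first occurrence of each key, join its values with ", " and wrap them in [..]) amounts to a group-by. If a script is an option instead of Notepad++, a minimal Python sketch could look like this. The sample data is hypothetical and Lat/Long are placeholders. Unlike the line-by-line regex route, this approach also groups rows that are not adjacent.

```python
import csv
import io

# Hypothetical input: the question's header plus a few sample rows.
data = """StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA @ StreetB,1 NameA,DirectionA,Lat,Long
StreetC @ StreetD,1 NameA,DirectionA,Lat,Long
StreetE @ StreetF,1 NameA,DirectionB,Lat,Long
StreetG @ StreetH,1 NameA,DirectionB,Lat,Long
"""

# dict keeps insertion order (Python 3.7+), so keys come out in first-seen order.
groups = {}
for row in csv.DictReader(io.StringIO(data)):
    key = f"{row['RouteName']} - {row['Travel_Direction']}"
    stop = f"[{row['StopName']},[{row['Latitude']},{row['Longitude']}]]"
    groups.setdefault(key, []).append(stop)

# Join each key's stops with ", " and wrap the whole list in [...].
lines = [f"{key}=[{', '.join(stops)}]" for key, stops in groups.items()]
print("\n".join(lines))
# 1 NameA - DirectionA=[[StreetA @ StreetB,[Lat,Long]], [StreetC @ StreetD,[Lat,Long]]]
# 1 NameA - DirectionB=[[StreetE @ StreetF,[Lat,Long]], [StreetG @ StreetH,[Lat,Long]]]
```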

Edit: There are about 10k stops in total, but only about 100 unique routes.

Edit 2: (I may be asking for too many changes now)

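For the first regex:

What if I were to use " instead of "="?

For the second regex replacement:

What if I only had the RouteName and StopName columns, like this: 1 NameA - DirectionA=[StreetA @ StreetB, ...]?
Similarly, what if I only had RouteName and the coordinates, like this: 1 NameA - DirectionA=[[Lat,Long]]?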
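For the two Edit 2 variants, the same group-by idea applies and only the piece that formats each stop changes. The following is a minimal Python sketch. It assumes that the CSV still has all five columns and that rows are still grouped by RouteName and Travel_Direction, as the examples suggest. The `restructure` helper and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical sample input with the question's header.
data = """StopName,RouteName,Travel_Direction,Latitude,Longitude
StreetA @ StreetB,1 NameA,DirectionA,Lat,Long
StreetC @ StreetD,1 NameA,DirectionA,Lat,Long
"""

def restructure(text, fmt):
    """Group rows by 'RouteName - Travel_Direction'; fmt turns one row into one list item."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = f"{row['RouteName']} - {row['Travel_Direction']}"
        groups.setdefault(key, []).append(fmt(row))
    return [f"{key}=[{', '.join(items)}]" for key, items in groups.items()]

# Stop names only: 1 NameA - DirectionA=[StreetA @ StreetB, StreetC @ StreetD]
names_only = restructure(data, lambda r: r["StopName"])
# Coordinates only: 1 NameA - DirectionA=[[Lat,Long], [Lat,Long]]
coords_only = restructure(data, lambda r: f"[{r['Latitude']},{r['Longitude']}]")
print(names_only)
print(coords_only)
```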

Recommended answer

Steps

1. First replacement:

Find what: ^([^,]*),([^,]*),([^,]*),(.*)
Replace with: \2 - \3=[[\1,[\4]]]
Replace All

2. Second replacement:

Find what: ^[\s\S]*?^([^][]*=)\[\[.*\]\]\K\]\R\1\[(.*)\]$
Replace with: , \2]
Replace All

3. Repeat step #2 until there are no more matches.

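Each Replace All merges consecutive pairs of lines that share the same key, so the number of lines per key roughly halves on every pass. This means that if the same key (route-direction pair) has 100 instances (stops), you have to click Replace All 7 times (ceiling(log2(N))).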
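The pass count can be checked outside Notepad++ with a minimal Python sketch of steps 1 to 3. This is an adaptation, not the answer's exact regex. Python's built-in `re` module does not support `\K` or `\R`, so the sketch captures the kept prefix and writes it back, and it assumes `\n` line endings. The leading `^[\s\S]*?` is dropped and `re.sub`'s left-to-right scan finds the pairs. The `restructure` helper and sample rows are hypothetical.

```python
import re

# Step 1: the answer's pattern; Python uses the same \1..\4 backreference syntax.
STEP1 = re.compile(r"^([^,]*),([^,]*),([^,]*),(.*)", re.M)
# Step 2, adapted: groups 1 and 2 stand in for the text that \K keeps,
# "\n" stands in for \R, and [^\[\]\n] keeps the key on one line.
STEP2 = re.compile(r"^([^\[\]\n]*=)(\[\[.*\]\])\]\n\1\[(.*)\]$", re.M)

def restructure(csv_rows):
    """Return (result, clicks); clicks counts step-2 passes that changed something."""
    text = STEP1.sub(r"\2 - \3=[[\1,[\4]]]", csv_rows)
    clicks = 0
    while True:
        merged = STEP2.sub(r"\1\2, \3]", text)  # one "Replace All"
        if merged == text:                       # step 3: stop when nothing matches
            return text, clicks
        text, clicks = merged, clicks + 1

sample = "\n".join([
    "StreetA @ StreetB,1 NameA,DirectionA,Lat,Long",
    "StreetC @ StreetD,1 NameA,DirectionA,Lat,Long",
    "StreetE @ StreetF,1 NameA,DirectionB,Lat,Long",
    "StreetG @ StreetH,1 NameA,DirectionB,Lat,Long",
])
result, clicks = restructure(sample)
print(result)
# 1 NameA - DirectionA=[[StreetA @ StreetB,[Lat,Long]], [StreetC @ StreetD,[Lat,Long]]]
# 1 NameA - DirectionB=[[StreetE @ StreetF,[Lat,Long]], [StreetG @ StreetH,[Lat,Long]]]
print(clicks)  # 1

# 100 stops for one key: lines per key go 100, 50, 25, 13, 7, 4, 2, 1.
many = "\n".join(f"Stop{i},1 NameA,DirectionA,Lat,Long" for i in range(100))
print(restructure(many)[1])  # 7
```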

Explanation

In step 1, I modified your regex to add an extra pair of brackets that encloses the whole set.

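In step 2, the regex looks for a pair of consecutive lines with the same route and direction and appends the second line's stop to the first line.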

^[\s\S]*?^([^][]*=)     #Group 1: captures "1 NameA - DirA="
\[\[.*\]\]              #matches the set of Stops - "[[StA @ StB,[Lat,Long]], ..."
\K                      #keeps the text matched so far out of the match
\]\R                    #closing "]" and newline
\1                      #match next line (if the same route)
\[(.*)\]$               #and capture the Stop (Group 2)
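The breakdown above can be mirrored token by token in a Python `re.VERBOSE` pattern, which makes it easy to check a single Replace All pass. This is an adaptation, not the answer's exact regex. Python's `re` has no `\K` or `\R`, so the kept part is captured as group 2 and written back, `\n` is assumed for line endings, and the leading `^[\s\S]*?` is omitted. The sample lines are hypothetical.

```python
import re

STEP2 = re.compile(r"""
    ^([^\[\]\n]*=)      # group 1: the key, e.g. 1 NameA - DirectionA=
    (\[\[.*\]\])        # group 2: stops so far minus the final ] (stands in for \K)
    \]\n                # closing ] and the newline (\R in Notepad++)
    \1                  # the next line must start with the same key
    \[(.*)\]$           # group 3: that line's stop(s), outer brackets stripped
""", re.M | re.X)

text = ("1 NameA - DirectionA=[[StreetA @ StreetB,[Lat,Long]]]\n"
        "1 NameA - DirectionA=[[StreetC @ StreetD,[Lat,Long]]]\n"
        "1 NameA - DirectionB=[[StreetE @ StreetF,[Lat,Long]]]")
# Replacement re-emits the kept part (groups 1 and 2), like ", \2]" after \K.
merged = STEP2.sub(r"\1\2, \3]", text)
print(merged)
# 1 NameA - DirectionA=[[StreetA @ StreetB,[Lat,Long]], [StreetC @ StreetD,[Lat,Long]]]
# 1 NameA - DirectionB=[[StreetE @ StreetF,[Lat,Long]]]
```

The DirectionB line has no same-key neighbour, so it is left unchanged.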

regex101 Demo for step 1
regex101 Demo for step 2
