Keep newline character and selectively indent in string during gsub
Question
Original title: Keep newline characters in a string during gsub
There is a post where I try to convert JSON to Markdown unordered lists. It is almost done, but there is one pattern I cannot handle: if a title contains a space, hyphen, space sequence, that hyphen is treated as a list-item hyphen. When I try to avoid this by referring to the newline character in the pattern, nothing works as I expect.
Input JSON: https://gist.github.com/hermanp/381eaf9f2bf5f2b9cdf22f5295e73eb5
Preferred output (two space indentation) markdown:
- Info
- Python
- The Ultimate Python Beginner's Handbook
- Python Like You Mean It
- Automate the Boring Stuff with Python
- Data science Python notebooks
- Frontend
- CodePen
- JavaScript - Wikipedia
- CSS-Tricks
- Butterick’s Practical Typography
- Front-end Developer Handbook 2019
- Using Ethics In Web Design
- Client-Side Web Development
- Stack Overflow
- HUP
- Hope in Source
To generate the markdown, I use the following two scripts:
generate_md()
library(jsonlite)

generate_md <- function (jsonfile) {
  bmarks_json_lite <- fromJSON(txt = jsonfile)
  level1 <- bmarks_json_lite$children$children[[2]]
  markdown_result <- recursive_func(level = level1)
  return(markdown_result)
}
recursive_func()
recursive_func <- function (level) {
  md_result <- character()
  for (i in seq_len(nrow(level))) {
    if (level[i, "type"] == "text/x-moz-place"){
      # A single bookmark: one list item.
      md_title <- paste0("- ", level[i, "title"], "\n")
    } else if (level[i, "type"] == "text/x-moz-place-container") {
      # A folder: its own list item, followed by its indented children.
      md_title <- paste0("- ", level[i, "title"], "\n")
      md_recurs <- recursive_func(level = level[i, "children"][[1]])
      # >>>>> This is the problematic part. <<<<<
      # The replacement is two spaces and a hyphen (two-space indentation).
      md_recurs <- gsub("-(?= )", "  -", md_recurs, perl = T)
      md_title <- paste0(md_title, md_recurs)
    }
    md_result <- paste0(md_result, md_title)
  }
  return(md_result)
}
With these functions I get the output below (note the unnecessary spaces in the JavaScript Wikipedia entry). I want to get "- JavaScript - Wikipedia" instead of the same entry with extra spaces inserted before its second hyphen. I hope this example represents the different scenarios with hyphens and indentation; it is only a fraction of my bookmarks, because I wanted to give a minimal example.
cat(generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
                       "381eaf9f2bf5f2b9cdf22f5295e73eb5/raw/",
                       "76b74b2c3b5e34c2410e99a3f1b6ef06977b2ec7/",
                       "bookmarks-example-hyphen.json")))
# Output
- Info
- Python
- The Ultimate Python Beginner's Handbook
- Python Like You Mean It
- Automate the Boring Stuff with Python
- Data science Python notebooks
- Frontend
- CodePen
- JavaScript - Wikipedia
- CSS-Tricks
- Butterick’s Practical Typography
- Front-end Developer Handbook 2019
- Using Ethics In Web Design
- Client-Side Web Development
- Stack Overflow
- HUP
- Hope in Source
I modified the gsub() call in recursive_func() as shown below, but none of the variants produced the desired output:
md_recurs <- gsub("-(?= )", "  -", md_recurs, perl = T) # Original
md_recurs <- gsub("(\n)?-(?= )", "  -", md_recurs, perl = T) # No newlines
md_recurs <- gsub("(-)(?= )(?<=\n)?", "  -", md_recurs, perl = T) # Same as Original
Searching Google for "regex newline before char gsub site:stackoverflow.com", I found no answer or hint to this question. I also experimented on regex101.com, but could not find the right approach.
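A small reproduction may make the two failure modes easier to see. This is only a sketch: the two-item string is hand-written (it is not taken from the gist), and the replacement is assumed to be two spaces and a hyphen, matching the two-space indentation described above.

```r
# Hand-written sample; not taken from the bookmarks gist.
md <- "- CodePen\n- JavaScript - Wikipedia\n"

# Original pattern: every hyphen followed by a space is indented,
# including the one inside the title.
original <- gsub("-(?= )", "  -", md, perl = TRUE)

# Second attempt: the optional "\n" becomes part of the match,
# so it is replaced together with the hyphen and the line break is lost.
no_newlines <- gsub("(\n)?-(?= )", "  -", md, perl = TRUE)

cat(original)
cat(no_newlines)
```

In the first result the title hyphen picks up two extra spaces; in the second, both items end up on one line because the matched newline is not part of the replacement.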
Recommended answer
After thinking over the problem and the structure of the string, and reading about lookbehind, I finally came up with the solution.
The md_recurs line needs to be changed to:
md_recurs <- gsub("(?<!(\\w ))-(?= )", "  -", md_recurs, perl = T)
Which means the pattern argument of gsub() had to be changed to:
(?<!(\\w ))-(?= )
This means:
- replace a hyphen (-) with two spaces and a hyphen (  -)
- only if it is not preceded by a word character and a space: (?<!(\\w ))
- and only if it is followed by a space: (?= )
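The rules above can be checked on a small string. This is a minimal sketch, not part of the original scripts: the sample text is hand-written, and the pattern is applied twice to imitate two levels of nesting in recursive_func().

```r
# Pattern from the answer: a hyphen followed by a space, but not
# preceded by a word character plus a space.
pattern <- "(?<!(\\w ))-(?= )"

# Hand-written sample; not taken from the bookmarks gist.
md <- "- CodePen\n- JavaScript - Wikipedia\n- CSS-Tricks\n"

# One level of nesting: only the list-item hyphens are indented.
once <- gsub(pattern, "  -", md, perl = TRUE)

# A second level: the list hyphens are now preceded by two spaces,
# which is not "word character + space", so they are indented again.
twice <- gsub(pattern, "  -", once, perl = TRUE)

cat(once)
cat(twice)
```

The hyphen in "CSS-Tricks" is untouched because no space follows it, and the one in "JavaScript - Wikipedia" is skipped because "t " precedes it. One caveat follows from the pattern itself: a title hyphen preceded by a non-word character and a space, for example after a closing parenthesis, would still be indented.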