Keep newline character and selectively indent in string during gsub


Problem description

Original title: Keep newline character in string during gsub

I have a post in which I try to convert JSON to Markdown unordered lists. It is almost done, but there is one pattern I cannot handle. If a title contains a space, hyphen, space sequence, the hyphen is treated as a list item marker and gets indented. If I try to avoid this by referring to the newline character in the pattern, nothing works as I expect.

Input JSON: https://gist.github.com/hermanp/381eaf9f2bf5f2b9cdf22f5295e73eb5
Preferred Markdown output (two-space indentation):

- Info
  - Python
    - The Ultimate Python Beginner's Handbook
    - Python Like You Mean It
    - Automate the Boring Stuff with Python
    - Data science Python notebooks
  - Frontend
    - CodePen
    - JavaScript - Wikipedia
    - CSS-Tricks
    - Butterick’s Practical Typography
    - Front-end Developer Handbook 2019
    - Using Ethics In Web Design
    - Client-Side Web Development
  - Stack Overflow
  - HUP
  - Hope in Source

To generate the Markdown, I use the following two functions:

generate_md()

library(jsonlite)

generate_md <- function (jsonfile) {
  bmarks_json_lite <- fromJSON(txt = jsonfile)
  level1 <- bmarks_json_lite$children$children[[2]]
  markdown_result <- recursive_func(level = level1)
  return(markdown_result)
}

recursive_func()

recursive_func <- function (level) {
  md_result <- character()
  
  for (i in seq_len(nrow(level))) {
    if (level[i, "type"] == "text/x-moz-place"){
      md_title <- paste0("- ", level[i, "title"], "\n")
    } else if (level[i, "type"] == "text/x-moz-place-container") {
      md_title <- paste0("- ", level[i, "title"], "\n")
      md_recurs <- recursive_func(level = level[i, "children"][[1]])
      
      # >>>>> This is the problematic part. <<<<<
      md_recurs <- gsub("-(?= )", "  -", md_recurs, perl = T)
      md_title <- paste0(md_title, md_recurs)
    }
    
    md_result <- paste0(md_result, md_title)
  }
  
  return(md_result)
}

With these functions I get the following output (note the unnecessary spaces in the JavaScript Wikipedia entry). I want to get "- JavaScript - Wikipedia" instead of "- JavaScript     - Wikipedia". I hope this example covers the different scenarios with hyphens and indentation, but it is only a fraction of my bookmarks; I wanted to give a minimal example.

cat(generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
                       "381eaf9f2bf5f2b9cdf22f5295e73eb5/raw/",
                       "76b74b2c3b5e34c2410e99a3f1b6ef06977b2ec7/",
                       "bookmarks-example-hyphen.json")))
# Output
- Info
  - Python
    - The Ultimate Python Beginner's Handbook
    - Python Like You Mean It
    - Automate the Boring Stuff with Python
    - Data science Python notebooks
  - Frontend
    - CodePen
    - JavaScript     - Wikipedia
    - CSS-Tricks
    - Butterick’s Practical Typography
    - Front-end Developer Handbook 2019
    - Using Ethics In Web Design
    - Client-Side Web Development
  - Stack Overflow
  - HUP
  - Hope in Source

I modified the gsub call in recursive_func as shown below, but none of the variants produced the desired output:

md_recurs <- gsub("-(?= )", "  -", md_recurs, perl = T)  # Original
md_recurs <- gsub("(\n)?-(?= )", "  -", md_recurs, perl = T)  # No newlines
md_recurs <- gsub("(-)(?= )(?<=\n)?", "  -", md_recurs, perl = T)  # Same as Original
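The behaviour of the first two variants can be reproduced on a small string. This is only a sketch: the sample string `x` and the variable names `original` and `no_newline` are made up for illustration and are not taken from the bookmark data.

```r
# Hypothetical sample: two list items, one with " - " inside its title.
x <- "- CodePen\n- JavaScript - Wikipedia\n"

# Original pattern: every hyphen followed by a space is indented,
# including the hyphen inside the title.
original <- gsub("-(?= )", "  -", x, perl = TRUE)

# Second variant: the optional "\n" is part of the match, so the
# replacement overwrites it and the two lines are joined.
no_newline <- gsub("(\n)?-(?= )", "  -", x, perl = TRUE)

cat(original)
cat(no_newline)
```

In both cases the hyphen inside the title is matched, because neither pattern looks at what precedes the hyphen.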

Searching Google for "regex newline before char gsub site:stackoverflow.com", I found no answer or hint for this question. I also experimented on regex101.com, but could not find the right approach.

Recommended answer

After thinking over the problem and the structure of the string, and reading about lookbehind, I finally came up with a solution.

The md_recurs line needs to be changed to:

md_recurs <- gsub("(?<!(\\w ))-(?= )", "  -", md_recurs, perl = T)

In other words, the gsub() pattern argument had to be changed to:

(?<!(\\w ))-(?= )

This means:

  • replace a hyphen - (with two spaces and a hyphen)
  • if it is not preceded by a word character and a space, (?<!(\\w )), and
  • if it is followed by a space, (?= ).
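The effect of the lookbehind can be checked on a small string. The sketch below uses hypothetical sample titles (none come from the original gist). It also shows one case the pattern does not cover: a title whose " - " follows a non-word character such as "+". The second gsub() call is an alternative that is not part of the answer: it assumes every list item starts at the beginning of a line and indents only those hyphens.

```r
# Hypothetical sample titles, chosen to exercise three cases.
x <- "- CodePen\n- JavaScript - Wikipedia\n- C++ - Reference\n"

# Pattern from the answer: a hyphen preceded by "word character + space"
# is left alone; every other hyphen followed by a space is indented.
lookbehind <- gsub("(?<!(\\w ))-(?= )", "  -", x, perl = TRUE)

# Alternative sketch (an assumption, not the answer's method): match a
# hyphen only when it is the first non-space character on its line,
# then put the line break and the existing indentation back.
line_start <- gsub("(^|\n)( *- )", "\\1  \\2", x, perl = TRUE)

cat(lookbehind)
cat(line_start)
```

With the lookbehind pattern the "JavaScript - Wikipedia" title stays intact, but "C++ - Reference" still gets extra spaces because "+" is not a word character. The line-start variant leaves both titles untouched, at the cost of assuming that no title contains a line break.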
