jq - bash script for CSV to JSON with arrays?


Problem description

I've been using jq to parse JSON from curl responses, and it's been super cool for that.

What I'm trying to do now is take various bits of information I've cobbled together in a CSV and get them into JSON so that I can curl the result to an API. Here's the example JSON that I'm trying to replicate programmatically:

{
  "title": "New Story",
  "channels": [
    {
      "id": "65tyc2TLUZsO"
    }
  ],
  "description": "Story Description Here",
  "new_files": [
    {
      "filename": "419155345a7b449df3baca76694b64efbec9bcf3983b51e02f92e7ef29fc26ee.pptx",
      "description": "File Description ABC"
    },
    {
      "filename": "5cdd90989c03d3fb619df6f9294b1fcb537b4f3b55737465930b220507f30e75.pdf",
      "description": "File Description XYZ"
    }
  ]
}

The title, id, description, filename, and (file) description values are all in a CSV. What I can't figure out is how to generate JSON from what I've got.

I don't know how I should be formatting my CSV file given that:
1. I have some key:value pairs at the top level (title, description) and then some arrays with key:value pairs as well. I can tell you I will always have 1 value in the "channels" array.
2. I won't have the same number of files in the new_files array each time, so being able to build that array dynamically would be great.

If anyone's got a tutorial to point me to on this kind of thing, that'd be awesome. I'm sure I'm not the first to try this. I'm using bash scripts for this stuff (as it's what I know), but I'm not opposed to other solutions. (It'd just take longer for me to learn.)

At a high level I understand what I want to do, where to pull my information from, what variables I should have where, etc. I just have some issues with getting down and dirty with the implementation details.

Recommended answer

Perhaps this will help. Start by writing a function to go from your output's nested representation to a flat one:

def flatten:
    {
      title: .title,
      id:    .channels[].id,
      story: .description,
    } + .new_files[]
;

This will convert your sample JSON to a stream of objects:

{
  "title": "New Story",
  "id": "65tyc2TLUZsO",
  "story": "Story Description Here",
  "filename": "419155345a7b449df3baca76694b64efbec9bcf3983b51e02f92e7ef29fc26ee.pptx",
  "description": "File Description ABC"
}
{
  "title": "New Story",
  "id": "65tyc2TLUZsO",
  "story": "Story Description Here",
  "filename": "5cdd90989c03d3fb619df6f9294b1fcb537b4f3b55737465930b220507f30e75.pdf",
  "description": "File Description XYZ"
}

which can easily be converted to CSV like

"New Story","65tyc2TLUZsO","Story Description Here","419155345a7b449df3baca76694b64efbec9bcf3983b51e02f92e7ef29fc26ee.pptx","File Description ABC"
"New Story","65tyc2TLUZsO","Story Description Here","5cdd90989c03d3fb619df6f9294b1fcb537b4f3b55737465930b220507f30e75.pdf","File Description XYZ"

with a filter like:

  flatten
| [.title, .id, .story, .filename, .description ]
| @csv
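Since the question mentions being open to tools other than bash, here is a minimal Python sketch of the same flattening step, for comparison only. It uses just the standard library; the names `flatten_story` and `to_csv` are made up for this illustration and are not part of jq or of the answer above.

```python
import csv
import io

# Sample story copied from the question.
story = {
    "title": "New Story",
    "channels": [{"id": "65tyc2TLUZsO"}],
    "description": "Story Description Here",
    "new_files": [
        {"filename": "419155345a7b449df3baca76694b64efbec9bcf3983b51e02f92e7ef29fc26ee.pptx",
         "description": "File Description ABC"},
        {"filename": "5cdd90989c03d3fb619df6f9294b1fcb537b4f3b55737465930b220507f30e75.pdf",
         "description": "File Description XYZ"},
    ],
}


def flatten_story(story):
    """Yield one flat row per (channel, file) pair (hypothetical helper)."""
    for channel in story["channels"]:
        for new_file in story["new_files"]:
            yield [story["title"], channel["id"], story["description"],
                   new_file["filename"], new_file["description"]]


def to_csv(rows):
    """Render rows as CSV text with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(flatten_story(story))
print(csv_text, end="")
```

The top-level fields are repeated on every row, which is the same denormalized layout the jq filter produces.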

To go from this CSV representation to a stream of objects, you can use jq's -s and -R options with a function like

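# Parse raw CSV text (read with jq -s -R) into a stream of flat objects.
# Assumes no field contains a comma and every field is a double-quoted
# string that is also valid JSON.
def readcsv:
      split("\n")
    | .[]
    | select(length > 0)    # skip blank lines
    | split(",")
    | map(fromjson)         # strip the surrounding quotes
    | {
          title:       .[0]
        , id:          .[1]
        , story:       .[2]
        , filename:    .[3]
        , description: .[4]
      }
;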
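One thing to watch for: splitting each line on "," only works while no field contains a comma. The short Python sketch below illustrates the difference between a naive split and a real CSV parser; the sample line, including the file name `deck.pptx`, is invented for this illustration.

```python
import csv
import io

# Hypothetical CSV line whose third field contains a comma.
line = '"New Story","65tyc2TLUZsO","Story, with a comma","deck.pptx","File Description ABC"\n'

# Naive approach: split on every comma, as split(",") does.
naive = line.strip().split(",")

# CSV-aware approach: the csv module respects the quotes.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 6 pieces, the third field was cut in two
print(len(parsed))  # 5 fields, as intended
```

If your descriptions may contain commas, you would need either a CSV-aware parser or a delimiter that cannot occur in the data.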

and to reassemble that object stream into your original JSON, you can use a function like

def unflatten:
      group_by(.title)
    | .[]
    | {
          title:       .[0].title
        , description: .[0].story
      }
      + { channels:    map(.id) | unique | map({id:.}) }
      + { new_files:   map({filename, description}) | unique }
;

by combining the filters:

  [ readcsv ]
| unflatten
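For comparison, the same round trip from flat CSV back to nested JSON can be sketched in Python with the standard library. This is only an illustration under the same assumptions as the jq functions (rows are grouped by title); `FIELDS`, `read_rows`, and `unflatten` are names chosen for this sketch. Unlike jq's `group_by` and `unique`, it keeps stories and files in first-seen order instead of sorting them.

```python
import csv
import io
import json

# The two CSV rows shown earlier in the answer.
csv_text = (
    '"New Story","65tyc2TLUZsO","Story Description Here",'
    '"419155345a7b449df3baca76694b64efbec9bcf3983b51e02f92e7ef29fc26ee.pptx",'
    '"File Description ABC"\n'
    '"New Story","65tyc2TLUZsO","Story Description Here",'
    '"5cdd90989c03d3fb619df6f9294b1fcb537b4f3b55737465930b220507f30e75.pdf",'
    '"File Description XYZ"\n'
)

# Column order assumed to match the flat representation above.
FIELDS = ["title", "id", "story", "filename", "description"]


def read_rows(text):
    """Parse headerless CSV text into a list of dicts keyed by FIELDS."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=FIELDS))


def unflatten(rows):
    """Group flat rows by title and rebuild the nested story objects."""
    stories = {}
    for row in rows:
        story = stories.setdefault(row["title"], {
            "title": row["title"],
            "channels": [],
            "description": row["story"],
            "new_files": [],
        })
        channel = {"id": row["id"]}
        if channel not in story["channels"]:      # drop repeated ids
            story["channels"].append(channel)
        new_file = {"filename": row["filename"],
                    "description": row["description"]}
        if new_file not in story["new_files"]:    # drop repeated files
            story["new_files"].append(new_file)
    return list(stories.values())


stories = unflatten(read_rows(csv_text))
print(json.dumps(stories[0], indent=2))
```

Each element of `stories` is one payload that could then be posted to the API.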

These functions make some assumptions about the relationships between the fields in your data that will likely need review. In particular, you probably don't want to completely denormalize the id and filename/description columns as I did. But once you have tools like these to convert back and forth between nested JSON and flat CSV, you can experiment with each representation until you're satisfied.
