jq is a very useful tool for processing JSON. It is built around two ideas: pipes and data streams.
In the notes below, a "JSON entity" means a self-contained, complete JSON value, typically an object or an array.
Pipes
- When jq is invoked, the input JSON file is parsed as one or multiple JSON entities.
- The input file can have multiple JSON entities, separated by new lines or spaces. One stream is created for each JSON entity.
- If multiple files are supplied to jq, each file should contain one or more self-contained JSON entities.
- To combine all input JSON entities into one big JSON entity, pass the `--slurp` (`-s`) flag to jq. It automatically creates a containing JSON array and puts all input JSON entities into it. This covers both multiple input JSON entities in one input file and multiple input files.
- In short, jq is started with one or multiple data streams which are from the input JSON entities.
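The input handling described above can be tried with a small sketch like the following. It assumes jq is installed and on the PATH; the sample data is made up for illustration. The filter `.` passes each stream through as is, and `-c` prints each stream on one line.

```shell
# Two JSON entities in one input, separated by a space: jq sees two streams.
separate=$(printf '{"a":1} {"b":2}' | jq -c '.')
echo "$separate"
# {"a":1}
# {"b":2}

# With --slurp, both entities are wrapped in one array: jq sees one stream.
slurped=$(printf '{"a":1} {"b":2}' | jq -c --slurp '.')
echo "$slurped"
# [{"a":1},{"b":2}]
```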
- Each stream is then passed through the specified filter independently.
- One can pass the `-c` (compact) flag to jq to print each stream on one line.
- Some key concepts of the filter:
  - `.` means the entire input data stream.
  - `.[]` moves one layer into the JSON object/array and splits the data stream into multiple streams, one for each entry in the object or array. For example, `{"a":1,"b":2}` is broken into two data streams after passing through this filter, becoming `1` and `2` respectively. Note that these are the values of the entries, so the outputs are bare values rather than JSON objects/arrays, which is okay but needs to be noted. The same stream splitting also happens for JSON arrays passed through this filter.
  - `.["key"]` generates one stream from the input stream. The shorthand is `.key`.
  - `,` can be used to manually generate multiple streams.
  - To collect multiple streams into one stream, one can use `[ ]` to generate an array, or use `{ }` to generate an object. Note that to generate an object, one has to specify the keys manually, e.g. `{"key1":.a, "key2":.b}`, or use the shorthand `{key1,key2}`, which is the same as `{"key1":.key1,"key2":.key2}`.
  - `( )` can be used to tell jq which part of the filter to process first, but it does not remove/combine/split the data streams.
  - `|` has lower priority than `,`. Therefore `1,2,3|3` first produces three streams, `1`, `2` and `3`, and then passes each stream to the second filter `3`, with the end result of three streams of `3`. Using `()` changes that order: `1,2,(4|3)` processes `4|3` first, generating `3`, and then processes `1,2,3`, generating exactly that as three streams of output. `1|4,2,3` processes `1` first and then passes it to `4,2,3`, so the end result is the three streams `4`, `2`, `3`.
- The filter does not work like a conventional programming language. Usually, stream data is not output in the middle of filter processing; it is only output at the end of filter processing, one stream at a time.
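The splitting, collecting and priority rules described above can be checked with a short sketch like this one. It assumes jq is on the PATH; `-n` tells jq to run the filter without reading any input, and the key name `renamed` is made up for illustration.

```shell
# .[] splits an object (or array) into one stream per entry.
split=$(echo '{"a":1,"b":2}' | jq -c '.[]')
echo "$split"          # 1 and 2, each on its own line

# [ ] collects the streams back into a single array.
collected=$(echo '{"a":1,"b":2}' | jq -c '[.[]]')
echo "$collected"      # [1,2]

# { } builds an object; {a} is shorthand for {"a": .a}.
object=$(echo '{"a":1,"b":2}' | jq -c '{a, "renamed": .b}')
echo "$object"         # {"a":1,"renamed":2}

# "," binds tighter than "|": each of the streams 1,2,3 is fed to the filter 3.
all_threes=$(jq -n -c '1,2,3|3')
echo "$all_threes"     # 3, 3, 3 on three lines

# Parentheses make 4|3 run first, so the result is the streams 1, 2, 3.
grouped=$(jq -n -c '1,2,(4|3)')
echo "$grouped"        # 1, 2, 3 on three lines

# 1 is produced first and then passed to 4,2,3, which ignores it.
piped=$(jq -n -c '1|4,2,3')
echo "$piped"          # 4, 2, 3 on three lines
```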
- One can use a variable assignment such as `length as $array_length | .` to have jq remember an interim value and use it in later stages of the filter.
- At the most basic level, a data stream is passed through each filter unchanged. For example, `.|.|.|.|.` is the same as `.`.
- `map(x)` takes an array as input and outputs a new array, after running the filter `x` on each element of the input array.
- The `select()` function can be used to remove undesired streams. `[1,2,3] | map(select(. >= 2))` will give you `[2,3]`.
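Variables, `map()` and `select()` can be seen working together in a sketch like the following, again assuming jq is on the PATH and using made-up sample arrays.

```shell
# Remember the array length in a variable and use it in a later stage.
scaled=$(jq -n -c '[1,2,3] | length as $array_length | map(. * $array_length)')
echo "$scaled"         # [3,6,9]

# select() drops the streams that fail the test.
kept=$(jq -n -c '[1,2,3] | .[] | select(. >= 2)')
echo "$kept"           # 2 and 3, each on its own line

# map(f) behaves like [.[] | f]: split, apply f, collect into a new array.
mapped=$(jq -n -c '[1,2,3] | map(select(. >= 2))')
by_hand=$(jq -n -c '[1,2,3] | [.[] | select(. >= 2)]')
echo "$mapped"         # [2,3]
echo "$by_hand"        # [2,3]
```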
- One can use `if-then-else-end` in the filter. Note that in jq releases before 1.7 the `else` clause must be there. Sometimes the `empty` function can help.
- `a//b`: the alternative operator can also be useful. It outputs `a` if `a` is not `false` or `null`; otherwise it outputs `b`.
- `input_filename` can be used to find the input file name.
- One can use the type filters `numbers`/`objects`/`arrays`/etc. to select results from multiple streams by their type.
- The final output is the multiple streams, one by one.
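The conditional, the alternative operator, the type filters and `input_filename` mentioned above can be tried with a sketch like this. It assumes jq and `mktemp` are available; the sample values and the temporary file are made up for illustration.

```shell
# if/then/else/end with empty in the else branch drops unwanted streams.
filtered=$(jq -n -c '1,2,3 | if . >= 2 then . else empty end')
echo "$filtered"       # 2 and 3, each on its own line

# The alternative operator falls back when the left side is null or false.
fallback=$(echo '{"a":null}' | jq -c '.a // "default"')
echo "$fallback"       # "default"

# Type filters keep only the streams of the given type.
only_numbers=$(jq -n -c '1, "x", [2], {"k":3} | numbers')
only_arrays=$(jq -n -c '1, "x", [2], {"k":3} | arrays')
echo "$only_numbers"   # 1
echo "$only_arrays"    # [2]

# input_filename reports the file the current input came from.
tmp=$(mktemp)
echo '{"a":1}' > "$tmp"
name=$(jq -r 'input_filename' "$tmp")
echo "$name"           # the path held in $tmp
rm -f "$tmp"
```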
Tags: jq