October 15, 2019

jq

jq is a very useful tool to process JSON.

jq is built around two ideas: pipes and data streams.

Note below that a JSON entity is a self-contained and complete JSON object/array.

Pipes

  • When jq is invoked, the input JSON file is parsed as one or multiple JSON entities.
    • The input file can have multiple JSON entities, separated by new lines or spaces. One stream is created for each JSON entity.
    • If multiple files are supplied to jq, each file should contain one or more self-contained JSON entities.
    • To combine all the input JSON entities into one big JSON entity, pass the --slurp flag to jq. It automatically creates a containing JSON array and places all input JSON entities in it. This covers both multiple JSON entities in one input file and multiple input files.
  • In short, jq is started with one or multiple data streams which are from the input JSON entities.
  • Each stream is then passed through the specified filter independently.
  • One can pass the -c (compact) flag to jq to print each stream as one line.
  • Some key concepts of the filter.
    • . means the entire input data stream
    • .[] moves one layer into the JSON object/array and splits the data stream into multiple streams, one for each entry in the object or array. For example, {"a":1,"b":2} is broken into two data streams after passing this filter, becoming 1 and 2 respectively. Note that these are the values of the entries: they are bare JSON values and no longer objects or arrays, which is okay but needs to be noted. The same stream splitting also happens for JSON arrays after passing this filter.
    • .["key"] generates one stream from the input stream. The shorthand is .key;
    • , can be used to manually generate multiple streams.
    • To collect multiple streams into one stream, one can use [ ] to generate an array, or use { } to generate an object. Note that to generate an object, one has to specify the keys manually, e.g. {"key1":.a, "key2":.b}, or use the shorthand {key1,key2}, which is the same as {"key1":.key1,"key2":.key2}.
    • ( ) can be used to tell jq which part of the filter should be evaluated first, but it does not remove/combine/split the data streams.
    • | has lower precedence than ,. Therefore, 1,2,3|3 first produces three streams (1, 2, and 3) and then passes each stream to the second filter 3, with the end result of three streams of 3. Parentheses change that order: 1,2,(4|3) evaluates 4|3 first, generating 3, and then evaluates 1,2,3, producing exactly those three streams as output. 1|4,2,3 evaluates 1 first and passes it to 4,2,3, so the end result is the three streams 4, 2, and 3.
    • A filter does not run like an imperative program. Stream data is usually not output in the middle of filter processing; it is usually output only at the end of the filter, one stream at a time.
    • One can use variable assignment such as length as $array_length | . to have jq remember an interim value and use it in later stages of the filter.
    • At the most basic level, a data stream is passed through each filter unchanged. For example, .|.|.|.|. is the same as .
    • map(x) takes an array as input, and outputs a new array, after running that filter for each element of the input array.
    • The select() function can be used to remove undesired streams. [1,2,3] | map(select(. >= 2)) will give you [2,3].
    • One can use if-then-else-end in the filter. Note that the else clause must be there (jq 1.7 and later make it optional). When a branch should produce no output, the empty function can help.
    • The alternative operator a//b can also be useful. It outputs a if a is not false or null; otherwise it outputs b.
    • input_filename can be used to find the input file name.
    • One can use numbers/objects/arrays/etc. to select results from multiple streams
  • The final output is the multiple streams, printed one by one.
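
The points above about input entities, --slurp, -c, and input_filename can be tried with a short sketch like the following. The temporary directory and the file names one.json and two.json are made up for illustration, and the -r (raw output) flag is used only to print the file names without quotes.

```shell
# Two JSON entities in one input, separated by a space:
# the filter "." runs once per entity, and -c prints each result on one line.
per_entity=$(printf '%s\n' '{"a":1} {"a":2}' | jq -c '.')
echo "$per_entity"

# --slurp first wraps every input entity into one array,
# so the filter runs only once, on that array.
slurped=$(printf '%s\n' '{"a":1} {"a":2}' | jq -c --slurp '.')
echo "$slurped"          # [{"a":1},{"a":2}]

# Hypothetical input files: one entity in the first, two in the second.
tmpdir=$(mktemp -d)
printf '%s\n' '{"a":1}' > "$tmpdir/one.json"
printf '%s\n' '{"a":2} {"a":3}' > "$tmpdir/two.json"

# --slurp also combines entities that come from several files.
slurped_files=$(cd "$tmpdir" && jq -c --slurp '.' one.json two.json)
echo "$slurped_files"    # [{"a":1},{"a":2},{"a":3}]

# input_filename is evaluated once per input entity and prints a file name each time.
names=$(cd "$tmpdir" && jq -r 'input_filename' one.json two.json)
echo "$names"

rm -r "$tmpdir"
```

Without -c, jq pretty-prints each result over several lines, which makes it harder to see where one stream ends and the next begins.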
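
The splitting, collecting, and precedence rules above can be checked with a small sketch. It assumes jq's -n (null input) flag for the filters that consist only of literals, so that no input file is needed; the variable names are only for illustration.

```shell
doc='{"a":1,"b":2}'

# .[] splits one input into one output per value.
split=$(printf '%s\n' "$doc" | jq -c '.[]')
echo "$split"        # 1 and 2, on separate lines

# .["a"] and .a are equivalent; the comma emits both results as separate outputs.
both=$(printf '%s\n' "$doc" | jq -c '.["a"], .a')
echo "$both"         # 1 and 1, on separate lines

# [ ] collects the outputs of a filter back into a single array.
as_array=$(printf '%s\n' "$doc" | jq -c '[.[]]')
echo "$as_array"     # [1,2]

# { } builds an object, with explicit keys or with the shorthand.
as_object=$(printf '%s\n' "$doc" | jq -c '{"key1":.a,"key2":.b}')
echo "$as_object"    # {"key1":1,"key2":2}
shorthand=$(printf '%s\n' "$doc" | jq -c '{a,b}')
echo "$shorthand"    # {"a":1,"b":2}

# Precedence: | binds more loosely than , and ( ) overrides that.
p1=$(jq -n -c '1,2,3|3')      # 3, 3, 3
p2=$(jq -n -c '1,2,(4|3)')    # 1, 2, 3
p3=$(jq -n -c '1|4,2,3')      # 4, 2, 3
echo "$p1"; echo "$p2"; echo "$p3"
```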
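
Variables, map, select, if-then-else with empty, and the alternative operator can be tried the same way. This is a minimal sketch that again assumes the -n (null input) flag so the sample data can live inside the filter; the sample values are arbitrary.

```shell
# A chain of identity filters passes the stream through unchanged.
identity=$(printf '%s\n' '{"a":1}' | jq -c '.|.|.|.|.')
echo "$identity"     # {"a":1}

# "length as $n" remembers the array length; the array itself is still
# the input of the next stage, which multiplies each element by $n.
scaled=$(jq -n -c '[1,2,3] | length as $n | map(. * $n)')
echo "$scaled"       # [3,6,9]

# map(select(...)) keeps only the elements that satisfy the condition.
kept=$(jq -n -c '[1,2,3] | map(select(. >= 2))')
echo "$kept"         # [2,3]

# if-then-else-end with empty in the else branch drops a stream entirely.
filtered=$(jq -n -c '1,2,3 | if . >= 2 then . else empty end')
echo "$filtered"     # 2 and 3, on separate lines

# a // b falls back to b when a is null or false.
alt=$(jq -n -c '{"a":null} | .a // "default"')
echo "$alt"          # "default"
```

Here select and the if/empty form do the same job; select is the shorter spelling when the only goal is to drop streams.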

Tags: jq
