Replacing values with JOLT (1/0 -> male/female)

Posted by msound on Sat, 18 Dec 2021 05:13:28 +0100

Scenario requirements

Suppose we have a set of data in JSON format, shown below. For whatever reason, the sex field that indicates gender does not hold readable values such as "male" and "female". Then the boss said, "I don't want 1 / 0. Replace them with words that I can understand."

[{
    "id": 1,
    "name": "Zhang San",
    "sex": "1"
}, {
    "id": 2,
    "name": "Li Si",
    "sex": "0"
}]

The boy ventured to ask the boss, "So you want the data below, right?" The boss nodded happily, patted him on the shoulder and said, "Young man, I have high hopes for you. Work an extra shift and finish it tonight!"

[{
    "id": 1,
    "name": "Zhang San",
    "sex": "male"
}, {
    "id": 2,
    "name": "Li Si",
    "sex": "female"
}]

The boy thought, "Piece of cake, just a few lines of code."

Then the boss said, "Oh, by the way, I don't want you to write code to solve this problem. Use the JOLT library instead. Apache NiFi has a ready-made JOLT component. However good your code may be, it is simply not general enough. So many people have written so much garbage over the years. When they leave, they dump a pile of broken code behind, which is hard to reuse and full of problems."

Boy: "OK! OK!" (= = ugh)

JOLT script solution

Here is the final JOLT script, which meets the boss's requirements.

[{
    "operation": "shift",
    "spec": {
        "*": {
            "sex": {
                "1": {
                    "#male": "[#4].sex"
                },
                "0": {
                    "#female": "[#4].sex"
                }
            },
            "*": "[#2].&"
        }
    }
}]

Detailed explanation

JOLT is a library that processes JSON with a scripting language, and the scripts themselves are also written in JSON. When I was younger, I tried to read through the source code and wrote a tutorial. Hahaha (looking back, I can no longer understand parts of that tutorial, especially those related to walkpath).

JOLT has several operations. Today we use shift. Without studying this operation in depth, we can understand its script simply: the keys in the script JSON are matched against the field names in your data layer by layer, and each matched field value is written to the position given by the corresponding value in the script JSON.
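
To make that one-sentence description concrete, here is a rough Python sketch of the idea, handling literal keys only. It is not how JOLT is implemented. The function name shift_literal and the output paths person.name and person.gender are made up for illustration.

```python
def shift_literal(data, spec, out=None):
    """Walk data and spec together; copy each matched value to its output path.

    Illustration only: literal keys, dotted output paths, no wildcards.
    """
    if out is None:
        out = {}
    for key, target in spec.items():
        if key not in data:
            continue  # no match: nothing is written for this spec key
        if isinstance(target, dict):
            if isinstance(data[key], dict):
                shift_literal(data[key], target, out)  # descend one layer
            continue
        # target is a dotted output path such as "person.gender" (hypothetical)
        *parents, leaf = target.split(".")
        node = out
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = data[key]
    return out


record = {"id": 1, "name": "Zhang San", "sex": "1"}
spec = {"name": "person.name", "sex": "person.gender"}
result = shift_literal(record, spec)
print(result)
```

Note that id does not appear in the result, because nothing in the spec matched it. Only matched fields are written to the output.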

Basic format of shift

[{
    "operation": "shift",
    "spec": {
        // ...  This is the standard format of shift; the core matching logic and output logic go inside spec
    }
}]

Interpreting the pass-through script

First, let's remove the male/female replacement logic from the script and see what it does:

[{
    "operation": "shift",
    "spec": {
        "*": {
            "*": "[#2].&"
        }
    }
}]

With this script, the data is output as is, without change.

Although the data has not been modified, the matching and writing process still took place. Let's use this simplified script to explain the role of some of its symbols.

A symbol often means different things on the left and right sides of a spec, and some symbols are allowed on only one side. The explanations below apply only to the scripts in this article. Don't take them out of context, because they are not comprehensive. A comprehensive explanation would only be confusing at this point.

  1. * is a wildcard that matches anything. The first * matches every element in the original JSON array, and the second * matches every key of that element.
  2. [] denotes an array, and the #2 inside it stands for the array index. Here, #2 resolves to the array index matched by the first *.
  3. & on the right side takes the key of the original JSON that was matched on the left side at the same level (this is not rigorous, but it is simple enough to start with).
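
If it helps, the three symbols can be read as two nested loops. The Python sketch below only mimics the effect of "*": {"*": "[#2].&"} on our sample data and is not JOLT's real algorithm. The function name copy_as_is is made up.

```python
def copy_as_is(records):
    """Mimic the effect of the pass-through spec on an array of objects."""
    out = []
    for index, element in enumerate(records):  # first "*": every array element
        for key, value in element.items():     # second "*": every key in the element
            while len(out) <= index:            # "[#2]": output slot with the same index
                out.append({})
            out[index][key] = value             # "&": reuse the matched key as the output key
    return out


data = [
    {"id": 1, "name": "Zhang San", "sex": "1"},
    {"id": 2, "name": "Li Si", "sex": "0"},
]
copied = copy_as_is(data)
print(copied == data)
```

Every value is matched and written again, yet the result equals the input. That is the "process that happened" even though nothing changed.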

To simplify further: for our sample data, the script above can be rewritten without the second * and the & on the right side, giving the following equivalent script:

[{
    "operation": "shift",
    "spec": {
        "*": {
            "id": "[#2].id",
            "name": "[#2].name",
            "sex": "[#2].sex"
        }
    }
}]

Going one step further and removing the first * and #2 as well gives the following equivalent script for our two-element array:

[{
    "operation": "shift",
    "spec": {
        "0": {
            "id": "[0].id",
            "name": "[0].name",
            "sex": "[0].sex"
        },
        "1": {
            "id": "[1].id",
            "name": "[1].name",
            "sex": "[1].sex"
        }
    }
}]

So once you have seen this last wordy, bloated script, the original no longer looks so fancy and sophisticated. It is a script stripped bare: it can only match two array elements, and every field name is hard-coded. Now you should understand it! (If you don't, read it again. Spelling things out this plainly is hard work.)

Interpreting the male/female replacement script

Now let's look at the male/female replacement part of the script on its own:

[{
    "operation": "shift",
    "spec": {
        "*": {
            "sex": {
                "1": {
                    "#male": "[#4].sex"
                },
                "0": {
                    "#female": "[#4].sex"
                }
            }
        }
    }
}]

  1. The first * matches every element in the original JSON array.
  2. sex matches the field named sex in each element.
  3. The literal keys 1 and 0 match the value of sex.
  4. #male and #female do not match anything. Instead, the text after the # symbol is written as the value to the position given on the right side of the script.
  5. [] denotes an array, and the #4 inside it stands for the array index. Here, #4 resolves to the array index matched by the first *. It is 4 rather than 2 because this rule sits two levels deeper in the spec than the "*": "[#2].&" rule.
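
The same five points can be sketched in plain Python. Again, this only mimics the effect on our sample data. The names SEX_LABELS and replace_sex are made up, and how real JOLT lays out the output when a value is neither "1" nor "0" is not covered here.

```python
# Literal keys "1" and "0" and the constants written by "#male" / "#female"
SEX_LABELS = {"1": "male", "0": "female"}


def replace_sex(records):
    """Mimic the effect of the sex-only spec on an array of objects."""
    out = []
    for index, element in enumerate(records):       # first "*": every array element
        label = SEX_LABELS.get(element.get("sex"))  # "sex", then "1" or "0"
        if label is None:
            continue                                 # no match: this sketch writes nothing
        while len(out) <= index:                     # "[#4]": output slot with the same index
            out.append({})
        out[index]["sex"] = label                    # write the constant, not the matched value
    return out


data = [
    {"id": 1, "name": "Zhang San", "sex": "1"},
    {"id": 2, "name": "Li Si", "sex": "0"},
]
replaced = replace_sex(data)
print(replaced)
```

Only the sex field appears in the result, because this part of the script matches nothing else. In the final script, id and name are carried over by the separate "*": "[#2].&" rule.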

Removing the first * and #4 from the script above gives the following equivalent script:

[{
    "operation": "shift",
    "spec": {
        "0": {
            "sex": {
                "1": {
                    "#male": "[0].sex"
                },
                "0": {
                    "#female": "[0].sex"
                }
            }
        },
        "1": {
            "sex": {
                "1": {
                    "#male": "[1].sex"
                },
                "0": {
                    "#female": "[1].sex"
                }
            }
        }
    }
}]

Result screenshot (not pasted this time, just being cheeky).

Finally

In my practical experience, you don't need to understand JOLT scripts in complete detail, and you don't need to memorize them deliberately. Collect and save plenty of classic examples. When you actually need one, first write down your original JSON and your expected JSON, then keep trying scripts adapted from the examples you collected.
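
That trial-and-error loop is easier with a small checker next to you. Below is a minimal Python sketch. The candidate function is a hypothetical stand-in for whatever actually runs your spec (for example, the output you copy out of NiFi), and first_difference is a made-up helper name.

```python
import json

original = json.loads(
    '[{"id": 1, "name": "Zhang San", "sex": "1"},'
    ' {"id": 2, "name": "Li Si", "sex": "0"}]'
)
expected = json.loads(
    '[{"id": 1, "name": "Zhang San", "sex": "male"},'
    ' {"id": 2, "name": "Li Si", "sex": "female"}]'
)


def candidate(records):
    """Hypothetical stand-in for the output of the spec you are trying."""
    labels = {"1": "male", "0": "female"}
    return [{**r, "sex": labels.get(r["sex"], r["sex"])} for r in records]


def first_difference(actual, wanted):
    """Describe the first mismatch between two arrays, or return None if equal."""
    if actual == wanted:
        return None
    for i, (a, w) in enumerate(zip(actual, wanted)):
        if a != w:
            return f"element {i}: got {a}, wanted {w}"
    return f"length differs: got {len(actual)}, wanted {len(wanted)}"


problem = first_difference(candidate(original), expected)
print(problem or "output matches the expected JSON")
```

When an attempt is wrong, the message points at the first element that differs, which is usually enough to tell which part of the spec to adjust.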

JOLT in NIFI