[Elastic project practice] the nested query matches multiple values with the same element of the array

Posted by Mindwreck on Sat, 11 Dec 2021 02:53:31 +0100

0. Introduction

Before, some students encountered a problem in actual production. The problem itself does not involve the problem of production environment. It is pure DSL. However, because it is actual data, the amount of data will be much larger, which also increases the difficulty of troubleshooting. Let's look at this problem in detail, so that you can experience the difference between the actual production problem and the training problem.

Statement: the sharing has been authorized by the student, and the data has been desensitized

1. Title

Question: you need to query the commodity data. You need to query SKUs wareSkuAttrList. Wareattrvalue, the original data is as follows. Its combination includes 6kg,L, 9,M, 9 and XL, but the combination of 9 and l is also queried during query. It is required to find the error in query.

skus:[
	{wareSkuAttrList:[{wareAttrValue:6KG},{wareAttrValue:L}]},
	{wareSkuAttrList:[{wareAttrValue:9},{wareAttrValue:M}]},
	{wareSkuAttrList:[{wareAttrValue:9},{wareAttrValue:XL}]}
]

Index mappings

PUT test_product
{
  "mappings": {
    "properties": {
      "activityId": {
        "type": "keyword"
      },
      "attrs": {
        "type": "nested",
        "properties": {
          "attrId": {
            "type": "long"
          },
          "attrName": {
            "type": "keyword"
          },
          "attrValue": {
            "type": "keyword"
          }
        }
      },
      "floorName": {
        "type": "keyword"
      },
      "goodPercent": {
        "type": "long"
      },
      "hosStock": {
        "type": "boolean"
      },
      "marketPrice": {
        "type": "float"
      },
      "saleCount": {
        "type": "long"
      },
      "salePrice": {
        "type": "float"
      },
      "skus": {
        "type": "nested",
        "properties": {
          "attrId": {
            "type": "long"
          },
          "attrValue": {
            "type": "keyword"
          },
          "marketPrice": {
            "type": "float"
          },
          "skuId": {
            "type": "keyword"
          },
          "skuImage": {
            "type": "keyword"
          },
          "skuName": {
            "type": "text"
          },
          "skuPrice": {
            "type": "float"
          },
          "wareSkuAttrList": {
            "type": "nested",
            "properties": {
              "attrDefinitionId": {
                "type": "float"
              },
              "attrDefinitionName": {
                "type": "keyword"
              },
              "wareAttrValue": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
} 

Index data

# data
PUT test_product/_doc/1
{
  "activityId": "1469181609600225280",
  "attrs": [
    {
      "attrId": 101,
      "attrName": "Weight type",
      "attrValue": "6kg"
    },
    {
      "attrId": 107,
      "attrName": "size",
      "attrValue": "L"
    },
    {
      "attrId": 101,
      "attrName": "Weight type",
      "attrValue": "9"
    },
    {
      "attrId": 107,
      "attrName": "size",
      "attrValue": "M"
    },
    {
      "attrId": 101,
      "attrName": "Weight type",
      "attrValue": "9"
    },
    {
      "attrId": 107,
      "attrName": "size",
      "attrValue": "XL"
    }
  ],
  "floorName": "",
  "goodPercent": 100,
  "hosStock": true,
  "marketPrice": 500,
  "saleCount": 1,
  "salePrice": 400,
  "skus": [
    {
      "marketPrice": 500,
      "skuId": "3100140",
      "skuImage": "xxx.jpg",
      "skuName": "Test master diagram (6) kg,L)",
      "skuPrice": 400,
      "wareSkuAttrList": [
        {
          "attrDefinitionId": 101,
          "attrDefinitionName": "Weight type",
          "wareAttrValue": "6kg"
        },
        {
          "attrDefinitionId": 107,
          "attrDefinitionName": "size",
          "wareAttrValue": "L"
        }
      ]
    },
    {
      "marketPrice": 600,
      "skuId": "3100141",
      "skuImage": "xxx.jpg",
      "skuName": "Test master diagram (9, M)",
      "skuPrice": 500,
      "wareSkuAttrList": [
        {
          "attrDefinitionId": 101,
          "attrDefinitionName": "Weight type",
          "wareAttrValue": "9"
        },
        {
          "attrDefinitionId": 107,
          "attrDefinitionName": "size",
          "wareAttrValue": "M"
        }
      ]
    },
    {
      "marketPrice": 800,
      "skuId": "3100142",
      "skuImage": "xxx.jpg",
      "skuName": "Test master diagram (9, XL)",
      "skuPrice": 700,
      "wareSkuAttrList": [
        {
          "attrDefinitionId": 101,
          "attrDefinitionName": "Weight type",
          "wareAttrValue": "9"
        },
        {
          "attrDefinitionId": 107,
          "attrDefinitionName": "size",
          "wareAttrValue": "XL"
        }
      ]
    }
  ]
}

Original query
The real original query also contains complex aggregation statements, but the error of this question has nothing to do with aggregation, so it is preferred to filter it out

# query
GET test_product/_search
{
  "from": 0,
  "size": 40,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "activityId": {
              "value": "1469181609600225280",
              "boost": 1
            }
          }
        }
      ],
      "filter": [
        {
          "nested": {
            "query": {
              "nested": {
                "query": {
                  "bool": {
                    "must": [
                      {
                        "terms": {
                          "skus.wareSkuAttrList.wareAttrValue": [
                            "9"
                          ],
                          "boost": 1
                        }
                      }
                    ],
                    "adjust_pure_negative": true,
                    "boost": 1
                  }
                },
                "path": "skus.wareSkuAttrList"
              }
            },
            "path": "skus"
          }
        },
        {
          "nested": {
            "query": {
              "nested": {
                "query": {
                  "bool": {
                    "must": [
                      {
                        "terms": {
                          "skus.wareSkuAttrList.wareAttrValue": [
                            "L"
                          ],
                          "boost": 1
                        }
                      }
                    ],
                    "adjust_pure_negative": true,
                    "boost": 1
                  }
                },
                "path": "skus.wareSkuAttrList"
                
              }
            },
            "path": "skus"
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}

2. Ideas

First, understand the problem clearly. The title is to query the same wareAttrValue field in the array, that is, it is for the query scenario where the same element of the array matches multiple values. The reason why the original query queries the combination of 9 and l is that the data contains 9, so it is output when a condition is matched.

In fact, when I saw the topic, my first reaction was that it was a nested structure. Was it caused by not using nested query? This kind of situation is quite common. However, looking at the query statement, I found that nested query has been used, which is not the case

For initial contact with such troubleshooting, there is a preprocessing that can help us quickly locate the problem, that is, delete unnecessary information. First, we look at the information given to us by the topic, including mappings,doc and query statements. We can simplify mappings and query statements first, and doc only needs to be executed.

Note: the real questions are more complex than the mappings,doc and query statements I gave above, so in fact, the above data has been deleted once. During real processing, you should learn to quickly locate the valid information and invalid information, and give priority to eliminating the confirmed invalid information

Post processing mappings

PUT test_product
{
  "mappings": {
    "properties": {
      "skus": {
        "type": "nested",
        "properties": {
          "wareSkuAttrList": {
            "type": "nested",
            "properties": {
              "wareAttrValue": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}

Post processing query

GET test_product/_search
{
  "from": 0,
  "size": 40,
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "query": {
              "nested": {
                "query": {
                  "bool": {
                    "must": [
                      {
                        "terms": {
                          "skus.wareSkuAttrList.wareAttrValue": [
                            "9"
                          ],
                          "boost": 1
                        }
                      }
                    ],
                    "adjust_pure_negative": true,
                    "boost": 1
                  }
                },
                "path": "skus.wareSkuAttrList"
              }
            },
            "path": "skus"
          }
        },
        {
          "nested": {
            "query": {
              "nested": {
                "query": {
                  "bool": {
                    "must": [
                      {
                        "terms": {
                          "skus.wareSkuAttrList.wareAttrValue": [
                            "L"
                          ],
                          "boost": 1
                        }
                      }
                    ],
                    "adjust_pure_negative": true,
                    "boost": 1
                  }
                },
                "path": "skus.wareSkuAttrList"
              }
            },
            "path": "skus"
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}

If you are interested here, you can try to solve this problem first, and then look down

Now it's clear to check the original query. First, we check the syntax of the nested query and find that there is no problem. Then we write a separate query that matches multiple values for array elements

GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "tags": "xxx"
          }
        },
        {
          "term": {
            "tags": "yyy"
          }
        }
      ]
    }
  }
}

Looking at the above query, it is found that although must is also written, it is two separate must, which leads to the failure of the function of must. It is equivalent to writing two filter s and meeting one of them. Therefore, this is why the original query can query the combination of 9 and l, because 9 matches.

Therefore, our next idea is clear, that is, we should combine the two must in the original index into one must. The remaining difficulty is to query the familiarity with the grammar of nested. If we are not familiar with the grammar, we may make mistakes. Here, we suggest to check the official documents more and compare writing

3 answer

The adjusted query statement is as follows

GET test_product/_search
{
  "from": 0,
  "size": 40,
  "query": {
    "bool": {
      "filter": [
        {
         "nested": {
           "path": "skus",
           "query": {
             "bool": {
               "must": [
                 {
                   "nested": {
                     "path": "skus.wareSkuAttrList",
                     "query": {
                       "term": {
                         "skus.wareSkuAttrList.wareAttrValue": {
                           "value": "9"
                         }
                       }
                     }
                   }
                 },
                 {
                   "nested": {
                     "path": "skus.wareSkuAttrList",
                     "query": {
                       "term": {
                         "skus.wareSkuAttrList.wareAttrValue": {
                           "value": "XL"
                         }
                       }
                     }
                   }
                 }
               ]
             }
           }
         }
        }
      ]
    }
  }
}

4 test

When the query condition is changed to 9,XL, the query result matches the result


When the query condition is changed to 9 and l, there is no matching result, which meets the requirements and passes the test

Topics: ElasticSearch