- Because an interface is too slow... Try to optimize (/ baldness)
- Spring data used- mongo:2.1.3
- mongo driver: 3.8.2
- You can see me First query sentence.
Optimization ideas
paging
- Original page query
- Due to the large amount of data, the query was slow, and it was later found that there was no need to use aggregation at all. With aggregate queries, even with indexes, a large amount of useless data is traversed. And the efficiency of in query itself is not very high
- So I changed this aggregation to multiple queries, and used findFirst() to query the first one and return it directly. Due to the existence of connection pool, query efficiency has been improved a lot.
Query Ontology
-
First, match is optimized. Reference blog
- $where and $exists: these two operators cannot use indexes at all.
- $ne and $not: Generally speaking, inversions are not equal to, and indexes can be used, but the efficiency is extremely low, not very effective, and often degenerates to scan the whole table.
- $nin: No, this operator will always scan the whole table
- For the index in the pipeline, it is also easy to be unexpected. Only the match sort at the beginning of the pipeline can be used to index. Once there are project projection, group grouping, lookup table Association, unwind break and other operations, the index cannot be used at all.
-
So let's optimize all the empty files in the database to
- db.workflow.update({"parentWorkflowId":{$exists:false}},{$set: {"parentWorkflowId": ""}},{"multi" : true, "upsert" : false})
- "Remove $or from the original query, leaving only parentWorkflowId:" "",
-
Adjust sorting in group
- The cond sorting in the group is modified to put the one with the largest amount of data and the easiest to match in front.
-
$in no adjustment here (tentative)
- Because the data in in is uncertain. At present, it is not sure whether the modification to multiple queries is valid. $in is actually $or.
- Plus the need to sort by time. Splitting up multiple times can cause data problems.
New index
-
grammar
- db.collection.createIndex({"xxx":1,"yyy":-1});
-
Index selection
- First of all, I built multiple indexes with high performance
- use db.collection.explain().aggregate(). To select an index.
- The winningPlan in explain() shows the better index
- Query again after modification
Stability problem
- After the modification, I found that the query stability was very poor. The difference between the fast and the slow is 2-5s.
- To troubleshoot connection pooling: db.serverStatus().connections (admin user), and no new or disconnected discipline found in the log
Troubleshooting with Profiling
-
Use the admin user to start the profile. If the setting is more than 500ms, it is slow operation. Record.
- db.setProfilingLevel(1,500).
-
View data after multiple executions:
- The index of every call of the same query is the same.. (look at the whole person)
- Modify the code to specify the index through hint. (it's like spitting blood)
Document projectSet = Document.parse("{projectId:1, process:1, startTime:1, parentWorkflowId:1}"); projectSet.put("id", "$_id"); Document matchSet = Document.parse("{parentWorkflowId: '',projectId:{" + "$in: " + listStringToString(projectString) + "}}"); List<Document> aggregateList = new ArrayList<>(); aggregateList.add(new Document("$project", projectSet)); aggregateList.add(new Document("$match", matchSet)); aggregateList.add(new Document("$group", group())); aggregateList.add(new Document("$sort", Document.parse("{maxStartTime:-1}"))); aggregateList.add(new Document("$skip", ((page - 1) * limit))); aggregateList.add(new Document("$limit", limit)); AggregateIterable<Map> cursors = mongoTemplate .getCollection(MongoCollection.WORK_FLOW) .aggregate(aggregateList, Map.class) //Specified index, no error will be reported in the index .hint(Document.parse("{ parentWorkflowId: 1, projectId: 1 }")); // . allowDiskUse(true); double the speed; don't drive until the number is too small for sort.
First query
First query page
[{ "$project": { "projectId": NumberInt("1"), "process": NumberInt("1"), "startTime": NumberInt("1"), "parentWorkflowId": NumberInt("1"), "id": "$_id" } }, { "$match": { "$or": [ { "parentWorkflowId": { "$exists": false } }, { "parentWorkflowId": "" } ], "projectId": { "$in": [ "ba8ec7bb4dc19a501f296186706c1e31", "597a937634f8716cad98c66dbb376792", "18294187351fc4310470f531ca57450f", "847d91f66357e3c03dd1608b5ba49532", "6b4ccbffcb258489789c8b2a4520166e", "89ba378a355479f0d57fbc4ed7c8f22e", "0f76097d86bf19458ed5c7147bdd55c8", "9b2c5fd3a1a76f350fbb642c5b3dee56", "cfd10e186b177d7d165c3f4f8bc68938", "8fb8f7aeddd89230f5e62713bf1845a0" ] } } }, { "$group": { "_id": "$projectId", "count":{$sum:1} } }
First query Ontology
[{ "$project": { "projectId": NumberInt("1"), "process": NumberInt("1"), "startTime": NumberInt("1"), "parentWorkflowId": NumberInt("1"), "id": "$_id" } }, { "$match": { "$or": [ { "parentWorkflowId": { "$exists": false } }, { "parentWorkflowId": "" } ] "projectId": { "$in": [ "ba8ec7bb4dc19a501f296186706c1e31", "597a937634f8716cad98c66dbb376792", "18294187351fc4310470f531ca57450f", "847d91f66357e3c03dd1608b5ba49532", "6b4ccbffcb258489789c8b2a4520166e", "89ba378a355479f0d57fbc4ed7c8f22e", "0f76097d86bf19458ed5c7147bdd55c8", "9b2c5fd3a1a76f350fbb642c5b3dee56", "cfd10e186b177d7d165c3f4f8bc68938", "8fb8f7aeddd89230f5e62713bf1845a0" ] } } }, { "$group": { "_id": "$projectId", "endNum": { "$sum": { "$cond": { "if": { "$eq": [ "$process", NumberInt("5") ] }, "then": NumberInt("1"), "else": NumberInt("0") } } }, "stopNum": { "$sum": { "$cond": { "if": { "$eq": [ "$process", NumberInt("4") ] }, "then": NumberInt("1"), "else": NumberInt("0") } } }, "runningNum": { "$sum": { "$cond": { "if": { "$eq": [ "$process", NumberInt("1") ] }, "then": NumberInt("1"), "else": NumberInt("0") } } }, "others": { "$sum": { "$cond": { "if": { "$in": [ "$process", [ NumberInt("6"), NumberInt("3"), NumberInt("2"), NumberInt("7"), NumberInt("8") ] ] }, "then": NumberInt("1"), "else": NumberInt("0") } } }, "maxStartTime": { "$max": "$startTime" }, "allProcess": { "$sum": NumberInt("1") } } }, { "$sort": { "maxStartTime": NumberInt("-1") } }, { "$skip": NumberLong("0") }, { "$limit": NumberLong("10") }]