scene
What are the files under Tag1, Tag2 and Tag3 in the many to many relationship table of a file and tag?
id | tag_id | file_id |
---|---|---|
1 | TAG_1 | F1 |
2 | TAG_2 | F1 |
3 | TAG_3 | F1 |
4 | TAG_1 | F2 |
5 | TAG_2 | F2 |
6 | TAG_3 | F3 |
7 | TAG_1 | F3 |
For example, find the compliance TAG_1 and TAG_2 all documents below. Through observation, it can be found that F1 and F3 meet the requirements. Because F3 contains tags, TAG_1, TAG_2, TAG_3. F1 contains TAG_1, TAG_2, TAG_3. F2 contains TAG_2. Therefore, F1 and F3 meet the requirements.
How to find intersection?
There are many ways to get the intersection. The way to get the intersection mentioned in the textbook is to use intersect and then get the intersection. However, many DBMS do not support this method of finding intersection. What should I do? Let's analyze a wave in detail.
Use intersect directly to find the intersection
Intuitively, find TAG_1 file: F1, F3. Get TAG_2 files: F1, F2,F3. Finally, the intersection of the obtained results is obtained to obtain the file ID results F1 and F3.
select file_id from tag_label where tag_id = 'TAG_1' intersect select file_id from tag_label where tag_id = 'TAG_1';
Use join query instead of intersect to find intersection
Because the keyword intersection operation is not owned by all DBMS, connection query can be used in MySQL to obtain the intersection.
-
TAG_1 and TAG_2 connection operation
SELECT TAG_1.`file_id` T1_F, TAG_2.`file_id` T2_F FROM (SELECT * FROM `tag_file` WHERE `tag_id` = 'TAG_1') TAG_1 LEFT JOIN (SELECT * FROM `tag_file` WHERE `tag_id` = 'TAG_2') TAG_2 ON TAG_1.`file_id` = TAG_2.`file_id`;
T1_F T2_F F1 F1 F2 F2 F3 NULL -
TAG_1 and TAG_2 connect query and add filtering operation
SELECT TAG_1.`file_id` FROM (SELECT * FROM `tag_file` WHERE `tag_id` = 'TAG_1') TAG_1 LEFT JOIN (SELECT * FROM `tag_file` WHERE `tag_id` = 'TAG_2') TAG_2 ON TAG_1.`file_id` = TAG_2.`file_id` WHERE TAG_1.`file_id` IS NOT NULL AND TAG_2.`file_id` IS NOT NULL;
file_id F1 F2
Using join query is a costly query operation. Although it can be queried in this way, it is not the best. Second, if multiple tag ID S need to be built, a long SQL query statement needs to be constructed, which is an elegant query operation. To sum up, a new query operation is needed to solve this problem.
Use grouping to find intersections
Because this method of finding intersection is not supported by many DBMS, this method is not feasible. What should I do? It's better to change the idea and group according to the file name (file_id). Then judge whether the file belongs to these tags at the same time according to the length of the tag, so as to obtain the file ID list. It's useless to say more. It's better to do a few experiments.
-
See what files contain a list of tags?
SELECT * FROM tag_file WHERE `tag_id` IN ('TAG_1', 'TAG_2');
id tag_id file_id 1 TAG_1 F1 2 TAG_2 F1 4 TAG_1 F2 5 TAG_2 F2 7 TAG_1 F3 By analyzing these results, it is found that the result in the file id column is consistent with TAG_1 or tag compliant_ 2, the result of union is obtained. But if you want to ask for intersection, how should you proceed?
-
Group by file ID and check the length of tag
SELECT `file_id`, COUNT(`tag_id`) tag_count FROM tag_file WHERE `tag_id` IN ('TAG_1', 'TAG_2') GROUP BY `file_id`;
file_id tag_count F1 2 F2 2 F3 1 According to the nature of GROUP BY, you can know that GROUP BY is grouped according to the queried result set, so you can avoid the performance loss of connection query. Secondly, a rule can be found, because grouping is based on search results, so tag_ The length of count cannot exceed the length of the tag list to be queried. According to the scenario, that is, tag_ The result of count must be less than or equal to 2, and the result with length equal to 2 must meet all labels. Therefore, the intersection can be obtained in this way.
-
Find the list of qualified files
SELECT `file_id` FROM tag_file WHERE `tag_id` IN ('TAG_1', 'TAG_2') GROUP BY `file_id` HAVING COUNT(`tag_id`) = 2;
file_id F1 F2 Finally, the result is obtained in this way.
Construct a dynamic SQL using MyBatis:
<select id="getFileIdListByTagIds" resultType="String" parameterType="java.lang.List"> <bind name="tagListSize" value="tagList.size()" /> SELECT `file_id` FROM tag_file WHERE `tag_id` IN <foreach collection="tagList" item="tagId" open="(" sparator="," close=")"> #{tagId} </foreach> GROUP BY `file_id` HAVING COUNT(`tag_id`) = #{tagListSize}; </select>
Method evaluation
This method is mainly used for intersection query within the same table, but it is not suitable for a wide range of intersection operations.