[lucene-plus] initialize index

Posted by bhavin_85 on Sat, 09 Oct 2021 18:53:03 +0200

* lucene-plus is built on spring-boot 2.x. Readers using plain Spring or a different spring-boot version can adapt the source code to their needs. Source: lucene-plus: wraps the common CRUD operations on top of Lucene so that working with Lucene feels smooth.

In principle, downloading the source code directly is not recommended; readers who need changes can fork the project and develop on the master branch of their fork. Lucene itself has no concept of "initializing an index", and objects are created with new everywhere. I found that very uncomfortable to work with, so the first feature I implemented in lucene-plus was "initializing the index". Why the name "lucene-plus"? Because it is only a top-level wrapper around Lucene and does not modify any Lucene source code. Here is how lucene-plus initializes an index.

1. Add the Maven dependency:

<dependency>
    <groupId>cn.juque</groupId>
    <artifactId>lucene-plus</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>

2. Add the following configuration to application-${profile}.yml:

# Specify the directory in which the index is located
lucene:
  index:
    directory: D:\DOC\multiFile\index\

3. Add the packages that must be scanned for lucene-plus (lucene-plus relies on hutool's Spring bean utilities, so "cn.hutool.extra.spring" must also be scanned by the project):

@ComponentScans(value = {@ComponentScan("cn.juque.luceneplus"), @ComponentScan("cn.hutool.extra.spring")})

4. The lucene-plus dependency is now set up. Suppose we need to initialize an index for file information. First, implement the IndexTemplate interface, which defines the entire index as JSON:

@Component("indexFileHandler")
public class IndexFileHandler implements IndexTemplate {

    @Override
    public String getTemplate() {
        return "{\n"
            + "    \"indexName\":\"file_info\",\n"
            + "    \"fieldDTOList\":[\n"
            + "        {\n"
            + "            \"fieldName\":\"module_id\",\n"
            + "            \"fieldType\":\"STRING\",\n"
            + "            \"value\":\"${module_id}\",\n"
            + "            \"store\":\"YES\"\n"
            + "        },\n"
            + "        {\n"
            + "            \"fieldName\":\"server_path\",\n"
            + "            \"fieldType\":\"STRING\",\n"
            + "            \"value\":\"${server_path}\",\n"
            + "            \"store\":\"YES\"\n"
            + "        },\n"
            + "        {\n"
            + "            \"fieldName\":\"server_file_name\",\n"
            + "            \"fieldType\":\"STRING\",\n"
            + "            \"value\":\"${server_file_name}\",\n"
            + "            \"store\":\"YES\"\n"
            + "        },\n"
            + "        {\n"
            + "            \"fieldName\":\"client_file_name\",\n"
            + "            \"fieldType\":\"STRING\",\n"
            + "            \"value\":\"${client_file_name}\",\n"
            + "            \"store\":\"YES\"\n"
            + "        },\n"
            + "        {\n"
            + "            \"fieldName\":\"file_size\",\n"
            + "            \"fieldType\":\"DOUBLE\",\n"
            + "            \"value\":\"${file_size}\"\n"
            + "        }\n"
            + "    ]\n"
            + "}";
    }
}

Index parameters:

Parameter               Description
indexName               Index name
fieldDTOList            List of field definitions
fieldDTOList.fieldName  Field name
fieldDTOList.fieldType  Field type
fieldDTOList.value      Reserved for now; currently has no practical effect
fieldDTOList.store      Whether to store the field value. YES: stored; NO: not stored
fieldDTOList.point      Range-query support; only for fieldType INTEGER, LONG, DOUBLE
fieldDTOList.analyzer   Analyzer (tokenizer); see AnalyzerEnum for the supported analyzers

Once the JSON definition is complete, the index information is initialized automatically when the service starts, and subsequent add and update operations build their fields from that initialized index information. (Note: index fields can only be added. Changing or deleting a field has no effect, and changing the index name has no effect either.)
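The "fields can only be added" rule can be pictured as a merge in which the stored definition always wins. The following is only a sketch in plain Java, not the lucene-plus implementation; the class IndexSchemaMerge and the method mergeIncremental are hypothetical names, and each field is reduced to a fieldName -> fieldType pair for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of an "incremental only" schema merge.
final class IndexSchemaMerge {

    /**
     * Merges a new template (fieldName -> fieldType) into the stored schema.
     * Fields that already exist keep their stored type; only unknown fields are added.
     */
    static Map<String, String> mergeIncremental(Map<String, String> stored,
                                                Map<String, String> template) {
        Map<String, String> merged = new LinkedHashMap<>(stored);
        // putIfAbsent ignores changes to existing fields and never removes a field
        template.forEach(merged::putIfAbsent);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("module_id", "STRING");
        stored.put("file_size", "DOUBLE");

        Map<String, String> template = new LinkedHashMap<>();
        template.put("file_size", "LONG");   // type change: ignored
        template.put("is_valid", "STRING");  // new field: accepted
        // module_id is absent from the template: the deletion is ignored

        // prints {module_id=STRING, file_size=DOUBLE, is_valid=STRING}
        System.out.println(mergeIncremental(stored, template));
    }
}
```

Under this model, a change or deletion in the template goes unnoticed rather than failing, which matches the behaviour described in the note.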

5. Query operations (querying a single Document, normal paging, and scroll paging are supported):

// Match exactly one Document by its server-side file name
BooleanQuery.Builder builder = new BooleanQuery.Builder();
Term term = new Term(IndexFileEnum.SERVER_FILE_NAME.getName(), serverName);
TermQuery termQuery = new TermQuery(term);
builder.add(termQuery, Occur.MUST);
Document document = this.documentPlusService.searchDocument(IndexFileEnum.FILE_INFO.getName(), builder);

Normal paging queries use logical paging and currently support paging over at most 200,000 records. If anyone has implemented physical paging, please share it.
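Logical paging here means that the matching hits are collected first and the requested page is then cut out of them in memory. A minimal sketch of that idea in plain Java follows; LogicalPager, page, and MAX_HITS are hypothetical names, and the 200,000 cap from the text is modelled as a simple constant. It is not the lucene-plus code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of logical (in-memory) paging.
final class LogicalPager {

    // Assumed upper bound on the number of hits that can be paged over
    static final int MAX_HITS = 200_000;

    /** Returns page pageNo (1-based) of size pageSize, or an empty list past the end. */
    static <T> List<T> page(List<T> hits, int pageNo, int pageSize) {
        if (pageNo < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNo and pageSize must be >= 1");
        }
        int limit = Math.min(hits.size(), MAX_HITS);
        long from = (long) (pageNo - 1) * pageSize; // long avoids int overflow
        if (from >= limit) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, limit);
        return new ArrayList<>(hits.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            hits.add(i);
        }
        // prints [21, 22, 23, 24, 25]
        System.out.println(page(hits, 3, 10));
    }
}
```

The cost of this approach grows with the page number, because every earlier hit still has to be collected. For cursor-style paging, Lucene's IndexSearcher.searchAfter is the usual starting point, though whether it fits lucene-plus has not been verified here.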

6. Add a Document:

// Save the file information as a new Document
Map<String, Object> params = CollUtil.newHashMap(10);
params.put(IndexFileEnum.MODULE_ID.getName(), multiFileBo.getModuleId());
params.put(IndexFileEnum.CLIENT_FILE_NAME.getName(), multiFileBo.getFileName());
params.put(IndexFileEnum.SERVER_FILE_NAME.getName(), serverName);
params.put(IndexFileEnum.SERVER_PATH.getName(), serverPath);
params.put(IndexFileEnum.DOWNLOAD_TIMES.getName(), 0);
params.put(IndexFileEnum.UPLOAD_USER_ID.getName(), multiFileBo.getUserId());
try {
    this.documentPlusService.addDocument(IndexFileEnum.FILE_INFO.getName(), params);
} catch (IOException e) {
    log.error("save file error", e);
    throw new AppException(InternetStorageMsgEnum.SYSTEM_ERROR);
}

lucene-plus instantiates each Field by looking up its fieldName in the index template, so to add a new field you must first update the index template; putting the field into the Map alone has no effect.
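One way to picture this: only Map entries whose key is declared in the template turn into fields, and everything else is dropped. The sketch below is plain Java under that assumption; TemplateFieldFilter and keepDeclared are hypothetical names, and the real lucene-plus code may handle unknown keys differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: only fields declared in the index template survive.
final class TemplateFieldFilter {

    /** Returns the entries of params whose key is a declared field name. */
    static Map<String, Object> keepDeclared(Set<String> declaredFields,
                                            Map<String, Object> params) {
        Map<String, Object> accepted = new LinkedHashMap<>();
        params.forEach((name, value) -> {
            if (declaredFields.contains(name)) {
                accepted.put(name, value);
            }
            // undeclared names are skipped: no Field can be built for them
        });
        return accepted;
    }

    public static void main(String[] args) {
        Set<String> declared = new HashSet<>(Arrays.asList("module_id", "server_path"));

        Map<String, Object> params = new LinkedHashMap<>();
        params.put("module_id", "m1");
        params.put("upload_user_id", "u1"); // not declared in the template

        // prints {module_id=m1}
        System.out.println(keepDeclared(declared, params));
    }
}
```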

7. Update operation:

BooleanQuery.Builder builder = new BooleanQuery.Builder();
serverNames.forEach(t -> {
    Term term = new Term(IndexFileEnum.SERVER_FILE_NAME.getName(), t);
    builder.add(new TermQuery(term), Occur.SHOULD);
});
Map<String, Object> params = CollUtil.newHashMap(1);
params.put(IndexFileEnum.IS_VALID.getName(), Boolean.TRUE.toString());
this.documentPlusService.updateDocument(IndexFileEnum.FILE_INFO.getName(), builder, params);

The API finds the matching Documents using the supplied condition and then performs a delete followed by an insert. Lucene's own updateDocument is also implemented as delete-then-insert. The difference is that lucene-plus performs a partial update of the Document, whereas the native method replaces all of its fields.
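The difference between the two update styles comes down to what is re-inserted after the delete. The following sketch models a Document as a plain Map to show it; PartialUpdateSketch, merge, and replace are hypothetical names used only for illustration and are not part of lucene-plus or Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of partial update versus full replacement.
final class PartialUpdateSketch {

    /** Partial update: keep the existing fields and overwrite only the changed ones. */
    static Map<String, Object> merge(Map<String, Object> existing,
                                     Map<String, Object> changes) {
        Map<String, Object> merged = new LinkedHashMap<>(existing);
        merged.putAll(changes);
        return merged;
    }

    /** Full replacement: the new document contains only the supplied fields. */
    static Map<String, Object> replace(Map<String, Object> existing,
                                       Map<String, Object> changes) {
        return new LinkedHashMap<>(changes);
    }

    public static void main(String[] args) {
        Map<String, Object> existing = new LinkedHashMap<>();
        existing.put("server_file_name", "a.txt");
        existing.put("file_size", 12.5);

        Map<String, Object> changes = new LinkedHashMap<>();
        changes.put("is_valid", "true");

        // prints {server_file_name=a.txt, file_size=12.5, is_valid=true}
        System.out.println(merge(existing, changes));
        // prints {is_valid=true}
        System.out.println(replace(existing, changes));
    }
}
```

In both cases the old document is deleted first; the partial style simply re-inserts the merged result instead of only the supplied fields.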

8. The delete operation differs little from the native delete operation. That concludes the introduction to adding, deleting, updating, and querying with lucene-plus.

To close the article, here is a problem I ran into during development. If you know the answer, please let me know.

1. The update operation of lucene-plus first looks up the matching Documents, merges each Document with the Map, deletes the original Documents by the condition, and then saves the merged Documents again. In practice, the last step never succeeded when I saved with the native addDocument; likewise, the native updateDocument deleted the old Document successfully but did not save the new one. When I called lucene-plus's own addDocument instead, the save succeeded. I still do not fully understand the cause.

Topics: Java Spring Spring Boot lucene