Quick start: Java crawler and the Elasticsearch full-text search engine, with a hands-on project imitating Jingdong (JD) search

Posted by Fergusfer on Thu, 30 Dec 2021 12:03:01 +0100

Hi everyone, today I'll finish last week's content!

Following on from the previous part, today we look at Elasticsearch, a full-text search engine that wraps and extends Lucene.

First, a short introduction to Elasticsearch, or ES for short. It is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data in near real time. It scales out to hundreds of servers and can handle petabytes of data. ES is written in Java and uses Lucene at its core for all indexing and search functions, but its goal is to hide Lucene's complexity behind a simple RESTful API and so make full-text search easy.

Next, download Elasticsearch from the official website: Download Elasticsearch | Elastic, https://www.elastic.co/cn/downloads/elasticsearch. After unpacking the download, open cmd in the bin directory, start elasticsearch, and then open port 9200 in a browser. The following page indicates that it started successfully.
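The same check can be done from Java instead of a browser. The following is a minimal sketch, assuming a local node that listens on plain HTTP at 127.0.0.1:9200 without authentication (newer Elasticsearch versions enable security by default, in which case this request would be rejected). The class name `EsPing` and the helper `rootUri` are made up for illustration; only the JDK's built-in HTTP client (Java 11+) is used.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class EsPing {
    // Builds the root URL of a node, e.g. http://127.0.0.1:9200/
    static URI rootUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/");
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();

        // GET / returns a small JSON document describing the node
        HttpRequest request = HttpRequest.newBuilder(rootUri("127.0.0.1", 9200))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("HTTP status: " + response.statusCode());
        System.out.println(response.body());
    }
}
```

If the node is not running, `client.send` throws an exception rather than returning a status code, which is itself a quick way to tell that Elasticsearch has not started.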

We can also use some companion tools here, such as head and Kibana. Kibana is an open-source analytics and visualization platform for Elasticsearch, used to interactively search and view the data stored in Elasticsearch indices. In short, these visual tools make it easier to learn Elasticsearch.

If you want to understand Elasticsearch's features in detail, read the documentation on the official website. The IK analyzer (word segmenter) and highlighting, for example, are things you need to master.

Today is a demo project, so I won't go into much detail. First, we wrap the crawler from the previous article into a class so that it is easy to call. After all, we must make sure there is data in ES before we can search it.

The figure shows all the project files. The YAML configuration sets the port to 8081. Before running the project, make sure you have the Maven dependencies for jsoup and Elasticsearch, which were provided last week. Below is last week's crawler, wrapped as the jsouputils class in the utils package:

package com.liu.utils;

import com.liu.pojo.goodJD;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

@Component
public class jsouputils {
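    // Quick manual test; fetches https://search.jd.com/Search?keyword=java
//    public static void main(String[] args) throws Exception {
//        new jsouputils().parsJD("java").forEach(System.out::println);
//    }
    public List<goodJD> parsJD(String keywords) throws Exception {
        // Encode the keyword so spaces and non-ASCII characters survive in the query string
        String url = "https://search.jd.com/Search?keyword="
                + java.net.URLEncoder.encode(keywords, "UTF-8");
        // The second argument is the timeout in milliseconds (30 seconds here)
        Document document = Jsoup.parse(new URL(url), 30000);
        ArrayList<goodJD> goodJDlist = new ArrayList<>();
        Element element = document.getElementById("J_goodsList");
        if (element == null) {
            // No result list on the page (e.g. the page layout differs); return an empty list
            return goodJDlist;
        }
        Elements elements = element.getElementsByTag("li");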
        for (Element el : elements) {
           String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
           String price = el.getElementsByClass("p-price").eq(0).text();
           String title = el.getElementsByClass("p-name").eq(0).text();
           String shopnum = el.getElementsByClass("p-shop").eq(0).text();
            goodJD goodJD = new goodJD();
                goodJD.setTitle(title);
                goodJD.setImg(img);
                goodJD.setPrice(price);
                goodJD.setShopnum(shopnum);
            goodJDlist.add(goodJD);
            }
        return goodJDlist;
    }
}
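The crawler builds its request URL by appending the keyword to the search address. A keyword that contains spaces, `&`, or Chinese characters has to be percent-encoded first, otherwise the query string can be cut short or misread. Here is a small sketch of that step using only the JDK (Java 10+ for the `Charset` overload); the class name `SearchUrl` and its `build` method are hypothetical helpers, not part of the project.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SearchUrl {
    // Appends the form-encoded keyword to the JD search address
    static String build(String keywords) {
        return "https://search.jd.com/Search?keyword="
                + URLEncoder.encode(keywords, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("java"));        // plain ASCII is left unchanged
        System.out.println(build("spring boot")); // the space becomes '+'
        System.out.println(build("c&c++"));       // '&' and '+' are percent-encoded
    }
}
```

`URLEncoder` produces form encoding, so a space is written as `+` rather than `%20`; that is the usual form for query-string parameters.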


Next, we add an ESconfig class under the config directory to connect to our ES instance. 127.0.0.1:9200 is the local address plus the port. The code is as follows:

package com.liu.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ESconfig {

    @Bean
    public RestHighLevelClient restHighLevelClient(){
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("127.0.0.1", 9200, "http")));
        return  client;
    }
}

Next comes the service layer, the hardest part. The code is as follows:

package com.liu.service;

import com.alibaba.fastjson.JSON;
import com.liu.pojo.goodJD;
import com.liu.utils.jsouputils;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

@Service
public class JsoupService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    public Boolean parseContent(String keywords) throws Exception {
        List<goodJD> goodJDS = new jsouputils().parsJD(keywords);

        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");

        for (int i = 0; i < goodJDS.size(); i++) {
            bulkRequest.add((new IndexRequest("jd_good")
            .source(JSON.toJSONString(goodJDS.get(i)), XContentType.JSON)
            ));
        }
        BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);

        return !bulk.hasFailures();
    }

    public List<Map<String,Object>> seachPage(String keywords,
                                              int pageNo,
                                              int pageSize) throws IOException {
        if(pageNo<=1){
            pageNo=1;
        }

        SearchRequest searchRequest = new SearchRequest("jd_good");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

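        // from() is the zero-based offset of the first hit, so convert the 1-based page number
        searchSourceBuilder.from((pageNo - 1) * pageSize);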
        searchSourceBuilder.size(pageSize);

        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keywords);

        searchSourceBuilder.query(termQueryBuilder);

        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        ArrayList<Map<String,Object>>list = new ArrayList<>();
        for (SearchHit documentFields : searchResponse.getHits()) {
            list.add(documentFields.getSourceAsMap());
        }
        return  list;
    }

    public List<Map<String,Object>> seachPagehighlighter(String keywords,
                                              int pageNo,
                                              int pageSize) throws IOException {
        if(pageNo<=1){
            pageNo=1;
        }

        SearchRequest searchRequest = new SearchRequest("jd_good");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

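        // from() is the zero-based offset of the first hit, so convert the 1-based page number
        searchSourceBuilder.from((pageNo - 1) * pageSize);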
        searchSourceBuilder.size(pageSize);

        TermQueryBuilder termQueryBuilder =
                QueryBuilders.termQuery("title", keywords);

        searchSourceBuilder.query(termQueryBuilder);

        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(false);
        highlightBuilder.preTags("<span style='color:red'>");
        highlightBuilder.postTags("</span>");
        searchSourceBuilder.highlighter(highlightBuilder);

        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse =
                restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        ArrayList<Map<String,Object>>list = new ArrayList<>();

        for (SearchHit documentFields : searchResponse.getHits()) {
            Map<String, HighlightField> highlightFields =
                    documentFields.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String, Object> sourceAsMap = documentFields.getSourceAsMap();

            // Replace the original title with the highlighted fragments, if any
            if(title!=null){
                Text[] fragments = title.fragments();
                StringBuilder ntitle = new StringBuilder();
                for (Text text : fragments) {
                    ntitle.append(text);
                }
                sourceAsMap.put("title", ntitle.toString());
            }
            list.add(sourceAsMap);
        }
        return  list;
    }
}

Here is a brief analysis of the service layer.

BulkRequest batches several inserts into a single request; here its timeout is set to 2 minutes.
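To make the batching concrete, it helps to see what a bulk request looks like on the wire. The Elasticsearch `_bulk` REST endpoint takes newline-delimited JSON: one action line followed by one document line per item, each terminated by a newline. The sketch below builds such a body in plain Java for index actions only. The class `BulkBody` and its `build` method are hypothetical, and the documents are assumed to be already serialized to single-line JSON (as `JSON.toJSONString` would produce).

```java
import java.util.List;

class BulkBody {
    // Builds an NDJSON body for POST /_bulk with one "index" action per document.
    // Each document must be single-line JSON, because a newline ends the entry.
    static String build(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            body.append(doc).append('\n'); // the final newline is required too
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String body = build("jd_good", List.of(
                "{\"title\":\"Java book\",\"price\":\"59.00\"}",
                "{\"title\":\"Vue book\",\"price\":\"45.00\"}"));
        System.out.print(body);
    }
}
```

The high-level client used in the service assembles an equivalent request from the `IndexRequest` objects added to the `BulkRequest`, so this body never has to be written by hand in the project itself.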

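Highlight operation: HighlightBuilder wraps each match in the title field with the configured pre and post tags (a red span here), and the highlighted fragments then replace the original title in the returned map.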
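The effect of highlighting on a title can be illustrated without a running cluster. The following plain-Java sketch only imitates the result: it wraps a literal keyword in the same red span tags and joins fragments the way the service does. It is not how Elasticsearch implements highlighting (the real highlighter works on analyzed terms, not on substrings), and the class `HighlightDemo` with its two methods is made up for this illustration.

```java
class HighlightDemo {
    // Wraps every literal occurrence of the keyword in the given tags
    static String highlight(String title, String keyword, String preTag, String postTag) {
        if (keyword.isEmpty()) {
            return title; // nothing to mark
        }
        return title.replace(keyword, preTag + keyword + postTag);
    }

    // Concatenates highlight fragments into a single title string
    static String join(String... fragments) {
        return String.join("", fragments);
    }

    public static void main(String[] args) {
        String marked = highlight("java core programming", "java",
                "<span style='color:red'>", "</span>");
        System.out.println(marked);
        System.out.println(join("<em>java</em>", " core"));
    }
}
```

Because the returned title now contains HTML, the front end has to render it as markup instead of escaped text, which is why the page binds the title with `v-html`.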

Next comes the controller layer

JsoupController class:

package com.liu.controller;

import com.liu.service.JsoupService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.CrossOrigin;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;
import java.util.Map;

@RestController
public class JsoupController {

    @Autowired
    private JsoupService jsoupService;

    @CrossOrigin
    @GetMapping("/parse/{keywords}")
    public  Boolean parse(@PathVariable String keywords) throws Exception {
        return jsoupService.parseContent(keywords);
    }

    @CrossOrigin
    @GetMapping("/search/{keywords}/{pageNo}/{pageSize}")
    public  List<Map<String,Object>> search(
                                         @PathVariable  String keywords,
                                         @PathVariable   int pageNo,
                                         @PathVariable    int pageSize) throws IOException {

        return jsoupService.seachPagehighlighter(keywords,pageNo,pageSize);

    }
}
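The /search/{keywords}/{pageNo}/{pageSize} endpoint takes a 1-based page number, while the `from` parameter of an Elasticsearch search is the zero-based offset of the first hit to return. The conversion between the two is a one-liner, sketched here with a hypothetical `Paging` helper that also clamps page numbers below 1, as the service does.

```java
class Paging {
    // Converts a 1-based page number into the zero-based offset used by from()
    static int offset(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1); // treat 0 or negative pages as page 1
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(offset(1, 10)); // first page starts at hit 0
        System.out.println(offset(2, 10)); // second page starts at hit 10
        System.out.println(offset(3, 10)); // third page starts at hit 20
    }
}
```

Passing the page number itself as `from` would make consecutive pages overlap almost entirely: page 1 would start at hit 1 and page 2 at hit 2.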

indexjion class:

package com.liu.controller;


import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;

@Controller
public class indexjion {

    @GetMapping({"/", "/index"})
    public String toindex(){

        return "index";
    }
}

Front end interface

index.html:

<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">

<head>
    <meta charset="utf-8"/>
    <title>Java-ES Imitation Jingdong actual combat</title>
    <link rel="stylesheet" th:href="@{/css/style.css}"/>

</head>

<body class="pg">
<div class="page" id="app">
    <div id="mallPage" class=" mallist tmall- page-not-market ">

        <!-- Head search -->
        <div id="header" class=" header-list-app">
            <div class="headerLayout">
                <div class="headerCon ">
                    <!-- Logo-->
                    <h1 id="mallLogo">
                        <img th:src="@{/images/jdlogo.png}" alt="">
                    </h1>

                    <div class="header-extra">

                        <!--search-->
                        <div id="mallSearch" class="mall-search">
                            <form name="searchTop" class="mallSearch-form clearfix">
                                <fieldset>
                                    <legend>Jingdong search</legend>
                                    <div class="mallSearch-input clearfix">
                                        <div class="s-combobox" id="s-combobox-685">
                                            <div class="s-combobox-input-wrap">
                                                <input v-model="keyword" type="text" autocomplete="off" value="dd" id="mq"
                                                       class="s-combobox-input" aria-haspopup="true">
                                            </div>
                                        </div>
                                        <button  @click.prevent="searchkey" type="submit" id="searchbtn">search</button>
                                    </div>
                                </fieldset>
                            </form>
                            <ul class="relKeyTop">
                                <li><a>java</a></li>
                                <li><a>Vue</a></li>
                                <li><a>Redis</a></li>
                                <li><a>Docker</a></li>
                                <li><a>spring</a></li>
                            </ul>
                        </div>
                    </div>
                </div>
            </div>
        </div>

        <!-- Product details page -->
        <div id="content">
            <div class="main">
                <!-- Brand classification -->
                <form class="navAttrsForm">
                    <div class="attrs j_NavAttrs" style="display:block">
                        <div class="brandAttr j_nav_brand">
                            <div class="j_Brand attr">
                                <div class="attrKey">
                                    brand
                                </div>
                                <div class="attrValues">
                                    <ul class="av-collapse row-2">
                                        <li><a href="#"> affordable</a></li>
                                        <li><a href="#"> everyone</a></li>
                                    </ul>
                                </div>
                            </div>
                        </div>
                    </div>
                </form>

                <!-- Sorting rules -->
                <div class="filter clearfix">
                    <a class="fSort fSort-cur">comprehensive<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">popularity<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">New products<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">sales volume<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">Price<i class="f-ico-triangle-mt"></i><i class="f-ico-triangle-mb"></i></a>
                </div>

                <!-- Product details -->
                <div class="view grid-nosku">

                    <div class="product" v-for="result in results">
                        <div class="product-iWrap">
                            <!--Product cover-->
                            <div class="productImg-wrap">
                                <a class="productImg">
                                    <img :src="result.img">
                                </a>
                            </div>
                            <!--Price-->
                            <p class="productPrice">
                                <em>{{result.price}}</em>
                            </p>
                            <!--title-->
                            <p class="productTitle">
                                <a v-html="result.title"></a>
                            </p>
                            <!-- Shop name -->
                            <div class="productShop">
                                <span>{{result.shopnum}}</span>
                            </div>
                            <!-- Transaction information -->
                            <p class="productStatus">
                                <span>Monthly transactions <em>999 orders</em></span>
                                <span>Reviews <a>3</a></span>
                            </p>
                        </div>
                    </div>
                </div>
                <div class="filter clearfix" >
                    <a class="fSort fSort-cur"><button @click.prevent="pageone" type="submit" >1</button ><i class="f-ico-arrow-d"></i></a>
                    <a class="fSort" ><button @click.prevent="pagetwo" type="submit" >2</button ><i class="f-ico-arrow-d"></i></a>
                    <a class="fSort"><button @click.prevent="pagethree" type="submit" >3</button ><i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">4<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">5<i class="f-ico-triangle-mt"></i><i class="f-ico-triangle-mb"></i></a>
                </div>
            </div>
        </div>
    </div>
</div>


<script src="https://cdn.jsdelivr.net/npm/vue@2.5.21/dist/vue.min.js"></script>
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
<!--<script th:src="@{/js/vue.min.js}"></script>-->
<!--<script th:src="@{/js/axios.min.js}"></script>-->
<script>

    new Vue({
        el: '#app',
        data: {
            keyword: '',
            results: []
        },
        methods: {
            searchkey: function (){
               var keyword= this.keyword;
               axios.get('search/'+keyword+"/1/10").then(response=>{
                   console.log(response.data);
                   this.results=response.data;

               })
            },
            pageone :function (){
                var keyword= this.keyword;
                axios.get('search/'+keyword+"/1/10").then(response=>{
                    console.log(response.data);
                    this.results=response.data;
                })
            },
            pagetwo :function (){
                var keyword= this.keyword;
                axios.get('search/'+keyword+"/2/10").then(response=>{
                    console.log(response.data);
                    this.results=response.data;
                })
            },
            pagethree :function (){
                var keyword= this.keyword;
                axios.get('search/'+keyword+"/3/10").then(response=>{
                    console.log(response.data);
                    this.results=response.data;
                })
            }

        }
    })
</script>

</body>
</html>

Final results:

Home page:

After searching for java (the data comes from the crawler):

Use the page buttons to switch pages:

 

Topics: Java ElasticSearch search engine