[Yugong series] January 2022 Django mall project 26 - implementing the search engine feature

Posted by barbatruc on Thu, 03 Feb 2022 08:43:52 +0100


1, Principles of full-text retrieval and search engines

1. Product search requirements

When a user enters a product keyword in the search box, we return the matching products.

2. Product search implementation

One option is a fuzzy query with the SQL LIKE keyword.

However, LIKE is very inefficient for this: a pattern with a leading wildcard such as '%phone%' cannot use an ordinary index, so every row has to be scanned.

The query also has to cover multiple fields, which is awkward to express with LIKE.
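Both problems are easy to see on a throwaway SQLite table. The sketch below is only an illustration under stated assumptions: the sku table and its rows are made up, and SQLite stands in for the project's real database. It searches two fields with LIKE '%keyword%' and prints the query plan, which reports a full table scan even though name is indexed.

```python
import sqlite3

# Hypothetical sku table mirroring the name/caption fields indexed later on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sku (id INTEGER PRIMARY KEY, name TEXT, caption TEXT)")
conn.execute("CREATE INDEX idx_sku_name ON sku (name)")
conn.executemany(
    "INSERT INTO sku (id, name, caption) VALUES (?, ?, ?)",
    [
        (1, "Apple iPhone 8 Plus", "64GB gold"),
        (2, "Huawei Mate 10", "Full-screen phone"),
        (3, "Apple MacBook Pro", "13-inch laptop"),
    ],
)


def like_search(keyword):
    """Search every text field with LIKE '%keyword%' (the approach being rejected)."""
    pattern = f"%{keyword}%"
    # Each additional searchable field needs another OR'ed LIKE clause.
    sql = "SELECT id, name FROM sku WHERE name LIKE ? OR caption LIKE ? ORDER BY id"
    rows = conn.execute(sql, (pattern, pattern)).fetchall()
    # The leading % means the index on name cannot be used, so SQLite scans every row.
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, (pattern, pattern)).fetchall()
    return rows, [step[-1] for step in plan]


rows, plan = like_search("phone")
print(rows)  # [(1, 'Apple iPhone 8 Plus'), (2, 'Huawei Mate 10')]
print(plan)  # e.g. ['SCAN sku'] (exact wording varies between SQLite versions)
```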

3. Full-text search approach

We therefore use full-text retrieval to implement product search.

Full-text search lets us query across any specified fields.

Full-text retrieval has to be paired with a search engine.

4. How a search engine works

To support full-text retrieval, the search engine first preprocesses the data in the database and builds a separate index structure.

This index is similar to the index pages of the Xinhua Dictionary: it maps keywords to entries and records where each entry is located.

At query time, the engine looks the keywords up in the index instead of scanning the data, and from there finds where the matching data is actually stored.
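The keyword-to-location mapping described above is usually called an inverted index. The following is a toy version in plain Python, not how Elasticsearch or Lucene actually store it; the documents dict and the naive whitespace tokenizer are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical product records: id -> text to be indexed.
documents = {
    1: "apple iphone 8 plus gold",
    2: "huawei mate 10 phone",
    3: "apple macbook pro laptop",
}


def build_index(docs):
    """Map every keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # naive whitespace tokenizer
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


index = build_index(documents)
print(sorted(index["apple"]))          # [1, 3]
print(search(index, "Apple laptop"))   # [3]
```

The lookup cost depends on the number of query terms and matching ids, not on how many rows the table holds, which is why the engine can skip the full scan a LIKE query needs.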

2, Introduction to Elasticsearch

Elasticsearch is the preferred search engine for full-text retrieval.

  • Elasticsearch is an open source search engine implemented in Java.
  • It can store, search and analyze large volumes of data quickly; Wikipedia, Stack Overflow and GitHub all use it.
  • Under the hood, Elasticsearch is built on the open-source library Lucene. Lucene is only a library and cannot be used directly: you must write your own code to call its interface. Elasticsearch wraps Lucene and exposes it through a RESTful API.

Word segmentation

  • Search engines need to segment text into words when building an index.
  • Word segmentation splits a sentence into multiple words or characters, which become the sentence's keywords. For example, take the Chinese sentence 我是中国人 ("I am Chinese").
  • After segmentation, 我 (I), 是 (am), 中国 (China), 人 (person), 中国人 (Chinese person) and so on can all be keywords of this sentence.
  • Elasticsearch does not support Chinese word segmentation and indexing out of the box; it has to be combined with the elasticsearch-analysis-ik plugin to handle Chinese text.
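To make the idea concrete, here is a toy dictionary-lookup segmenter that emits every dictionary word found in the sentence, a fine-grained style of segmentation. It is not the IK algorithm: DICTIONARY and segment_all are hypothetical, and a real analyzer relies on a large lexicon plus additional rules.

```python
# Toy lexicon (assumption); a real analyzer such as IK ships a large curated dictionary.
DICTIONARY = {"我", "是", "中国", "中国人", "国人", "人"}


def segment_all(sentence, dictionary=DICTIONARY, max_len=4):
    """Return every dictionary word found in the sentence, ordered by start position."""
    words = []
    for start in range(len(sentence)):
        # Try every candidate word of up to max_len characters starting here.
        for end in range(start + 1, min(start + max_len, len(sentence)) + 1):
            piece = sentence[start:end]
            if piece in dictionary:
                words.append(piece)
    return words


print(segment_all("我是中国人"))  # ['我', '是', '中国', '中国人', '国人', '人']
```

Each emitted word would become a keyword in the inverted index, so a search for 中国 or 中国人 both reach this sentence.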

3, Installing Elasticsearch with Docker

Pull the image over the network:

docker image pull delron/elasticsearch-ik:2.4.6-1.0

Or load an image file you have already downloaded:

docker load -i elasticsearch-ik-2.4.6_docker.tar

Edit line 54 of the Elasticsearch configuration file elasticsearch-2.4.6/config/elasticsearch.yml and set network.host to the local machine's IP address:

network.host: 127.0.0.1

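Create and run the Docker container (the image name must match the one pulled or loaded above):

docker run -dti --network=host --name=elasticsearch -v /home/python/elasticsearch-2.4.6/config:/usr/share/elasticsearch/config delron/elasticsearch-ik:2.4.6-1.0

Once the container is running, a request to http://127.0.0.1:9200/ returns a JSON message, which indicates that the service has started successfully.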
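Elasticsearch answers GET / on port 9200 with a small JSON document that includes the cluster name and the version number. Below is a minimal stdlib check, assuming the default address used in this article; check_elasticsearch and parse_es_info are hypothetical helper names, and the sample body only illustrates the shape of the response.

```python
import json
from urllib.request import urlopen


def parse_es_info(body):
    """Extract (cluster_name, version) from the JSON returned by GET /."""
    info = json.loads(body)
    return info["cluster_name"], info["version"]["number"]


def check_elasticsearch(url="http://127.0.0.1:9200/", timeout=3):
    """Return (cluster_name, version), or None if the server cannot be reached."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return parse_es_info(resp.read().decode("utf-8"))
    except OSError:  # URLError, refused connections and timeouts are all OSError subclasses
        return None


# Shape of the response body (values are illustrative).
sample = (
    '{"name": "node-1", "cluster_name": "elasticsearch",'
    ' "version": {"number": "2.4.6"}, "tagline": "You Know, for Search"}'
)
print(parse_es_info(sample))  # ('elasticsearch', '2.4.6')
```

Calling check_elasticsearch() on the machine running the container should return the cluster name and "2.4.6"; None means the service is not reachable at that address.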

4, Indexing with the Haystack extension

1. Haystack: introduction, installation and configuration

1.1 Introduction to Haystack

  • Haystack is a framework for connecting Django to search engines; it acts as a bridge between the application and the search engine.
  • In Django, we can call the Elasticsearch search engine through Haystack.
  • Haystack can switch between search backends (such as Elasticsearch, Whoosh and Solr) without changes to the application code.
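As a sketch of the backend-swapping point, the settings below assume the Whoosh backend and a stand-in BASE_DIR, neither of which this project uses. Moving from Elasticsearch to Whoosh only changes HAYSTACK_CONNECTIONS; the index classes and templates stay the same.

```python
from pathlib import Path

# Stand-in for the project's BASE_DIR setting (assumption for this sketch).
BASE_DIR = Path(".").resolve()

# Elasticsearch backend, as configured in this article.
ELASTICSEARCH_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
        "URL": "http://127.0.0.1:9200/",
        "INDEX_NAME": "xxshopping",
    },
}

# Whoosh backend: a pure-Python engine that keeps its index on disk
# (requires `pip install whoosh`).
WHOOSH_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        "PATH": str(BASE_DIR / "whoosh_index"),
    },
}

# Only this assignment changes; search_indexes.py, templates and views stay as they are.
HAYSTACK_CONNECTIONS = WHOOSH_CONNECTIONS
```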

1.2 Installing Haystack

pip install django-haystack
pip install elasticsearch==2.4.6

1.3 Registering the Haystack app and configuring routing

Add the app to INSTALLED_APPS in settings.py, then configure Haystack's connection to Elasticsearch:

INSTALLED_APPS = [
    # ... existing apps ...
    'haystack',  # Full-text search
]

# Haystack
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',  # IP address of the server running Elasticsearch; 9200 is its default HTTP port
        'INDEX_NAME': 'xxshopping',  # Name of the index Haystack creates in Elasticsearch
    },
}
 
# Update the index automatically whenever data is added, modified or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
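RealtimeSignalProcessor connects to Django's post_save and post_delete signals and updates the index for the object that changed. The plain-Python model below mimics that wiring so the flow is visible without a Django project; Signal and InMemoryIndex are hypothetical stand-ins, not Django or Haystack classes.

```python
class Signal:
    """Minimal stand-in for a Django signal: a list of receiver callbacks."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, instance):
        # Call every connected receiver with the affected object.
        for receiver in self.receivers:
            receiver(instance)


post_save = Signal()
post_delete = Signal()


class InMemoryIndex:
    """Hypothetical search index: object id -> indexed text."""

    def __init__(self):
        self.docs = {}

    def handle_save(self, instance):
        # Created or modified: (re)index the object.
        self.docs[instance["id"]] = instance["name"]

    def handle_delete(self, instance):
        # Deleted: drop the object from the index.
        self.docs.pop(instance["id"], None)


index = InMemoryIndex()
post_save.connect(index.handle_save)
post_delete.connect(index.handle_delete)

sku = {"id": 1, "name": "Apple iPhone 8 Plus"}
post_save.send(sku)  # like sku.save() on a new object
post_save.send({**sku, "name": "Apple iPhone 8 Plus (Gold)"})  # like saving an edit
print(index.docs)  # {1: 'Apple iPhone 8 Plus (Gold)'}
post_delete.send(sku)  # like sku.delete()
print(index.docs)  # {}
```

The trade-off is that every save now also waits on the search backend. Haystack's default BaseSignalProcessor connects nothing, leaving index updates to manual runs of update_index or rebuild_index.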

Create search_indexes.py in the goods app directory:

from haystack import indexes

from apps.goods.models import SKU


class SKUIndex(indexes.SearchIndex, indexes.Indexable):
    # Every SearchIndex needs exactly one field with document=True.
    # It tells Haystack and the search engine which field is the primary field to search in.

    # use_template=True builds the document from a data template
    # (rather than error-prone string concatenation); here the
    # template combines name, caption and id.

    # By convention this field is named text.
    text = indexes.CharField(document=True, use_template=True)


    def get_model(self):
        # The model this index covers
        return SKU

    def index_queryset(self, using=None):
        # Which records get indexed: only SKUs that are on sale
        return self.get_model().objects.filter(is_launched=True)
        # return self.get_model().objects.all()
        # return SKU.objects.all()
        # pass

# class SPUIndex(indexes.SearchIndex, indexes.Indexable):
#     # Each SearchIndex needs to have a (and only) field document=True.
#     # This indicates to Haystack and the search engine which field is the primary field to search in.
#
#     # The Convention is to name this field text
#     text = indexes.CharField(document=True, use_template=True)

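In the templates directory, create the data template search/indexes/goods/sku_text.txt (Haystack's naming convention is search/indexes/<app_label>/<model_name>_text.txt):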

{# Here we specify which fields of the model are indexed #}
{# object is the SKU instance being indexed #}

{{ object.name }}
{{ object.caption }}
{{ object.id }}
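With use_template=True, Haystack renders this template for every SKU and stores the result in the text field, so a query matching any word of the name, caption or id finds that SKU. Below is a rough approximation of the rendered document, assuming a hypothetical SKU dataclass in place of the real model and ignoring template whitespace.

```python
from dataclasses import dataclass


@dataclass
class SKU:
    """Hypothetical stand-in for apps.goods.models.SKU with the indexed fields."""

    id: int
    name: str
    caption: str


def render_document(obj):
    """Mimic sku_text.txt: one field per line, joined into the text field."""
    return "\n".join(str(value) for value in (obj.name, obj.caption, obj.id))


sku = SKU(id=3, name="Apple iPhone 8 Plus", caption="Dual cameras, wireless charging")
print(render_document(sku))
```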

In the project-level urls.py, add the following entry to urlpatterns (include and re_path are imported from django.urls):

re_path(r'^search/', include('haystack.urls')),

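Create the results template templates/search/search.html, which Haystack's default search view renders with the page, paginator and query variables used below (the template uses Jinja2 syntax, with static() and url() called as functions):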

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
	<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
	<title>Xiaoxu Mall - Product Search</title>
    <link rel="stylesheet" type="text/css" href="{{ static('css/jquery.pagination.css') }}">
    <link rel="stylesheet" type="text/css" href="{{ static('css/reset.css') }}">
	<link rel="stylesheet" type="text/css" href="{{ static('css/main.css') }}">
    <script type="text/javascript" src="{{ static('js/jquery-1.12.4.min.js') }}"></script>
	<script type="text/javascript" src="{{ static('js/vue-2.5.16.js') }}"></script>
    <script type="text/javascript" src="{{ static('js/axios-0.18.0.min.js') }}"></script>
</head>
<body>
    <div id="app">
	<div class="header_con">
		<div class="header" v-cloak>
			<div class="welcome fl">Welcome to Xiaoxu mall!</div>
			<div class="fr">
                <div v-if="username" class="login_btn fl">
                    Welcome:<em>[[ username ]]</em>
                    <span>|</span>
                    <a href="#"> Log out</a>
                </div>
                <div v-else class="login_btn fl">
                    <a href="#"> Log in</a>
                    <span>|</span>
                    <a href="#"> Register</a>
                </div>
				<div class="user_link fl">
					<span>|</span>
					<a href="#"> User Center</a>
					<span>|</span>
					<a href="#"> My shopping cart</a>
					<span>|</span>
					<a href="#"> My orders</a>
				</div>
			</div>
		</div>
	</div>
	<div class="search_bar clearfix">
		<a href="{{ url('contents:index') }}" class="logo fl"><img src="{{ static('images/logo.png') }}"></a>
		<div class="search_wrap fl">
			<form method="get" action="/search/" class="search_con">
                <input type="text" class="input_text fl" name="q" placeholder="Search for products">
                <input type="submit" class="input_btn fr" name="" value="search">
            </form>
			<ul class="search_suggest fl">
				<li><a href="#"> Sony mirrorless cameras</a></li>
				<li><a href="#"> 15 yuan discount</a></li>
				<li><a href="#"> Beauty care</a></li>
				<li><a href="#"> Buy 2, get 1 free</a></li>
			</ul>
		</div>
	</div>
    <div class="main_wrap clearfix">
        <div class=" clearfix">
            <ul class="goods_type_list clearfix">
                {% for result in page %}
                <li>
                    {# result.object is the SKU instance #}
                    <a href="#"><img src="{{ result.object.default_image.url }}"></a>
                    <h4><a href="#">{{ result.object.name }}</a></h4>
                    <div class="operate">
                        <span class="price">¥{{ result.object.price }}</span>
                        <span>{{ result.object.comments }} reviews</span>
                    </div>
                </li>
                {% else %}
                    <p>No matching products were found.</p>
                {% endfor %}
            </ul>
            <div class="pagenation">
                <div id="pagination" class="page"></div>
            </div>
        </div>
    </div>
	<div class="footer">
		<div class="foot_link">
			<a href="#"> about us</a>
			<span>|</span>
			<a href="#"> contact us</a>
			<span>|</span>
			<a href="#"> Careers</a>
			<span>|</span>
			<a href="#"> links</a>
		</div>
		<p>Copyright © 2016 Xiao Xu. All Rights Reserved.</p>
		<p>Tel: 010-****888    Beijing ICP filing No. *******8</p>
	</div>
    </div>
    <script type="text/javascript" src="{{ static('js/common.js') }}"></script>
    <script type="text/javascript" src="{{ static('js/search.js') }}"></script>
    <script type="text/javascript" src="{{ static('js/jquery.pagination.min.js') }}"></script>
    <script type="text/javascript">
        $(function () {
            $('#pagination').pagination({
                currentPage: {{ page.number }},
                totalPage: {{ paginator.num_pages }},
                callback:function (current) {
                    // Keep the current (URL-encoded) query when switching pages
                    window.location.href = '/search/?q={{ query|urlencode }}&page=' + current;
                }
            })
        });
    </script>
</body>
</html>

Finally, build the index:

python manage.py rebuild_index    

Enter y when prompted to confirm (the --noinput option skips the prompt).

At this point, the SKU data from the database has been indexed in Elasticsearch.

1.4 Testing

Visit /search/?q=<keyword>, for example /search/?q=iphone, to run a query and see the results page.
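The results page takes the keyword in q and the page number in page, the same parameters the pagination callback builds. When constructing such URLs in Python (for example in tests), encoding the query keeps spaces, ampersands and Chinese keywords from breaking the URL; search_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def search_url(query, page=1):
    """Build the Haystack search URL with a properly encoded query string."""
    return "/search/?" + urlencode({"q": query, "page": page})


print(search_url("iphone"))       # /search/?q=iphone&page=1
print(search_url("iphone 8", 2))  # /search/?q=iphone+8&page=2
```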

5, Custom page access

1. Create index class

2. Create serializer

3. Build the index

python manage.py rebuild_index

Enter y when prompted.

4. Create a view

5. Create a serializer for the index

6. Register the view in the app's URL configuration

The last step is to set up the front-end search HTML page and load the corresponding JS file.
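As a framework-free sketch of what the custom view from these steps might return (matching records plus paging metadata, serialized to JSON), the code below assumes hypothetical SKUS data and a search_api function in place of the real index, serializer and view. It is not Haystack's or Django REST framework's API.

```python
import json
import math

# Hypothetical serialized SKU records (what a serializer might expose).
SKUS = [
    {"id": 1, "name": "Apple iPhone 8 Plus", "price": "6499.00"},
    {"id": 2, "name": "Apple iPhone 8", "price": "5499.00"},
    {"id": 3, "name": "Huawei Mate 10", "price": "3999.00"},
]


def search_api(query, page=1, page_size=1):
    """Return one page of matching records as a JSON-serializable dict."""
    terms = query.lower().split()
    # Keep records whose name contains every query term.
    matches = [sku for sku in SKUS if all(t in sku["name"].lower() for t in terms)]
    num_pages = max(1, math.ceil(len(matches) / page_size))
    page = min(max(1, page), num_pages)  # clamp out-of-range page numbers
    start = (page - 1) * page_size
    return {
        "query": query,
        "page": page,
        "num_pages": num_pages,
        "count": len(matches),
        "results": matches[start:start + page_size],
    }


print(json.dumps(search_api("iphone", page=2)))
```

Returning num_pages alongside the results gives the front-end pagination plugin what it needs (totalPage and currentPage) without a second request.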