Haystack
1. What is Haystack
Haystack is an open-source full-text search framework for Django. It supports the Solr, Elasticsearch, Whoosh, and Xapian search engines. Its backend is pluggable (much like Django's database layer), so almost all the code you write can be switched easily between search engines.
- Full-text retrieval is different from a fuzzy query against specific fields. It is more efficient and, combined with a word segmenter, can handle Chinese text.
- haystack: a Django package that makes it easy to index and search the content of your models. It is a full-text retrieval framework designed to support four engine backends: Whoosh, Solr, Xapian, and Elasticsearch.
- whoosh: a full-text search engine written in pure Python. Its performance is not as good as Sphinx, Xapian, or Elasticsearch, but it needs no binary packages, so the program will not crash inexplicably. Whoosh is enough for small sites.
- jieba: a free Chinese word segmentation package. If it does not meet your needs, paid products are available.
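To make the efficiency claim above concrete, here is a minimal sketch in plain Python. It is not how Haystack or Whoosh are implemented; the documents and function names are made up for illustration. It contrasts scanning every document (similar in spirit to a fuzzy query on a field) with a lookup in a prebuilt inverted index.

```python
from collections import defaultdict

# Toy documents standing in for rows of a model (hypothetical data).
DOCS = {
    1: "django haystack search tutorial",
    2: "whoosh is a pure python search engine",
    3: "jieba segments chinese text",
}


def fuzzy_scan(docs, term):
    """Field-style fuzzy query: inspect every document for the substring."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text)


def build_inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Whitespace splitting works for English; Chinese needs a segmenter.
        for word in text.split():
            index[word].add(doc_id)
    return index


def indexed_search(index, term):
    """Full-text style lookup: one dictionary access, no scan over documents."""
    return sorted(index.get(term, set()))


INDEX = build_inverted_index(DOCS)
print(fuzzy_scan(DOCS, "search"))       # walks all three documents
print(indexed_search(INDEX, "search"))  # single lookup in the index
```

The index costs time and space to build, which is why a separate indexing step is needed before searching.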
2. Installation
    pip install django-haystack
    pip install whoosh
    pip install jieba
3. Configuration
### Add Haystack to INSTALLED_APPS
Like most Django applications, you should add Haystack to INSTALLED_APPS in your settings file (usually settings.py). Example:

    INSTALLED_APPS = [
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.sites',

        # Add Haystack
        'haystack',

        # Your app
        'blog',
    ]
### Modify settings.py
In your settings.py, you need to add a setting that indicates which backend the site uses, along with any other backend settings. HAYSTACK_CONNECTIONS is a required setting and should look like one of the following:
Solr example
    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
            'URL': 'http://127.0.0.1:8983/solr',
            # ...or for multicore...
            # 'URL': 'http://127.0.0.1:8983/solr/mysite',
        },
    }
Elasticsearch example
    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
            'URL': 'http://127.0.0.1:9200/',
            'INDEX_NAME': 'haystack',
        },
    }
Whoosh example
    # You need to set PATH to the file system location of your Whoosh index
    import os

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
            'PATH': os.path.join(os.path.dirname(__file__), 'whoosh_index'),
        },
    }

    # Update the index automatically whenever a model is saved or deleted
    HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
Xapian example
    # First install the Xapian backend (http://github.com/notanumber/xapian-haystack/tree/master)
    # You need to set PATH to the file system location of your Xapian index.
    import os

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'xapian_backend.XapianEngine',
            'PATH': os.path.join(os.path.dirname(__file__), 'xapian_index'),
        },
    }
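Because only this one setting differs between backends, switching engines can be reduced to choosing a dictionary. The following is a rough sketch of that idea, not part of Haystack: the helper `build_haystack_connections` and the `SEARCH_BACKEND` environment variable are hypothetical names, while the engine paths and URLs are copied from the examples above.

```python
import os

# Engine paths copied from the configuration examples above.
ENGINES = {
    "solr": "haystack.backends.solr_backend.SolrEngine",
    "elasticsearch": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
    "whoosh": "haystack.backends.whoosh_backend.WhooshEngine",
    "xapian": "xapian_backend.XapianEngine",
}


def build_haystack_connections(backend, base_dir):
    """Return a HAYSTACK_CONNECTIONS dict for one backend (hypothetical helper)."""
    conn = {"ENGINE": ENGINES[backend]}
    if backend in ("whoosh", "xapian"):
        # File-based engines need the location of the index on disk.
        conn["PATH"] = os.path.join(base_dir, backend + "_index")
    elif backend == "solr":
        conn["URL"] = "http://127.0.0.1:8983/solr"
    else:
        conn["URL"] = "http://127.0.0.1:9200/"
        conn["INDEX_NAME"] = "haystack"
    return {"default": conn}


# SEARCH_BACKEND is an assumed variable name; Whoosh is used when it is unset.
# In settings.py the base directory would usually be os.path.dirname(__file__).
HAYSTACK_CONNECTIONS = build_haystack_connections(
    os.environ.get("SEARCH_BACKEND", "whoosh"), os.getcwd()
)
print(HAYSTACK_CONNECTIONS["default"]["ENGINE"])
```

After changing the backend, the index has to be rebuilt, since each engine stores its data in its own format.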
4. Data processing
Create index
If you want full-text search for an app such as blog, you must create a search_indexes.py file in the blog directory. The file name cannot be changed.

    from haystack import indexes
    from app01.models import Article


    class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
        # Name the class <model name> + "Index"; Article is indexed here, hence ArticleIndex.
        # The primary document field, built from a data template
        text = indexes.CharField(document=True, use_template=True)
        # Other fields
        desc = indexes.CharField(model_attr='desc')
        content = indexes.CharField(model_attr='content')

        def get_model(self):
            # Required: return the model class that this index covers
            return Article

        def index_queryset(self, using=None):
            # Objects to include when the whole index is built
            return self.get_model().objects.all()
Why create an index? An index is like a book's table of contents: it gives readers faster navigation and lookup. The same is true here. When the amount of data is very large, scanning all of it for every record that meets the search conditions puts a heavy burden on the server. Therefore, we add an index (a table of contents) for the specified data. Here, we create an index for Article. We do not need to care about how the index is implemented. The paragraphs below explain which fields are indexed and how to specify them.
Each index must have exactly one field with document=True. This tells Haystack and the search engine to use the content of that field as the primary field for retrieval. The other fields are only auxiliary attributes: they are convenient to access but are not used as retrieval data.
Note: by convention, the field with document=True is named text. Using the same name in every index class, as ArticleIndex does, avoids confusing the backend. You can choose another name, but this is not recommended.
In addition, we set use_template=True on the text field. This lets us use a data template (rather than error-prone concatenation) to build the document that the search engine indexes. You should create a new template, search/indexes/blog/article_text.txt, in your template directory and put the following contents in it.
Create a file named "<model class name>_text.txt" under the directory "templates/search/indexes/<application name>/":

    {{ object.title }}
    {{ object.desc }}
    {{ object.content }}

This data template indexes three fields of Article: title, desc, and content. When searching, full-text matching is done against these three fields.
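The effect of the data template can be pictured with a small stand-alone sketch. This is not Haystack code: the `Article` dataclass below is only a stand-in for the Django model, and `render_document` is a hypothetical function that imitates what rendering the three-line template produces, namely one block of text that becomes the content of the `text` field.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Stand-in for the Django model; field names follow the template above."""
    title: str
    desc: str
    content: str


def render_document(obj):
    """Imitate the data template: one field per line, joined into one string."""
    return "\n".join([obj.title, obj.desc, obj.content])


article = Article(
    title="Haystack basics",
    desc="Search for Django",
    content="Build an index first.",
)
document = render_document(article)
print(document)
```

Since all three fields end up in the same document, a query matches an article no matter which of the three fields contains the term.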
5. Set view
Add SearchView to your URLconf
Add the following line to your URLconf:
    url(r'^search/', include('haystack.urls')),

This pulls in Haystack's default URLconf, which consists of a single URL pattern pointing to a SearchView instance. You can change the behavior of this class by passing it a few keyword arguments, or replace it entirely with your own view.
Search template
Your search template (search/search.html by default) will probably be very simple. The following is enough to get your search running (your template and block names will probably differ):
    <!DOCTYPE html>
    <html>
    <head>
        <title></title>
        <style>
            span.highlighted {
                color: red;
            }
        </style>
    </head>
    <body>
    {% load highlight %}
    {% if query %}
        <h3>The search results are as follows:</h3>
        {% for result in page.object_list %}
            {# <a href="/{{ result.object.id }}/">{{ result.object.title }}</a><br/> #}
            <a href="/{{ result.object.id }}/">{% highlight result.object.title with query max_length 2 %}</a><br/>
            <p>{{ result.object.content|safe }}</p>
            <p>{% highlight result.content with query %}</p>
        {% empty %}
            <p>Nothing</p>
        {% endfor %}

        {% if page.has_previous or page.has_next %}
            <div>
                {% if page.has_previous %}<a href="?q={{ query }}&amp;page={{ page.previous_page_number }}">{% endif %}« previous page{% if page.has_previous %}</a>{% endif %}
                |
                {% if page.has_next %}<a href="?q={{ query }}&amp;page={{ page.next_page_number }}">{% endif %}next page »{% if page.has_next %}</a>{% endif %}
            </div>
        {% endif %}
    {% endif %}
    </body>
    </html>
Note that page.object_list is actually a list of SearchResult objects. These objects hold all the data returned from the index, and the underlying model instance can be accessed through {{ result.object }}. So {{ result.object.title }} actually uses the Article object from the database to read the title field.
Rebuild index
Now that you've configured everything, it's time to put the data in the database into the index. Haystack comes with a command-line management tool that makes it easy.
Simply run ./manage.py rebuild_index. You will get statistics on how many models were processed and placed in the index.
6. Use jieba word segmentation
Create a ChineseAnalyzer.py file and save it in the backends folder of the Haystack installation, for example "D:\Python3\Lib\site-packages\haystack\backends":

    import jieba
    from whoosh.analysis import Tokenizer, Token


    class ChineseTokenizer(Tokenizer):
        def __call__(self, value, positions=False, chars=False,
                     keeporiginal=False, removestops=True,
                     start_pos=0, start_char=0, mode='', **kwargs):
            t = Token(positions, chars, removestops=removestops, mode=mode, **kwargs)
            # cut_all=True yields every word jieba can find, including overlapping ones
            seglist = jieba.cut(value, cut_all=True)
            for w in seglist:
                t.original = t.text = w
                t.boost = 1.0
                if positions:
                    t.pos = start_pos + value.find(w)
                if chars:
                    t.startchar = start_char + value.find(w)
                    t.endchar = start_char + value.find(w) + len(w)
                yield t


    def ChineseAnalyzer():
        return ChineseTokenizer()

Copy the whoosh_backend.py file in the same folder and rename the copy whoosh_cn_backend.py. Note: the copied file name may end with a space; remember to delete it. In whoosh_cn_backend.py, add this import:

    from .ChineseAnalyzer import ChineseAnalyzer

Then find analyzer=StemmingAnalyzer() and change it to analyzer=ChineseAnalyzer(). Finally, set ENGINE in HAYSTACK_CONNECTIONS to 'haystack.backends.whoosh_cn_backend.WhooshEngine' so that the modified backend is used.
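One detail of the tokenizer is worth a closer look: value.find(w) always returns the first occurrence of a word, so a word that appears twice is reported at the same position both times. The sketch below illustrates this with plain Python and no jieba. The function names are hypothetical, and it assumes the tokens arrive in order and do not overlap, which does not hold for cut_all=True, so treat it as an illustration of the pitfall rather than a drop-in fix.

```python
def naive_offsets(text, tokens):
    """Offsets computed with text.find(token): always the first occurrence."""
    return [(tok, text.find(tok)) for tok in tokens]


def running_offsets(text, tokens):
    """Offsets found by searching forward from the end of the previous token."""
    offsets = []
    cursor = 0
    for tok in tokens:
        start = text.find(tok, cursor)
        if start == -1:
            raise ValueError(f"token {tok!r} not found after position {cursor}")
        offsets.append((tok, start))
        cursor = start + len(tok)
    return offsets


text = "to be or not to be"
tokens = text.split()  # stand-in for a segmenter's ordered, non-overlapping output
print(naive_offsets(text, tokens))
print(running_offsets(text, tokens))
```

Wrong offsets mainly matter for features that depend on token positions, such as phrase queries and highlighting.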
7. Create a search bar in the template
    <form method='get' action="/search/" target="_blank">
        <input type="text" name="q">
        <input type="submit" value="query">
    </form>
8. Other configurations
Add more variables
    from haystack.views import SearchView
    from .models import Topic


    class MySearchView(SearchView):
        def extra_context(self):
            # Override extra_context to add additional context content
            context = super(MySearchView, self).extra_context()
            side_list = Topic.objects.filter(kind='major').order_by('add_date')[:8]
            context['side_list'] = side_list
            return context

    # Route modification (in urls.py; search_views is the module that defines MySearchView)
    url(r'^search/', search_views.MySearchView(), name='haystack_search'),
Highlight
    {% highlight result.summary with query %}

To limit the length of result.summary after highlighting, pass max_length:

    {% highlight result.summary with query max_length 40 %}

Add the style for the highlighted words to your HTML:

    <style>
        span.highlighted {
            color: red;
        }
    </style>
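As a rough picture of what highlighting produces, the sketch below wraps each query word in a span with the highlighted class, which is the class the style rule above targets. It is a simplified stand-in written for this article, not Haystack's highlighter: the function `highlight` and its parameters are hypothetical, and max_length here simply truncates the text from the start.

```python
import html
import re


def highlight(text, query, max_length=None, css_class="highlighted"):
    """Wrap every query word found in text in a <span>; optionally truncate first."""
    if max_length is not None:
        text = text[:max_length]
    words = [re.escape(w) for w in query.split()]
    if not words:
        return html.escape(text)
    # One capturing group: re.split then returns matches at the odd indexes.
    pattern = re.compile("(" + "|".join(words) + ")", re.IGNORECASE)
    pieces = []
    for i, part in enumerate(pattern.split(text)):
        escaped = html.escape(part)  # escape each piece so markup stays valid
        if i % 2:
            pieces.append(f'<span class="{css_class}">{escaped}</span>')
        else:
            pieces.append(escaped)
    return "".join(pieces)


print(highlight("Haystack makes Django search easy", "django search", max_length=40))
```

The text is split before it is escaped so that a query word can never match inside an HTML entity such as &amp;lt;.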