Introduction
Sphinx is an open source full-text search engine designed to search large data sets efficiently. The data to be indexed can come from many different sources: SQL databases, plain text files, HTML files, mailboxes, and so on.
Some of Sphinx's main features include:
- High indexing and search performance
- Advanced indexing and query tools
- Advanced result set postprocessing
- Proven scalability up to billions of documents, terabytes of data and thousands of queries per second
- Easy integration with SQL and XML data sources and SphinxQL, SphinxAPI or SphinxSE search interfaces
- Easy scaling with distributed searches
In this tutorial, we will set up Sphinx with a MySQL server using the sample SQL file included in the distribution package. It will give you the basics of how to use Sphinx for your own project.
Prerequisites
Before starting this guide, you need:
- One CentOS 7 server. If you do not have one, I recommend using the free Tencent Cloud developer lab to practice the installation before buying a server.
- A non-root user with sudo privileges.
- MySQL installed on your server. For a production environment, I suggest using a managed cloud relational database instead, which lets you deploy, manage, and scale a relational database in the cloud as a safe, reliable, on-demand service. Tencent Cloud's relational database offers the MySQL, SQL Server, MariaDB, and PostgreSQL engines and tunes the performance of each.
Step 1 - install Sphinx
You can find the latest version on the Sphinx website.
Before installing Sphinx, you first need to install its dependencies.
sudo yum install -y postgresql-libs unixODBC
Move to the /tmp directory so that the Sphinx files are downloaded to an out-of-the-way place.
cd /tmp
Use wget to download the latest Sphinx version.
wget http://sphinxsearch.com/files/sphinx-2.2.11-1.rhel7.x86_64.rpm
Finally, install the downloaded package using yum.
sudo yum install -y sphinx-2.2.11-1.rhel7.x86_64.rpm
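As a quick sanity check after the install, a sketch like the one below can report which expected programs are not found on the PATH. The helper name missing_commands is made up for illustration; indexer and searchd are the Sphinx programs used later in this tutorial. Depending on how PATH is set for your user, a miss is a hint rather than proof that the install failed.

```shell
# Hypothetical helper: print each given command name that is not found on PATH.
missing_commands() {
  for cmd in "$@"; do
    # command -v exits non-zero when the name cannot be resolved
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Prints nothing when both Sphinx programs can be found.
missing_commands indexer searchd
```

Empty output means both programs were found.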
You have now successfully installed Sphinx on the server. Before starting the Sphinx daemon, let's configure it.
Step 2 - create test database
Here, we will use the sample data in the SQL file provided with the package to set up the database. This will allow us to test whether Sphinx search will work later.
Let's import the sample SQL file into the database. First, log in to the MySQL server shell.
mysql -u root -p
Enter the password of the MySQL root user when asked. Your prompt will change to MariaDB>.
Create a dummy database. Here we call it test, but you can name it whatever you like.
CREATE DATABASE test;
Import the sample SQL file.
SOURCE /usr/share/doc/sphinx-2.2.11/example.sql;
Then leave the MySQL shell.
quit
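To confirm the import without reopening the interactive shell, a check along these lines can count the rows in the sample table. It assumes the sample file created a documents table in the test database, which is the table the Sphinx source query reads from in the next step. The function name and the MYSQL_BIN override are illustrative only.

```shell
# Hypothetical helper: count rows in test.documents with the mysql client.
# MYSQL_BIN is an illustrative override so the client binary can be swapped out.
count_documents() {
  # -N suppresses the column header, -e runs one statement and exits
  "${MYSQL_BIN:-mysql}" -u root -p -N -e 'SELECT COUNT(*) FROM test.documents'
}

# Usage (prompts for the MySQL root password):
# count_documents
```

A non-zero count suggests the sample data was loaded.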
Now you have a database populated with sample data. Next, we will customize the configuration of Sphinx.
Step 3 - configure Sphinx
Sphinx's configuration should be in a file called sphinx.conf in /etc/sphinx. The configuration consists of three main blocks: index, searchd, and source.
A minimal configuration is already provided, but we will supply a new sample configuration file for you to use and explain each section so that you can customize it later.
First, move the existing sphinx.conf file out of the way.
sudo mv /etc/sphinx/sphinx.conf /etc/sphinx/sphinx.conf2
Use vi or your favorite text editor to create a new sphinx.conf file.
sudo vi /etc/sphinx/sphinx.conf
The source, index, and searchd blocks are described in turn below. Then, at the end of this step, the entire contents of sphinx.conf are included for you to copy and paste into the file.
The source block contains the type of the source and the username and password for the MySQL server. The first column of sql_query should be a unique id. The SQL query runs on every indexing pass and dumps the data to the Sphinx index file. Below are descriptions of each field and the source block itself.
- type: the type of data source to index. In our example, this is mysql. Other supported types include pgsql, mssql, xmlpipe2, odbc, etc.
- sql_host: the host name of the MySQL host. In our example, this is localhost. This can be a domain or an IP address.
- sql_user: the user name for the MySQL login. In our example, this is root.
- sql_pass: the password of the MySQL user. In our example, this is the password of the root MySQL user.
- sql_db: the name of the database where the data is stored. In our example, this is test.
- sql_query: the query that dumps data from the database to the index.
This is the source block:
source src1
{
  type = mysql

  # SQL settings (for 'mysql' and 'pgsql' types)
  sql_host = localhost
  sql_user = root
  sql_pass = password
  sql_db = test
  sql_port = 3306 # optional, default is 3306

  sql_query = \
  SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
  FROM documents

  sql_attr_uint = group_id
  sql_attr_timestamp = date_added
}
The index block contains the source and the path where the data is stored.
- source: the name of the source block. In our example, this is src1.
- path: the path where the index is saved.
index test1
{
  source = src1
  path = /var/lib/sphinx/test1
  docinfo = extern
}
The searchd block contains the ports and other variables used to run the Sphinx daemon.
- listen: the port on which the Sphinx daemon will run, followed by the protocol. In our example, this is 9306:mysql41. Known protocols are sphinx (SphinxAPI) and mysql41 (SphinxQL).
- query_log: the path where the query log is saved.
- pid_file: the path to the PID file of the Sphinx daemon.
- seamless_rotate: prevents searchd from stalling while it rotates indexes that have large amounts of data to precache.
- preopen_indexes: whether to force all indexes to be opened in advance at startup.
- unlink_old: whether to delete the old index copies on successful rotation.
searchd
{
  listen = 9312:sphinx # SphinxAPI port
  listen = 9306:mysql41 # SphinxQL port
  log = /var/log/sphinx/searchd.log
  query_log = /var/log/sphinx/query.log
  read_timeout = 5
  max_children = 30
  pid_file = /var/run/sphinx/searchd.pid
  seamless_rotate = 1
  preopen_indexes = 1
  unlink_old = 1
  binlog_path = /var/lib/sphinx/
}
The complete configuration to copy and paste is as follows. The only variable you need to change is sql_pass in the source block, shown here as your_root_mysql_password.
source src1
{
  type = mysql

  sql_host = localhost
  sql_user = root
  sql_pass = your_root_mysql_password
  sql_db = test
  sql_port = 3306

  sql_query = \
  SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
  FROM documents

  sql_attr_uint = group_id
  sql_attr_timestamp = date_added
}

index test1
{
  source = src1
  path = /var/lib/sphinx/test1
  docinfo = extern
}

searchd
{
  listen = 9306:mysql41
  log = /var/log/sphinx/searchd.log
  query_log = /var/log/sphinx/query.log
  read_timeout = 5
  max_children = 30
  pid_file = /var/run/sphinx/searchd.pid
  seamless_rotate = 1
  preopen_indexes = 1
  unlink_old = 1
  binlog_path = /var/lib/sphinx/
}
To explore more configuration options, see the /usr/share/doc/sphinx-2.2.11/sphinx.conf.dist file, which describes all variables in detail.
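Since the configuration depends on all three blocks being present, a rough text check like the following can catch a block that was lost while pasting. This is only a sketch: the helper name missing_conf_blocks is hypothetical, and it is a plain line match, not a real parser of the Sphinx configuration syntax.

```shell
# Hypothetical helper: print the top-level blocks (source, index, searchd)
# that have no header line in the given config file.
missing_conf_blocks() {
  for block in source index searchd; do
    # A header is the keyword at the start of a line with no "=" after it,
    # which keeps "source = src1" inside the index block from matching.
    grep -Eq "^[[:space:]]*${block}([[:space:]{][^=]*)?\$" "$1" || printf '%s\n' "$block"
  done
}

# Usage:
# missing_conf_blocks /etc/sphinx/sphinx.conf
```

Empty output means a header line was found for each of the three blocks.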
Step 4 - manage indexes
In this step, we will add data to the Sphinx index and ensure that the index remains up-to-date through cron.
First, add data to the index using the configuration we created earlier.
sudo indexer --all
You should get something like this.
Sphinx 2.2.11-id64-release (95ae9a6)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinx/sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.006 sec, 29765 bytes/sec, 616.90 docs/sec
total 4 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
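If you want to use the document count in a script, for example to notice an unexpectedly empty index, one possible approach is to pull it out of the summary line. The sketch below assumes the "total N docs, M bytes" line format shown in the sample output above; the helper name is hypothetical.

```shell
# Hypothetical helper: read indexer output on stdin and print the document
# count taken from each "total N docs, M bytes" summary line.
indexed_doc_count() {
  sed -n 's/^total \([0-9][0-9]*\) docs, [0-9][0-9]* bytes.*$/\1/p'
}

# Usage:
# sudo indexer --all | indexed_doc_count
```

With several indexes configured, this would print one number per summary line.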
In a production environment, the index must be kept up to date. To do this, let's create a cron job. First, open crontab.
crontab -e
The following cron job will run every hour and add new data to the index using the configuration file we created earlier. Copy and paste it at the end of the file, then save and close the file.
@hourly /usr/bin/indexer --rotate --config /etc/sphinx/sphinx.conf --all
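If an indexing run could ever take longer than an hour, two runs might overlap. One way to guard against that is a small wrapper that uses a lock directory, sketched below. The function name and the lock path are assumptions for illustration, and a lock left behind by a killed run would have to be removed by hand.

```shell
# Hypothetical wrapper: run a command only if no other run holds the lock.
# mkdir either creates the directory or fails, so it works as a simple lock.
run_exclusive() {
  lock_dir=$1
  shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "lock $lock_dir is held, skipping this run" >&2
    return 1
  fi
  "$@"
  status=$?
  # release the lock and report the wrapped command's exit status
  rmdir "$lock_dir"
  return "$status"
}

# Usage inside a script that cron calls (lock path is an example):
# run_exclusive /tmp/sphinx-indexer.lock \
#   /usr/bin/indexer --rotate --config /etc/sphinx/sphinx.conf --all
```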
Now that Sphinx is fully set up and configured, we can start the service and try it out.
Step 5 - start Sphinx
Start the Sphinx daemon using systemctl.
sudo systemctl start searchd
To check that the Sphinx daemon is running correctly, run:
sudo systemctl status searchd
You should get something like this.
● searchd.service - SphinxSearch Search Engine
   Loaded: loaded (/usr/lib/systemd/system/searchd.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-08-19 17:48:39 UTC; 5s ago
. . .
Sphinx is now configured and running, so next we will check that it works properly.
Step 6 - test the search function
Now that everything is set up, let's test the search function. Connect to SphinxQL using the MySQL interface. Your prompt will change to MySQL >.
mysql -h0 -P9306
Let's search for a sentence.
SELECT * FROM test1 WHERE MATCH('test document'); SHOW META;
You should get something like this.
+------+----------+------------+
| id   | group_id | date_added |
+------+----------+------------+
|    1 |        1 | 1465979047 |
|    2 |        1 | 1465979047 |
+------+----------+------------+
2 rows in set (0.00 sec)

+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 2        |
| total_found   | 2        |
| time          | 0.000    |
| keyword[0]    | test     |
| docs[0]       | 3        |
| hits[0]       | 5        |
| keyword[1]    | document |
| docs[1]       | 2        |
| hits[1]       | 2        |
+---------------+----------+
9 rows in set (0.00 sec)
In the above results, you can see that Sphinx found 2 matches in our test1 index for our test sentence. The SHOW META; command also displays the number of documents and hits for each keyword in the sentence.
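The same query can also be run from a script instead of the interactive prompt. The following is a minimal sketch, assuming the SphinxQL listener on port 9306 configured earlier. The function name and the MYSQL_BIN override are hypothetical, and phrases containing quotes or backslashes are simply rejected rather than escaped.

```shell
# Hypothetical helper: run a full-text query over SphinxQL with the mysql client.
# Arguments: index name, search phrase.
sphinx_search() {
  case $2 in
    # refuse input that could break out of the quoted MATCH() string
    *\'*|*\\*) echo "phrase must not contain quotes or backslashes" >&2; return 2 ;;
  esac
  "${MYSQL_BIN:-mysql}" -h0 -P9306 -e "SELECT * FROM $1 WHERE MATCH('$2'); SHOW META;"
}

# Usage:
# sphinx_search test1 'test document'
```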
Let's search for some keywords.
CALL KEYWORDS ('test one three', 'test1', 1);
You should get something like this.
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | test      | test       | 3    | 5    |
| 2    | one       | one        | 1    | 2    |
| 3    | three     | three      | 0    | 0    |
+------+-----------+------------+------+------+
3 rows in set (0.00 sec)
In the above results, you can see that in the test1 index, Sphinx found:
- 5 matches in 3 documents for the keyword "test"
- 2 matches in 1 document for the keyword "one"
- 0 matches in 0 documents for the keyword "three"
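The keyword statistics above can be filtered in a script, for instance to list the words of a query that match nothing. The sketch below assumes the five-column layout shown in the table (qpos, tokenized, normalized, docs, hits) delivered as tab-separated rows, which is what the mysql client prints in batch mode. The helper name is hypothetical.

```shell
# Hypothetical helper: read tab-separated CALL KEYWORDS rows on stdin and
# print each tokenized keyword whose docs column is 0.
unmatched_keywords() {
  awk -F '\t' '$4 == 0 { print $2 }'
}

# Usage: -B prints tab-separated rows, -N drops the header line.
# mysql -h0 -P9306 -B -N -e "CALL KEYWORDS('test one three', 'test1', 1)" | unmatched_keywords
```

For the sample data above, this would list only the keyword three.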
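Now that you have tested Sphinx, you can delete the test database if you like by running DROP DATABASE test; in a regular MySQL session (the SphinxQL connection on port 9306 does not manage MySQL databases).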
When finished, exit the MySQL shell.
quit
Conclusion:
In this tutorial, we showed you how to install Sphinx and do a simple search using SphinxQL and MySQL.
With Sphinx, you can easily add custom searches to your website.
Reference article: https://cloud.tencent.com/developer/article/1348835