How to install and configure Sphinx on CentOS 7

Posted by imarockstar on Wed, 22 Dec 2021 11:33:17 +0100

Introduction

Sphinx is an open source search engine that provides full-text search. It is designed to search large volumes of data efficiently. The data to be indexed can come from very different sources: SQL databases, plain text files, HTML files, mailboxes, etc.

Some of Sphinx's main features include:

High indexing and search performance
Advanced indexing and query tools
Advanced result set postprocessing
Proven scalability up to billions of documents, terabytes of data and thousands of queries per second
Easy integration with SQL and XML data sources and SphinxQL, SphinxAPI or SphinxSE search interfaces
Easy scaling with distributed searches

In this tutorial, we will set up Sphinx with a MySQL server using the sample SQL file included in the distribution package. It will give you the basics of how to use Sphinx for your own project.

Prerequisites

Before starting this guide, you need:

One CentOS 7 server. If you don't have one, I recommend experimenting in the free Tencent Cloud developer lab before buying a server.
A non-root user with sudo privileges.
MySQL installed on your server. For production use, I suggest a managed cloud relational database, which lets you deploy, manage and scale a relational database in the cloud on demand. Tencent Cloud's relational database service offers MySQL, SQL Server, MariaDB and PostgreSQL engines.

Step 1 - install Sphinx
You can find the latest version on the Sphinx website.

Before installing Sphinx, you first need to install its dependencies.

sudo yum install -y postgresql-libs unixODBC

Move to the /tmp directory so that the downloaded Sphinx files stay out of the way.

cd /tmp

Use wget to download the Sphinx RPM package (version 2.2.11 in this tutorial).

wget http://sphinxsearch.com/files/sphinx-2.2.11-1.rhel7.x86_64.rpm

Finally, install it using yum.

sudo yum install -y sphinx-2.2.11-1.rhel7.x86_64.rpm
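As a quick sanity check after the install, the sketch below reports any expected command that is not on the PATH. The helper name missing_tools is made up for this example and is not part of Sphinx; indexer and searchd are the two binaries used later in this tutorial.

```shell
# missing_tools is a hypothetical helper, not part of Sphinx.
# It prints the name of every given command that cannot be found on PATH.
missing_tools() (
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
)

# Usage after installing the RPM (prints nothing when both binaries are found):
#   missing_tools indexer searchd
```

If a name is printed, the package probably did not install cleanly and the yum output is worth rereading.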

You have now successfully installed Sphinx on the server. Before starting the Sphinx daemon, let's configure it.

Step 2 - create test database

Here, we will use the sample data in the SQL file provided with the package to set up the database. This will allow us to test whether Sphinx search will work later.

Let's import the sample SQL file into the database. First, log in to the MySQL server shell.

mysql -u root -p

Enter the MySQL root user's password when prompted. Your prompt will change to MariaDB>.

Create a dummy database. Here we call it test, but you can name it as needed.

CREATE DATABASE test;

Import the sample SQL file.

SOURCE /usr/share/doc/sphinx-2.2.11/example.sql;

Then leave the MySQL shell.

quit

Now you have a database populated with sample data. Next, we will customize the configuration of Sphinx.

Step 3 - configure Sphinx
Sphinx's configuration should live in a file named sphinx.conf in /etc/sphinx. The configuration consists of three main blocks: index, searchd and source.

A minimal configuration is already provided, but we will supply a new sample configuration file and explain each section so that you can customize it later.

First, move the existing sphinx.conf file out of the way.

sudo mv /etc/sphinx/sphinx.conf /etc/sphinx/sphinx.conf2

Use vi or your favorite text editor to create a new sphinx.conf file.

sudo vi /etc/sphinx/sphinx.conf

The source, index and searchd blocks are described in turn below. At the end of this step, the entire contents of sphinx.conf are included for you to copy and paste into the file.

The source block contains the type of the data source and the username and password for the MySQL server. The first column of sql_query should be a unique id. The SQL query runs on every indexing pass and dumps the data into the Sphinx index file. Each field of the source block is described below, followed by the block itself.

type: the type of data source to index. In our example, this is mysql. Other supported types include pgsql, mssql, xmlpipe2, odbc, etc.
sql_host: the host name of the MySQL host. In our example, this is localhost. This can be a domain or an IP address.
sql_user: the user name for the MySQL login. In our example, this is root.
sql_pass: the password of the MySQL user. In our example, this is the password of the root MySQL user.
sql_db: the name of the database where the data is stored. In our example, this is test.
sql_query: the query that dumps data from the database to the index.

This is the source block:

source src1
{
  type          = mysql

  #SQL settings (for 'mysql' and 'pgsql' types)

  sql_host      = localhost
  sql_user      = root
  sql_pass      = password
  sql_db        = test
  sql_port      = 3306 # optional, default is 3306

  sql_query     = \
  SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
  FROM documents

  sql_attr_uint         = group_id
  sql_attr_timestamp    = date_added
}

The index block contains the source and the path where the data is stored.

source: the name of the source block. In our example, this is src1.
path: the path where the index is saved.

index test1
{
  source        = src1
  path          = /var/lib/sphinx/test1
  docinfo       = extern
}

The searchd block contains the port and other variables needed to run the Sphinx daemon.

listen: the port the Sphinx daemon listens on, followed by the protocol. In our example, this is 9306:mysql41. Known protocols are sphinx (SphinxAPI) and mysql41 (SphinxQL).
query_log: the path to save the query log.
pid_file: the path to the PID file of the Sphinx daemon.
seamless_rotate: prevents searchd from stalling while it rotates indexes that have large amounts of data to precache.
preopen_indexes: whether to force all indexes to be opened in advance at startup.
unlink_old: whether to delete the old index copies on successful rotation.

searchd
{
  listen            = 9312:sphinx       #SphinxAPI port
  listen            = 9306:mysql41      #SphinxQL port
  log               = /var/log/sphinx/searchd.log
  query_log         = /var/log/sphinx/query.log
  read_timeout      = 5
  max_children      = 30
  pid_file          = /var/run/sphinx/searchd.pid
  seamless_rotate   = 1
  preopen_indexes   = 1
  unlink_old        = 1
  binlog_path       = /var/lib/sphinx/
}

The complete configuration to copy and paste is as follows. The only variable you need to change is sql_pass in the source block:

source src1
{
  type          = mysql

  sql_host      = localhost
  sql_user      = root
  sql_pass      = your_root_mysql_password
  sql_db        = test
  sql_port      = 3306

  sql_query     = \
  SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
  FROM documents

  sql_attr_uint         = group_id
  sql_attr_timestamp    = date_added
}
index test1
{
  source            = src1
  path              = /var/lib/sphinx/test1
  docinfo           = extern
}
searchd
{
  listen            = 9306:mysql41
  log               = /var/log/sphinx/searchd.log
  query_log         = /var/log/sphinx/query.log
  read_timeout      = 5
  max_children      = 30
  pid_file          = /var/run/sphinx/searchd.pid
  seamless_rotate   = 1
  preopen_indexes   = 1
  unlink_old        = 1
  binlog_path       = /var/lib/sphinx/
}
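Because sql_pass is the one value that has to change, a small check like the following can catch a configuration that was pasted but never edited. This is only a sketch: has_placeholder_password is a hypothetical helper name, and it assumes the placeholder text your_root_mysql_password from the sample above.

```shell
# has_placeholder_password is a hypothetical helper for this tutorial.
# It succeeds (exit status 0) when the given file still sets sql_pass to the
# placeholder value from the sample configuration.
has_placeholder_password() {
  grep -q 'sql_pass[[:space:]]*=[[:space:]]*your_root_mysql_password' "$1"
}

# Usage:
#   if has_placeholder_password /etc/sphinx/sphinx.conf; then
#     echo "Edit sql_pass in /etc/sphinx/sphinx.conf before indexing." >&2
#   fi
```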

To explore more configuration options, see the /usr/share/doc/sphinx-2.2.11/sphinx.conf.dist file, which describes all variables in detail.

Step 4 - manage indexes
In this step, we will add data to the Sphinx index and ensure that the index remains up-to-date through cron.

First, add data to the index using the configuration we created earlier.

sudo indexer --all

You should get something like this.

Sphinx 2.2.11-id64-release (95ae9a6)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinx/sphinx.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.006 sec, 29765 bytes/sec, 616.90 docs/sec
total 4 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
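If you run the indexer from a script, you may want the document count rather than the full report. The sketch below pulls the number out of the "collected N docs" line shown above; collected_docs is a hypothetical helper name, and it assumes the output format of this Sphinx version.

```shell
# collected_docs is a hypothetical helper. It reads indexer output on stdin
# and prints the number from each "collected N docs" line (one per index).
collected_docs() {
  awk '$1 == "collected" && $3 == "docs," { print $2 }'
}

# Usage:
#   sudo indexer --all | collected_docs
```

With the sample data, a printed value of 0 or no output at all would suggest that sql_query returned no rows or that indexing failed.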

In a production environment, the index must be kept up to date. To do this, let's create a cron job. First, open crontab.

crontab -e

The following cron job runs every hour and adds new data to the index using the configuration file we created earlier. Copy and paste it at the end of the file, then save and close the file.

@hourly /usr/bin/indexer --rotate --config /etc/sphinx/sphinx.conf --all
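If you prefer to add the entry from a script instead of an editor, one possible approach is to filter the existing crontab and append the line only when it is missing. This is a sketch, not the tutorial's method: add_line_once is a hypothetical helper, and the usage line assumes your crontab command accepts "-" to read the new table from standard input.

```shell
# add_line_once is a hypothetical helper. It copies stdin to stdout and
# appends the given line only if no identical line is already present.
add_line_once() (
  line=$1
  current=$(cat)
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! printf '%s\n' "$current" | grep -xF -- "$line" >/dev/null; then
    printf '%s\n' "$line"
  fi
)

# Usage (rewrites the current user's crontab; review the output first):
#   job='@hourly /usr/bin/indexer --rotate --config /etc/sphinx/sphinx.conf --all'
#   crontab -l 2>/dev/null | add_line_once "$job" | crontab -
```

Running it twice leaves a single copy of the job, which is the main reason to filter rather than simply append.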

Now that Sphinx is fully set up and configured, we can start the service and try it out.

Step 5 - start Sphinx
Start the Sphinx daemon using systemctl.

sudo systemctl start searchd

To check that the Sphinx daemon is running correctly, run:

sudo systemctl status searchd

You should get something like this.

●  searchd.service - SphinxSearch Search Engine
   Loaded: loaded (/usr/lib/systemd/system/searchd.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-08-19 17:48:39 UTC; 5s ago
   . . .

Sphinx is now fully configured and running, so next we'll check that it works properly.

Step 6 - test the search function

Now that everything is set up, let's test the search function. Connect to SphinxQL on port 9306 using the MySQL client. Your prompt will change to MySQL>.

mysql -h0 -P9306

Let's search for a sentence.

SELECT * FROM test1 WHERE MATCH('test document'); SHOW META;

You should get something like this.

+------+----------+------------+
| id   | group_id | date_added |
+------+----------+------------+
|    1 |        1 | 1465979047 |
|    2 |        1 | 1465979047 |
+------+----------+------------+
2 rows in set (0.00 sec)

+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 2        |
| total_found   | 2        |
| time          | 0.000    |
| keyword[0]    | test     |
| docs[0]       | 3        |
| hits[0]       | 5        |
| keyword[1]    | document |
| docs[1]       | 2        |
| hits[1]       | 2        |
+---------------+----------+
9 rows in set (0.00 sec)

In the above results, you can see that Sphinx found 2 matches in the test1 index for our test sentence. The SHOW META; command also displays the number of documents and hits for each keyword in the sentence.
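The same search can be run without an interactive session by passing the statement to the MySQL client. The sketch below builds such a statement; match_query is a hypothetical helper, and it assumes SphinxQL accepts backslash-escaped single quotes inside a quoted string. It does not escape full-text operators and does not validate the index name.

```shell
# match_query is a hypothetical helper. It prints a SphinxQL SELECT for the
# given index and search phrase, backslash-escaping backslashes and single
# quotes so the phrase stays inside the quoted string.
match_query() (
  index=$1
  phrase=$(printf '%s\n' "$2" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")
  printf "SELECT * FROM %s WHERE MATCH('%s');\n" "$index" "$phrase"
)

# Usage:
#   mysql -h0 -P9306 -e "$(match_query test1 'test document')"
```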

Let's search for some keywords.

CALL KEYWORDS ('test one three', 'test1', 1);

You should get something like this.

+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | test      | test       | 3    | 5    |
| 2    | one       | one        | 1    | 2    |
| 3    | three     | three      | 0    | 0    |
+------+-----------+------------+------+------+
3 rows in set (0.00 sec)

In the above results, you can see that in the test1 index Sphinx found:

5 matches in 3 documents for the keyword "test"
2 matches in 1 document for the keyword "one"
0 matches in 0 documents for the keyword "three"
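To spot keywords that never occur in the index, the table can be filtered on its docs column. The following is a sketch under the assumption that the output uses the table layout shown above; zero_doc_keywords is a hypothetical helper name.

```shell
# zero_doc_keywords is a hypothetical helper. It reads a CALL KEYWORDS result
# in the mysql client's table layout on stdin and prints each tokenized
# keyword whose docs column is 0.
zero_doc_keywords() {
  awk -F'|' 'NF >= 7 && $2 !~ /qpos/ {
    gsub(/ /, "", $3)   # tokenized keyword, padding removed
    gsub(/ /, "", $5)   # docs count, padding removed
    if ($5 == "0") print $3
  }'
}

# Usage (-t asks the mysql client for table output even when not interactive):
#   mysql -h0 -P9306 -t -e "CALL KEYWORDS('test one three', 'test1', 1);" | zero_doc_keywords
```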

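Now that you have tested Sphinx, you can delete the test database if you no longer need it. DROP DATABASE test; must be run in the MySQL server shell (mysql -u root -p), not in this SphinxQL session.

When finished, exit the SphinxQL session.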

quit

Conclusion:

In this tutorial, we showed you how to install Sphinx and do a simple search using SphinxQL and MySQL.
With Sphinx, you can easily add custom searches to your website.

Reference article: https://cloud.tencent.com/developer/article/1348835
