Big data ELK in 2021: using VSCode to operate job search case of liepin.com

This case implements a job search similar to liepin.com: users enter job-related keywords and get the matching positions back. The data has been prepared in advance; it was collected by crawlers and stored in CSV text files. We need to build an index over this data so that users can search and query it.

Data set introduction

Field name	Description	Sample data
doc_id	Unique identification (as document ID)	29097
area	Position area	Working area: Shenzhen Nanshan District
exp	Required working experience	1 year experience
edu	Educational requirements	College degree or above
salary	salary range	¥ 6-8000 / month
job_type	Position type (full-time / Part-time)	internship
cmp	Company name	Happy home
pv	Views	616000 people have browsed it / 14 person evaluation / 113 people are watching
title	Post name	Taoyuan Shenda sales internship pre job training
jd	Job description	[salary] 7500 undergraduate salaries, 6800 junior college salaries, no performance requirements, and 55% ~ 80% of the performance accounting proportion, with a per capita monthly income of more than 13000 [Job Responsibilities] 1. Love to learn and be patient: be familiar with the basic real estate business and relevant legal and financial knowledge through the company's systematic training, and serve customers without utility, Patiently solve all kinds of problems encountered by customers in real estate transactions; 2. Be able to listen and ask questions: understand customers' core demands in detail, accurately match appropriate product information, have good communication skills with users, and have a sense of teamwork and service; 3. Love thinking,
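As a minimal sketch, one row of the crawled CSV could be turned into a document as follows. It assumes that the columns appear in the order of the table above and that the file has no header row; the source does not show the raw file layout, so both are assumptions. The name row_to_doc is hypothetical, and the sample line is shortened.

```python
import csv
import io

# Assumption: the CSV columns follow the order of the field table above.
FIELDS = ["doc_id", "area", "exp", "edu", "salary",
          "job_type", "cmp", "pv", "title", "jd"]

def row_to_doc(row):
    """Split one CSV row into (document ID, document body)."""
    record = dict(zip(FIELDS, row))
    doc_id = record.pop("doc_id")  # doc_id is used as the document ID
    return doc_id, record

# Shortened sample line; the real jd text is much longer.
sample = (
    "29097,Working area: Shenzhen Nanshan District,1 year experience,"
    "College degree or above,¥ 6-8000 / month,internship,Happy home,"
    "616000 people have browsed it,"
    "Taoyuan Shenda sales internship pre job training,"
    '"[salary] 7500 undergraduate salaries, 6800 junior college salaries"\n'
)

# csv.reader keeps the quoted jd field intact even though it contains a comma.
reader = csv.reader(io.StringIO(sample))
doc_id, doc = row_to_doc(next(reader))
print(doc_id, doc["title"])
```

Using the csv module rather than splitting on commas matters here because the job description itself contains commas.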

2, Create index

To search position data, we first need to create an index in Elasticsearch; only then can we search by keyword. Recall how a table is created in MySQL: we specify the table name, its columns and the type of each column. An index in Elasticsearch is defined in a similar way.

1. Create index with mapping

In Elasticsearch, index operations are performed through the RESTful API (HTTP requests). When creating a MySQL table, we use DDL to describe the table structure, fields, field types, constraints and so on. In Elasticsearch, the definition is written in the Elasticsearch DSL, which uses JSON. For example:

PUT /my-index
{
    "mappings": {
        "properties": {
            "employee-id": {
                "type": "keyword",
                "index": false
            }
        }
    }
}

2. Type of field

In Elasticsearch, each field has a type. The following are the types that can be used:

Category	Type name	Description
Simple type	text	Used for fields that need full-text search, typically unstructured text such as an e-mail body, a product description or other short text. The analyzer first splits the text into a list of terms, and later searches run against those terms. Text fields are not meant for sorting or aggregation.
	keyword	Used for structured data such as IDs, e-mail addresses, host names, status codes, zip codes or tags. Keyword fields can be sorted and aggregated. Note: keyword values are not analyzed (segmented).
	date	Stores formatted dates, such as 2015-01-01 or 2015/01/01 12:10:30. In Elasticsearch, dates are displayed as strings. A format can be specified for a date field, for example "format": "yyyy-MM-dd HH:mm:ss"
	long/integer/short/byte	64-bit / 32-bit / 16-bit / 8-bit integer
	double/float/half_float	64-bit double-precision / 32-bit single-precision / 16-bit half-precision floating point
	boolean	true / false
	ip	IPv4 address (for example 192.168.1.110) or IPv6 address
JSON hierarchical nested types	object	Used to save JSON objects
	nested	Used to save arrays of JSON objects
Special type	geo_point	Used to save latitude and longitude coordinates
	geo_shape	Used to save shapes such as polygons on the map

3. Create an index to save position information

Send the request with the PUT method
The index name is job_idx
Whether a field uses text or keyword mainly depends on whether it needs word segmentation

field	type
area	text
exp	text
edu	keyword
salary	keyword
job_type	keyword
cmp	text
pv	keyword
title	text
jd	text

Create index:

PUT /job_idx
{
    "mappings": {
        "properties" : {
            "area": { "type": "text", "store": true},
            "exp": { "type": "text", "store": true},
            "edu": { "type": "keyword", "store": true},
            "salary": { "type": "keyword", "store": true},
            "job_type": { "type": "keyword", "store": true},
            "cmp": { "type": "text", "store": true},
            "pv": { "type": "keyword", "store": true},
            "title": { "type": "text", "store": true},
            "jd": { "type": "text", "store": true}
        }
    }
}

4. View index mapping

View index mappings using GET requests

//  View index mapping
GET /job_idx/_mapping

You can also view the index mapping information using the head plug-in

5. View all indexes in Elasticsearch

GET _cat/indices

6. Delete index

DELETE /job_idx

7. Specify the IK analyzer

The data stored in the index is in Chinese, so to get better word segmentation we use the IK analyzer. This makes later searches more accurate. Because job_idx was already created in step 3, delete it first (step 6) before sending this request; creating an index that already exists returns an error.

PUT /job_idx
{
    "mappings": {
        "properties" : {
            "area": { "type": "text", "store": true, "analyzer": "ik_max_word"},
            "exp": { "type": "text", "store": true, "analyzer": "ik_max_word"},
            "edu": { "type": "keyword", "store": true},
            "salary": { "type": "keyword", "store": true},
            "job_type": { "type": "keyword", "store": true},
            "cmp": { "type": "text", "store": true, "analyzer": "ik_max_word"},
            "pv": { "type": "keyword", "store": true},
            "title": { "type": "text", "store": true, "analyzer": "ik_max_word"},
            "jd": { "type": "text", "store": true, "analyzer": "ik_max_word"}
        }
    }
}
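The mapping above repeats the same settings for every field, so it can also be generated from the field/type table in step 3. The following is only a sketch: build_mapping is a hypothetical helper, and it produces the request body without sending it.

```python
import json

# Field types taken from the table in step 3.
FIELD_TYPES = {
    "area": "text", "exp": "text", "edu": "keyword", "salary": "keyword",
    "job_type": "keyword", "cmp": "text", "pv": "keyword",
    "title": "text", "jd": "text",
}

def build_mapping(field_types, analyzer="ik_max_word"):
    """Return the request body for PUT /job_idx."""
    properties = {}
    for name, field_type in field_types.items():
        field = {"type": field_type, "store": True}
        if field_type == "text":
            # Only analyzed (text) fields take an analyzer.
            field["analyzer"] = analyzer
        properties[name] = field
    return {"mappings": {"properties": properties}}

body = build_mapping(FIELD_TYPES)
print(json.dumps(body, indent=4))
```

Keeping the field table in one dictionary makes it harder for the table and the mapping to drift apart.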

3, Add a position document

1. Demand

We now have one position record that needs to be added to Elasticsearch so that it can be searched later.

29097,

Working area: Shenzhen Nanshan District,

1 year experience,

College degree or above,

¥ 6-8000 / month,

internship,

Happy home,

616000 people have browsed it / 14 person evaluation / 113 people are watching,

Taoyuan Shenda sales internship pre job training,

[salary] 7500 undergraduate salaries, 6800 junior college salaries, no performance requirements, and 55% ~ 80% of the performance accounting proportion, with a per capita monthly income of more than 13000 [Job Responsibilities] 1. Love to learn and be patient: be familiar with the basic real estate business and relevant legal and financial knowledge through the company's systematic training, and serve customers without utility, Patiently solve all kinds of problems encountered by customers in real estate transactions; 2. Be able to listen and ask questions: understand customers' core demands in detail, accurately match appropriate product information, have good communication skills with users, and have a sense of teamwork and service; 3. Pondering and Thinking: keen on user psychological research, good at refining user needs from user data, and using personalized and refined operation means to improve user experience. [job requirements] 1.18-26 years old, college degree or above; 2. Have good affinity, understanding, logical coordination and communication skills; 3. Be positive, optimistic and cheerful, be honest and trustworthy, work actively and pay attention to teamwork; 4. Willing to serve high-end customers, and willing to improve their comprehensive ability through face-to-face communication with high-end customers; 5. Willing to participate in public welfare activities, with love and gratitude. [training path] 1. Thousands of courses; Real estate knowledge, marketing knowledge, transaction knowledge, laws and regulations, customer maintenance, target management, negotiation skills, psychology and economics; 2. Growth companionship: one-to-one mentoring 3. Online self-learning platform: leyoujia college, produced by professional teams and shared by big coffee every week 4. 
Reserve and management classes: cadre training camp, monthly / quarterly management training [promotion and development] marketing [elite] development planning: A1 Property Consultant - A6 senior property expert marketing [management] development planning: (you can compete after the next month of employment) real estate consultant - Real Estate Manager - Store Manager - deputy general manager of marketing - vice president of marketing - President of marketing internal [competitive] company functional positions: such as market, channel development center, legal department, mortgage manager, etc. are all internal competitive [contact] Xiao Ming, who is in charge of 15888888888 (the same number of wechat)

2. PUT request

We created the index earlier. Next, we can add documents to it, which can be done directly with a PUT request. In Elasticsearch, each document has a unique ID, and the data is also described in JSON. For example:

PUT /customer/_doc/1
{
  "name": "John Doe"
}

If customer contains no document with ID 1, Elasticsearch creates it automatically; if one already exists, it is replaced.

3. Add position information request

PUT /job_idx/_doc/29097
{
    "area": "Shenzhen-Nanshan District",
    "exp": "1 Years of experience",
    "edu": "College degree or above",
    "salary": "6-8 thousand/month",
    "job_type": "internship",
    "cmp": "Happy home",
    "pv": "61.6 Ten thousand people have browsed it  / 14 Human evaluation  / 113 People are paying attention",
    "title": "Taoyuan Shenda sales internship pre job training",
    "jd": "[Salary] 7500 undergraduate salaries, 6800 junior college salaries, no performance requirements, and 55% of performance accounting%~80% Per capita monthly income exceeds 1.3 [Job Responsibilities] 1.Love to learn and be patient: be familiar with the basic business of real estate and relevant legal and financial knowledge through systematic training of the company, do not serve customers utilitarian, and be patient with various problems encountered by customers in real estate transactions; 2.Be able to listen and ask questions: understand customers' core demands in detail, accurately match appropriate product information, have good communication skills with users, and have a sense of teamwork and service; 3.Love thinking, good at thinking: Keen on user psychology research, good at refining user needs from user data, and using personalized and refined operation means to improve user experience. [job requirements] 1.18-26 One year old, college degree or above; 2.Have good affinity, understanding, logical coordination and communication skills; 3.Positive, optimistic and cheerful, honest and trustworthy, proactive, and pay attention to teamwork; 4.Willing to serve high-end customers, and willing to improve their comprehensive ability through face-to-face communication with high-end customers; 5.Willing to participate in public welfare activities, with love and gratitude. [training path] 1.Thousands of courses;Real estate knowledge, marketing knowledge, transaction knowledge, laws and regulations, customer maintenance, target management, negotiation skills, psychology and economics; 2.Growth companionship: one-on-one mentoring 3.Online independent learning platform: leyoujia college, produced by professional teams, and shared by big coffee 4 times a week.Reserve and management class: cadre training camp, monthly/Quarterly management training meeting [promotion and development] marketing [elite] development plan: A1 Property consultant -A6 Senior real estate expert marketing [management] development planning: (you can compete for employment after the next month) real estate consultant-Real estate manager-shopowner-Deputy general manager of marketing-Vice president of marketing-Marketing president internal [competition] company functional posts: such as market, channel development center, legal department, mortgage manager, etc. are internal competition for [contact] director Xiao Ming 15888888888 (same number on wechat)"
}

Elasticsearch response result:

{
    "_index": "job_idx",
    "_type": "_doc",
    "_id": "29097",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}

The data can also be browsed with the ES head plug-in.

4, Modify position salary

1. Demand

Because the company is struggling to hire, the salary needs to be raised from the original 6000-8000 / month to 15000-20000 / month.

2. Perform the update operation

POST /job_idx/_update/29097
{
    "doc": {
        "salary": "15-20k/month"
    }
}
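The update request only sends the field that changes; the fields of the partial "doc" are merged into the stored document and everything else is kept. The following local sketch illustrates that merge behaviour on plain dictionaries. It is not how Elasticsearch implements it, and apply_partial_update is a hypothetical name.

```python
def apply_partial_update(source, partial):
    """Return a copy of `source` with the fields of `partial` merged in."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = apply_partial_update(merged[key], value)
        else:
            # Scalar values (and arrays) replace the stored value.
            merged[key] = value
    return merged

stored = {
    "title": "Taoyuan Shenda sales internship pre job training",
    "salary": "6-8 thousand/month",
}
updated = apply_partial_update(stored, {"salary": "15-20k/month"})
print(updated)
```

Only salary changes; title is carried over untouched, which is why the request does not need to resend the whole document.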

5, Delete a position document

1. Demand

The position with ID 29097 has been cancelled. Therefore, we need to delete the position in the index library.

2. DELETE operation

DELETE /job_idx/_doc/29097

6, Batch import JSON data

1. bulk import

To make the following tests easier, we import some test data into ES in advance. The data folder contains a job_info.json data file, which we can load through the bulk interface of Elasticsearch.

Upload JSON data files to Linux
Execute Import command

curl -H "Content-Type: application/json" -XPOST "node1:9200/job_idx/_bulk?pretty&refresh" --data-binary "@job_info.json"
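The bulk interface expects a newline-delimited body: one action line followed by one document line for each record, with a newline at the end of the body. The source does not show the contents of job_info.json, so the sketch below only illustrates that general format; to_bulk_body is a hypothetical helper and the two documents are made-up sample values.

```python
import json

def to_bulk_body(docs):
    """Build a newline-delimited _bulk body from (doc_id, document) pairs."""
    lines = []
    for doc_id, doc in docs:
        # Action line: index the document under the given _id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

# Made-up sample documents for illustration only.
docs = [
    ("1", {"title": "Sales internship", "salary": "6-8 thousand/month"}),
    ("2", {"title": "Property consultant", "salary": "8-10 thousand/month"}),
]
body = to_bulk_body(docs)
print(body, end="")
```

Because the index name is already part of the URL in the curl command (job_idx/_bulk), the action lines do not need to repeat it.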

2. View index status

GET _cat/indices?index=job_idx

Executing the request above returns data like the following:

[
    {
        "health": "green",
        "status": "open",
        "index": "job_idx",
        "uuid": "LS0fkOS3SWGlOCp5u28yIA",
        "pri": "1",
        "rep": "1",
        "docs.count": "6764",
        "docs.deleted": "0",
        "store.size": "23.2mb",
        "pri.store.size": "11.6mb"
    }
]

7, Retrieve the specified position data according to the ID

1. Demand

The user submits a document ID, and Elasticsearch returns the document corresponding to the ID directly to the user.

2. Realize

In Elasticsearch, documents can be looked up by ID by sending a GET request with an ids query.

GET /job_idx/_search
{
    "query": {
        "ids": {
            "values": ["46313"]
        }
    }
}

8, Search data by keyword

1. Demand

Search for positions that contain the keyword "sales"

2. Realize

Search the jd field for sales-related positions

GET  /job_idx/_search 
{
    "query": {
        "match": {
            "jd": "sale"
        }
    }
}

Besides the job description field, we also want to find sales-related positions by their title. To do that, we run a combined query over multiple fields.

GET  /job_idx/_search
{
    "query": {
        "multi_match": {
            "query": "Overall planning of various materials in the early stage of sales",
            "fields": [
                "title",
                "jd"
            ]
        }
    }
}

For more query types, see the official Elastic site: Start using Elasticsearch | Elastic Videos

9, Paged search by keyword

1. Use from and size for paging

When executing a query, you can specify from (the offset of the first hit to return) and size (how many hits to return per page) to implement paging.

from = (page - 1) * size

GET  /job_idx/_search
{
    "from": 0,
    "size": 6,
    "query": {
        "multi_match": {
            "query": "sale",
            "fields": [
                "title",
                "jd"
            ]
        }
    }
}
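The page arithmetic can be wrapped in a small helper. This is a sketch only: page_request is a hypothetical name, pages are assumed to be numbered from 1, and the guard uses 10,000, the Elasticsearch default for index.max_result_window.

```python
MAX_RESULT_WINDOW = 10000  # Elasticsearch default for index.max_result_window

def page_request(keyword, page, size=6):
    """Build a from/size search body for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * size
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window")
    return {
        "from": start,
        "size": size,
        "query": {
            "multi_match": {"query": keyword, "fields": ["title", "jd"]}
        },
    }

print(page_request("sale", 1)["from"])  # first page starts at offset 0
print(page_request("sale", 3)["from"])  # third page starts at offset 12
```

Rejecting out-of-window pages on the client avoids sending a request that Elasticsearch would refuse anyway.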

2. Paging using scroll mode

The from and size method works well for shallow paging. With a large amount of data, however, it runs into performance problems, and by default Elasticsearch refuses requests where from + size exceeds 10,000 (the index.max_result_window setting). To read data beyond the first 10,000 hits, use the scroll cursor provided by Elasticsearch.

With deep from + size paging, every page request has to sort the matching data again, which wastes resources. Scroll sorts the data once and then hands it out in batches, so it performs much better than from + size for deep paging. After a scroll query, the sorted result is kept for a limited time as a snapshot, and subsequent page requests read from that snapshot.

1) use scroll paging query for the first time

Here, we keep the sorted data for 1 minute, so set the scroll to 1m

GET /job_idx/_search?scroll=1m
{
    "query": {
        "multi_match": {
            "query": "sale",
            "fields": ["title", "jd"]
        }
    },
    "size": 100
}

After execution, we noticed that there was one item in the response result:

"_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAA0WUmQ4SjlTTGxSTldrSkt3dDg1eHRuQQ=="

Subsequent requests use this _scroll_id to continue the query.

2) for the second time, directly use scroll id for query

GET _search/scroll?scroll=1m
{
    "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAA4WUmQ4SjlTTGxSTldrSkt3dDg1eHRuQQ=="
}
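The two requests above can be combined into a loop that keeps following the cursor until a batch comes back empty. In this sketch, send(method, path, body) is a hypothetical callable standing in for whatever HTTP client is used; it is assumed to return the parsed JSON response. A fake implementation is included so that the loop can be run without a cluster.

```python
def scroll_all(send, index, query, size=100, keep_alive="1m"):
    """Yield every hit of a query by following the scroll cursor."""
    # First request: run the search and open the scroll context.
    response = send("GET", "/" + index + "/_search?scroll=" + keep_alive,
                    {"query": query, "size": size})
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Later requests: pass the most recent _scroll_id back.
        response = send("GET", "/_search/scroll?scroll=" + keep_alive,
                        {"scroll_id": response["_scroll_id"]})

def make_fake_send(total, batch):
    """Stand-in for a real client: serves `total` fake hits, `batch` at a time."""
    state = {"pos": 0}
    def send(method, path, body):
        start = state["pos"]
        end = min(start + batch, total)
        state["pos"] = end
        hits = [{"_id": str(i)} for i in range(start, end)]
        return {"_scroll_id": "fake-cursor", "hits": {"hits": hits}}
    return send

ids = [hit["_id"] for hit in
       scroll_all(make_fake_send(5, 2), "job_idx", {"match_all": {}}, size=2)]
print(ids)
```

The loop stops on the first empty batch, which is how the end of a scroll is recognised.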

📢 Blog home page: https://lansonli.blog.csdn.net
📢 This article was originally written by Lansonli and started on CSDN blog 🙉