Node.js crawler: the http, cheerio, and mysql modules

Posted by quikone on Tue, 29 Oct 2019 20:27:51 +0100

Node.js modules involved

Fetch web content (http, request, superagent, etc.)

Filter web page information (cheerio)

Output or store information (console, mongodb, mysql, etc.)

1. Use the request module to obtain the web page content
var request = require('request');
// Read the content of http://cnodejs.org/ with a GET request
request('http://cnodejs.org/', function (error, response, body) {
    if (!error && response.statusCode === 200) {
        // Output the page content
        console.log(body);
    }
});

To use another request method, or to specify request headers and other options, pass an object as the first argument, for example:

var request = require('request');
request({
    url:    'http://cnodejs.org/',   // Requested URL
    method: 'GET',                   // Request method
    headers: {                       // Specify request header
        'Accept-Language': 'zh-CN,zh;q=0.8',         // Specify accept language
        'Cookie': '__utma=4454.11221.455353.21.143;' // Specify cookies
    }
}, function (error, response, body) {
    if (!error && response.statusCode == 200) {
        console.log(body) // Output web content
    }
});
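The request package has since been deprecated by its maintainers, so the same idea can also be sketched with Node's built-in http module, which the module list above mentions. This is a minimal sketch, not a replacement for a full client: `fetchPage` is a hypothetical helper name, it handles plain `http://` URLs only (the `https` module would be needed for `https://`), and it does not follow redirects.

```javascript
const http = require('http');

// Hypothetical helper: sends a request with Node's built-in http module
// and resolves with the status code and the response body as a string.
function fetchPage(url, options = {}) {
  return new Promise((resolve, reject) => {
    const req = http.request(url, {
      method: options.method || 'GET',   // Request method, GET by default
      headers: options.headers || {},    // Optional request headers
    }, (res) => {
      const chunks = [];
      // The body arrives in chunks; collect them and join at the end
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => resolve({
        statusCode: res.statusCode,
        body: Buffer.concat(chunks).toString('utf8'),
      }));
    });
    req.on('error', reject);
    req.end();
  });
}
```

Calling `fetchPage('http://cnodejs.org/', { headers: { 'Accept-Language': 'zh-CN,zh;q=0.8' } })` returns a promise, so the status check from the examples above moves into a `.then()` callback.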
2. Use the cheerio module to extract the data in the web page

cheerio implements a subset of core jQuery: the DOM traversal and manipulation API, without depending on a browser. Here is a simple example:

var cheerio = require('cheerio');

// Parse the HTML with the load method, which returns a jQuery-like object
var $ = cheerio.load('<h2 class="title">Hello world</h2>');

// You can operate with the same syntax as jQuery
$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

console.log($.html());
// Output: <h2 class="title welcome">Hello there!</h2>
// (recent cheerio versions wrap this in <html><head></head><body>...</body></html>)
3. Use mysql module to save data to database

The mysql module has a built-in connection pool. Here is a simple example:

var mysql = require('mysql');

// Create database connection pool
var pool  = mysql.createPool({
  host:           'localhost', // Database address
  user:           'root',      // Database user
  password:        '',         // Corresponding password
  database:        'example',  // Database name
  connectionLimit: 10          // Maximum connections, default is 10
});

// Before running SQL queries, call pool.getConnection() to obtain a connection
pool.getConnection(function(err, connection) {
  if (err) throw err;

  // connection is the currently available database connection;
  // call connection.release() when you are done with it
});
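Once a connection is available, a row can be written with a parameterized query and the connection handed back to the pool. The sketch below assumes a table named `topics` with columns `title` and `url`; the table, its columns, and the helper name `saveTopic` are hypothetical. The pool is passed in as an argument, so any pool created with `mysql.createPool()` as shown above can be used.

```javascript
// Hypothetical helper: inserts one row into an assumed `topics` table
// (columns `title` and `url`) using a pool created by mysql.createPool().
function saveTopic(pool, topic, callback) {
  pool.getConnection(function (err, connection) {
    if (err) return callback(err);

    // The ? placeholders are filled in and escaped by the mysql module
    connection.query(
      'INSERT INTO topics (title, url) VALUES (?, ?)',
      [topic.title, topic.url],
      function (queryErr, result) {
        // Return the connection to the pool whether or not the query failed
        connection.release();
        if (queryErr) return callback(queryErr);
        callback(null, result.insertId);
      }
    );
  });
}
```

A call such as `saveTopic(pool, { title: 'Hello', url: '/topic/1' }, function (err, id) { ... })` passes the new row's id to the callback. Reporting errors through the callback instead of throwing keeps a single failed insert from stopping the whole crawl.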
Reference documents
jQuery selector summary: https://www.cnblogs.com/xiaxuexiaoab/p/7091527.html
Node.js crawler: https://www.cnblogs.com/xiaxuexiaoab/p/7124956.html

Comments are welcome.
