PHP Script - Million-Level Data Script Execution Optimization Notes

Posted by Iokina on Fri, 28 Jan 2022 09:14:58 +0100

Mysql - Million-Level Data Query Optimization Notes (PHP Script) ②

Note: this article deals with an annual report computed in a single script, which takes a completely different approach from the previous article: fetch the full data set first, then join it in dictionary form. Processing 100,000 (10w) rows takes only 668 s!

Data volume:

Test environment: 170,000 (17w) rows. Production: 2,800,000 (280w) rows.

1. Query in full, reduce connection churn, and do the processing in PHP for higher performance

Which is faster: fetching 1,000 rows at a time and processing them in a loop, or fetching 100 at a time and processing them as an array?

Neither: fetch the data in advance and join it in dictionary form using PHP arrays. Avoid the cost of connecting and disconnecting inside the script wherever possible; the gain is dramatic: on the 17w test data set, throughput rose from 8 rows/s to 140 rows/s!

1.1 First, fetch the required data in full
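A minimal sketch of the full fetch, assuming an open mysqli connection in $mysqli and a source table named `wm_novel` (the table name is an assumption):

// One query, one round trip: pull every column the report needs into memory.
$result = $mysqli->query("SELECT `novel_id`, `novel_name`, `author_id`, `novel_wordnumber` FROM `wm_novel`");
$novelData = $result->fetch_all(MYSQLI_ASSOC);

The fetched $novelData looks like this: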

Array
(
    [0] => Array
        (
            [novel_id] => 67
            [novel_name] => Huanyang
            [author_id] => 9
            [novel_wordnumber] => 144334
        )

    [1] => Array
        (
            [novel_id] => 69
            [novel_name] => be a matchmaker
            [author_id] => 19
            [novel_wordnumber] => 135685
        )

    ...
)

1.2 Re-key the fetched data: the key is novel_id, and the value is the corresponding novel_name

$novelNameDict = array_column($novelData, 'novel_name', 'novel_id');

//Execution result:
Array
(
    [67] => Huanyang
    [69] => be a matchmaker
    [70] => Sense of joint employment
    [71] => Flow heart rainbow sugar
    [72] => Light a lamp in the snow
    [73] => throne
    [74] => It goes without saying
    [75] => Wang Yijun
    [76] => Snow and mud
    [77] => Bandit gentleman
)
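With the dictionary built, a lookup that would otherwise cost a SELECT per row becomes a plain array access. A sketch of the processing loop ($rows and the report-building step are illustrative assumptions):

foreach ($rows as $row) {
    // O(1) dictionary lookup instead of a query per row.
    $novelName = isset($novelNameDict[$row['novel_id']]) ? $novelNameDict[$row['novel_id']] : '';
    // ... build the annual-report entry with $novelName ...
}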

1.3 Let MySQL take the minimum: the report mainly counts user behavior, so group by user

SELECT `user_id`, `novel_id`, `chapter_id`, MIN(`datetime`) AS `datetime`
FROM `wm_novel_reward`
WHERE `user_id` BETWEEN 1 AND 1006
GROUP BY `user_id`

(Note: selecting the non-aggregated `novel_id` and `chapter_id` while grouping by `user_id` only runs with ONLY_FULL_GROUP_BY disabled, and MySQL chooses those values nondeterministically.)
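The grouped rows can be re-keyed the same way as in 1.2, this time keeping the whole row as the value. A sketch, assuming the result set was fetched into $rewardRows:

// Passing null as the column returns each complete row, indexed by user_id.
$firstRewardDict = array_column($rewardRows, null, 'user_id');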

2. MySQL strings need to be escaped

For security, string values must be escaped before being spliced into SQL, both to keep the statement valid and to prevent injection.

# Object-oriented style
mysqli::real_escape_string ( string $escapestr ) : string
# Procedural style
mysqli_real_escape_string ( mysqli $link , string $escapestr ) : string
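A quick usage sketch, assuming an open connection in $mysqli (the table and variable names are illustrative):

// Escape user-supplied text before splicing it into the statement.
$safeName = $mysqli->real_escape_string($novelName);
$sql = "SELECT `novel_id` FROM `wm_novel` WHERE `novel_name` = '{$safeName}'";

For new code, prepared statements with bound parameters are the safer choice.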

3. UNION ALL for split (sharded) tables

With large data volumes, tables are often split into shards; UNION ALL is a common way to query across all of them at once. The code is as follows:

$commentMostSqlArr = [];
for ($i = 0; $i < 128; $i++) {
    $table = 'cp_comment_msg_uid_' . $i;
    $commentMostSqlArr[] = "SELECT `uid`, `nid`, `module_name`, `aid`, `author_name`, COUNT(module_id) AS count_comment_number
    FROM {$table}
    WHERE `uid` BETWEEN {$minUid} AND {$maxUid}
    AND `gift_id` = 0 AND `create_time` > {$total_start_time} AND `create_time` <= {$total_end_time}
    GROUP BY nid, uid";
}
$commentMostSql = "SELECT * FROM (" . implode(" UNION ALL ", $commentMostSqlArr) . ") t";

4. Avoid duplicate data

4.1 Duplicate data arises in two places: in PHP arrays and in INSERT statements. The solutions are as follows:

For PHP arrays, use array_unique($array). Note that keys are preserved: array_unique() compares the values as strings, keeps the first key encountered for each value, and ignores all subsequent keys. This does not mean that the first key of a given value in the unsorted array is the one kept.
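A tiny illustration of that key behavior (the values are made up):

$ids = [3 => 67, 5 => 69, 9 => 67, 12 => 70];
print_r(array_unique($ids));
// Array ( [3] => 67 [5] => 69 [12] => 70 ) -- duplicate 67 dropped, original keys kept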

4.2 MySQL INSERT de-duplication

If a unique index is set, a plain INSERT of duplicate data fails and throws an error.

INSERT IGNORE INTO differs from INSERT INTO in that it ignores rows that already exist in the database: if the row is not present it is inserted, and if it is present it is skipped. Existing data is preserved, and new rows are inserted only into the gaps.

The following example uses INSERT IGNORE INTO. It executes without error and does not insert duplicate rows into the table:

INSERT IGNORE INTO person_tbl (last_name, first_name)  VALUES( 'Jay', 'Thomas');

REPLACE INTO, by contrast, first deletes any existing record with the same primary key or unique key, then inserts the new record.
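For comparison, the same row written with REPLACE INTO (same sample table as above):

REPLACE INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Thomas');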

5. INSERT optimization

Multi-row INSERT INTO statements, spliced together and sent in one round trip, are much faster than inserting rows one by one. There is one small drawback: if any row has a problem, the whole statement fails together.

INSERT INTO `annual_report_2020` (
    `user_id`,
    `user_regtime`
)
VALUES (2, 1500341346),
       (5, 1502195321),
       (6, 1502242604);
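A minimal sketch of splicing such a statement in PHP, assuming $mysqli and rows carrying user_id and user_regtime (both numeric, so an integer cast stands in for escaping):

$values = [];
foreach ($userRows as $row) {
    // Cast to int so the spliced SQL needs no string escaping.
    $values[] = '(' . (int)$row['user_id'] . ', ' . (int)$row['user_regtime'] . ')';
}
$sql = 'INSERT INTO `annual_report_2020` (`user_id`, `user_regtime`) VALUES ' . implode(', ', $values);
$mysqli->query($sql);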

6. Execution results

Test environment: 173,662 rows in 1,296 s, reading 1,000 rows at a time: 133 rows/s.