Index
Indexes usually improve query efficiency dramatically. Without an index, MongoDB must scan every document in the collection to find those that match the query criteria. Such scans are very inefficient: on a large data set a query can take tens of seconds or even minutes, which is fatal to the performance of a website.
An index is a special data structure, stored in a form that is easy to read and traverse, that sorts the values of one or more fields of a collection.
1. Type of index
Common index types in general: B+ tree, hash, spatial, and full-text indexes.
Indexes supported by MongoDB:
Single-field index; compound index (multi-field index)
Multikey index: indexes each element of an array value
Spatial (geospatial) index: location-based searches
Text index: the equivalent of a full-text index
Hashed index: supports exact matches only, not range queries
2. Index Management
Create:
db.mycoll.ensureIndex(keypattern[,options])  (deprecated since MongoDB 3.0 in favor of db.mycoll.createIndex())
View help information:
db.mycoll.ensureIndex(keypattern[,options]) - options is an object with these possible fields: name, unique, dropDups
db.COLLECTION_NAME.ensureIndex({KEY:1})
KEY in the syntax is the field you want to index; 1 creates the index in ascending order, and -1 creates it in descending order. ensureIndex() can also index multiple fields at once (what relational databases call a composite index): db.col.ensureIndex({"title":1,"description":-1})
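To illustrate what the compound index {"title": 1, "description": -1} orders its entries by, here is a plain-Python sketch (not MongoDB code; the sample documents are made up): sort by title ascending, breaking ties by description descending.

```python
# Hypothetical sample documents; a compound index on
# {"title": 1, "description": -1} keeps entries in exactly this order.
docs = [
    {"title": "b", "description": "x"},
    {"title": "a", "description": "x"},
    {"title": "a", "description": "z"},
]

# Two-pass stable sort: first by the secondary key descending,
# then by the primary key ascending.
docs.sort(key=lambda d: d["description"], reverse=True)
docs.sort(key=lambda d: d["title"])

print([(d["title"], d["description"]) for d in docs])
# [('a', 'z'), ('a', 'x'), ('b', 'x')]
```

A query that sorts by title ascending and description descending can walk such an index directly instead of sorting in memory.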
ensureIndex() accepts the following optional parameters:
| Parameter | Type | Description |
|---|---|---|
| background | Boolean | Building an index can block other database operations; specifying background: true builds the index in the background. Default: false. |
| unique | Boolean | Whether the index is unique; specify true to create a unique index. Default: false. |
| name | string | The name of the index. If not specified, MongoDB generates one by joining the indexed field names and their sort orders. |
| dropDups | Boolean | Whether to drop duplicate documents when building a unique index. Default: false. |
| sparse | Boolean | Do not index documents that lack the indexed field. Note: if set to true, documents without the field cannot be found through queries on the indexed field. Default: false. |
| expireAfterSeconds | integer | A lifetime in seconds; sets up a TTL so documents in the collection expire after that time. |
| v | index version | The index version number. The default depends on the mongod version running when the index is created. |
| weights | document | The index weight, a value between 1 and 99,999 giving this field's score weight relative to the other indexed fields. |
| default_language | string | For a text index, determines the list of stop words and the rules for the stemmer and tokenizer. Default: english. |
| language_override | string | For a text index, names the field in the document whose language overrides the default. Default: language. |
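The TTL behavior behind expireAfterSeconds can be sketched in plain Python (this is an illustration of the idea, not the server's implementation; the field name "createdAt" and the 60-second lifetime are assumptions): documents carry a timestamp, and a background sweep removes those older than the configured lifetime.

```python
import time

# Assumed TTL configuration: documents expire 60 s after "createdAt".
expire_after_seconds = 60
now = time.time()

docs = [
    {"_id": 1, "createdAt": now - 10},    # 10 s old  -> kept
    {"_id": 2, "createdAt": now - 3600},  # 1 h old   -> expired
]

# Keep only documents still within their lifetime.
live = [d for d in docs if now - d["createdAt"] < expire_after_seconds]
print([d["_id"] for d in live])  # [1]
```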
Query:
db.mycoll.getIndexes()
Delete:
db.mycoll.dropIndexes()  deletes all (non-_id) indexes of the current collection
db.mycoll.dropIndexes("indexName")  deletes the specified index (the documented method for a single index is db.mycoll.dropIndex("indexName"))
db.mycoll.reIndex()  rebuilds all indexes of the collection
```
> db.students.find()
> for (i=1; i<=100; i++) db.students.insert({name: "student"+i, age: (i%100)})   # insert documents with a for loop
> db.students.find().count()
100
> db.students.find()
{ "_id" : ObjectId("58d613021e8383d30814f846"), "name" : "student1", "age" : 1 }
{ "_id" : ObjectId("58d613021e8383d30814f847"), "name" : "student2", "age" : 2 }
{ "_id" : ObjectId("58d613021e8383d30814f848"), "name" : "student3", "age" : 3 }
{ "_id" : ObjectId("58d613021e8383d30814f849"), "name" : "student4", "age" : 4 }
{ "_id" : ObjectId("58d613021e8383d30814f84a"), "name" : "student5", "age" : 5 }
{ "_id" : ObjectId("58d613021e8383d30814f84b"), "name" : "student6", "age" : 6 }
{ "_id" : ObjectId("58d613021e8383d30814f84c"), "name" : "student7", "age" : 7 }
{ "_id" : ObjectId("58d613021e8383d30814f84d"), "name" : "student8", "age" : 8 }
{ "_id" : ObjectId("58d613021e8383d30814f84e"), "name" : "student9", "age" : 9 }
{ "_id" : ObjectId("58d613021e8383d30814f84f"), "name" : "student10", "age" : 10 }
{ "_id" : ObjectId("58d613021e8383d30814f850"), "name" : "student11", "age" : 11 }
{ "_id" : ObjectId("58d613021e8383d30814f851"), "name" : "student12", "age" : 12 }
{ "_id" : ObjectId("58d613021e8383d30814f852"), "name" : "student13", "age" : 13 }
{ "_id" : ObjectId("58d613021e8383d30814f853"), "name" : "student14", "age" : 14 }
{ "_id" : ObjectId("58d613021e8383d30814f854"), "name" : "student15", "age" : 15 }
{ "_id" : ObjectId("58d613021e8383d30814f855"), "name" : "student16", "age" : 16 }
{ "_id" : ObjectId("58d613021e8383d30814f856"), "name" : "student17", "age" : 17 }
{ "_id" : ObjectId("58d613021e8383d30814f857"), "name" : "student18", "age" : 18 }
{ "_id" : ObjectId("58d613021e8383d30814f858"), "name" : "student19", "age" : 19 }
{ "_id" : ObjectId("58d613021e8383d30814f859"), "name" : "student20", "age" : 20 }
Type "it" for more                       # only the first 20 are shown; type "it" to page through the rest
> db.students.ensureIndex({name: 1})     # build an index on the name key; 1 = ascending, -1 = descending
> show collections
students
system.indexes
t1
> db.students.getIndexes()
[
    {                                    # default index
        "v" : 1,
        "name" : "_id_",
        "key" : { "_id" : 1 },
        "ns" : "students.students"       # database.collection
    },
    {
        "v" : 1,
        "name" : "name_1",               # automatically generated index name
        "key" : { "name" : 1 },          # index created on the name key
        "ns" : "students.students"
    }
]
> db.students.dropIndexes("name_1")      # delete the specified index
{ "nIndexesWas" : 2, "msg" : "non-_id indexes dropped for collection", "ok" : 1 }
> db.students.getIndexes()
[
    {
        "v" : 1,
        "name" : "_id_",
        "key" : { "_id" : 1 },
        "ns" : "students.students"
    }
]
> db.students.dropIndexes()              # the default _id index cannot be deleted
{ "nIndexesWas" : 1, "msg" : "non-_id indexes dropped for collection", "ok" : 1 }
> db.students.getIndexes()
[
    {
        "v" : 1,
        "name" : "_id_",
        "key" : { "_id" : 1 },
        "ns" : "students.students"
    }
]
> db.students.find({age:"90"}).explain() # show the query plan (note: age is queried as a string, so nothing matches)
{
    "cursor" : "BtreeCursor t1",
    "isMultiKey" : false,
    "n" : 0,
    "nscannedObjects" : 0,
    "nscanned" : 0,
    "nscannedObjectsAllPlans" : 0,
    "nscannedAllPlans" : 0,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 17,
    "indexBounds" : {                    # index bounds used
        "age" : [ [ "90", "90" ] ]
    },
    "server" : "Node7:27017"
}
```
MongoDB configuration
The configuration items in the MongoDB configuration file /etc/mongodb.conf correspond one-to-one to mongod startup options (much like memcached):
```
[root@Node7 ~]# mongod --help
Allowed options:

General options:
  -h [ --help ]               show this usage information
  --version                   show version information
  -f [ --config ] arg         configuration file specifying additional options
  -v [ --verbose ]            be more verbose (include multiple times for more verbosity e.g. -vvvvv)
  --quiet                     quieter output
  --port arg                  specify port number - 27017 by default
  --bind_ip arg               comma separated list of ip addresses to listen on - all local ips by default
  --maxConns arg              max number of simultaneous connections - 20000 by default
  --logpath arg               log file to send write to instead of stdout - has to be a file, not directory
  --logappend                 append to logpath instead of over-writing
  --pidfilepath arg           full path to pidfile (if not set, no pidfile is created)
  --keyFile arg               private key for cluster authentication
  --setParameter arg          Set a configurable parameter
  --nounixsocket              disable listening on unix sockets
  --unixSocketPrefix arg      alternative directory for UNIX domain sockets (defaults to /tmp)
  --fork                      fork server process
  --syslog                    log to system's syslog facility instead of file or stdout
  --auth                      run with security
  --cpu                       periodically show cpu and iowait utilization
  --dbpath arg                directory for datafiles - defaults to /data/db/
  --diaglog arg               0=off 1=W 2=R 3=both 7=W+some reads
  --directoryperdb            each database will be stored in a separate directory
  --ipv6                      enable IPv6 support (disabled by default)
  --journal                   enable journaling            # whether the transaction log is enabled
  --journalCommitInterval arg how often to group/batch commit (ms)
  --journalOptions arg        journal diagnostic options
  --jsonp                     allow JSONP access via http (has security implications)
  --noauth                    run without security
  --nohttpinterface           disable http interface
  --nojournal                 disable journaling (journaling is on by default for 64 bit)
  --noprealloc                disable data file preallocation - will often hurt performance
  --noscripting               disable scripting engine
  --notablescan               do not allow table scans
  --nssize arg (=16)          .ns file size (in MB) for new databases
  --profile arg               0=off 1=slow, 2=all          # performance profiling
  --quota                     limits each database to a certain number of files (8 default)
  --quotaFiles arg            number of files allowed per db, requires --quota
  --repair                    run repair on all dbs        # enable after an unclean shutdown to repair data
  --repairpath arg            root directory for repair files - defaults to dbpath
  --rest                      turn on simple rest api
  --shutdown                  kill a running server (for init scripts)
  --slowms arg (=100)         value of slow for profile and console log
                                                           # slow-query threshold in ms; queries over it count as slow
  --smallfiles                use a smaller default file size
  --syncdelay arg (=60)       seconds between disk syncs (0=never, but not recommended)
  --sysinfo                   print some diagnostic system information
  --upgrade                   upgrade db if needed

Replication options:
  --oplogSize arg             size to use (in MB) for replication op log. default is 5% of disk space (i.e. large is good)

Master/slave options (old; use replica sets instead):
  --master                    master mode
  --slave                     slave mode
  --source arg                when slave: specify master as <server:port>
  --only arg                  when slave: specify a single database to replicate
  --slavedelay arg            specify delay (in seconds) to be used when applying master ops to slave
  --autoresync                automatically resync if slave data is stale

Replica set options:
  --replSet arg               arg is <setname>[/<optionalseedhostlist>]
  --replIndexPrefetch arg     specify index prefetching behavior (if secondary) [none|_id_only|all]

Sharding options:
  --configsvr                 declare this is a config db of a cluster; default port 27019; default dir /data/configdb
  --shardsvr                  declare this is a shard db of a cluster; default port 27018

SSL options:
  --sslOnNormalPorts          use ssl on configured ports
  --sslPEMKeyFile arg         PEM file for ssl
  --sslPEMKeyPassword arg     PEM file password
  --sslCAFile arg             Certificate Authority file for SSL
  --sslCRLFile arg            Certificate Revocation List file for SSL
  --sslWeakCertificateValidation  allow client to connect without presenting a certificate
  --sslFIPSMode               activate FIPS 140-2 mode at startup
```
Common configuration parameters:
fork = {true|false}  Whether mongod runs in the background
bind_ip = IP  Specifies the listen address
port = PORT  Specifies the listen port; 27017 by default
maxConns = N  Specifies the maximum number of concurrent connections
logpath = /PATH/TO/LOG_FILE  Specifies the log file
httpinterface = true  Whether to enable the web monitoring interface, which listens on the mongod port + 1000
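Putting the common parameters above together, a minimal /etc/mongodb.conf might look like the sketch below (the paths and values are illustrative assumptions, not taken from the original):

```
# /etc/mongodb.conf -- minimal example (values are illustrative)
fork = true
port = 27017
bind_ip = 127.0.0.1
maxConns = 1000
logpath = /var/log/mongodb/mongod.log
logappend = true
dbpath = /var/lib/mongodb
httpinterface = true
```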
3. Replication of MongoDB
1. Introduction to mongodb replication
There are two types of mongodb replication implementations:
master/slave: very similar to MySQL master-slave replication; now rarely used.
replica set: a set of replicas with automatic failover
Replica set:
A group of mongod instances serving the same data set
A replica set has exactly one primary node, which can both read and write; the other (secondary) nodes are read-only.
The primary records every data-modifying operation in the oplog (operation log); each secondary copies the oplog entries and applies them locally.
MongoDB replication requires at least two nodes, and usually three or more. Even when two data-bearing nodes are enough, a third node should be added as an arbiter that holds no data: with only one primary and one secondary, a surviving node cannot tell whether the other node failed or the network did.
Every node in a replica set continuously judges the others' health through heartbeat messages, sent every 2 seconds by default. Once contact with the primary is lost for more than 10 seconds, the replica set triggers an election and promotes one of the secondaries to be the new primary.
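The failure-detection rule above (2-second heartbeats, 10-second silence threshold) can be sketched in plain Python; this is an illustration of the logic, not MongoDB's actual code.

```python
HEARTBEAT_INTERVAL = 2   # seconds between heartbeat messages (MongoDB default)
ELECTION_TIMEOUT = 10    # seconds of silence before an election is triggered

def should_trigger_election(now, last_heartbeat_from_primary):
    """Return True when the primary has been silent past the timeout."""
    return now - last_heartbeat_from_primary > ELECTION_TIMEOUT

print(should_trigger_election(now=100.0, last_heartbeat_from_primary=95.0))  # False
print(should_trigger_election(now=100.0, last_heartbeat_from_primary=88.0))  # True
```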
Replica set characteristics:
An odd number of nodes, at least three
Any node can become the primary, but there is only one primary at a time
All write operations go to the primary
Automatic failover, automatic recovery
Node classification in replica set:
Priority-0 node:
A cold-standby node: it can never be elected primary, but it participates in elections, holds the data set, and can serve clients; often used for off-site disaster recovery
Hidden secondary:
Must be a priority-0 node; it is invisible to clients
Delayed replication node:
Must be a priority-0 node; its copy of the data lags the primary by a fixed delay
arbiter:
An arbitration node; it holds no data
2. mongodb replication set architecture
heartbeat:
Carries health information between nodes and triggers elections
oplog:
Records data-modifying operations; the basic tool of replication
A fixed-size (capped) collection stored in the local database. Every node in the replica set has an oplog, but only the primary writes to it; the entries are then synchronized to the secondaries.
oplog entries are idempotent: applying the same entry many times leaves the result unchanged.
Because the oplog has a fixed size, it cannot keep every operation the primary has ever performed, so a node newly added to a replica set first performs an initial sync: it copies the data set from the primary, and once it has caught up, it keeps replicating from the oplog.
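The idempotency mentioned above can be sketched in plain Python (an illustration of the idea, with a made-up document and entry, not MongoDB's internal format): operations are logged in an idempotent form, e.g. an increment is recorded as a $set of the resulting value, so replaying the same entry any number of times leaves the document in the same state.

```python
def apply_op(doc, op):
    """Apply a $set-style oplog entry to a document."""
    for field, value in op["$set"].items():
        doc[field] = value
    return doc

doc = {"_id": 1, "age": 29}
entry = {"$set": {"age": 30}}   # logged form of an increment from 29 to 30

apply_op(doc, entry)
apply_op(doc, entry)            # replaying the same entry changes nothing
print(doc)                      # {'_id': 1, 'age': 30}
```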
```
> show dbs
local     0.078125GB
sb        (empty)
studnets  (empty)
test      (empty)
> use local
switched to db local
> show collections     # the replica-set-related collections appear only after a replica set is enabled
startup_log
>
```
After a new slave node joins the replication set, the operation procedure is as follows:
Initial sync
Post-rollback catch-up
Sharding chunk migrations
local database:
The local database itself does not participate in replication (it is never replicated to other nodes).
It stores all of the replica set's metadata and the oplog; a collection named oplog.rs holds the oplog (it is created automatically the first time a node starts after joining a replica set).
The default size of oplog.rs depends on the OS and file system: 5% of free disk space (with a floor of about 1 GB); it can be customized with oplogSize = N (in MB).
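A worked example of the default sizing rule: 5% of free disk space, with a floor of about 1 GB (the exact bounds vary by OS and MongoDB version, and the free-space figures below are assumptions).

```python
GB = 1024 ** 3

def default_oplog_size(free_disk_bytes):
    """5% of free disk space, but never smaller than ~1 GB."""
    return max(int(free_disk_bytes * 0.05), 1 * GB)

print(default_oplog_size(200 * GB) // GB)  # 10 -> 5% of 200 GB
print(default_oplog_size(10 * GB) // GB)   # 1  -> the ~1 GB floor applies
```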
3. MongoDB data synchronization types
1) Initial synchronization
When a secondary has no data but the primary does
When a secondary has lost its replication history
Initial synchronization steps:
a. Clone all databases
b. Apply all changes to the data set: copy the oplog and apply it locally
c. Build indexes on all collections
2) Replication
4. Elections
Factors influencing replica set elections:
heartbeat message
priority
optime
network connections
Network partition
Election mechanism:
Events that trigger an election:
A new replica set is initialized
A secondary cannot reach the primary
The primary "steps down"
The primary receives the rs.stepDown() command
A secondary with a higher priority meets all the other conditions for becoming primary
The primary cannot reach the "majority" of the replica set
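The "majority" rule behind the last condition can be sketched in plain Python (an illustration of the arithmetic, not MongoDB's election protocol): a primary must be able to reach a strict majority of the members, so when a network partition splits the set, only the larger side can keep or elect a primary.

```python
def majority(total_members):
    """Strict majority of a replica set's members."""
    return total_members // 2 + 1

def can_elect_primary(reachable, total_members):
    """A side of a partition can elect a primary only with a majority."""
    return reachable >= majority(total_members)

# A 5-member set split 3/2 by a network partition:
print(can_elect_primary(reachable=3, total_members=5))  # True
print(can_elect_primary(reachable=2, total_members=5))  # False
```

This is also why an even number of data-bearing nodes gains nothing: a 4-member set still needs 3 reachable members, the same as a 5-member set needs 3.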