[MongoDB] 02, MongoDB index and replication

Posted by dolce on Tue, 16 Jul 2019 02:52:22 +0200


Index

      Indexes can usually greatly improve query efficiency. Without an index, MongoDB must scan every document in a collection to find the ones that match the query. Such scans are very inefficient; on large data sets a query can take tens of seconds or even minutes, which is fatal to a website's performance.

      An index is a special data structure that stores a subset of the collection's data in a form that is easy to read and traverse: it keeps the values of one or more fields in sorted order.


1. Type of index

 Common index implementations: B+ tree, hash, spatial index, full-text index

  Indexes supported by MongoDB:

    Single-field index, compound index (multi-field index)

    Multikey index: an index on the values of an array field (one entry per array element)

    Spatial index: location-based search

    Text index: equivalent to a full-text index

    Hashed index: exact-match lookups only, no range queries
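As a sketch, the index types above can be created in the mongo shell like this (the collection and field names are invented for illustration; run against a live mongod):

```javascript
// Hypothetical mongo-shell examples -- collection/field names are not from the article.
db.articles.ensureIndex({title: 1})               // single-field index
db.articles.ensureIndex({title: 1, author: -1})   // compound (multi-field) index
db.articles.ensureIndex({tags: 1})                // multikey: tags holds an array
db.places.ensureIndex({loc: "2dsphere"})          // spatial index for location queries
db.articles.ensureIndex({body: "text"})           // text index (full-text search)
db.users.ensureIndex({email: "hashed"})           // hashed: exact match only, no ranges
```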


2. Index Management

Establish:

  db.mycoll.ensureIndex(keypattern[,options])

View help information:

   db.mycoll.ensureIndex(keypattern[,options]) - options is an object with these possible fields: name, unique, dropDups


db.COLLECTION_NAME.ensureIndex({KEY:1})

    The KEY in the syntax is the field you want to index; 1 builds the index in ascending order, and -1 builds it in descending order. ensureIndex() can also take multiple fields, creating what relational databases call a composite (compound) index: db.col.ensureIndex({"title":1,"description":-1})


ensureIndex() accepts the following optional parameters:

Parameter (type): Description

background (Boolean): Building an index can block other database operations; background: true builds the index in the background. Default: false.

unique (Boolean): Whether the index is unique; true creates a unique index. Default: false.

name (string): The index name. If not specified, MongoDB generates one by joining the indexed field names and their sort orders.

dropDups (Boolean): Whether to drop duplicate documents when building a unique index; true drops them. Default: false.

sparse (Boolean): If true, documents that lack the indexed field are left out of the index, so queries that use the index will not return them; use with care. Default: false.

expireAfterSeconds (integer): A TTL in seconds; documents in the collection expire after this lifetime.

v (index version): The index version number; the default depends on the mongod version running when the index is created.

weights (document): The score weight of an indexed field relative to the other indexed fields, from 1 to 99,999.

default_language (string): For text indexes, the language that determines the stop-word list and the rules for stemming and tokenizing. Default: english.

language_override (string): For text indexes, the name of a document field whose value overrides the default language. Default: language.
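The options above combine with ensureIndex() as in this sketch (collection and field names are invented; run in the mongo shell):

```javascript
// Hypothetical examples of the options above (names invented for illustration):
db.users.ensureIndex({email: 1}, {unique: true, name: "uniq_email"})  // unique index with an explicit name
db.logs.ensureIndex({msg: 1}, {background: true})                     // build without blocking other operations
db.posts.ensureIndex({author: 1}, {sparse: true})                     // skip documents lacking the field
db.sessions.ensureIndex({createdAt: 1}, {expireAfterSeconds: 3600})   // TTL: documents expire after one hour
```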


Query:

  db.mycoll.getIndexes()


Delete:

  db.mycoll.dropIndexes()            Delete all non-_id indexes of the current collection

  db.mycoll.dropIndex("indexName")   Delete the specified index by name

  db.mycoll.reIndex()                Rebuild the collection's indexes


> db.students.find()
> for (i=1;i<=100;i++) db.students.insert({name:"student"+i, age:(i%100)}) 
                                                                       #  Use the for loop 
> db.students.find().count()
100
> db.students.find()
{ "_id" : ObjectId("58d613021e8383d30814f846"), "name" : "student1", "age" : 1 }
{ "_id" : ObjectId("58d613021e8383d30814f847"), "name" : "student2", "age" : 2 }
{ "_id" : ObjectId("58d613021e8383d30814f848"), "name" : "student3", "age" : 3 }
{ "_id" : ObjectId("58d613021e8383d30814f849"), "name" : "student4", "age" : 4 }
{ "_id" : ObjectId("58d613021e8383d30814f84a"), "name" : "student5", "age" : 5 }
{ "_id" : ObjectId("58d613021e8383d30814f84b"), "name" : "student6", "age" : 6 }
{ "_id" : ObjectId("58d613021e8383d30814f84c"), "name" : "student7", "age" : 7 }
{ "_id" : ObjectId("58d613021e8383d30814f84d"), "name" : "student8", "age" : 8 }
{ "_id" : ObjectId("58d613021e8383d30814f84e"), "name" : "student9", "age" : 9 }
{ "_id" : ObjectId("58d613021e8383d30814f84f"), "name" : "student10", "age" : 10 }
{ "_id" : ObjectId("58d613021e8383d30814f850"), "name" : "student11", "age" : 11 }
{ "_id" : ObjectId("58d613021e8383d30814f851"), "name" : "student12", "age" : 12 }
{ "_id" : ObjectId("58d613021e8383d30814f852"), "name" : "student13", "age" : 13 }
{ "_id" : ObjectId("58d613021e8383d30814f853"), "name" : "student14", "age" : 14 }
{ "_id" : ObjectId("58d613021e8383d30814f854"), "name" : "student15", "age" : 15 }
{ "_id" : ObjectId("58d613021e8383d30814f855"), "name" : "student16", "age" : 16 }
{ "_id" : ObjectId("58d613021e8383d30814f856"), "name" : "student17", "age" : 17 }
{ "_id" : ObjectId("58d613021e8383d30814f857"), "name" : "student18", "age" : 18 }
{ "_id" : ObjectId("58d613021e8383d30814f858"), "name" : "student19", "age" : 19 }
{ "_id" : ObjectId("58d613021e8383d30814f859"), "name" : "student20", "age" : 20 }
Type "it" for more      # Show only the top 20, it shows more

> db.students.ensureIndex({name:1})   #Build an index on the name key, 1 for ascending order and - 1 for descending order
> show collections
students
system.indexes
t1

> db.students.getIndexes()
[
	{                               # Default index
		"v" : 1,              
		"name" : "_id_",
		"key" : {
			"_id" : 1
		},
		"ns" : "students.students"  # Databases. Collections
	},
	{
		"v" : 1,
		"name" : "name_1",      #Automatically generated index names
		"key" : {   
			"name" : 1   # Index created on the name key
		},
		"ns" : "students.students"  
	}
]

> db.students.dropIndexes("name_1")      #Delete the specified index
{
	"nIndexesWas" : 2,
	"msg" : "non-_id indexes dropped for collection",
	"ok" : 1
}
> db.students.getIndexes()
[
	{
		"v" : 1,
		"name" : "_id_",
		"key" : {
			"_id" : 1
		},
		"ns" : "students.students"
	}
]
> db.students.dropIndexes()        # The default _id index cannot be dropped
{
	"nIndexesWas" : 1,
	"msg" : "non-_id indexes dropped for collection",
	"ok" : 1
}
> db.students.getIndexes()
[
	{
		"v" : 1,
		"name" : "_id_",
		"key" : {
			"_id" : 1
		},
		"ns" : "students.students"
	}
]
> db.students.find({age:"90"}).explain()       # Show the query plan; age is stored as a number, so the string "90" matches nothing (n: 0)
{
	"cursor" : "BtreeCursor t1",
	"isMultiKey" : false,
	"n" : 0,
	"nscannedObjects" : 0,     
	"nscanned" : 0,
	"nscannedObjectsAllPlans" : 0,
	"nscannedAllPlans" : 0,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 17,
	"indexBounds" : {               #Index used
		"age" : [
			[
				"90",
				"90"
			]
		]
	},
	"server" : "Node7:27017"
}


MongoDB configuration

 The configuration items in the MongoDB configuration file /etc/mongodb.conf correspond one-to-one to mongod's startup options (much like memcached):

[root@Node7 ~]# mongod --help
Allowed options:

General options:
  -h [ --help ]               show this usage information
  --version                   show version information
  -f [ --config ] arg         configuration file specifying additional options
  -v [ --verbose ]            be more verbose (include multiple times for more 
                              verbosity e.g. -vvvvv)
  --quiet                     quieter output
  --port arg                  specify port number - 27017 by default
  --bind_ip arg               comma separated list of ip addresses to listen on
                              - all local ips by default
  --maxConns arg              max number of simultaneous connections - 20000 by
                              default
  --logpath arg               log file to send write to instead of stdout - has
                              to be a file, not directory
  --logappend                 append to logpath instead of over-writing
  --pidfilepath arg           full path to pidfile (if not set, no pidfile is 
                              created)
  --keyFile arg               private key for cluster authentication
  --setParameter arg          Set a configurable parameter
  --nounixsocket              disable listening on unix sockets
  --unixSocketPrefix arg      alternative directory for UNIX domain sockets 
                              (defaults to /tmp)
  --fork                      fork server process
  --syslog                    log to system's syslog facility instead of file 
                              or stdout
  --auth                      run with security
  --cpu                       periodically show cpu and iowait utilization
  --dbpath arg                directory for datafiles - defaults to /data/db/
  --diaglog arg               0=off 1=W 2=R 3=both 7=W+some reads
  --directoryperdb            each database will be stored in a separate 
                              directory
  --ipv6                      enable IPv6 support (disabled by default)
  --journal                   enable journaling     # the write-ahead journal; on by default on 64-bit builds
  --journalCommitInterval arg how often to group/batch commit (ms)
  --journalOptions arg        journal diagnostic options
  --jsonp                     allow JSONP access via http (has security 
                              implications)
  --noauth                    run without security
  --nohttpinterface           disable http interface
  --nojournal                 disable journaling (journaling is on by default 
                              for 64 bit)
  --noprealloc                disable data file preallocation - will often hurt
                              performance
  --noscripting               disable scripting engine
  --notablescan               do not allow table scans
  --nssize arg (=16)          .ns file size (in MB) for new databases
  --profile arg               0=off 1=slow, 2=all   #Performance analysis
  --quota                     limits each database to a certain number of files
                              (8 default)
  --quotaFiles arg            number of files allowed per db, requires --quota
  --repair                    run repair on all dbs
                         # enable after an unclean shutdown to repair the data files
  --repairpath arg            root directory for repair files - defaults to 
                              dbpath
  --rest                      turn on simple rest api
  --shutdown                  kill a running server (for init scripts)
  --slowms arg (=100)         value of slow for profile and console log
                    # slow-query threshold in ms; operations taking longer are treated as slow
  --smallfiles                use a smaller default file size
  --syncdelay arg (=60)       seconds between disk syncs (0=never, but not 
                              recommended)
  --sysinfo                   print some diagnostic system information
  --upgrade                   upgrade db if needed

Replication options:
  --oplogSize arg       size to use (in MB) for replication op log. default is 
                        5% of disk space (i.e. large is good)

Master/slave options (old; use replica sets instead):
  --master              master mode
  --slave               slave mode
  --source arg          when slave: specify master as <server:port>
  --only arg            when slave: specify a single database to replicate
  --slavedelay arg      specify delay (in seconds) to be used when applying 
                        master ops to slave
  --autoresync          automatically resync if slave data is stale

Replica set options:
  --replSet arg           arg is <setname>[/<optionalseedhostlist>]
  --replIndexPrefetch arg specify index prefetching behavior (if secondary) 
                          [none|_id_only|all]

Sharding options:
  --configsvr           declare this is a config db of a cluster; default port 
                        27019; default dir /data/configdb
  --shardsvr            declare this is a shard db of a cluster; default port 
                        27018

SSL options:
  --sslOnNormalPorts              use ssl on configured ports
  --sslPEMKeyFile arg             PEM file for ssl
  --sslPEMKeyPassword arg         PEM file password
  --sslCAFile arg                 Certificate Authority file for SSL
  --sslCRLFile arg                Certificate Revocation List file for SSL
  --sslWeakCertificateValidation  allow client to connect without presenting a 
                                  certificate
  --sslFIPSMode                   activate FIPS 140-2 mode at startup

Common configuration parameters:

   fork={true|false}   Whether mongod runs in the background

   bind_ip=IP   Specify the listen address

   port=PORT   Specify the listen port; default is 27017

   maxConns=N   Specify the maximum number of concurrent connections

   logpath=/PATH/TO/LOG_FILE   Specify the log file

   httpinterface=true   Whether to enable the web monitoring interface; its port is the mongod port + 1000
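Put together, a minimal /etc/mongodb.conf using these parameters might look like the following sketch (paths and values are examples, not from the article):

```
fork = true                              # run mongod in the background
bind_ip = 127.0.0.1                      # listen address
port = 27017                             # listen port (default)
maxConns = 1000                          # max concurrent connections
logpath = /var/log/mongodb/mongod.log    # log file
logappend = true                         # append instead of overwriting
httpinterface = true                     # web console on port 28017 (port + 1000)
```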


3. Replication of MongoDB

1. Introduction to mongodb replication

There are two types of mongodb replication implementations:

  master/slave: very similar to MySQL master-slave replication; rarely used today

  replica set: a replica set with automatic failover

Replica set

   A group of mongod instances serving the same data set.

   A replica set has exactly one primary node, which accepts both reads and writes; the other secondary nodes are read-only.

   The primary records every data modification in the oplog (operation log); each secondary copies the oplog and applies the operations locally.

        Replication requires at least two nodes, and usually three or more. If only two data-bearing nodes are used, a third node should serve as an arbiter that holds no data (with just one primary and one secondary, a surviving node cannot tell whether it or its peer has failed).

         Each node in the replica set continuously judges the others' health through heartbeats, sent every 2 seconds by default. If the primary is unreachable by the other nodes for more than 10 seconds, the replica set triggers an election and promotes a secondary to be the new primary.


Replica set characteristics:

  • The cluster should have an odd number of nodes, at least three.

  • Any node can become the primary, and there is only one primary at a time.

  • All write operations go to the primary.

  • Automatic failover, automatic recovery.


Node classes in a replica set:

  0-priority node:

     A cold-standby node that cannot be elected primary but can vote in elections; it holds the data set and can be read by clients. Often used for off-site disaster recovery.

 Hidden node:

     Must be a 0-priority node; in addition, it is invisible to clients.

 Delayed node:

     Must be a 0-priority node; its replication lags the primary by a fixed interval.

 arbiter:

     An arbitration node that votes but holds no data.
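The node classes above map onto replica-set member options. A hypothetical rs.initiate() document (host names invented; legacy 2.x option names) might look like:

```javascript
// Illustrative only -- host names are examples, not from the article.
rs.initiate({
  _id: "rs0",
  members: [
    {_id: 0, host: "node1:27017"},                                 // normal member, eligible primary
    {_id: 1, host: "node2:27017", priority: 0},                    // 0-priority: never primary, still votes
    {_id: 2, host: "node3:27017", priority: 0, hidden: true},      // hidden from clients
    {_id: 3, host: "node4:27017", priority: 0, slaveDelay: 3600},  // replication delayed by one hour
    {_id: 4, host: "node5:27017", arbiterOnly: true}               // arbiter: votes, holds no data
  ]
})
```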


2. mongodb replication set architecture

heartbeat:

  Achieve heartbeat information transmission and trigger elections


oplog:

  Records data-modification operations; it is the basic mechanism of replication.

   A fixed-size (capped) collection stored in the local database. Every node in the replica set has an oplog, but only the primary writes to it; the secondaries copy it.

   oplog entries are idempotent: applying the same entry multiple times leaves the result unchanged.

   Because the oplog has a fixed size, it cannot hold all of the primary's history. A node newly added to the replica set therefore performs an initial sync first: it copies the primary's data set, and once it has caught up it continues replicating from the oplog.
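The idempotence point can be illustrated outside MongoDB with a small sketch (plain JavaScript, not MongoDB source): an oplog-style entry records the resulting value rather than the operation, so re-applying it during recovery is harmless.

```javascript
// Sketch of why idempotent oplog entries are safe to re-apply.
// An increment like {$inc: {age: 1}} against age=4 is logged as the *result*,
// i.e. a $set to 5 -- so replaying it never double-increments.
function applyOp(doc, op) {
  // op = { set: { field: value, ... } } -- a simplified stand-in for a $set entry
  for (const [k, v] of Object.entries(op.set)) {
    doc[k] = v;
  }
  return doc;
}

const doc = { _id: 1, age: 4 };
const op = { set: { age: 5 } };   // logged form of incrementing age from 4

applyOp(doc, op);
applyOp(doc, op);                 // re-applied during recovery: same result
console.log(doc.age);             // 5 either way
```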

> show dbs
local	0.078125GB
sb	(empty)
studnets	(empty)
test	(empty)
> use local
switched to db local
> show collections     # Replication-related collections (e.g. oplog.rs) appear only after the replica set is enabled
startup_log
>

After a new secondary joins the replica set, data is copied in the following situations:

    Initial sync

    Post-rollback catch-up

    Sharding chunk migrations


local database: 

   The local database itself does not participate in replication (it is never replicated to other nodes).

   It stores all the replica set's metadata and the oplog; a capped collection named oplog.rs holds the oplog (it is created automatically the first time a node starts after joining the replica set).

   The default size of oplog.rs depends on the OS and file system, typically 5% of free disk space (with a minimum of 1 GB); it can be customized with oplogSize=N (in MB).
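From the mongo shell on a replica-set member, the configured oplog size and the time window it covers can be checked with a built-in helper (output varies per deployment):

```javascript
// Run on a replica-set member; prints the configured oplog size, the space used,
// and the time span (log length) covered by the entries in oplog.rs.
db.printReplicationInfo()
```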


3. Data synchronization types in MongoDB

 1) Initial sync

    When a secondary has no data but the primary does

    When a secondary has lost part of its replication history


Initial sync steps:

 a. Clone all databases

 b. Apply all changes to the data set: copy the oplog and apply it locally

 c. Build indexes on all collections


 2) Replication (ongoing synchronization)


4. Elections

Conditions that influence a replica-set election:

  heartbeat messages

  priority

  optime

  network connectivity

  network partitions


Election mechanism:

   Events that trigger an election:

 A new replica set is initialized

 A secondary cannot reach the primary

 The primary "steps down":

    when it receives the stepDown() command

    when a secondary has a higher priority and meets all other conditions for becoming primary

    when the primary cannot contact a "majority" of the replica set
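The "majority" condition in the last bullet can be sketched in plain JavaScript (an illustration, not MongoDB's implementation): a node can become or remain primary only if it can reach a strict majority of voting members, which is why a two-node set needs an arbiter.

```javascript
// Strict-majority check behind replica-set elections (illustrative only).
function hasMajority(reachableVotes, totalVotes) {
  return reachableVotes > Math.floor(totalVotes / 2);
}

// Two data nodes only: if one fails, the survivor sees 1 of 2 votes --
// no majority, so it cannot become (or stay) primary.
console.log(hasMajority(1, 2)); // false

// Add an arbiter (3 votes total): a surviving data node plus the arbiter is 2 of 3.
console.log(hasMajority(2, 3)); // true
```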





Topics: MongoDB Database SSL Unix