This article is adapted, with some modifications, from two blog posts by other authors (listed under "Reference documents" at the end). It is for reference only.
1, MongoDB deployment
1.1 software version
CDH 6.2.1
MongoDB 3.4.24
CentOS 7
1.2 download
Download the RPM files from the official website: https://www.mongodb.com/try/download/community
Four packages are needed:
1. server: the MongoDB server program (mongod)
2. tools: additional utilities, such as data import and export (optional)
3. mongos: the router process used when deploying a sharded cluster (optional)
4. shell: the mongo command-line client used to connect to MongoDB
1.3 installation
Run the following commands to install the four packages:
    rpm -ivh mongodb-org-server-3.4.24-1.el7.x86_64.rpm
    rpm -ivh mongodb-org-tools-3.4.24-1.el7.x86_64.rpm
    rpm -ivh mongodb-org-mongos-3.4.24-1.el7.x86_64.rpm
    rpm -ivh mongodb-org-shell-3.4.24-1.el7.x86_64.rpm
Here, the RPM packages are in the /opt/mongodb directory. To list the paths a package installs to, run rpm -qpl followed by the package file name, for example:
rpm -qpl mongodb-org-mongos-3.4.24-1.el7.x86_64.rpm
1.4 startup
systemctl start mongod
1.5 connection
mongo --port 27017
1.6 configuration
After the steps above, the file /etc/mongod.conf has been generated. It holds MongoDB's default configuration, which looks like this:
# mongod.conf
# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/
# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
# Where and how to store data.
storage:
  dbPath: /var/lib/mongo
  journal:
    enabled: true
#  engine:
#  mmapv1:
#  wiredTiger:
# how the process runs
processManagement:
  fork: true  # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid  # location of pidfile
  timeZoneInfo: /usr/share/zoneinfo
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1  # Enter 0.0.0.0,:: to bind to all IPv4 and IPv6 addresses or, alternatively, use the net.bindIpAll setting.
#security:
#operationProfiling:
#replication:
#sharding:
## Enterprise-Only Options
#auditLog:
#snmp:
This is a configuration file in YAML format. systemLog.path is the path of MongoDB's log file, storage.dbPath is the directory where MongoDB stores its data files, processManagement.pidFilePath is the location of the PID file of the running process, net.port is the network port of the MongoDB service, and net.bindIp is the IP address the service binds to. MongoDB runs with these defaults as they are, but they have the following problems:
1. The paths above are scattered across the operating system. In practice, we want MongoDB's data on a volume with plenty of space, with related files kept together. In this example the data directory is /data/mongodb, and the log file stays in its default directory.
2. The default bind address is 127.0.0.1, which means MongoDB accepts only local connections by default.
3. The default port 27017 is well known and may be targeted by outside attacks. On an exposed host, change it to a less commonly used port (this test cluster keeps 27017).
Therefore, we have modified the default configuration file as follows:
# mongod.conf
# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/
# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
# Where and how to store data.
storage:
  dbPath: /data/mongodb
  journal:
    enabled: true
#  engine:
#  mmapv1:
#  wiredTiger:
# how the process runs
processManagement:
  fork: true  # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid  # location of pidfile
# network interfaces
net:
  port: 27017  # left unchanged on this test cluster
  bindIp: 0.0.0.0  # listen on all IPv4 interfaces (the default 127.0.0.1 is local only)
#security:
#operationProfiling:
#replication:
#sharding:
## Enterprise-Only Options
#auditLog:
Run the following commands to create the data directory and give the mongod user ownership of it:
    [root@Mysql_Master ~]# mkdir -p /data/mongodb
    [root@Mysql_Master bin]# chown -R mongod:mongod /data/mongodb/
If you also moved the log file to a different directory, create that directory in the same way.
1.7 startup
If MongoDB does not use its default data directory and default port, SELinux must be set to permissive mode before starting (setenforce changes the mode only until the next reboot):
setenforce 0
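Start the MongoDB service (if it is still running from step 1.4, use systemctl restart mongod instead so that the new configuration is loaded):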
systemctl start mongod
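After changing bindIp to 0.0.0.0, it is worth confirming from another machine that the port is actually reachable. The following is a minimal sketch using only the Python standard library. The function name is_port_open is made up for this example, and a successful TCP connection only shows that something is listening on the port; it does not prove that the listener is MongoDB.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for this article: it only tests TCP reachability,
    not whether the listener is really a mongod process.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_port_open("192.168.99.41", 27017) checks the MongoDB host used later in this article; replace the address and port with your own. If it returns False, check the bindIp setting, the firewall, and SELinux.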
1.8 set account and password (this step was skipped in this test)
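Step 1: turn on authentication
Go to the bin directory that contains mongod and start it with the --auth option (alternatively, set security.authorization: enabled in /etc/mongod.conf and restart the service):
./mongod --auth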
Step 2: create an administrator user
    > use admin
    switched to db admin
    > db.createUser({user:"admin",pwd:"password",roles:["root"]})
    Successfully added user: { "user" : "admin", "roles" : [ "root" ] }
Step 3: authentication login
> db.auth("admin", "password")
2, Connecting Hive to MongoDB
2.1 configuration
Check the auxiliary jar path configured in hive-env.sh.
I had already configured it for other services, so no change was needed here.
Download the following jar packages and put them into that directory:
https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver/3.6.3
https://mvnrepository.com/artifact/org.mongodb.mongo-hadoop/mongo-hadoop-hive/2.0.2
https://mvnrepository.com/artifact/org.mongodb.mongo-hadoop/mongo-hadoop-core/2.0.2
Note: the version of the mongo-java-driver jar must not be lower than the version of the MongoDB server in use.
Other versions can be downloaded from:
https://repo1.maven.org/maven2/org/mongodb/
On that site, select the jar file for the version you need, download it to your Windows machine, and then upload it to the directory configured in hive-env.sh on the Linux host.
Modify file permissions:
chmod -R 777 /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib
2.2 simple test
Run on the MongoDB server:
[root@Mysql_Master bin]# mongo --port 27017
    > use test;
    > db.user.insert({name:'lisi',age:22})
Enter hive to test:
    hive> use text;
    hive> CREATE TABLE user_tmp (
        >   name STRING,
        >   age INT
        > ) STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
        > WITH SERDEPROPERTIES('mongo.columns.mapping'='{"name":"name","age":"age"}')
        > TBLPROPERTIES('mongo.uri'='mongodb://192.168.99.41:27017/test.user');
    hive> select * from user_tmp;
Insert data into Mongodb:
    > db.user.insert({name:'zhangsan',age:11})
    > db.user.insert({name:'liqiushi',age:33})
    > db.user.insert({name:'jiuer',age:19})
    > db.user.insert({name:'lanlan',age:24})
Then run the same query in Hive again; the newly inserted rows should appear.
You can also try the other CRUD operations. Hive reads the collection when the query runs, so changes made in MongoDB are visible right away.
Note: if the Hive table is an internal (managed) table, dropping it also deletes the data in MongoDB. Creating it as an EXTERNAL table leaves the MongoDB collection intact when the table is dropped.
mongo.columns.mapping: the mapping between Hive columns and MongoDB fields. It can be omitted when the names are exactly the same. For example:
    ('mongo.columns.mapping'='{"id":"_id","adam":"Adam","create_time":"createTime"}')
mongo.uri: the MongoDB connection string, in the form:
    'mongo.uri'='mongodb://username:password@ip:port/database.collection'
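Writing the mapping JSON by hand inside a quoted Hive property is easy to get wrong. As a rough sketch, the clause can be generated from a Python dict instead. The function name serde_mapping_clause is invented for this example, and it assumes that none of the field names contain a single quote, which would break the Hive string literal.

```python
import json


def serde_mapping_clause(mapping):
    """Build the SERDEPROPERTIES clause from a {hive_column: mongo_field} dict.

    Hypothetical helper: assumes no name contains a single quote (').
    """
    # Compact separators keep the JSON on one line without extra spaces.
    mapping_json = json.dumps(mapping, separators=(",", ":"))
    return "WITH SERDEPROPERTIES('mongo.columns.mapping'='%s')" % mapping_json


clause = serde_mapping_clause(
    {"id": "_id", "adam": "Adam", "create_time": "createTime"}
)
print(clause)
```

The printed clause can be pasted into a CREATE TABLE statement in place of the hand-written WITH SERDEPROPERTIES line.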
Note:
No user name or password is used in this test because none were configured earlier. When you do set a password, it is best to avoid the @ character in it, because it can make mongo.uri unrecognizable.
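If a password containing @ cannot be avoided, the usual approach for MongoDB connection strings is to percent-encode the user name and password. The sketch below builds the URI that way with the Python standard library. The function name build_mongo_uri and the sample credentials are made up for illustration, and this test did not verify that mongo-hadoop accepts an encoded password in mongo.uri, so treat it as something to try rather than a confirmed fix.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port, database, collection, user=None, password=None):
    """Return a mongodb:// URI of the form used by the mongo.uri table property.

    Hypothetical helper: reserved characters such as @ and : in the
    credentials are percent-encoded so they cannot be confused with the
    URI's own separators.
    """
    credentials = ""
    if user is not None and password is not None:
        credentials = "%s:%s@" % (quote_plus(user), quote_plus(password))
    return "mongodb://%s%s:%d/%s.%s" % (credentials, host, port, database, collection)


# Without credentials, as in this article's test:
print(build_mongo_uri("192.168.99.41", 27017, "test", "user"))
# With a made-up password that contains @:
print(build_mongo_uri("192.168.99.41", 27017, "test", "user", "admin", "p@ssword"))
```

In the second URI the only literal @ is the one separating the credentials from the host, because the @ in the password has been encoded as %40.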
Reference documents:
https://blog.csdn.net/shujuelin/article/details/106372341
https://blog.csdn.net/weixin_37569048/article/details/103110047