Deploying MongoDB and integrating it with Hive on CDH

Posted by shneoh on Sun, 20 Feb 2022 06:55:58 +0100

This article is based on two blog posts by other authors (listed under the references at the end), with some modifications. It is for reference only.

1, MongoDB deployment

1.1 software version

CDH6.2.1
MongoDB3.4.24
CentOS7

1.2 download

Download the RPM files from the official website: https://www.mongodb.com/try/download/community

We need to download four packages:
1. server: the MongoDB server program
2. tools: additional utilities, such as data import and export (optional)
3. mongos: the router used for cluster deployments (optional)
4. shell: the command-line client used to connect to MongoDB

1.3 installation

Run the following commands to install the four packages above:

rpm -ivh mongodb-org-server-3.4.24-1.el7.x86_64.rpm 
rpm -ivh mongodb-org-tools-3.4.24-1.el7.x86_64.rpm
rpm -ivh mongodb-org-mongos-3.4.24-1.el7.x86_64.rpm
rpm -ivh mongodb-org-shell-3.4.24-1.el7.x86_64.rpm

Here the RPM packages are placed in the /opt/mongodb directory. To see the paths a package installs to, run rpm -qpl followed by the package file name, for example:

rpm -qpl mongodb-org-mongos-3.4.24-1.el7.x86_64.rpm

1.4 startup

systemctl start mongod

1.5 connection

mongo --port 27017

1.6 configuration

After the steps above, the file /etc/mongod.conf is generated. It holds MongoDB's default configuration, which looks like this:

# mongod.conf
 
# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/
 
# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
 
# Where and how to store data.
storage:
  dbPath: /var/lib/mongo
  journal:
    enabled: true
#  engine:
#  mmapv1:
#  wiredTiger:
 
# how the process runs
processManagement:
  fork: true  # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid  # location of pidfile
  timeZoneInfo: /usr/share/zoneinfo
 
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1  # Enter 0.0.0.0,:: to bind to all IPv4 and IPv6 addresses or, alternatively, use the net.bindIpAll setting.
 
 
#security:
 
#operationProfiling:
 
#replication:
 
#sharding:
 
## Enterprise-Only Options
 
#auditLog:
 
#snmp:

This is a configuration file in YAML format, where systemLog.path is the path of MongoDB's log file, storage.dbPath is the directory in which MongoDB stores its data files, processManagement.pidFilePath is the path of the file that holds the process ID of the running mongod, net.port is the network port of the MongoDB service, and net.bindIp is the IP address the service binds to. MongoDB can run with these defaults as they are. However, there are the following problems:
1. The paths above are scattered across the operating system. In practice, we want to store MongoDB data on a path with plenty of storage space and keep related files together. Here, /data/mongodb is used to store MongoDB's data, while the log file stays in the default directory;
2. The default bind address is 127.0.0.1, which means that by default MongoDB only accepts connections from the local machine.
3. The default port 27017 is well known and may be targeted from outside in some cases, so it is advisable to change it to a less commonly used port (it is left unchanged here because this is a test cluster).
Therefore, we modify the default configuration file as follows:

# mongod.conf

# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/

# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# Where and how to store data.
storage:
  dbPath: /data/mongodb
  journal:
    enabled: true
#  engine:
#  mmapv1:
#  wiredTiger:

# how the process runs
processManagement:
  fork: true  # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid  # location of pidfile

# network interfaces
net:
  port: 27017  # left unchanged on this test cluster
  bindIp: 0.0.0.0  # listen on all IPv4 interfaces


#security:

#operationProfiling:

#replication:

#sharding:

## Enterprise-Only Options

#auditLog:
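
As a rough sketch of the changes above, the helper below prints a copy of a mongod.conf with storage.dbPath and net.bindIp replaced, so the result can be reviewed with diff before it is installed. The function name rewrite_mongod_conf and the /tmp output path are illustrative assumptions, not part of MongoDB, and the helper relies on each key appearing once on its own line, as in the default file shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a copy of a mongod.conf with dbPath and bindIp changed.
# It writes to stdout and never modifies the input file. Any trailing comment on
# the two rewritten lines is dropped.
rewrite_mongod_conf() {
  local conf="$1" db_path="$2" bind_ip="$3"
  sed -e "s|^\([[:space:]]*dbPath:\).*|\1 ${db_path}|" \
      -e "s|^\([[:space:]]*bindIp:\).*|\1 ${bind_ip}|" \
      "$conf"
}

# Example (not run here):
#   rewrite_mongod_conf /etc/mongod.conf /data/mongodb 0.0.0.0 > /tmp/mongod.conf.new
#   diff /etc/mongod.conf /tmp/mongod.conf.new
```

Writing to a separate file first keeps the original configuration available if the service fails to start with the new one.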

Run the following commands to create the data directory and hand it over to the mongod user:

[root@Mysql_Master ~]# mkdir -p /data/mongodb
[root@Mysql_Master bin]# chown -R mongod:mongod /data/mongodb/

If you also changed the log file directory, create that directory in the same way.

1.7 startup

If MongoDB does not use its default data directory and default port, SELinux has to be switched to permissive mode before starting (this setting lasts until the next reboot):

setenforce 0
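
setenforce 0 relaxes SELinux for the whole host. A narrower alternative, sketched below, is to label the custom data directory and port so that mongod may use them while enforcement stays on. This assumes the targeted policy provides the mongod_var_lib_t and mongod_port_t types and that semanage is installed (on CentOS 7 it ships in policycoreutils-python). selinux_allow_mongod, run_or_print, DRY_RUN and port 27117 are hypothetical names and values used for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: label a custom data directory (and optionally a custom port)
# for mongod. Set DRY_RUN=1 to print the commands instead of running them.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

selinux_allow_mongod() {
  local data_dir="$1" port="${2:-}"
  # Register the file-context rule, then apply it to the existing files.
  run_or_print semanage fcontext -a -t mongod_var_lib_t "${data_dir}(/.*)?"
  run_or_print restorecon -R -v "$data_dir"
  # Only needed when mongod listens on a non-default port.
  if [ -n "$port" ]; then
    run_or_print semanage port -a -t mongod_port_t -p tcp "$port"
  fi
}

# Example (as root; not run here):
#   DRY_RUN=1 selinux_allow_mongod /data/mongodb 27117   # preview the commands
#   selinux_allow_mongod /data/mongodb 27117             # apply them
```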

Then start the mongod service (use systemctl restart mongod instead if it is still running from step 1.4):

systemctl start mongod

1.8 set account and password (this step is temporarily omitted)

Step 1: turn on authentication
Go to the /bin directory:

./mongod --auth

Step 2: create an administrator user

> use admin
switched to db admin
> db.createUser({user:"admin",pwd:"password",roles:["root"]})
Successfully added user: { "user" : "admin", "roles" : [ "root" ] }

Step 3: authentication login

> db.auth("admin", "password")

2, Integrating Hive with MongoDB

2.1 configuration

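Check the directory configured in hive-env.sh. In this setup it is /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib (the path used in the chmod command further down).

It was already configured for other services here, so no change was needed.
Download the following jar packages and put them into that directory: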
https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver/3.6.3

https://mvnrepository.com/artifact/org.mongodb.mongo-hadoop/mongo-hadoop-hive/2.0.2

https://mvnrepository.com/artifact/org.mongodb.mongo-hadoop/mongo-hadoop-core/2.0.2

Note: the version of the mongo-java-driver jar must not be lower than the version of the MongoDB server.

Download address for other versions:
https://repo1.maven.org/maven2/org/mongodb/

On the site above, pick the jar you need, download it to a Windows machine, and then upload it to the directory configured in hive-env.sh on the Linux server.

Modify file permissions:

chmod -R 777 /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib
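
If the Linux server can reach Maven Central, the Windows round trip can be skipped. The sketch below builds the download URL of a jar from its Maven coordinates, using the standard repository layout under https://repo1.maven.org/maven2. maven_jar_url is a hypothetical helper, and the target directory in the usage comment is the one from the chmod command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn Maven coordinates into a Maven Central jar URL.
maven_jar_url() {
  local group="$1" artifact="$2" version="$3"
  local group_path
  # The group ID maps to a directory path: org.mongodb -> org/mongodb
  group_path="$(printf '%s' "$group" | tr . /)"
  printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
    "$group_path" "$artifact" "$version" "$artifact" "$version"
}

# Example (not run here): fetch the three jars straight into the Hive jar directory.
#   cd /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib
#   curl -fLO "$(maven_jar_url org.mongodb mongo-java-driver 3.6.3)"
#   curl -fLO "$(maven_jar_url org.mongodb.mongo-hadoop mongo-hadoop-hive 2.0.2)"
#   curl -fLO "$(maven_jar_url org.mongodb.mongo-hadoop mongo-hadoop-core 2.0.2)"
```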

2.2 simple test

Run on the MongoDB server:

[root@Mysql_Master bin]# mongo --port 27017
> use test;

> db.user.insert({name:'lisi',age:22})

Then test in Hive:

hive> use text;

hive> CREATE TABLE user_tmp ( 
    >  name STRING, 
    >  age INT
    >  ) STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'  
    > WITH SERDEPROPERTIES('mongo.columns.mapping'='{"name":"name","age":"age"}')
    > TBLPROPERTIES('mongo.uri'='mongodb://192.168.99.41:27017/test.user');

hive> select * from user_tmp;

Insert more data into MongoDB:

> db.user.insert({name:'zhangsan',age:11})
> db.user.insert({name:'liqiushi',age:33})
> db.user.insert({name:'jiuer',age:19})
> db.user.insert({name:'lanlan',age:24})

Then run the same select in Hive again; the newly inserted documents should appear in the result.

You can also try the other CRUD operations. The Hive table reads directly from MongoDB, so changes are visible immediately.

Notes: if the Hive table is an internal (managed) table, dropping it also deletes the data in MongoDB.
mongo.columns.mapping: the mapping between Hive columns and MongoDB fields. It can be omitted when the names are exactly the same. For example:

('mongo.columns.mapping'='{"id":"_id","adam":"Adam","create_time":"createTime"}')
mongo.uri: the connection string, in the form 'mongo.uri'='mongodb://username:password@IP:port/database.collection'
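
To keep the MongoDB collection when the Hive table is dropped, the table can be declared EXTERNAL instead. The sketch below prints such a statement for the test.user collection used earlier. external_table_ddl, the table name user_ext and the /tmp path are hypothetical, and the behaviour on DROP TABLE is worth confirming on a scratch collection first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print HiveQL for an EXTERNAL table backed by MongoDB.
# Arguments: Hive table name, MongoDB URI (both supplied by the caller).
external_table_ddl() {
  local table="$1" uri="$2"
  cat <<EOF
CREATE EXTERNAL TABLE ${table} (
  name STRING,
  age INT
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"name":"name","age":"age"}')
TBLPROPERTIES('mongo.uri'='${uri}');
EOF
}

# Example (not run here):
#   external_table_ddl user_ext mongodb://192.168.99.41:27017/test.user > /tmp/user_ext.hql
#   hive -f /tmp/user_ext.hql
```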

Note:
The user name and password are not included in this test because they have not been configured. When setting a password, it is recommended not to include the @ character, because it can cause mongo.uri to be parsed incorrectly.
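
If the password has to contain @, : or /, the MongoDB connection string format expects those characters to be percent-encoded. The sketch below encodes the credentials and assembles the URI. urlencode and mongo_uri are hypothetical helpers, the encoder handles ASCII input only, and whether mongo-hadoop 2.0.2 accepts an encoded password has not been verified here.

```shell
#!/usr/bin/env bash
# Percent-encode every character except unreserved ones (ASCII input only).
urlencode() {
  local s="$1" out="" c rest
  while [ -n "$s" ]; do
    rest="${s#?}"        # everything after the first character
    c="${s%"$rest"}"     # the first character
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s="$rest"
  done
  printf '%s\n' "$out"
}

# Assemble mongodb://user:password@host:port/database.collection
mongo_uri() {
  local user="$1" password="$2" host="$3" port="$4" db="$5" coll="$6"
  printf 'mongodb://%s:%s@%s:%s/%s.%s\n' \
    "$(urlencode "$user")" "$(urlencode "$password")" "$host" "$port" "$db" "$coll"
}

# Example with a made-up password:
#   mongo_uri admin 'p@ss:word' 192.168.99.41 27017 test user
#   prints mongodb://admin:p%40ss%3Aword@192.168.99.41:27017/test.user
```
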
Reference documents:
https://blog.csdn.net/shujuelin/article/details/106372341
https://blog.csdn.net/weixin_37569048/article/details/103110047

Topics: hive MongoDB cloudera