[Big Data Platform Setup] Greenplum 6.17 Cluster Setup

Posted by noginn on Thu, 20 Jan 2022 05:19:02 +0100

Cluster planning

Host      IP              Role
centos01  192.168.52.221  master
centos02  192.168.52.222  segment
centos03  192.168.52.223  segment

The cluster consists of three machines in total; no standby master is configured.

System environment

Name              Details
Operating system  CentOS 7
Greenplum         Greenplum 6.17
Java              JDK 8
GCC               GCC 4.8.5

Modify system files

Raise the system resource limits by adding the following to /etc/security/limits.conf:

* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
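If you apply these limits on several machines or re-run the setup, appending blindly duplicates lines. A minimal sketch, assuming bash and write access to the target file; `append_if_missing` is a hypothetical helper name, not part of any tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line is absent,
# so re-running the setup does not duplicate entries.
append_if_missing() {
  local file=$1 line=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >>"$file"
}

# The four limits from the step above (run as root against the real file):
#   for l in '* soft nofile 65536' '* hard nofile 65536' \
#            '* soft nproc 131072' '* hard nproc 131072'; do
#     append_if_missing /etc/security/limits.conf "$l"
#   done
```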

Tune the kernel parameters by adding the following to /etc/sysctl.conf:

# kernel.shmall (machine-specific): compute with echo $(expr $(getconf _PHYS_PAGES) / 2)
kernel.shmall = 357475
# kernel.shmmax (machine-specific): compute with echo $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
kernel.shmmax = 1464217600
kernel.shmmni = 4096
vm.overcommit_memory = 2
vm.overcommit_ratio = 95

kernel.sem = 500 2048000 200 4096
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.swappiness = 10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_background_ratio = 3
vm.dirty_ratio = 10
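The two shared-memory values above depend on each machine's RAM, so the numbers shown are only this author's. A sketch that evaluates the formulas from the comments with bash arithmetic instead of `expr` (same left-to-right result), assuming `getconf` supports `_PHYS_PAGES` as it does on glibc-based CentOS 7:

```shell
#!/usr/bin/env bash
# Compute kernel.shmall and kernel.shmmax for this machine using the formulas
# from the sysctl comments: shmall = half of the physical pages,
# shmmax = that many pages expressed in bytes.
phys_pages=$(getconf _PHYS_PAGES)
page_size=$(getconf PAGE_SIZE)

shmall=$(( phys_pages / 2 ))
shmmax=$(( shmall * page_size ))

echo "kernel.shmall = ${shmall}"
echo "kernel.shmmax = ${shmmax}"
```

Paste the two printed lines into /etc/sysctl.conf on each host in place of the example values.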

Apply the changes immediately:

sysctl -p
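To confirm the settings are actually live, each key can be read back from /proc/sys. A sketch, assuming plain `key = value` lines in the file; `key_to_path`, `normalize` and `check_sysctl_file` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch: read every key of a sysctl-style file back from /proc/sys and report
# keys whose live value differs. Blank lines and # comments are skipped.

# Map a sysctl key to its /proc path, e.g. vm.swappiness -> /proc/sys/vm/swappiness
key_to_path() {
  printf '/proc/sys/%s\n' "${1//.//}"
}

# Collapse whitespace so "500 2048000 200 4096" matches the kernel's
# tab-separated output for kernel.sem.
normalize() {
  awk '{$1 = $1; print}' <<<"$1"
}

check_sysctl_file() {
  local file=$1 key value path want live rc=0
  while IFS='=' read -r key value || [[ -n $key ]]; do
    key=${key//[[:space:]]/}
    [[ -z $key || $key == \#* ]] && continue
    path=$(key_to_path "$key")
    if [[ ! -r $path ]]; then
      echo "missing: $key"
      rc=1
      continue
    fi
    want=$(normalize "$value")
    live=$(normalize "$(<"$path")")
    if [[ $want != "$live" ]]; then
      echo "differs: $key (file: $want, live: $live)"
      rc=1
    fi
  done <"$file"
  return "$rc"
}

# Usage: check_sysctl_file /etc/sysctl.conf && echo "all values applied"
```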

Also edit the nproc file under /etc/security/limits.d/ (the file name may be 20-nproc.conf or 90-nproc.conf), because it would otherwise override the nproc value set in limits.conf. Change its line to:

*          soft    nproc     131072
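Limits from limits.conf and limits.d are applied by PAM at login, so they only show up in a new session. A sketch for checking them after logging in again as gpadmin; the expected numbers are the ones set above, and `limit_at_least` / `check_session_limits` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a limit value is "unlimited" or >= the minimum.
limit_at_least() {
  local actual=$1 minimum=$2
  [[ $actual == unlimited ]] || (( actual >= minimum ))
}

# Compare the current session's limits with the values configured above.
check_session_limits() {
  local rc=0 name flag want actual
  while read -r name flag want; do
    actual=$(ulimit "$flag")
    if limit_at_least "$actual" "$want"; then
      echo "ok      $name = $actual"
    else
      echo "too low $name = $actual (want >= $want)"
      rc=1
    fi
  done <<'EOF'
soft-nofile -Sn 65536
hard-nofile -Hn 65536
soft-nproc -Su 131072
hard-nproc -Hu 131072
EOF
  return "$rc"
}

# Usage in a fresh login shell (e.g. after `su - gpadmin`): check_session_limits
```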

Disable SELinux by editing /etc/sysconfig/selinux:

SELINUX=disabled
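The same edit from the command line, as a sketch assuming GNU sed and CentOS 7, where /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config. A plain `sed -i` on the symlink would replace it with a regular file that SELinux no longer reads, so the hypothetical helper `disable_selinux_in` resolves the link first:

```shell
#!/usr/bin/env bash
# Set SELINUX=disabled in an SELinux config file. readlink -f resolves a
# symlink such as /etc/sysconfig/selinux to its real target, so sed -i edits
# the target instead of replacing the link with a regular file.
disable_selinux_in() {
  local target
  target=$(readlink -f "$1")
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$target"
}

# Usage as root:
#   disable_selinux_in /etc/sysconfig/selinux
#   setenforce 0   # permissive right away; "disabled" takes effect after a reboot
```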

The /etc/hosts mappings for all nodes and disabling the firewall were already done earlier when setting up the Hadoop and Spark clusters, so they are not repeated here.
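For readers starting from scratch, a sketch of those two steps using the addresses from the cluster planning table, assuming firewalld (CentOS 7's default firewall). `add_cluster_hosts` is a hypothetical helper; the commented commands must run as root on every node:

```shell
#!/usr/bin/env bash
# Hosts from the cluster planning table: "IP hostname".
CLUSTER_HOSTS=(
  "192.168.52.221 centos01"
  "192.168.52.222 centos02"
  "192.168.52.223 centos03"
)

# Append each mapping to a hosts file unless that hostname already appears.
add_cluster_hosts() {
  local file=$1 entry name
  for entry in "${CLUSTER_HOSTS[@]}"; do
    name=${entry##* }
    grep -qwF -- "$name" "$file" || printf '%s\n' "$entry" >>"$file"
  done
}

# Usage as root on every node:
#   add_cluster_hosts /etc/hosts
#   systemctl stop firewalld && systemctl disable firewalld
```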

Add the gpadmin user and group

This step cannot be skipped!!! Greenplum cannot be initialized as the root user!!!

Create the gpadmin group and user, then set its ownership and password:

# Create the group
groupadd -g 530 gpadmin
# Create the user with a home directory and bash shell
useradd -g 530 -u 530 -m -d /home/gpadmin -s /bin/bash gpadmin
# Make sure gpadmin owns its home directory
chown -R gpadmin:gpadmin /home/gpadmin
# Set the password
passwd gpadmin
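A quick check, run on each of the three machines, that the account matches the IDs used above. A sketch with a hypothetical `check_user` helper; it reads /etc/passwd directly, so it only sees local accounts such as the one useradd just created:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a local user exists with the expected UID,
# primary GID and home directory.
check_user() {
  local name=$1 uid=$2 gid=$3 home=$4 entry u g h
  entry=$(awk -F: -v n="$name" '$1 == n { print; exit }' /etc/passwd)
  if [[ -z $entry ]]; then
    echo "no such local user: $name"
    return 1
  fi
  # passwd fields: name:password:uid:gid:gecos:home:shell
  IFS=: read -r _ _ u g _ h _ <<<"$entry"
  if [[ $u == "$uid" && $g == "$gid" && $h == "$home" ]]; then
    echo "ok: $name uid=$u gid=$g home=$h"
  else
    echo "mismatch: $name uid=$u gid=$g home=$h"
    return 1
  fi
}

# Usage on each machine: check_user gpadmin 530 530 /home/gpadmin
```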

Configure passwordless SSH login for the gpadmin user on all three machines. Do not skip this step!!! (For details, see Centos 7 cluster configuration SSH password free login.) Otherwise you will have to type the gpadmin password of each machine again and again during Greenplum initialization. At first I was too lazy to set it up; typing the three passwords over and over was exhausting, and in the end initialization failed partway through while creating the gpseg directories.
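A sketch of that key setup, run as gpadmin on centos01 and assuming OpenSSH's `ssh-keygen` and `ssh-copy-id`. `push_keys` is a hypothetical helper; include centos01 itself in the list, and set DRY_RUN=1 to print the commands instead of running them:

```shell
#!/usr/bin/env bash
# Create an RSA key pair (no passphrase) if none exists, then copy the public
# key to gpadmin on every host given as an argument.
# With DRY_RUN=1 the commands are only echoed.
push_keys() {
  local run=${DRY_RUN:+echo} host
  [[ -f ~/.ssh/id_rsa ]] || $run ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
  for host in "$@"; do
    $run ssh-copy-id "gpadmin@$host"
  done
}

# Usage: push_keys centos01 centos02 centos03   (each password is typed once)
```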

Installing Greenplum

Switch to the gpadmin user and create the configuration directory:

su gpadmin 
mkdir -p /home/gpadmin/conf

Create and edit the hostlist file (all hosts):

vim /home/gpadmin/conf/hostlist

centos01
centos02
centos03

Create and edit the seg_hosts file (segment hosts only):

vim /home/gpadmin/conf/seg_hosts
 
centos02
centos03
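Instead of editing in vim, both files can be written in one go. A sketch assuming the hostnames from the planning table; `write_host_files` is a hypothetical helper that takes the target directory as an argument:

```shell
#!/usr/bin/env bash
# Write the two host files used later: hostlist (every host, for gpssh and
# gpssh-exkeys) and seg_hosts (segment hosts only, for gpinitsystem).
write_host_files() {
  local dir=$1
  mkdir -p "$dir"
  printf '%s\n' centos01 centos02 centos03 >"$dir/hostlist"
  printf '%s\n' centos02 centos03 >"$dir/seg_hosts"
}

# Usage as gpadmin: write_host_files /home/gpadmin/conf
```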

Install the RPM into the same directory on each of the three machines (Download address):

rpm -ivh --prefix=/usr/local/services/greenplum/  open-source-greenplum-db-6.17.0-rhel7-x86_64.rpm

If this step fails, it is usually because required dependencies are missing; install them as the error messages indicate:

# For example, my machine was missing apr and apr-util; install them one by one as prompted
yum install -y apr apr-util
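Because the RPM has to go onto all three machines, the copy and install can be looped from centos01. A sketch assuming root SSH access to each host and the RPM in the current directory; it installs the dependencies mentioned above (apr, apr-util) first, and any other missing package would still need installing by hand. `install_everywhere` is a hypothetical helper; DRY_RUN=1 only prints the commands:

```shell
#!/usr/bin/env bash
# Copy the Greenplum RPM to each host and install it under the same prefix.
RPM=open-source-greenplum-db-6.17.0-rhel7-x86_64.rpm
PREFIX=/usr/local/services/greenplum/

install_everywhere() {
  local run=${DRY_RUN:+echo} host
  for host in "$@"; do
    $run scp "$RPM" "root@$host:/tmp/"
    $run ssh "root@$host" "yum install -y apr apr-util && rpm -ivh --prefix=$PREFIX /tmp/$RPM"
  done
}

# Usage: install_everywhere centos01 centos02 centos03
```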

Switch to the root user and set up passwordless SSH between all Greenplum hosts:

source /usr/local/services/greenplum/greenplum-db/greenplum_path.sh 
gpssh-exkeys -f /home/gpadmin/conf/hostlist

Create the data directories on all hosts in one batch and grant ownership:

# gpssh uses the passwordless connection to open an interactive shell that runs each command on all three machines
gpssh -f /home/gpadmin/conf/hostlist

mkdir -p /opt/greenplum/data/master
mkdir -p /opt/greenplum/data/primary
mkdir -p /opt/greenplum/data/mirror
mkdir -p /opt/greenplum/data2/primary
mkdir -p /opt/greenplum/data2/mirror

# Give gpadmin ownership of the Greenplum install and data directories
chown -R gpadmin:gpadmin /usr/local/services/greenplum
chown -R gpadmin:gpadmin /opt/greenplum

# Leave the gpssh shell
exit
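The interactive gpssh session can also be replaced by non-interactive calls. A sketch: `make_gp_dirs` is a hypothetical helper that creates the same five directories under a given root, so it can be tried on a scratch directory first; the commented gpssh lines pass the command as an argument and assume the hostlist file created earlier:

```shell
#!/usr/bin/env bash
# Create the master, primary and mirror directories used by initgp_config
# under a root directory (/opt/greenplum in this article).
make_gp_dirs() {
  local root=${1%/}
  mkdir -p "$root"/data/{master,primary,mirror} "$root"/data2/{primary,mirror}
}

# The same on all hosts at once, as root, with Greenplum's gpssh (-e echoes output):
#   gpssh -f /home/gpadmin/conf/hostlist -e 'mkdir -p /opt/greenplum/data/master /opt/greenplum/data/primary /opt/greenplum/data/mirror /opt/greenplum/data2/primary /opt/greenplum/data2/mirror'
#   gpssh -f /home/gpadmin/conf/hostlist -e 'chown -R gpadmin:gpadmin /opt/greenplum'
```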

Configure environment variables for the gpadmin user:

# Open file
vim /home/gpadmin/.bash_profile

# Add the following lines
source /usr/local/services/greenplum/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1
export GPPORT=5432
export PGDATABASE=gp_sydb

# Apply the changes immediately
source ~/.bash_profile
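A quick sanity check after sourcing the profile. A sketch in which `check_gp_env` is a hypothetical helper; it checks the three variables exported above plus GPHOME, which greenplum_path.sh normally sets:

```shell
#!/usr/bin/env bash
# Report any Greenplum-related variable that is empty in the current shell.
check_gp_env() {
  local var rc=0
  for var in GPHOME MASTER_DATA_DIRECTORY GPPORT PGDATABASE; do
    if [[ -z ${!var:-} ]]; then
      echo "not set: $var"
      rc=1
    else
      echo "$var=${!var}"
    fi
  done
  return "$rc"
}

# Usage as gpadmin, after `source ~/.bash_profile`: check_gp_env
```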

Initialize the database

Create a new initialization configuration file, initgp_config:

cd /usr/local/services/greenplum/greenplum-db/docs/cli_help/gpconfigs
cp gpinitsystem_config initgp_config

Edit initgp_config:

# One entry per segment instance: four primaries (and four mirrors) on each segment host
declare -a DATA_DIRECTORY=(/opt/greenplum/data/primary /opt/greenplum/data/primary /opt/greenplum/data2/primary /opt/greenplum/data2/primary)
declare -a MIRROR_DATA_DIRECTORY=(/opt/greenplum/data/mirror /opt/greenplum/data/mirror /opt/greenplum/data2/mirror /opt/greenplum/data2/mirror)

ARRAY_NAME="gp_sydb"                                           # Descriptive name for this Greenplum system
MASTER_HOSTNAME=centos01                                       # Master host name
MASTER_DIRECTORY=/opt/greenplum/data/master                    # Master directory created earlier
MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1       # Same path as exported in .bash_profile
DATABASE_NAME=gp_sydb                                          # Database created after initialization
MACHINE_LIST_FILE=/home/gpadmin/conf/seg_hosts                 # Segment host list
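Because the config file is bash syntax, a mismatch between the primary and mirror arrays or a wrong host-file path is easy to introduce. A pre-flight sketch, assuming the file is safe to source in a subshell; `check_init_config` is a hypothetical helper that only checks the settings shown above:

```shell
#!/usr/bin/env bash
# Source an initgp_config-style file in a subshell and check a few of its
# settings before handing it to gpinitsystem.
check_init_config() {
  (
    source "$1" || exit 1
    rc=0
    if (( ${#DATA_DIRECTORY[@]} != ${#MIRROR_DATA_DIRECTORY[@]} )); then
      echo "primary/mirror count differs: ${#DATA_DIRECTORY[@]} vs ${#MIRROR_DATA_DIRECTORY[@]}"
      rc=1
    fi
    for v in ARRAY_NAME MASTER_HOSTNAME MASTER_DIRECTORY MACHINE_LIST_FILE; do
      [[ -n ${!v:-} ]] || { echo "missing: $v"; rc=1; }
    done
    [[ -r ${MACHINE_LIST_FILE:-} ]] || { echo "unreadable: MACHINE_LIST_FILE"; rc=1; }
    echo "primary segments per host: ${#DATA_DIRECTORY[@]}"
    exit "$rc"
  )
}

# Usage: check_init_config initgp_config
```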

Switch to the gpadmin user and run the initialization (it must be run as the user created earlier, not as root):

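cd /usr/local/services/greenplum/greenplum-db/docs/cli_help/gpconfigs
source /usr/local/services/greenplum/greenplum-db/greenplum_path.sh
# -D: write debug-level output to the log
gpinitsystem -c initgp_config -D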

If this step fails, delete all the gpseg directories created by the initialization (under the primary and mirror directories of /opt/greenplum/data and /opt/greenplum/data2, gpseg-1 under master, etc.) and then run the initialization again. Detailed error messages are in the logs under /home/gpadmin/gpAdminLogs.
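A sketch of that cleanup for one host. `clean_gpseg` is a hypothetical helper that removes gpseg* directories one level below the given data roots (DRY_RUN=1 only lists them); it deletes data, so only use it on a system that never finished initializing. gpinitsystem may also leave a backout script named backout_gpinitsystem_* in ~/gpAdminLogs for undoing a partial run, so check for that first:

```shell
#!/usr/bin/env bash
# Remove gpseg* directories (e.g. data/primary/gpseg0, data/master/gpseg-1)
# left by a failed gpinitsystem run. With DRY_RUN=1 they are only printed.
clean_gpseg() {
  local root dir
  for root in "$@"; do
    for dir in "$root"/*/gpseg*; do
      [[ -d $dir ]] || continue
      if [[ -n ${DRY_RUN:-} ]]; then
        echo "would remove $dir"
      else
        rm -rf -- "$dir"
      fi
    done
  done
}

# Usage as gpadmin on each host (newest log first, assuming gpinitsystem_*.log naming):
#   ls -t ~/gpAdminLogs/gpinitsystem_*.log | head -n 1
#   clean_gpseg /opt/greenplum/data /opt/greenplum/data2
```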

Fixing errors when external clients connect to Greenplum

Error message: no pg_hba.conf entry for host

On the master node, edit /opt/greenplum/data/master/gpseg-1/pg_hba.conf:

# Add this line to let any user connect to any database from any address without a password
host   all   all   0.0.0.0/0    trust
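The trust rule accepts any user from any address without a password. A sketch of a narrower alternative, assuming clients sit in the cluster's 192.168.52.0/24 subnet from the planning table and that database users have passwords (md5 authentication). `add_hba_rule` is a hypothetical helper, and `gpstop -u` from the table below reloads pg_hba.conf without a restart:

```shell
#!/usr/bin/env bash
# Append a pg_hba.conf rule once. The first matching line wins, so an earlier
# rule that also matches a connection still takes precedence.
add_hba_rule() {
  local hba=$1 rule=$2
  grep -qxF -- "$rule" "$hba" || printf '%s\n' "$rule" >>"$hba"
}

# Usage as gpadmin on the master:
#   add_hba_rule "$MASTER_DATA_DIRECTORY/pg_hba.conf" \
#     'host   all   all   192.168.52.0/24   md5'
#   gpstop -u   # reload configuration files only
```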

Database operations

Command          Meaning
gpstart          Start the database
gpstop -r        Restart the database
gpstop -u        Reload configuration file changes only
gpstop           Stop the database
psql -d gp_sydb  Log in to the gp_sydb database
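To see whether every segment came up after gpstart, one option is to query the gp_segment_configuration catalog on the master, where status 'd' marks a segment as down. A sketch assuming psql from greenplum_path.sh; `down_segments_sql` is a hypothetical helper that only builds the query so it can be inspected before running:

```shell
#!/usr/bin/env bash
# Build a query listing segments the master has marked down ('d').
down_segments_sql() {
  echo "SELECT content, role, hostname, port FROM gp_segment_configuration WHERE status = 'd';"
}

# Usage as gpadmin on the master (-At: unaligned, tuples only; empty output
# means no segment is down):
#   psql -d gp_sydb -At -c "$(down_segments_sql)"
```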
