Download the installation package and unpack it to get a .bin file, then extract the embedded tar.gz archive from the .bin file:

# the first 960 lines of the .bin file are the installer script
[root@hadoop2 ~]# tail -n +961 greenplum-db-5.16.0-rhel7-x86_64.bin > gpdb.tar.gz
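
The split point (line 961) is specific to this .bin release, but the technique itself is easy to verify with a toy file. A minimal sketch, using made-up file names:

```shell
#!/bin/sh
set -e
# Toy self-extracting file: a 3-line "installer script" followed by a
# payload. Recover the payload by skipping the script lines with tail,
# the same technique used above with tail -n +961.
printf '#!/bin/sh\necho installer\nexit 0\n' > toy.bin   # 3 script lines
printf 'payload data\n' >> toy.bin                       # the "archive"
tail -n +4 toy.bin > payload.out                         # skip 3 lines
cat payload.out                                          # prints "payload data"
```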

If the cluster does not yet have a gpadmin user, gpssh can be used to create it on all nodes in one pass. First, log in to the master node as root and unpack gpdb.tar.gz:

[root@hadoop2 ~]# mkdir gpdb-5.16.0
[root@hadoop2 ~]# tar zxf gpdb.tar.gz -C gpdb-5.16.0

Edit the gpdb-5.16.0/greenplum_path.sh file:

GPHOME=~/gpdb-5.16.0

Load the gpdb environment variables:

[root@hadoop2 ~]# source gpdb-5.16.0/greenplum_path.sh
[root@hadoop2 ~]# which gpssh
/root/gpdb-5.16.0/bin/gpssh

Create a hosts file listing the hostnames of every node in the cluster; these hostnames must also be present in /etc/hosts:

[root@hadoop2 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.203.12 hadoop2.lw hadoop2
192.168.203.13 hadoop3.lw hadoop3
192.168.203.14 hadoop4.lw hadoop4
[root@hadoop2 ~]# cat hosts
hadoop2
hadoop3
hadoop4
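
Before exchanging keys, it is worth confirming that every name in hosts can be resolved. A small sketch of the check, run against sample files standing in for the real hosts and /etc/hosts:

```shell
#!/bin/sh
set -e
# Sample stand-ins for the node list and /etc/hosts shown above.
cat > nodes.txt <<'EOF'
hadoop2
hadoop3
hadoop4
EOF
cat > etc_hosts.txt <<'EOF'
192.168.203.12 hadoop2.lw hadoop2
192.168.203.13 hadoop3.lw hadoop3
192.168.203.14 hadoop4.lw hadoop4
EOF
# For each node name, check it appears in the hosts database.
# -w matches whole words, so "hadoop2" does not match "hadoop22".
while read -r host; do
    if grep -qw "$host" etc_hosts.txt; then
        echo "ok: $host"
    else
        echo "MISSING: $host"
    fi
done < nodes.txt
```

On a live system, names resolved through DNS rather than /etc/hosts would also be fine; a `ping -c1` per host is the simpler end-to-end check.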

Exchange SSH keys between the nodes:

[root@hadoop2 ~]# gpssh-exkeys -f hosts
[STEP 1 of 5] create local ID and authorize on local host
[STEP 2 of 5] keyscan all hosts and update known_hosts file
[STEP 3 of 5] authorize current user on remote hosts
  ... send to hadoop3
  *** Enter password for hadoop3:
  ... send to hadoop4
[STEP 4 of 5] determine common authentication file content
[STEP 5 of 5] copy authentication files to all remote hosts
  ... finished key exchange with hadoop3
  ... finished key exchange with hadoop4
[INFO] completed successfully

Now the gpadmin user can be created on all nodes at once:

[root@hadoop2 ~]# gpssh -f hosts
=> groupadd -g 530 gpadmin
[hadoop2]
[hadoop4]
[hadoop3]
=> useradd -g 530 -u 530 -m -d /home/gpadmin -s /bin/bash gpadmin
[hadoop2]
[hadoop4]
[hadoop3]
=> echo gpadmin:gpadmin | chpasswd
[hadoop2]
[hadoop4]
[hadoop3]

Raise the open-file limits on every node; only one node is shown here, and the others are configured the same way. Because keys were exchanged earlier, you can reach the other nodes with a passwordless `ssh hostname`. This step could also be driven through gpssh, but to be safe, edit each node manually.

[root@hadoop2 ~]# vi /etc/security/limits.conf
# End of file
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
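
These limits take effect at the next login; afterwards they can be checked per node with `ulimit -n`, or across the cluster with gpssh once it is set up. Parsing the file directly can also confirm the edit; a sketch against a sample fragment standing in for /etc/security/limits.conf:

```shell
#!/bin/sh
set -e
# Sample fragment standing in for /etc/security/limits.conf.
cat > limits_sample.conf <<'EOF'
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
EOF
# Columns are: domain, type (soft/hard), item, value.
soft=$(awk '$2 == "soft" && $3 == "nofile" { print $4 }' limits_sample.conf)
hard=$(awk '$2 == "hard" && $3 == "nofile" { print $4 }' limits_sample.conf)
echo "soft nofile: $soft"   # prints "soft nofile: 65536"
echo "hard nofile: $hard"   # prints "hard nofile: 65536"
```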

Move the database installation directory to /home/gpadmin:

[root@hadoop2 ~]# mv gpdb-5.16.0 /home/gpadmin
[root@hadoop2 ~]# chown -R gpadmin:gpadmin /home/gpadmin/gpdb-5.16.0
[root@hadoop2 ~]# mv hosts /home/gpadmin
[root@hadoop2 ~]# chown gpadmin:gpadmin /home/gpadmin/hosts

Next, switch to the gpadmin user to install gpdb:

[root@hadoop2 ~]# su - gpadmin

Edit .bash_profile and add the following:

GREENPLUM_PATH=~/gpdb-5.16.0/greenplum_path.sh
source $GREENPLUM_PATH
# DATA_DIR holds the master's data; if you want the data elsewhere, remember to change this path
DATA_DIR=~/data/master/gpseg-1/
export MASTER_DATA_DIRECTORY=$DATA_DIR
export PGPORT=5432
export PGDATABASE=postgres

Apply .bash_profile:

[gpadmin@hadoop2 ~]$ source .bash_profile
[gpadmin@hadoop2 ~]$ which gpssh
~/gpdb-5.16.0/bin/gpssh

Because we are now working as the gpadmin user, the SSH keys between nodes must be exchanged again:

[gpadmin@hadoop2 ~]$ gpssh-exkeys -f hosts
[STEP 1 of 5] create local ID and authorize on local host
[STEP 2 of 5] keyscan all hosts and update known_hosts file
[STEP 3 of 5] authorize current user on remote hosts
  ... send to hadoop3
  *** Enter password for hadoop3:
  ... send to hadoop4
[STEP 4 of 5] determine common authentication file content
[STEP 5 of 5] copy authentication files to all remote hosts
  ... finished key exchange with hadoop3
  ... finished key exchange with hadoop4
[INFO] completed successfully

Create a segs file containing the hostnames of all segment (child) nodes, excluding the master:

[gpadmin@hadoop2 ~]$ cat segs
hadoop3
hadoop4
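
Rather than writing segs by hand, it can be derived from the existing hosts file by filtering out the master's name. A sketch (the master name is hard-coded as hadoop2 here; on a real system `hostname -s` could supply it):

```shell
#!/bin/sh
set -e
# The full node list, as created earlier.
cat > hosts <<'EOF'
hadoop2
hadoop3
hadoop4
EOF
master=hadoop2
# -v inverts the match; -x matches the whole line, so "hadoop2"
# would not accidentally drop a host named "hadoop22".
grep -vx "$master" hosts > segs
cat segs   # prints "hadoop3" then "hadoop4"
```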

Distribute the database installation directory to the segment nodes:

[gpadmin@hadoop2 ~]$ gpseginstall -f segs
20190411:13:52:19:027100 gpseginstall:hadoop2:gpadmin-[INFO]:-Installation Info:
link_name None
binary_path /home/gpadmin/gpdb-5.16.0
binary_dir_location /home/gpadmin
binary_dir_name gpdb-5.16.0
20190411:13:52:19:027100 gpseginstall:hadoop2:gpadmin-[INFO]:-check cluster password access
20190411:13:52:19:027100 gpseginstall:hadoop2:gpadmin-[INFO]:-de-duplicate hostnames
20190411:13:52:19:027100 gpseginstall:hadoop2:gpadmin-[INFO]:-master hostname: hadoop2
20190411:13:52:20:027100 gpseginstall:hadoop2:gpadmin-[INFO]:-rm -f /home/gpadmin/gpdb-5.16.0.tar; rm -f /home/gpadmin/gpdb-5.16.0.tar.gz
...
20190411:13:54:42:027100 gpseginstall:hadoop2:gpadmin-[INFO]:-SUCCESS -- Requested commands completed

Create the data directories on all nodes:

[gpadmin@hadoop2 ~]$ gpssh -f hosts
=> mkdir data
[hadoop2]
[hadoop4]
[hadoop3]
=> cd data
[hadoop2]
[hadoop4]
[hadoop3]
=> mkdir p1 p2 m1 m2 master
[hadoop2]
[hadoop4]
[hadoop3]
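
The interactive gpssh session above can also be collapsed into one non-interactive call, e.g. `gpssh -f hosts -e 'mkdir -p ~/data/p1 ~/data/p2 ~/data/m1 ~/data/m2 ~/data/master'`. The local equivalent of that command, runnable on a single node, is simply:

```shell
#!/bin/sh
set -e
# Create the per-node data layout in one command. -p creates parent
# directories as needed and is idempotent if some already exist.
base=./data_demo   # stands in for ~/data on a real node
mkdir -p "$base/p1" "$base/p2" "$base/m1" "$base/m2" "$base/master"
ls "$base"         # lists the five directories
```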

Copy a database-initialization configuration template from the installation directory:

cp gpdb-5.16.0/docs/cli_help/gpconfigs/gpinitsystem_config .

Edit the initialization configuration file; the main options to change are:

PORT_BASE=40000 # don't set this port too low, or it may conflict with other services
declare -a DATA_DIRECTORY=(/home/gpadmin/data/p1 /home/gpadmin/data/p2)
MASTER_HOSTNAME=hadoop2
MASTER_DIRECTORY=/home/gpadmin/data/master
declare -a MIRROR_DATA_DIRECTORY=(/home/gpadmin/data/m1 /home/gpadmin/data/m2)
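
Note that the number of entries in DATA_DIRECTORY determines how many primary segments run on each host (two here), and MIRROR_DATA_DIRECTORY must have the same number of entries. Since the config file is a bash fragment, the relationship can be checked with bash array syntax:

```shell
#!/bin/bash
# Each DATA_DIRECTORY entry becomes one primary segment on every host,
# so the arrays below mean 2 primaries + 2 mirrors per host.
declare -a DATA_DIRECTORY=(/home/gpadmin/data/p1 /home/gpadmin/data/p2)
declare -a MIRROR_DATA_DIRECTORY=(/home/gpadmin/data/m1 /home/gpadmin/data/m2)
echo "primaries per host: ${#DATA_DIRECTORY[@]}"   # prints "primaries per host: 2"
if [ "${#DATA_DIRECTORY[@]}" -eq "${#MIRROR_DATA_DIRECTORY[@]}" ]; then
    echo "mirror count matches"
fi
```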

Finally, initialize the database:

gpinitsystem -c gpinitsystem_config -h hosts

Set a password for the database user gpadmin:

postgres=# ALTER USER gpadmin PASSWORD 'gpadmin';

Edit pg_hba.conf in the master data directory, appending the following line at the end:

host    all     all     192.168.0.0/16  md5

Reload the configuration:

gpstop -ua

At this point, the gpdb database accepts connections from the internal network.