Atguigu (尚硅谷) Big Data Project: E-commerce Data Warehouse V1.2 (New Edition)

    Technology, 2024-07-19

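    Data Warehouse

    A data warehouse is a strategic collection that supplies data from all of an enterprise's systems to support every decision-making process. Analyzing the data in the warehouse can help the enterprise improve business processes, control costs, and raise product quality. The warehouse is not the final destination of the data; it prepares the data for that destination. This preparation includes cleaning, transformation, classification, reorganization, merging, splitting, and aggregation.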

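    Inputs

    Log collection system, business system databases, crawler system

    Outputs

    Reporting system, user profiling, recommendation system, machine learning, risk control system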

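    Project Requirements

    - Build a user behavior data collection platform (storage form: files)
    - Build a business data collection platform (storage form: relational database)
    - Build the data warehouse with dimensional modeling
    - Analyze the core e-commerce subjects (users, traffic, members, products, sales, regions, campaigns), covering nearly 100 report metrics
    - Use an ad hoc query tool so that metrics can be analyzed at any time
    - Monitor cluster performance and raise alerts when anomalies occur
    - Metadata management
    - Data quality monitoring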

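    Technology Selection

    Factors to consider

    Data volume, business requirements, industry experience, technology maturity, development and maintenance cost, total budget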

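    Data collection and transfer

    Flume, Kafka, Sqoop, Logstash, DataX

    Data storage

    MySQL, HDFS, HBase, Redis, MongoDB

    Data computation

    Hive, Tez, Spark, Flink, Storm

    Data query

    Presto, Druid, Impala, Kylin

    Data visualization

    ECharts, Superset, QuickBI, DataV

    Task scheduling

    Azkaban, Oozie

    Cluster monitoring

    Zabbix

    Metadata management

    Atlas

    Data quality monitoring

    Griffin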

    Data Warehouse Architecture

    Zabbix

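    1. Getting Started with Zabbix

    Zabbix is software that monitors a wide range of network parameters as well as the health and integrity of servers. It uses a flexible notification mechanism that lets users configure email-based alerts for virtually any event, so server problems are reported quickly. Based on the data it stores, Zabbix also provides reporting and data visualization features.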

    2. Installing Zabbix: Server Node

    2.1 Cluster plan

    Node         Services
    chenhao01    zabbix-server, zabbix-agent, MySQL
    chenhao02    zabbix-agent
    chenhao03    zabbix-agent

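    2.2 Preparation

    2.2.1 Disable the firewall

    For this setup, stop the firewall and disable it at boot. The related commands are:

        Start:         systemctl start firewalld
        Check status:  systemctl status firewalld
        Stop:          systemctl stop firewalld
        Disable:       systemctl disable firewalld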

    2.2.2 Disable SELinux

    Edit the configuration file /etc/selinux/config and set SELINUX=disabled:

        [root@chenhao01 ~]# vim /etc/selinux/config

        # This file controls the state of SELinux on the system.
        # SELINUX= can take one of these three values:
        #     enforcing - SELinux security policy is enforced.
        #     permissive - SELinux prints warnings instead of enforcing.
        #     disabled - No SELinux policy is loaded.
        SELINUX=disabled
        # SELINUXTYPE= can take one of three values:
        #     targeted - Targeted processes are protected,
        #     minimum - Modification of targeted policy. Only selected processes are protected.
        #     mls - Multi Level Security protection.

    Reboot the server so that the change takes effect.
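The manual edit above can also be scripted, which helps when the same change has to be made on all three nodes. The following is a minimal sketch, not part of the original tutorial: the function name `disable_selinux` is hypothetical, and the file path is a parameter so the function can be tried on a copy first.

```shell
#!/bin/sh
# disable_selinux [FILE]  (hypothetical helper)
# Sets SELINUX=disabled in an SELinux config file.
# FILE defaults to /etc/selinux/config.
disable_selinux() {
    cfg="${1:-/etc/selinux/config}"
    [ -f "$cfg" ] || { echo "missing: $cfg" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Rewrite only the active SELINUX= line; comment lines and
    # SELINUXTYPE= do not match this pattern and stay untouched.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rc=$?
    rm -f "$tmp"
    [ "$rc" -eq 0 ] || return "$rc"
    # Print the resulting setting so the change can be confirmed.
    grep '^SELINUX=' "$cfg"
}

# Usage (as root): disable_selinux
```

A reboot is still required afterwards, as described above.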

    2.3 Building and installing Zabbix server and agent

    2.3.1 Create the user

        # System user that the Zabbix programs run as
        sudo groupadd --system zabbix
        sudo useradd --system -g zabbix -d /usr/lib/zabbix -s /sbin/nologin -c "Zabbix Monitoring System" zabbix

    2.3.2 Upload and extract the Zabbix package

        # Install lrzsz to get the rz and sz commands
        yum install lrzsz -y

        # Files to upload:
        #   MySQL-client-5.6.24-1.el6.x86_64.rpm
        #   MySQL-devel-5.6.24-1.el6.x86_64.rpm
        #   MySQL-embedded-5.6.24-1.el6.x86_64.rpm
        #   MySQL-server-5.6.24-1.el6.x86_64.rpm
        #   MySQL-shared-5.6.24-1.el6.x86_64.rpm
        #   MySQL-shared-compat-5.6.24-1.el6.x86_64.rpm
        #   MySQL-test-5.6.24-1.el6.x86_64.rpm
        #   zabbix-4.2.8.tar.gz

        # Extract the Zabbix source
        tar -xvf zabbix-4.2.8.tar.gz

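    Install MySQL

        # Remove the conflicting MariaDB libraries, then install MySQL
        yum -y remove mariadb-libs-1:5.5.64-1.el7.x86_64
        yum -y install autoconf
        rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
        rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm

        # Start the service and look up the random root password
        service mysql start
        cat /root/.mysql_secret

        # Connect to MySQL and change the root password
        mysql -u root -p
        mysql> SET PASSWORD = PASSWORD('982292');

        # Allow remote access to the database
        mysql> grant all privileges on *.* to 'root'@'%' identified by '982292' WITH GRANT OPTION;
        mysql> flush privileges;

        service mysql restart

        # Show the order in which MySQL reads its configuration files
        mysql --help | grep -E '\.cnf'

        # Connect to MySQL remotely
        mysql -h 192.168.174.131 -P 3306 -u root -p982292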
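Reading the random password by eye from /root/.mysql_secret is error-prone, so a small helper can pull it out. This is a sketch under an assumption: it expects each record in the file to end with `: <password>`, so that the password is the last whitespace-separated field of the line containing "random password". The function name `mysql_secret_password` is hypothetical; check the actual file contents before relying on it.

```shell
#!/bin/sh
# mysql_secret_password [FILE]  (hypothetical helper)
# Prints the most recent random root password recorded in FILE.
# Assumed record format: "... random password ... : <password>".
mysql_secret_password() {
    awk '/random password/ { pw = $NF }
         END { if (pw != "") print pw }' "${1:-/root/.mysql_secret}"
}

# Usage (as root, after installing MySQL-server):
#   mysql_secret_password
```

If the file holds several records, the helper prints the password from the last one.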

    Create the configuration file /etc/my.cnf to enable remote database connections

        [client]
        default-character-set=utf8

        [mysql]
        default-character-set=utf8

        [mysqld]
        init_connect='SET collation_connection = utf8_unicode_ci'
        init_connect='SET NAMES utf8'
        character-set-server=utf8
        skip-character-set-client-handshake
        skip-name-resolve

    Set permissions on the MySQL data directory

        chmod -R 700 /var/lib/mysql/
        chown -R mysql:mysql /var/lib/mysql/
        service mysql restart

    2.3.3 Create the zabbix database and tables

    Change into zabbix-4.2.8/database/mysql/

    Connect to MySQL and run the statements that create the database and tables

        mysql -u root -p
        mysql> create database zabbix default character set utf8 collate utf8_bin;
        mysql> use zabbix
        mysql> source schema.sql;
        mysql> source data.sql;
        mysql> source images.sql;
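The interactive session above can be turned into a script that is piped into the mysql client, which makes the step repeatable. The sketch below only prints the same statements in the same order; the function name `print_zabbix_sql` is hypothetical, and the directory argument is whatever path holds the extracted database/mysql files.

```shell
#!/bin/sh
# print_zabbix_sql [DIR]  (hypothetical helper)
# Emits the statements from the interactive session as one script.
# DIR is the directory that contains schema.sql, data.sql and images.sql.
print_zabbix_sql() {
    dir="${1:-.}"
    echo "create database zabbix default character set utf8 collate utf8_bin;"
    echo "use zabbix;"
    # Same file order as in the interactive session.
    for f in schema.sql data.sql images.sql; do
        echo "source $dir/$f;"
    done
}

# Usage:
#   print_zabbix_sql /chenhaosoft/zabbix-4.2.8/database/mysql | mysql -u root -p
```

Printing the script first and piping it second keeps the step easy to review before anything touches the database.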

    2.3.4 Prepare the build environment

    Install the MySQL-related rpm packages

        rpm -ivh MySQL-devel-5.6.24-1.el6.x86_64.rpm
        rpm -ivh MySQL-embedded-5.6.24-1.el6.x86_64.rpm
        rpm -ivh MySQL-shared-5.6.24-1.el6.x86_64.rpm
        rpm -ivh MySQL-shared-compat-5.6.24-1.el6.x86_64.rpm

    Install the dependencies

        sudo yum install -y libcurl libcurl-devel libxml2 libxml2-devel net-snmp-devel libevent-devel pcre-devel gcc-c++

    2.3.5 Build and install

    Change into the extracted directory zabbix-4.2.8

    Build and install

        ./configure --enable-server --enable-agent --with-mysql --enable-ipv6 --with-net-snmp --with-libcurl --with-libxml2
        make install
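A failed configure run is often caused by a missing build tool, so a quick check before building can save time. The following sketch lists which of the given commands are not on the PATH. The function name `missing_cmds` is hypothetical, and the choice of commands to check (gcc, make, and mysql_config from the MySQL-devel package) is an assumption about what this build needs, not something the tutorial states.

```shell
#!/bin/sh
# missing_cmds CMD...  (hypothetical helper)
# Prints every CMD that cannot be found on the PATH, one per line.
# Prints nothing when all of them are available.
missing_cmds() {
    for c in "$@"; do
        command -v "$c" >/dev/null 2>&1 || echo "$c"
    done
}

# Usage before ./configure (the command list is an assumption):
#   missing_cmds gcc make mysql_config
```

An empty result means every listed command was found; any name printed points at a package from 2.3.4 that still needs to be installed.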

    2.3.6 Edit the configuration files

    Edit the zabbix-server configuration file

        vim /usr/local/etc/zabbix_server.conf

        DBHost=chenhao01
        DBName=zabbix
        DBUser=root
        DBPassword=XXXXXX
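The four settings above can also be applied without opening an editor. The sketch below is one possible way to do that, not the tutorial's method: `set_conf` is a hypothetical helper that rewrites an active `Key=` line if one exists, otherwise the first commented-out `# Key=` line, and otherwise appends the setting.

```shell
#!/bin/sh
# set_conf FILE KEY VALUE  (hypothetical helper)
# Sets KEY=VALUE in a "Key=Value" style configuration file:
#   1. if an active KEY= line exists, rewrite the first one;
#   2. otherwise rewrite the first commented-out "# KEY=" line;
#   3. otherwise append KEY=VALUE at the end of the file.
set_conf() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^$key=" "$file"; then
        pat="^$key="
    else
        pat="^# *$key="
    fi
    tmp=$(mktemp) || return 1
    # Values are passed through the environment so that awk does not
    # reinterpret backslashes in them.
    KEY="$key" VAL="$value" PAT="$pat" awk '
        !done && $0 ~ ENVIRON["PAT"] { print ENVIRON["KEY"] "=" ENVIRON["VAL"]; done = 1; next }
        { print }
        END { if (!done) print ENVIRON["KEY"] "=" ENVIRON["VAL"] }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Usage (as root):
#   set_conf /usr/local/etc/zabbix_server.conf DBHost chenhao01
#   set_conf /usr/local/etc/zabbix_server.conf DBUser root
```

Preferring the active line over a commented-out default avoids leaving two active lines for the same key in the file.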

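    Edit the zabbix-agent configuration file. Server must name the Zabbix server node, which is chenhao01 in this cluster plan; ServerActive and Hostname are commented out.

        vim /usr/local/etc/zabbix_agentd.conf

        Server=chenhao01
        #ServerActive=127.0.0.1
        #Hostname=Zabbix server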

    2.3.7 Write the system service scripts

    Edit the zabbix-server file

        vim /etc/init.d/zabbix-server

    with the following content

        #!/bin/sh
        #
        # chkconfig: - 85 15
        # description: Zabbix server daemon
        # config: /usr/local/etc/zabbix_server.conf
        #

        ### BEGIN INIT INFO
        # Provides: zabbix
        # Required-Start: $local_fs $network
        # Required-Stop: $local_fs $network
        # Default-Start:
        # Default-Stop: 0 1 2 3 4 5 6
        # Short-Description: Start and stop Zabbix server
        # Description: Zabbix server
        ### END INIT INFO

        # Source function library.
        . /etc/rc.d/init.d/functions

        if [ -x /usr/local/sbin/zabbix_server ]; then
            exec=/usr/local/sbin/zabbix_server
        else
            exit 5
        fi

        prog=zabbix_server
        conf=/usr/local/etc/zabbix_server.conf
        pidfile=/tmp/zabbix_server.pid
        timeout=10

        if [ -f /etc/sysconfig/zabbix-server ]; then
            . /etc/sysconfig/zabbix-server
        fi

        lockfile=/var/lock/subsys/zabbix-server

        start()
        {
            echo -n $"Starting Zabbix server: "
            daemon $exec -c $conf
            rv=$?
            echo
            [ $rv -eq 0 ] && touch $lockfile
            return $rv
        }

        stop()
        {
            echo -n $"Shutting down Zabbix server: "
            killproc -p $pidfile -d $timeout $prog
            rv=$?
            echo
            [ $rv -eq 0 ] && rm -f $lockfile
            return $rv
        }

        restart()
        {
            stop
            start
        }

        case "$1" in
            start|stop|restart)
                $1
                ;;
            force-reload)
                restart
                ;;
            status)
                status -p $pidfile $prog
                ;;
            try-restart|condrestart)
                if status $prog >/dev/null ; then
                    restart
                fi
                ;;
            reload)
                action $"Service ${0##*/} does not support the reload action:" /bin/false
                exit 3
                ;;
            *)
                echo $"Usage: $0 {start|stop|status|restart|try-restart|force-reload}"
                exit 2
                ;;
        esac

    Make the script executable

        chmod +x /etc/init.d/zabbix-server

    Edit the zabbix-agent file

        vim /etc/init.d/zabbix-agent

    with the following content

        #!/bin/sh
        #
        # chkconfig: - 86 14
        # description: Zabbix agent daemon
        # processname: zabbix_agentd
        # config: /usr/local/etc/zabbix_agentd.conf
        #

        ### BEGIN INIT INFO
        # Provides: zabbix-agent
        # Required-Start: $local_fs $network
        # Required-Stop: $local_fs $network
        # Should-Start: zabbix zabbix-proxy
        # Should-Stop: zabbix zabbix-proxy
        # Default-Start:
        # Default-Stop: 0 1 2 3 4 5 6
        # Short-Description: Start and stop Zabbix agent
        # Description: Zabbix agent
        ### END INIT INFO

        # Source function library.
        . /etc/rc.d/init.d/functions

        if [ -x /usr/local/sbin/zabbix_agentd ]; then
            exec=/usr/local/sbin/zabbix_agentd
        else
            exit 5
        fi

        prog=zabbix_agentd
        conf=/usr/local/etc/zabbix_agentd.conf
        pidfile=/tmp/zabbix_agentd.pid
        timeout=10

        if [ -f /etc/sysconfig/zabbix-agent ]; then
            . /etc/sysconfig/zabbix-agent
        fi

        lockfile=/var/lock/subsys/zabbix-agent

        start()
        {
            echo -n $"Starting Zabbix agent: "
            daemon $exec -c $conf
            rv=$?
            echo
            [ $rv -eq 0 ] && touch $lockfile
            return $rv
        }

        stop()
        {
            echo -n $"Shutting down Zabbix agent: "
            killproc -p $pidfile -d $timeout $prog
            rv=$?
            echo
            [ $rv -eq 0 ] && rm -f $lockfile
            return $rv
        }

        restart()
        {
            stop
            start
        }

        case "$1" in
            start|stop|restart)
                $1
                ;;
            force-reload)
                restart
                ;;
            status)
                status -p $pidfile $prog
                ;;
            try-restart|condrestart)
                if status $prog >/dev/null ; then
                    restart
                fi
                ;;
            reload)
                action $"Service ${0##*/} does not support the reload action:" /bin/false
                exit 3
                ;;
            *)
                echo $"Usage: $0 {start|stop|status|restart|try-restart|force-reload}"
                exit 2
                ;;
        esac

    Make the script executable

        chmod +x /etc/init.d/zabbix-agent

    2.4 Deploy Zabbix-web

    2.4.1 Deploy httpd

    Install httpd

        yum -y install httpd

    Edit the httpd configuration file and add the PHP settings (the <IfModule mod_php5.c> block) inside the <Directory "/var/www/html"> section:

        vim /etc/httpd/conf/httpd.conf

        <Directory "/var/www/html">

        #
        # Possible values for the Options directive are "None", "All",
        # or any combination of:
        #   Indexes Includes FollowSymLinks SymLinksifOwnerMatch ExecCGI MultiViews
        #
        # Note that "MultiViews" must be named *explicitly* --- "Options All"
        # doesn't give it to you.
        #
        # The Options directive is both complicated and important. Please see
        # http://httpd.apache.org/docs/2.2/mod/core.html#options
        # for more information.
        #
            Options Indexes FollowSymLinks

        #
        # AllowOverride controls what directives may be placed in .htaccess files.
        # It can be "All", "None", or any combination of the keywords:
        #   Options FileInfo AuthConfig Limit
        #
            AllowOverride None

        #
        # Controls who can get stuff from this server.
        #
            Order allow,deny
            Allow from all
            <IfModule mod_php5.c>
                php_value max_execution_time 300
                php_value memory_limit 128M
                php_value post_max_size 16M
                php_value upload_max_filesize 2M
                php_value max_input_time 300
                php_value max_input_vars 10000
                php_value always_populate_raw_post_data -1
                php_value date.timezone Asia/Shanghai
            </IfModule>

        </Directory>

    Copy the Zabbix-web PHP files into the httpd document directory

        mkdir /var/www/html/zabbix
        cp -a /chenhaosoft/zabbix-4.2.8/frontends/php/* /var/www/html/zabbix/

    2.4.2 Install PHP 5.6

        yum -y install php
        yum install -y php php-bcmath php-mbstring php-xmlwriter php-xmlreader php-mcrypt php-cli php-gd php-curl php-mysql php-ldap php-zip php-fileinfo

    2.5 Start Zabbix

    2.5.1 Start zabbix-server

    Start

        service zabbix-server start

    Enable at boot

        chkconfig --add zabbix-server
        chkconfig zabbix-server on

    2.5.2 Start zabbix-agent

    Start

        service zabbix-agent start

    Enable at boot

        chkconfig --add zabbix-agent
        chkconfig zabbix-agent on

    2.5.3 Start zabbix-web (httpd)

    Start

        service httpd start

    Enable at boot

        chkconfig httpd on

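    2.6 Access Zabbix

    Open http://chenhao01/zabbix in a browser (httpd was installed above without any TLS configuration, so plain HTTP is used).

    Upload the file zabbix.conf.php to /var/www/html/zabbix/conf/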

    3. Installing Zabbix: Agent Nodes

    3.1 Create the user

        # System user that the Zabbix programs run as
        sudo groupadd --system zabbix
        sudo useradd --system -g zabbix -d /usr/lib/zabbix -s /sbin/nologin -c "Zabbix Monitoring System" zabbix

    3.2 Prepare the build environment

        sudo yum -y install gcc-c++ pcre-devel
        yum install lrzsz -y

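    3.3 Build and install

    Upload the Zabbix package, extract it, and run the build commands

        ./configure --enable-agent
        make install

    Edit the zabbix-agent configuration file. As on the server node, Server must name the Zabbix server node (chenhao01).

        vim /usr/local/etc/zabbix_agentd.conf

        Server=chenhao01
        #ServerActive=127.0.0.1
        #Hostname=Zabbix server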
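A wrong Server value is easy to miss when the file is copied from another cluster, so it can be worth checking the setting on every agent node. The sketch below reads the active Server= line and tests whether an expected host is listed. Both function names are hypothetical; the expected host chenhao01 comes from the cluster plan in 2.1.

```shell
#!/bin/sh
# agent_server [FILE]  (hypothetical helper)
# Prints the value of the last active (uncommented) Server= line,
# with spaces removed. "ServerActive=" does not match the pattern.
agent_server() {
    sed -n 's/^Server=//p' "${1:-/usr/local/etc/zabbix_agentd.conf}" | tail -n 1 | tr -d ' '
}

# check_agent_server FILE EXPECTED  (hypothetical helper)
# Succeeds only if EXPECTED is one of the comma-separated Server entries.
check_agent_server() {
    case ",$(agent_server "$1")," in
        *",$2,"*) return 0 ;;
        *) echo "Server= in $1 does not list $2" >&2; return 1 ;;
    esac
}

# Usage:
#   check_agent_server /usr/local/etc/zabbix_agentd.conf chenhao01
```

The check treats the value as a comma-separated list, so it also works when more than one server is allowed.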

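    3.4 Copy the system service file

    From /etc/init.d on chenhao01, copy the zabbix-agent service script to the agent node (repeat for chenhao03):

        scp zabbix-agent root@chenhao02:/etc/init.d/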
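With more than one agent node, the copy step can be wrapped in a loop. This is a sketch rather than part of the original tutorial: `push_agent_init` and the `DRY_RUN` and `AGENT_INIT` variables are hypothetical names, and the host names come from the cluster plan in 2.1. With `DRY_RUN=1` the function only prints the scp commands, which makes it easy to review them first.

```shell
#!/bin/sh
# push_agent_init HOST...  (hypothetical helper)
# Copies the agent init script to /etc/init.d/ on each HOST as root.
# AGENT_INIT overrides the source path; DRY_RUN=1 prints the commands
# instead of running them.
push_agent_init() {
    src="${AGENT_INIT:-/etc/init.d/zabbix-agent}"
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "scp $src root@$host:/etc/init.d/"
        else
            scp "$src" "root@$host:/etc/init.d/" || return 1
        fi
    done
}

# Usage (on chenhao01):
#   DRY_RUN=1 push_agent_init chenhao02 chenhao03   # preview
#   push_agent_init chenhao02 chenhao03             # copy
```

The function stops at the first failed copy, so a host that is unreachable is noticed immediately.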

    3.5 Start zabbix-agent

    Start

        service zabbix-agent start

    Enable at boot

        chkconfig --add zabbix-agent
        chkconfig zabbix-agent on