hive-10: Compiling Hive 3.1.3 from Source to Work with Spark 3.3.3

I. Environment Preparation

1) Prepare a virtual machine

Prepare a virtual machine and install CentOS 7 with a desktop environment.

2) Install the JDK

(1) Remove any existing JDK

sudo rpm -qa | grep -i java | xargs -n1 sudo rpm -e --nodeps

(2) Upload the JDK archive to the /opt/software directory on the virtual machine

(3) Extract the JDK into /opt/module

tar -zxvf jdk-8u212-linux-x64.tar.gz -C /opt/module/

(4) Configure the JDK environment variables

1. Create /etc/profile.d/my_env.sh:

sudo vim /etc/profile.d/my_env.sh

Add the following content, then save and quit (:wq):

#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin

2. Reload the environment variables

source /etc/profile.d/my_env.sh

(5) Verify that the JDK is installed

java -version

3) Install Maven

(1) Upload the Maven archive to the /opt/software directory on the virtual machine

(2) Extract Maven into /opt/module

tar -zxvf apache-maven-3.6.3-bin.tar.gz -C /opt/module/

(3) Configure the Maven environment variables

1. Edit /etc/profile.d/my_env.sh

sudo vim /etc/profile.d/my_env.sh

Append the following:

# MAVEN_HOME
export MAVEN_HOME=/opt/module/apache-maven-3.6.3
export PATH=$PATH:$MAVEN_HOME/bin

2. Reload the environment variables

source /etc/profile.d/my_env.sh

(4) Verify that Maven is installed

mvn -version

(5) Configure repository mirrors

1. Edit the Maven settings file:

/opt/module/apache-maven-3.6.3/conf/settings.xml

2. Add the following inside the <mirrors> node:

    <mirror>
      <id>alimaven</id>
      <name>aliyun maven</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>


    <mirror>
        <id>aliyunmaven-spring-plugin</id>
        <mirrorOf>*</mirrorOf>
        <name>spring-plugin</name>
        <url>https://maven.aliyun.com/repository/spring-plugin</url>
    </mirror>

    <mirror>
        <id>aliyunmaven</id>
        <mirrorOf>*</mirrorOf>
        <name>Aliyun public repository</name>
        <url>https://maven.aliyun.com/repository/public</url>
    </mirror>

    <mirror>
        <id>nexus-aliyun</id>
        <mirrorOf>*,!cloudera</mirrorOf>
        <name>Nexus aliyun</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>

4) Install Git

(1) Install the third-party repository (IUS)

sudo yum install https://repo.ius.io/ius-release-el7.rpm 

(2) Install Git

sudo yum install -y git236

5) Install IntelliJ IDEA

(1) Upload the IDEA archive to the /opt/software directory on the virtual machine

(2) Extract IDEA into /opt/module

tar -zxvf ideaIU-2021.1.3.tar.gz -C /opt/module/

(3) Launch IDEA (requires the graphical desktop)

nohup /opt/module/idea-IU-211.7628.21/bin/idea.sh 1>/dev/null 2>&1 &

(4) Configure Maven in IDEA (point the IDE at the Maven installation and settings.xml set up above)

II. Modify and Compile the Hive Source Code

1) Create a new project in IDEA from the Hive source repository

Remote repository for the Hive source:

https://github.com/apache/hive.git

Mirror inside China:

https://gitee.com/apache/hive.git

2) Test the build environment

Before modifying any dependencies or code, run a packaging build once to confirm that the environment works.

(1) Open a terminal

(2) Run the package command

mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true

Note: the package command follows the official documentation:

https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStartedCompileHiveonmaster

The build has succeeded when Maven prints the BUILD SUCCESS banner:
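
For reference (the original post shows a screenshot here), the tail of a successful run looks like this, with your own timings:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  ...
[INFO] Finished at: ...
[INFO] ------------------------------------------------------------------------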

3) Modify the project pom.xml

1. Update the Hadoop version

Hive 3.1.3 declares Hadoop 3.1.0 in its pom.xml, but Hive supports a range of Hadoop versions, so decide based on your own deployment whether the Hadoop-related changes are actually needed.

<hadoop.version>3.1.0</hadoop.version>
# change to
<hadoop.version>3.1.3</hadoop.version>

Hadoop 3.1.3 ships SLF4J 1.7.25, so update the logging version to match:

<slf4j.version>1.7.10</slf4j.version>
# change to
<slf4j.version>1.7.25</slf4j.version>

2. Update the Guava version

Hive loads Hadoop's dependencies at runtime, so Hive's Guava version should be changed to the one bundled with Hadoop. Even if you skip this change here, you may still end up swapping the Guava jar when deploying Hive (if the version gap is small, replacing it is not strictly necessary).

<guava.version>19.0</guava.version>
# change to
<guava.version>27.0-jre</guava.version>

3. Update the Spark version

Spark 3.3.x is published only for Scala 2.12 and 2.13, so the Scala version has to move as well.

<spark.version>2.3.0</spark.version>
<scala.binary.version>2.11</scala.binary.version>
<scala.version>2.11.8</scala.version>
# change to
<spark.version>3.3.3</spark.version>
<scala.binary.version>2.12</scala.binary.version>
<scala.version>2.12.17</scala.version>

4) Modify the Hive source code

Modifying the Hive source is the core of this work. For the full list of source changes, see: https://github.com/gitlbo/hive/commits/3.1.2

(1) Modify the standalone-metastore module

For the detailed changes, see: https://github.com/gitlbo/hive/commit/c073e71ef43699b7aa68cad7c69a2e8f487089fd

Create the ColumnsStatsUtils class.

The code is as follows:

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.hive.metastore.columnstats;

import org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj;
import org.apache.hadoop.hive.metastore.columnstats.cache.DateColumnStatsDataInspector;
import org.apache.hadoop.hive.metastore.columnstats.cache.DecimalColumnStatsDataInspector;
import org.apache.hadoop.hive.metastore.columnstats.cache.DoubleColumnStatsDataInspector;
import org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector;
import org.apache.hadoop.hive.metastore.columnstats.cache.StringColumnStatsDataInspector;

/**
 * Utils class for columnstats package.
 */
public final class ColumnsStatsUtils {

    private ColumnsStatsUtils(){}

    /**
     * Converts to DateColumnStatsDataInspector if it's a DateColumnStatsData.
     * @param cso ColumnStatisticsObj
     * @return DateColumnStatsDataInspector
     */
    public static DateColumnStatsDataInspector dateInspectorFromStats(ColumnStatisticsObj cso) {
        DateColumnStatsDataInspector dateColumnStats;
        if (cso.getStatsData().getDateStats() instanceof DateColumnStatsDataInspector) {
            dateColumnStats =
                    (DateColumnStatsDataInspector)(cso.getStatsData().getDateStats());
        } else {
            dateColumnStats = new DateColumnStatsDataInspector(cso.getStatsData().getDateStats());
        }
        return dateColumnStats;
    }

    /**
     * Converts to StringColumnStatsDataInspector
     * if it's a StringColumnStatsData.
     * @param cso ColumnStatisticsObj
     * @return StringColumnStatsDataInspector
     */
    public static StringColumnStatsDataInspector stringInspectorFromStats(ColumnStatisticsObj cso) {
        StringColumnStatsDataInspector columnStats;
        if (cso.getStatsData().getStringStats() instanceof StringColumnStatsDataInspector) {
            columnStats =
                    (StringColumnStatsDataInspector)(cso.getStatsData().getStringStats());
        } else {
            columnStats = new StringColumnStatsDataInspector(cso.getStatsData().getStringStats());
        }
        return columnStats;
    }

    /**
     * Converts to LongColumnStatsDataInspector if it's a LongColumnStatsData.
     * @param cso ColumnStatisticsObj
     * @return LongColumnStatsDataInspector
     */
    public static LongColumnStatsDataInspector longInspectorFromStats(ColumnStatisticsObj cso) {
        LongColumnStatsDataInspector columnStats;
        if (cso.getStatsData().getLongStats() instanceof LongColumnStatsDataInspector) {
            columnStats =
                    (LongColumnStatsDataInspector)(cso.getStatsData().getLongStats());
        } else {
            columnStats = new LongColumnStatsDataInspector(cso.getStatsData().getLongStats());
        }
        return columnStats;
    }

    /**
     * Converts to DoubleColumnStatsDataInspector
     * if it's a DoubleColumnStatsData.
     * @param cso ColumnStatisticsObj
     * @return DoubleColumnStatsDataInspector
     */
    public static DoubleColumnStatsDataInspector doubleInspectorFromStats(ColumnStatisticsObj cso) {
        DoubleColumnStatsDataInspector columnStats;
        if (cso.getStatsData().getDoubleStats() instanceof DoubleColumnStatsDataInspector) {
            columnStats =
                    (DoubleColumnStatsDataInspector)(cso.getStatsData().getDoubleStats());
        } else {
            columnStats = new DoubleColumnStatsDataInspector(cso.getStatsData().getDoubleStats());
        }
        return columnStats;
    }

    /**
     * Converts to DecimalColumnStatsDataInspector
     * if it's a DecimalColumnStatsData.
     * @param cso ColumnStatisticsObj
     * @return DecimalColumnStatsDataInspector
     */
    public static DecimalColumnStatsDataInspector decimalInspectorFromStats(ColumnStatisticsObj cso) {
        DecimalColumnStatsDataInspector columnStats;
        if (cso.getStatsData().getDecimalStats() instanceof DecimalColumnStatsDataInspector) {
            columnStats =
                    (DecimalColumnStatsDataInspector)(cso.getStatsData().getDecimalStats());
        } else {
            columnStats = new DecimalColumnStatsDataInspector(cso.getStatsData().getDecimalStats());
        }
        return columnStats;
    }
}

Next, modify the following files; the original post documents the exact edits with screenshots, which are not reproduced here (a sketch of the recurring change follows the file list):

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DateColumnStatsAggregator.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DecimalColumnStatsAggregator.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/DoubleColumnStatsAggregator.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/LongColumnStatsAggregator.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/StringColumnStatsAggregator.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/cache/DateColumnStatsDataInspector.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/cache/DecimalColumnStatsDataInspector.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/cache/DoubleColumnStatsDataInspector.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/cache/LongColumnStatsDataInspector.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/cache/StringColumnStatsDataInspector.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/merge/DateColumnStatsMerger.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/merge/DecimalColumnStatsMerger.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/merge/DoubleColumnStatsMerger.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/merge/LongColumnStatsMerger.java

standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/merge/StringColumnStatsMerger.java
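
Judging from the ColumnsStatsUtils class above and the linked commit, the recurring edit in these aggregator, inspector, and merger classes is to stop casting the Thrift stats data directly and to call the corresponding ColumnsStatsUtils helper instead. A hypothetical before/after sketch of that pattern (the exact lines differ from file to file; treat the linked commit as authoritative):

// Before (illustrative): a direct cast of the Thrift column-stats data
DateColumnStatsDataInspector dateStats =
        (DateColumnStatsDataInspector) cso.getStatsData().getDateStats();

// After (illustrative): go through the helper, which wraps plain Thrift data
// in the inspector type instead of blindly casting it
DateColumnStatsDataInspector dateStats =
        ColumnsStatsUtils.dateInspectorFromStats(cso);

Each modified file also needs an import of org.apache.hadoop.hive.metastore.columnstats.ColumnsStatsUtils.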

(2) Modify the ql module

ql/src/test/org/apache/hadoop/hive/ql/stats/TestStatsUtils.java

ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java

ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java

(3) Modify the spark-client module

spark-client/src/main/java/org/apache/hive/spark/client/metrics/ShuffleWriteMetrics.java

spark-client/src/main/java/org/apache/hive/spark/counter/SparkCounter.java
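
For context (an explanation not in the original post): Hive 3.1's spark-client targets Spark 2.x, and the old org.apache.spark.Accumulator/AccumulatorParam API that SparkCounter relies on was removed in Spark 3.0, so this code has to be reworked on top of AccumulatorV2. A rough, hypothetical sketch of that idea using LongAccumulator (illustrative only, not the actual patch):

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.LongAccumulator;

// Hypothetical wrapper showing the AccumulatorV2-based approach.
public class SparkCounterSketch implements java.io.Serializable {
  private final LongAccumulator accumulator;

  public SparkCounterSketch(JavaSparkContext sc, String name, long initialValue) {
    // Register a named accumulator with the driver (also visible in the Spark UI).
    this.accumulator = sc.sc().longAccumulator(name);
    this.accumulator.add(initialValue);
  }

  public void increment(long incr) {
    accumulator.add(incr);
  }

  public long getValue() {
    return accumulator.value();
  }
}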

(4) Modify the druid-handler module

druid-handler/src/java/org/apache/hadoop/hive/druid/serde/DruidScanQueryRecordReader.java

(5) Modify the llap-server module

llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/AMReporter.java

llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/LlapTaskReporter.java

llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java

(6) Modify the llap-tez module

llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java

(7) Modify the llap-common module

llap-common/src/java/org/apache/hadoop/hive/llap/AsyncPbRpcProxy.java
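
For context (not stated in the original post): a large share of the edits in the ql, llap-*, and druid-handler modules likely stems from the Guava 19 to 27 upgrade rather than from Spark itself. Newer Guava versions (including 27.0-jre) no longer provide the two-argument Futures.addCallback overload, so every call site must pass an executor explicitly. A self-contained illustration of that API change, not the actual Hive diff:

import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

import java.util.concurrent.Executors;

public class GuavaCallbackExample {
  public static void main(String[] args) {
    ListeningExecutorService pool =
        MoreExecutors.listeningDecorator(Executors.newSingleThreadExecutor());
    ListenableFuture<Integer> future = pool.submit(() -> 42);

    // Guava 19 accepted Futures.addCallback(future, callback);
    // Guava 27 requires an executor as the third argument.
    Futures.addCallback(future, new FutureCallback<Integer>() {
      @Override
      public void onSuccess(Integer result) {
        System.out.println("result = " + result);
      }

      @Override
      public void onFailure(Throwable t) {
        t.printStackTrace();
      }
    }, MoreExecutors.directExecutor());

    pool.shutdown();
  }
}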

III. Compile and Package

After the source changes are complete, run the build and package command (either of the following; the first additionally skips javadoc generation):

mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true

mvn clean package -Pdist -DskipTests

Expect to run into problems while the build is executing; compare any exceptions you hit against the Exceptions section below.

Once the build succeeds, the packaged output is written under the packaging module's target directory (with the dist profile, typically packaging/target/apache-hive-3.1.3-bin.tar.gz).

Notes:

1. Stale artifacts in the local repository can sometimes cause dependency-resolution errors. You can purge this project's dependencies from the local repository; the following command removes the artifacts referenced by pom.xml and re-downloads them:

mvn dependency:purge-local-repository

2. After changing version numbers in pom.xml, modifying code, or installing jars into the local repository, it is best to close and reopen IDEA to avoid stale caches or delayed updates.

IV. Exceptions

Exception 1

1. Maven reports that it cannot find or download some jar, or the download takes extremely long (even with a proxy enabled).

For example, the Maven repositories cannot resolve hive-upgrade-acid-3.1.3.jar or pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar.

The exact error looks like the following (for reference only):

[ERROR] Failed to execute goal on project hive-upgrade-acid: Could not resolve dependencies for project org.apache.hive:hive-upgrade-acid:jar:3.1.3: Failure to find org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde in https://maven.aliyun.com/repository/central was cached in the local repository, resolution will not be reattempted until the update interval of aliyun-central has elapsed or updates are forced -> [Help 1]

Solution:

Search for the missing jar in the following repositories, download it manually, and install it into the local repository.

Repository 1: https://mvnrepository.com/

Repository 2: https://central.sonatype.com/

Repository 3: https://developer.aliyun.com/mvn/search

You can also add these repository addresses to the mirrors in your Maven settings.

Alternatively, install a JAR into the local repository; the command syntax is:

mvn install:install-file -Dfile=<path-to-jar> -DgroupId=<group-id> -DartifactId=<artifact-id> -Dversion=<version> -Dpackaging=<packaging>

<path-to-jar>: path to the JAR file; it can be an absolute path on the local file system.
<group-id>: the group ID, usually in reverse-domain form, e.g. com.example.
<artifact-id>: the artifact's unique identifier, usually the project name.
<version>: the version number.
<packaging>: the packaging type of the file, e.g. jar.

For example:

mvn install:install-file -Dfile=./hive-upgrade-acid-3.1.3.jar -DgroupId=org.apache.hive -DartifactId=hive-upgrade-acid -Dversion=3.1.3 -Dpackaging=jar

mvn install:install-file -Dfile=./pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar -DgroupId=org.pentaho -DartifactId=pentaho-aggdesigner-algorithm -Dversion=5.1.5-jhyde -Dpackaging=jar

mvn install:install-file -Dfile=./hive-metastore-2.3.3.jar -DgroupId=org.apache.hive -DartifactId=hive-metastore -Dversion=2.3.3 -Dpackaging=jar

mvn install:install-file -Dfile=./hive-exec-3.1.3.jar -DgroupId=org.apache.hive -DartifactId=hive-exec -Dversion=3.1.3 -Dpackaging=jar

Exception 2

If you build on Windows, the build fails with bash-related command errors.

Solution:

Use Git Bash, i.e. run the build and package command from Git's bash window.

Exception 3

The build fails at the Hive Llap Server module.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-llap-server: Compilation failure
[ERROR] /C:/Users/JackChen/Desktop/apache-hive-3.1.3-src/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java:[30,32]
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :hive-llap-server

Solution:

public class QueryTracker extends AbstractService {

    // Replace the Log4j-specific marker with a plain SLF4J marker;
    // Marker and MarkerFactory here come from org.slf4j.
    // private static final Marker QUERY_COMPLETE_MARKER = new Log4jMarker(new Log4jQueryCompleteMarker());
    private static final Marker QUERY_COMPLETE_MARKER = MarkerFactory.getMarker("MY_CUSTOM_MARKER");
}

Exception 4

The build fails at the Hive HCatalog Webhcat module.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hive-webhcat: Compilation failure
[ERROR] /root/apache-hive-3.1.3-src/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java:[258,31] no suitable constructor found for FilterHolder(java.lang.Class<org.apache.hadoop.hdfs.web.AuthFilter>)
[ERROR]     constructor org.eclipse.jetty.servlet.FilterHolder.FilterHolder(org.eclipse.jetty.servlet.BaseHolder.Source) is not applicable
[ERROR]       (argument mismatch; java.lang.Class<org.apache.hadoop.hdfs.web.AuthFilter> cannot be converted to org.eclipse.jetty.servlet.BaseHolder.Source)
[ERROR]     constructor org.eclipse.jetty.servlet.FilterHolder.FilterHolder(java.lang.Class<? extends javax.servlet.Filter>) is not applicable
[ERROR]       (argument mismatch; java.lang.Class<org.apache.hadoop.hdfs.web.AuthFilter> cannot be converted to java.lang.Class<? extends javax.servlet.Filter>)
[ERROR]     constructor org.eclipse.jetty.servlet.FilterHolder.FilterHolder(javax.servlet.Filter) is not applicable
[ERROR]       (argument mismatch; java.lang.Class<org.apache.hadoop.hdfs.web.AuthFilter> cannot be converted to javax.servlet.Filter)
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :hive-webhcat

Looking at the source, AuthFilter extends AuthenticationFilter, and AuthenticationFilter implements Filter, so this error should not appear. Manually editing the source to force the cast still did not help.

  public FilterHolder makeAuthFilter() throws IOException {
    FilterHolder authFilter = new FilterHolder(AuthFilter.class);
    UserNameHandler.allowAnonymous(authFilter);

Solution:

Compiling and packaging this module on its own in IDEA succeeds.

This leads to the following workaround:

1. Because the project is packaged with Maven (mvn package), running the same command again will not re-package what has already been built.

2. So run clean for the whole project first, then package the Webhcat module on its own, and finally, for the overall build, skip clean and run mvn package -Pdist -DskipTests directly.

Exception 5

The build reports: Failure to find org.apache.directory.client.ldap:ldap-client-api:pom:0.1-SNAPSHOT

Exclude this dependency from apacheds-server-integ:

    <dependency>
      <groupId>org.apache.directory.server</groupId>
      <artifactId>apacheds-server-integ</artifactId>
      <version>${apache-directory-server.version}</version>
      <scope>test</scope>
      <exclusions>
        <exclusion>
          <groupId>org.apache.directory.client.ldap</groupId>
          <artifactId>ldap-client-api</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

References:

https://juejin.cn/post/7265967979338416185

https://blog.csdn.net/weixin_38906364/article/details/128687090

https://blog.csdn.net/qq_39035267/article/details/126608808

https://blog.csdn.net/weixin_52918377/article/details/117123969
