Install Hadoop by following the usual tutorial.
Hive installation:
1: Download the package from http://mirrors.hust.edu.cn/apache/ (version: apache-hive-2.1.1-bin.tar.gz)
2: tar -zxvf apache-hive-2.1.1-bin.tar.gz -C /usr/local/ && mv /usr/local/apache-hive-2.1.1-bin /usr/local/hive
3: vim /etc/profile.d/java.sh, add the following, then source the file:
export JAVA_HOME=/usr/local/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
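A quick way to confirm the variables took effect (assuming the paths above match your layout):
source /etc/profile.d/java.sh
java -version # should report 1.7.0_79
hadoop version # confirms HADOOP_HOME resolves
hive --version # confirms $HIVE_HOME/bin is on the PATH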
4: In $HIVE_HOME/conf/, copy and rename the template: cp hive-log4j2.properties.template hive-log4j2.properties, then set property.hive.log.dir = /usr/local/hive/logs/
5: Start the Hadoop cluster first, then run schematool -dbType derby -initSchema to initialize the metastore (embedded Derby at this point).
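Optionally, schematool can report the connection and schema version to sanity-check the embedded Derby metastore (not required if you switch to MySQL below):
schematool -dbType derby -info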
6: Install MySQL 5.6.
7: Place the MySQL JDBC driver in $HIVE_HOME/lib; this setup uses mysql-connector-java-5.1.31-bin.jar.
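For example, assuming the jar was downloaded to /usr/local/src (adjust to wherever you saved it):
cp /usr/local/src/mysql-connector-java-5.1.31-bin.jar $HIVE_HOME/lib/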
8: In $HIVE_HOME/conf/, run cp hive-default.xml.template hive-site.xml
9: Truncate hive-site.xml (> hive-site.xml) and add:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://192.168.0.70:9000/user/hive/warehouse</value>
  </property>
  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.autoCreateColumns</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.0.70:9083</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.0.70:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
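Before initializing the metastore, it is worth checking that the JDBC credentials in hive-site.xml actually work from this host (same host/user/password as configured above):
mysql -h 192.168.0.70 -uroot -p123456 -e "SELECT VERSION();"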
10:cp hive-env.sh.template hive-env.sh
HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib
11:
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp/hive/
hadoop fs -chmod 777 /user/hive/warehouse
hadoop fs -chmod 777 /tmp/hive
Sync the jline versions between Hive and Hadoop:
cp /usr/local/hive/lib/jline-2.12.jar /usr/local/hadoop/share/hadoop/yarn/lib
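To confirm the directories and permissions (the warehouse dir should show rwxrwxrwx):
hadoop fs -ls /user/hive
hadoop fs -ls /tmp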
12:mysql -uroot -p123456
grant all privileges on *.* to root@'%' identified by '123456';
flush privileges;
13: Initialize the MySQL metastore: schematool -dbType mysql -initSchema
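schematool can verify that the initialization succeeded:
schematool -dbType mysql -info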
14: Start Hive's metastore service: nohup /usr/local/hive/bin/hive --service metastore &> metastore.log &
Run hive; the connection succeeds.
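To check that the metastore service is listening on the Thrift port configured above (9083), one option:
netstat -nltp | grep 9083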
15: Log in to MySQL and the hive database is visible.
16: Test from the hive CLI:
(1) Create a database
create database db_hive_test;
(2) Create a test table
use db_hive_test;
create table student(id int,name string) row format delimited fields terminated by '\t';
(3) Create a student.txt file and write in some data (id and name separated by a tab):
vi student.txt
1001 zhangsan
1002 lisi
1003 wangwu
1004 zhaoli
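The separators must be real tab characters, not spaces, or the loaded rows will show NULL columns; cat -A renders tabs as ^I:
cat -A student.txt # each line should look like 1001^Izhangsan$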
(4) load data local inpath '/home/hadoop/student.txt' into table db_hive_test.student;
(5)select * from student;
(6)desc formatted student;
(7) View it through the web UI: http://192.168.0.70:50070/explorer.html#/user/hive/warehouse/db_hive_test.db
(8) Inspect the created table through MySQL:
use hive;
select * from TBLS; -- the newly created student table shows up
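To see which database each table belongs to, TBLS can be joined with DBS (column names as in the standard Hive metastore schema):
mysql -uroot -p123456 -e "
USE hive;
SELECT d.NAME AS db_name, t.TBL_NAME
FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID;"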
Spark installation
1: Install Scala
Download from http://www.scala-lang.org/download/2.11.7.html (scala-2.11.7.tgz)
tar zxvf /usr/local/src/scala-2.11.7.tgz -C /usr/local/ && mv /usr/local/scala-2.11.7 /usr/local/scala
2: vim /etc/profile.d/java.sh
Add: export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
3:source /etc/profile
4:scala -version
5: Download Spark from http://mirrors.hust.edu.cn/apache/spark/
6: tar zxvf /usr/local/src/spark-2.0.1-bin-hadoop2.7.tgz -C /usr/local/ && mv /usr/local/spark-2.0.1-bin-hadoop2.7 /usr/local/spark
7: vim /etc/profile.d/java.sh # append the following
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
8:source /etc/profile.d/java.sh
9:spark-shell --version
10:run-example org.apache.spark.examples.SparkPi 10
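The same estimate can be reproduced interactively; spark-shell reads from stdin and predefines sc, so here is a minimal sketch of the Monte Carlo method SparkPi uses:
spark-shell --master local[2] <<'EOF'
val n = 100000
val hits = sc.parallelize(1 to n).map { _ =>
  val x = math.random * 2 - 1 // random point in the unit square
  val y = math.random * 2 - 1
  if (x * x + y * y <= 1) 1 else 0 // 1 if it lands inside the unit circle
}.reduce(_ + _)
println(s"Pi is roughly ${4.0 * hits / n}")
EOF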
11:cd /usr/local/spark/conf/
cp spark-env.sh.template spark-env.sh
vi spark-env.sh # append the following
export SCALA_HOME=/usr/local/scala
export JAVA_HOME=/usr/local/jdk1.7.0_79
export SPARK_MASTER_IP=192.168.0.70
export SPARK_WORKER_MEMORY=1024m
12:$SPARK_HOME/sbin/start-all.sh
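If start-all.sh succeeded, jps on the master node should list both daemons (each worker node shows a Worker):
jps
# expected to include: Master, Worker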
13: Submit a job to the Spark cluster:
spark-submit --master spark://192.168.0.70:7077 --class org.apache.spark.examples.SparkPi --name Spark-Pi \
/usr/local/spark/examples/jars/spark-examples_2.11-2.0.1.jar
14: To use Spark together with Hadoop, start both the Hadoop cluster and the Spark cluster.
$HADOOP_HOME/sbin/start-all.sh
$SPARK_HOME/sbin/start-all.sh
15: To run Spark jobs on YARN, edit spark-env.sh:
vim /usr/local/spark/conf/spark-env.sh
# append the following
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
16: Submit a Spark job to YARN:
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkLR --name SparkLR /usr/local/spark/examples/jars/spark-examples_2.11-2.0.1.jar
17: Combine with HDFS: Spark takes its input from HDFS files
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.JavaWordCount --name JavaWordCount /usr/local/spark/examples/jars/spark-examples_2.11-2.0.1.jar hdfs://master:9000/tmp/
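In cluster mode the driver's stdout (the word counts) goes to the YARN container logs rather than your terminal. Assuming log aggregation is enabled, and with a hypothetical application ID in place of the real one:
yarn application -list -appStates FINISHED # look up the application ID
yarn logs -applicationId application_1234567890123_0001 | less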