Open Source Cloud Computing Technology Series (4): Installing and Configuring Cloudera

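To save space, let's get straight to the point.

First, use VirtualBox to set up a virtual machine running Debian 5.0.

Among open source Linux distributions, Debian has always had the purest Linux lineage: it is convenient to use and efficient to run. Taking a fresh look at the latest 5.0 release feels like meeting an old friend again.

You only need to download debian-501-i386-CD-1.iso to install the system; everything else can be handled easily through Debian's strong network-based package management. The details are omitted here; all the information you need is available at www.debian.org.

Now let's see how convenient and simple the stable 0.18.3 release is.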

Step 1. Configure the Cloudera repository

Create a new configuration file: vi /etc/apt/sources.list.d/cloudera.list

more /etc/apt/sources.list.d/cloudera.list
deb http://archive.cloudera.com/debian lenny contrib
deb-src http://archive.cloudera.com/debian lenny contrib
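
If you would rather not open an editor, the same file can be written from a script. The following is only a minimal sketch: the function name write_cloudera_list and its optional path argument are made up for illustration, while the two repository lines are exactly the ones shown above.

```shell
#!/bin/sh
# Write the two Cloudera APT entries to a sources list file.
# The target path comes from the first argument so the function can be tried
# on a scratch file first; the default is the path used in this article.
# (write_cloudera_list is a hypothetical helper name, not part of any package.)
write_cloudera_list() {
    target="${1:-/etc/apt/sources.list.d/cloudera.list}"
    cat > "$target" <<'EOF'
deb http://archive.cloudera.com/debian lenny contrib
deb-src http://archive.cloudera.com/debian lenny contrib
EOF
}

# Example (as root): write_cloudera_list && apt-get update
```

Calling the function without arguments overwrites any existing cloudera.list, so pass a scratch path first if you want to inspect the result.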

Add the Cloudera key

debian:~# curl -s http://archive.cloudera.com/debian/archive.key | apt-key add -
OK

Update the APT index

debian:~# apt-get update
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny Release.gpg
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny/main Translation-en_US
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny Release 
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny/main Packages/DiffIndex
Get:1 http://archive.cloudera.com lenny Release.gpg [197B]                                            
Get:2 http://volatile.debian.org lenny/volatile Release.gpg [189B]                                    
Ign http://volatile.debian.org lenny/volatile/main Translation-en_US                                  
Hit http://ftp.us.debian.org lenny Release.gpg                                                        
Ign http://archive.cloudera.com lenny/contrib Translation-en_US                           
Hit http://security.debian.org lenny/updates Release.gpg                                  
Ign http://security.debian.org lenny/updates/main Translation-en_US 
Get:3 http://volatile.debian.org lenny/volatile Release [40.7kB]    
Ign http://ftp.us.debian.org lenny/main Translation-en_US                                       
Hit http://security.debian.org lenny/updates Release                                            
Get:4 http://archive.cloudera.com lenny Release [2391B]                                        
Hit http://ftp.us.debian.org lenny Release                                                      
Ign http://security.debian.org lenny/updates/main Packages/DiffIndex                           
Ign http://archive.cloudera.com lenny/contrib Packages                     
Ign http://security.debian.org lenny/updates/main Sources/DiffIndex        
Ign http://ftp.us.debian.org lenny/main Packages/DiffIndex                 
Ign http://ftp.us.debian.org lenny/main Sources/DiffIndex                                  
Hit http://security.debian.org lenny/updates/main Packages          
Hit http://ftp.us.debian.org lenny/main Packages                    
Ign http://archive.cloudera.com lenny/contrib Sources               
Ign http://volatile.debian.org lenny/volatile/main Packages/DiffIndex
Hit http://security.debian.org lenny/updates/main Sources           
Ign http://volatile.debian.org lenny/volatile/main Sources/DiffIndex
Hit http://ftp.us.debian.org lenny/main Sources                     
Get:5 http://archive.cloudera.com lenny/contrib Packages [4480B]
Get:6 http://volatile.debian.org lenny/volatile/main Packages [7471B]
Get:7 http://volatile.debian.org lenny/volatile/main Sources [2350B]     
Get:8 http://archive.cloudera.com lenny/contrib Sources [1431B]
Fetched 59.2kB in 4s (12.5kB/s)
Reading package lists... Done
debian:~#

List the Cloudera packages

debian:~# apt-cache search hadoop
hadoop - A software platform for processing vast amounts of data
hadoop-conf-pseudo - Pseudo-distributed Hadoop configuration
hadoop-datanode - Data Node for Hadoop
hadoop-doc - Documentation for Hadoop
hadoop-jobtracker - Job Tracker for Hadoop
hadoop-namenode - Name Node for Hadoop
hadoop-native - Native libraries for Hadoop (e.g., compression)
hadoop-pipes - Interface to author Hadoop MapReduce jobs in C++
hadoop-secondarynamenode - Secondary Name Node for Hadoop
hadoop-tasktracker - Task Tracker for Hadoop
hive - A data warehouse infrastructure built on top of Hadoop
libhdfs0 - JNI Bindings to access Hadoop HDFS from C
pig - A platform for analyzing large data sets using Hadoop
debian:~#

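OK, that completes the preparation. Now for the actual installation, which is also very convenient.

We choose to install Hadoop in pseudo-distributed mode, which lets us try out the full range of Hadoop features.

Yesterday we tried hadoop-conf-pseudo 0.18.3-0cloudera0.3.0~intrepid, and today Cloudera released a trial package based on the latest Hadoop 0.20. So let's take the chance to try it while it is fresh. That is the pace of open source software: something new every day.

Java 6 is required.

Configure the APT sources (sun-java6-jre comes from the non-free section):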

debian:~/codeblue2/client/examples# more /etc/apt/sources.list
#
# deb cdrom:[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10]/ lenny main

deb cdrom:[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10]/ lenny main

deb http://ftp.us.debian.org/debian/ lenny main contrib non-free
deb-src http://ftp.us.debian.org/debian/ lenny main contrib non-free

deb http://security.debian.org/ lenny/updates main contrib non-free
deb-src http://security.debian.org/ lenny/updates main contrib non-free

deb http://volatile.debian.org/debian-volatile lenny/volatile main contrib non-free
deb-src http://volatile.debian.org/debian-volatile lenny/volatile main contrib non-free

Then run apt-get update.

debian:~# apt-get install sun-java6-jre

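The installation is practically foolproof, so the output is omitted here.

Before trying 0.20, let's first go through the installation of 0.18.3; it is the stable release, after all.

debian:~# apt-get -y install hadoop-conf-pseudo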
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following extra packages will be installed:
  hadoop hadoop-native liblzo2-2
The following NEW packages will be installed:
  hadoop hadoop-conf-pseudo hadoop-native liblzo2-2
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 12.0MB/12.1MB of archives.
After this operation, 21.5MB of additional disk space will be used.
Get:1 http://archive.cloudera.com lenny/contrib hadoop 0.18.3-4cloudera0.3.0~lenny [11.9MB]
Get:2 http://archive.cloudera.com lenny/contrib hadoop-conf-pseudo 0.18.3-4cloudera0.3.0~lenny [93.1kB]
Get:3 http://archive.cloudera.com lenny/contrib hadoop-native 0.18.3-4cloudera0.3.0~lenny [92.7kB]    
Fetched 4336kB in 23s (184kB/s)                                                                       
Selecting previously deselected package liblzo2-2.
(Reading database ... 103556 files and directories currently installed.)
Unpacking liblzo2-2 (from .../lzo2/liblzo2-2_2.03-1_i386.deb) ...
Selecting previously deselected package hadoop.
Unpacking hadoop (from .../hadoop_0.18.3-4cloudera0.3.0~lenny_all.deb) ...
Selecting previously deselected package hadoop-conf-pseudo.
Unpacking hadoop-conf-pseudo (from .../hadoop-conf-pseudo_0.18.3-4cloudera0.3.0~lenny_all.deb) ...
Selecting previously deselected package hadoop-native.
Unpacking hadoop-native (from .../hadoop-native_0.18.3-4cloudera0.3.0~lenny_i386.deb) ...
Processing triggers for man-db ...
Setting up liblzo2-2 (2.03-1) ...
Setting up hadoop (0.18.3-4cloudera0.3.0~lenny) ...
Setting up hadoop-conf-pseudo (0.18.3-4cloudera0.3.0~lenny) ...
Setting up hadoop-native (0.18.3-4cloudera0.3.0~lenny) ...

Check where the package was installed.

debian:~# dpkg -L hadoop-conf-pseudo
/.
/etc
/etc/hadoop
/etc/hadoop/conf.pseudo
/etc/hadoop/conf.pseudo/hadoop-default.xml
/etc/hadoop/conf.pseudo/configuration.xsl
/etc/hadoop/conf.pseudo/log4j.properties
/etc/hadoop/conf.pseudo/slaves
/etc/hadoop/conf.pseudo/sslinfo.xml.example
/etc/hadoop/conf.pseudo/hadoop-env.sh
/etc/hadoop/conf.pseudo/masters
/etc/hadoop/conf.pseudo/hadoop-metrics.properties
/etc/hadoop/conf.pseudo/commons-logging.properties
/etc/hadoop/conf.pseudo/hadoop-site.xml
/usr
/usr/share
/usr/share/doc
/usr/share/doc/hadoop-conf-pseudo
/usr/share/doc/hadoop-conf-pseudo/copyright
/usr/share/doc/hadoop-conf-pseudo/changelog.Debian.gz
/usr/share/doc/hadoop-conf-pseudo/changelog.gz
/usr/share/lintian
/usr/share/lintian/overrides
/usr/share/lintian/overrides/hadoop-conf-pseudo

debian:~# ls -l /var/lib/hadoop/cache/hadoop/dfs/name
total 8
drwxr-xr-x 2 hadoop hadoop 4096 2009-06-24 02:58 current
drwxr-xr-x 2 hadoop hadoop 4096 2009-06-24 02:58 image

Start the Hadoop services:

debian:~# /etc/init.d/hadoop-namenode start
Starting Hadoop namenode daemon: starting namenode, logging to /var/log/hadoop/hadoop-hadoop-namenode-debian.out
hadoop-namenode.

debian:~# /etc/init.d/hadoop-datanode start
Starting Hadoop datanode daemon: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-debian.out
hadoop-datanode.
debian:~# /etc/init.d/hadoop-jobtracker start
Starting Hadoop jobtracker daemon: starting jobtracker, logging to /var/log/hadoop/hadoop-hadoop-jobtracker-debian.out

hadoop-jobtracker.
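
The three start commands above can also be wrapped in a small loop. This is a sketch under stated assumptions: the function name hadoop_services and the DRY_RUN variable are invented for illustration, and the only init scripts it touches are the three used above. Passing an action other than start (for example stop) assumes the init scripts accept it.

```shell
#!/bin/sh
# Run the same action on each Hadoop init script, in the order used above.
# hadoop_services and DRY_RUN are hypothetical names for this sketch.
# With DRY_RUN=1 the commands are only printed, not executed.
hadoop_services() {
    action="${1:-start}"
    for svc in namenode datanode jobtracker; do
        cmd="/etc/init.d/hadoop-$svc $action"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # Stop at the first init script that reports a failure.
            $cmd || return 1
        fi
    done
}

# Example (as root): hadoop_services start
```

Setting DRY_RUN=1 first is a cheap way to confirm which scripts would be called before running them as root.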

Check that the processes are running normally:

hadoop    7926     1  0 03:01 ?        00:00:12 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop    8007     1  1 03:02 ?        00:00:14 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop    8053     1  0 03:02 ?        00:00:13 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop    8108     1  0 03:02 ?        00:00:11 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dhadoop.log
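
A listing like the one above can be reduced to a single number for scripting. The sketch below assumes the process list comes from ps -ef, where the first column is the owning user; the function name count_hadoop_jvms is made up for illustration.

```shell
#!/bin/sh
# Count Java processes owned by the "hadoop" user in ps -ef style input.
# Reads the process listing on standard input and prints the count.
# (count_hadoop_jvms is a hypothetical helper name.)
count_hadoop_jvms() {
    awk '$1 == "hadoop" && /java/ { n++ } END { print n + 0 }'
}

# Example: ps -ef | count_hadoop_jvms
```

For the four lines shown above the function prints 4; a lower number than expected suggests a daemon did not start, and its log under /var/log/hadoop is the place to look.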

Installing hive and pig also takes just one command each; convenient and economical.

apt-get install hive

apt-get install pig

OK, let's autoremove 0.18.3 and try the latest 0.20.

debian:~# apt-get autoremove hadoop-conf-pseudo

debian:~# wget http://archive.cloudera.com/hadoop-summit-09/hadoop-20-debs/deb_lenny_i386/hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb

debian:~# dpkg -i hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb
Selecting previously deselected package hadoop-0.20.
(Reading database ... 103589 files and directories currently installed.)
Unpacking hadoop-0.20 (from hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb) ...
Setting up hadoop-0.20 (0.20.0-1cloudera0.5.0~lenny) ...
Processing triggers for man-db ...

More on 0.20 as it develops; stay tuned.

Date: 2015-03-12