What Is Apache Hadoop?
The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
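The "simple programming model" referred to here is MapReduce: a job supplies a map function that turns each input record into key/value pairs, and a reduce function that aggregates all values sharing a key; the framework handles distribution, shuffling, and failure recovery in between. A minimal single-process Python sketch of the model (a conceptual illustration, not Hadoop's actual Java API):

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply the user's map function to every input record."""
    for record in records:
        yield from map_fn(record)

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    """Apply the user's reduce function to each key's grouped values."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# The classic word-count job expressed in the model.
def word_map(line):
    for word in line.split():
        yield word, 1

def word_reduce(word, counts):
    return sum(counts)

lines = ["hadoop scales out", "hadoop handles failures"]
result = reduce_phase(shuffle(map_phase(lines, word_map)), word_reduce)
print(result)
```

In a real cluster each phase runs in parallel across many machines, and a failed map or reduce task is simply re-executed elsewhere; that re-execution is the application-layer failure handling the paragraph above describes.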
The project includes these subprojects:
- Hadoop Common: The common utilities that support the other Hadoop subprojects.
- HDFS: A distributed file system that provides high-throughput access to application data.
- Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.
Other Hadoop-related projects at Apache include:
- Avro: A data serialization system.
- Cassandra: A scalable multi-master database with no single points of failure.
- Chukwa: A data collection system for managing large distributed systems.
- HBase: A scalable, distributed database that supports structured data storage for large tables.
- Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout: A scalable machine learning and data mining library.
- Pig: A high-level data-flow language and execution framework for parallel computation.
- ZooKeeper: A high-performance coordination service for distributed applications.
1. Quick start: setting up a Hadoop environment.
- Supported platform: GNU/Linux
- Required software: JDK 1.6, ssh, and sshd
- Download and install a stable Hadoop release
- Hadoop can start in one of three modes: standalone, pseudo-distributed, or fully distributed. Standalone mode is convenient for debugging.
2. Browse the examples code and run a sample.
- Run Eclipse through VNC Viewer and create a hadoop-examples project; the sources are under {HADOOP_HOME}/src/examples and the libraries under {HADOOP_HOME}/lib.
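One of the shipped examples is Grep, which counts every substring in the input that matches a regular expression. Its core logic can be sketched in single-process Python under the same map/reduce model (a hedged illustration of what the example computes, not its actual Java source):

```python
import re
from collections import Counter

def grep_map(line, pattern):
    """Map step: emit (matched-string, 1) for every regex match in the line."""
    for match in re.finditer(pattern, line):
        yield match.group(0), 1

def grep_reduce(pairs):
    """Reduce step: sum the counts for each distinct matched string."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Sample input resembling Hadoop configuration property names.
lines = ["dfs.replication is set", "dfs.name.dir and dfs.data.dir"]
pairs = [pair for line in lines for pair in grep_map(line, r"dfs[a-z.]+")]
print(grep_reduce(pairs))
```

On an actual installation the equivalent job is typically launched against the bundled examples jar in standalone mode, with the regex passed on the command line; the sketch above only mirrors the map and reduce functions it runs.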