GET HADOOP TECHNOLOGY STACK TO FIT IN YOUR ARCHITECTURE


Hadoop Core / Common Project

  • Distributed Storage : HDFS
  • Distributed Processing : MapReduce (MR1 in Hadoop v1)
  • Distributed Scheduling and Resource Management : YARN (introduced in Hadoop v2, where MapReduce runs on top of it as MR2)
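
To make the processing model concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps that MapReduce distributes across a cluster. It is plain Python and does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

In a real cluster the input lines would be blocks read from HDFS, and the map and reduce calls would run as separate tasks on different nodes; the sketch only shows the data flow between the phases.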

Accessing and Processing Data in Hadoop Without Writing MapReduce Jobs

  • PIG : http://pig.apache.org/
  • Hive : http://hive.apache.org/

NoSQL Databases for Storing and Serving Data in the Hadoop Ecosystem

  • HBase : http://hbase.apache.org/
  • Cassandra : http://cassandra.apache.org/

Storage Management Services

  • HCatalog : http://incubator.apache.org/projects/hcatalog.html

Full-Text Indexing and Search Library

  • Lucene : http://lucene.apache.org/

Bulk Synchronous Parallel computing engine

  • Hama : http://hama.apache.org/

Writing and Managing MapReduce Pipelines

  • Crunch : http://crunch.apache.org/

Data Serialization for Exchanging Data Between Applications (schema-based, with compact binary or JSON encodings)

  • Avro : http://avro.apache.org/
  • Thrift : http://thrift.apache.org/
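
As a rough illustration of why schema-based serialization is attractive, the sketch below packs a record into a fixed binary layout and compares its size with the JSON text for the same record. This is not the wire format of Avro or Thrift; the layout, the field names, and the `encode`/`decode` helpers are assumptions made up for this example, using only Python's standard `struct` and `json` modules.

```python
import json
import struct

# Hypothetical "schema": unsigned 32-bit id, 64-bit float score,
# then a 16-bit length prefix followed by the UTF-8 bytes of the name.
HEADER_FORMAT = "<IdH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def encode(record):
    """Pack a record into bytes; field names are not stored, only values."""
    name_bytes = record["name"].encode("utf-8")
    header = struct.pack(HEADER_FORMAT, record["id"], record["score"], len(name_bytes))
    return header + name_bytes


def decode(payload):
    """Rebuild the record; the reader must know the same layout as the writer."""
    rec_id, score, name_len = struct.unpack(HEADER_FORMAT, payload[:HEADER_SIZE])
    name = payload[HEADER_SIZE:HEADER_SIZE + name_len].decode("utf-8")
    return {"id": rec_id, "score": score, "name": name}


if __name__ == "__main__":
    record = {"id": 7, "score": 0.5, "name": "hdfs"}
    binary = encode(record)
    text = json.dumps(record).encode("utf-8")
    print(len(binary), "bytes as binary,", len(text), "bytes as JSON")
    print(decode(binary) == record)
```

Because both sides agree on the layout in advance, the binary payload carries no field names, which is the main reason it is smaller than the JSON form.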

Data Intelligence: Interactive SQL Querying (Drill) and Machine Learning (Mahout)

  • Drill : https://incubator.apache.org/drill/drill_overview.html
  • Mahout : http://mahout.apache.org/

Log Collection and Aggregation Tools

  • Flume : http://flume.apache.org/
  • Chukwa : http://chukwa.apache.org/

Data Integration: Transferring Bulk Data Between an RDBMS and HDFS

  • Sqoop : http://sqoop.apache.org/

Distributed Service Coordinator

  • Zookeeper : http://zookeeper.apache.org/

Workflow and Job Scheduler

  • Oozie : http://oozie.apache.org/
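
A workflow scheduler runs jobs in dependency order, and Oozie models a workflow as a directed acyclic graph of actions. The sketch below shows only that ordering idea, using Python's standard `graphlib` module (Python 3.9+). It is not Oozie's API or its XML workflow format, and the action names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "import_from_rdbms": set(),
    "clean_records": {"import_from_rdbms"},
    "aggregate": {"clean_records"},
    "build_report": {"aggregate"},
    "export_results": {"aggregate"},
}


def execution_order(dependencies):
    """Return one valid order in which every action runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, action in enumerate(execution_order(workflow), start=1):
        print(step, action)
```

Actions with no dependency between them, such as `build_report` and `export_results` here, may appear in either order; a real scheduler could run them in parallel.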

Centralized Service Management, Monitoring, and Orchestration

  • Ambari : http://ambari.apache.org/

Centralized Security Gateway for Hadoop Clusters

  • Knox : http://knox.apache.org/

Eclipse IDE plugin for Development

  • HDT : http://hdt.incubator.apache.org/

In-Memory Processing Engine Reported to Run Some Workloads up to 100x Faster than MapReduce

  • Spark : http://spark.apache.org/

For the list of all Apache Incubator projects: http://incubator.apache.org/projects/
