We are excited to announce the contribution of our Apache™ Hadoop® plug-in to the Gluster Community, the open software-defined storage community. Gluster users can now deploy the Apache Hadoop Plug-in from the Gluster Community and run MapReduce jobs on GlusterFS volumes, making the data easily available to other toolkits and programs. Conversely, data stored on general-purpose filesystems is now available to Apache Hadoop operations without brute-force copying of the data into the Hadoop Distributed File System (HDFS).
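As a sketch of what this looks like in practice, the driver below submits a stock MapReduce job whose input and output live on a GlusterFS volume. It uses only the standard Hadoop MapReduce API; the glusterfs:// URI scheme, the fs.glusterfs.impl property, the org.apache.hadoop.fs.glusterfs.GlusterFileSystem class name, and the paths are assumptions based on the plug-in's conventions and should be checked against the release you deploy (cluster-wide deployments would normally set the property in core-site.xml instead).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GlusterIdentityJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed mapping of the glusterfs:// scheme to the plug-in's
        // FileSystem class; normally configured once in core-site.xml.
        conf.set("fs.glusterfs.impl",
                 "org.apache.hadoop.fs.glusterfs.GlusterFileSystem");

        Job job = Job.getInstance(conf, "identity-on-gluster");
        job.setJarByClass(GlusterIdentityJob.class);
        // No mapper or reducer set: Hadoop's identity defaults copy the
        // input through the full MapReduce path (map, shuffle, sort, reduce).
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Hypothetical paths on the GlusterFS volume.
        FileInputFormat.addInputPath(job, new Path("glusterfs:///user/alice/input"));
        FileOutputFormat.setOutputPath(job, new Path("glusterfs:///user/alice/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```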

The Apache Hadoop Plug-in provides a new storage option for enterprise Hadoop deployments and delivers enterprise storage features while maintaining 100 percent Hadoop FileSystem API compatibility. It also brings significant disaster recovery benefits, industry-leading data availability, and NameNode high availability, with the ability to store data in POSIX-compliant, general-purpose filesystems.
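To illustrate what 100 percent FileSystem API compatibility means, the sketch below writes and reads a file through the generic org.apache.hadoop.fs.FileSystem interface. The same code runs against HDFS or, assuming the plug-in registers the glusterfs:// scheme, against a GlusterFS volume; only the base URI changes.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileSystemRoundTrip {
    // The URI scheme selects the FileSystem implementation; the application
    // code itself is identical for HDFS and GlusterFS.
    static void roundTrip(String baseUri) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(baseUri), conf);
        Path file = new Path("/tmp/roundtrip.txt");

        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("hello from the FileSystem API");
        }
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
    }

    public static void main(String[] args) throws Exception {
        roundTrip("glusterfs:///");          // GlusterFS via the plug-in (assumed scheme)
        // roundTrip("hdfs://namenode:8020"); // the same code against HDFS
    }
}
```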

The advantages of the Hadoop Plug-in in the Gluster Community include:

  • supporting data access through several mechanisms and protocols: file access with NFS or SMB, object access with Swift, and access via the Hadoop FileSystem API (see the sketch after this list);
  • eliminating the centralized metadata server (the NameNode);
  • maintaining compatibility with MapReduce and Hadoop-based applications;
  • eliminating code rewrites; and
  • providing a fault-tolerant file system.
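
To make the multi-protocol point in the first item concrete: in the sketch below, a file is written through the ordinary POSIX interface (here, a hypothetical FUSE mount of the volume at /mnt/glusterfs) and then read back through the Hadoop FileSystem API. The mount point, file name, and the assumption that the mount root maps to the volume root are all illustrative; no copy into HDFS happens at any point, since both interfaces see the same stored data.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MultiProtocolAccess {
    public static void main(String[] args) throws Exception {
        // 1. Write through POSIX: any tool that can write to the mounted
        //    volume (NFS, SMB, FUSE) could have produced this file.
        Files.write(Paths.get("/mnt/glusterfs/events.log"),
                    "event-1\nevent-2\n".getBytes(StandardCharsets.UTF_8));

        // 2. Read the same bytes back through the Hadoop FileSystem API,
        //    assuming the glusterfs:// scheme is mapped to the plug-in and
        //    the volume root corresponds to the mount root.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("glusterfs:///"), conf);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/events.log")),
                                      StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```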

To download the Apache Hadoop Plug-in, users can go to https://forge.gluster.org/hadoop/. For the Apache Ambari project, users can visit the Apache Hadoop community at http://hadoop.apache.org/.