Chapter 3. MRG Grid Benefits and Features
MRG Grid provides significant benefits and value for enterprises, including:
MRG Grid can process very large computational workloads, from massively parallel High Performance Computing jobs to long-running High Throughput Computing jobs.
MRG Grid can add computational power on demand to handle peak loads, both by cycle-stealing from Linux or Windows desktop computers and by scheduling jobs to remote grids.
MRG Grid provides complete flexibility: it can run anything from high-burst to lengthy computations, in a centralized or distributed grid, on various platforms including Linux and Windows. Furthermore, MRG Grid can schedule virtualized environments and workloads for the utmost flexibility in utilizing infrastructure.
Managing MRG Grid is simplified by leveraging the Red Hat Enterprise MRG unified, browser-based management console. The Red Hat Enterprise MRG integrated management tools enable administrators to manage, configure, provision, deploy, and monitor their grid deployments using the same tools they use for MRG Messaging and MRG Realtime.
MRG Grid provides a broad set of features across both High Throughput Computing and High Performance Computing, including:
Allows for submission of a virtual machine (VM) as a user job, supporting migration of the VM
Allows dedicated resources (clusters) to be augmented with otherwise undedicated resources (desktops) using flexible policies
A Web Service interface provides job submission and management functionality; the command-line interface (CLI) provides a highly scriptable interface, with consistent output, to all functionality
Authentication using multiple mechanisms
Privacy provided by network encryption
Integrity of network traffic
Authorization through flexible configuration policies
A mechanism known as flocking allows independent pools to use each other's resources, controlled by customizable policies
Powerful browser-based management tools for managing daemons and machines, security, compute jobs, scalability settings, priorities, and more. Also provides sophisticated monitoring capabilities.
The ability to specify job dependencies, via DAGMan, allows for construction and execution of complex workflows
The ability to schedule data placement, via Stork, assists in creating workflows that handle data intelligently
User and group resource utilization is tracked and accessible to administrators
A flexible language for policy and meta-data description
Flexible, customizable policies specified by jobs and resources via ClassAds
The Negotiator and Collector, via HAD, and the Schedd, via Schedd Fail-over, can have their state replicated to allow for graceful fail-over upon service disruption
Through a multi-protocol storage management system called NeST, a user's jobs can manage disk space (allocate, free, reserve, and so on)
All data about jobs and resources can be stored in a database via Quill
The ability for a node or set of nodes to be claimed by a user in such a way that others may use the claimed nodes until the user needs them
Through a technology known as Glide-ins, nodes can be dynamically added to a pool to service user jobs
Priority scheduling is performed at the granularity of a user
Fair-share scheduling can be performed on groups of users
Priority management is controllable by administrators
Allows for execution across administrative domains
Enhances security by using a restricted pool of users to run jobs on execute machines
Only a single, specialized, audited component requires root/administrator permissions on execute nodes
Provides an extensible framework for running parallel (including MPI) jobs
Co-allocation of compute nodes is done automatically
Framework implementations provided for MPICH1, MPICH2, and LAM
Explicit support of jobs written in Java
Allows a job or multiple jobs to be started at specific times, with customizable policy for failures such as missed deadlines
Allows otherwise unused nodes to run jobs provided by BOINC
Support for automatic file staging (for example, job input) and online file I/O (that is, file streaming from submit to execute nodes) via Chirp and remote system calls, in the absence of a shared filesystem
A C++ framework allowing a single master process to allocate and manage multiple worker processes, which process data based on master-specified policies
Allows for jobs in one queue to be moved to another queue
Allows for automated monitoring of one or more pools
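Several of the features above (ClassAds, flexible policies, augmenting clusters with desktops) rest on matchmaking: a job and a machine are paired only when each side's requirements accept the other, and acceptable candidates are then ordered by preference. The following Python sketch models that idea in a deliberately simplified, hypothetical form. The `Ad` class and the `matches` and `best_match` functions are illustrative assumptions, not part of MRG Grid. In the real ClassAd language, Requirements and Rank are expressions written in ClassAd syntax rather than Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical attribute map standing in for the attributes of a ClassAd.
Attrs = Dict[str, object]


@dataclass
class Ad:
    """Simplified, hypothetical stand-in for a ClassAd.

    requirements and rank are evaluated against the *other* party's attributes.
    """
    name: str
    attrs: Attrs
    requirements: Callable[[Attrs], bool] = lambda other: True
    rank: Callable[[Attrs], float] = lambda other: 0.0


def matches(job: Ad, machine: Ad) -> bool:
    """Two-way match: the job must accept the machine AND the machine the job."""
    return job.requirements(machine.attrs) and machine.requirements(job.attrs)


def best_match(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Among mutually compatible machines, return the one the job ranks highest."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job.rank(m.attrs))


if __name__ == "__main__":
    # Job policy: needs Linux with at least 1024 MB, prefers more memory.
    job = Ad("job1", {"Owner": "alice", "ImageSize": 512},
             requirements=lambda m: m["OpSys"] == "LINUX" and m["Memory"] >= 1024,
             rank=lambda m: m["Memory"])
    # A desktop the job rejects (wrong OS).
    desktop = Ad("desktop", {"OpSys": "WINDOWS", "Memory": 4096})
    # A cluster node that accepts small jobs.
    node_a = Ad("node_a", {"OpSys": "LINUX", "Memory": 2048},
                requirements=lambda j: j["ImageSize"] <= 1024)
    # A cluster node whose owner policy rejects this user's jobs.
    node_b = Ad("node_b", {"OpSys": "LINUX", "Memory": 8192},
                requirements=lambda j: j["Owner"] != "alice")
    print(best_match(job, [desktop, node_a, node_b]).name)  # prints node_a
```

Note that `node_b` has the most memory but is excluded because its own policy rejects the job; in this model, ranking only chooses among machines where both sides agree.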