
Chapter 13. Low-latency scheduling

Low-latency scheduling allows jobs to begin execution immediately, without going through the standard scheduling process. This shortens the time before a job can begin execution, but because the scheduling process is bypassed, it increases the possibility that a job will not be able to execute on the first node that tries to run it. This type of scheduling is performed using the MRG Messaging component of Red Hat Enterprise MRG, instead of the Condor daemons. The machines in the pool capable of executing jobs (the execute nodes) communicate directly with a MRG Messaging broker. The advantage of this is that any machine capable of sending messages to the broker can submit jobs to the pool.

Note

For more information on MRG Messaging, the broker, and the AMQP protocol, see the MRG Messaging User Guide.
Installing the condor-low-latency packages
  1. Important

    You will require the MRG Messaging broker from the Red Hat Network in order to use low-latency scheduling. For instructions on downloading and configuring the MRG Messaging packages, see the MRG Messaging Installation Guide.
    You will require the following packages, in addition to the MRG Messaging components:
    • condor-low-latency
    • condor-job-hooks
    • condor-job-hooks-common
    Use yum to install these components:
    # yum install condor-low-latency
    
    # yum install condor-job-hooks
    
    # yum install condor-job-hooks-common
    
  2. Configure MRG Grid to use the new job hooks by opening the condor_config file in your preferred text editor and adding the following lines:
    # Startd hooks
    LOW_LATENCY_HOOK_FETCH_WORK = $(LIBEXEC)/hooks/hook_fetch_work.py
    LOW_LATENCY_HOOK_REPLY_FETCH = $(LIBEXEC)/hooks/hook_reply_fetch.py
    
    # Starter hooks
    LOW_LATENCY_JOB_HOOK_PREPARE_JOB = $(LIBEXEC)/hooks/hook_prepare_job.py
    LOW_LATENCY_JOB_HOOK_UPDATE_JOB_INFO = $(LIBEXEC)/hooks/hook_update_job_status.py
    LOW_LATENCY_JOB_HOOK_JOB_EXIT = $(LIBEXEC)/hooks/hook_job_exit.py
    
    STARTD_JOB_HOOK_KEYWORD = LOW_LATENCY
    
  3. Set the FetchWorkDelay setting. This setting controls how often, in seconds, the condor-low-latency feature looks for jobs to execute. The expression below evaluates to 10 while the Activity is Idle and to 0 otherwise:
    FetchWorkDelay = 10 * (Activity == "Idle")
    STARTER_UPDATE_INTERVAL = 30
    
  4. The daemon that controls the communication between MRG Messaging and MRG Grid is called the caro daemon. It can be configured by editing the file located at /etc/opt/grid/carod.conf. This file controls the active broker and other options such as the exchange name, message queue, and IP information.
    The Condor job hooks are configured by editing the file located at /etc/opt/grid/job-hooks.conf. This file specifies the port and IP information that the job hooks can use to contact the caro daemon. The IP and port information in this file must match the information used in the carod configuration file.
  5. When all the components are configured, start the MRG Messaging broker.
    # service qpidd start
    Starting qpidd daemon:                   [  OK  ]
    
  6. Start the Condor low latency daemon as a service:
    # service condor-low-latency start
    Starting condor-low-latency service:     [  OK  ]
    
  7. Submitting a job using condor-low-latency scheduling is similar to submitting a regular Condor job. The main difference is that, instead of using a file for submission, the job's attributes are defined in the application headers field of a MRG Messaging message. There are, however, some differences between the job description fields. To ensure the fields are correct, a normal Condor job submission file can be translated into the appropriate fields for the application headers by using the condor_submit command with the -dump option:
    $ condor_submit myjob.submit -dump output_file
    
    This command produces a file named output_file. This file contains the information from myjob.submit in a format suitable for placing directly into the application headers of a message. This method only works when queuing a single message at a time.

    Important

    The myjob.submit file should have only one queue command, with no arguments. For example:
    executable = /bin/echo
    arguments = "Hello there!"
    queue
    
  8. When submitting jobs in messages using this method, each message can carry only one job. To submit multiple jobs of the same type, send multiple messages to the broker, each containing one job.
    Any messages submitted this way must have a reply-to field set, or the jobs will not run. They must also include a unique message ID.
    If data needs to be submitted with the job, it will need to be compressed and the archive placed in the body of the message. Similarly, results of the job will be placed in the body of the message to signify completion.
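
The requirements above (one job per message, a reply-to field, a unique message ID, and compressed data in the body) can be sketched in Python. This is only an illustration of how the parts of a message fit together, not the submission API: it uses the standard library alone and returns a plain dictionary instead of sending anything to the broker. The function names, the dictionary keys, the choice of zip as the archive format, and the assumption that the dumped file consists of "Name = value" lines are all hypothetical; check the real output of condor_submit -dump and the MRG Messaging User Guide before relying on any of them.

```python
import io
import uuid
import zipfile


def parse_dump(text):
    """Read 'Name = value' lines (assumed dump format) into a dict of strings."""
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values such as
        # (Arch == "X86_64") are kept intact.
        name, _, value = line.partition("=")
        headers[name.strip()] = value.strip()
    return headers


def zip_payload(files):
    """Compress {archive_name: bytes} into one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def build_job_message(dump_text, reply_to, files=None):
    """Collect the parts of a single-job message (hypothetical layout)."""
    if not reply_to:
        # The text above states that jobs without a reply-to will not run.
        raise ValueError("reply_to must be set")
    return {
        "message_id": str(uuid.uuid4()),  # unique for every message
        "reply_to": reply_to,
        "application_headers": parse_dump(dump_text),
        "body": zip_payload(files) if files else b"",
    }
```

To submit several jobs of the same type, call build_job_message once per job so that each message receives its own ID, then hand each result to whichever messaging client is in use.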