[Avocado-devel] [RFC] Improve job status



Hi folks,

This is the RFC for the rework of the Avocado job exit status. Some discussion has already happened on GitHub, but we should still document the decisions and open the discussion to a broader audience as well.

Motivation
==========

Currently, the job expects the runner to return a list of the tests that failed, and uses it to determine the exit code Avocado will finish with. If the list is empty, the exit code is 0; otherwise, it is 1. This implementation is very limited, given the number of possible test ending statuses and exit codes. The goal of this RFC is to define the internal API between the job and the runner, the relationship between the test statuses and the Avocado exit codes, and the meaning of each exit code.

Use cases/current issues:

- When all tests end with 'PASS', the Avocado exit code is 0, which means "AVOCADO_ALL_OK".
- When some or all tests end with 'FAIL', the Avocado exit code is 1, which is defined as "AVOCADO_TESTS_FAIL".
- When the job is interrupted with CTRL+C, the current test is INTERRUPTED and the Avocado exit code is "AVOCADO_TESTS_FAIL".
- When the job hits the timeout before finishing the tests, there are 2 possible results:
-- Timeout during a test: the test is interrupted, the user sees the status ERROR for it (this status is buggy and is being fixed, but that is not part of this RFC), the next tests are skipped, and the Avocado exit code is "AVOCADO_TESTS_FAIL".
-- Timeout between tests: the next tests are skipped and the Avocado exit code is "AVOCADO_ALL_OK".


Internals
=========

Currently, we have a dictionary with each status as a key and True or False as its value:

mapping = {"SKIP": True, "ABORT": False, "ERROR": False, "FAIL": False,
           "WARN": True, "PASS": True, "START": True, "ALERT": False,
           "RUNNING": False, "NOSTATUS": False, "INTERRUPTED": False}

That dictionary tells the runner whether a status is good or bad:

...
if not status.mapping[test_state['status']]:
    failures.append(test_state['name'])
...
return failures
...

Based on that return value, the job chooses between 0 and 1 as the exit code:

...
tests_status = not bool(failures)
if tests_status:
    return exit_codes.AVOCADO_ALL_OK
else:
    return exit_codes.AVOCADO_TESTS_FAIL
...
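The excerpts above can be flattened into a small runnable sketch of the current behavior. The function names run_tests and job_exit_code and the shape of the test_states list (dicts with 'name' and 'status') are hypothetical; only the mapping and the exit code values come from this RFC. It reproduces the "timeout between tests" use case, where skipped tests still yield AVOCADO_ALL_OK:

```python
# Status mapping as quoted in this RFC: True means "good", False means "bad".
MAPPING = {"SKIP": True, "ABORT": False, "ERROR": False, "FAIL": False,
           "WARN": True, "PASS": True, "START": True, "ALERT": False,
           "RUNNING": False, "NOSTATUS": False, "INTERRUPTED": False}

AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1


def run_tests(test_states):
    """Hypothetical runner: collect names of tests with a 'bad' status."""
    failures = []
    for test_state in test_states:
        if not MAPPING[test_state["status"]]:
            failures.append(test_state["name"])
    return failures


def job_exit_code(failures):
    """Current job rule: any failure gives 1, otherwise 0."""
    if not failures:
        return AVOCADO_ALL_OK
    return AVOCADO_TESTS_FAIL


# Timeout between tests: the remaining tests are skipped, SKIP maps to
# True, so nothing is recorded and the job reports success.
states = [{"name": "test1", "status": "PASS"},
          {"name": "test2", "status": "SKIP"}]
print(job_exit_code(run_tests(states)))  # 0
```

Because a skip caused by the timeout is indistinguishable from a regular skip in this scheme, the job has no way to report that it did not run everything.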

The currently available exit codes are:

AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_FAIL = 3
AVOCADO_JOB_INTERRUPTED = 4


Recommended Solution
====================

The runner should be able to provide more accurate information to the job, better representing what actually happened to the tests. After some discussion on GitHub, we are currently proposing the minimum information the runner needs to report so the job can choose the most appropriate exit code:

On the runner:
- Instead of a list called 'failures', the proposal is to have a set called 'summary'.
- If the job hits the timeout, whether the test is reported as INTERRUPTED or SKIP, we add the string 'INTERRUPTED' to the 'summary'.
- If the test finishes with a bad status ('False' in the mapping), we add the string 'FAIL' to the 'summary'.
- If the test finishes with a good status, we don't add anything to the 'summary'.
- If the runner crashes for any reason, 'summary' will not be returned and the job should handle that.
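The runner rules above could look roughly like the following sketch. It is not Avocado's actual runner code: the run_tests name and the per-test 'hit_timeout' flag (set when the job timeout caused the test to end as INTERRUPTED or SKIP) are hypothetical stand-ins for however the runner tracks the timeout.

```python
# Status mapping as quoted in this RFC.
MAPPING = {"SKIP": True, "ABORT": False, "ERROR": False, "FAIL": False,
           "WARN": True, "PASS": True, "START": True, "ALERT": False,
           "RUNNING": False, "NOSTATUS": False, "INTERRUPTED": False}


def run_tests(test_states):
    """Build the proposed 'summary' set from the final test states.

    Each state is a dict with a 'status' key and an optional,
    hypothetical boolean 'hit_timeout' key.
    """
    summary = set()
    for state in test_states:
        status = state["status"]
        if state.get("hit_timeout") and status in ("INTERRUPTED", "SKIP"):
            # The job timeout cut this test short or prevented it from running.
            summary.add("INTERRUPTED")
        elif not MAPPING[status]:
            # Bad status according to the mapping.
            summary.add("FAIL")
        # Good statuses add nothing.
    return summary


states = [
    {"name": "test1", "status": "PASS"},
    {"name": "test2", "status": "FAIL"},
    {"name": "test3", "status": "INTERRUPTED", "hit_timeout": True},
    {"name": "test4", "status": "SKIP", "hit_timeout": True},
]
print(sorted(run_tests(states)))  # ['FAIL', 'INTERRUPTED']
```

Using a set means the summary stays small no matter how many tests fail; the job only needs to know which kinds of problems occurred, not how many.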

On the job:
- Receive the 'summary' and check it:
-- If the string "INTERRUPTED" is there, exit with "AVOCADO_JOB_INTERRUPTED", regardless of whether any test failed.
-- If "INTERRUPTED" is not in 'summary' but there is still something there, exit with "AVOCADO_TESTS_FAIL".
-- An empty 'summary' means the job should exit with "AVOCADO_ALL_OK".
-- A 'summary' of 'None' (nothing returned) means the runner crashed and the job should exit with "AVOCADO_JOB_FAIL".
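The job-side decision can be sketched as a single function. The job_exit_code name is hypothetical, and the values are the current exit codes listed earlier in this RFC (before the ORable change), so AVOCADO_JOB_INTERRUPTED is 4 here:

```python
# Current exit code values, as listed in this RFC.
AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_JOB_INTERRUPTED = 4


def job_exit_code(summary):
    """Map the runner's 'summary' (a set, or None) to an exit code."""
    if summary is None:
        # Runner crashed and returned nothing.
        return AVOCADO_JOB_FAIL
    if "INTERRUPTED" in summary:
        # Interruption wins, even if some tests also failed.
        return AVOCADO_JOB_INTERRUPTED
    if summary:
        # Something bad happened, but the job was not interrupted.
        return AVOCADO_TESTS_FAIL
    return AVOCADO_ALL_OK


print(job_exit_code({"FAIL", "INTERRUPTED"}))  # 4
```

The order of the checks matters: the None case has to be handled first, since `"INTERRUPTED" in None` raises TypeError in Python.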


Additional Improvements
=======================

There is a request for the exit codes to be ORable. To support that, we have to change the current codes to values that each set exactly one bit when converted to binary (powers of two), with AVOCADO_ALL_OK remaining 0:

AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_FAIL = 4
AVOCADO_JOB_INTERRUPTED = 8
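With one bit per code, a caller can recover every condition from a combined exit code using bitwise AND. The sketch below uses the proposed values; the FLAGS table and the is_single_bit and decode helpers are hypothetical illustrations, not part of Avocado:

```python
# Proposed ORable exit code values.
AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_FAIL = 2
AVOCADO_FAIL = 4
AVOCADO_JOB_INTERRUPTED = 8

FLAGS = {
    "AVOCADO_TESTS_FAIL": AVOCADO_TESTS_FAIL,
    "AVOCADO_JOB_FAIL": AVOCADO_JOB_FAIL,
    "AVOCADO_FAIL": AVOCADO_FAIL,
    "AVOCADO_JOB_INTERRUPTED": AVOCADO_JOB_INTERRUPTED,
}


def is_single_bit(code):
    """True if code is a power of two, i.e. exactly one bit is set."""
    return code > 0 and code & (code - 1) == 0


def decode(exit_code):
    """List the names of all conditions encoded in exit_code."""
    if exit_code == AVOCADO_ALL_OK:
        return ["AVOCADO_ALL_OK"]
    return [name for name, bit in FLAGS.items() if exit_code & bit]


assert all(is_single_bit(bit) for bit in FLAGS.values())
print(decode(3))  # ['AVOCADO_TESTS_FAIL', 'AVOCADO_JOB_FAIL']
```

Keeping AVOCADO_ALL_OK at 0 preserves the usual convention that a zero exit status means success, and ORing 0 into any code leaves it unchanged.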

That way, the job exit code becomes a combination of codes that gives more information about what happened to the group of tests. Example:

Test1: PASS
Test2: FAIL
Test3: INTERRUPTED
Test4: SKIP

In the example above, the FAILed test makes the job use the AVOCADO_TESTS_FAIL code, and the INTERRUPTED test makes it use AVOCADO_JOB_INTERRUPTED. PASS and SKIP are considered good statuses, so the final job exit code would be 9 (AVOCADO_ALL_OK | AVOCADO_TESTS_FAIL | AVOCADO_JOB_INTERRUPTED, that is, 0 | 1 | 8).
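The arithmetic in this example can be checked with a short sketch that ORs together one code per test. The STATUS_TO_CODE table covers only the four statuses from the example and is a hypothetical illustration of how each status could contribute to the exit code:

```python
AVOCADO_ALL_OK = 0
AVOCADO_TESTS_FAIL = 1
AVOCADO_JOB_INTERRUPTED = 8

# Hypothetical contribution of each status, following the example above.
STATUS_TO_CODE = {
    "PASS": AVOCADO_ALL_OK,
    "SKIP": AVOCADO_ALL_OK,
    "FAIL": AVOCADO_TESTS_FAIL,
    "INTERRUPTED": AVOCADO_JOB_INTERRUPTED,
}


def combined_exit_code(statuses):
    """OR together the code contributed by each test status."""
    code = AVOCADO_ALL_OK
    for status in statuses:
        code |= STATUS_TO_CODE[status]
    return code


print(combined_exit_code(["PASS", "FAIL", "INTERRUPTED", "SKIP"]))  # 9
```

Since OR is idempotent, several FAILed tests still contribute the AVOCADO_TESTS_FAIL bit only once.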

This request is fairly well designed, but there is still room for discussion before it lands upstream.

Thanks,
--
apahim
