Unexpected Failures
Let’s consider the following test case used in a first-year programming course that requires students to write C programs.
def test_mode():
"""Check if the student's program calculates the mode correctly."""
# Prepare arguments for subprocess
args = ["./mode", "1", "2", "2", "3", "3"]
# Run the process
result = subprocess.run(args, text=True, capture_output=True)
# Check for abnormal exit conditions
assert result.returncode == 0 and result.stdout != "", "Program exited abnormally"
# Verify the output
lines = result.stdout.splitlines()
# lines[0] has to contain the output
ans = list(map(int, lines[0].split()))
expected_mode = [2, 3]
assert ans == expected_mode, "Incorrect output"
# If all assertions pass
pass
The assessment creator has coded several novice-friendly messages to aid students in interpreting the feedback. Ideally, the test case should only fail at those assertions. However, this code can also fail at the line:
ans = list(map(int, lines[0].split()))
This failure can occur because the student’s program may output non-integer
characters, causing the map(int, ...)
to fail. The resulting feedback would
be:
ValueError: invalid literal for int() with base 10: 'mode'
This feedback is not novice-friendly and, even worse, it is unrelated to C programming.
The point I want to make here is that most of the time, we want the test case to fail only at assertions or at any place that is expected. If the test case fails outside of assertions, this is what I refer to as an unexpected failure.
Solution
The solution is actually quite simple: for any test case that contains non-assertion code, we code an outcome flipped version.
def test_a():
x = common_code()
assert x == 0
def test_a_flipped():
x = common_code()
assert x != 0
If one passes and the other fails, we know they both reached the assertion. This strategy helps ensure that the test cases fail only for the intended reasons.