Unit Testing Framework

A common misconception about automated assessments is that they are trivial to create. In my experience, creating one is in fact a difficult task.

The first time I created an automated assessment to grade students’ programs, I wrote almost everything from scratch, thinking it would take only a few lines of code to compare the output of each student’s program against the expected output. I soon realized that comparing outputs was the small part. I also had to write a lot of code to iterate through the students’ programs, clean up intermediate results and side effects, and collect grades and feedback. All of this distracted me from the assessment logic itself.

Furthermore, I realized that the assessment code should be organized flexibly. Suppose an assignment contained two questions in a previous course offering, and you wrote assessment code covering both. If you later split the assignment into two smaller assignments with one question each, you would want to be able to reorganize the assessment code with little effort.

I’ll demonstrate how I managed to use the unit testing framework pytest to handle these problems.

pytest

I’m sure there are more things to discuss as the complexity of the assessment code increases, but here are several essential concepts about pytest that allowed me to write my assessment code efficiently. They are test discovery, fixture, fixture scope, teardown/cleanup, and overriding fixtures.

Test Discovery (see its pytest document)

In our case, we just need to remember that pytest implements standard Python test discovery. That means it searches for test_*.py or *_test.py files. In those files, it collects functions prefixed with test that are defined outside of classes, and methods prefixed with test that are defined inside classes prefixed with Test (provided the class has no __init__ method).
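The naming rules above can be approximated in a few lines of plain Python. This is only a rough sketch of the default conventions, not pytest's actual implementation, and the helper names (looks_like_test_file and so on) are hypothetical.

```python
from fnmatch import fnmatchcase

# Default file-name patterns described above.
FILE_PATTERNS = ("test_*.py", "*_test.py")


def looks_like_test_file(filename: str) -> bool:
    """Return True if the file name matches one of the default patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in FILE_PATTERNS)


def looks_like_test_function(name: str) -> bool:
    """Return True if a function or method name carries the test prefix."""
    return name.startswith("test")


def looks_like_test_class(name: str, has_init: bool = False) -> bool:
    """Return True for Test-prefixed classes that define no __init__ method."""
    return name.startswith("Test") and not has_init
```

Under these assumptions, looks_like_test_file("a.py") is False while looks_like_test_file("test_a.py") is True, and looks_like_test_class("A") is False while looks_like_test_class("TestA") is True.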

For example, consider the following folder structure and code in the files:

tests
├── a.py
└── test_a.py

# a.py
def test_a():
    assert True

# test_a.py
def test_a():
    assert True

class A:
    def test_a(self):
        assert True

class TestA:
    def test_a(self):
        assert True

Then the results of pytest -v tests will be as follows, since a.py and class A are skipped:

tests/test_a.py::test_a PASSED
tests/test_a.py::TestA::test_a PASSED

Notice that the methods only reference self in the signature as a formality. No state is tied to the actual test class. This is a difference from some other unit testing frameworks.
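One way to see that no state is shared through the class: pytest creates a new instance of the test class for each test method, so an attribute set on self in one test is not visible in the next. The sketch below uses a hypothetical class and counter to illustrate this; both tests pass regardless of execution order.

```python
class TestFreshInstance:
    attempts = 0  # class attribute (hypothetical counter)

    def test_first(self):
        # This creates an instance attribute; the class attribute stays 0.
        self.attempts += 1
        assert self.attempts == 1

    def test_second(self):
        # Runs on a different instance, so the counter starts from 0 again.
        self.attempts += 1
        assert self.attempts == 1
```

If you do need to share something between tests, a fixture with a wider scope is the usual tool rather than instance attributes.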

Fixture (see its pytest document)

In testing, a fixture provides a defined, reliable, and consistent context for the tests. This could include an environment (for example, a database configured with known parameters) or content (such as a dataset).

pytest recognizes a function as a fixture function if it is decorated with @pytest.fixture; the object it returns is the fixture. A test can use more than one fixture, and fixtures can use (or depend on) other fixtures. If a fixture function raises an exception, pytest stops executing the remaining fixture functions for that test and marks the test as an error, indicating that the test could not be attempted.

A fixture often returns something which can be later used in test functions. In the following example, the my_obj argument in the test_obj function is the fixture returned by the my_obj fixture function.

import pytest

@pytest.fixture
def my_obj():
    return "Assume this is an object"

def test_obj(my_obj):
    assert my_obj == "Assume this is an object"

pytest has lots of useful built-in fixtures (see its list). Here are some of them which I think are very useful:

  • capsys/capfd: they give you access to the output captured during a test without having to set or reset the output streams yourself.
  • tmp_path: it provides a temporary directory unique to each test function.
  • request: it provides information about the requesting test function.
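Here is a minimal sketch of how these three built-in fixtures might be used. The greet function and the file name feedback.txt are made up for illustration; no import is needed because pytest supplies built-in fixtures by name.

```python
def greet(name):
    """Hypothetical stand-in for a student's function that prints output."""
    print(f"Hello, {name}!")


def test_greet_output(capsys):
    greet("Ada")
    captured = capsys.readouterr()  # captured text since the last call
    assert captured.out == "Hello, Ada!\n"


def test_write_feedback(tmp_path):
    # tmp_path is a pathlib.Path pointing to a fresh directory for this test.
    feedback_file = tmp_path / "feedback.txt"
    feedback_file.write_text("well done")
    assert feedback_file.read_text() == "well done"


def test_knows_its_own_name(request):
    # request.node is the test item that is currently running.
    assert request.node.name == "test_knows_its_own_name"
```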

Fixture Scope (see its pytest document)

By default, fixtures have the scope of function, which means they are destroyed (i.e., the cached objects are destroyed if any) at the end of the test. However, we can use other scopes so that a fixture function is invoked only once for multiple tests requiring it.

Fixtures are created when first requested by a test and are destroyed based on their scope:

  • function: the default scope, the fixture is destroyed at the end of the test.
  • class: the fixture is destroyed during teardown of the last test in the class.
  • module: the fixture is destroyed during teardown of the last test in the module.
  • package: the fixture is destroyed during teardown of the last test in the package where the fixture is defined, including sub-packages and sub-directories within it.
  • session: the fixture is destroyed at the end of the test session.

I would also recommend going through the pytest documentation about fixture availability.

Teardown/Cleanup (see its pytest document)

If a test needs preparation that is done by one or more fixtures, we also want matching cleanup, so that preparation needed only by that particular test does not interfere with any other test.

pytest provides a very simple mechanism to achieve this, which is to use yield instead of return inside fixture functions. Any code placed after yield is considered teardown code. For example:

@pytest.fixture
def my_fixture():
    my_obj = setup_code()
    yield my_obj
    teardown_code()

If setup_code throws an exception, pytest will not try to run the teardown_code. Otherwise, pytest will always attempt to execute teardown_code.
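As a more concrete sketch of the same pattern, the fixture below changes the working directory for a test and restores it afterwards. The fixture name workdir is hypothetical, and this is only one possible way to isolate a student's program from the grader's own directory.

```python
import os
from pathlib import Path

import pytest


@pytest.fixture
def workdir(tmp_path):
    previous = os.getcwd()
    os.chdir(tmp_path)   # setup: run the test inside a fresh directory
    yield tmp_path
    os.chdir(previous)   # teardown: restore the original directory


def test_runs_inside_workdir(workdir):
    assert Path.cwd().resolve() == workdir.resolve()
```

Because the teardown code comes after yield, the original directory is restored whether the test passes or fails.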

This is enough to understand how pytest handles teardown for our purposes, but I recommend also reading the pytest documentation on safe teardowns.

Overriding Fixture (see its pytest document)

Fixture functions can be defined in the same .py file as the tests that require them. However, a fixture defined in one test file is not available to tests in other files, which causes problems when tests need fixtures defined elsewhere. For example:

graph TD;
    subgraph test_b.py
        fixture_b
        test_b
    end

    subgraph test_a.py
        test_a
        fixture_a
    end

    test_a-->|uses|fixture_b;
    test_b-->|uses|fixture_a;

Therefore, pytest allows us to put fixture functions into a file called conftest.py so that tests from multiple test modules in the directory can access those fixture functions.

graph TD;
    subgraph test_b.py
        test_b
    end

    subgraph test_a.py
        test_a
    end

    subgraph conftest.py
        fixture_a
        fixture_b
    end

    test_a-->|uses|fixture_b;
    test_b-->|uses|fixture_a;

Please do not confuse this with fixture scope. In the following example, the module-scoped fixture function is invoked once per test module, so test_a and test_b receive separately generated values (which can still coincide by chance), whereas the session-scoped fixture function is invoked once and both tests receive the same value.

# conftest.py
import random

import pytest

@pytest.fixture(scope="module")
def module_val():
    """Generate a random number."""
    return random.randint(0, 9)

@pytest.fixture(scope="session")
def session_val():
    """Generate a random number."""
    return random.randint(0, 9)

# test_a.py
def test_a(module_val, session_val):
    # intentionally fail so we can see the values
    assert 0, (module_val, session_val)

# test_b.py
def test_b(module_val, session_val):
    # intentionally fail so we can see the values
    assert 0, (module_val, session_val)

A sample output looks like:

FAILED test_a.py::test_a - AssertionError: (2, 4)
FAILED test_b.py::test_b - AssertionError: (6, 4)

The simplest way to override a fixture is probably to create another conftest.py file in a subdirectory. In the following example, we override the expected_len fixture for special cases.

tests/
    conftest.py
        # content of tests/conftest.py
        import pytest

        @pytest.fixture
        def expected_len():
            return 8

    test_username.py
        # content of tests/test_username.py
        def test_len(expected_len):
            username = 'username'  # ideally this is extracted from the student's file name
            assert len(username) == expected_len

    special_cases/
        conftest.py
            # content of tests/special_cases/conftest.py
            import pytest

            @pytest.fixture
            def expected_len():
                return 16  # override

        test_username_special.py
            # content of tests/special_cases/test_username_special.py
            def test_len(expected_len):
                username = 'special-username'  # ideally this is extracted from the student's file name
                assert len(username) == expected_len