Ansible Roles Testing — OSCI.IO - Open Source Community Infrastructure

My team is very small, so we try to automate as much as possible. As sysadmins we use Ansible and follow the “Infrastructure as Code” principle. Over time we built many Ansible roles, and we want to be sure we spot problems when we make changes but also when new versions of the tooling are released. We’ve tried to make this as easy to use as possible.

In this article we’ll explore the different pieces of this system and explain some of the steps and difficulties we encountered.

The Tooling

We’re not reinventing the wheel so we’re using Molecule to run the tests as well as all the linting tools available:

yamllint: to validate all YAML files
ansible-lint: to validate the test playbook and the role’s rules
flake8: for all Python files, especially if your role embeds a module or a filter
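
As an illustration of how little configuration the linters need, here is a minimal sketch of a .yamllint file a role might carry. The rule values are illustrative assumptions, not necessarily the settings used in the template.

```yaml
---
# .yamllint (illustrative values, adjust to your own conventions)
extends: default

rules:
  # allow longer lines than the default limit of 80 characters
  line-length:
    max: 120
    level: warning
  # report yes/no-style booleans as warnings instead of errors
  truthy:
    level: warning
```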

Molecule offers various backends to run the tests. Initially we used the docker backend but we soon ran into problems. One major problem is Docker’s inability to set up cgroups properly, which breaks systemd support in containers:

https://github.com/docker/for-linux/issues/835
https://github.com/systemd/systemd/issues/11752

Since our roles deploy various services and systemd is the default in Red Hat systems (and other major distributions), we absolutely needed a working systemd. This was not the only problem, and in the meantime podman made a lot of progress, so we decided to switch to podman.

The CI (Continuous Integration)

GitHub is very popular but it is not open source, and even before Microsoft decided to acquire it we felt better using GitLab instead. GitLab is unfortunately not fully open source, but it would be possible to operate the open-core edition and still have enough features if things turned sour. At the time (in 2016) alternatives like Gitea were not mature and featureful enough to be considered.

GitLab (GL) comes with its own CI, which has some interesting features for what we want to do. You can still use our work as a template if you use a different CI, but you’ll need to find your own solutions to make things really handy.

Runners (GitLab CI workers)

First, CI systems nowadays usually run jobs in containers, but Molecule also spawns containers. Having a working Docker in Docker setup is not trivial and not recommended upstream for various technical and security reasons (the post is linked in the official Docker documentation). Running podman in podman with systemd inside needed some work but it is now working well. We integrated this setup in a GitLab CI custom worker that we use for our public repositories (but that would work on a custom instance too). We’ll talk in more detail about this setup in a following article.

Reusing the template

The second feature we love about GL CI is its ability to include remote bits of configuration. This is inspired by the work the Debian Salsa team put in place to provide rules that every package could use. Previously our rules were used as a template but after each update we had to update all our roles and that was very painful. With this mechanism any change is automatically available to all roles.

Each role simply adds this .gitlab-ci.yml configuration to load the template:

---

# it is not possible to use custom variables in includes, so it has to match
variables:
  TEST_BRANCH: master

include:
  - project: osci/ansible-molecule-tests-template
    file: gl-ci/pipeline.yml
    ref: HEAD

In our case we need to clone the template repository to fetch some files which constitute the base of the Molecule configuration (more details in the next chapter). Unfortunately, although GL CI allows certain variables to be used in includes, it is not possible to use custom variables defined in the YAML itself; that’s why we need to keep the TEST_BRANCH variable in sync with the ref parameter.

The configuration template

Even though this is technically an include of a remote shared configuration, it acts as a template on top of which you can customize. The template documentation describes how to use it but does not necessarily explain what all the changes are used for, so we’re going to dig into it now.

The entry point gl-ci/pipeline.yml:

sets up a Python virtual environment
installs all the tooling dependencies
tweaks a few parameters for the containers (subuid/subgid, network backend for podman…)
installs the base Molecule configuration
dumps some debug information
and finally runs molecule

Some extra dependencies like netaddr or testinfra are also installed at this time since they need to be available on the control node.
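
The steps above can be sketched as a single GitLab CI job. This is a simplified illustration, not the content of the real gl-ci/pipeline.yml: the job name, the TESTS_DEPS value, and the exact package list are assumptions, and the container tweaks and base configuration install are left out.

```yaml
---
# Hypothetical job; the real template is organised differently.
molecule:
  variables:
    # assumed example value: pip requirements for this run
    TESTS_DEPS: "ansible molecule"
  before_script:
    # set up a Python virtual environment
    - python3 -m venv venv
    - . venv/bin/activate
    # install the tooling, plus extras needed on the control node
    - pip install $TESTS_DEPS yamllint ansible-lint flake8 netaddr pytest-testinfra
    # dump some debug information in the job log
    - ansible --version
    - molecule --version
    - podman info
  script:
    # run the full Molecule test sequence
    - molecule test
```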

Running tests for multiple versions

We wanted to test with specific Ansible versions because open source communities move at their own pace and cannot always upgrade to the latest version and deal with the resulting changes right away. We also had breakage with specific versions or broken dependencies and we wanted to be able to skip such versions. We generally also test with the latest version of the whole stack to be sure we keep up to date.

We use a matrix of dependencies using the TESTS_DEPS variable to set independent runs of the tests (expressed as pip dependencies).
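
To give an idea of what such a matrix can look like, here is a sketch using GitLab CI’s parallel:matrix keyword. Only the TESTS_DEPS variable name comes from our setup; the job name and the version pins are hypothetical, and the template’s actual layout may differ.

```yaml
---
# Hypothetical job name and version pins, for illustration only.
molecule:
  parallel:
    matrix:
      # one independent job is created per value
      - TESTS_DEPS:
          - "ansible"               # latest release of the whole stack
          - "ansible-core~=2.15.0"  # pinned to the 2.15 series
          - "ansible-core~=2.16.0"  # pinned to the 2.16 series
  script:
    - pip install $TESTS_DEPS molecule
    - molecule test
```

A version known to be broken can then be skipped by removing or adjusting a single entry, without touching the rest of the pipeline.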

Extended Molecule Configuration

Molecule uses Ansible as the backend for its work, which is very practical because you can use Ansible’s powerful features like modules and variables to prepare and drive your tests. Unfortunately this does not apply to the whole setup. We modified the podman backend rules to load host_vars and group_vars variables and take them into account during container instantiation.

With this setup the template is able to add default values for container parameters; for example it is used to set the hostname on the example.com domain and set up the network. But these are only defaults: they can be overridden fully or partially in the role (for all scenarios), and each scenario can also override them.

For example, we use it for a role that spawns a client-server service that cannot be properly tested in the same shared network namespace due to a port conflict. In this role, in molecule/_resources/group_vars/ns_servers/ns.yml and molecule/_resources/host_vars/ansible-test-web.yml, we are able to change the exposed ports using the usual Ansible host and group variables (grouping with group order and host overrides are then possible).
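
As a rough sketch, such a host override could look like the file below. The variable name container_published_ports is hypothetical and only used for illustration; the real keys are described in the template README.

```yaml
---
# molecule/_resources/host_vars/ansible-test-web.yml
# NOTE: "container_published_ports" is a hypothetical variable name;
# check the template README for the keys the template really reads.
container_published_ports:
  - "8080:80"
```

Because these are regular Ansible variables, a value set in group_vars applies to every container in the group, and a host_vars file like this one overrides it for a single container.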

Common setup

Network configuration

For roles deploying a simple service the only requirement would be to be able to install packages, and then tests can simply target localhost. For more complex services, or if you need to test ACLs, test containers cannot share the same network namespace. The template sets up a bridged network that allocates a dedicated IP per container. The bind9 role we talked about earlier needs such a configuration to test ACLs for primary->secondary zone replication as well as recursive query ACLs. In this network all test containers can communicate using IPv4 and IPv6.

Test prepare step

The prepare steps of the Molecule tests are defined in this template because we have identified a few settings that are needed in all our roles and did not want to repeat them. An example of such a setting is ensuring the locales are properly set. Also, containers only ship the bare minimum but we want our tests to reflect a distro base system, as our roles are generally meant for bare metal and VMs. For example, we install NetworkManager on Red Hat systems since this is the default in non-container installations, and we can then fully test that the role can open ports for its deployed services.

The common steps also install basic tools to dump some debug information in the logs, making debugging easier.

You can add your own steps in molecule/_resources/prepare_custom.yml as this file is included from the base file.
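
A custom prepare file might look like the following sketch. It assumes the file is included as a plain list of tasks; the package and the directory are made-up examples.

```yaml
---
# molecule/_resources/prepare_custom.yml
# Assumption: included as a list of tasks; names below are examples only.
- name: Install extra tools needed by the tests
  ansible.builtin.package:
    name:
      - curl
    state: present

- name: Make sure a fixture directory exists
  ansible.builtin.file:
    path: /srv/test-data
    state: directory
    mode: "0755"
```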

How to use

A typical project would use the .gitlab-ci.yml seen above and have this structure under molecule/_resources/:

/ with `molecule.yml` and `tests` as usual
converge.yml as usual
prepare_custom.yml replaces the usual prepare.yml
host_vars/group_vars to add play and also container definition variables
files/templates if you need them in the playbooks above

We usually put our tests in molecule/_resources/tests and symlink them as needed in the scenario tests directory since many are shared.

More info can be found in the template README.

Conclusion

The initial development took some time, especially since we had to set up our own runner, but it is working really well and greatly simplified our setup without sacrificing customization. We did not want to diverge too much from a vanilla Molecule setup but had to override the podman hooks; we might be able to convince upstream of the value of this feature. The rest is pretty much the usual Molecule workflow and should be easy for anyone to pick up.

Currently we mostly have unrelated roles, or super-roles, but no group of topic-related roles, which is why we have not created any Ansible collections so far. It would be nice to add such support in the future.