Build Plugins with Pluggy

Sun, Oct 18, 2020

Introduction

This blog post is a write-up of my two talks, titled Build Plugins with Pluggy, from PyGotham and PyCon India. It covers a trivial use case, what a plugin-based architecture is, why it is a good fit, how to develop one using pluggy, and how pluggy works.

Link to PyCon India 2020 Talk

Trivial Use Case

For the scope of this blog post, consider a command-line application that queries the Gutenberg service (through the Gutendex API), processes the data, and displays the relevant information. Let’s see how to build such an application using pluggy.

Here is the JSON output from the application.

$ python host.py search -t "My Bondage and My Freedom"
[
    {
        "bookshelves": [
            "African American Writers",
            "Slavery"
        ],
        "copyright": false,
        "download_count": 1538,
        "media_type": "Text",
        "name": "Douglass, Frederick",
        "title": "My Bondage and My Freedom",
        "xml": "http://www.gutenberg.org/ebooks/202.rdf"
    }
]

Normal code

[Figure: Normal architecture]

The application has three parts: the user input processor (the search command), the details gatherer (Search.make_request), and the result renderer (print_output).

Below is the code:

import click
import requests
import json
from pygments import highlight, lexers, formatters

def colorize(formatted_json):
    return highlight(
        formatted_json.encode("UTF-8"),
        lexers.JsonLexer(),
        formatters.TerminalFormatter(),
    )

def print_output(resp, kwargs):
    data = resp.json()
    table = [
        {
            "name": result["authors"][0]["name"],
            "bookshelves": result["bookshelves"],
            "copyright": result["copyright"],
            "download_count": result["download_count"],
            "title": result["title"],
            "media_type": result["media_type"],
            "xml": result["formats"]["application/rdf+xml"],
        }
        for result in data["results"]
    ]
    if kwargs.get('format', '') == 'json':
        indent = kwargs.get("indent", 4)
        formatted_json = json.dumps(table, sort_keys=True, indent=indent)
        if kwargs.get('colorize'):
            print(colorize(formatted_json))
        else:
            print(formatted_json)
    # TODO: Add YAML Format
    # TODO: Add Tabular Format

class Search:
    def __init__(self, term, kwargs):
        self.term = term
        self.kwargs = kwargs

    def make_request(self):
        resp = requests.get(f"http://gutendex.com/books/?search={self.term}")
        return resp

    def run(self):
        resp = self.make_request()
        print_output(resp, self.kwargs)

@click.group()
def cli():
    pass

@cli.command()
@click.option("--title", "-t", type=str, help="Title to search")
@click.option("--author", "-a", type=str, help="Author to search")
@click.option("--format", "-f", type=str, help="Output format", default='json')
def search(title, author, **kwargs):
    if not (title or author):
        print("Pass either --title or --author")
        exit(-1)
    else:
        search = Search(title or author, kwargs)
        search.run()

if __name__ == '__main__':
    cli()

The print_output function supports only one output format, JSON, and adding one more format is easy. But when the application is used as a library, print_output does not scale to more output renderers: a single developer can’t support every format that end-users request, and extending the function for each new format is painful. One way to make the functionality extensible is to re-architect the code to follow a plugin-based architecture.

What are plugins?

plug-in (noun): a software component that adds a specific feature to an existing computer program.

A plugin is a software component that enhances or modifies the behavior of a program at run-time. For example, a Google Chrome extension or a Firefox add-on changes the behavior of, or adds functionality to, the browser. Browser extensions are a good example of plugin-based architecture.

[Figure: Plugin architecture]

In general, a plugin architecture has two main components: the host (also called the caller or core system) and the plugin (or hook). The host is responsible for calling the plugins or hooks at the extension points where they are registered.
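
To make the two roles concrete, here is a minimal sketch of the idea in plain Python, without pluggy. The Host class and the two renderer functions are hypothetical names chosen for illustration: the host owns the flow and decides when plugins run, while each plugin decides what happens at that point.

```python
import json


class Host:
    """Core system: owns the program flow and calls registered plugins."""

    def __init__(self):
        self._renderers = []  # the extension point: a list of callables

    def register(self, renderer):
        """Plugins are handed to the host; the host never imports them directly."""
        self._renderers.append(renderer)

    def run(self, books):
        # The host decides *when* plugins run; each plugin decides *what* happens.
        return [renderer(books) for renderer in self._renderers]


# Two hypothetical plugins that turn the same data into different formats.
def json_renderer(books):
    return json.dumps(books, sort_keys=True)


def title_list_renderer(books):
    return "\n".join(book["title"] for book in books)


host = Host()
host.register(json_renderer)
host.register(title_list_renderer)
outputs = host.run([{"title": "My Bondage and My Freedom"}])
print(outputs[1])  # My Bondage and My Freedom
```

Supporting another format then means registering one more function rather than editing the host. Pluggy offers a more structured version of this pattern, with named hook specifications and argument checks at registration time.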

Pluggy introduction

Pluggy is a Python library that provides a structured way to discover and manage plugins, and lets hooks change the host program’s behavior at runtime.

Here is the code structure of the application.

$ tree
.
├── LICENSE
├── README.md
├── hookspecs.py
├── host.py
├── output.py
├── requirements.txt
└── tests.py

Apart from the test file, there are three Python files. Before looking at what each of them does, let’s get familiar with the pluggy concepts.

Host Program/Core system - host.py is the core system that orchestrates the program flow by discovering, registering, and calling the plugins.

import click

class Search:
    def __init__(self, term, hook, kwargs):
        ...  # initializes the attrs

    def make_request(self):
        ...  # makes the request to the gutendex.com API

    def run(self):
        ...  # co-ordinates the flow: make the request, then call the hook

def get_plugin_manager():
    ...  # adds the hook specs and registers the hook implementations

@click.group()
def cli():
    pass

@cli.command()
# click options
def search(title, author, **kwargs):
    ...  # validates the user input, manages the search workflow

def setup():
    pm = get_plugin_manager()
    # hands the click group to plugins, e.g. so they can add their own options
    pm.hook.get_click_group(group=cli)

if __name__ == "__main__":
    setup()
    cli()

Plugin - The file output.py implements the plugin logic.
Plugin Manager (instance in host.py) - The plugin manager registers the hook specifications, discovers and registers the plugins, and exposes the hooks for the host to call.
Hook Specification (hookspecs.py) - The hook specification is the blueprint or contract for the plugin. It is a Python function or method with an empty body (only a docstring).
Hook Implementation (function/method in output.py) - The hook implementation carries the hook logic.
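
The four pieces fit together in a few lines. Below is a minimal single-file sketch, assuming pluggy is installed; the project name demo, the greet hook, and the EnglishPlugin class are hypothetical stand-ins for what hookspecs.py, output.py, and host.py do. The specification is written as a method on a class here, which add_hookspecs accepts just like a module.

```python
import pluggy

# Hook specification: the contract (the role of hookspecs.py)
hookspec = pluggy.HookspecMarker("demo")


class Specs:
    @hookspec
    def greet(self, name):
        """Return a greeting for name."""


# Hook implementation: the plugin logic (the role of output.py)
hookimpl = pluggy.HookimplMarker("demo")


class EnglishPlugin:
    @hookimpl
    def greet(self, name):
        return f"Hello, {name}!"


# Plugin manager and host call (the role of host.py)
pm = pluggy.PluginManager("demo")
pm.add_hookspecs(Specs)
pm.register(EnglishPlugin())

# pluggy collects the implementations' return values in a list
results = pm.hook.greet(name="pluggy")
print(results)  # ['Hello, pluggy!']
```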

Pluggy walkthrough

[Figure: Plugin flow]

The plugin workflow happens on a single machine: registration and hook calls occur in the same process as the host program. The image above shows the logical flow of the plugin-based architecture. Each colored block represents a different piece of functionality, and the arrows show the direction of the flow.

Hook Spec

# hookspecs.py

import pluggy
hookspec = pluggy.HookspecMarker(project_name="gutenberg")

@hookspec
def print_output(resp, config):
    """Print formatted output"""

A hook specification is the contract that hook implementations follow. The first step in declaring one is to create an instance of HookspecMarker with the project name (here, gutenberg). The second step is to mark a Python function as a hook specification by using the marker as a decorator.

The hook is named print_output, and its signature defines two arguments: the response object (resp) and the configuration object (config).
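
A specification on its own does nothing. Once it is added to a plugin manager, the hook becomes callable, but with no implementations registered the call simply returns an empty list. A minimal sketch, reusing the gutenberg project name; the Specs class is a hypothetical stand-in for the hookspecs.py module.

```python
import pluggy

hookspec = pluggy.HookspecMarker("gutenberg")


class Specs:
    # Same contract as in hookspecs.py, written as a method on a class.
    @hookspec
    def print_output(self, resp, config):
        """Print formatted output"""


pm = pluggy.PluginManager("gutenberg")
pm.add_hookspecs(Specs)

# Adding the spec creates a hook caller named after the function...
print(hasattr(pm.hook, "print_output"))  # True
# ...but nothing runs until a plugin implements it.
print(pm.hook.print_output(resp=None, config={}))  # []
```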

Hook Implementation

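# output.py
import json

import pluggy

# The project name must match the one passed to HookspecMarker in hookspecs.py.
# colorize() is the same helper shown in the first listing.
hookimpl = pluggy.HookimplMarker(project_name="gutenberg")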

@hookimpl
def print_output(resp, config):
    """Print output"""
    data = resp.json()
    table = [
        {
            "name": result["authors"][0]["name"],
            "bookshelves": result["bookshelves"],
            "copyright": result["copyright"],
            "download_count": result["download_count"],
            "title": result["title"],
            "media_type": result["media_type"],
            "xml": result["formats"]["application/rdf+xml"],
        }
        for result in data["results"]
    ]
    indent = config.get("indent", 4)
    if config.get('format', '') == 'json':
        formatted_json = json.dumps(table, sort_keys=True,
                                    indent=indent)
        if config.get('colorize'):
            print(colorize(formatted_json))
        else:
            print(formatted_json)

The function print_output is the hook implementation. Its argument names must match those of the hook specification: an implementation may accept a subset of the specified arguments, but not extra ones.

The first step in writing a hook implementation is to create an instance of HookimplMarker with the same project name used for HookspecMarker. The second step is to mark the Python function as a hook implementation by using the marker as a decorator.

The print_output function performs a series of operations: it reads the JSON data from the response, extracts the relevant details from it, reads the configuration options passed to the plugin, and finally prints the details.
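
The signature rule can be checked directly. In this sketch (the plugin classes are hypothetical), an implementation that accepts only config is registered and called normally, while one that declares an extra verbose argument is rejected by pm.register with pluggy.PluginValidationError.

```python
import pluggy

hookspec = pluggy.HookspecMarker("gutenberg")
hookimpl = pluggy.HookimplMarker("gutenberg")


class Specs:
    @hookspec
    def print_output(self, resp, config):
        """Print formatted output"""


class ConfigOnlyPlugin:
    # Accepts only a subset of the spec's arguments: allowed.
    @hookimpl
    def print_output(self, config):
        return config.get("format", "json")


class ExtraArgPlugin:
    # Declares an argument the spec does not define: rejected when registered.
    @hookimpl
    def print_output(self, resp, config, verbose):
        return None


pm = pluggy.PluginManager("gutenberg")
pm.add_hookspecs(Specs)
pm.register(ConfigOnlyPlugin())

result = pm.hook.print_output(resp=None, config={"format": "yaml"})
print(result)  # ['yaml']

rejected = False
try:
    pm.register(ExtraArgPlugin())
except pluggy.PluginValidationError:
    rejected = True
print(rejected)  # True
```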

Plugin Manager

import pluggy

import hookspecs
import output

def get_plugin_manager():
    pm = pluggy.PluginManager(project_name="gutenberg")
    pm.add_hookspecs(hookspecs)
    # Register a plugin module that lives next to host.py
    pm.register(output)
    # Also load plugins that installed packages expose via the "gutenberg" entry point
    pm.load_setuptools_entrypoints("gutenberg")
    return pm

The plugin manager is responsible for registering the hook specifications and for discovering and registering the hook implementations.

The first step is to create a PluginManager instance with the common project name. The second step is to add the hook specifications to the plugin manager. The final step is to register or discover the hook implementations. An implementation can live in a Python module on the import path, registered with pm.register, or in an installed package that exposes it through a setuptools entry point. In the example, output.py resides in the same directory as host.py.
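
For the entry-point route, a third-party package advertises its plugin module under the same group name that get_plugin_manager passes to load_setuptools_entrypoints. Below is a sketch of the setup.py of a hypothetical gutenberg-yaml package whose gutenberg_yaml module would contain @hookimpl-decorated functions; the package, module, and entry-point names are illustrative assumptions.

```python
# setup.py of a hypothetical third-party package (all names are illustrative)
from setuptools import setup

setup(
    name="gutenberg-yaml",
    version="0.1.0",
    py_modules=["gutenberg_yaml"],  # module holding the @hookimpl functions
    install_requires=["pluggy"],
    # The group "gutenberg" must match pm.load_setuptools_entrypoints("gutenberg");
    # pluggy loads each entry point and registers the loaded object under its name ("yaml").
    entry_points={"gutenberg": ["yaml = gutenberg_yaml"]},
)
```

Once such a package is installed, the host would pick the plugin up on its next run without any change to host.py.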

Invoke the hook

# host.py (excerpt)
import click
import requests
class Search:
    def __init__(self, term, hook, kwargs):
        self.term = term
        self.hook = hook
        self.kwargs = kwargs

    def make_request(self):
        resp = requests.get(f"http://gutendex.com/books/?search={self.term}")
        return resp

    def run(self):
        resp = self.make_request()
        self.hook.print_output(resp=resp, config=self.kwargs)


@cli.command()
@click.option("--title", "-t", type=str, help="Title to search")
@click.option("--author", "-a", type=str, help="Author to search")
def search(title, author, **kwargs):
    if not (title or author):
        print("Pass either --title or --author")
        exit(-1)
    else:
        pm = get_plugin_manager()
        search = Search(title or author, pm.hook, kwargs)
        search.run()

With all the parts set up, the final step in the workflow is to call the hook at the right time. After receiving the response, the run method calls the print_output hook.
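
The call in run passes resp and config by keyword, and that is required: pluggy matches arguments to each implementation by parameter name, so hook calls accept keyword arguments only. A small sketch; EchoPlugin is a hypothetical plugin that returns its arguments so the call can be inspected.

```python
import pluggy

hookspec = pluggy.HookspecMarker("gutenberg")
hookimpl = pluggy.HookimplMarker("gutenberg")


class Specs:
    @hookspec
    def print_output(self, resp, config):
        """Print formatted output"""


class EchoPlugin:
    # Hypothetical plugin: returns its arguments so the call can be inspected.
    @hookimpl
    def print_output(self, resp, config):
        return (resp, config)


pm = pluggy.PluginManager("gutenberg")
pm.add_hookspecs(Specs)
pm.register(EchoPlugin())

# Keyword arguments are matched to the implementation's parameter names.
ok = pm.hook.print_output(resp="response", config={"indent": 2})
print(ok)  # [('response', {'indent': 2})]

# Positional arguments are rejected by the hook caller.
positional_rejected = False
try:
    pm.hook.print_output("response", {"indent": 2})
except TypeError:
    positional_rejected = True
print(positional_rejected)  # True
```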

Output

[Screenshot: output_indent_2]

[Screenshot: output_indent_4]

The two screenshots are from two different inputs: the search term is "My freedom and My bondage", and the indent sizes are 4 and 8.

Internal details

It’s possible to register multiple hook implementations for a single hook specification. In our case, there could be two print_output implementations, one for JSON rendering and another for YAML rendering. Pluggy calls the implementations one after the other in last-in, first-out (LIFO) registration order.

Hooks can return values. When they do, the caller receives the return values as a list, with None results left out. In our case, self.hook.print_output(resp=resp, config=self.kwargs) returns an empty list, because the only implementation prints its output and returns None.
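
Both behaviors can be seen side by side. This sketch uses a hypothetical render hook and three hypothetical renderer plugins: the last one registered is called first, and the one that only prints (returning None, like print_output) does not appear in the result list.

```python
import pluggy

hookspec = pluggy.HookspecMarker("gutenberg")
hookimpl = pluggy.HookimplMarker("gutenberg")


class Specs:
    @hookspec
    def render(self, books):
        """Return the books rendered in one format."""


class JsonRenderer:
    @hookimpl
    def render(self, books):
        return "json"


class YamlRenderer:
    @hookimpl
    def render(self, books):
        return "yaml"


class PrintingRenderer:
    # Prints instead of returning, like print_output in the post.
    @hookimpl
    def render(self, books):
        print("printed, returns None")


pm = pluggy.PluginManager("gutenberg")
pm.add_hookspecs(Specs)
pm.register(JsonRenderer())      # registered first, called last
pm.register(YamlRenderer())
pm.register(PrintingRenderer())  # registered last, called first

results = pm.hook.render(books=[])
print(results)  # ['yaml', 'json']
```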

When a single result is enough, calling the remaining implementations after one has returned a value is wasteful. To short-circuit the flow, pluggy provides an option when declaring the specification: @hookspec(firstresult=True) tells the plugin manager to stop calling implementations once one returns a non-None value, and the hook call then returns that single value instead of a list.
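
A sketch of firstresult, again with a hypothetical render hook and hypothetical plugins. The calls list records which implementations actually ran: the plugin that returns None lets the call continue, the next one returns a value, and the remaining one is never called.

```python
import pluggy

hookspec = pluggy.HookspecMarker("gutenberg")
hookimpl = pluggy.HookimplMarker("gutenberg")

calls = []  # records which implementations actually ran


class Specs:
    @hookspec(firstresult=True)
    def render(self, books):
        """Return the books rendered by the first plugin that can."""


class JsonRenderer:
    @hookimpl
    def render(self, books):
        calls.append("json")
        return "json"


class YamlRenderer:
    @hookimpl
    def render(self, books):
        calls.append("yaml")
        return "yaml"


class DeclineRenderer:
    # Returning None means "not me": pluggy moves on to the next implementation.
    @hookimpl
    def render(self, books):
        calls.append("decline")
        return None


pm = pluggy.PluginManager("gutenberg")
pm.add_hookspecs(Specs)
pm.register(JsonRenderer())
pm.register(YamlRenderer())
pm.register(DeclineRenderer())  # registered last, so called first

result = pm.hook.render(books=[])
print(result)  # yaml (a single value, not a list)
print(calls)   # ['decline', 'yaml'] (JsonRenderer never ran)
```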

Testing the Plugin

Testing a hook implementation is the same as testing any other Python function.

Here is what the unit test looks like:

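import json
import requests
from output import print_output

def test_print_output(capsys):
    resp = requests.get("http://gutendex.com/books/?search=Kafka")
    # print_output renders only when the format option is "json"
    print_output(resp, {"format": "json"})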

    captured = capsys.readouterr()
    assert len(json.loads(captured.out)) >= 1

Here is what the integration test looks like:

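from click.testing import CliRunner

from host import search, setup

def test_search():
    setup()
    runner = CliRunner()
    # CliRunner expects every command-line argument as a string
    result = runner.invoke(
        search,
        ["-t", "My freedom and My bondage",
        "--indent", "8", "--colorize", "false"],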
    )

    expected_output = """
[
        {
                "bookshelves": [
                        "African American Writers",
                        "Slavery"
                ],
                "copyright": false,
                "download_count": 1201,
                "media_type": "Text",
                "name": "Douglass, Frederick",
                "title": "My Bondage and My Freedom",
                "xml": "http://www.gutenberg.org/ebooks/202.rdf"
        }
]
    """
    assert result.exit_code == 0
    assert result.output.strip() == expected_output.strip()
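
Both tests above call the live gutendex.com API, so hard-coded values can drift (the post shows a download_count of 1538 in one run and 1201 in another). One way to test the host-to-plugin wiring offline, sketched here for pytest, is to stub out the network call and register a recording plugin. FakeResponse, OfflineSearch, and RecordingPlugin are hypothetical helpers, Specs stands in for hookspecs.py, and Search is a trimmed copy of the class from host.py.

```python
import pluggy

hookspec = pluggy.HookspecMarker("gutenberg")
hookimpl = pluggy.HookimplMarker("gutenberg")


class Specs:
    @hookspec
    def print_output(self, resp, config):
        """Print formatted output"""


class Search:
    """Trimmed copy of the Search class from host.py (network call left out)."""

    def __init__(self, term, hook, kwargs):
        self.term = term
        self.hook = hook
        self.kwargs = kwargs

    def make_request(self):
        raise NotImplementedError("the real version calls gutendex.com")

    def run(self):
        resp = self.make_request()
        self.hook.print_output(resp=resp, config=self.kwargs)


class FakeResponse:
    """Stands in for requests.Response; only .json() is used by the hook."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


class OfflineSearch(Search):
    # Hypothetical test subclass: returns canned data instead of hitting the API.
    def make_request(self):
        return FakeResponse({"results": [{"title": self.term}]})


class RecordingPlugin:
    """Hypothetical plugin that records what the host passed to the hook."""

    def __init__(self):
        self.calls = []

    @hookimpl
    def print_output(self, resp, config):
        self.calls.append((resp.json(), config))


def test_run_calls_print_output_hook():
    pm = pluggy.PluginManager("gutenberg")
    pm.add_hookspecs(Specs)
    recorder = RecordingPlugin()
    pm.register(recorder)

    OfflineSearch("Kafka", pm.hook, {"format": "json"}).run()

    assert recorder.calls == [
        ({"results": [{"title": "Kafka"}]}, {"format": "json"})
    ]
```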

Conclusion

The pytest test runner uses pluggy extensively: more than 100 pytest plugins use it to provide testing features such as test coverage.
Tox, which also builds its plugin system on pluggy, aims to automate and standardize testing in Python. It is part of a larger vision of easing the packaging, testing and release process of Python software.
Datasette, a Python tool for exploring and publishing data, also uses pluggy for its plugins.
Plugins are a powerful concept, with many advantages when managing highly configurable and extensible systems.

Important links from the blog post:

Slides - https://slides.com/kracekumar/pluggy
Pluggy Documentation - https://pluggy.readthedocs.io/en/latest/
GitHub Source Code - https://github.com/kracekumar/pluggy_talk