Introduction
This blog post is a write-up of my two talks at PyGotham and PyCon India, titled Build Plugins with Pluggy.
It walks through a trivial use case, explains what a plugin-based architecture is and why it is a good fit, shows how to build one with pluggy, and looks at how pluggy works.
Trivial Use Case
For the scope of this blog post, consider a command-line application that queries the gutenberg service, processes the data, and displays the relevant information. Let’s see how to build such an application using pluggy.
Here is the JSON output from the application.
$ python host.py search -t "My Bondage and My Freedom"
[
    {
        "bookshelves": [
            "African American Writers",
            "Slavery"
        ],
        "copyright": false,
        "download_count": 1538,
        "media_type": "Text",
        "name": "Douglass, Frederick",
        "title": "My Bondage and My Freedom",
        "xml": "http://www.gutenberg.org/ebooks/202.rdf"
    }
]
Normal code
The application has three parts: a user input processor, a details gatherer, and a result renderer.
Here is the code.
import click
import requests
import json
from pygments import highlight, lexers, formatters


def colorize(formatted_json):
    return highlight(
        formatted_json.encode("UTF-8"),
        lexers.JsonLexer(),
        formatters.TerminalFormatter(),
    )


def print_output(resp, kwargs):
    data = resp.json()
    table = [
        {
            "name": result["authors"][0]["name"],
            "bookshelves": result["bookshelves"],
            "copyright": result["copyright"],
            "download_count": result["download_count"],
            "title": result["title"],
            "media_type": result["media_type"],
            "xml": result["formats"]["application/rdf+xml"],
        }
        for result in data["results"]
    ]
    if kwargs.get('format', '') == 'json':
        indent = kwargs.get("indent", 4)
        formatted_json = json.dumps(table, sort_keys=True, indent=indent)
        if kwargs.get('colorize'):
            print(colorize(formatted_json))
        else:
            print(formatted_json)
    # TODO: Add YAML Format
    # TODO: Add Tabular Format


class Search:
    def __init__(self, term, kwargs):
        self.term = term
        self.kwargs = kwargs

    def make_request(self):
        resp = requests.get(f"http://gutendex.com/books/?search={self.term}")
        return resp

    def run(self):
        resp = self.make_request()
        print_output(resp, self.kwargs)


@click.group()
def cli():
    pass


@cli.command()
@click.option("--title", "-t", type=str, help="Title to search")
@click.option("--author", "-a", type=str, help="Author to search")
@click.option("--format", "-f", type=str, help="Output format", default='json')
def search(title, author, **kwargs):
    if not (title or author):
        print("Pass either --title or --author")
        exit(-1)
    else:
        search = Search(title or author, kwargs)
        search.run()


if __name__ == '__main__':
    cli()
The print_output function supports a single output format. Adding one more format is easy, but when the application is shipped as a library, print_output runs into trouble as the number of renderers grows: the developer cannot anticipate every format end users will ask for, and each new feature has to be extended to every format by hand. One way out is to re-architect the code to follow a plugin-based architecture.
What are plugins?
noun: plug-in is a software component that adds a specific feature to an existing computer program.
A plugin is a software component that enhances or modifies the behavior of a program at run time. For example, a Google Chrome extension or a Firefox add-on changes the behavior of the browser or adds functionality to it. Browser extensions are a good example of plugin-based architecture.
In general, a plugin architecture has two main components: the host (also called the caller or core system) and the plugin (or hook). The host is responsible for calling the registered plugins or hooks at well-defined points in its workflow.
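To make the host/plugin split concrete before bringing in pluggy, here is a minimal sketch in plain Python. The Host class, the "render" hook name, and the two render functions are hypothetical names invented for illustration. They are not part of the talk's code or of pluggy.

```python
from typing import Callable, Dict, List


class Host:
    """Toy core system: keeps a registry of hook functions and calls them."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable[..., object]]] = {}

    def register(self, hook_name: str, func: Callable[..., object]) -> None:
        # A plugin announces which hook it wants to take part in.
        self._hooks.setdefault(hook_name, []).append(func)

    def call(self, hook_name: str, **kwargs) -> list:
        # The host decides when a hook runs; plugins decide what happens.
        return [func(**kwargs) for func in self._hooks.get(hook_name, [])]


def render_upper(text: str) -> str:
    return text.upper()


def render_title(text: str) -> str:
    return text.title()


host = Host()
host.register("render", render_upper)
host.register("render", render_title)

results = host.call("render", text="my bondage and my freedom")
print(results)
```

The host never needs to know which renderers exist. It only knows the hook name and the arguments it passes, which is the contract that pluggy formalizes with hook specifications.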
Pluggy introduction
Pluggy is a Python library that provides a structured way to discover and manage plugins, and to let hooks change the host program’s behavior at runtime.
Here is the code structure of the application.
$tree (pluggy_talk)
.
├── LICENSE
├── README.md
├── hookspecs.py
├── host.py
├── output.py
├── requirements.txt
└── tests.py
Apart from the test file, there are three Python files. Before looking at what each file does, let’s get familiar with the pluggy concepts.
- Host Program/Core system - host.py is the core system that orchestrates the program flow by discovering, registering, and calling the plugins.

      class Search:
          def __init__(self, term, hook, kwargs):
              ...  # initializes the attrs

          def make_request(self):
              ...  # makes the request to gutenberg URL

          def run(self):
              ...  # co-ordinates the flow

      def get_plugin_manager():
          ...  # plugin spec, implementation registration

      @click.group()
      def cli():
          pass

      @cli.command()
      # click options
      def search(title, author, **kwargs):
          ...  # validates the user input, manages search workflow

      def setup():
          pm = get_plugin_manager()
          pm.hook.get_click_group(group=cli)

      if __name__ == "__main__":
          setup()
          cli()

- Plugin - The file output.py implements the plugin logic.
- Plugin Manager (instance in host.py) - The plugin manager instance is responsible for plugin management: it registers the hook specifications and hook implementations and exposes the hooks to the host.
- Hook Specification (hookspecs.py) - The hook specification is the blueprint or contract for the plugin. It is a Python function or method with an empty body.
- Hook Implementation (function/method in output.py) - The hook implementation carries the hook logic.
Pluggy walkthrough
The plugin workflow happens on a single machine. Registration and hook calling occur in the same process as the host program. The above image represents the logical flow of the plugin-based architecture. Each colored block represents a different functionality, and the arrows represent the direction of the flow.
Hook Spec
# hookspecs.py
import pluggy

hookspec = pluggy.HookspecMarker(project_name="gutenberg")


@hookspec
def print_output(resp, config):
    """Print formatted output"""
A hook specification is a contract for the hook to implement. The first step in declaring a hook specification is to create an instance of HookspecMarker with the desired project name. The second step is to mark a Python function as a hook specification by using the marker as a decorator. Here the hook name is print_output, and its signature defines two arguments: the response object and the configuration object.
Hook Implementation
# output.py
import json

import pluggy

# The project name should match the one passed to HookspecMarker
hookimpl = pluggy.HookimplMarker(project_name="gutenberg")


@hookimpl
def print_output(resp, config):
    """Print output"""
    data = resp.json()
    table = [
        {
            "name": result["authors"][0]["name"],
            "bookshelves": result["bookshelves"],
            "copyright": result["copyright"],
            "download_count": result["download_count"],
            "title": result["title"],
            "media_type": result["media_type"],
            "xml": result["formats"]["application/rdf+xml"],
        }
        for result in data["results"]
    ]
    indent = config.get("indent", 4)
    if config.get('format', '') == 'json':
        print(f"Using the indent size as {indent}")
        formatted_json = json.dumps(table, sort_keys=True,
                                    indent=indent)
        if config.get('colorize'):
            # colorize() is the pygments helper from the first listing
            print(colorize(formatted_json))
        else:
            print(formatted_json)
The function print_output is the hook implementation.
Hook spec and hook implementation functions should carry the same function signature.
The first step in writing a hook implementation is to create an instance of HookimplMarker with the same project name used for HookspecMarker. The second step is to mark the Python function as a hook implementation using the marker as a decorator.
The print_output function performs a series of operations: it reads the JSON data from the response, collects the relevant details from it, reads the configuration options passed to the plugin, and finally prints the details.
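The signature contract is checked by pluggy when a plugin is registered. The sketch below illustrates this under some assumptions: the Spec, GoodPlugin, and BadPlugin classes are hypothetical, and the specification is declared on a class rather than in a module. As far as I understand pluggy's behavior, an implementation may accept a subset of the spec's arguments, while an argument the spec does not declare is rejected with PluginValidationError.

```python
import pluggy

hookspec = pluggy.HookspecMarker("gutenberg")
hookimpl = pluggy.HookimplMarker("gutenberg")


class Spec:
    @hookspec
    def print_output(self, resp, config):
        """Print formatted output"""


class GoodPlugin:
    @hookimpl
    def print_output(self, resp):
        # Accepts only a subset of the spec's arguments.
        return "ok"


class BadPlugin:
    @hookimpl
    def print_output(self, resp, config, extra):
        # `extra` is not declared in the spec.
        return "never called"


pm = pluggy.PluginManager("gutenberg")
pm.add_hookspecs(Spec)
pm.register(GoodPlugin())

try:
    pm.register(BadPlugin())
    rejected = False
except pluggy.PluginValidationError:
    rejected = True

print(rejected)
```

Keeping the two signatures identical, as recommended above, avoids this class of error altogether.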
Plugin Manager
# host.py
import pluggy

import hookspecs
import output


def get_plugin_manager():
    pm = pluggy.PluginManager(project_name="gutenberg")
    pm.add_hookspecs(hookspecs)
    # Register a Python module that contains hook implementations
    pm.register(output)
    # Or load plugins advertised through a setuptools entry point
    pm.load_setuptools_entrypoints("gutenberg")
    return pm
The plugin manager is responsible for registering the hook specifications and for discovering and registering the hook implementations.
The first step is to create the PluginManager instance with the common project name. The second step is to add the hook specification to the plugin manager. The final step is to register or discover the hook implementations. An implementation can live in a Python module on the import path or be discovered through a setuptools entry point. In the example, output.py resides in the same directory as host.py.
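The three steps can be tried out in a single file. This is a minimal sketch, not the code from the talk: the format_rows hook, the OutputSpec and JsonOutput classes, and the plugin name "json" are hypothetical, and the hook returns the rendered text instead of printing it so the result is easy to inspect.

```python
import json

import pluggy

hookspec = pluggy.HookspecMarker("gutenberg")
hookimpl = pluggy.HookimplMarker("gutenberg")


class OutputSpec:
    @hookspec
    def format_rows(self, rows, config):
        """Return the rows rendered as text."""


class JsonOutput:
    @hookimpl
    def format_rows(self, rows, config):
        return json.dumps(rows, sort_keys=True, indent=config.get("indent", 4))


def get_plugin_manager():
    pm = pluggy.PluginManager("gutenberg")   # step 1: create the manager
    pm.add_hookspecs(OutputSpec)             # step 2: add the hook specification
    pm.register(JsonOutput(), name="json")   # step 3: register an implementation
    return pm


pm = get_plugin_manager()
rendered = pm.hook.format_rows(
    rows=[{"title": "My Bondage and My Freedom"}],
    config={"indent": 2},
)
print(rendered[0])
```

Hooks are always called with keyword arguments, and the call returns a list with one entry per implementation that returned a value.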
Invoke the hook
# host.py
class Search:
    def __init__(self, term, hook, kwargs):
        self.term = term
        self.hook = hook
        self.kwargs = kwargs

    def make_request(self):
        resp = requests.get(f"http://gutendex.com/books/?search={self.term}")
        return resp

    def run(self):
        resp = self.make_request()
        self.hook.print_output(resp=resp, config=self.kwargs)


@cli.command()
@click.option("--title", "-t", type=str, help="Title to search")
@click.option("--author", "-a", type=str, help="Author to search")
def search(title, author, **kwargs):
    if not (title or author):
        print("Pass either --title or --author")
        exit(-1)
    else:
        pm = get_plugin_manager()
        search = Search(title or author, pm.hook, kwargs)
        search.run()
After setting up all the parts for hook calling, the final step in the workflow is to call the hook at the proper time. After receiving the response, the run method calls the print_output hook.
Output
The two screenshots are from two different inputs. The search term is My freedom and My bondage in both, with the indent set to 4 and 8.
Internal details
It’s possible to register multiple hook implementations for a single hook specification. In our case, there can be two print_output implementations, one for JSON rendering and another for YAML rendering. Pluggy calls the implementations one after the other in Last In First Out order, so the most recently registered implementation runs first.
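The call order is easy to observe with two tiny plugins. In this sketch the output_format hook and both plugin classes are hypothetical and exist only to show the ordering.

```python
import pluggy

hookspec = pluggy.HookspecMarker("gutenberg")
hookimpl = pluggy.HookimplMarker("gutenberg")


class Spec:
    @hookspec
    def output_format(self):
        """Return the name of a supported output format."""


class JsonPlugin:
    @hookimpl
    def output_format(self):
        return "json"


class YamlPlugin:
    @hookimpl
    def output_format(self):
        return "yaml"


pm = pluggy.PluginManager("gutenberg")
pm.add_hookspecs(Spec)
pm.register(JsonPlugin())  # registered first, called last
pm.register(YamlPlugin())  # registered last, called first

formats = pm.hook.output_format()
print(formats)  # ['yaml', 'json']
```

If a specific order matters, pluggy's hookimpl marker also accepts tryfirst=True and trylast=True to move an implementation to the front or the back of the call sequence.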
Hooks can return values. When they do, the caller receives the return values as a list. In our case, self.hook.print_output(resp=resp, config=self.kwargs) yields no values because the implementation prints the output instead of returning it.
It’s sub-optimal to call other hooks when the previous hook has already returned a value. To short-circuit the flow, pluggy provides an option while declaring the specification: @hookspec(firstresult=True) tells the plugin manager to stop calling the remaining hooks once a return value is available.
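Here is a small sketch of the short-circuit behavior. The render hook and the three renderer classes are hypothetical. The calls list only records which implementations actually ran.

```python
import json

import pluggy

hookspec = pluggy.HookspecMarker("gutenberg")
hookimpl = pluggy.HookimplMarker("gutenberg")

calls = []  # records which implementations were invoked


class Spec:
    @hookspec(firstresult=True)
    def render(self, rows):
        """Return the rows rendered as text."""


class FallbackRenderer:
    @hookimpl
    def render(self, rows):
        calls.append("fallback")
        return str(rows)


class JsonRenderer:
    @hookimpl
    def render(self, rows):
        calls.append("json")
        return json.dumps(rows)


class DecliningRenderer:
    @hookimpl
    def render(self, rows):
        calls.append("declining")
        return None  # None means "no result", so the next hook is tried


pm = pluggy.PluginManager("gutenberg")
pm.add_hookspecs(Spec)
pm.register(FallbackRenderer())
pm.register(JsonRenderer())
pm.register(DecliningRenderer())

result = pm.hook.render(rows=[1, 2])
print(result, calls)
```

With firstresult=True the caller gets a single value rather than a list, and FallbackRenderer is never invoked because JsonRenderer already produced a result.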
Testing the Plugin
Testing a hook implementation is the same as testing any other Python function.
Here is what the unit test looks like.
def test_print_output(capsys):
    resp = requests.get("http://gutendex.com/books/?search=Kafka")
    print_output(resp, {})
    captured = capsys.readouterr()
    assert len(json.loads(captured.out)) >= 1
Here is what the integration test looks like.
def test_search():
    setup()
    runner = CliRunner()
    # CLI arguments are passed as strings, exactly as on the command line
    result = runner.invoke(
        search,
        ["-t", "My freedom and My bondage",
         "--indent", "8", "--colorize", "false"],
    )
    expected_output = """
[
        {
                "bookshelves": [
                        "African American Writers",
                        "Slavery"
                ],
                "copyright": false,
                "download_count": 1201,
                "media_type": "Text",
                "name": "Douglass, Frederick",
                "title": "My Bondage and My Freedom",
                "xml": "http://www.gutenberg.org/ebooks/202.rdf"
        }
]
"""
    assert result
    assert result.output.strip() == expected_output.strip()
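Both tests above talk to the live service, so they depend on the network and on values such as download_count that change over time. One possible alternative is to feed the hook a stub response. The sketch below assumes a simplified stand-in for print_output that keeps only two fields and defaults to JSON. FakeResponse, PAYLOAD, and capture are hypothetical helpers, not part of the talk's repository.

```python
import contextlib
import io
import json


class FakeResponse:
    """Stub that mimics the only method the hook uses: resp.json()."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


def print_output(resp, config):
    """Simplified stand-in for the hook implementation (assumption)."""
    rows = [
        {"name": result["authors"][0]["name"], "title": result["title"]}
        for result in resp.json()["results"]
    ]
    if config.get("format", "json") == "json":
        print(json.dumps(rows, sort_keys=True, indent=config.get("indent", 4)))


def capture(resp, config):
    """Run the hook and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_output(resp, config)
    return buffer.getvalue()


PAYLOAD = {
    "results": [
        {
            "authors": [{"name": "Douglass, Frederick"}],
            "title": "My Bondage and My Freedom",
        }
    ]
}


def test_print_output_without_network():
    rows = json.loads(capture(FakeResponse(PAYLOAD), {"indent": 2}))
    assert rows == [
        {"name": "Douglass, Frederick", "title": "My Bondage and My Freedom"}
    ]


test_print_output_without_network()
```

Under pytest, the capsys fixture used in the unit test above can replace the capture helper.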
Conclusion
- The pytest test runner uses pluggy extensively. More than 100 pytest plugins use the pluggy framework to add testing features such as test coverage.
- Tox aims to automate and standardize testing in Python. It is part of a larger vision of easing the packaging, testing, and release process of Python software.
- Datasette is a Python tool to publish and explore datasets.
- The plugin concept is powerful: it has many advantages when managing highly configurable and extensible systems.
Important links from the blog post:
- Slides - https://slides.com/kracekumar/pluggy
- Pluggy Documentation - https://pluggy.readthedocs.io/en/latest/
- GitHub Source Code - https://github.com/kracekumar/pluggy_talk
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.