Vorner's random stuff

Announcing pyo3-log

It’s been a while since I published the library and haven’t had the time to write something about it until now, but better late than never.

About pyo3

The pyo3 is quite a nice library to help interoperate between Python and Rust. There are multiple similar libraries, but personally I find this one to have the most convenient API ‒ most of its „magic“ happens by procedural macros in form of function attributes in combination with some clever types (like the Python token that represents both the interpreter and its Global Interpreter Lock).

Recently the library started to work on stable Rust (previously it required nightly) so I’ve decided to give it a closer look and experiment with it a bit.

When playing with it, I’ve noticed the module produced by the library doesn’t solve just the bare minimum of a module, exporting some classes and functions. It allows one to do the more advanced things (easily making objects bool or string convertible, creating iterators or context managers). It even gives one the built-in help(module) and help(Class) documentation, by reusing the ordinary Rust documentation comments. The integration feels almost seamless. There are, of course, some details that makes it impossible to just write a fully idiomatic Rust code, stamp an attribute on the function and be done with it, like the GIL or that some types don’t know how to convert to their counterpart in Python, but it’s not the gruesome juggling with C APIs and structs to do what needs to be done.

What is missing, then, to be able to write a really nice and Python-looking module with just small amount of work?

Logging

When writing Rust code, you may be used to spreading logging statements through it. Or you might use a dependency that logs itself. But when you wrap it up into a Python module, where do the messages go? Well, by default, nowhere. They disappear.

The log library is just a facade. It allows one to produce the messages, but needs a back-end to send the messages somewhere. Usually, such back-end is global for the whole application. However, due to technical details, the global means the one Rust module in here ‒ if there are two distinct Rust modules loaded to one Python program, they have independent back-ends (I haven’t tried what happens in the reverse scenario where the top level application is in Rust and loads some Python, or when that python loads some more Rust). So as the author of the module, you can decide to simply send the messages where ever you like, for example by using env-logger or fern.

And then, of course, the Python side of the application will also have its own logging and will log somewhere. If you’re the owner of both, then you probably can make all instances of Rust modules and the Python logging system produce same-looking messages to the same place.

But it’s not exactly the right way to do it. Furthermore, if you decide to provide the module to some third party (publish it, or just provide it to a different department in your company), this is not the expected behaviour and it doesn’t really look like a well-behaved Python module. One expects the log messages to go through the logging system of the top-level application, which, in this case, would be the Python logging system.

Meet the pyo3-logging bridge

So I’m finally getting to the point. The pyo3-log crate is not a module in the Python sense, it can’t be used directly from that world. Instead it provides the Log (the back-end) for Rust’s log library. When logging a message, it’d call into the Python and hand the message over to its logging library.

Of course, it’s not that simple. The smaller detail is that Python doesn’t have the trace log level. It does, however, allow specifying log levels as integers so the library simply extrapolates and uses the next integer. The calling application can add a name for it.

The other one is performance. Once a message is to be logged, it can’t be really helped, one must go to the Python. But there’s usually going to be huge number of trace and debug messages that won’t get logged and these don’t in principle have to go to Python.

Considering Rust is likely going to be used because of performance (compiled language, no garbage collector, better parallelism due to no GIL in Rust code), interspersing the fast Rust code with calls to comparatively slower Python might be a problem. Furthermore, the long-running or parallel Rust code would likely be running without GIL, allowing the other Python threads to run in parallel with it. But every time a trace statement appears in the code, it would have to lock the GIL (wait its turn to get it), call into Python to find out that no, the message doesn’t have to be logged, like all the ones before that. That doesn’t sound very fast and it could discourage the programmer from proper logging.

This is the reason why the library offers caching of the decisions. The idea is that Rust calls into Python only the first time a message of a particular level and target is logged. If Python is configured not to log it, the next message with the same target and level is simply thrown away without all the hassle of acquiring GIL, blocking other threads and such.

The downside is, it is possible to change the configuration of the Python logging at runtime (or simply after the module has already logged something during the initialization). These changes would not be reflected. Therefore, there’s an option not to use the cache and there’s a function to wipe the cache and start over.

Try it out

The use case is a bit niche, but if you’re in a position where it might be useful, please try it out and report any issues or suggestions to the repository. While it is a release with low version number, I feel like it is mostly feature complete. There’s not much that could be added; there’s one TODO about key-value pairs but it is unstable feature of log, so I’m waiting it out.

I do plan to release new version whenever a breaking release of pyo3 happens, to keep up with it.

I’m not going to put examples here, but there’s enough of them in the documentation and the repository itself.