Dark side of POSIX APIs

One user of the signal-hook crate expressed an interest in knowing the PID and UID of the process that sent the signal. As a general idea I liked it. The low-level sigaction does indeed provide this information and it might be useful (more than once I have wondered why the program terminated because someone sent it a SIGTERM, but the logs contained no information about what other process it was; knowing its PID is the first step to figuring it out, though I don’t know of a nice portable way to convert a PID to a name of a process or something of similar usability). In fact, I had a long-standing open issue on the repository about being able to somehow make the additional information in the siginfo_t structure available.

So I decided to give it a go and implement it. In fact, the API already had a place where this would plug in, one that would make it immediately available to all the signal-hook-* asynchronous add-ons too, not only to the iterator inside signal-hook.

For a few reasons, this turned out to be a bigger challenge than I expected.

The usual async-signal-safe trouble

Inside a signal handler, one can do almost nothing. The handler can run on any thread at any time, so one can’t allocate in there (what if that thread was actually in the middle of some allocator routine with inconsistent thread-local data structures?), can’t lock stuff (the lock could be held by the same thread, which would deadlock)… Actually, there’s a list of C standard library functions that can be used in there, but calling anything else is UB.

So signal-hook gives the user a bunch of patterns for postponing the reaction to a signal until later, outside of the actual handler. There the user can enjoy the full feature set of Rust, or of anything one can call from it. But it’s one thing to export „This signal happened at least once“ and another to export „This signal with all this auxiliary information happened“. For one, collating multiple instances loses much less information in the first case than in the second. For another, the first is a single bit of information, while here we have much more.
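To make the „happened at least once“ case concrete, here is a minimal sketch using only the standard library. The `Flag` type and its method names are made up for this illustration and are not signal-hook's API; the point is only that any number of deliveries collapse into a single bit.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical one-bit "this signal happened" marker.
pub struct Flag(AtomicBool);

impl Flag {
    pub const fn new() -> Self {
        Flag(AtomicBool::new(false))
    }

    /// What a handler would do: a single atomic store,
    /// no allocation and no locking.
    pub fn raise(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    /// What ordinary code does later: read the bit and clear it.
    /// Two deliveries in a row are indistinguishable from one.
    pub fn take(&self) -> bool {
        self.0.swap(false, Ordering::SeqCst)
    }
}
```

Calling `raise` twice and then `take` once reports `true` a single time, which is fine for a bare notification but would silently drop the second sender's PID and UID.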

For obvious reasons, having something like a global Mutex<Vec<Info>> doesn’t work (locking, allocations, …). Having a static mutable variable doesn’t either, because the same signal can happen on two different threads at the same time (thank you, POSIX!, you’re not making it any easier).

The good news is, atomic variables work. The less good news is, all the information I wanted to take out was more than 64 bits. And some of the platforms signal-hook has in its CI don’t even have 64-bit atomics! So encoding everything into a 64-bit number is not the right way.
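A rough illustration of the size problem, assuming the PID and the UID are both 32 bits wide (true on common Linux targets, not guaranteed everywhere). `pack` and `unpack` are hypothetical helpers for this sketch only.

```rust
/// Pack a 32-bit PID and a 32-bit UID into one 64-bit word.
/// Assumption: both types are 32 bits wide on the target.
pub fn pack(pid: i32, uid: u32) -> u64 {
    // Reinterpret the PID's bits as unsigned before widening,
    // so a negative value does not smear ones over the UID half.
    ((pid as u32 as u64) << 32) | uid as u64
}

/// Inverse of `pack`.
pub fn unpack(packed: u64) -> (i32, u32) {
    ((packed >> 32) as u32 as i32, packed as u32)
}
```

The two values already fill the whole word, so nothing is left for the signal number or the cause, and the word itself needs a 64-bit atomic that some targets lack.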

I had to implement a kind of channel-like thing. It has some preallocated slots to store the information in and a few atomic variables around them, which make sure two threads never write to the same slot and that the reader knows which slots to read from. They do a kind of locking, but instead of blocking, the channel throws items out of the window if it is full. There’s not much better it could do: allocating is clearly not possible, and if the reader end doesn’t keep up, any preallocated capacity will get exhausted eventually.

Also, the channel only provides storage of the data, not any kind of wakeup (but other parts of the crate already supply that).
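The idea of the channel can be sketched as follows. This is a simplified illustration, not the code signal-hook actually uses: the names (`Channel`, `Info`, `CAPACITY`), the per-slot state machine and the choice to keep every field in its own 32-bit atomic are all assumptions made for the example. It does not preserve ordering between slots and, like the real thing, it drops items when full.

```rust
use std::sync::atomic::{AtomicI32, AtomicU32, AtomicU8, Ordering};

// Slot states; a slot is touched only by whoever moved it out of
// EMPTY (a writer) or out of FULL (the reader).
const EMPTY: u8 = 0;
const WRITING: u8 = 1;
const FULL: u8 = 2;
const READING: u8 = 3;

/// Number of preallocated slots (arbitrary for this sketch).
pub const CAPACITY: usize = 4;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Info {
    pub signal: i32,
    pub pid: i32,
    pub uid: u32,
}

struct Slot {
    state: AtomicU8,
    signal: AtomicI32,
    pid: AtomicI32,
    uid: AtomicU32,
}

impl Slot {
    const fn new() -> Self {
        Slot {
            state: AtomicU8::new(EMPTY),
            signal: AtomicI32::new(0),
            pid: AtomicI32::new(0),
            uid: AtomicU32::new(0),
        }
    }
}

pub struct Channel {
    slots: [Slot; CAPACITY],
}

impl Channel {
    /// `const`, so the channel could live in a `static`.
    pub const fn new() -> Self {
        const INIT: Slot = Slot::new();
        Channel { slots: [INIT; CAPACITY] }
    }

    /// Writer side: claim a free slot, fill it, publish it.
    /// Returns `false` (the item is dropped) if every slot is taken.
    pub fn push(&self, info: Info) -> bool {
        for slot in &self.slots {
            let claimed = slot
                .state
                .compare_exchange(EMPTY, WRITING, Ordering::Acquire, Ordering::Relaxed)
                .is_ok();
            if claimed {
                slot.signal.store(info.signal, Ordering::Relaxed);
                slot.pid.store(info.pid, Ordering::Relaxed);
                slot.uid.store(info.uid, Ordering::Relaxed);
                // Release makes the stores above visible to the reader.
                slot.state.store(FULL, Ordering::Release);
                return true;
            }
        }
        false
    }

    /// Reader side: take the first published slot, if any.
    pub fn pop(&self) -> Option<Info> {
        for slot in &self.slots {
            let claimed = slot
                .state
                .compare_exchange(FULL, READING, Ordering::Acquire, Ordering::Relaxed)
                .is_ok();
            if claimed {
                let info = Info {
                    signal: slot.signal.load(Ordering::Relaxed),
                    pid: slot.pid.load(Ordering::Relaxed),
                    uid: slot.uid.load(Ordering::Relaxed),
                };
                slot.state.store(EMPTY, Ordering::Release);
                return Some(info);
            }
        }
        None
    }
}
```

Neither side ever waits: a writer that finds no empty slot gives up at once, which is what keeps `push` usable from a handler, at the price of losing items when the reader falls behind.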

Reading data out of `siginfo_t`

So the kernel gives us a nice siginfo_t structure from which we can read any and all of the extra information we are interested in. But there’s a catch. The POSIX standard leaves a lot of room for interpretation of what that structure should look like.

Citing from the manual page of sigaction:

si_signo, si_errno and si_code are defined for all signals. (si_errno is generally unused on Linux.) The rest of the struct may be a union, so that one should read only the fields that are meaningful for the given signal:

OK, so if I’m reading this correctly, we get this:

si_signo, which is the same as the signal number we already got. So not really useful.
si_errno, which is useless on Linux at least (might be useful somewhere else).
si_code, which tells us what caused the signal to happen (was it sent by the kernel, or some process, or was it caused by a timer, …). We actually need that to access some of the other fields, otherwise they might not be there because of some union-style optimization to make the temporary on-stack structure a tiny bit smaller. If we access them and they are not there, it may even be UB (who knows at this point, it’s an interaction of C and Rust, better not even try).

We would be interested in the si_pid and si_uid fields. According to the libc::siginfo_t documentation, there are methods for these (not fields, because of unions). And they are unsafe, apparently because we first have to check the si_code (their unsafety is not explicitly documented, but one can assume).

The `SI_*` constants

Long story short, while Rust’s libc bindings give one access to the si_code field, they don’t export the actual constants needed to know what that value means. While the maintainers are generally open to adding these constants, I don’t have access to the large number of platforms needed to figure out what values the constants have on each (yes, they are not the same) and which constants are even available on each of them (for example, SI_KERNEL seems to be available only on Linux).

If you get an idea along the lines of „Well, I’m interested in SIGINT, where else could that come from than from another process“, then it’s worth mentioning that if you press CTRL+C in a terminal, it is the kernel that sends the signal on behalf of that PTY, not the terminal emulator. At least on Linux, you’ll get SI_KERNEL and no PID or UID, so following that idea may lead to UB.

Furthermore, the si_pid and such methods are not available on all the platforms signal-hook compiles on.
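The rule „check si_code first, touch si_pid and si_uid only afterwards“ can be sketched like this. The numeric values are the Linux ones and are hard-coded here purely as an assumption for the example; other platforms use different values and different sets of constants. The `Cause` and `Process` types and the `decode` function are hypothetical simplifications, and the closure stands in for the unsafe field access.

```rust
// Linux values of the si_code constants (assumption: Linux only).
pub const SI_USER: i32 = 0;
pub const SI_QUEUE: i32 = -1;
pub const SI_TKILL: i32 = -6;
pub const SI_KERNEL: i32 = 0x80;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Process {
    pub pid: i32,
    pub uid: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cause {
    Kernel,
    SentByProcess,
    Unknown,
}

/// Decide from `si_code` whether the sender fields exist at all.
/// `read_sender` models the access to si_pid/si_uid and is called
/// only for codes that are known to carry a sending process.
pub fn decode(
    si_code: i32,
    read_sender: impl FnOnce() -> Process,
) -> (Cause, Option<Process>) {
    match si_code {
        SI_USER | SI_QUEUE | SI_TKILL => (Cause::SentByProcess, Some(read_sender())),
        SI_KERNEL => (Cause::Kernel, None),
        // Anything unrecognized: report it, but read nothing more.
        _ => (Cause::Unknown, None),
    }
}
```

With this shape, the CTRL+C case ends up as `(Cause::Kernel, None)` without the sender fields ever being read.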

So while the theoretically right way to move forward would be to go over a lot of platforms and extend libc with each of them, it would take far more time than what I have available. As a workaround, I’ve decided to write a minimal extraction code in C (which has access to the actual platform C headers, including the ability to detect if some constant exists by #ifdefs).

As having a working C toolchain available may be a bit of a pain (certainly in some cross-compiling situations), and it adds another dependency (cc), this decoding ability is gated behind a feature flag.

MacOS and the `si_code`

To add insult to injury, apparently on MacOS the si_code field is there, the constants are there, but the kernel does not set them. The field is always 0. As a small compensation, the structure is not a union, so we can read its fields no matter how the signal happened. In case there’s no information about a process, both the PID and the UID field are set to 0.

So in an effort to unify these things a little bit, there’s a special case for MacOS to deal with this. Users of signal-hook don’t need to worry about it… well, it probably isn’t technically a violation of the POSIX standard, the standard is just implemented in a not very useful way. But parts of the information won’t be available on MacOS.
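A sketch of what such a special case could look like, based only on the behavior described above (the fields are always readable, and both are 0 when no process is known). The function name and the `Process` type are hypothetical, not the crate's actual code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Process {
    pub pid: i32,
    pub uid: u32,
}

/// MacOS-style fallback: si_code says nothing, so the only hint
/// about "no sending process" is that both fields are zero.
pub fn sender_without_si_code(si_pid: i32, si_uid: u32) -> Option<Process> {
    if si_pid == 0 && si_uid == 0 {
        None
    } else {
        Some(Process { pid: si_pid, uid: si_uid })
    }
}
```

This recovers the sender when there is one, but it cannot tell what kind of event caused the signal, which is the part of the information that stays unavailable.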

How reliable it is

I’ve done manual testing on Linux, and the CI does some automated testing on a few platforms (actually checking the value of the extracted PID) and compiles on many more. So it should work.

But some of the information is platform-specific, unknown situations are generally reported as Unknown, and the kernel or the library may have consolidated several instances of the same signal into one. The extracted information should therefore be treated as mostly informative (interesting info in logs, for example). That is, if one gets some information, it should be correct, but one is not guaranteed to always get something useful.

The results

Now you can get the information about where the signal came from, something like this:

use signal_hook::consts::TERM_SIGNALS;
use signal_hook::iterator::SignalsInfo;
use signal_hook::iterator::exfiltrator::WithOrigin;

type Error = Box<dyn std::error::Error + Send + Sync>;

fn main() -> Result<(), Error> {
    let mut signals = SignalsInfo::<WithOrigin>::new(TERM_SIGNALS)?;
    for signal in &mut signals {
        dbg!(signal);
    }
    Ok(())
}

After first pressing CTRL+C in the terminal and then sending a signal with killall, the output would look something like this (yes, it would be nicer if the signal got translated to something like SIGTERM instead of the number, I’ll see about that in the future):

[src/main.rs:10] signal = Origin {
    signal: 2,
    process: None,
    cause: Kernel,
}
[src/main.rs:10] signal = Origin {
    signal: 15,
    process: Some(
        Process {
            pid: 25540,
            uid: 1000,
        },
    ),
    cause: Sent(
        User,
    ),
}

Conclusion

When dealing with C++ APIs, one expects that C++ can express some things that are not reasonably easy to translate to Rust and may need some kind of C++-side wrapper to make it usable.

It seems C and the POSIX standard have some of these dark corners too, where the best option is to adopt a similar strategy. It’s not too hard to pull off, but it’s a bit of extra work and relies on having a C compiler available.