Stateless Tor relay

Posted on Oct 7, 2025

What we did

Started with a need

  • Physical attack: we have physical infrastructure in a private place in $location
  • threat model: physical seizure, tampering with hardware. Very specific.
  • started to investigate, ended up in the System Transparency project
  • we should do something diskless, it’s also cheap
    • no disk, no seizure, no logs; better when dealing with police
    • fortunately (or unfortunately) haven’t tested with police yet
    • nodes are running
    • so far only received copyright abuse complaints
    • but we stress tested a scenario where police would come, so we’re ready

We can split the problem

  • some problems are not specific to a stateless relay. E.g. configuration
    • distributing config
    • can be shared with regular relays
    • establish a secure connection between our nodes/relays and configurators (laptops, network, etc.)
    • common issues
  • specific to a stateless relay
    • how can I identify a machine so that state/config belongs to that machine?
    • this is the main problem: give identity

In our use case, we thought:

  • good idea to have pull based config

Ansible is an amazing layer, easy to use

  • but sometimes tricky to push everything, esp. on a new node
  • so we decided: just ship a binary/image of the operating system, and put everything in that image
  • we also needed to discover hardware specs (how much memory, etc.) to know how many Tor instances to put on a node
  • need a ping pong with the config server

The ping pong

  • boot process with System Transparency
  • suddenly you have a machine that boots from the network
    • talks to the TPM, which is the only persistent memory
      • boot 1: generate TPM keys
      • push the fingerprint to the server
      • idea: we trust the first boot/configuration/fingerprint
      • from that moment on: exchange of certificates, so we can do mutual TLS between the node and the server that stores the configuration
    • the node discovers its hardware and tells the server “I have this network, these memory capabilities, etc.” The server builds a list of Tor instances (per-instance configuration, mainly the IP address, plus shared things like family, exit policy, etc.)
      • pushed back to the node; the node applies the configuration
      • then runs the real instances
      • also doing some firewall stuff, not as interesting
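
The ping pong above can be sketched roughly as follows. This is an illustration only: the function names, the JSON shape, and the option names are our invention, not the actual protocol.

```python
import hashlib
import json

def first_boot_enroll(tpm_pubkey: bytes) -> dict:
    """Boot 1: fingerprint the TPM-generated key and enroll it with
    the config server (trust on first use)."""
    fingerprint = hashlib.sha256(tpm_pubkey).hexdigest()
    return {"op": "enroll", "fingerprint": fingerprint}

def describe_hardware(mem_gb: int, addresses: list) -> dict:
    """Later boots: the node reports its capabilities (over mutual TLS)."""
    return {"op": "hello", "mem_gb": mem_gb, "addresses": addresses}

def build_instance_configs(hello: dict, family: str, exit_policy: str) -> list:
    """Server side: one Tor instance per available IP address, with
    shared settings (family, exit policy) applied to every instance."""
    return [
        {"Address": addr, "MyFamily": family, "ExitPolicy": exit_policy}
        for addr in hello["addresses"]
    ]

hello = describe_hardware(mem_gb=32, addresses=["192.0.2.10", "192.0.2.11"])
configs = build_instance_configs(hello, family="$FINGERPRINT", exit_policy="reject *:*")
print(json.dumps(configs, indent=2))
```

The node would then write one torrc per entry in the returned list and start the real instances.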

(many ways one could have done this – another way: just put a hook in the server and use Ansible. Lots of ways it could have been done.)

In the configuration, there is probably sensitive stuff? There is no offline signing of the whole config document because it’s generated. What happens if the config server is hacked? Or do you sign the sensitive parts separately, etc.?

  • we combine approaches
  • the long-term key of a node is generated on the node itself and encrypted (sealed) with the TPM, so only the real node can decrypt it. If the server is compromised, no one can steal the long-term key of the node. It also makes revocation easy.

Compromise the config server -> you can crash/DoS us. But you can’t steal the identities and run with them.
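
The sealing idea can be modeled like this. Note this is a toy stand-in (an HMAC-based stream cipher), not real TPM sealing; in practice you would use tpm2-tools (`tpm2_create`/`tpm2_unseal`) or a TSS library. The point is only the data flow: the TPM secret never leaves the node, and the configurator stores nothing but an opaque blob.

```python
import hashlib
import hmac
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode, used here only as a toy cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def seal(tpm_secret: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ks = keystream(tpm_secret, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, ks))

def unseal(tpm_secret: bytes, blob: bytes) -> bytes:
    nonce, ct = blob[:16], blob[16:]
    ks = keystream(tpm_secret, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))

tpm_secret = secrets.token_bytes(32)            # lives only inside the node's TPM
relay_key = b"long-term relay identity key"     # generated on the node itself
blob = seal(tpm_secret, relay_key)              # this is all the configurator stores
assert unseal(tpm_secret, blob) == relay_key    # only the node can recover it
```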

Most important things for us:

  • long-term keys and identities should only be available to the node’s own physical hardware. Configuration, yes, is important, but a DoS attack is not so bad.

Pull based because: we don’t want the configurator to have credentials to all machines.

  • A configurator can just update configuration / change exit policy.

To change the configuration right now requires rebooting the node.

  • stateless, in our use case: we think it is good that every reboot gives you something fresh. It’s also a way to check that everything is working correctly.

Where is the tor binary? How do you do software updates?

  • we build a Debian image with the Tor repository
  • maybe in the future it could be interesting to have Alpine or something very small (small attack surface)
  • for updates, we have unattended upgrades configured in the host. If we think there’s something important to update, we rebuild the image and reboot.
  • so it’s an actual Debian computer with systemd etc.?
    • yes
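
For reference, pulling tor from the Tor Project’s Debian repository means an apt source roughly like the following; the codename `bookworm` and the keyring path are assumptions, so check the Tor Project’s current instructions:

```text
# /etc/apt/sources.list.d/tor.list
deb [signed-by=/usr/share/keyrings/deb.torproject.org-keyring.gpg] https://deb.torproject.org/torproject.org bookworm main
```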

what is system transparency?

  • in the end it’s basically PXE boot: you want to boot something from the network. But it’s hard to validate the image and so on when you do that normally. ST does that for us, so we can easily PXE boot with ST.
  • rebuilding the image means assembling the Debian image with all the new software and updating it on the server where it’s stored. So updating everything means building a new image. This is also why going to Alpine etc. could be interesting (also arti): the fewer operating system dependencies, the easier it will be to build a 100 MB image and just ship the binary.

The auto update is not running on the image that runs tor?

  • the system can update itself, because it’s a live system in memory, so it can live-patch binaries. But if you reboot, you reset those updates, so you need to maintain the static image. That’s done out of band by a human like bic, but it could be done in CI; it can be improved in many different ways.

Comment: these things will eventually lead to us being able to toy around with remote attestation.

Our experience

  • this is open to broader discussion
  • we have lots of problems we can share with operators community
  • some problems we should fix
  • we would like an easier way to configure, apply, and distribute configuration. E.g., it would be great to have the tor daemon waiting for something machine readable (e.g. JSON): throw it at the daemon and say “this is now your configuration”.
  • Comment: this exists now in tor already! Via control port.
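
The comment is right: tor’s control port already accepts runtime configuration via the SETCONF command (stem’s `Controller.set_options()` wraps it). A minimal sketch of translating a machine-readable JSON document into a SETCONF line; the JSON shape itself is our invention:

```python
import json

def json_to_setconf(doc: str) -> str:
    """Translate a JSON config document into a single control-port
    SETCONF command, quoting values that contain spaces."""
    options = json.loads(doc)
    parts = []
    for key, value in options.items():
        value = str(value)
        parts.append(f'{key}="{value}"' if " " in value else f"{key}={value}")
    return "SETCONF " + " ".join(parts)

doc = '{"ORPort": 9001, "ExitPolicy": "reject *:*", "Nickname": "statelessrelay"}'
print(json_to_setconf(doc))
# SETCONF ORPort=9001 ExitPolicy="reject *:*" Nickname=statelessrelay
```

A real deployment would send this line over the authenticated control socket rather than printing it.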

Questions/Answers/Discussion

question: reboot part

  • Tor assigns flags based on uptime. So if suddenly all the relays have to reboot (just for a tiny config update), that might mess with flag assignment
  • e.g. a relay won’t get the Stable flag if it reboots too often to update
  • ack – we could update config dynamically too. Not hard to extend.
    • boot the first time, back up the keys. Then run Ansible on your running nodes. You can easily do that. Not a problem!

want something easy to manage/maintain

ship binary with configuration already inside?

comment: walking onions

  • a variant of Tor where nodes don’t need a complete view of the network locally, but a guard can prove it’s doing the right thing
  • once we’re there: would like arti and tools (like pluggable transports) that can run on smaller hardware
  • but this is definitely on the radar
  • Comment: TLS local certificate authorities to mutually authenticate cattle on a farm; onion services would be perfect for this.
  • Comment: an IoT device with an open port is annoying. With an onion service it’s so easy: here’s the address! Think this can be a big use of Tor.

We’re encrypting the key with the TPM and backing it up because we didn’t integrate tor to use the key directly via the TPM. This could be improved!

What would it take to lift out knowledge like the secret key – abstract it away so that instead of holding the secret key you hold a key ID, and then say “hey, please sign this doc or do DH”? In lots of places tor is synchronous without returning to the main loop, so it would require lots of refactoring. We have a branch where we tried to build this vault, injecting key material into tor, but we haven’t dared to finalize it. In arti, down the line, something like the key store looks like it could maybe support this kind of thing.

  • tpm library/interface is poor, warning!
  • Comment: and if you don’t understand the docs, it’s not you it’s the docs!

Comment: a few people on the Snowflake project needed a bridge with some of the same keys on multiple nodes to solve scaling issues. You’re running a cluster with a number of keys, and they hardcode the onion key: 12 copies of the same key running on the same machine. The fix is to freeze the file so tor can’t change it. Tor gets upset and you ignore that in the logs.

  • Perfect definition of a hack – it works!
  • But an interesting case where we could benefit from such an implementation in another way

If you restart before UTC midnight, you will have lots of clients that can’t connect through you?

  • everything we do with ctor now, we try to solve as simply as possible, e.g. with voting transparency. What we could do in ctor is add a config knob for “I want to manage my key myself”, and externally have a tool that can derive that key from the TPM.
  • a toleration time where the old key and the new key are both valid for a while (overlap; some clients have the old and some the new)
    • externally generated keys would be helpful – yes please!

Tor started using Ed25519 before it was on the IETF’s radar.

  • everyone used the same signature/public-key structure, but the secret key has different variants, and tor uses the one that the IETF did not standardize. This has given us all kinds of problems, but we have locked ourselves into this setup. We have this blinding mechanism that onion services use.
  • This bears on how you can use the long-term key for other things, the happy family key, etc.
  • So we have a bit of an issue every time it’s an Ed25519 key. RSA keys are easier because they’re completely standard.
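
As we understand the non-standard part: RFC 8032 defines the secret key as a 32-byte seed, while tor stores the expanded 64-byte form (the clamped scalar plus the PRF half), which is what the onion-service blinding operates on. A sketch of the expansion step:

```python
import hashlib

def expand_ed25519_seed(seed: bytes) -> bytes:
    """Derive the 64-byte expanded Ed25519 secret key from a standard
    32-byte seed: SHA-512 the seed, then clamp the scalar half."""
    assert len(seed) == 32
    h = hashlib.sha512(seed).digest()
    scalar = bytearray(h[:32])
    scalar[0] &= 248    # clear the low 3 bits
    scalar[31] &= 127   # clear the top bit
    scalar[31] |= 64    # set the second-highest bit
    return bytes(scalar) + h[32:]

expanded = expand_ed25519_seed(bytes(32))
print(len(expanded))  # 64
```

Once only the expanded form is kept, the seed cannot be recovered, which is why tooling built for seed-based keys doesn’t interoperate.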

happy family secret key

  • do you need to stick it in the TPM and keep it over time?
  • your private key for the relay is in memory? So if you get it from the configurator and never write it to disk, you get the secrecy properties you want
  • but then the configurator knows the key?
    • no, you can use the TPM for storing the secret part on the server.
    • i.e. seal it with the TPM
    • (we back up the key encrypted by the node’s TPM, stored on the configurator. It just stores the encrypted blob and that’s it.)

higher level question

  • if we give people access to our infra, what do we do with family settings?
  • in your situation, if you let other people host relays, I think that is nice. But I would not set them in my family setup, nor mine in yours; they would be in the extended family, which is set up on IP ranges. So if you’re in the same prefix, algorithmically you would be considered to be in the same family.

family

  • the idea is that a single human has control of the keys/computers.

With System Transparency we can verify that something we like is running.

Comment: learn about Tor ramdisk project from gentoo person?

  • maintained images to run tor relays in a ramdisk. Didn’t do TPM or anything, just made a relay without a disk.

If you have the System Transparency idea implemented somehow, a benefit:

  • anyone can verify what is running on that piece of hardware
  • including me: I can look at your customers/users and see they are running the software they (or you) claim they are running. The Tor Project could also run a monitor to see this.
  • what can be verified?
    • you can verify everything, the state of the system
    • so long term: maybe family might need to express better who is controlling what, and who can verify what’s running
  • is this correct?
    • what you’d do is take the known info (what port it listens on and so on)
    • then compute some kind of hash
    • then, together with the image, the tor image, binary, etc.
    • then ask the TPM: is this what’s running?
    • yes, kind of, but it depends on how you implement it
    • you measure the hash of the bootloader into the TPM
    • the bootloader can measure the hash of the kernel and the kernel command line
    • and the kernel could measure the root file system that will be used once it goes into user mode and starts systemd
    • by induction: you don’t have sshd or anything similar that can change the state, you’re just running one tor relay. Modulo bugs in tor: you can reason about what state it is in. You can ask the TPM to sign the state with its key, and if you trust the TPM key you can basically ask: what’s the state? Then you take the hash produced by the TPM and compare it to a hash you build reproducibly.
    • so: relay operator $foo provides a recipe for how to reproducibly build, and then anyone can recompute the expected hash and see if that’s what the TPM arrived at
    • i.e. the hash the TPM arrives at can be recomputed without executing anything
    • okay, got it – that’s crazy nice!
    • note: tor isn’t measuring its config right now, but it could.
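
The induction above is the TPM’s PCR-extend chain. A minimal sketch, where the component byte strings stand in for the real hashed artifacts:

```python
import hashlib

def extend(pcr: bytes, measurement: bytes) -> bytes:
    """PCR extend: new value = SHA-256(old PCR value || measurement)."""
    return hashlib.sha256(pcr + measurement).digest()

# Each boot stage measures the next one before handing over control.
boot_chain = [b"bootloader", b"kernel + cmdline", b"root filesystem"]

pcr = bytes(32)  # PCRs start zeroed at power-on
for component in boot_chain:
    pcr = extend(pcr, hashlib.sha256(component).digest())

# A remote verifier with the reproducibly-built artifacts replays the
# same chain without executing anything, then compares the result to
# the PCR value the TPM signs in its attestation quote.
expected = bytes(32)
for component in boot_chain:
    expected = extend(expected, hashlib.sha256(component).digest())

assert expected == pcr
print(pcr.hex())
```

Because extend is order-sensitive and one-way, the final value pins the whole boot chain, not just the last component.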

It sounds similar to what we do in NixOS: we can rebuild everything.

  • but then you can’t have keys included, since you can’t rebuild those
  • i.e. you have to exclude secret keys from the measurements

Question: how can we be most useful for you people?

  • the onion key via TPM?
  • this more general topic of transparency and TPM stuff: how much is it a good idea to push those things inside tor core itself, and how much should we try to build external tooling? We don’t want to push too many things into tor.
    • ctor: we try to do as little as possible, but if something is blocking we try to come up with very simple solutions. The onion key would fall into that category. Often with warnings: “don’t use this unless you’re one of the three people that really get it”

A guest blog post at the Tor Project, where you present what you and the community are doing with this? Would you be interested?

  • yes
  • comment: presentation at relay ops meet was fantastic

We would like to have full System Transparency deployed. It took a long time for us to do the TPM work; the ST docs have been great. But there’s still no quick explanation that tells people how it works, easily, so that someone who doesn’t have all the background can get it.

goal of blog post: list open problems, and the alternatives?

  • would be nice to encourage people to build different designs of this

comment: one of the hard parts is that there are many different ways to do it. “We did this experiment.” Make people understand they can do it the same way, but they could also do it in a different way.

  • think we could probably do an easier Python PoC of this. Right now it’s easy for us to deploy the Rust binary and everything is simple for us to maintain. But for a public blog post, it would be nice to have a side PoC project, Python based, that people can play with more easily and that can be used as reference docs more easily.
  • comment: the ST folks are also interested in helping with this kind of stuff, and we have resources for these kinds of things.

comment: remote attestation is very interesting!

  • in arti, we’re talking about having images for onion services. Maybe there’s overlap.
  • it would be cool to remotely attest when connecting to an onion service