What is Onion Routing? How does it work?

March 23, 2023

Computer networks communicate with each other using protocols. In this context, a protocol is a system of rules that governs how messages are transmitted and interpreted. The Lightning Network’s protocol to transmit payments is described in BOLT #4, and is called “Onion Routing Protocol”.

Onion routing is a technology that precedes the invention of the Lightning Network by 25 years. It’s also used in Tor, hence its name: “The Onion Router”. The Lightning Network uses a slight variation of it called “source-based onion routing”, or SPHINX for short. In this post, we’ll explore how onion routing works.

Why Onion Routing?

There are plenty of different communications protocols available, but since Lightning is a payment network, it makes sense to choose a protocol that reveals as little information as possible about the payments being routed.

If Lightning used the same protocol as the internet, every intermediary would be able to know who the sender is, who the receiver is, and which intermediaries routed the payment. Onion routing becomes an appealing choice because its properties ensure that intermediary nodes:

Onion Routing from a high level

Let’s try to understand how onion routing works using an analogy with containers.

Pretend that Alice wants to pay Dina. First Alice finds a viable route for her payment:

Alice → Bob → Chan → Dina

Then, she builds the “onion”. She starts with Dina and works backward. She puts a secret message (the payment) inside a container addressed to Dina and locks it with a key that only Alice and Dina have. Now she picks up the container, puts it inside another container addressed to Chan, and locks it with a key that only Alice and Chan have. She does the exact same thing for Bob.

Alice sends the container to the first intermediary in the route, Bob. Bob uses his key to unlock his container and sees another container addressed to Chan. He proceeds to send Chan’s container to Chan. Chan does the exact same thing that Bob did and forwards Dina’s container to Dina. Dina opens up her container and finds a payment.

With onion routing, intermediaries like Bob and Chan are unaware of the contents of the message to Dina or the length of the payment path. They only know who sent them the onion and who is receiving the onion next. This protects the message's privacy and routing path. Each intermediary can only access the layer of the message that's addressed to them.

In Lightning's source-based onion routing, the sender selects the payment path and builds the entire onion for that path, which can be seen as a privacy vulnerability. Other routing schemes like blinded routing address this issue by obfuscating part of the payment path from the sender. However, in this article, we'll focus on SPHINX.

Assembling the Onion

Now, let's explore the specifics of onion routing. To start, we need to define some terms:

Constructing the Hop Payloads

Once Alice chooses a payment path, she gathers information from each payment channel through the gossip protocol to create the payload for each hop, essentially telling each hop how to set up the HTLC for the payment being routed.

To set up a proper HTLC, each hop will need:

Most of this data comes from the “channel update” message, which contains information about routing fees, timelock requirements, and the ID of the payment channel. The amount to send is the total amount plus the fees accumulated for each hop, and the payment secret is calculated and embedded in Dina’s payment invoice.

Alice starts from the final node, Dina. She adds the amount to forward, the outgoing timelock value, the payment secret, and the total amount of the payment. Notice that she doesn’t add the channel ID because Dina is the final node and doesn’t need to route the payment to anyone else.

It may seem redundant that Dina's amount to forward and the total amount are the same, but this is because multipart payments can split the total amount between multiple routes, in which case the two values will differ.

For Chan’s payload, Alice adds the channel ID for Chan’s channel with Dina, the amount to forward, and the outgoing timelock value. She then does the same for Bob. Chan charges 100 sats to route a payment through his channel with Dina, so Alice needs to tell Bob to forward the total amount plus Chan's fee. As specified in Chan's channel update message, the outgoing timelock is also increased by 20 blocks. The decrementing timelocks ensure that the HTLC chain is unwound backward in case of a timeout. Finally, Alice also accounts for Bob's fees and timelock constraints, sending him an HTLC of 100,200 satoshis with the timelock set to 700,040.
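The backward walk over the route can be sketched in a few lines of Python. This is only an illustration: the base amount of 100,000 sats and the final timelock of block 700,000 are assumptions inferred from the totals quoted above, as are Bob's 100-sat fee and 20-block delta. The channel IDs, field names, and the `build_payloads` helper are hypothetical.

```python
# Hypothetical routing policies, as a channel update would advertise them.
# Chan's values come from the text; Bob's are inferred from the quoted totals.
POLICIES = {
    "Bob": {"fee_sats": 100, "cltv_delta": 20},
    "Chan": {"fee_sats": 100, "cltv_delta": 20},
}


def build_payloads(route, channels, amount_sats, final_timelock, payment_secret):
    """Walk the route backward; return the hop payloads and the first HTLC."""
    payloads = {
        route[-1]: {
            "amt_to_forward": amount_sats,
            "outgoing_timelock": final_timelock,
            "payment_secret": payment_secret,
            "total_amount": amount_sats,
        }
    }
    amount, timelock = amount_sats, final_timelock
    for hop in reversed(route[:-1]):
        payloads[hop] = {
            "channel_id": channels[hop],
            "amt_to_forward": amount,
            "outgoing_timelock": timelock,
        }
        # The HTLC arriving at this hop must also cover its fee and delta.
        amount += POLICIES[hop]["fee_sats"]
        timelock += POLICIES[hop]["cltv_delta"]
    return payloads, {"amount": amount, "timelock": timelock}


payloads, first_htlc = build_payloads(
    route=["Bob", "Chan", "Dina"],
    channels={"Bob": "bob-chan", "Chan": "chan-dina"},  # made-up channel IDs
    amount_sats=100_000,        # assumed payment amount
    final_timelock=700_000,     # assumed block height for Dina's HTLC
    payment_secret="secret-from-invoice",
)
print(first_htlc)
```

Under these assumptions, Bob's payload tells him to forward 100,100 sats with a timelock of 700,020, and the HTLC Alice offers Bob carries 100,200 sats with a timelock of 700,040.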

Shared Secrets and Key Generation

Next, Alice prepares the onion by generating a shared secret with each hop, including the final node. This shared secret is a value that can be independently generated by both Alice and the hop using their respective private and public keys.

Shared secrets are necessary for onion routing because they allow Alice and each hop to derive the same keys. These keys are then used by Alice to prepare and obfuscate the onion layer, and by the hop to deobfuscate it.

To preserve Alice's privacy, she creates a one-time session key for each onion instead of using her node key to derive the shared secret. She uses this session key for the first hop, and for each subsequent hop, Alice deterministically randomizes the key by multiplying it by a blinding factor. These keys used to create the shared secret are called "ephemeral keys".

Bob, Chan, and Dina also need to arrive at their shared secret with Alice, so they need the ephemeral key used for their session. Alice only puts the first key in the onion to keep the packet size small. Each hop computes the next ephemeral key and embeds it in the onion for the next node. Hops can calculate the blinding factor Alice used from the ephemeral public key they received and their shared secret, allowing them to determine the next ephemeral key.
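To make the key exchange concrete, here is a minimal sketch of the shared secret and blinding steps in pure Python. It assumes the construction as I understand it from BOLT #4: the shared secret is the SHA-256 of the compressed ECDH point, and the blinding factor is the SHA-256 of the ephemeral public key concatenated with the shared secret. The curve arithmetic is a naive, non-constant-time implementation for illustration only, and the `toy_key` helper and its labels are made up.

```python
import hashlib

# secp256k1 parameters (the curve used for Bitcoin and Lightning keys).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def point_mul(k, point):
    """Double-and-add scalar multiplication (not constant time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def compress(point):
    x, y = point
    return bytes([2 + (y & 1)]) + x.to_bytes(32, "big")


def shared_secret(private_key, public_key):
    """ECDH, then SHA-256 of the compressed result."""
    return hashlib.sha256(compress(point_mul(private_key, public_key))).digest()


def blinding_factor(ephemeral_pub, secret):
    digest = hashlib.sha256(compress(ephemeral_pub) + secret).digest()
    return int.from_bytes(digest, "big")


def toy_key(label):
    # Hypothetical helper: deterministic demo keys, never use this for real keys.
    return int.from_bytes(hashlib.sha256(label).digest(), "big") % N


node_privs = [toy_key(b"bob"), toy_key(b"chan"), toy_key(b"dina")]
node_pubs = [point_mul(k, G) for k in node_privs]

# Alice: one session key, blinded again for every subsequent hop.
ephemeral_priv = toy_key(b"alice-session")
first_ephemeral_pub = point_mul(ephemeral_priv, G)
alice_secrets = []
for node_pub in node_pubs:
    ephemeral_pub = point_mul(ephemeral_priv, G)
    secret = shared_secret(ephemeral_priv, node_pub)
    alice_secrets.append(secret)
    ephemeral_priv = ephemeral_priv * blinding_factor(ephemeral_pub, secret) % N

# Hops: each one only sees the ephemeral public key handed to it.
hop_secrets = []
ephemeral_pub = first_ephemeral_pub
for node_priv in node_privs:
    secret = shared_secret(node_priv, ephemeral_pub)
    hop_secrets.append(secret)
    ephemeral_pub = point_mul(blinding_factor(ephemeral_pub, secret), ephemeral_pub)

print(hop_secrets == alice_secrets)
```

Each hop arrives at the same secret as Alice without ever seeing her node key, and a different ephemeral key appears at every hop, so two hops cannot link the packets they saw by comparing keys.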

As discussed earlier, the shared secrets are used to generate keys that both Alice and the corresponding hop use to perform operations on the onion. Let’s go through each key and see what it is used for.

Rho key

The rho key is used by Alice to encrypt an onion layer. This obfuscates the payload and makes it undecipherable by external observers. Only owners of the rho key can decrypt the payload. That’s exactly what the node receiving the onion does: it uses the rho key derived from its shared secret with Alice to decrypt the onion and read the content.

Mu key

Alice uses the mu key to create a checksum for each payload. She gives the checksum to the hop receiving the onion. The hop, in turn, uses the mu key to generate a checksum of the payload it received and checks that it matches the checksum Alice gave it. This is done to check the integrity of the payload and verify that it has not been tampered with.

Pad key

This key is only used by Alice to generate random “junk” data. This data is also part of the onion: regardless of the size of the payment path or how many hops have already forwarded the onion, the onion always has the same size, even if some of its contents are irrelevant filler. This is how onion routing hides the total number of hops, effectively protecting the privacy of the sender and receiver.

Um key

This key is also used to check the integrity of the data contained in the onion, but only when returning errors. And yes, it’s called “um” because it’s “mu” backward. In the case of an error, the hop will create a checksum using the um key, and when the previous node receives back the error, it’ll use the um key to verify the integrity of the message.
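As a rough sketch of how these keys come out of one shared secret: as far as I understand BOLT #4, each key is an HMAC-SHA256 where the key's name (as ASCII bytes) is the HMAC key and the 32-byte shared secret is the message. The shared secret below is a placeholder, and `derive_key` is a name chosen for this example.

```python
import hashlib
import hmac

KEY_TYPES = (b"rho", b"mu", b"pad", b"um")


def derive_key(key_type: bytes, shared_secret: bytes) -> bytes:
    """HMAC-SHA256 with the key type as HMAC key and the secret as message."""
    return hmac.new(key_type, shared_secret, hashlib.sha256).digest()


# Placeholder secret for the demo; a real one comes from the ECDH exchange.
demo_secret = hashlib.sha256(b"demo shared secret").digest()
keys = {name.decode(): derive_key(name, demo_secret) for name in KEY_TYPES}

for name, key in keys.items():
    print(name, key.hex())
```

Because the derivation is deterministic, Alice and the hop end up with identical rho, mu, pad, and um keys as long as they hold the same shared secret.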

Wrapping Onion Layers

The final onion packet looks like the following:

All Alice has right now are the hop payloads and the shared secret for each hop. Let’s see how Alice turns these into the final onion. She starts from the final node and works her way backward.

She begins by creating an empty 1300-byte field, which is the total size of the onion payload. Then she uses the pad key to create a random stream of 1300 bytes. This is essentially junk that will not be used by any hop. It ensures that every onion looks the same, so an observer cannot infer the total number of hops, the sender, or the receiver from the onion.

Then she adds a checksum to the end of the hop payload being processed. In the case of the final node, Dina, the checksum is all zeros, which signals to Dina that she is the intended recipient of the onion. With the checksum appended to the end of the payload, Alice inserts the hop payload at the beginning of the onion and removes the excess junk at the end to keep the onion at the intended size of 1300 bytes.

Alice uses the rho key to create a random byte stream and applies the exclusive or (XOR) operation to the onion payload to generate an obfuscated payload. This operation compares the onion payload and the random byte stream bit by bit and yields 1 in the output only if exactly one of the two input bits is 1, which obfuscates the payload. The cool thing about the exclusive or operation is that if you take the random byte stream and the obfuscated payload and XOR them, you’ll get the original payload back.

Since the node receiving the onion will derive the same rho key, it can generate the same random stream of bytes that Alice generated. This is how each node along the way can deobfuscate its onion and read the contents.
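The XOR round trip is easy to try out. In the sketch below, the byte stream comes from SHA-256 in counter mode, which is a stand-in chosen to keep the example within the standard library; BOLT #4 itself generates the stream with ChaCha20. The rho key and the payload text are placeholders.

```python
import hashlib


def pseudo_random_stream(key: bytes, length: int) -> bytes:
    """Stand-in keystream: SHA-256 in counter mode (not the real cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    """Bitwise XOR of two equally long byte strings."""
    return bytes(a ^ b for a, b in zip(data, stream))


rho_key = hashlib.sha256(b"demo rho key").digest()        # placeholder key
payload = b"forward 100,000 sats to Dina".ljust(1300, b"\x00")

stream = pseudo_random_stream(rho_key, len(payload))
obfuscated = xor(payload, stream)      # what Alice puts in the onion
recovered = xor(obfuscated, stream)    # what the hop computes with the same key

print(recovered == payload)
```

Anyone without the rho key cannot regenerate `stream`, so to them `obfuscated` is indistinguishable from noise.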

With the obfuscated onion for the hop ready, Alice will repeat the same process for the next node. The key difference is that after finishing Dina’s onion, she doesn’t need to generate junk data anymore. She’ll just keep stripping the excess junk after appending the intended payload. Here is a GIF of the entire process.

https://www.youtube.com/watch?v=FzedRXqZDyY

To finish up, Alice takes the final obfuscated onion and appends a checksum, so Bob can verify the integrity of the onion. She adds the session public key, so Bob can use the key to compute the shared secret. Finally, she also adds a version byte to signal how other nodes should interpret the data in it. For the version described in BOLT #4, the version byte is zero.
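For illustration, the finished packet can be modeled as four fixed-size fields. The 1300-byte payload size and the zero version byte come from the text; the 33-byte compressed public key, the 32-byte checksum, and the field order (version, public key, payloads, checksum) are what I recall from BOLT #4 and should be treated as assumptions. The class and field names are made up for this sketch.

```python
from dataclasses import dataclass

VERSION = 0
PUBKEY_SIZE = 33      # compressed secp256k1 public key (assumed size)
PAYLOADS_SIZE = 1300  # obfuscated hop payloads
CHECKSUM_SIZE = 32    # checksum over the payloads (assumed size)
PACKET_SIZE = 1 + PUBKEY_SIZE + PAYLOADS_SIZE + CHECKSUM_SIZE


@dataclass
class OnionPacket:
    version: int
    session_pubkey: bytes
    hop_payloads: bytes
    checksum: bytes

    def serialize(self) -> bytes:
        if (len(self.session_pubkey), len(self.hop_payloads),
                len(self.checksum)) != (PUBKEY_SIZE, PAYLOADS_SIZE, CHECKSUM_SIZE):
            raise ValueError("field has the wrong size")
        return (bytes([self.version]) + self.session_pubkey
                + self.hop_payloads + self.checksum)

    @classmethod
    def parse(cls, raw: bytes) -> "OnionPacket":
        if len(raw) != PACKET_SIZE:
            raise ValueError("onion packet has the wrong size")
        pubkey_end = 1 + PUBKEY_SIZE
        payloads_end = pubkey_end + PAYLOADS_SIZE
        return cls(raw[0], raw[1:pubkey_end],
                   raw[pubkey_end:payloads_end], raw[payloads_end:])


# Dummy field contents, only to show the round trip.
packet = OnionPacket(VERSION, b"\x02" + bytes(32), bytes(1300), bytes(32))
raw = packet.serialize()
print(len(raw), OnionPacket.parse(raw) == packet)
```

Since every field has a fixed size, every onion packet on the wire has the same length no matter how many hops the route contains.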

Forwarding the Onion

To send the onion packet, the sender creates an `update_add_htlc` message that contains the following fields:

With the message ready, Alice sends it to Bob. Upon receiving it, Bob can begin to decode his onion. He first grabs the session key from the onion packet and uses it to derive his shared secret with Alice.

From the shared secret, Bob generates the mu key and uses it to verify that the checksum embedded in the onion packet matches the checksum that he computes with the mu key and the onion payload. If the payload has not been tampered with, both checksums should match.
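Bob's check can be sketched as a keyed checksum comparison. This example assumes the checksum is an HMAC-SHA256 over the 1300-byte payload, and it leaves out any extra data the real protocol may mix into the checksum. The mu key and payload contents are placeholders, and the function names are hypothetical.

```python
import hashlib
import hmac


def compute_checksum(mu_key: bytes, hop_payloads: bytes) -> bytes:
    """Keyed checksum over the onion payload (HMAC-SHA256)."""
    return hmac.new(mu_key, hop_payloads, hashlib.sha256).digest()


def verify_checksum(mu_key: bytes, hop_payloads: bytes, received: bytes) -> bool:
    """Recompute the checksum and compare it in constant time."""
    return hmac.compare_digest(compute_checksum(mu_key, hop_payloads), received)


mu_key = hashlib.sha256(b"demo mu key").digest()   # placeholder key
hop_payloads = bytes(range(256)) * 5 + bytes(20)   # 1300 placeholder bytes

# Alice computes the checksum and ships it with the onion.
checksum = compute_checksum(mu_key, hop_payloads)

# Bob accepts the untouched payload and rejects a modified one.
tampered = bytes([hop_payloads[0] ^ 1]) + hop_payloads[1:]
print(verify_checksum(mu_key, hop_payloads, checksum))
print(verify_checksum(mu_key, tampered, checksum))
```

Flipping even a single bit of the payload changes the recomputed checksum, so Bob rejects the onion instead of forwarding it.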

To prevent any other node in the route from learning how long the path is, Bob appends a 1300-byte-long field to the onion packet, filled with zeros. With the filler data added in, Bob generates a 2600-byte-long stream from the rho key. Bob does the “exclusive or” operation using the generated stream and the onion payload filled with zeros.

Remember what happens when the obfuscated onion payload is fed into the “exclusive or” operation again along with the same byte stream? The original payload comes back. Since Alice and Bob generate the same rho key from their shared secret, Bob will have the deobfuscated packet after the operation. As an added bonus, the operation also turns that 1300-byte-long field filled with zeros into random bytes.

The deobfuscated onion for Bob contains his hop payload data and a fingerprint. Bob saves this fingerprint in order to append it to the onion packet he’ll send to Chan. After that, he detaches his payload from the onion, returning the packet to its original size of 1300 bytes, and randomizes the session key in the same way Alice did. Finally, Bob adds the version byte, the session key, and the fingerprint to the onion payload he prepared and forwards the onion packet in an `update_add_htlc` message to Chan.
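Here is a simplified sketch of one wrap and one peel, to show how the packet stays at 1300 bytes while Bob removes his layer. Several things are assumptions made for brevity: the keystream is SHA-256 in counter mode rather than the real cipher, the payload length is passed in as a parameter instead of being encoded in the payload, the integrity checks are left out, and all names and payload contents are invented.

```python
import hashlib

PACKET_SIZE = 1300


def pseudo_random_stream(key: bytes, length: int) -> bytes:
    """Stand-in keystream: SHA-256 in counter mode (not the real cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap_layer(inner: bytes, hop_payload: bytes, rho_key: bytes) -> bytes:
    """Alice: put the hop payload in front, trim to 1300 bytes, obfuscate."""
    shifted = (hop_payload + inner)[:PACKET_SIZE]
    return xor(shifted, pseudo_random_stream(rho_key, PACKET_SIZE))


def peel_layer(packet: bytes, payload_len: int, rho_key: bytes):
    """Hop: pad with 1300 zero bytes, deobfuscate, detach its own payload."""
    padded = packet + bytes(PACKET_SIZE)
    clear = xor(padded, pseudo_random_stream(rho_key, 2 * PACKET_SIZE))
    payload = clear[:payload_len]
    next_packet = clear[payload_len:payload_len + PACKET_SIZE]
    return payload, next_packet


bob_rho = hashlib.sha256(b"demo rho key for Bob").digest()
# Stand-in for the already obfuscated onion that is meant for Chan.
inner_onion = pseudo_random_stream(b"chan onion", PACKET_SIZE)
# Bob's payload followed by the 32-byte fingerprint he must pass on to Chan.
bob_payload = (b"channel=bob-chan;amt=100100;cltv=700020"
               + hashlib.sha256(b"fingerprint for Chan").digest())

packet_for_bob = wrap_layer(inner_onion, bob_payload, bob_rho)
payload, packet_for_chan = peel_layer(packet_for_bob, len(bob_payload), bob_rho)

print(payload == bob_payload, len(packet_for_chan))
```

The front of `packet_for_chan` is the untouched inner onion, while its tail consists of keystream bytes that replaced the appended zeros, so Chan cannot tell how much was removed before the packet reached him.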

This process continues iteratively for each hop until it reaches the final node, Dina. When Dina receives the `update_add_htlc` message, she sees a payment hash for the secret she generated, which indicates that this HTLC is intended for her. Therefore, Dina only checks the fingerprint and deobfuscates the onion to reveal her payload. The GIF below illustrates the entire process.

https://youtu.be/NhHAE6m9L6A

Error Handling

We covered the success case, where everything works fine, but if an error occurs along the way, a message must be propagated backward to inform the nodes along the path of the problem. This works in a similar manner to regular onion routing. The node that discovered the error will derive the um key from the shared secret and use it to create a verification checksum for the packet it’ll send back. It’ll also derive the ammag key and use it to generate a random byte stream that in turn is used to obfuscate the return packet using the “exclusive or” logical operation.

The node that discovered the error will send the message back to the previous node in the payment path. Each hop will perform the same obfuscation with its own keys until the sender receives the packet. Finally, the sender will deobfuscate the packet and verify it using the ammag and um keys respectively.
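The return path can be sketched as follows. This is a simplified model based on my reading of BOLT #4: the failing node prepends a um-keyed checksum and applies its ammag stream, every hop on the way back adds its own ammag layer, and the sender strips the layers in route order until a checksum verifies. Message padding is omitted, the keystream is a SHA-256 stand-in, and the shared secrets and function names are placeholders.

```python
import hashlib
import hmac


def derive_key(key_type: bytes, shared_secret: bytes) -> bytes:
    return hmac.new(key_type, shared_secret, hashlib.sha256).digest()


def pseudo_random_stream(key: bytes, length: int) -> bytes:
    """Stand-in keystream: SHA-256 in counter mode (not the real cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


def obfuscate(shared_secret: bytes, packet: bytes) -> bytes:
    """XOR the packet with the ammag stream; applying it twice undoes it."""
    stream = pseudo_random_stream(derive_key(b"ammag", shared_secret), len(packet))
    return bytes(a ^ b for a, b in zip(packet, stream))


def create_error(shared_secret: bytes, message: bytes) -> bytes:
    """Failing node: checksum with the um key, then obfuscate with ammag."""
    tag = hmac.new(derive_key(b"um", shared_secret), message, hashlib.sha256).digest()
    return obfuscate(shared_secret, tag + message)


def decode_error(shared_secrets, packet: bytes):
    """Sender: peel one layer per hop until a um checksum verifies."""
    for index, secret in enumerate(shared_secrets):
        packet = obfuscate(secret, packet)
        tag, message = packet[:32], packet[32:]
        expected = hmac.new(derive_key(b"um", secret), message, hashlib.sha256).digest()
        if hmac.compare_digest(tag, expected):
            return index, message
    return None


# Placeholder shared secrets between Alice and Bob, Chan, Dina.
secrets = [hashlib.sha256(name).digest() for name in (b"bob", b"chan", b"dina")]

# Chan reports an error; Bob adds his own layer while relaying it to Alice.
packet = create_error(secrets[1], b"fee insufficient")
packet = obfuscate(secrets[0], packet)

print(decode_error(secrets, packet))
```

Because only the failing node's um key produces a matching checksum, the sender learns both the error message and which hop reported it.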

Errors are usually caused by a problem in the onion, in the node, or in the channel. If you use the Lightning Network regularly, you might have stumbled upon some of those errors, such as “channel disabled” or “fee insufficient”.