IPIP-501: Amino DHT HTTP Trustless Gateway compatible Provider Records #501
---
title: "IPIP-0501: Amino DHT HTTP Provider Records"
date: 2025-04-10
ipip: proposal
editors:
  - name: Guillaume Michel
    github: guillaumemichel
    url: https://guillaume.michel.id/
    affiliation:
      name: Shipyard
      url: https://ipshipyard.com
  - name: Marcin Rataj
    github: lidel
    url: https://lidel.org/
    affiliation:
      name: Shipyard
      url: https://ipshipyard.com
relatedIssues:
order: 501
tags: ['ipips']
---

## Summary

This IPIP introduces a secure mechanism for advertising `/tls/http`
multiaddresses in the Amino DHT. HTTP servers are now required to host a text
file at the well-known path `.well-known/libp2p/amino/providers` listing the
libp2p peer IDs of authorized providers. This verification step enables DHT
servers to ensure that only approved providers can advertise content,
mitigating potential DDoS attacks and preventing malicious actors from falsely
asserting that an HTTP server hosts content, all while leaving existing libp2p
records unchanged.

## Motivation

Allowing content providers to advertise `/tls/http` multiaddresses within the
Amino DHT is desirable because it broadens the network's interoperability and
accessibility. With the introduction of HTTP retrievals, providers will be able
to serve content from static HTTP hosting providers, such as S3 buckets, and
they should be able to advertise these addresses to the Amino DHT.

The current protocol already allows providers to choose which multiaddresses to
associate with their records, and DHT servers serve all the addresses along
with the provider record, even if they don’t understand them. Example: when
`/webtransport` was rolled out, DHT servers that did not speak WebTransport
still returned `/webtransport` addresses, despite not being able to use them.
Hence advertising `/tls/http` multiaddresses to the Amino DHT is already
possible.

However, since `/tls/http` records are expected to be widely adopted by browser
users, it is essential to mitigate potential Distributed Denial-of-Service
(DDoS) attacks on HTTP servers. If any provider can freely associate arbitrary
`/tls/http` multiaddresses with a provider record, a malicious actor could
trigger significant HTTP traffic to a server they don’t control. We want to
restrict the advertisement of `/tls/http` multiaddresses to hosts controlled by
the provider. This verification would be performed by DHT servers before
associating `/tls/http` multiaddresses with the provider record. Additionally,
this check would eliminate addresses pointing to misconfigured HTTP providers.

This measure prevents HTTP clients (e.g., browser nodes) from being exploited
in DDoS attacks through bogus DHT records. It is essential for integrating IPFS
into browsers, as browser development teams prioritize robust DDoS prevention.

## Detailed design

> **Review comment:** This approach does not address my first issue from #496 (comment) if this is supposed to be "enough" to do trustless-gateway based retrieval.
>
> More info (background, some options, what IPNI does, potential solutions, etc.) is in the linked comment. If we're going to flat out ignore the issue, then we should at least document the ramifications / implied spec changes that come from not choosing an explicit mechanism for handling more than 1 HTTP-based protocol.
>
> **Reply:** Good point! The focus of this IPIP is to define the verification mechanism for HTTP Trustless Gateways advertisements in the Amino DHT, to prevent HTTP-only clients from being used in a DDoS (reflection) attack. In parallel, we should have another IPIP defining the handling of multiple transfer protocols in the DHT. The current IPIP (#501) will probably depend on the future IPIP, so it will block on that.

Providers advertising content hosted on an HTTP server MUST host a text file at
the [well-known location](https://www.rfc-editor.org/rfc/rfc8615)
`.well-known/libp2p/amino/providers`. This file lists the libp2p peer IDs that
are authorized to advertise that HTTP server’s content to the Amino DHT. Each
peer ID MUST use one of the [string representations defined in the libp2p
Peer ID specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation)
(a base58btc-encoded multihash, or a multibase-encoded CID with the
`libp2p-key` codec), with one peer ID per line:

```
12D3KooBase58MH
k51KooBase36CID
```

By listing these peer IDs, the HTTP server grants the corresponding providers
permission to advertise that the server hosts content identified by any CID.
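
As an illustration only, the Python sketch below parses such a file and normalizes every entry to its binary multihash, so that the two string forms above compare equal. It assumes the decoding rule from the libp2p Peer ID specification (strings starting with `1` or `Qm` are bare base58btc multihashes; anything else is a multibase-encoded CIDv1 with the `libp2p-key` codec, `0x72`), supports only the base36 (`k`) and base32 (`b`) multibase prefixes, and silently skips malformed lines. The function names are hypothetical and do not come from any existing library.

```python
from __future__ import annotations

import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
B36_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
LIBP2P_KEY_CODEC = 0x72  # multicodec code of libp2p-key


def _b58decode(s: str) -> bytes:
    """Decode base58btc; each leading '1' stands for one zero byte."""
    n = 0
    for ch in s:
        n = n * 58 + B58_ALPHABET.index(ch)  # ValueError on invalid characters
    leading = len(s) - len(s.lstrip("1"))
    return b"\x00" * leading + n.to_bytes((n.bit_length() + 7) // 8, "big")


def _b36decode(s: str) -> bytes:
    """Decode lowercase base36 (multibase prefix 'k' already removed)."""
    if not s or any(ch not in B36_ALPHABET for ch in s):
        raise ValueError("invalid base36 string")
    n = int(s, 36)
    leading = len(s) - len(s.lstrip("0"))
    return b"\x00" * leading + n.to_bytes((n.bit_length() + 7) // 8, "big")


def _read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned LEB128 varint; IndexError if the buffer is truncated."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def _check_multihash(mh: bytes) -> bytes:
    """Check that <code><length><digest> is well formed and return it unchanged."""
    _code, pos = _read_varint(mh, 0)
    length, pos = _read_varint(mh, pos)
    if len(mh) - pos != length:
        raise ValueError("multihash length mismatch")
    return mh


def _cid_to_multihash(raw: bytes) -> bytes:
    """Strip the CIDv1 header, accepting only the libp2p-key codec."""
    version, pos = _read_varint(raw, 0)
    codec, pos = _read_varint(raw, pos)
    if version != 1 or codec != LIBP2P_KEY_CODEC:
        raise ValueError("not a CIDv1 with the libp2p-key codec")
    return _check_multihash(raw[pos:])


def peer_id_to_multihash(text: str) -> bytes:
    """Normalize a supported peer ID string to its binary multihash."""
    if text.startswith(("1", "Qm")):
        return _check_multihash(_b58decode(text))
    prefix, rest = text[:1], text[1:]
    if prefix == "k":
        return _cid_to_multihash(_b36decode(rest))
    if prefix == "b":
        padded = rest.upper() + "=" * (-len(rest) % 8)
        return _cid_to_multihash(base64.b32decode(padded))
    raise ValueError(f"unsupported peer ID encoding: {text!r}")


def parse_providers_file(body: str) -> set[bytes]:
    """Return the authorized peer IDs (as multihashes), one entry per line."""
    authorized: set[bytes] = set()
    for line in body.splitlines():
        entry = line.strip()
        if not entry:
            continue
        try:
            authorized.add(peer_id_to_multihash(entry))
        except (ValueError, IndexError):
            continue  # sketch policy: ignore malformed lines
    return authorized
```

Comparing decoded multihashes rather than raw strings means a DHT Server does not need to care which representation the HTTP server chose.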

When a DHT Server receives an `ADD_PROVIDER` RPC that includes `/tls/http`
multiaddresses, it MUST verify, for each `/tls/http` address, that the
provider’s peer ID is listed in the file located at
`.well-known/libp2p/amino/providers` on the advertised HTTP server. If the peer
ID is not found, the server MUST NOT associate that `/tls/http` address with
the provider record.

> **Review comment (on the paragraph above):** This approach does not address my second issue from #496 (comment).
>
> A particularly easy example of this is that IIUC just about all the work this proposal tries to prevent from happening with To illustrate:
>
> I gave some options in the linked comment, but I suppose another one if you wanted to push the extra burden onto the routing system (rather than to the clients/servers) is to do more checking across addresses and/or to label which ones have been checked vs not. This does mean even more work for DHT servers though (e.g. you're now exposing them to attack vectors where Malory does a little bit of work and the DHT server ends up doing a bunch of work).
>
> **Reply:** The goal here is not to make libp2p records/maddrs more secure against DDoS (reflection) attacks, nor to prevent libp2p nodes from being fooled into participating in such an attack. The goal is to avoid adding another reflection attack vector to the Amino DHT, by preventing HTTP-only clients from being used in such attacks. This is important since we want/hope for a larger number of HTTP-only clients than we currently have libp2p clients. HTTP-only clients are trustless gateway users in this context. So if this IPIP is adopted, HTTP addresses will have better security guarantees in the Amino DHT than pure libp2p addresses. It is fine to have this inconsistency because 1) the check for HTTP addresses is cheap, 2) we expect HTTP-only clients to outgrow libp2p clients, and 3) verifying libp2p addresses is probably not cost effective (for now).
>
> IIUC trustless gateways don't use
>
> We built on one of the possible suggested solutions.
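
The per-address verification rule can be sketched as follows. This is a hypothetical Python outline, not taken from any DHT implementation: it only understands the `/<ip4|ip6|dns|dns4|dns6>/<host>/tcp/<port>/tls/http` shape, drops any other `/tls/.../http` address as a conservative choice, and receives the authorized peer IDs through an injected `load_authorized(url)` callback (returning `None` when the file cannot be fetched). It also assumes the provider's peer ID and the loaded entries are already in one canonical form, e.g. decoded multihashes.

```python
from __future__ import annotations

from typing import Callable, Hashable, Iterable, Optional

WELL_KNOWN_PATH = "/.well-known/libp2p/amino/providers"
HOST_PROTOCOLS = {"ip4", "ip6", "dns", "dns4", "dns6"}


def tls_http_url(maddr: str) -> Optional[str]:
    """Map /<host-proto>/<host>/tcp/<port>/tls/http to its well-known URL, else None."""
    parts = maddr.strip("/").split("/")
    if (
        len(parts) != 6
        or parts[0] not in HOST_PROTOCOLS
        or parts[2] != "tcp"
        or not parts[3].isdigit()
        or parts[4:] != ["tls", "http"]
    ):
        return None
    host = f"[{parts[1]}]" if parts[0] == "ip6" else parts[1]
    return f"https://{host}:{parts[3]}{WELL_KNOWN_PATH}"


def is_tls_http(maddr: str) -> bool:
    """True for any multiaddr carrying HTTP over TLS (.../tls/http, .../tls/sni/x/http)."""
    parts = maddr.strip("/").split("/")
    return "tls" in parts and parts[-1] == "http"


def filter_provider_addrs(
    provider: Hashable,
    addrs: Iterable[str],
    load_authorized: Callable[[str], Optional[set]],
) -> list[str]:
    """Return the addresses a DHT Server may store with the provider record."""
    kept = []
    for addr in addrs:
        if not is_tls_http(addr):
            kept.append(addr)  # libp2p addresses are handled exactly as before
            continue
        url = tls_http_url(addr)
        if url is None:
            continue  # unsupported /tls/.../http shape: dropped in this sketch
        authorized = load_authorized(url)  # None if the file could not be fetched
        if authorized is not None and provider in authorized:
            kept.append(addr)
    return kept
```

Note that only the `/tls/http` addresses are filtered; the provider record itself, and any libp2p addresses in it, are still accepted.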

DHT Servers SHOULD cache the resolved mapping of each `/tls/http` multiaddress
to its authorized peer IDs for the duration of the `ReprovideInterval` to
minimize repeated HTTP GET requests. Additionally, for addresses that fail
verification, a negative cache entry SHOULD be maintained for 15 minutes to
reduce unnecessary load and mitigate potential abuse.
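
One possible way to implement both cache rules is a single map keyed by the well-known URL. The Python sketch below is hypothetical: it treats a failed fetch or an empty list as a verification failure cached for 15 minutes, caches a successfully fetched list for a caller-supplied `positive_ttl` standing in for the `ReprovideInterval`, and takes an injectable clock so expiry can be tested deterministically.

```python
from __future__ import annotations

import time
from typing import Callable, Optional

NEGATIVE_TTL_SECONDS = 15 * 60  # 15-minute negative cache from the text above


class AuthorizationCache:
    """Cache of well-known URL -> authorized peer IDs, with positive and negative TTLs."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[set[str]]],
        positive_ttl: float,  # stand-in for ReprovideInterval, in seconds
        negative_ttl: float = NEGATIVE_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._positive_ttl = positive_ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        # url -> (expiry timestamp, authorized peer IDs, or None for a failure)
        self._entries: dict[str, tuple[float, Optional[frozenset[str]]]] = {}

    def authorized_peers(self, url: str) -> Optional[frozenset[str]]:
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and cached[0] > now:
            return cached[1]  # fresh entry: no HTTP request is sent
        peers = self._fetch(url)
        if peers:
            value: Optional[frozenset[str]] = frozenset(peers)
            expiry = now + self._positive_ttl
        else:
            value = None  # fetch failed or list empty: negative entry
            expiry = now + self._negative_ttl
        self._entries[url] = (expiry, value)
        return value
```

A DHT Server would then check `peer_id in (cache.authorized_peers(url) or ())` for each `/tls/http` address. With this interpretation, a peer ID newly added to the file is only picked up once the cached list expires.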

## Design rationale

* **Lightweight Verification:** Each HTTP server only answers approximately one
  GET request per DHT Server per `ReprovideInterval`, regardless of the number
  of CIDs being advertised.
* **Revocation Considerations:** If a provider revokes a peer ID (by removing it
  from the file), the previously published records will persist until the next
  reprovide cycle. Thus, a cache duration equal to the `ReprovideInterval` is
  appropriate.
* **Negative Caching:** A 15-minute negative cache prevents malicious actors
  from triggering repeated GET requests. Because generating a DHT provide
  request costs more than performing an HTTP GET, this mitigates amplification
  attacks.

### User benefit

* **HTTP Addresses in DHT Provider Records:** Official support for `/tls/http`
  addresses in the Amino DHT.
* **DHT Delegated Provides (HTTP only):** HTTP servers can delegate their DHT
  provides to any libp2p node identified by its peer ID, and can later revoke
  this permission.
* **DDoS Attack Mitigation:** The Amino DHT cannot be used to direct HTTP
  clients (e.g., browser nodes) into a DDoS attack on an arbitrary HTTP server.

### Cost estimation

For simplicity, we assume that the HTTP content provider is advertising enough
CIDs so that every online DHT server stores at least one associated provider
record.

Given that there are currently around 10k DHT servers in the Amino DHT
([source](https://web.archive.org/web/20250404174746/https://probelab.io/ipfs/amino/#dht-availability-classified-overall-plot)),
the HTTP server is expected to receive roughly 10k GET requests every
`ReprovideInterval`, one from each DHT server.

Around 300k libp2p clients interact with the Amino DHT on a daily basis
([source](https://web.archive.org/web/20250404174746/https://probelab.io/ipfs/amino/#ipfs-servers-vs-clients-plot)).
Each client trying to fetch content from the targeted server sends one GET
request. Therefore, if an attacker advertises a bogus provider record for a
popular CID, they only need about 3% of these clients to contact the HTTP
server in order to mount an attack that would be more resource-intensive than
the countermeasure.
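
The ~3% figure is the ratio of the two request counts above, assuming one GET request per client fetch:

```math
\frac{N_{\text{DHT servers}}}{N_{\text{daily clients}}} \approx \frac{10^{4}}{3 \times 10^{5}} \approx 3.3\%
```

In other words, roughly 9k to 10k fetching clients generate as many GET requests as every DHT server verifying the file once.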

This analysis only covers current libp2p-based nodes. As more users adopt IPFS
in browsers, the number of nodes that could potentially participate in a DDoS
attack will increase, as will the scale of such an attack. Furthermore, users
of the Delegated Routing HTTP API could also contribute to the attack, even if
they are not DHT clients.

The cost of the proposed countermeasure seems reasonable compared to the
potential cost of a real DDoS attack.

### Compatibility

Nothing changes for existing DHT Servers running an older version. Up-to-date
DHT Servers will perform an additional check before associating `/tls/http`
multiaddresses with provider records. Over time, the network will stop
propagating unauthorized HTTP endpoints.

Providers advertising content with `/tls/http` multiaddresses to the Amino DHT
MUST comply with the described check. We are not aware of any `/tls/http`
multiaddresses currently advertised to the Amino DHT, hence no change is
expected from current providers.

The same verification mechanism could be used by other content routing systems,
such as IPNI. For more control, it is recommended that each content routing
system use a dedicated path, e.g., `.well-known/libp2p/ipni/providers` for
IPNI.

### Security

In the current Amino DHT implementation, DHT servers do not verify the
multiaddresses included in a provider record when processing an `ADD_PROVIDER`
request. They only check that a node announces itself, and not another peer,
as the provider.

If a malicious libp2p node crafts a multiaddress that pairs its own valid peer
ID with the IP address of another actual libp2p node, and advertises it in a
provider record for a particular CID, a client attempting to retrieve the
content will encounter a peer ID mismatch error during the libp2p security
handshake. This fail-fast mechanism prevents misuse in pure libp2p records.

The challenge arises with HTTP clients because they do not use peer IDs when
fetching content from an HTTP server. As a result, there is no peer ID check
that could make an HTTP connection fail during the handshake, making it easier
for a malicious actor to advertise an arbitrary peer as the provider for a
popular CID. Such misrepresentation could negatively impact both the client
and the HTTP server.

To prevent this weakening of the system and to stop the DHT from being
exploited as a vector for DDoS attacks using HTTP clients, we introduce an
extra verification step. This step ensures that only authorized libp2p nodes
are allowed to advertise HTTP addresses. With this additional check, DHT HTTP
records will be more reliable and secure than standard libp2p-only records.

A malicious node could still launch a DDoS attack on an HTTP server by
advertising a libp2p TCP multiaddress pointing at that server, such as
`/ip4/A.B.C.D/tcp/443`, in its provider record. This deceptive advertisement
might cause other libp2p nodes to attempt a TCP connection to the HTTP server,
with the connection only failing later. The primary DDoS mitigation goal is to
prevent HTTP-only clients from being drawn into such attacks, since they use
`/tls/http` addresses rather than the unverified libp2p `/tcp/443` addresses.

Another important consideration is maintaining a secure `CID -> peerid`
mapping. While nodes might still advertise content they do not serve, they must
not be allowed to falsely claim that another node provides a CID. This secure
mapping also supports the potential implementation of a caching layer that
verifies `peerid -> []maddrs` mappings, relying on the trustworthy DHT
`CID -> peerid` mapping.

In summary, the extra verification for HTTP addresses does not stop nodes from
advertising content they do not possess; it only prevents them from targeting
other nodes by falsely claiming that those nodes provide content they do not
actually host.

### Alternatives

#### Do nothing: not verifying `/tls/http` addresses at all

In its current state, the Amino DHT allows for `/tls/http` provider records.
However, malicious actors could then use the DHT as a vector for DDoS attacks
in which numerous HTTP-only clients target a specific HTTP server.

See [Cost estimation](#cost-estimation) for why this risk is worth mitigating.

#### Reuse Peer ID Authentication over HTTP

The [Peer ID Authentication over
HTTP](https://github.com/libp2p/specs/blob/master/http/peer-id-auth.md)
mechanism could potentially be reused, but it presents several significant
drawbacks that make it less practical for HTTP-only IPFS providers. Notably,
it lacks a "server-only" authentication option. While mutual authentication
could be halted after the server responds with an HTTP 401 status and includes
its own PeerID in an HTTP header, this approach introduces notable challenges:

* It increases complexity, requiring not just a standard HTTP GET request but
  also the implementation of a custom Authorization header workflow.
* It restricts the HTTP server to representing only a single PeerID, preventing
  the sharding of announcements across multiple PeerIDs and thus making
  multi-user storage providers unfeasible.
* It constrains deployment options, requiring the HTTP server to run custom
  software, which eliminates the possibility of using static-only hosting
  solutions like an S3 bucket.

#### Generic `.well-known/libp2p/peerid` file

PeerIDs that the HTTP server has authorized to advertise content to the Amino
DHT could be listed in the generic `.well-known/libp2p/peerid` file. This file
could also be used to delegate content provision requests to other content
routing systems (for example, IPNI), or more generally by other applications.

However, since modifying the DHT protocol is a long and painful process, the
file used by Amino DHT servers for verification MUST remain stable. Any
alteration to the `.well-known/libp2p/peerid` format would require months or
even years to be fully adopted by DHT servers. In addition, if other
applications begin using this generic file, DHT servers may end up retrieving
unnecessary extra information.

#### Flat `.well-known/libp2p/amino/providers/{peerid}` empty files

An alternative approach is to host an empty file for each authorized provider
peer ID at `.well-known/libp2p/amino/providers/{peerid}`. This approach allows
for HTTP HEAD requests instead of GET requests, which is more efficient on the
wire.

However, this method doesn't accommodate the multiple [string representations
defined in the libp2p Peer ID
specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation),
and would lead to false negatives if the DHT server looks for a different
string representation than the one used on the HTTP server.

#### Reuse `did:web` Method Specification

The [did:web Method Specification](https://w3c-ccg.github.io/did-method-web/)
outlines a mechanism for listing one or more ED25519 keys. However, adopting it
presents several challenges:

* PeerIDs are not simple key fingerprints; they are multihashes derived from a
  protobuf structure.
* The method’s JSON manifest must adhere to a specific schema.
* This results in an overly complex JSON format, necessitating additional
  processing and conversion, which introduces unnecessary complexity to the DHT
  server implementation.

#### Reuse `.well-known/libp2p/protocols` file

The [existing libp2p HTTP
specification](https://github.com/libp2p/specs/tree/master/http#namespace)
states that application protocols can be discovered via the well-known resource
`.well-known/libp2p/protocols`. Adding an `authorized_peers` field to this file
would allow DHT Servers to dispatch a single GET request to learn about both
the authorized PeerIDs and the supported HTTP protocols.

The downside of this approach is that it mixes the responsibilities of
unrelated specs and use cases; however, the performance benefit may be worth
it.

## Out of Scope

* Amino DHT Providing over HTTP
* Amino DHT lookups for HTTP-only Clients
* Amino DHT Delegated Provides for libp2p nodes
* HTTP Provider Records in IPNI

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

> cc @aschmahmann as it aims to address concerns from #496
>
> cc @willscott @masih for visibility and feedback from IPNI
>
> cc @vasco-santos @ribasushi @alanshaw from Storacha side of things
>
> Context: this IPIP explores the idea of only authorized PeerIDs being able to announce `/tls/http` endpoints on Amino DHT (DHT servers would ignore multiaddrs that don't pass this validation), but it could also act as a blueprint for other routing systems that don't want to be used for amplification attacks.
>
> **Why we need this and how this relates to Storacha if it does not use DHT?**
>
> **How does this relate to IPNI or other routing systems?**
>
> https://cid.contact could be mitigated by doing a one-time announcement to Amino DHT. DHT acting as a "hot storage" for routing info, before it gets propagated to systems like IPNI. … `/tls/http` maddrs)