CIP-???? | Plutus Script Caching #1031
Conversation
|
This looks like a very good idea. Not sure if it should be in the CIP, but I would suggest there should always be some kind of metrics associated with LRU caches, tracking at least:
So one can determine:
I would also suggest a pre-load strategy when the node is first run, so the cache is already warm. But again, it may not be necessary depending on how the cache behaves, so metrics would help one work out if pre-loading the cache is necessary or desirable. |
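For illustration, here is a minimal Haskell sketch of the kind of hit/miss/eviction counters such a cache could expose. The names (`CacheMetrics`, `hitRatio`, etc.) are hypothetical and not an existing cardano-node API; this is only to show the shape of the metrics being suggested.

```haskell
-- Hypothetical sketch of hit/miss/eviction counters for an LRU script cache.
-- Not part of any existing cardano-node API.
import Data.IORef (IORef, newIORef, atomicModifyIORef')

data CacheMetrics = CacheMetrics
  { hits      :: IORef Int  -- script found in cache, deserialization skipped
  , misses    :: IORef Int  -- script absent, cold deserialization performed
  , evictions :: IORef Int  -- entries dropped to stay under the memory bound
  }

newCacheMetrics :: IO CacheMetrics
newCacheMetrics = CacheMetrics <$> newIORef 0 <*> newIORef 0 <*> newIORef 0

bump :: IORef Int -> IO ()
bump ref = atomicModifyIORef' ref (\n -> (n + 1, ()))

-- A hit ratio close to 1 suggests the cache (and any pre-loading) is sized well;
-- a high eviction rate suggests the memory bound is too small for the workload.
hitRatio :: Int -> Int -> Double
hitRatio h m = fromIntegral h / fromIntegral (max 1 (h + m))
```

A node could bump `hits` on every cache lookup that succeeds and `misses` on every cold deserialization, exposing both through its existing tracing/metrics machinery; that data would also answer the pre-loading question empirically.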
Looking forward to introducing this at the next CIP meeting (https://hackmd.io/@cip-editors/111) and hopefully the Plutus contingent will be there.
Co-authored-by: Robert Phair <rphair@cosd.com>
|
A quick analysis of the evaluation of 1200 blocks of epoch 543 shows that 504 unique scripts of all versions were evaluated, with a total size of (flat-serialized) scripts of 1,288,321 bytes. This cache is such an obvious optimization that I'm shocked it's never been implemented.
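For rough scale (a back-of-the-envelope on the numbers above, not part of the original analysis): 1,288,321 bytes / 504 scripts is roughly 2.6 KB per unique script on average, and the whole flat-serialized working set is about 1.3 MB, i.e. around 0.5% of the ~256 MB bound discussed below. Deserialized in-memory representations are larger than the flat encoding by some factor that would need measuring, but the headroom appears substantial.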
I love the idea, but I'm not sure whether this is a CIP?
It's not changing how Cardano works. While we could demand such caching for all block-producing node implementations because it is a CIP, other performance criteria are not covered by CIPs, and some alignment needs to happen anyway in a multi-node-implementation world.
|
|
> ## Rationale: how does this CIP achieve its goals?
>
> This proposal follows the same spirit as performance optimizations in other smart contract chains (e.g. Solana's JIT caching model). It does not modify on-chain structures or Plutus semantics — only the **execution engine of node software**.
> If it is not changing anything on-chain, is it even a CIP?

I created a CIP because I want to socialize this change and its impact to the Cardano developer community, and for it to become a hard requirement for all node implementations. The goal of this CIP is to be a first step towards getting rid of the non-linear scaling of fees based on the size of reference script bytes. Indeed, I could have proposed the reversion of that fee change directly in this proposal; however, given how risk-averse core developers tend to be, I figured it would be better to propose this as a standalone first step and, if it is accepted and integrated, to follow with a second proposal that reverts the reference script byte fee scaling change, with the justification that this caching mechanism addresses the concerns the fee scaling change was introduced for.
|
When we created the … If, as @colll78 suggests (and I support the idea), this change should be presented for community endorsement, it could be given a category of … If there is any evidence that Haskell and/or Amaru node architects would generally be interested in what we used to require as enlisting in the CIP process, it would be helpful to see it here... and also to hear what more of them think about this particular proposal.
|
@colll78 the CIP meeting today decided to leave this where it is for now. We recognised in @ch1bo's #1031 (review) that particular software enhancements don't work well in the CIP process: generally because there's no opportunity for the community itself to confirm or support a "standard", and because such changes are more properly requested, documented & progressed as enhancements for that software project. I mentioned that there could be grounds for including it as a CIP anyway, given that Ouroboros improvements are also being documented as CIPs... but @Crypto2099 correctly pointed out that taking this approach on one node implementation wouldn't solve the problem, or even be done the same way, on another (e.g. Rust vs. Haskell). So it appears the "social" value of having this documented in a CIP is problematic, since we wouldn't know how to make it into a usable standard. Currently we can't even give it a category (not properly …).
|
IMO the CIP process likely needs an overhaul. We discussed this at the node diversity workshop, observing that Ethereum has:

- EIP - improvements to the protocol itself that all nodes must agree on and adopt (if approved)

(There might have been a third one, but I can't remember the details!)

So perhaps we need:

- CPS - generic problem statements that might have a variety of possible solutions

The way I would expect Phil's goal to progress through these then is: …
In particular, the division between these artifacts is around "level of social consensus needed", which is the division you want when designing a process that is meant to reach and enforce a social consensus. |
|
thanks @Quantumplation ... let's continue that suggestion here: |
|
After discussion with @Quantumplation I'm on board with his approach. I will edit this CIP to remove any concrete suggestions regarding node architecture or implementation details, and instead provide concrete constraints on the amount of work nodes do for large adversarial payloads targeting reference script deserialization, ensuring that the work done by nodes to process such payloads is negligible (it can be absorbed by the network), so that we can justify the removal of the fee. Likewise, I will introduce another CIP in the ledger category proposing that the fee-per-reference-script-byte change be reverted once a node is conformant with the first CIP. I already maintain so many CIPs, so if possible, could someone else take responsibility for drafting the CPS describing why the reference script bytes fee is highly detrimental and needs a solution to remove it?
|
The issue with getting rid of the reference script fee is that you can still have attacks that maximize cache misses, in which SPOs would do a lot of work but aren't rewarded for it. I'd really like to get rid of this fee (especially that hideous recursive exponential), but the work in the worst-case scenario needs to be rewarded somehow. My suggestion back then for avoiding this fee was to ensure that the operation required to get the script into a runnable state is really cheap. This is relatively easy to do if you have a data structure without pointers, because you can just read it into memory from wherever you store it, but Plutus scripts aren't such a data structure. I suggested a GHC compact region, which sounds like a fantastic tool for exactly this purpose, but apparently it doesn't quite work for Plutus scripts. I also don't know whether this problem is easy to solve in other languages like Rust (I guess you could just have some binary encoding with 'local pointers', and when reading it in all you do is allocate memory for every indirection and substitute in the pointer to that allocated memory). So I think it is possible in principle to avoid this fee, but you really need fast loading of all scripts and not just a cache, at least if you want SPOs to be sufficiently paid for their work in the presence of attacks.
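To illustrate the compact-region idea in general terms, here is a minimal sketch using `GHC.Compact` from the `ghc-compact` package. `DeserializedScript` is a placeholder type invented for this example, and, as noted above, real Plutus script values reportedly cannot be held in a compact region as-is, so this only shows the general mechanism.

```haskell
-- Sketch only: a compact region keeps a fully evaluated structure in one
-- contiguous region, so "loading" it later is a pointer dereference rather
-- than a re-deserialization. DeserializedScript stands in for whatever the
-- real in-memory script representation would be.
import GHC.Compact (Compact, compact, getCompact, compactSize)

data DeserializedScript = DeserializedScript
  { scriptVersion :: Int
  , scriptBody    :: [Int]  -- placeholder for the decoded term structure
  }

-- Pay the deserialization/evaluation cost once, then keep the result compacted.
storeCompacted :: DeserializedScript -> IO (Compact DeserializedScript)
storeCompacted = compact

-- Later uses need no parsing and no re-allocation.
loadCompacted :: Compact DeserializedScript -> DeserializedScript
loadCompacted = getCompact

main :: IO ()
main = do
  region <- storeCompacted (DeserializedScript 3 [1, 2, 3])
  bytes  <- compactSize region
  print (scriptVersion (loadCompacted region), bytes)
```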
|
Firstly, I agree with you 100% that we should work to reduce the cost of deserialization / setup time for scripts in the first place; that will benefit us regardless. However, I think we can get rid of the reference script bytes fee with a cache plus a high one-time fee for reference script deployment. On Solana, for instance, to deploy the equivalent of what would be a 100kb reference script on Cardano, the transaction fee incurred would be 89 SOL, or roughly $14,906. On Cardano, instead of charging a lot for reference script deployment, the transaction fee to deploy a ~16kb reference script is roughly 1 ada, and we prevent bloat by requiring a large min-ada deposit to compensate for the low fee; the min-ada required for a UTxO with a ~16kb reference script is roughly ~68 ada. The issue is that users can simply reclaim this ada at any time by sending the reference script UTxO back to themselves (very low tx fee). Instead, we should make the transaction fee for deploying such large reference scripts non-trivial. Developers would universally prefer to pay a very high one-time fee to deploy a contract in order to reduce the fee per transaction that uses their contract (the reference script byte fee). With the above, it would cost hundreds of thousands of dollars for a malicious attacker to construct malicious payloads that attempt to create cache misses. Another future solution is to allow reference scripts to be submitted as blob transactions and cost those transactions accordingly. Blobs will likely be released in the next year or two and will allow publishing large amounts of data directly to the chain outside of the UTxO set, but such that it can still be referenced in the execution layer and accessed by the ledger.
|
Sadly this doesn't quite work, because an attacker doesn't need to deploy scripts themselves to cause cache misses. As long as there are enough reference scripts around to exceed the size of the cache plus what you need to fill up one transaction, the attacker can just cycle through all of them. And since the attack doesn't require the scripts to succeed, they can all just be failed really quickly by providing …

Sorry for always being the guy saying that something doesn't work 😅 But I really like these suggestions, and if you have more thoughts I'd like to hear them!
|
There are so many cache replacement policies that it's not obvious why LRU is the best. If your goal is reducing fees, then I'd suggest a random replacement policy, or at least have some degree of randomness. Like @WhatisRT said - any deterministic algorithm would be subject to certain access patterns that make it perform poorly. |
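A minimal sketch of what a random-replacement policy could look like, purely for illustration: `ScriptHash` is a placeholder, the code is not tied to any actual node data structure, and it assumes a capacity of at least one entry.

```haskell
-- Illustrative random-replacement cache: when full, a victim is chosen
-- uniformly at random, so an attacker cannot predict which scripts will be
-- evicted by crafting a particular access pattern.
import qualified Data.Map.Strict as Map
import           Data.Map.Strict (Map)
import           System.Random (randomRIO)

type ScriptHash = String  -- placeholder for the real hash type
type Cache v = Map ScriptHash v

insertRandomEvict :: Int -> ScriptHash -> v -> Cache v -> IO (Cache v)
insertRandomEvict capacity h v cache
  | Map.member h cache || Map.size cache < capacity =
      pure (Map.insert h v cache)
  | otherwise = do
      -- evict a uniformly random entry before inserting the new one
      idx <- randomRIO (0, Map.size cache - 1)
      let (victim, _) = Map.elemAt idx cache
      pure (Map.insert h v (Map.delete victim cache))
```

In practice one could also blend policies, e.g. LRU ordering with a randomized choice among the oldest few entries, which keeps locality benefits while making the eviction order unpredictable.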
I disagree. Why would the cache store failing scripts? A transaction with a script that fails in phase 2 consumes collateral, which can of course be costed higher than a legitimate transaction (and should be, because there is no reason any honest actor would ever submit such a transaction). I think that to state this doesn't work, we need more concrete proof, because this is how Solana addresses the exact same problem, and it is what allows them to avoid costing deserialization. They use an LRU cache with a high degree of randomness baked in. If this doesn't work, then the Solana mainnet is vulnerable to the same exact deserialization attack.
The cache has to store failing scripts because the attack gets even easier if it doesn't. Remember that cache misses are the issue here, and so if I know my scripts aren't going to be cached I can just do many transactions with exactly the same scripts. And yes, you could not cache the scripts and charge a higher collateral that makes up for this difference. That means that you're now moving the complexity from the fee calculation to the collateral calculation. This is a bit nicer for the users because the actual fees are lower, but it doesn't actually make things easier. Another issue is that you can also do this attack with succeeding scripts. I don't know how viable it is, but I'm sure you could write some tooling that finds scripts on mainnet that you can use for your attack. So if this attack is viable and there are enough honest scripts that you can abuse then all of these optimizations won't save you.
Do you have a source for this? I wonder if Solana just has a massive cache size which they can do because they require beefy machines. Randomness also helps a lot here because you need a much larger pool of scripts to choose from to get a large amount of cache misses. And if you only have a couple of cache misses then there's no attack. However, all of this requires benchmarking & simulation. Our standard of proof shouldn't be 'Solana does it so it must be fine'. They have different hardware requirements, different consensus and lots of other differences, so what works for them might not work for us. |
Introducing very high fees for transactions that fail phase 2 validation is in and of itself a good change that encourages the intended behavior of the network. It is impossible for an honest party to submit a phase-2-invalid transaction unintentionally. There are close to zero such transactions throughout the history of the Cardano mainnet, and all of them have been submitted deliberately, for testing or otherwise. The "complexity" of collateral calculation should be simple relative to the complexity of fee calculation, because the goal of collateral calculation is just to make it very high, as we know that an honest user should never have their collateral consumed.
I agree that this alone shouldn't be proof that the solution will work for us; my point was that, given this is the case, we cannot immediately conclude that it will not work for us. It provides an argument for why we should explore this as a potential solution on Cardano.
|
hello! Cool proposal. This seems firmly to be more of an implementation-specific detail; I see it in a similar vein to LSM trees / UTxO-HD, where it is an optimisation that a node implementer may or may not implement. That said, I feel this proposal doesn't exactly fit into the current CIP landscape while discussions of CIP process evolution continue.
|
On the topic of whether or not this CIP is "on topic": I would like to say that I've loved the conversation that has already occurred within this thread, but it raises an important point, that being: …
Also, I do like the idea of creating a standard for … I'm not satisfied with @colll78's argument that "Solana does it this way so we should too!" (I know that's reductive and not exactly what he said, don't kill me), but I'm also not convinced by @WhatisRT's argument that this one solution may lead to other issues and so isn't worth considering. We've already rather poorly traded one issue (a DoS vulnerability from the deserialization of reference scripts) for another (extremely high transaction costs for transactions that have a legitimate need for multiple reference scripts). Finding solutions that bring the "per-tx" fees for end users down is admirable and should be supported, so I would love to see how we move this into a "next phase" of testing and simulation to see whether Phil's performance assumptions are true or false.
|
Ah, sorry, I didn't want to give the impression that some things aren't worth considering. We just need to make sure that we don't allow for attacks of this kind, which requires careful analysis. Let me also suggest a hybrid approach: if the attack becomes harder to execute, we could lower the fees. Maybe a randomized cache combined with some other strategies already makes this attack expensive for the attacker, and then the fees could be lowered to compensate. I don't have the time to look into this, but it might be worth a shot if anyone is interested.
Summary
This CIP proposes introducing a memory-bound, Least Recently Used (LRU) in-memory caching mechanism within `cardano-node` to store deserialized Plutus scripts. The goal is to avoid redundant deserialization of scripts during transaction and block validation, particularly for frequently reused scripts (e.g., in DeFi or DAO contracts).

By caching deserialized `PlutusScript` objects keyed by their script hash, and reusing them when present, nodes can significantly reduce CPU overhead without changing any ledger rules or validation outcomes. Scripts not found in the cache will be deserialized on demand (cold reload), preserving correctness.

This is a runtime-only optimization that applies at the `cardano-node` level. It does not require any changes to the `cardano-ledger` repository or protocol parameters, and is fully backwards compatible and deterministic.

The proposed cache would be:
Importantly, if you consider the ledger history on mainnet over the past few months, the presence of such a caching mechanism with a bound of ~256MB would have reduced redundant deserializations to near zero. It also vastly increases the cost of spam attacks related to deserialization, as the attacker would need to execute unique scripts that are not in the cache, which requires either submitting each unique script as a reference script in an output on the chain (expensive, large transactions) or including each unique script in the witness set (which increases the tx size and fee).
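To make the shape of the proposal concrete, here is a hedged Haskell sketch of a byte-bounded LRU cache keyed by script hash. Everything here (`ScriptHash`, `Entry`, `ScriptCache`, the pure-map representation) is illustrative only and is not the representation `cardano-node` would actually use; a real implementation would also need thread safety and accurate per-entry size accounting.

```haskell
-- Hypothetical sketch: deserialized scripts keyed by script hash, with an
-- approximate memory bound enforced by LRU eviction. Names and types are
-- placeholders, not existing cardano-node or cardano-ledger definitions.
import qualified Data.Map.Strict as Map
import           Data.Map.Strict (Map)

type ScriptHash = String  -- stands in for the real script hash type

data Entry s = Entry { script :: s, sizeBytes :: Int, lastUsed :: Int }

data ScriptCache s = ScriptCache
  { entries    :: Map ScriptHash (Entry s)
  , totalBytes :: Int
  , maxBytes   :: Int  -- e.g. ~256 MB, as suggested above
  , clock      :: Int  -- logical clock used for recency ordering
  }

-- A hit returns the cached script and marks it as recently used,
-- skipping deserialization entirely.
lookupScript :: ScriptHash -> ScriptCache s -> (Maybe s, ScriptCache s)
lookupScript h c = case Map.lookup h (entries c) of
  Nothing -> (Nothing, c)
  Just e  ->
    ( Just (script e)
    , c { entries = Map.insert h (e { lastUsed = clock c }) (entries c)
        , clock   = clock c + 1 } )

-- On a miss, the caller deserializes cold and inserts the result here;
-- least-recently-used entries are evicted until the byte bound is respected.
insertScript :: ScriptHash -> s -> Int -> ScriptCache s -> ScriptCache s
insertScript h s sz c0 =
  evict (c0 { entries    = Map.insert h (Entry s sz (clock c0)) (entries c0)
            , totalBytes = totalBytes c0 + sz
            , clock      = clock c0 + 1 })
  where
    evict c
      | totalBytes c <= maxBytes c || Map.size (entries c) <= 1 = c
      | otherwise =
          let older k v acc@(_, best)
                | lastUsed v < lastUsed best = (k, v)
                | otherwise                  = acc
              (victim, e) = Map.foldrWithKey older (h, Entry s sz maxBound) (entries c)
          in evict (c { entries    = Map.delete victim (entries c)
                      , totalBytes = totalBytes c - sizeBytes e })
```

Validation outcomes are unaffected by design: a miss simply falls back to cold deserialization, exactly as nodes behave today.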
Link to rendered proposal