Skip to content

Conversation

@osiewicz
Copy link

@osiewicz osiewicz commented Aug 7, 2025

Objective

This PR adds stacksafe feature. We ran taffy super_deep benchmark with different display kinds and annotated all stack overflow-inducing call-sites with stacksafe::stacksafe.

Changes that will affect external library users must update RELEASES.md before they will be merged.

Context

There was a prior discussion about stacksafe use in #853.

osiewicz and others added 2 commits August 7, 2025 17:55
Co-authored-by: Anthony Eid <hello@anthonyeid.me>
Co-authored-by: Ben Kunkle <ben@zed.dev>
Co-authored-by: Anthony Eid <hello@anthonyeid.me>
Co-authored-by: Ben Kunkle <ben@zed.dev>
Copy link
Collaborator

@nicoburns nicoburns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@osiewicz
Copy link
Author

osiewicz commented Aug 8, 2025

Oddly enough I could not trigger an overflow with grid layout easily, but yeah, let's stay consistent.

@osiewicz
Copy link
Author

osiewicz commented Aug 8, 2025

I ran benchmarks against taffy without stacksafe enabled:
Grid:

Grid (tl;dr: 8-10% perf regression) grid/wide/31x31/961 time: [656.85 µs 664.80 µs 668.10 µs] change: [−3.3794% −0.2172% +3.4173%] (p = 0.91 > 0.05) No change in performance detected. grid/wide/100x100/10000 time: [9.0808 ms 9.1444 ms 9.1716 ms] change: [−1.4588% +1.1055% +3.6078%] (p = 0.40 > 0.05) No change in performance detected. Benchmarking grid/wide/316x316/99856: Warming up for 3.0000 s Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 9.2s or enable flat sampling. grid/wide/316x316/99856 time: [154.95 ms 155.43 ms 156.40 ms] change: [+2.7604% +3.2565% +3.8323%] (p = 0.00 < 0.05) Performance has regressed. grid/deep/2x2/1024 time: [1.6690 ms 1.6836 ms 1.6905 ms] change: [+7.0525% +8.3548% +9.6254%] (p = 0.00 < 0.05) Performance has regressed. grid/deep/3x3/6561 time: [9.1765 ms 9.2361 ms 9.2734 ms] change: [+6.2477% +7.4270% +8.6247%] (p = 0.00 < 0.05) Performance has regressed. grid/deep/2x2/16384 time: [34.196 ms 34.301 ms 34.448 ms] change: [+6.3845% +7.2430% +8.0459%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) high mild grid/superdeep/1x1/100 time: [175.93 µs 179.93 µs 182.26 µs] change: [+10.696% +12.494% +14.303%] (p = 0.00 < 0.05) Performance has regressed. grid/superdeep/1x1/1000 time: [1.8662 ms 1.8758 ms 1.8812 ms] change: [+9.9307% +10.608% +11.334%] (p = 0.00 < 0.05) Performance has regressed.
Flexbox (tl;dr: >10% perf regression) yoga 'huge nested'/Taffy 0.7 /10000 time: [2.4040 ms 2.4058 ms 2.4077 ms] change: [+11.913% +12.043% +12.166%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe Wide tree/Taffy 0.7 (2-level hierarchy)/10000 time: [3.9141 ms 3.9206 ms 3.9266 ms] change: [+9.9228% +11.393% +12.883%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 10 measurements (20.00%) 2 (20.00%) low severe Deep tree (auto size)/Taffy 0.7 (12-level hierarchy)/4000 time: [2.2935 ms 2.3227 ms 2.3361 ms] change: [+14.057% +16.311% +18.752%] (p = 0.00 < 0.05) Performance has regressed. Deep tree (auto size)/Taffy 0.7 (14-level hierarchy)/10000 time: [5.9560 ms 5.9848 ms 5.9957 ms] change: [+13.704% +15.171% +16.676%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 10 measurements (20.00%) 2 (20.00%) low mild Benchmarking Deep tree (random size)/Taffy 0.7 (12-level hierarchy)/4000: Collecting 10 samples in estimated 5.0668 s (2530 iterationDeep tree (random size)/Taffy 0.7 (12-level hierarchy)/4000 time: [1.5994 ms 1.6166 ms 1.6241 ms] change: [+3.1367% +7.1762% +11.125%] (p = 0.00 < 0.05) Performance has regressed. Benchmarking Deep tree (random size)/Taffy 0.7 (14-level hierarchy)/10000: Collecting 10 samples in estimated 5.2958 s (935 iterationDeep tree (random size)/Taffy 0.7 (14-level hierarchy)/10000 time: [4.3597 ms 4.3754 ms 4.3847 ms] change: [+8.5868% +9.6264% +10.751%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) low mild super deep/Taffy 0.7 /100 time: [196.86 µs 198.73 µs 199.47 µs] change: [+15.924% +18.632% +21.418%] (p = 0.00 < 0.05) Performance has regressed.

That's a bit yikes (even though we at Zed are likely to roll with it). I wonder if all of the callsites need the stacksafe annotation and if not, whether reducing them would reduce the perf impact of stacksafe on taffy's performance.

Copy link
Collaborator

@nicoburns nicoburns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some principled reasoning for where to include the annotations:

As a starting point, we want to consider the functions here: https://docs.rs/taffy/latest/taffy/compute/index.html. Looking at those:

  • compute_flexbox_layout
  • compute_grid_layout
  • compute_block_layout
  • compute_leaf_layout

Should definitely have the annotation. One of these functions gets called each time we recurse into a child node.

  • compute_root_layout doesn't need it. That only gets called once for the very root node of the tree.
  • compute_cached_layout doesn't need it because it is redundant with the layout-mode specific functions above. And we prefer to put it on the other methods to speed up cache access.
  • compute_hidden_layout maybe needs it? This also recurses down the tree but with a very minimal function that doesn't use a lot of stack space: https://docs.rs/taffy/latest/src/taffy/compute/mod.rs.html#266-278
  • round_layout - The inner round_layout_inner should probably have it (you already have this).
  • print_tree - The inner print_node should probably have it

/// Attempts to find a cached layout for the specified node and layout inputs.
///
/// Uses the provided closure to compute the layout (and then stores the result in the cache) if no cached layout is found.
#[cfg_attr(feature = "stacksafe", stacksafe::stacksafe)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing from here might help perf, because this function gets called just prior to Taffy reading from it's cache. It's also probably not necessary here, because:

  • If there's a cache hit then we will immediately return back up the stack
  • If there's a cache miss then we will recurse into another function which will be annotated

}

#[inline(always)]
#[cfg_attr(feature = "stacksafe", stacksafe::stacksafe)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also remove here for same reason as above. This is just prior to hitting the cache, and we'd like to only run this check if we get a cache miss to keep cache access fast.

}

/// Computes the layout of [`LayoutBlockContainer`] according to the block layout algorithm
#[cfg_attr(feature = "stacksafe", stacksafe::stacksafe)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this and perform_final_layout_on_in_flow_children aren't necessary, but are probably harmless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants