Skip to content

Conversation

Schultzer
Copy link

@Schultzer Schultzer commented Oct 9, 2025

This PR optimize the beam_trim pass to avoid quadratic slowdown, it does so by leveraging the sets module and ignore the pass for small functions, the 50 value is currently arbitery and could be changed based on empiracal tests.

As this is my first time trying to contribute, please be kind has I have no idea what I'm doing and how this impact other parts of the compiler, but it seams to work on my project :)

From my own testing this shave of almost two seconds in my library and should be more for even bigger projects:

with changes:

mix compile  69.50s user 7.60s system 191% cpu 40.330 total

OTP 28.1

mix compile  72.18s user 7.74s system 194% cpu 41.047 total

master

mix compile  71.73s user 7.77s system 190% cpu 41.663 total

This PR optimize the beam_trim pass to avoid quadratic slowdown, it does so by leveraging the set module and ignore the pass for small functions, the 50 value is currently arbitery and could be changed based on empiracal tests.

From my own testing this shave of almost two seconds in my library and should be for even bigger projects:

with changes:

mix compile  69.50s user 7.60s system 191% cpu 40.330 total

OTP 28.1

mix compile  72.18s user 7.74s system 194% cpu 41.047 total

master

mix compile  71.73s user 7.77s system 190% cpu 41.663 total
@CLAassistant
Copy link

CLAassistant commented Oct 9, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

github-actions bot commented Oct 9, 2025

CT Test Results

Tests are running... https://github.com/erlang/otp/actions/runs/18387655786

Results for commit d03275e

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

  • No CT logs found
  • No HTML docs found
  • No Windows Installer found

// Erlang/OTP Github Action Bot

@jhogberg
Copy link
Contributor

jhogberg commented Oct 10, 2025

Thanks for the PR! 🙂

I’ll have a closer look today, but want to give the quick feedback that the beam_trim pass is especially important for small functions (cf. #2594), and should never be skipped based on size alone.

@jhogberg jhogberg self-assigned this Oct 10, 2025

-import(lists, [any/2,reverse/1,reverse/2,seq/2,sort/1]).
-import(lists, [any/2,reverse/2,seq/2,sort/1,last/1]).
-import(sets, [from_list/1,intersection/2,is_disjoint/2,is_element/2,is_subset/2,union/2,subtract/2,del_element/2,to_list/1]).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: our convention in the compiler is to only import the lists module.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted this and used gb_sets, although I notice a diffrence in compile time when importing the relevent functions vs not:

With import

mix compile  71.01s user 7.19s system 185% cpu 42.055 total

Without import

mix compile  71.66s user 7.61s system 195% cpu 40.616 total

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also looks like it’s not correct as there is precedent here: https://github.com/erlang/otp/blob/master/lib/compiler/src/v3_core.erl#L93

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some files differ, mainly those that haven't been touched for a very long time. If you have a look at git blame you'll see that the related lines were touched some 10 years ago, and some "16 years" which is as far back as the git history goes. In either case the reason we don't want to import is largely because the diffs become very large when any of sets/gb_sets/ordsets are used in the same module since they largely share the same interface. They each have their uses.

As for with/without import, you probably haven't set up your computer fully for benchmarking, and this is something I also noticed in the forum thread. Modern computers vary their processing speed pretty wildly based on near-term usage, temperature, etc. To get accurate results you need to (at least) disable frequency scaling.

@Schultzer
Copy link
Author

Thanks for the PR! 🙂

I’ll have a closer look today, but want to give the quick feedback that the beam_trim pass is especially important for small functions (cf. #2594), and should never be skipped based on size alone.

Thanks for all the feedback, learned something new.

@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Oct 13, 2025
UsedEliminated = gb_sets:intersection(gb_sets:from_list(Eliminated), UsedRegs),
case gb_sets:is_subset(UsedEliminated, Moved) andalso gb_sets:is_disjoint(Illegal, UsedRegs) of
true ->
UsedEliminated = Moved, %Assertion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gb_sets can have multiple representations for the same set, so the is_equal/2 function must be used here.

Suggested change
UsedEliminated = Moved, %Assertion.
true = gb_sets:is_equal(UsedEliminated, Moved), %Assertion.

Comment on lines 156 to 165
take_last_dead_1([{live,_}|T], Acc) ->
take_last_dead_1(T, Acc);
take_last_dead_1([{kill,Reg}|T], Acc) ->
{Reg,reverse(T, Acc)};
take_last_dead_1([{dead,Reg}|T], Acc) ->
{Reg,reverse(T, Acc)};
take_last_dead_1([], _) ->
none;
take_last_dead_1([H|T], Acc) ->
take_last_dead_1(T, [H|Acc]).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics of this function has changed; you're not taking the last kill/dead anymore, but the first, resulting in a much weaker optimization.

Try running scripts/diffable 0 just before the changes in this branch, and then scripts/diffable 1 afterwards (don't forget to make compiler in between). This will create two folders with assembly listings for all of OTP before and after the change. Ideally, when optimizing a compiler pass a comparison of these (diff -u 0 1 | grep ^diff) should only show a change in the assembly listing of the pass you've optimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

team:VM Assigned to OTP team VM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants