-
Notifications
You must be signed in to change notification settings - Fork 3k
optimize beam_trim #10272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
optimize beam_trim #10272
Conversation
This PR optimize the beam_trim pass to avoid quadratic slowdown, it does so by leveraging the set module and ignore the pass for small functions, the 50 value is currently arbitery and could be changed based on empiracal tests. From my own testing this shave of almost two seconds in my library and should be for even bigger projects: with changes: mix compile 69.50s user 7.60s system 191% cpu 40.330 total OTP 28.1 mix compile 72.18s user 7.74s system 194% cpu 41.047 total master mix compile 71.73s user 7.77s system 190% cpu 41.663 total
CT Test ResultsTests are running... https://github.com/erlang/otp/actions/runs/18387655786 Results for commit d03275e To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts
// Erlang/OTP Github Action Bot |
Thanks for the PR! 🙂 I’ll have a closer look today, but want to give the quick feedback that the |
lib/compiler/src/beam_trim.erl
Outdated
|
||
-import(lists, [any/2,reverse/1,reverse/2,seq/2,sort/1]). | ||
-import(lists, [any/2,reverse/2,seq/2,sort/1,last/1]). | ||
-import(sets, [from_list/1,intersection/2,is_disjoint/2,is_element/2,is_subset/2,union/2,subtract/2,del_element/2,to_list/1]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: our convention in the compiler is to only import the lists
module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reverted this and used gb_sets, although I notice a diffrence in compile time when importing the relevent functions vs not:
With import
mix compile 71.01s user 7.19s system 185% cpu 42.055 total
Without import
mix compile 71.66s user 7.61s system 195% cpu 40.616 total
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also looks like it’s not correct as there is precedent here: https://github.com/erlang/otp/blob/master/lib/compiler/src/v3_core.erl#L93
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some files differ, mainly those that haven't been touched for a very long time. If you have a look at git blame
you'll see that the related lines were touched some 10 years ago, and some "16 years" which is as far back as the git history goes. In either case the reason we don't want to import is largely because the diffs become very large when any of sets
/gb_sets
/ordsets
are used in the same module since they largely share the same interface. They each have their uses.
As for with/without import, you probably haven't set up your computer fully for benchmarking, and this is something I also noticed in the forum thread. Modern computers vary their processing speed pretty wildly based on near-term usage, temperature, etc. To get accurate results you need to (at least) disable frequency scaling.
Thanks for all the feedback, learned something new. |
114be98
to
dbbb219
Compare
lib/compiler/src/beam_trim.erl
Outdated
UsedEliminated = gb_sets:intersection(gb_sets:from_list(Eliminated), UsedRegs), | ||
case gb_sets:is_subset(UsedEliminated, Moved) andalso gb_sets:is_disjoint(Illegal, UsedRegs) of | ||
true -> | ||
UsedEliminated = Moved, %Assertion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gb_sets
can have multiple representations for the same set, so the is_equal/2
function must be used here.
UsedEliminated = Moved, %Assertion. | |
true = gb_sets:is_equal(UsedEliminated, Moved), %Assertion. |
lib/compiler/src/beam_trim.erl
Outdated
take_last_dead_1([{live,_}|T], Acc) -> | ||
take_last_dead_1(T, Acc); | ||
take_last_dead_1([{kill,Reg}|T], Acc) -> | ||
{Reg,reverse(T, Acc)}; | ||
take_last_dead_1([{dead,Reg}|T], Acc) -> | ||
{Reg,reverse(T, Acc)}; | ||
take_last_dead_1([], _) -> | ||
none; | ||
take_last_dead_1([H|T], Acc) -> | ||
take_last_dead_1(T, [H|Acc]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantics of this function has changed; you're not taking the last kill
/dead
anymore, but the first, resulting in a much weaker optimization.
Try running scripts/diffable 0
just before the changes in this branch, and then scripts/diffable 1
afterwards (don't forget to make compiler
in between). This will create two folders with assembly listings for all of OTP before and after the change. Ideally, when optimizing a compiler pass a comparison of these (diff -u 0 1 | grep ^diff
) should only show a change in the assembly listing of the pass you've optimized.
This PR optimize the beam_trim pass to avoid quadratic slowdown, it does so by leveraging the sets module and ignore the pass for small functions, the 50 value is currently arbitery and could be changed based on empiracal tests.
As this is my first time trying to contribute, please be kind has I have no idea what I'm doing and how this impact other parts of the compiler, but it seams to work on my project :)
From my own testing this shave of almost two seconds in my library and should be more for even bigger projects:
with changes:
OTP 28.1
master