Use LoopVectorization in julia stencil / transpose #543

@haampie

What type of issue is this?

  • Bug in the code or other problem
  • Inadequate/incorrect documentation
  • Feature request

LoopVectorization.jl usually does a better job of unrolling and vectorizing loops than the Julia compiler and LLVM do on their own. You might want to use it for some of the benchmarks.

For instance, on Zen 2:

$ ~/julia-1.6.0-rc1/bin/julia -O3

(@v1.6) pkg> activate --temp
  Activating new environment at `/tmp/jl_GYGsu9/Project.toml`

(jl_GYGsu9) pkg> add BenchmarkTools, LoopVectorization

julia> using LoopVectorization, BenchmarkTools

julia> r = 3;

julia> n = 1000;

julia> A = zeros(Float64, n, n);

julia> B = zeros(Float64, n, n);

julia> W = zeros(Float64, 2*r+1, 2*r+1);

julia> function do_stencil(A, W, B, r, n)
           for j=r:n-r-1
               for i=r:n-r-1
                   for jj=-r:r
                       for ii=-r:r
                           @inbounds B[i+1,j+1] += W[r+ii+1,r+jj+1] * A[i+ii+1,j+jj+1]
                       end
                   end
               end
           end
       end
do_stencil (generic function with 1 method)

julia> @benchmark do_stencil($A, $W, $B, $r, $n)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     24.744 ms (0.00% GC)
  median time:      24.799 ms (0.00% GC)
  mean time:        24.803 ms (0.00% GC)
  maximum time:     24.948 ms (0.00% GC)
  --------------
  samples:          202
  evals/sample:     1

julia> function do_stencil_avx(A, W, B, r, n)
           @avx for j=r:n-r-1, i=r:n-r-1, jj=-r:r, ii=-r:r
               B[i+1,j+1] += W[r+ii+1,r+jj+1] * A[i+ii+1,j+jj+1]
           end
       end
do_stencil_avx (generic function with 1 method)

julia> @benchmark do_stencil_avx($A, $W, $B, $r, $n)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.234 ms (0.00% GC)
  median time:      3.267 ms (0.00% GC)
  mean time:        3.275 ms (0.00% GC)
  maximum time:     3.452 ms (0.00% GC)
  --------------
  samples:          1527
  evals/sample:     1
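
The timings above run on all-zero arrays, so they show the speedup but not whether both kernels compute the same values. A minimal sketch of such a check follows, assuming random inputs and a smaller `n` to keep it quick. The names `B_ref` and `B_avx` are made up for illustration, both functions return `B` for convenience, and newer LoopVectorization releases name the `@avx` macro `@turbo`.

```julia
using LoopVectorization

# Scalar reference kernel, same loop nest as do_stencil in the session above.
function do_stencil(A, W, B, r, n)
    for j = r:n-r-1, i = r:n-r-1, jj = -r:r, ii = -r:r
        @inbounds B[i+1, j+1] += W[r+ii+1, r+jj+1] * A[i+ii+1, j+jj+1]
    end
    return B
end

# LoopVectorization kernel, same loop nest as do_stencil_avx above.
# Assumption: `@avx` is available; newer releases name this macro `@turbo`.
function do_stencil_avx(A, W, B, r, n)
    @avx for j = r:n-r-1, i = r:n-r-1, jj = -r:r, ii = -r:r
        B[i+1, j+1] += W[r+ii+1, r+jj+1] * A[i+ii+1, j+jj+1]
    end
    return B
end

r = 3
n = 200                      # smaller than the benchmark size, for a quick check
A = rand(Float64, n, n)      # non-zero inputs so a wrong kernel is detectable
W = rand(Float64, 2r + 1, 2r + 1)

B_ref = do_stencil(A, W, zeros(Float64, n, n), r, n)
B_avx = do_stencil_avx(A, W, zeros(Float64, n, n), r, n)

# The vectorized kernel may reorder the floating-point sums, so the
# comparison is approximate rather than bitwise.
@assert B_ref ≈ B_avx
```

The comparison uses `≈` (`isapprox`) because reordering the additions can change the last few bits of each sum.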
