vulkan: Use coopmat2 for conv2d #14982
I upgraded the NVIDIA driver and the shader compiler and did a quick test (sd2, 512x512).
before (w/ prev PR):
after:
I also noticed that the first run of a specific pipeline seems to take longer, e.g. fresh after compilation:
Subsequent runs don't show this.
edit: sampling speed is now also faster with conv2d_direct used in the diffusion model.
enabled:
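One way to keep a slow first run from skewing a before/after comparison is to time it separately from the later runs. The following is a minimal sketch of that idea, not code from this PR: `RunTimings`, `time_runs`, and the `work` callback are hypothetical names, and `work` stands in for one full run of whatever is being measured.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper type: timings for the first run and the later runs.
struct RunTimings {
    double first_ms;   // first invocation, may include one-time setup cost
    double median_ms;  // median of the remaining invocations
};

// Runs `work` once, then `repeats` more times, timing each call.
// `work` is an assumed stand-in for one run of the measured workload.
inline RunTimings time_runs(const std::function<void()>& work, std::size_t repeats) {
    using clock = std::chrono::steady_clock;
    auto time_once = [&work]() {
        const auto t0 = clock::now();
        work();
        const auto t1 = clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    };

    RunTimings result{};
    result.first_ms = time_once();

    std::vector<double> rest;
    rest.reserve(repeats);
    for (std::size_t i = 0; i < repeats; ++i) {
        rest.push_back(time_once());
    }
    if (rest.empty()) {
        // No later runs requested: report the first run for both fields.
        result.median_ms = result.first_ms;
        return result;
    }
    std::sort(rest.begin(), rest.end());
    result.median_ms = rest[rest.size() / 2];
    return result;
}
```

Reporting `first_ms` and `median_ms` side by side makes it visible when only the first run is slow, without that run inflating the steady-state number.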
perf:
before:
after:
Looks good:
7a6b4d0 to 493e61b
The 4096x4096 case is unfortunately somewhat slower; however, that is a synthetic test, so it's not high priority. From #14933:
I have a couple more small changes that get another 10% or so, but haven't matched im2col for that case yet. I'll put those in a separate PR after this merges.
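For readers less familiar with the two approaches being compared, here is a minimal CPU sketch of a direct convolution next to an im2col-based one. It is an illustration only, not the Vulkan shader or the ggml implementation: the function names are hypothetical, and it assumes a single channel, stride 1, no padding, and an input at least as large as the kernel.

```cpp
#include <cstddef>
#include <vector>

// Direct 2D convolution (cross-correlation, as is conventional in ML code).
// Assumes H >= KH and W >= KW; single channel, stride 1, no padding.
std::vector<float> conv2d_direct_ref(const std::vector<float>& in, std::size_t H, std::size_t W,
                                     const std::vector<float>& k, std::size_t KH, std::size_t KW) {
    const std::size_t OH = H - KH + 1, OW = W - KW + 1;
    std::vector<float> out(OH * OW, 0.0f);
    for (std::size_t oy = 0; oy < OH; ++oy)
        for (std::size_t ox = 0; ox < OW; ++ox) {
            float acc = 0.0f;
            for (std::size_t ky = 0; ky < KH; ++ky)
                for (std::size_t kx = 0; kx < KW; ++kx)
                    acc += in[(oy + ky) * W + (ox + kx)] * k[ky * KW + kx];
            out[oy * OW + ox] = acc;
        }
    return out;
}

// im2col: copy every KH x KW input patch into one row of a
// (OH*OW) x (KH*KW) matrix. This buffer is the extra memory that a
// direct convolution avoids.
std::vector<float> im2col_ref(const std::vector<float>& in, std::size_t H, std::size_t W,
                              std::size_t KH, std::size_t KW) {
    const std::size_t OH = H - KH + 1, OW = W - KW + 1, K = KH * KW;
    std::vector<float> cols(OH * OW * K);
    for (std::size_t oy = 0; oy < OH; ++oy)
        for (std::size_t ox = 0; ox < OW; ++ox)
            for (std::size_t ky = 0; ky < KH; ++ky)
                for (std::size_t kx = 0; kx < KW; ++kx)
                    cols[(oy * OW + ox) * K + ky * KW + kx] = in[(oy + ky) * W + (ox + kx)];
    return cols;
}

// Convolution expressed as im2col followed by a matrix-vector product.
std::vector<float> conv2d_im2col_ref(const std::vector<float>& in, std::size_t H, std::size_t W,
                                     const std::vector<float>& k, std::size_t KH, std::size_t KW) {
    const std::size_t OH = H - KH + 1, OW = W - KW + 1, K = KH * KW;
    const std::vector<float> cols = im2col_ref(in, H, W, KH, KW);
    std::vector<float> out(OH * OW, 0.0f);
    for (std::size_t row = 0; row < OH * OW; ++row) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < K; ++j)
            acc += cols[row * K + j] * k[j];
        out[row] = acc;
    }
    return out;
}
```

Both functions compute the same result. The im2col path trades a larger intermediate buffer for a plain matrix multiply, while the direct path reads the input in place.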
Stacked on #14933. Draft until that's merged.
I haven't done any perf tuning on this yet, so there may still be more perf to get.
Directed perf tests:
stable-diffusion: