author | Johannes M Dieterich <johannes.dieterich@amd.com> | 2019-01-22 18:21:07 -0800 |
---|---|---|
committer | Facebook Github Bot <facebook-github-bot@users.noreply.github.com> | 2019-01-22 18:23:51 -0800 |
commit | 8b49efe86adf4afed395b4afc1165e092a5622af (patch) | |
tree | eb3f5a60c2ca59dd23191d8621d15f8c90c47bce /docs | |
parent | ddeaa541aa5ff5c13c8fadf43a15eb142c3be207 (diff) | |
download | pytorch-8b49efe86adf4afed395b4afc1165e092a5622af.tar.gz pytorch-8b49efe86adf4afed395b4afc1165e092a5622af.tar.bz2 pytorch-8b49efe86adf4afed395b4afc1165e092a5622af.zip |
tune elementwise for AMD uarch (#16217)
Summary:
Tune elementwise kernel for AMD architectures by increasing the work group sizes and launch bounds. This change improves training throughput for torchvision models by up to 11% in our tests while exhibiting no significant performance regression.
No functional/performance change for CUDA - just shifting numbers into constexpr.
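As a rough illustration of the pattern described above (per-platform work group size and launch bounds held in constexpr constants), a minimal host-side sketch might look like the following. All names and numeric values here are hypothetical and chosen only for illustration; the numbers actually used by this commit are not shown on this page.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants: names and values are illustrative assumptions,
// not the ones introduced by the commit.
#if defined(__HIP_PLATFORM_HCC__) || defined(__HIP_PLATFORM_AMD__)
// AMD path: larger work group, lower minimum-blocks launch bound.
constexpr int kThreadsPerBlock = 1024;
constexpr int kMinBlocksPerMultiprocessor = 1;
#else
// CUDA path: the numbers stay what they were, now named constants.
constexpr int kThreadsPerBlock = 512;
constexpr int kMinBlocksPerMultiprocessor = 4;
#endif

// Elements each thread processes (also an assumed value).
constexpr int kItemsPerThread = 4;
constexpr std::int64_t kItemsPerBlock =
    static_cast<std::int64_t>(kThreadsPerBlock) * kItemsPerThread;

// In a device kernel these constants would feed the launch-bounds attribute,
// e.g. __launch_bounds__(kThreadsPerBlock, kMinBlocksPerMultiprocessor).

// Number of blocks needed to cover n elements (ceiling division).
inline std::int64_t blocks_for(std::int64_t n) {
  assert(n >= 0);
  return (n + kItemsPerBlock - 1) / kItemsPerBlock;
}
```

Keeping the numbers in constexpr constants means the grid-size computation and the kernel's launch bounds read from a single definition, so a platform-specific retune only touches the constants.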
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16217
Differential Revision: D13776684
Pulled By: bddppq
fbshipit-source-id: edbaebe904598b2de66a9e9a68a1aa219ebc01e9
Diffstat (limited to 'docs')
0 files changed, 0 insertions, 0 deletions