author Johannes M Dieterich <johannes.dieterich@amd.com> 2019-01-22 18:21:07 -0800
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com> 2019-01-22 18:23:51 -0800
commit 8b49efe86adf4afed395b4afc1165e092a5622af
tree eb3f5a60c2ca59dd23191d8621d15f8c90c47bce /docs
parent ddeaa541aa5ff5c13c8fadf43a15eb142c3be207
tune elementwise for AMD uarch (#16217)
Summary:
Tune the elementwise kernel for AMD architectures by increasing the work group sizes and launch bounds. This change improves training throughput for torchvision models by up to 11% in our tests while exhibiting no significant performance regression. There is no functional or performance change for CUDA; the existing numbers are only moved into constexpr constants.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16217
Differential Revision: D13776684
Pulled By: bddppq
fbshipit-source-id: edbaebe904598b2de66a9e9a68a1aa219ebc01e9
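For readers unfamiliar with the pattern the summary describes, here is a minimal host-side sketch of per-platform launch constants held in constexpr. It is not the code from this commit: the macro `SKETCH_TARGET_AMD`, the names `kThreadsPerBlock`, `kMinBlocksPerSM`, `kItemsPerThread`, and `blocks_for`, and all numeric values are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// SKETCH_TARGET_AMD is a hypothetical macro for this sketch; a real build
// would key off whatever platform macro its toolchain defines.
#ifdef SKETCH_TARGET_AMD
constexpr int kThreadsPerBlock = 1024;  // larger work group (illustrative value)
constexpr int kMinBlocksPerSM = 1;      // launch-bounds hint (illustrative value)
#else
constexpr int kThreadsPerBlock = 512;   // illustrative value
constexpr int kMinBlocksPerSM = 4;      // illustrative value
#endif

// Elements handled by each thread (illustrative value).
constexpr int kItemsPerThread = 4;

// Elements covered by one block.
constexpr std::int64_t kItemsPerBlock =
    std::int64_t{kThreadsPerBlock} * kItemsPerThread;

// In a CUDA kernel the same constants would feed the launch-bounds
// qualifier, e.g. __launch_bounds__(kThreadsPerBlock, kMinBlocksPerSM).

// Number of blocks needed to cover n elements (ceiling division).
inline std::int64_t blocks_for(std::int64_t n) {
  assert(n >= 0);
  return (n + kItemsPerBlock - 1) / kItemsPerBlock;
}

// Compile-time sanity checks on the chosen configuration.
static_assert(kThreadsPerBlock > 0 && kThreadsPerBlock <= 1024,
              "block size must be positive and at most 1024 threads");
static_assert(kMinBlocksPerSM > 0, "launch-bounds hint must be positive");
```

Keeping the numbers in constexpr means the grid-size computation and the launch-bounds qualifier read the same values, so switching platforms changes only the block of constants at the top.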
Diffstat (limited to 'docs')
0 files changed, 0 insertions, 0 deletions