platform/upstream/pytorch - Domain: Machine Learning / ML Framework; Licenses: BSD-3-Clause;


author	Johannes M Dieterich <johannes.dieterich@amd.com>	2019-01-22 18:21:07 -0800
committer	Facebook Github Bot <facebook-github-bot@users.noreply.github.com>	2019-01-22 18:23:51 -0800
commit	8b49efe86adf4afed395b4afc1165e092a5622af (patch)
tree	eb3f5a60c2ca59dd23191d8621d15f8c90c47bce /docs
parent	ddeaa541aa5ff5c13c8fadf43a15eb142c3be207 (diff)
download	pytorch-8b49efe86adf4afed395b4afc1165e092a5622af.tar.gz pytorch-8b49efe86adf4afed395b4afc1165e092a5622af.tar.bz2 pytorch-8b49efe86adf4afed395b4afc1165e092a5622af.zip

tune elementwise for AMD uarch (#16217)

Summary: Tune the elementwise kernel for AMD architectures by increasing the work group sizes and launch bounds. This change improves training throughput for torchvision models by up to 11% in our tests, with no significant performance regression observed. No functional or performance change for CUDA; the existing numbers are just moved into constexpr constants.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/16217
Differential Revision: D13776684
Pulled By: bddppq
fbshipit-source-id: edbaebe904598b2de66a9e9a68a1aa219ebc01e9
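The idea of a per-platform work group size held in constexpr constants can be sketched on the host in plain C++. This is a minimal illustration under stated assumptions, not the code from this commit. The names `kThreadsPerBlock`, `kItemsPerThread`, `grid_size` and `emulate_elementwise` are hypothetical, and the numeric values are placeholders rather than the tuned values. The sketch assumes an elementwise kernel in which each block of `nt` threads handles `nt * vt` elements, with each thread striding by `nt`. On the device, the same constants would also feed a `__launch_bounds__` annotation. That annotation is omitted here because the sketch runs on the CPU only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-platform tuning constants. __HIP_PLATFORM_HCC__ was the
// HIP/ROCm platform macro of that era; the numbers below are illustrative
// placeholders, not the values chosen in the commit.
#ifdef __HIP_PLATFORM_HCC__
constexpr int kThreadsPerBlock = 512;  // larger work group on AMD (assumed)
#else
constexpr int kThreadsPerBlock = 128;  // CUDA value kept as-is (assumed)
#endif
constexpr int kItemsPerThread = 4;     // elements processed per thread

// Number of blocks needed so that blocks * threads * items >= n.
constexpr int64_t grid_size(int64_t n, int threads = kThreadsPerBlock,
                            int items = kItemsPerThread) {
  return (n + int64_t{threads} * items - 1) / (int64_t{threads} * items);
}

// Host-side emulation of the strided indexing of an elementwise kernel:
// thread `tid` of block `b` starts at b*nt*vt + tid and steps by nt, vt times,
// so every index in [0, n) is visited exactly once.
template <typename F>
void emulate_elementwise(int64_t n, F f, int nt = kThreadsPerBlock,
                         int vt = kItemsPerThread) {
  const int64_t blocks = grid_size(n, nt, vt);
  for (int64_t b = 0; b < blocks; ++b) {
    for (int tid = 0; tid < nt; ++tid) {
      int64_t idx = b * nt * vt + tid;
      for (int i = 0; i < vt; ++i) {
        if (idx < n) f(idx);  // bounds check for the last, partial block
        idx += nt;
      }
    }
  }
}
```

A larger work group means fewer, bigger blocks for the same `n`. For example, `grid_size(4096, 128, 4)` is 8 while `grid_size(4096, 512, 4)` is 2. Keeping the sizes in `constexpr` constants lets the launch code and the kernel template share one value per platform.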

Diffstat (limited to 'docs')

0 files changed, 0 insertions, 0 deletions

