author    Thomas Viehmann <tv.github@beamnet.de>    2018-04-16 20:41:47 +0200
committer Edward Z. Yang <ezyang@mit.edu>           2018-04-16 14:41:47 -0400
commit    40592f91b5f045c02443e9d390491bba7f5dcf46 (patch)
tree      6a9a9918bc0ea0b81c2591f9cfa38d056f810f32 /tools
parent    24b49314625e13f98cd761dec939db1d825420d9 (diff)
Fix bilinear performance regression (#6110)
The current implementation of bilinear uses a matrix multiplication approach. This creates a large intermediate matrix (batch * output dimension * input dimension). Relative to the previous pure Python approach, this caused a severe performance regression (600ms vs. 18ms for 300x100x200 weights and a batch of 50 on CPU, and also quadratic memory use). The attached change restores the performance using the previous strategy of looping over output features. It implements forward, backward, and double backward as native ATen code.

Credits:
Martin Tutek reported the regression and pinpointed the problem.
Adam Paszke patiently answered my questions about ATen.
I would not have been able to prepare this without you, thank you!

I referenced the old Python implementation, used a Python version of the naive implementation, and coded the manual functions etc. The tests include gradgradcheck etc.

* fix memory use of native bilinear
* bilinear double backward
* Move bilinear_double_backward to Functions.cpp

  Addresses review comment by Tongzhou Wang. Thank you!

* add WrapDimUtilsMulti.h
* start at generic trilinear
* move to generic trilinear
* catch up on dim_list_to_bitset
* switch bilinear to use _trilinear, implement _trilinear_backward
* add comments to Linear.cpp, move _trilinear in yaml
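
For reference, a rough Python sketch of the two strategies the message contrasts. This is only an illustration, not the ATen code in the patch; shapes follow nn.Bilinear's (out_features, in1_features, in2_features) weight layout, i.e. 300x100x200 with a batch of 50 as in the reported numbers.

    import torch

    B, in1_f, in2_f, out_f = 50, 100, 200, 300     # batch and feature sizes from the report
    x1, x2 = torch.randn(B, in1_f), torch.randn(B, in2_f)
    weight = torch.randn(out_f, in1_f, in2_f)

    def bilinear_matmul(x1, x2, weight):
        # One big matmul, but it materializes a (batch, out, in2) intermediate --
        # the quadratic memory (and time) cost described above.
        out_f, in1_f, in2_f = weight.shape
        tmp = x1.matmul(weight.permute(1, 0, 2).reshape(in1_f, out_f * in2_f))
        return (tmp.view(-1, out_f, in2_f) * x2.unsqueeze(1)).sum(-1)

    def bilinear_loop(x1, x2, weight):
        # Loop over output features: each step only needs a (batch, in2) buffer.
        cols = [(x1.matmul(w) * x2).sum(1, keepdim=True) for w in weight]
        return torch.cat(cols, dim=1)

    assert torch.allclose(bilinear_matmul(x1, x2, weight),
                          bilinear_loop(x1, x2, weight), rtol=1e-3, atol=1e-3)

The patch takes the second strategy, but implemented natively in ATen as a generic _trilinear rather than in Python.
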
Diffstat (limited to 'tools')
-rw-r--r-- tools/autograd/derivatives.yaml        |  2 ++
-rw-r--r-- tools/autograd/templates/Functions.cpp | 13 +++++++++++++
2 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/tools/autograd/derivatives.yaml b/tools/autograd/derivatives.yaml
index 1e256c3dda..60e785ad38 100644
--- a/tools/autograd/derivatives.yaml
+++ b/tools/autograd/derivatives.yaml
@@ -688,6 +688,8 @@
   self: not_implemented("_standard_gamma_grad")
 
 # NN
+- name: _trilinear(Tensor i1, Tensor i2, Tensor i3, IntList expand1, IntList expand2, IntList expand3, IntList sumdim, int64_t unroll_dim)
+  i1, i2, i3: _trilinear_backward(grad, i1, i2, i3, expand1, expand2, expand3, sumdim, unroll_dim, grad_input_mask)
 
 - name: binary_cross_entropy_forward(Tensor self, Tensor target, Tensor weight, bool size_average, bool reduce)
   self: binary_cross_entropy_backward(grad, self, target, weight, size_average, reduce)
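
The entry above declares that the gradients of all three _trilinear inputs come from a single _trilinear_backward call, gated by grad_input_mask. For the bilinear special case, the rule being encoded is that each input's gradient is again a three-factor contraction with grad_out substituted into that input's slot. A small autograd check of that symmetry, illustrative only and written against an einsum form rather than the private _trilinear op:

    import torch

    B, in1_f, in2_f, out_f = 5, 7, 11, 3
    x1 = torch.randn(B, in1_f, dtype=torch.double, requires_grad=True)
    x2 = torch.randn(B, in2_f, dtype=torch.double, requires_grad=True)
    W  = torch.randn(out_f, in1_f, in2_f, dtype=torch.double, requires_grad=True)

    y = torch.einsum('bi,oij,bj->bo', x1, W, x2)   # bilinear layer without bias
    g = torch.randn_like(y)                        # incoming grad_out

    gx1, gW, gx2 = torch.autograd.grad(y, (x1, W, x2), g)

    # Same contraction, with grad_out in the differentiated slot and the
    # summed/expanded roles exchanged -- the pattern _trilinear_backward uses.
    assert torch.allclose(gx1, torch.einsum('bo,oij,bj->bi', g, W, x2))
    assert torch.allclose(gW,  torch.einsum('bi,bo,bj->oij', x1, g, x2))
    assert torch.allclose(gx2, torch.einsum('bi,oij,bo->bj', x1, W, g))
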
diff --git a/tools/autograd/templates/Functions.cpp b/tools/autograd/templates/Functions.cpp
index 15fd0ede61..1bf7986f01 100644
--- a/tools/autograd/templates/Functions.cpp
+++ b/tools/autograd/templates/Functions.cpp
@@ -1268,6 +1268,19 @@ std::tuple<Tensor, Tensor, Tensor> batchnorm_double_backward(
 
 }
 
+std::tuple<Tensor, Tensor, Tensor> _trilinear_backward(const Tensor& grad_out, const Tensor& i1, const Tensor& i2, const Tensor& i3,
+                                                       IntList expand1, IntList expand2, IntList expand3,
+                                                       IntList sumdim, int64_t unroll_dim, std::array<bool, 3> grad_mask) {
+  Tensor grad_i1, grad_i2, grad_i3;
+  if (grad_mask[0])
+    grad_i1 = at::_trilinear(grad_out, i2, i3, sumdim, expand2, expand3, expand1);
+  if (grad_mask[1])
+    grad_i2 = at::_trilinear(i1, grad_out, i3, expand1, sumdim, expand3, expand2);
+  if (grad_mask[2])
+    grad_i3 = at::_trilinear(i1, i2, grad_out, expand1, expand2, sumdim, expand3);
+  return std::tuple<Tensor, Tensor, Tensor>(grad_i1, grad_i2, grad_i3);
+}
+
 } // anonymous namespace
 
 ${autograd_function_definitions}
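
Because _trilinear_backward above is itself expressed through _trilinear calls, the double backward mentioned in the commit message falls out of the same machinery. The gradgradcheck-style verification the message refers to can be approximated from Python as below; this is a sketch against the public F.bilinear, not the test actually added by the PR.

    import torch
    from torch.autograd import gradcheck, gradgradcheck
    from torch.nn import functional as F

    # Small double-precision inputs keep the finite-difference checks reliable.
    x1 = torch.randn(4, 6, dtype=torch.double, requires_grad=True)
    x2 = torch.randn(4, 5, dtype=torch.double, requires_grad=True)
    W  = torch.randn(3, 6, 5, dtype=torch.double, requires_grad=True)

    fn = lambda a, b, w: F.bilinear(a, b, w)
    assert gradcheck(fn, (x1, x2, W))        # first derivatives (backward)
    assert gradgradcheck(fn, (x1, x2, W))    # second derivatives (double backward)
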