author     Richard Zou <zou3519@users.noreply.github.com>  2018-04-27 15:49:58 -0400
committer  Edward Z. Yang <ezyang@mit.edu>                 2018-04-27 15:49:58 -0400
commit     932c4c2364884af52609ea8a86c7232a926d958f (patch)
tree       9be214819a855a8b306d337b7ae269ac6c86700f /tools
parent     c730792d5133f489e368adf2702ef11825e8f503 (diff)
Prevent stack overflow on deletion of deep graph (#6873)
* Prevent stack overflow on deletion of deep graph

Fixes #5534.

Sometimes one can end up with a very big computation graph of Functions and
Edges. Each std::shared_ptr<Function> contains a list of Edge, and each Edge
contains a std::shared_ptr<Function>. Deleting a std::shared_ptr<Function> can
trigger the recursive deletion of other std::shared_ptr<Function>'s: this can
stack overflow if the graph is deep enough. Here is an example of such a graph:

shared_ptr<Function> -> Edge -> shared_ptr<Function> -> Edge -> ... -> shared_ptr<Function>

The solution here is to use a custom deleter with each
std::shared_ptr<Function>. The custom deleter keeps track of how many nested
deleters it is in. When this number exceeds the maximum allowed depth, the
Function* to be deleted are accumulated in a per-thread delete queue and
handled by one of the deleters.

Example code that could trigger the overflow (set ``depth`` to something
> 100000) is below. I also benchmarked the code below before and after the
change to see if there are any significant performance differences.

```
import torch

def scope():
    depth = 80000
    x = torch.randn(9, requires_grad=True)
    y = x.clone()

    # build deeply nested computation graph
    for i in range(depth):
        y = y + y * 0.000001

%timeit -n 100 scope()

With changes:
376 ms ± 3.94 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

Without changes:
352 ms ± 6.58 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

With the change, the above code is 6.8% slower.

UPDATE: I did some more benchmarking. It looks like it takes 25% more time to
free the computation graph in the case of the straight chain graph:
https://gist.github.com/zou3519/93cf84d96ae431356ae7f7c1923ef51a

* WIP

* Add custom deleter to PyFunctions created by THPFunction

* Address some comments; pick new value

* Address some more comments

* Add more complicated test; special case the windows depth constant
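The deleter scheme described above is, roughly, the following pattern. This is
a minimal sketch under assumed names: GraphNode, deleteNode, kMaxDeleteDepth,
and the depth limit of 100 are placeholders, not the actual autograd types or
constants; the real implementation uses its own Function type and the
deleteFunction referenced in the diff below.

```
// Sketch of a depth-bounded custom deleter; all names and limits here are
// hypothetical stand-ins for the autograd machinery touched by this patch.
#include <cstddef>
#include <deque>
#include <memory>
#include <vector>

struct GraphNode {
  // Owning pointers to the next nodes in the graph; destroying a GraphNode
  // drops these, which is what can otherwise recurse arbitrarily deep.
  std::vector<std::shared_ptr<GraphNode>> next_edges;
};

constexpr std::size_t kMaxDeleteDepth = 100;        // placeholder limit
thread_local std::size_t delete_depth = 0;          // nesting of deleters
thread_local std::deque<GraphNode*> delete_queue;   // deferred deletions

void deleteNode(GraphNode* node) {
  if (delete_depth >= kMaxDeleteDepth) {
    // Too deep: defer to an outer deleter instead of growing the stack.
    delete_queue.push_back(node);
    return;
  }
  ++delete_depth;
  delete node;        // may re-enter deleteNode for the node's children
  --delete_depth;
  if (delete_depth == 0) {
    // The outermost deleter drains whatever the nested calls deferred.
    while (!delete_queue.empty()) {
      GraphNode* deferred = delete_queue.front();
      delete_queue.pop_front();
      ++delete_depth;
      delete deferred;  // children past the limit are queued again
      --delete_depth;
    }
  }
}

int main() {
  // Build a long chain; tearing it down with the default deleter would
  // recurse once per node, while deleteNode keeps the depth bounded.
  auto head = std::shared_ptr<GraphNode>(new GraphNode(), deleteNode);
  auto tail = head;
  for (int i = 0; i < 1000000; ++i) {
    auto next = std::shared_ptr<GraphNode>(new GraphNode(), deleteNode);
    tail->next_edges.push_back(next);
    tail = next;
  }
  tail.reset();
  head.reset();  // triggers the bounded, queue-assisted teardown
}
```

The key requirement is that every node is created with the custom deleter
attached, so the deleter, not a bare delete, runs whenever a refcount hits
zero; that is exactly what the codegen change below arranges.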
Diffstat (limited to 'tools')
-rw-r--r--  tools/autograd/gen_variable_type.py  |  2
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/autograd/gen_variable_type.py b/tools/autograd/gen_variable_type.py
index c672c447ce..a47c46d422 100644
--- a/tools/autograd/gen_variable_type.py
+++ b/tools/autograd/gen_variable_type.py
@@ -93,7 +93,7 @@ if (compute_requires_grad( ${args_with_derivatives} )) {
""")
ASSIGN_GRAD_FN = CodeTemplate("""\
-grad_fn = std::make_shared<${op}>(${op_ctor});
+grad_fn = std::shared_ptr<${op}>(new ${op}(${op_ctor}), deleteFunction);
grad_fn->set_next_edges(collect_next_edges( ${args_with_derivatives} ));
""")