author    | Richard Zou <zou3519@users.noreply.github.com> | 2018-04-27 15:49:58 -0400
committer | Edward Z. Yang <ezyang@mit.edu>                | 2018-04-27 15:49:58 -0400
commit    | 932c4c2364884af52609ea8a86c7232a926d958f (patch)
tree      | 9be214819a855a8b306d337b7ae269ac6c86700f /tools
parent    | c730792d5133f489e368adf2702ef11825e8f503 (diff)
Prevent stack overflow on deletion of deep graph (#6873)
* Prevent stack overflow on deletion of deep graph
Fixes #5534.
Sometimes one can end up with a very large computation graph of Functions
and Edges. Each Function holds a list of Edges, and each Edge holds a
std::shared_ptr<Function>. Deleting a std::shared_ptr<Function> can
trigger the recursive deletion of other std::shared_ptr<Function>s: this
can overflow the stack if the graph is deep enough. Here is an example of
such a graph:
shared_ptr<Function> -> Edge -> shared_ptr<Function> -> Edge -> ... -> shared_ptr<Function>
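For illustration, here is a minimal standalone C++ sketch of the same failure mode; the Node type below is a stand-in for the real Function/Edge classes, not the actual autograd code:
```cpp
#include <memory>

// Toy version of the problem: a long chain of nodes linked through
// shared_ptr, mirroring Function -> Edge -> Function -> ...
// Destroying the head destroys the next node from inside the current
// destructor, so the recursion depth equals the chain length and a deep
// enough chain overflows the stack (this program is expected to crash).
struct Node {
  std::shared_ptr<Node> next;  // stands in for Edge's shared_ptr<Function>
};

int main() {
  auto head = std::make_shared<Node>();
  auto cur = head;
  for (int i = 0; i < 1000000; ++i) {
    cur->next = std::make_shared<Node>();
    cur = cur->next;
  }
  head.reset();  // ~Node() -> ~shared_ptr<Node> -> ~Node() -> ... deep recursion
  return 0;
}
```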
The solution here is to use a custom deleter with each
std::shared_ptr<Function>. The custom deleter keeps track of how many
nested deleter calls it is inside. When this number exceeds the maximum
allowed depth, the Function*s to be deleted are accumulated in a
per-thread delete queue and handled later by one of the outer deleters.
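Here is a rough sketch of that idea in isolation, again with a toy Node type; the constant, queue, and deleteNode names are illustrative, not the actual autograd implementation:
```cpp
#include <memory>
#include <vector>

struct Node {
  std::shared_ptr<Node> next;
};

constexpr int kMaxDeleteDepth = 100;           // illustrative limit
thread_local int delete_depth = 0;             // how many nested deleter calls we are in
thread_local std::vector<Node*> delete_queue;  // per-thread queue of deferred deletions

void deleteNode(Node* n) {
  if (delete_depth >= kMaxDeleteDepth) {
    // Too deep: defer this node instead of recursing further down the chain.
    delete_queue.push_back(n);
    return;
  }
  ++delete_depth;
  delete n;  // may re-enter deleteNode through the deleter on n->next
  --delete_depth;
  // The outermost call drains whatever the nested calls deferred.
  while (delete_depth == 0 && !delete_queue.empty()) {
    Node* q = delete_queue.back();
    delete_queue.pop_back();
    ++delete_depth;
    delete q;
    --delete_depth;
  }
}

// Nodes are then created with the custom deleter attached:
//   std::shared_ptr<Node>(new Node(), deleteNode);
```
With this scheme, at most kMaxDeleteDepth destructor frames are live at once regardless of how long the chain is; everything deeper is drained iteratively from the queue.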
Example code that can trigger the overflow (set ``depth`` to something >
100000) is below. I also benchmarked the code below before and after the
changes to see whether there are any significant performance differences.
```
import torch

def scope():
    depth = 80000
    x = torch.randn(9, requires_grad=True)
    y = x.clone()
    # build deeply nested computation graph
    for i in range(depth):
        y = y + y * 0.000001

%timeit -n 100 scope()

With changes:
376 ms ± 3.94 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Without changes:
352 ms ± 6.58 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
With the change, the above code is 6.8% slower.
UPDATE: I did some more benchmarking. It looks like it takes 25% more time to free the computation graph in the case of the straight chain graph: https://gist.github.com/zou3519/93cf84d96ae431356ae7f7c1923ef51a
* WIP
* Add custom deleter to PyFunctions created by THPFunction
* Address some comments; pick new value
* Address some more comments
* Add more complicated test; special case the windows depth constant
Diffstat (limited to 'tools')
-rw-r--r-- | tools/autograd/gen_variable_type.py | 2 |
1 file changed, 1 insertion, 1 deletion
```
diff --git a/tools/autograd/gen_variable_type.py b/tools/autograd/gen_variable_type.py
index c672c447ce..a47c46d422 100644
--- a/tools/autograd/gen_variable_type.py
+++ b/tools/autograd/gen_variable_type.py
@@ -93,7 +93,7 @@ if (compute_requires_grad( ${args_with_derivatives} )) {
 """)
 
 ASSIGN_GRAD_FN = CodeTemplate("""\
-grad_fn = std::make_shared<${op}>(${op_ctor});
+grad_fn = std::shared_ptr<${op}>(new ${op}(${op_ctor}), deleteFunction);
 grad_fn->set_next_edges(collect_next_edges( ${args_with_derivatives} ));
 """)
```
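For context: std::make_shared has no overload that accepts a custom deleter, so in order to attach the deleter described above, the generated code switches to the std::shared_ptr constructor that takes a raw pointer plus a deleter (the deleteFunction introduced elsewhere in this PR).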