Summary:
Tune elementwise kernel for AMD architectures by increasing the work group sizes and launch bounds. This change improves training throughput for torchvision models by up to 11% in our tests while exhibiting no significant performance regression.
No functional or performance change for CUDA; this just moves the existing numbers into constexpr constants.
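A minimal sketch of the idea, with illustrative names and values rather than the actual PyTorch constants: the launch configuration is selected at compile time, so ROCm builds get larger work groups while CUDA builds keep their current numbers.
```C++
// Illustrative only: the macro, block sizes, and names below are assumptions,
// not the values used in the actual elementwise kernel.
#ifdef __HIP_PLATFORM_HCC__            // compiling for AMD/ROCm
constexpr int kBlockSize = 512;        // larger work group size
constexpr int kWorkPerThread = 8;
#else                                  // CUDA keeps its existing configuration
constexpr int kBlockSize = 128;
constexpr int kWorkPerThread = 4;
#endif

// The kernel would then be declared with __launch_bounds__(kBlockSize)
// and launched with kBlockSize threads per block.
```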
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16217
Differential Revision: D13776684
Pulled By: bddppq
fbshipit-source-id: edbaebe904598b2de66a9e9a68a1aa219ebc01e9
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16219
Differential Revision: D13776742
Pulled By: bddppq
fbshipit-source-id: 10a6ab4c58159b3f619b739074f773662722c1d9
|
|
Summary:
Previous import was c553fb32a0902ce5dd42e1b40123e9e9b38bdbe7
Included changes:
- **[dc75285](https://github.com/onnx/onnx/commit/dc75285)**: Relax constraint that the initializers must be a subset of graph inputs (#1718) <G. Ramalingam>
- **[985c8cd](https://github.com/onnx/onnx/commit/985c8cd)**: Fix typo in scan shape inferencing (#1753) <Scott McKay>
- **[ab52a5d](https://github.com/onnx/onnx/commit/ab52a5d)**: remove stale test cases <Lu Fang>
- **[56434bb](https://github.com/onnx/onnx/commit/56434bb)**: Removing experimental ConstantFill op. <Spandan Tiwari>
- **[881c63c](https://github.com/onnx/onnx/commit/881c63c)**: Show string names of data types instead of int IDs (#1749) <Shinichiro Hamaji>
- **[0a12fe4](https://github.com/onnx/onnx/commit/0a12fe4)**: Update ConstantOfShape op. (#1744) <Bowen Bao>
- **[ef028e5](https://github.com/onnx/onnx/commit/ef028e5)**: Update definition of Cast Op to support casting to/from string (#1704) <Raymond Yang>
Reviewed By: BIT-silence
Differential Revision: D13773962
fbshipit-source-id: b98079277994a699d4807210ba1d9c27f4672090
|
|
Summary:
Closes #16156
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16200
Differential Revision: D13747455
Pulled By: mrshenli
fbshipit-source-id: 00c0d5f341c3ac7a757bdb4631a17e11fbc6d3ec
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16122
Reviewed By: smessmer
Differential Revision: D13717900
fbshipit-source-id: 8401f39d993482d3e08d2d79bc1841deafee2a5b
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16121
Reviewed By: smessmer
Differential Revision: D13717899
fbshipit-source-id: 83488f2aa801ca75059949ec85171ec03e64c4ff
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16116
Reviewed By: smessmer
Differential Revision: D13717719
fbshipit-source-id: 2ecee3f08f64e64ec5ac3c92fb326bc3df37e40e
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16166
Since we no longer use std::function, kernel registration can be made constexpr again.
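A minimal sketch of why this works, using illustrative names rather than the real c10 registration API: a plain function pointer is a literal type, so a registration record holding one can be a constant expression, which std::function (which may allocate) cannot be.
```C++
// Illustrative sketch, not the actual c10 registration machinery.
struct KernelRegistration {
  const char* op_name;
  void (*kernel)();  // plain function pointer: usable in constant expressions
};

void my_kernel();  // defined elsewhere

// Valid because both members are literal types; a std::function member
// would make this impossible.
constexpr KernelRegistration kMyOpRegistration{"my_op", &my_kernel};
```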
Reviewed By: ezyang
Differential Revision: D13738630
fbshipit-source-id: 918fa3a3c8c6f0ddbd0f08b3b143cdf066265387
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16165
Store kernels as direct function pointers instead of std::function.
Using direct function pointers avoids a performance risk std::function would introduce.
Reviewed By: ezyang
Differential Revision: D13738627
fbshipit-source-id: a348906c8a201436699681980a82ca95065a06a0
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16066
Don't unwrap and re-wrap; instead, pass the IValues straight through.
Reviewed By: ezyang
Differential Revision: D13689037
fbshipit-source-id: 99b8155e640eb61a3c0597bf0f2b9c338712b45e
|
|
Summary:
Address the second future work item in #15937.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16182
Differential Revision: D13744972
Pulled By: mrshenli
fbshipit-source-id: e9812e3fd4a5623e99b639d9f334bfc2d1827d92
|
|
partially scriptable
Differential Revision: D13540278
Original commit changeset: 3768c76a90b0
fbshipit-source-id: 7a31c239f9dca6ff467344d99820095addcae9d7
|
|
C++ operators (#15429)
Summary:
Partially fixes: https://github.com/pytorch/pytorch/issues/394
Implementation detail:
Codegen is modified to generate code that looks like the following:
```C++
static PyObject * THPVariable_svd(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
  }, /*traceable=*/true);
  ParsedArgs<6> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  static PyStructSequence_Field fields0[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc0 = {
    "torch.return_types.svd_out", nullptr,
    fields0, 3
  };
  static PyTypeObject type0;
  static bool namedtuple_type_initialized0 = false;
  if (!namedtuple_type_initialized0) {
    PyStructSequence_InitType(&type0, &desc0);
    namedtuple_type_initialized0 = true;
  }
  static PyStructSequence_Field fields1[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc1 = {
    "torch.return_types.svd", nullptr,
    fields1, 3
  };
  static PyTypeObject type1;
  static bool namedtuple_type_initialized1 = false;
  if (!namedtuple_type_initialized1) {
    PyStructSequence_InitType(&type1, &desc1);
    namedtuple_type_initialized1 = true;
  }
  if (r.idx == 0) {
    if (r.isNone(3)) {
      return wrap(&type1, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2)));
    } else {
      auto results = r.tensorlist_n<3>(3);
      return wrap(&type0, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2), results[0], results[1], results[2]));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```
Types are defined as static members inside the `THPVariable_${op_name}` functions, and are initialized the first time the function is called.
When parsing function prototypes in `native_functions.yaml`, the parser sets the specified name as `field_name` when it sees declarations like `-> (Tensor t1, ...)`. These field names become the field names of the namedtuple. The namedtuple classes are named `torch.return_types.${op_name}`.
In some Python 2 builds, `PyStructSequence` is not a subtype of tuple, so we have to add helper functions that check whether an object is a tuple or a namedtuple, for compatibility.
Operators in `native_functions.yaml` are changed so that only `max` and `svd` return namedtuples for now. Tests are added for these two operators to check that the return values work as expected, and their docs are updated to explicitly mention that the return value is a namedtuple. More ops will be added in later PRs.
There is an issue with the Windows build where the linker is unable to resolve `PyStructSequence_UnnamedField`; a workaround is added to deal with this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15429
Differential Revision: D13709678
Pulled By: ezyang
fbshipit-source-id: 23a511c9436977098afc49374e9a748b6e30bccf
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14237
Reviewed By: dskhudia
Differential Revision: D13751791
Pulled By: jspark1105
fbshipit-source-id: 54f73d5134e596817802c66d43098d18458c2799
|
|
Summary:
Initial enabling of the upcoming hip-clang compiler for the PyTorch source base.
Changes:
* update the Eigen submodule to a version including our upstreamed hip-clang enabling there
* modify a few ifdef guards with the `__HIP__` macro used by hip-clang
* use `__lane_id` instead of `hc::__lane_id`
* add Debug flags for ROCm to the cmake infrastructure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16085
Differential Revision: D13709459
Pulled By: ezyang
fbshipit-source-id: 1b7b33fe810a0434766180580d4443ea177eb7c7
|
|
(#16069)
Summary:
`torch.distributed.launch.py` does not raise an error when a process started via `subprocess.Popen` does not return 0.
For easier debugging, it should always raise an error when a launched process exits abnormally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16069
Differential Revision: D13709467
Pulled By: ezyang
fbshipit-source-id: 31d32a5ec8fed7bccd62d845bfba0e670ed3fe20
|
|
Summary:
Save reallocation costs by reserving vector capacity according to how many elements we expect to insert.
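A minimal sketch of the pattern, with made-up names rather than the actual call sites touched by this PR:
```C++
#include <vector>

// Reserving up front means one allocation instead of repeated geometric growth.
std::vector<int> collect(int n) {
  std::vector<int> out;
  out.reserve(n);        // we know how many elements we are about to insert
  for (int i = 0; i < n; ++i) {
    out.push_back(i);    // never reallocates while size() <= n
  }
  return out;
}
```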
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16201
Differential Revision: D13762594
Pulled By: ezyang
fbshipit-source-id: 7e3bfe421489dde48a2ddb0920dd155f69baecc0
|
|
Summary:
Fixed a few C++ API callsites to work with v1.0.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16221
Differential Revision: D13759207
Pulled By: yf225
fbshipit-source-id: bd92c2b95a0c6ff3ba5d73cb249d0bc88cfdc340
|
|
Summary:
Prerequisite of https://github.com/onnx/onnx/pull/1434
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16214
Reviewed By: BIT-silence
Differential Revision: D13755116
Pulled By: houseroad
fbshipit-source-id: a46be8d7df959b5ede93e1f9c911a9a9326e6879
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16220
Differential Revision: D13755108
Pulled By: zdevito
fbshipit-source-id: 46b1b128b155964c25249add0c84680491845e9b
|
|
Summary:
Now it is only necessary to use 'develop' or 'install' to build. Incremental cmake is on by default. `develop --cmake` forces it to rerun.
The NinjaBuilder stuff is dead. It was used to make building _C.so
faster but now _C.so is just an empty stub file.
Removed a bunch of custom build commands from setup.py that are
no longer meaningful now that cmake handles most of the build.
Removed unused targets in build_pytorch_lib.sh/bat
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16162
Differential Revision: D13744155
Pulled By: zdevito
fbshipit-source-id: d836484782c65b7f8e8c7a82620886f7a7777892
|
|
Summary:
This PR does three things:
~~Allow `int64_t?` in function schema, which provide an elegant way of implementing null-able int arguments, as discussed in https://github.com/pytorch/pytorch/pull/15208#pullrequestreview-185230081~~
~~Originally implemented in https://github.com/pytorch/pytorch/pull/15235~~
~~Example:~~
```yaml
- func: myop(Tensor self, int64_t? dim=None) -> Tensor
variants: function
```
~~cc: zou3519~~
Edit: implemented in https://github.com/pytorch/pytorch/pull/15234
Previously tried in https://github.com/pytorch/pytorch/pull/12064. The problem was that C++ has no kwarg support, which makes it confusing whether `unique(t, 1)` actually means `unique(t, dim=1)` or `unique(t, sorted=1)`.
Now I think I have a better idea of how to implement this: there are two ATen operators, `unique` and `unique_dim`. `unique` has the same signature as in Python and is exported to both Python and C++. `unique_dim` has the signature `unique_dim(tensor, dim, sorted=False, return_inverse=False)` and is exported only to C++, where it can be used more naturally by C++ users.
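A hedged sketch of the intended C++ surface, based only on the signatures described above; the exact ATen names and defaults may differ from what finally landed:
```C++
#include <ATen/ATen.h>

void unique_example() {
  at::Tensor t = at::randint(/*low=*/0, /*high=*/5, {4, 4});
  // Flattened unique, mirroring the Python signature.
  auto flat = at::unique(t, /*sorted=*/true, /*return_inverse=*/false);
  // Dimension-wise unique, exported only to C++.
  auto by_dim = at::unique_dim(t, /*dim=*/1, /*sorted=*/true, /*return_inverse=*/false);
}
```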
Differential Revision: D13540278
Pulled By: wanchaol
fbshipit-source-id: 3768c76a90b0881f565a1f890459ebccbdfe6ecd
|
|
(#16190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16190
Previous import was fd60104394fa353e1762f44ecad1b2166e33deef
Included changes:
- **[c553fb3](https://github.com/onnx/onnx/commit/c553fb3)**: Handle negative axis in scan shape inference (#1748) <G. Ramalingam>
- **[51b6ecc](https://github.com/onnx/onnx/commit/51b6ecc)**: external_data: Store large tensor values in separate files (#678) <Michał Karzyński>
- **[ba05f26](https://github.com/onnx/onnx/commit/ba05f26)**: Scan output axes (#1737) <G. Ramalingam>
- **[90920c0](https://github.com/onnx/onnx/commit/90920c0)**: Add NonZero op. (#1714) <Sergii Dymchenko>
- **[c4cf112](https://github.com/onnx/onnx/commit/c4cf112)**: fix the test cases for constantofshape (#1746) <Lu Fang>
- **[d902349](https://github.com/onnx/onnx/commit/d902349)**: Add sample implementation support (#1712) <Lu Fang>
Differential Revision: D13745693
fbshipit-source-id: 05e2cce9ae1dfa2865db83840df64673d55cea57
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16175
Separate Moments from math and optimize it
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13742472
fbshipit-source-id: 90757d908d38c98ca69818855aaf68315e525992
|
|
Summary:
Addresses one future work item in #15937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16150
Differential Revision: D13732299
Pulled By: mrshenli
fbshipit-source-id: 4d0b35df573a3bf92dea6e2e7eb42fe8bac77b18
|
|
Summary:
Submitting this PR as an update to the existing PR (https://github.com/pytorch/pytorch/pull/15938) at houseroad's request.
This PR replaces the use of the ONNX op `ConstantLike` with `ConstantOfShape` in the ONNX exporter. In addition to removing the call sites in `symbolic.py`, it also replaces the call site in `peephole.cpp`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16095
Differential Revision: D13745723
Pulled By: houseroad
fbshipit-source-id: e2a5f534f01adf199df9e27544f7afcfa540e1f0
|
|
Summary:
Resolves #15923 where LBFGS threw "Error: a leaf Variable that requires grad has been used in an in-place operation."
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16167
Differential Revision: D13745822
Pulled By: soumith
fbshipit-source-id: 7d1d0511d06838c0c6f4c8a6b53cf15193283059
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16174
Our service creates a new caffe2 workspace for the same underlying network on multiple threads concurrently at service startup time (later these workspaces are reused for sequential requests), resulting in concurrent quantization via FullyConnectedDNNLowPOp calling GetOrCreateFbgemmPackBMatrix(). The lazily performed quantizations during the first inference in each workspace are all funnelled through GetOrCreateFbgemmPackBMatrix()'s cache_mutex, which means quantization is serialized, so at service startup only a single CPU core is used for around a minute until the serial quantization is done.
A better solution would be to avoid quantizing the same weight matrix in the operator copies of the different net copies to begin with, but this is the simpler fix for our current problem.
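A minimal sketch of the contention pattern being described, with made-up types and names rather than the real fbgemm/caffe2 code: the expensive packing happens while a single global mutex is held, so concurrent first-time callers serialize.
```C++
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct PackedMatrix { /* expensive-to-build packed weights */ };

std::mutex cache_mutex;
std::map<std::string, std::shared_ptr<PackedMatrix>> cache;

std::shared_ptr<PackedMatrix> GetOrCreatePacked(const std::string& key) {
  std::lock_guard<std::mutex> guard(cache_mutex);  // held for the whole pack
  auto it = cache.find(key);
  if (it != cache.end()) {
    return it->second;                             // fast path after warm-up
  }
  auto packed = std::make_shared<PackedMatrix>();  // expensive work under the lock,
  cache.emplace(key, packed);                      // so other threads wait here
  return packed;
}
```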
Reviewed By: jspark1105
Differential Revision: D13708785
fbshipit-source-id: 537519896b3b939c552d67f400bafc8a69ce11eb
|
|
Summary:
This PR is the prerequisite to land https://github.com/pytorch/pytorch/pull/16095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16108
Reviewed By: BIT-silence
Differential Revision: D13725722
Pulled By: houseroad
fbshipit-source-id: 28c0fb72f075cd04f9db44dfab0163844c20c620
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16135
Separate affine_channel from math and optimize it
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13727606
fbshipit-source-id: 8980af4afadaf964a18a9da581106fe30896a7e9
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16065
Before, we registered the caffe2 kernel with the c10 dispatcher using plain C types.
Now, we pass in IValues, which avoids the unwrapping in between.
Reviewed By: ezyang
Differential Revision: D13689036
fbshipit-source-id: b976a2c46a5a541f6a926b3df255e8a535e32420
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16051
This changes the kernels stored in the c10 dispatcher from plain C function pointers to IValue-based KernelFunction*.
Note that KernelFunction currently takes an `ArrayRef<IValue>` as arguments. A later diff will change it to take a `Stack*`.
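A minimal sketch of the calling convention described here, with illustrative names rather than the actual c10 definitions: a kernel is a plain function taking its arguments as IValues, and the dispatcher stores a pointer to it.
```C++
#include <ATen/ATen.h>
#include <ATen/core/ivalue.h>

// Illustrative alias; the real KernelFunction type lives in c10.
using KernelFunction = c10::IValue(c10::ArrayRef<c10::IValue> args);

c10::IValue my_add_kernel(c10::ArrayRef<c10::IValue> args) {
  at::Tensor a = args[0].toTensor();
  at::Tensor b = args[1].toTensor();
  return c10::IValue(a + b);   // re-wrap the result as an IValue
}

// Stored as a plain function pointer, so registration stays cheap (and can
// remain constexpr, as in the earlier diffs in this stack).
KernelFunction* const kRegisteredKernel = &my_add_kernel;
```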
Reviewed By: ezyang
Differential Revision: D13684518
fbshipit-source-id: 1fa54f60cec2e967b92a4a043d6e3ac1627ed991
|
|
Summary:
This tests the water for adding NNPACK back into PyTorch; it is a lot better than the fallback THNN versions.
In #6151, we (ezyang and soumith) removed NNPACK support from PyTorch. Of course Maratyszcza might have advice, too. (Or an opinion on the CMake changes.)
The only functional changes are using NNPack more aggressively on mobile and adding a .contiguous() call to match NNPack's assumption (I stumbled over that while using NNPack for style transfer).
The CMake changes try to use the NNPack we already have in git.
In terms of lines of code this is a large part of the diff of https://lernapparat.de/pytorch-jit-android/ . As far as I can tell, we don't have MKLDNN on mobile and the native THNN implementations are prohibitively expensive in terms of both CPU and memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15924
Differential Revision: D13709576
Pulled By: ezyang
fbshipit-source-id: f2e287739909451c173abf046588209a7450ca2c
|
|
Summary:
Partial fix for #15804, covering only the case without `dim`.
For jcjohnson's benchmarking script, I'm getting the following results on a V100:
Before:
```
Running with N = 10000, M = 10000
cuda (no inverse): 0.98 ms
cpu (no inverse): 0.96 ms
cuda (with inverse): 1.07 ms
cpu (with inverse): 1.76 ms
Running with N = 10000, M = 100000
cuda (no inverse): 0.76 ms
cpu (no inverse): 1.53 ms
cuda (with inverse): 1.23 ms
cpu (with inverse): 3.02 ms
Running with N = 100000, M = 100000
cuda (no inverse): 1.28 ms
cpu (no inverse): 11.22 ms
cuda (with inverse): 69.76 ms
cpu (with inverse): 20.28 ms
Running with N = 100000, M = 1000000
cuda (no inverse): 0.78 ms
cpu (no inverse): 18.78 ms
cuda (with inverse): 133.45 ms
cpu (with inverse): 34.09 ms
Running with N = 500000, M = 500000
cuda (no inverse): 1.43 ms
cpu (no inverse): 61.13 ms
cuda (with inverse): 3315.18 ms
cpu (with inverse): 104.57 ms
Running with N = 500000, M = 5000000
cuda (no inverse): 0.86 ms
cpu (no inverse): 96.44 ms
cuda (with inverse): 5209.93 ms
cpu (with inverse): 176.10 ms
```
After:
```
Running with N = 10000, M = 10000
cuda (no inverse): 1.04 ms
cpu (no inverse): 0.94 ms
cuda (with inverse): 0.64 ms
cpu (with inverse): 1.76 ms
Running with N = 10000, M = 100000
cuda (no inverse): 0.77 ms
cpu (no inverse): 1.55 ms
cuda (with inverse): 0.58 ms
cpu (with inverse): 2.79 ms
Running with N = 100000, M = 100000
cuda (no inverse): 1.30 ms
cpu (no inverse): 14.15 ms
cuda (with inverse): 1.63 ms
cpu (with inverse): 20.90 ms
Running with N = 100000, M = 1000000
cuda (no inverse): 0.82 ms
cpu (no inverse): 18.63 ms
cuda (with inverse): 0.61 ms
cpu (with inverse): 33.52 ms
Running with N = 500000, M = 500000
cuda (no inverse): 1.51 ms
cpu (no inverse): 59.81 ms
cuda (with inverse): 1.23 ms
cpu (with inverse): 110.69 ms
Running with N = 500000, M = 5000000
cuda (no inverse): 0.92 ms
cpu (no inverse): 104.26 ms
cuda (with inverse): 0.84 ms
cpu (with inverse): 187.12 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16145
Differential Revision: D13738821
Pulled By: soumith
fbshipit-source-id: 0811fb4ade47e3b466cebbc124e3f3333a986749
|
|
Summary:
It turns out that clang-tidy is bundled with Travis's standard Trusty distribution, so there is no need to install it manually.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16164
Differential Revision: D13738986
Pulled By: suo
fbshipit-source-id: d0cd76c615625b2ed7f18951289412989f15849d
|
|
Summary:
Fixes #16019
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16128
Differential Revision: D13721850
Pulled By: mrshenli
fbshipit-source-id: 422c6c0b97c1cd46e127e265b532cb8c74a3aac5
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16049
We might see the pattern
```
if (scale_.numel() != N) {
scale_->Resize(N);
// set initial value for scale_
}
// In class:
Tensor scale_{CPU};
```
before in the code, where `scale_` is a member variable of type `caffe2::Tensor`.
This pattern actually serves two purposes: if `scale_` is partially initialized with a device type but no size, this call initializes the Tensor with the correct size; if `scale_` is already initialized with a size, it checks whether the size matches the runtime value `N` and, if not, resizes it. To rewrite this we'll do the following:
```
if (!scale_.defined() || scale_.numel() != N) {
ReinitializeTensor(&scale_, {N}, at::dtype<float>().device(CPU));
// set initial value for scale_
}
```
There are some variants: if `scale_` is resized to a constant size, we can just call `ReinitializeTensor` instead:
```
if (scale_.numel() != 1) {
scale_->Resize(1);
}
```
-->
```
ReinitializeTensor(&scale_, {1}, at::dtype<float>().device(CPU));
```
Normal Resize will be refactored directly into ReinitializeTensor:
```
scale_->Resize(N);
```
-->
```
ReinitializeTensor(&scale_, {N}, at::dtype<float>().device(CPU));
```
Reviewed By: dzhulgakov
Differential Revision: D13667883
fbshipit-source-id: 2c7cb61544b72765b594011b99150eb5a1b50836
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16118
att
Differential Revision: D13697211
fbshipit-source-id: 12bf6edd1794240ac748cc1b8fecb0c1e8eb9112
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16086
[caffe2] RNN operators should inherit step_net device_options
According to the NetDef documentation, if a network has a specific device option it applies to all network operators that do not explicitly specify one.
But this does not seem to be the case for RecurrentNetwork operators.
Reviewed By: orionr
Differential Revision: D13699552
fbshipit-source-id: 14529bc9504e3b02f763e3c2429be21e46f82b68
|
|
Summary:
Add support for type inference for optional type refinement.
If a conditional is of the form "x is None" or "x is not None", or is a boolean expression containing multiple none checks, the proper type refinements are inserted in each branch.
For example:
    if optional_tensor is not None and len(optional_tensor) < 2:
        # optional_tensor is a Tensor
    if optional_tensor1 is not None and optional_tensor2 is not None:
        # both optional_tensor1 and optional_tensor2 are Tensors
TODO:
- do not run an op for unchecked unwrap optional in the interpreter
- potentially refine types to prim::None (omitted for now to simplify things and because it's not an actual use case).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15587
Differential Revision: D13733810
Pulled By: eellison
fbshipit-source-id: 57c32be9f5a09ab5542ba0144a6059b96de23d7a
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16125
Add defined() method to check whether the Tensor is defined.
Reviewed By: ezyang
Differential Revision: D13719222
fbshipit-source-id: ff8efef2159ed1026bd16acaea40c768a1e20a47
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16115
Reviewed By: bddppq
Differential Revision: D13717049
fbshipit-source-id: fb1d690183a932a1fa1a2d235f3219520f51620a
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15547
Differential Revision: D13549495
Pulled By: mrshenli
fbshipit-source-id: 09a065a8ffa7d73f409759b779c7314cc87f4853
|
|
Summary:
Mention that if enforce_sorted=True, the user can set
enforce_sorted=False. This is a new flag that is probably hard to
discover unless one thoroughly reads the docs.
Fixes #15567
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16084
Differential Revision: D13701118
Pulled By: zou3519
fbshipit-source-id: c9aeb47ae9769d28b0051bcedb8f2f51a5a5c260
|
|
Summary:
This PR fixes a race condition in the TCP init method, where the master rank can exit earlier than the slave ranks, so the TCP daemon thread gets shut down before the other slaves are able to access it.
This change lets every rank (process) write a special key to the store to mark that it has completed (and is thus about to exit). The master rank (which is the server) will always wait until all ranks have completed before completing itself.
This should fix: https://github.com/pytorch/pytorch/issues/15638
Tested using the repro from https://github.com/pytorch/pytorch/issues/15638 and it works fine. Also, test_distributed and test_c10d should already have this coverage.
I had to make the rendezvous test in c10d use a world size of 1, since it is single-process code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15684
Differential Revision: D13570904
Pulled By: teng-li
fbshipit-source-id: 34f3bc471204bbd29320df359347ad5561c6b589
|
|
Summary:
Based on offline discussion, this should be less surprising to users of existing code. Thus caffe2::Tensor is now a move-only class (as it used to be); explicit calls to UnsafeSharedInstance() are necessary to get shared_ptr-like behavior.
This change also identified a few places that misused the copy constructor; those are fixed.
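A minimal sketch of the ownership model being described, with made-up types rather than the actual caffe2::Tensor implementation: copies are rejected at compile time, and sharing the underlying storage requires an explicit call.
```C++
#include <memory>
#include <utility>

class MoveOnlyTensor {
 public:
  explicit MoveOnlyTensor(std::shared_ptr<int> impl) : impl_(std::move(impl)) {}

  MoveOnlyTensor(MoveOnlyTensor&&) = default;               // moves allowed
  MoveOnlyTensor& operator=(MoveOnlyTensor&&) = default;
  MoveOnlyTensor(const MoveOnlyTensor&) = delete;            // accidental copies won't compile
  MoveOnlyTensor& operator=(const MoveOnlyTensor&) = delete;

  // Explicit opt-in to shared ownership of the same underlying storage.
  MoveOnlyTensor UnsafeSharedInstance() const { return MoveOnlyTensor(impl_); }

 private:
  std::shared_ptr<int> impl_;
};
```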
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15416
Reviewed By: Yangqing
Differential Revision: D13524598
fbshipit-source-id: aea12d6dff77342606fa88ce4ddddbff266245a7
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16132
Differential Revision: D13726816
Pulled By: zdevito
fbshipit-source-id: 26ad70651b0138642ad5240670f5c452018c13a2
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16129
Differential Revision: D13724297
Pulled By: ZolotukhinM
fbshipit-source-id: 24e140bc052c85ef40b928eb84f463d341346a51
|
|
Summary:
This PR inlines `Attributes` into `Node`. It helps to clean up the code a little, as everything is in one place (some of the cleanups are included in the PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16098
Differential Revision: D13717637
Pulled By: ZolotukhinM
fbshipit-source-id: c54ae65178a95a01354688921a9ccb1ca699f8eb
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15856
They seem to be wrong.
cc zdevito to take a look but I think this is now more correct.
It's weird this didn't cause linker errors. Probably, this functionality isn't used across library boundaries yet.
Reviewed By: dzhulgakov
Differential Revision: D13605257
fbshipit-source-id: 7077ca9027c3ac79a4847ec15ead7ddb28696445
|