Age | Commit message | Author | Files | Lines
2019-01-22 | tune elementwise for AMD uarch (#16217) | Johannes M Dieterich | 2 files | -8/+18
Summary: Tune elementwise kernel for AMD architectures by increasing the work group sizes and launch bounds. This change improves training throughput for torchvision models by up to 11% in our tests while exhibiting no significant performance regression. No functional/performance change for CUDA - just shifting numbers into constexpr. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16217 Differential Revision: D13776684 Pulled By: bddppq fbshipit-source-id: edbaebe904598b2de66a9e9a68a1aa219ebc01e9
2019-01-22 | fix typo in resnet50_trainer.py | rohithkrn | 1 file | -1/+1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16219 Differential Revision: D13776742 Pulled By: bddppq fbshipit-source-id: 10a6ab4c58159b3f619b739074f773662722c1d9
2019-01-22 | Automatic update of fbcode/onnx to dc75285d4a1cff9618400164dfdb26c5a1bab70a | Lu Fang | 85 files | -84/+84
Summary: Previous import was c553fb32a0902ce5dd42e1b40123e9e9b38bdbe7
Included changes:
- **[dc75285](https://github.com/onnx/onnx/commit/dc75285)**: Relax constraint that the initializers must be a subset of graph inputs (#1718) <G. Ramalingam>
- **[985c8cd](https://github.com/onnx/onnx/commit/985c8cd)**: Fix typo in scan shape inferencing (#1753) <Scott McKay>
- **[ab52a5d](https://github.com/onnx/onnx/commit/ab52a5d)**: remove stale test cases <Lu Fang>
- **[56434bb](https://github.com/onnx/onnx/commit/56434bb)**: Removing experimental ConstantFill op. <Spandan Tiwari>
- **[881c63c](https://github.com/onnx/onnx/commit/881c63c)**: Show string names of data types instead of int IDs (#1749) <Shinichiro Hamaji>
- **[0a12fe4](https://github.com/onnx/onnx/commit/0a12fe4)**: Update ConstantOfShape op. (#1744) <Bowen Bao>
- **[ef028e5](https://github.com/onnx/onnx/commit/ef028e5)**: Update definition of Cast Op to support casting to/from string (#1704) <Raymond Yang>
Reviewed By: BIT-silence Differential Revision: D13773962 fbshipit-source-id: b98079277994a699d4807210ba1d9c27f4672090
2019-01-22 | Add default_stream() and enhance current_stream() (#16200) | Shen Li | 3 files | -7/+107
Summary: Closes #16156 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16200 Differential Revision: D13747455 Pulled By: mrshenli fbshipit-source-id: 00c0d5f341c3ac7a757bdb4631a17e11fbc6d3ec
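For reference, a minimal usage sketch of the stream query API touched by this change (not part of the commit; assumes a CUDA-capable build with at least one GPU):
```python
import torch

if torch.cuda.is_available():
    cur = torch.cuda.current_stream()   # stream currently active on the current device
    def_ = torch.cuda.default_stream()  # the device's default stream
    print(cur == def_)                  # True unless a custom stream is active

    side = torch.cuda.Stream()
    with torch.cuda.stream(side):
        # inside the context, current_stream() reports the side stream
        assert torch.cuda.current_stream() == side
```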
2019-01-22 | Change complex_registration_extension.cpp includes to angle brackets | Edward Yang | 1 file | -8/+8
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16122 Reviewed By: smessmer Differential Revision: D13717900 fbshipit-source-id: 8401f39d993482d3e08d2d79bc1841deafee2a5b
2019-01-22 | Remove ATen/Allocator.h forwarding header. | Edward Yang | 10 files | -11/+9
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16121 Reviewed By: smessmer Differential Revision: D13717899 fbshipit-source-id: 83488f2aa801ca75059949ec85171ec03e64c4ff
2019-01-22 | Remove dead curVal store. | Edward Yang | 1 file | -1/+0
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16116 Reviewed By: smessmer Differential Revision: D13717719 fbshipit-source-id: 2ecee3f08f64e64ec5ac3c92fb326bc3df37e40e
2019-01-22 | Make kernel registration constexpr again (#16166) | Sebastian Messmer | 1 file | -5/+5
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16166 Since we now don't use std::function anymore, we can make kernel registration constexpr again. Reviewed By: ezyang Differential Revision: D13738630 fbshipit-source-id: 918fa3a3c8c6f0ddbd0f08b3b143cdf066265387
2019-01-22 | Avoid closure around kernel (#16165) | Sebastian Messmer | 21 files | -92/+78
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16165 Store kernels as direct function pointers instead of std::function. Using direct function pointers avoids a performance risk std::function would introduce. Reviewed By: ezyang Differential Revision: D13738627 fbshipit-source-id: a348906c8a201436699681980a82ca95065a06a0
2019-01-22 | Pass IValues from JIT to c10 dispatcher (#16066) | Sebastian Messmer | 1 file | -46/+55
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16066 Don't unwrap and re-wrap but directly pass through the IValues Reviewed By: ezyang Differential Revision: D13689037 fbshipit-source-id: 99b8155e640eb61a3c0597bf0f2b9c338712b45e
2019-01-22 | Release GIL when synchronize or wait (#16182) | Shen Li | 3 files | -3/+65
Summary: Addresses the second future-work item in #15937. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16182 Differential Revision: D13744972 Pulled By: mrshenli fbshipit-source-id: e9812e3fd4a5623e99b639d9f334bfc2d1827d92
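For context, a hedged illustration of what releasing the GIL during `synchronize()` enables (illustrative only; the heartbeat thread, tensor sizes, and loop counts are made up for the example):
```python
import threading
import time
import torch

ticks = 0

def heartbeat(stop_event):
    # Background Python work that should keep running while the main thread waits.
    global ticks
    while not stop_event.is_set():
        ticks += 1
        time.sleep(0.001)

if torch.cuda.is_available():
    stop_event = threading.Event()
    t = threading.Thread(target=heartbeat, args=(stop_event,))
    t.start()
    x = torch.randn(4096, 4096, device="cuda")
    for _ in range(20):
        x = x @ x                               # work is queued asynchronously
    torch.cuda.current_stream().synchronize()   # blocks without holding the GIL
    stop_event.set()
    t.join()
    print("heartbeats while waiting:", ticks)
```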
2019-01-22 | Revert D13540278: [pytorch][PR] Unhide unique from C++, make unique partially scriptable | Wanchao Liang | 13 files | -211/+58
Differential Revision: D13540278 Original commit changeset: 3768c76a90b0 fbshipit-source-id: 7a31c239f9dca6ff467344d99820095addcae9d7
2019-01-22 | Return namedtuples from torch.* functions with multiple return values for C++ operators (#15429) | Xiang Gao | 18 files | -41/+238
Summary: Partially fixes: https://github.com/pytorch/pytorch/issues/394

Implementation detail: codegen is modified to generate code that looks like this:
```C++
static PyObject * THPVariable_svd(PyObject* self_, PyObject* args, PyObject* kwargs) {
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
  }, /*traceable=*/true);
  ParsedArgs<6> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  static PyStructSequence_Field fields0[] = { {"U", ""}, {"S", ""}, {"V", ""}, {nullptr} };
  static PyStructSequence_Desc desc0 = { "torch.return_types.svd_out", nullptr, fields0, 3 };
  static PyTypeObject type0;
  static bool namedtuple_type_initialized0 = false;
  if (!namedtuple_type_initialized0) {
    PyStructSequence_InitType(&type0, &desc0);
    namedtuple_type_initialized0 = true;
  }
  static PyStructSequence_Field fields1[] = { {"U", ""}, {"S", ""}, {"V", ""}, {nullptr} };
  static PyStructSequence_Desc desc1 = { "torch.return_types.svd", nullptr, fields1, 3 };
  static PyTypeObject type1;
  static bool namedtuple_type_initialized1 = false;
  if (!namedtuple_type_initialized1) {
    PyStructSequence_InitType(&type1, &desc1);
    namedtuple_type_initialized1 = true;
  }
  if (r.idx == 0) {
    if (r.isNone(3)) {
      return wrap(&type1, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2)));
    } else {
      auto results = r.tensorlist_n<3>(3);
      return wrap(&type0, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2), results[0], results[1], results[2]));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```
The namedtuple types are defined as static members of the `THPVariable_${op_name}` functions and initialized the first time the function is called. When parsing function prototypes in `native_functions.yaml`, the parser sets the specified name as `field_name` when it sees declarations like `-> (Tensor t1, ...)`. These field names become the field names of the namedtuple, and the namedtuple class is named `torch.return_types.${op_name}`. In some Python 2 builds, `PyStructSequence` is not a subtype of tuple, so helper functions are added to check whether an object is a tuple or namedtuple for compatibility. Operators in `native_functions.yaml` are changed such that only `max` and `svd` return namedtuples for now. Tests are added for these two operators to check that the return value works as expected, and their docs are updated to explicitly mention that the return value is a namedtuple. More ops will be added in later PRs. There was an issue with the Windows build where the linker could not resolve `PyStructSequence_UnnamedField`; a workaround is added to deal with this case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15429 Differential Revision: D13709678 Pulled By: ezyang fbshipit-source-id: 23a511c9436977098afc49374e9a748b6e30bccf
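For reference, a short sketch of the resulting user-facing behaviour for the two converted operators, `torch.max` and `torch.svd`:
```python
import torch

a = torch.randn(5, 5)

# svd returns a namedtuple of type torch.return_types.svd
res = torch.svd(a)
print(res.U.shape, res.S.shape, res.V.shape)  # access fields by name
u, s, v = res                                 # still unpacks like a plain tuple

# max along a dimension returns named values/indices fields
m = torch.max(a, dim=1)
print(m.values, m.indices)
```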
2019-01-22 | Fix formatting in caffe2/quantization/server/README.md | Jongsoo Park | 1 file | -1/+1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14237 Reviewed By: dskhudia Differential Revision: D13751791 Pulled By: jspark1105 fbshipit-source-id: 54f73d5134e596817802c66d43098d18458c2799
2019-01-22 | hip-clang enablement (#16085) | Yaxun (Sam) Liu | 7 files | -5/+10
Summary: Initial enabling of the upcoming hip-clang compiler for the PyTorch source base. Changes:
* update the Eigen submodule to a version including our upstreamed hip-clang enabling there
* modify a few ifdef guards with the `__HIP__` macro used by hip-clang
* use `__lane_id` instead of `hc::__lane_id`
* add Debug flags for ROCm to the cmake infrastructure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16085 Differential Revision: D13709459 Pulled By: ezyang fbshipit-source-id: 1b7b33fe810a0434766180580d4443ea177eb7c7
2019-01-22 | Raise CalledProcessError when a torch.distributed launch process does not return 0 (#16069) | Andy Wei | 1 file | -0/+3
Summary: `torch.distributed.launch.py` does not raise an error when a process started via `subprocess.Popen` returns a non-zero exit code. For easier debugging it should always raise an error when launched processes exit abnormally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16069 Differential Revision: D13709467 Pulled By: ezyang fbshipit-source-id: 31d32a5ec8fed7bccd62d845bfba0e670ed3fe20
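For context, a hedged sketch of the failure-propagation pattern described above (the command and script name are placeholders, not the actual launcher code):
```python
import subprocess
import sys

# Launch a worker and surface a non-zero exit code instead of ignoring it.
cmd = [sys.executable, "-u", "trainer.py"]  # hypothetical worker script
process = subprocess.Popen(cmd)
process.wait()
if process.returncode != 0:
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
```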
2019-01-22 | Reserve vectors whose size we know in advance (#16201) | Shahzad Lone | 1 file | -1/+3
Summary: Save reallocation costs by reserving vectors according to how many elements we expect to put in them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16201 Differential Revision: D13762594 Pulled By: ezyang fbshipit-source-id: 7e3bfe421489dde48a2ddb0920dd155f69baecc0
2019-01-21 | cpp doc fix (#16221) | Will Feng | 2 files | -9/+7
Summary: Fixed a few C++ API callsites to work with v1.0.1. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16221 Differential Revision: D13759207 Pulled By: yf225 fbshipit-source-id: bd92c2b95a0c6ff3ba5d73cb249d0bc88cfdc340
2019-01-21 | Move away from ConstantFill (#16214) | Lu Fang | 3 files | -24/+19
Summary: Prerequisite of https://github.com/onnx/onnx/pull/1434 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16214 Reviewed By: BIT-silence Differential Revision: D13755116 Pulled By: houseroad fbshipit-source-id: a46be8d7df959b5ede93e1f9c911a9a9326e6879
2019-01-21 | ban conv_double_backward from sandcastle, it takes too long | Zachary DeVito | 1 file | -0/+1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16220 Differential Revision: D13755108 Pulled By: zdevito fbshipit-source-id: 46b1b128b155964c25249add0c84680491845e9b
2019-01-21 | Remove dead code from setup.py, remove need for build target. (#16162) | Zachary DeVito | 5 files | -506/+192
Summary: Now it is only necessary to use 'develop' or 'install' to build. Incremental cmake is on by default. `develop --cmake` forces it to rerun. The NinjaBuilder stuff is dead. It was used to make building _C.so faster but now _C.so is just an empty stub file. Removed a bunch of custom build commands from setup.py that are no longer meaningful now that cmake handles most of the build. Removed unused targets in build_pytorch_lib.sh/bat Pull Request resolved: https://github.com/pytorch/pytorch/pull/16162 Differential Revision: D13744155 Pulled By: zdevito fbshipit-source-id: d836484782c65b7f8e8c7a82620886f7a7777892
2019-01-21 | Unhide unique from C++, make unique partially scriptable (#15256) | Xiang Gao | 13 files | -58/+211
Summary: This PR does three things:

~~Allow `int64_t?` in function schema, which provides an elegant way of implementing nullable int arguments, as discussed in https://github.com/pytorch/pytorch/pull/15208#pullrequestreview-185230081~~ ~~Originally implemented in https://github.com/pytorch/pytorch/pull/15235~~ ~~Example:~~
```yaml
- func: myop(Tensor self, int64_t? dim=None) -> Tensor
  variants: function
```
~~cc: zou3519~~ Edit: implemented in https://github.com/pytorch/pytorch/pull/15234

Previously tried in https://github.com/pytorch/pytorch/pull/12064. There was a problem that C++ does not have kwarg support, which makes it confusing to know whether `unique(t, 1)` actually means `unique(t, dim=1)` or `unique(t, sorted=1)`. Now I think I have a better idea on how to implement this: there are two ATen operators, `unique` and `unique_dim`. `unique` has the same signature as in Python and is exported to both Python and C++. `unique_dim` has signature `unique_dim(tensor, dim, sorted=False, return_inverse=False)` and is only exported to C++, which can be used more naturally by a C++ user. Differential Revision: D13540278 Pulled By: wanchaol fbshipit-source-id: 3768c76a90b0881f565a1f890459ebccbdfe6ecd
2019-01-21 | Automatic update of fbcode/onnx to c553fb32a0902ce5dd42e1b40123e9e9b38bdbe7 (#16190) | Lu Fang | 2 files | -0/+1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16190 Previous import was fd60104394fa353e1762f44ecad1b2166e33deef
Included changes:
- **[c553fb3](https://github.com/onnx/onnx/commit/c553fb3)**: Handle negative axis in scan shape inference (#1748) <G. Ramalingam>
- **[51b6ecc](https://github.com/onnx/onnx/commit/51b6ecc)**: external_data: Store large tensor values in separate files (#678) <Michał Karzyński>
- **[ba05f26](https://github.com/onnx/onnx/commit/ba05f26)**: Scan output axes (#1737) <G. Ramalingam>
- **[90920c0](https://github.com/onnx/onnx/commit/90920c0)**: Add NonZero op. (#1714) <Sergii Dymchenko>
- **[c4cf112](https://github.com/onnx/onnx/commit/c4cf112)**: fix the test cases for constantofshape (#1746) <Lu Fang>
- **[d902349](https://github.com/onnx/onnx/commit/d902349)**: Add sample implementation support (#1712) <Lu Fang>
Differential Revision: D13745693 fbshipit-source-id: 05e2cce9ae1dfa2865db83840df64673d55cea57
2019-01-20 | Separate Moments from math and optimize it (#16175) | Xiaomeng Yang | 17 files | -690/+599
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16175 Separate Moments from math and optimize it i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D13742472 fbshipit-source-id: 90757d908d38c98ca69818855aaf68315e525992
2019-01-19 | Unify device() return type in Stream, Event, and Tensor (#16150) | Shen Li | 6 files | -36/+76
Summary: Addresses one future work item in #15937 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16150 Differential Revision: D13732299 Pulled By: mrshenli fbshipit-source-id: 4d0b35df573a3bf92dea6e2e7eb42fe8bac77b18
2019-01-19 | Replace use of ConstantLike with ConstantOfShape (#16095) | Spandan Tiwari | 7 files | -9/+201
Summary: Submitting this PR as an update to existing PR (https://github.com/pytorch/pytorch/pull/15938) on houseroad's request. This PR replaces the use of ONNX op `ConstantLike` with `ConstantOfShape` in the ONNX exporter. In addition to removing the call sites in `symbolic.py`, it also replaces the call site in `peephole.cpp`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16095 Differential Revision: D13745723 Pulled By: houseroad fbshipit-source-id: e2a5f534f01adf199df9e27544f7afcfa540e1f0
2019-01-19 | Fix LBFGS issue (#16167) | Miro Furtado | 1 file | -0/+2
Summary: Resolves #15923 where LBFGS threw "Error: a leaf Variable that requires grad has been used in an in-place operation." Pull Request resolved: https://github.com/pytorch/pytorch/pull/16167 Differential Revision: D13745822 Pulled By: soumith fbshipit-source-id: 7d1d0511d06838c0c6f4c8a6b53cf15193283059
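For context, a minimal example of the closure-style LBFGS usage affected by the reported error (model and data are placeholders):
```python
import torch

x = torch.randn(64, 3)
y = torch.randn(64, 1)
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1)

def closure():
    # LBFGS may re-evaluate the loss several times per step, hence the closure.
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

for _ in range(5):
    optimizer.step(closure)
```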
2019-01-19 | Allow for concurrent quantization in FullyConnectedDNNLowPOp (#16174) | Kjell Schubert | 1 file | -15/+15
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16174 Our service creates a new caffe2 workspace for the same underlying network on multiple threads concurrently at service startup time (later these workspaces are reused for sequential requests), resulting in concurrent quantization via FullyConnectedDNNLowPOp calling GetOrCreateFbgemmPackBMatrix(). The lazily performed quantizations during the first inference in each workspace are all funnelled through GetOrCreateFbgemmPackBMatrix()'s cache_mutex, which means quantization is serialized, so at service startup time only a single CPU core is used for around a minute until the serial quantization is done. A better solution would be to avoid quantizing the same weight matrix of the operator copies in different net copies to begin with, but this is the simpler solution for our current problem. Reviewed By: jspark1105 Differential Revision: D13708785 fbshipit-source-id: 537519896b3b939c552d67f400bafc8a69ce11eb
2019-01-18 | Support ConstantOfShape in Caffe2 ONNX Backend (#16108) | Lu Fang | 4 files | -56/+158
Summary: This PR is the prerequisite to land https://github.com/pytorch/pytorch/pull/16095 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16108 Reviewed By: BIT-silence Differential Revision: D13725722 Pulled By: houseroad fbshipit-source-id: 28c0fb72f075cd04f9db44dfab0163844c20c620
2019-01-18 | Separate affine_channel from math and optimize it (#16135) | Xiaomeng Yang | 14 files | -156/+246
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16135 Separate affine_channel from math and optimize it i-am-not-moving-c2-to-c10 Reviewed By: houseroad Differential Revision: D13727606 fbshipit-source-id: 8980af4afadaf964a18a9da581106fe30896a7e9
2019-01-18 | Pass IValue from c10 dispatcher to caffe2 operator (#16065) | Sebastian Messmer | 1 file | -13/+9
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16065 Before, we registered the caffe2 kernel with the c10 dispatcher using plain C types. Now, we pass in IValues, which avoids the unwrapping in between. Reviewed By: ezyang Differential Revision: D13689036 fbshipit-source-id: b976a2c46a5a541f6a926b3df255e8a535e32420
2019-01-18 | Make c10 dispatcher use boxed kernel function pointers (#16051) | Sebastian Messmer | 49 files | -390/+501
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16051 This changes the kernels stored in the c10 dispatcher from plain C function pointers to IValue-based KernelFunction*. Note that KernelFunction is currently taking an `ArrayRef<IValue>` as arguments. A later diff will change that to it taking a `Stack*`. Reviewed By: ezyang Differential Revision: D13684518 fbshipit-source-id: 1fa54f60cec2e967b92a4a043d6e3ac1627ed991
2019-01-18 | add back NNPACK in PyTorch (#15924) | Thomas Viehmann | 11 files | -16/+712
Summary: This tests the water for adding back NNPACK in PyTorch, it's a lot better than the fallback THNN versions. In #6151, we (ezyang and soumith) removed NNPACK support from PyTorch. Of course Maratyszcza might have advice, too. (Or an opinion on the CMake changes.) The only functional changes are to use NNPack more aggressively on mobile and a .contiguous() to match NNPack's assumption (I stumbled over that while using NNPack for style transfer.) The CMake changes try to use the NNPack we already have in git. In terms of lines of code this is a large part of the diff of https://lernapparat.de/pytorch-jit-android/ . As far as I can tell, we don't have MKLDNN on mobile and the native THNN implementation are prohibitively expensive in terms of both CPU and memory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15924 Differential Revision: D13709576 Pulled By: ezyang fbshipit-source-id: f2e287739909451c173abf046588209a7450ca2c
2019-01-18 | improve performance of unique with inverse indices (#16145) | Natalia Gimelshein | 1 file | -32/+21
Summary: Partial fix for #15804, only w/o dim. For jcjohnson's benchmarking script I'm getting the following results on V100:

Before:
```
Running with N = 10000, M = 10000
cuda (no inverse): 0.98 ms
cpu (no inverse): 0.96 ms
cuda (with inverse): 1.07 ms
cpu (with inverse): 1.76 ms

Running with N = 10000, M = 100000
cuda (no inverse): 0.76 ms
cpu (no inverse): 1.53 ms
cuda (with inverse): 1.23 ms
cpu (with inverse): 3.02 ms

Running with N = 100000, M = 100000
cuda (no inverse): 1.28 ms
cpu (no inverse): 11.22 ms
cuda (with inverse): 69.76 ms
cpu (with inverse): 20.28 ms

Running with N = 100000, M = 1000000
cuda (no inverse): 0.78 ms
cpu (no inverse): 18.78 ms
cuda (with inverse): 133.45 ms
cpu (with inverse): 34.09 ms

Running with N = 500000, M = 500000
cuda (no inverse): 1.43 ms
cpu (no inverse): 61.13 ms
cuda (with inverse): 3315.18 ms
cpu (with inverse): 104.57 ms

Running with N = 500000, M = 5000000
cuda (no inverse): 0.86 ms
cpu (no inverse): 96.44 ms
cuda (with inverse): 5209.93 ms
cpu (with inverse): 176.10 ms
```

After:
```
Running with N = 10000, M = 10000
cuda (no inverse): 1.04 ms
cpu (no inverse): 0.94 ms
cuda (with inverse): 0.64 ms
cpu (with inverse): 1.76 ms

Running with N = 10000, M = 100000
cuda (no inverse): 0.77 ms
cpu (no inverse): 1.55 ms
cuda (with inverse): 0.58 ms
cpu (with inverse): 2.79 ms

Running with N = 100000, M = 100000
cuda (no inverse): 1.30 ms
cpu (no inverse): 14.15 ms
cuda (with inverse): 1.63 ms
cpu (with inverse): 20.90 ms

Running with N = 100000, M = 1000000
cuda (no inverse): 0.82 ms
cpu (no inverse): 18.63 ms
cuda (with inverse): 0.61 ms
cpu (with inverse): 33.52 ms

Running with N = 500000, M = 500000
cuda (no inverse): 1.51 ms
cpu (no inverse): 59.81 ms
cuda (with inverse): 1.23 ms
cpu (with inverse): 110.69 ms

Running with N = 500000, M = 5000000
cuda (no inverse): 0.92 ms
cpu (no inverse): 104.26 ms
cuda (with inverse): 0.84 ms
cpu (with inverse): 187.12 ms
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/16145 Differential Revision: D13738821 Pulled By: soumith fbshipit-source-id: 0811fb4ade47e3b466cebbc124e3f3333a986749
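For reference, a small sketch of the call pattern being benchmarked (shapes are placeholders; pass a CUDA tensor to exercise the GPU path):
```python
import torch

N, M = 10000, 10000
x = torch.randint(0, M, (N,))

values = torch.unique(x, sorted=True)                                 # "no inverse" case
values, inverse = torch.unique(x, sorted=True, return_inverse=True)   # "with inverse" case

# inverse maps each input element to its position in values
assert torch.equal(values[inverse], x)
```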
2019-01-18 | fix for clang-tidy (#16164) | Michael Suo | 1 file | -6/+0
Summary: It turns out that clang-tidy is bundled with travis's standard trusty distribution, so no need to install it manually. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16164 Differential Revision: D13738986 Pulled By: suo fbshipit-source-id: d0cd76c615625b2ed7f18951289412989f15849d
2019-01-18 | Change current device in stream context manager if necessary (#16128) | Shen Li | 3 files | -4/+36
Summary: Fixes #16019 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16128 Differential Revision: D13721850 Pulled By: mrshenli fbshipit-source-id: 422c6c0b97c1cd46e127e265b532cb8c74a3aac5
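For context, a hedged sketch of the behaviour this fix targets (assumes a machine with at least two GPUs):
```python
import torch

if torch.cuda.device_count() >= 2:
    s1 = torch.cuda.Stream(device=1)
    with torch.cuda.stream(s1):
        # With the fix, work issued here targets device 1's stream s1 even
        # though device 0 was the current device when the block was entered.
        y = torch.randn(1000, device=1).sum()
    torch.cuda.synchronize(1)
```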
2019-01-18 | Fix SoftmaxOps (#16049) | Jerry Zhang | 6 files | -64/+108
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16049

We might see this pattern in the code:
```
if (scale_.numel() != N) {
  scale_->Resize(N);
  // set initial value for scale_
}

// In class:
Tensor scale_{CPU};
```
where `scale_` is a member variable of type `caffe2::Tensor`. This pattern actually serves two purposes: if `scale_` is partially initialized with a device type but not a size, this call initializes the Tensor with the correct size; if `scale_` is already initialized with a size, it checks whether the size matches a runtime value `N` and if not it Resizes. To rewrite this we'll do the following:
```
if (!scale_.defined() || scale_.numel() != N) {
  ReinitializeTensor(&scale_, {N}, at::dtype<float>().device(CPU));
  // set initial value for scale_
}
```
There are some variants. If `scale_` is resized to a constant size, we can call `ReinitializeTensor` instead:
```
if (scale_.numel() != 1) {
  scale_->Resize(1);
}
```
-->
```
ReinitializeTensor(&scale_, {1}, at::dtype<float>().device(CPU));
```
A plain Resize will be refactored directly into ReinitializeTensor:
```
scale_->Resize(N);
```
-->
```
ReinitializeTensor(&scale_, {N}, at::dtype<float>().device(CPU));
```
Reviewed By: dzhulgakov Differential Revision: D13667883 fbshipit-source-id: 2c7cb61544b72765b594011b99150eb5a1b50836
2019-01-18 | rest of uses for deprecation of dims() in Tensor (#16118) | Jerry Zhang | 1 file | -0/+5
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16118 att Differential Revision: D13697211 fbshipit-source-id: 12bf6edd1794240ac748cc1b8fecb0c1e8eb9112
2019-01-18 | RNN operators should inherit step_net device_options (#16086) | Nikita Shulga | 1 file | -0/+9
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16086 [caffe2] RNN operators should inherit step_net device_options. According to the NetDef documentation, if a network has a specific device option it applies to all network operators that do not explicitly specify one, but this does not seem to be the case for RecurrentNetwork operators. Reviewed By: orionr Differential Revision: D13699552 fbshipit-source-id: 14529bc9504e3b02f763e3c2429be21e46f82b68
2019-01-18 | Add implicit optional unwrapping (#15587) | Elias Ellison | 6 files | -16/+326
Summary: Add support for type inference for optional type refinement. If a conditional is of the form "x is None" or "x is not None", or is a boolean expression containing multiple none checks, the proper type refinements are inserted in each branch. For example:
```
if optional_tensor is not None and len(optional_tensor) < 2:
    # optional_tensor is a Tensor

if optional_tensor1 is not None and optional_tensor2 is not None:
    # both optional_tensor1 and optional_tensor2 are Tensors
```
TODO:
- not run an op for unchecked unwrap optional in the interpreter
- potentially refine types to prim::None (omitted for now to simplify things and because it's not an actual use case)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15587 Differential Revision: D13733810 Pulled By: eellison fbshipit-source-id: 57c32be9f5a09ab5542ba0144a6059b96de23d7a
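For reference, a hedged TorchScript sketch of the refinement described above, written with present-day annotation syntax (function name and shapes are illustrative):
```python
from typing import Optional

import torch

@torch.jit.script
def first_or_zero(x: Optional[torch.Tensor]) -> torch.Tensor:
    # Inside the None-check branch, x is refined from Optional[Tensor] to Tensor,
    # so attribute calls like numel() and flatten() compile without manual unwrapping.
    if x is not None and x.numel() > 0:
        return x.flatten()[0:1]
    return torch.zeros(1)
```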
2019-01-18 | Add defined() to caffe2::Tensor (#16125) | Jerry Zhang | 2 files | -0/+9
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16125 Add defined() method to check whether the Tensor is defined. Reviewed By: ezyang Differential Revision: D13719222 fbshipit-source-id: ff8efef2159ed1026bd16acaea40c768a1e20a47
2019-01-18 | Remove ATen/Half.h and ATen/core/Half.h forwarding headers. | Edward Yang | 16 files | -19/+15
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16115 Reviewed By: bddppq Differential Revision: D13717049 fbshipit-source-id: fb1d690183a932a1fa1a2d235f3219520f51620a
2019-01-18 | Port legacy any(*) to ATen | Shen Li | 14 files | -195/+120
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15547 Differential Revision: D13549495 Pulled By: mrshenli fbshipit-source-id: 09a065a8ffa7d73f409759b779c7314cc87f4853
2019-01-18 | Improve pack_sequence and pack_padded_sequence error message (#16084) | Richard Zou | 2 files | -2/+15
Summary: Mention that if enforce_sorted=True, the user can set enforce_sorted=False. This is a new flag that is probably hard to discover unless one thoroughly reads the docs. Fixes #15567 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16084 Differential Revision: D13701118 Pulled By: zou3519 fbshipit-source-id: c9aeb47ae9769d28b0051bcedb8f2f51a5a5c260
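For context, a short sketch of the scenario the improved message points at (shapes and lengths are placeholders):
```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

padded = torch.randn(3, 5, 8)   # batch of 3, max length 5, feature size 8
lengths = [2, 5, 3]             # not sorted by decreasing length

# With the default enforce_sorted=True this raises; the improved message now
# points at the enforce_sorted=False escape hatch used here.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
```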
2019-01-18 | TCP init method race condition fix (#15684) | Teng Li | 6 files | -50/+101
Summary: This PR fixes a race condition for the TCP init method, where the master rank can exit earlier than the slave ranks and thus the TCP daemon thread gets shut down before other slaves are able to access it. This change lets every rank (process) write a special key to the store to mark that it has completed (and is thus about to exit). The master rank (which is the server) always waits until all ranks have completed before completing itself. This should fix https://github.com/pytorch/pytorch/issues/15638. Tested using the repro of https://github.com/pytorch/pytorch/issues/15638 and it works fine. Also, test_distributed and test_c10d should already have this coverage. I had to make the rendezvous test in c10d use a world size of 1, since it is single-process code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15684 Differential Revision: D13570904 Pulled By: teng-li fbshipit-source-id: 34f3bc471204bbd29320df359347ad5561c6b589
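For context, a hedged sketch of a TCP-rendezvous setup that exercises the code path fixed here (address, port, and backend are placeholders; each participating process calls `worker` with its own rank):
```python
import torch.distributed as dist

def worker(rank: int, world_size: int) -> None:
    dist.init_process_group(
        backend="gloo",
        init_method="tcp://127.0.0.1:23456",  # placeholder master address/port
        rank=rank,
        world_size=world_size,
    )
    dist.barrier()  # all ranks rendezvous through the TCP store
    # With the fix, the rank hosting the store waits for the other ranks to
    # finish before it exits, so late ranks no longer lose the connection.
    dist.destroy_process_group()
```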
2019-01-18 | Remove caffe2::Tensor copy constructor (#15416) | Dmytro Dzhulgakov | 7 files | -24/+61
Summary: Based on offline discussion it should be less surprising to the users of existing code. Thus caffe2::Tensor is now a move-only class (as it used to be), explicit calls to UnsafeSharedInstance() are necessary to get shared_ptr behavior. This change also identified a few places that misused the copy constructor - those are fixed Pull Request resolved: https://github.com/pytorch/pytorch/pull/15416 Reviewed By: Yangqing Differential Revision: D13524598 fbshipit-source-id: aea12d6dff77342606fa88ce4ddddbff266245a7
2019-01-18 | Fix RERUN_CMAKE | Zachary DeVito | 1 file | -4/+3
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16132 Differential Revision: D13726816 Pulled By: zdevito fbshipit-source-id: 26ad70651b0138642ad5240670f5c452018c13a2
2019-01-17 | Cleanup includes in python_print.cpp. | Mikhail Zolotukhin | 2 files | -2/+1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16129 Differential Revision: D13724297 Pulled By: ZolotukhinM fbshipit-source-id: 24e140bc052c85ef40b928eb84f463d341346a51
2019-01-17 | Refactor attributes.h (#16098) | Mikhail Zolotukhin | 5 files | -291/+277
Summary: This PR inlines `Attributes` into `Node`. It helps to cleanup the code a little as everything is one place (some of the cleanups are included in the PR). Pull Request resolved: https://github.com/pytorch/pytorch/pull/16098 Differential Revision: D13717637 Pulled By: ZolotukhinM fbshipit-source-id: c54ae65178a95a01354688921a9ccb1ca699f8eb
2019-01-17 | Fix export macros (#15856) | Sebastian Messmer | 1 file | -2/+2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15856 They seem to be wrong. cc zdevito to take a look but I think this is now more correct. It's weird this didn't cause linker errors. Probably, this functionality isn't used across library boundaries yet. Reviewed By: dzhulgakov Differential Revision: D13605257 fbshipit-source-id: 7077ca9027c3ac79a4847ec15ead7ddb28696445