path: root/aten
Age | Commit message | Author | Files | Lines
2019-04-08Export C10 operator in PyTorch Model (#18210)Lu Fang1-0/+6
Summary: Almost there, feel free to review. These c10 operators are exported to the _caffe2 domain. TODO: - [x] let the onnx checker pass - [x] test tensor list as argument - [x] test caffe2 backend and converter - [x] check the c10 schema can be exported to onnx - [x] refactor the test case to share some code - [x] fix the problem in ONNX_ATEN_FALLBACK Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210 Reviewed By: zrphercule Differential Revision: D14600916 Pulled By: houseroad fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144
2019-04-08Renamed bool tensors into byte tensors (#19021)Iurii Zdebskyi2-23/+23
Summary: Renamed bool tensors into byte tensors to represent the correct type in generated code Pull Request resolved: https://github.com/pytorch/pytorch/pull/19021 Differential Revision: D14835188 Pulled By: izdeby fbshipit-source-id: 0252d2c69dab35ac2f076cf9a87423463e902c76
2019-04-08ifdef guard some explicit pragma unrolls (#19018)Johannes M Dieterich2-0/+6
Summary: The ROCm compiler cannot and will not satisfy these pragmas, causing compile-time warnings; the reason is a runtime loop trip count. Some warnings remain, arising from other parts of the ROCm stack - tickets are filed and they will be resolved within those components. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19018 Differential Revision: D14832859 Pulled By: ezyang fbshipit-source-id: 0d66e4aebe4e56af14dd5e2967d3c374a82be25c
2019-04-08Clean up some sparse code. (#18962)Gregory Chanan3-11/+0
Summary: 1) sparse_dispatches in native_parse was not used anymore, got rid of it. 2) got rid of overloaded sizes_ in SparseTensorImpl, which just uses the base implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18962 Differential Revision: D14811545 Pulled By: gchanan fbshipit-source-id: 2fa60ef50456b5f605caa63beae1d8d2542fd527
2019-04-08Remove tensorWithAllocator() from Type (#18780)Roy Li8-31/+16
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18780 ghimport-source-id: 7d18a11ce87d988bd32f6ebb96acd878ab8d61be Stack from [ghstack](https://github.com/ezyang/ghstack): * **#18780 Remove tensorWithAllocator() from Type** * #18779 Remove tensorFromBlob() from Type Differential Revision: D14739336 fbshipit-source-id: 429ab10bb9f6ac9f97b5a11c7a836b6b6336cb2d
2019-04-07Fix sparse mm for ROCm (#18985)Johannes M Dieterich2-4/+31
Summary: * Also annotate the two-pass reduction with launch bounds * ifdef around some shortcomings of ROCm w.r.t. short-circuit returns - internal tickets filed * while there, plug a memory leak by destroying the matrix descriptor after the sparse call (applicable to cuSPARSE) * while there, fix types for cusparseXcoo2csr as per the cuSPARSE documentation * enable test_dsmm in test_sparse, which now passes Pull Request resolved: https://github.com/pytorch/pytorch/pull/18985 Differential Revision: D14822009 Pulled By: bddppq fbshipit-source-id: 757267a47a63ee56ef396c33059f7eca099f4833
2019-04-07Remove tensorFromBlob() from Type (#18779)Roy Li15-102/+96
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18779 ghimport-source-id: e7453b74fcce0e4f4a9cbce0324992a85272a426 Stack from [ghstack](https://github.com/ezyang/ghstack): * #18780 Remove tensorWithAllocator() from Type * **#18779 Remove tensorFromBlob() from Type** Differential Revision: D14739335 fbshipit-source-id: 8a0619a5b412332efa3b2d60c1edebd53d089d50
2019-04-05Create Object that represents a Module (#18469)Zachary DeVito4-2/+27
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18469 ghimport-source-id: 73cb8b58f43f10b1dcfca805fd5b25c4fa977632 Stack from [ghstack](https://github.com/ezyang/ghstack): * **#18469 Create Object that represents a Module** * #18468 slots with explicit value/setValue make more sense in future patches * #18467 Make Object hold its ClassType * #18379 Enforce single parent for script submodules * #18378 Unify namespace of script::Module * #18314 Add ability to specialize class types to ArgumentSpec * #18226 Add Slot type to abstract the raw pointers being used for slots. This changes the underlying storage for script::Module to hold a ivalue::Object which has slots for all the parameters and attributes. NamedIValue and Slot are now merged together into one class Slot that stores the tuple (ivalue::Object, offset) and can be used to read the name, type, or value of the slot and also to set the value. This cleans up a bunch of client uses. This PR does not actually use the module object in any generated code. A future PR will switch how code is generated to treat modules as first class. Differential Revision: D14613508 fbshipit-source-id: d853a7559f58d244de2ef54a781427fcd1060ed0
2019-04-05Add numpy like repeat as torch.repeat_interleave (#18395)Gao, Xiang7-0/+125
Summary: Fixes: https://github.com/pytorch/pytorch/issues/14093 cc: SsnL Pull Request resolved: https://github.com/pytorch/pytorch/pull/18395 Differential Revision: D14599509 Pulled By: umanwizard fbshipit-source-id: 2391a1cc135fe5bab38475f1c8ed87c4a96222f3
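A short usage sketch of the new operator (the values below are illustrative, not taken from the PR):

```python
import torch

x = torch.tensor([[1, 2], [3, 4]])

# Scalar repeats: every element is repeated and the result is flattened.
torch.repeat_interleave(x, 2)
# tensor([1, 1, 2, 2, 3, 3, 4, 4])

# Per-element repeat counts along a given dimension, numpy.repeat-style.
torch.repeat_interleave(x, torch.tensor([1, 2]), dim=0)
# tensor([[1, 2],
#         [3, 4],
#         [3, 4]])
```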
2019-04-05Make Object hold its ClassType (#18467)Zachary DeVito4-9/+16
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18467 ghimport-source-id: d51bdd64d2529d08c634c58df1a0870b54ad49fb Stack from [ghstack](https://github.com/ezyang/ghstack): * #18469 Create Object that represents a Module * #18468 slots with explicit value/setValue make more sense in future patches * **#18467 Make Object hold its ClassType** * #18379 Enforce single parent for script submodules * #18378 Unify namespace of script::Module * #18314 Add ability to specialize class types to ArgumentSpec * #18226 Add Slot type to abstract the raw pointers being used for slots. Currently it holds a symbol whose unqualified name is the name of the class. This will get confusing when there are multiple possible registries, and it makes getting the class type from the object difficult. The pointer to the class is only 4 more bytes so this patch just puts it in the object. Reviewed By: suo Differential Revision: D14613510 fbshipit-source-id: b35175ba4be83d2522deaa6dad5070d6ec691fed
2019-04-05More numerically stable lerp (#18871)Marek Kolodziej2-4/+8
Summary: The C++ and CUDA implementations of the lerp are not numerically stable. This is discussed on Wikipedia [here](https://en.wikipedia.org/wiki/Linear_interpolation#Programming_language_support). I checked the GPU SASS output and there's no overhead from using the more precise implementation, from Kepler all the way to Turing. I haven't looked at CPU ASM though. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18871 Differential Revision: D14793438 Pulled By: ezyang fbshipit-source-id: 2ddc2e026c5285466cae7d1b4101174253100445
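An illustrative Python sketch of the precision issue (my example values; the actual fix lives in the C++/CUDA kernels): with the naive formula the result need not equal `end` even when the weight is exactly 1.

```python
import torch

start = torch.tensor([10.0])
end = torch.tensor([-1.0000001])
weight = torch.tensor([1.0])

# Naive form: start + weight * (end - start). The subtraction rounds away the
# low-order bits of `end`, so the result misses the endpoint even at weight == 1.
naive = start + weight * (end - start)

# More precise form from the linked article: (1 - weight) * start + weight * end.
precise = (1 - weight) * start + weight * end

print(naive.item(), precise.item(), end.item())  # naive gives -1.0, precise matches end
```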
2019-04-05Revert D14742020: Wrap workaround for cpp custom types a bit prettier and add an exampleEdward Yang2-61/+24
Differential Revision: D14742020 Original commit changeset: 0f2fd83ae56a fbshipit-source-id: 5640255aef0319b7d8996e07132e87213130d31c
2019-04-05Wrap workaround for cpp custom types a bit prettier and add an example (#18791)Dmytro Dzhulgakov2-24/+61
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18791 This is a temporary demonstration of how to extend this hack further until custom C++ types are ready. Reviewed By: jamesr66a Differential Revision: D14742020 fbshipit-source-id: 0f2fd83ae56ab2abe16977a1829ed421e6abe74b
2019-04-05Remove cuda::compat functions in aten (#18905)bddppq1-2/+2
Summary: Looks like the issue with using `std::` functions is fixed in the new ROCm version. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18905 Differential Revision: D14792943 Pulled By: bddppq fbshipit-source-id: af11acbb85872943f23b6e55415db1f0699e7b8f
2019-04-05fix side-effects and aliasing for custom ops (#18711)Michael Suo1-5/+10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18711 ghimport-source-id: c9caedc0660b2b7ba3730cd0e1a2e0e9c3cf422b Stack from [ghstack](https://github.com/ezyang/ghstack): * **#18711 [jit] fix side-effects and aliasing for custom ops** Previously we didn't track aliasing, mutation, or side effects for custom ops. This PR adds in guards with the most conservative assumptions possible: the op will 1) have side effects, 2) write to everything 3) produce a wildcard. In order to tell whether a given operator is a custom op, this PR introduces the concept of a "reserved" namespace (basically all our builtin namespaces). Custom ops live in non-reserved namespaces, so a check on the namespace is sufficient to tell whether a schema/node is "custom" or not. This is just to get things correct for now. Follow-ups to this: - Users should be able to specify aliasing/mutability without having to learn the whole alias annotation schema. - Relax assumptions a bit. In particular outputs can only alias input tensors, they don't have to be wildcards. Fixes #18490 Differential Revision: D14730978 fbshipit-source-id: 540b47a24ccf24145051609bdcc99c97e46e0fe0
2019-04-05Expand the list of ops that mutate an inputs shape (#18812)Elias Ellison1-0/+16
Summary: Expand the list of ops that resize an input in-place to include broadcasting ops and other ops that affect shape. Whoever is reviewing the PR: could you please look through PyTorch's in-place ops and see if I missed any? Expanding on the PR from: https://github.com/pytorch/pytorch/pull/17518 This is already being tested in test_resize_input_ops. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18812 Differential Revision: D14793410 Pulled By: eellison fbshipit-source-id: 125f4f5375ac1036fb96fabc9da2aaccc9adc778
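For context, a minimal example (my own, not from the PR) of the kind of in-place resize this shape analysis must account for: writing into an `out=` tensor of the wrong size resizes it to the broadcast result shape.

```python
import torch

x = torch.ones(2, 3)
y = torch.ones(3)         # broadcasts against x
out = torch.empty(0)      # deliberately the wrong shape

torch.add(x, y, out=out)  # `out` is resized in place to the broadcast shape
print(out.shape)          # torch.Size([2, 3])
```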
2019-04-05add launch bounds, enable more tests (#18909)J M Dieterich16-0/+102
Summary: Add launch bounds annotations for ROCm arising from maxThreadsPerBlock and apply threads use. Enable tests that now work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18909 Differential Revision: D14801490 Pulled By: ezyang fbshipit-source-id: b81c97fc783a2627bc7e31b32036a364cfe40cc7
2019-04-05Unify caffe2 and libtorch build scripts on Windows (#18683)peter1-0/+4
Summary: `scripts/build_windows.bat` is the original way to build caffe2 on Windows, but since it is merged into libtorch, the build scripts should be unified because they actually do the same thing except there are some different flags. The follow-up is to add the tests. Looks like the CI job for caffe2 windows is defined [here](https://github.com/pytorch/ossci-job-dsl/blob/master/src/jobs/caffe2.groovy#L906). Could we make them a separate file, just like what we've done in `.jenkins/pytorch/win-build.sh`? There's a bunch of things we can do there, like using ninja and sccache to accelerate build. cc orionr yf225 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18683 Differential Revision: D14730188 Pulled By: ezyang fbshipit-source-id: ea287d7f213d66c49faac307250c31f9abeb0ebe
2019-04-05Simplify storage wrapping in TH. (#18855)Gregory Chanan2-8/+1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18855 ghimport-source-id: 01faa229fa4db901ab8539d3778b716d909ba4cf Reviewed By: dzhulgakov Differential Revision: D14790669 Pulled By: gchanan fbshipit-source-id: 167b9bc9c9872743fa8f6040a26ddf7ff5789c27
2019-04-05Cache device on TensorImpl; clean up TensorImpl constructors. (#18833)Gregory Chanan4-25/+30
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18833 ghimport-source-id: 6f2be25fcc5e6be3ffe20582e604bd2c1fbab66b Stack from [ghstack](https://github.com/ezyang/ghstack): * **#18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.** * #18832 [STACK] Disallow changing the device of a tensor via set_. * #18831 [STACK] Stop swapping in Storages of the wrong device for Tensors. 1) We cache device on TensorImpl. This means we can access the device without a virtual function and allows us to more easily extend TensorImpls (because they don't need to figure out how to store the Device for themselves). 2) Clean up TensorImpl APIs. We had a constructor that took a TensorTypeId and an allocator and would allocate a Storage based on the recognized types of TensorTypeIds. Instead, we just have two different constructors: one for types with a storage, one without. Reviewed By: dzhulgakov Differential Revision: D14766230 fbshipit-source-id: 745b8db84dcd6cb58f1a8675ad3ff8d033bc50df
2019-04-05Revert "Adding pin_memory kwarg to zeros, ones, empty,... (#18854)Vitaly Fedyunin5-73/+68
Summary: This reverts commit c484cf43a02863efd2f4a76aad43246fb0191ab5. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18854 Differential Revision: D14778393 Pulled By: VitalyFedyunin fbshipit-source-id: 4b5a1f5b1c091bbc4a8e75614734cc011d26b452
2019-04-05Silence compiler warnings (#18912)Sebastian Messmer2-0/+12
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18912 We intentionally test a deprecated API, so there is no need to show the warnings here. Reviewed By: dzhulgakov Differential Revision: D14792617 fbshipit-source-id: 9ea2a4106d566064283726eed2c274b98f49a2e5
2019-04-04Fixed the comment to reference gist example instead of private repo (#18852)Iurii Zdebskyi1-2/+2
Summary: Replace link to a file in a private repo with a gist Pull Request resolved: https://github.com/pytorch/pytorch/pull/18852 Reviewed By: ezyang Differential Revision: D14778481 Pulled By: izdeby fbshipit-source-id: 8389aa4bf115ddcfd85079cc2c861404efa678e7
2019-04-04Added bool and half support for resize_as_ and view methods (#18821)Iurii Zdebskyi2-0/+12
Summary: Enabled the **resize_as_** and **view** methods for bool and half tensors. Tested via unit tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18821 Reviewed By: ezyang Differential Revision: D14762852 Pulled By: izdeby fbshipit-source-id: 4312079fb4e893fea6f71ff4f163094b2674f1e8
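A quick sketch of the newly enabled calls (shapes are illustrative):

```python
import torch

b = torch.tensor([True, False, True, False])
h = torch.zeros(4, dtype=torch.half)

print(b.view(2, 2))   # view on a bool tensor
print(h.view(2, 2))   # view on a half tensor
print(torch.empty(0, dtype=torch.bool).resize_as_(b).shape)  # torch.Size([4])
```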
2019-04-04Disallow changing the device of a tensor via set_. (#18832)Gregory Chanan1-2/+13
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18832 ghimport-source-id: fde4ad90541ba52dfa02bdd83466f17e6541e535 Stack from [ghstack](https://github.com/ezyang/ghstack): * #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors. * **#18832 [STACK] Disallow changing the device of a tensor via set_.** * #18831 [STACK] Stop swapping in Storages of the wrong device for Tensors. This is necessary to cache the device on a TensorImpl. Differential Revision: D14766231 fbshipit-source-id: bba61634b2d6252ac0697b96033c9eea680956e8
2019-04-04Stop swapping in Storages of the wrong device for Tensors. (#18831)Gregory Chanan3-3/+22
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18831 ghimport-source-id: 2741e0d70ebe2c2217572c3af54ddd9d2047e342 Stack from [ghstack](https://github.com/ezyang/ghstack): * #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors. * #18832 [STACK] Disallow changing the device of a tensor via set_. * **#18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.** This is necessary to support device caching, see https://github.com/pytorch/pytorch/pull/18751 and https://github.com/pytorch/pytorch/pull/18578. In library code, we potentially swap in Storages with the wrong device when device_guard is False. This happens as follows with "view-like" operations. 1) We allocate a tensor on the 'wrong' device (because device_guard is false). 2) We swap out the 'wrong' storage with the 'right' storage using e.g. THCTensor_setStorage. Instead, we can just construct the Tensor with the correct Storage from the beginning. This is what we do with 'view'. Note there are two other "view-like" cases where this happens: 1) unfold 2) set_() Because these aren't performance critical, I just added the device_guard instead of applying the above correction. For completeness, this also includes a test that all `device_guard: false` functions behave properly under these conditions. Reviewed By: dzhulgakov Differential Revision: D14766232 fbshipit-source-id: 0865c3ddae3f415df5da7a9869b1ea9f210e81bc
2019-04-04Store ScalarType and Backend instead of Type in TensorIteratorRoy Li3-45/+60
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17601 Reviewed By: ezyang Differential Revision: D14274754 fbshipit-source-id: b08880ae586b6ae57d4c0bbeb203796d087926c4
2019-04-04Introduce DeprecatedTypeProperties class (#17991)Roy Li36-501/+637
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17991 changes: -Breaks bc: Tensor::type() now returns DeprecatedTypeProperties& rather than Type&. -Added DeprecatedTypeProperties, it serves as a temporary replacement for Type as the return value of Tensor::type(). This contributes to making Type just for dispatch purposes so that we can make it dtype agnostic. -Tensor::dispatch_type() now returns Type& like Tensor::type() used to do. -Changed callsites of Tensor::type() appropriately. Reviewed By: ezyang Differential Revision: D14443117 fbshipit-source-id: 239ccb7a09626279a71d1a37f8f82e7f57bf7d9e
2019-04-04Fix to handle null strides in DLPack tensor (#18510)Bram Wasti2-0/+19
Summary: DLPack can have non-strided tensors, which is represented by a nullptr in the place of dl_tensor.strides. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18510 Differential Revision: D14647328 Pulled By: bwasti fbshipit-source-id: 5364282810a5772cfc2319fc8133fe86fdd84dd1
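A round-trip sketch via torch.utils.dlpack; the null-strides case this fixes usually comes from compact (contiguous) tensors exported by other frameworks, which is hard to reproduce from PyTorch alone:

```python
import torch
from torch.utils import dlpack

t = torch.arange(6.0).reshape(2, 3)
capsule = dlpack.to_dlpack(t)       # export as a DLPack capsule
back = dlpack.from_dlpack(capsule)  # import; contiguous strides are reconstructed
print(back)
```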
2019-04-03fix flake8 lint (#18835)Michael Suo1-0/+1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18835 ghimport-source-id: 7b1f433ae51232822704d62699233688072cbc23 Stack from [ghstack](https://github.com/ezyang/ghstack): * **#18835 fix flake8 lint** * #18826 [jit] run cpp tests for non-cuda builds in test_jit.py ...again Reviewed By: ZolotukhinM Differential Revision: D14766790 fbshipit-source-id: 29361a407589092831dfbc3c5d63d2834934cd02
2019-04-03Fix layernorm ad formula on weight and bias (#18233)Wanchao Liang1-1/+1
Summary: Fix the layernorm autodiff formula when weight and bias are passed in. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18233 Differential Revision: D14760375 Pulled By: wanchaol fbshipit-source-id: d6bd3b137bc04c391aa5c24d021d1f811ba2a877
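One generic way to sanity-check layer_norm gradients with respect to weight and bias (a gradcheck sketch of my own, not the test added by this PR, which fixes the scripted/symbolic AD formula):

```python
import torch
import torch.nn.functional as F

x = torch.randn(3, 4, dtype=torch.double, requires_grad=True)
w = torch.randn(4, dtype=torch.double, requires_grad=True)
b = torch.randn(4, dtype=torch.double, requires_grad=True)

fn = lambda x, w, b: F.layer_norm(x, (4,), w, b)
# Compare analytical first and second derivatives against finite differences.
torch.autograd.gradcheck(fn, (x, w, b))
torch.autograd.gradgradcheck(fn, (x, w, b))
```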
2019-04-03Step 1: Secretly add return_counts to unique, and refactor unique_dim for ↵Vitaly Fedyunin3-132/+250
performance (#18648) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18648 ghimport-source-id: 1cf4a8fe91492621e02217f38cae5d7e0699fb05 Stack from [ghstack](https://github.com/ezyang/ghstack): * #18661 Step 7: remove _unique * #18655 Step 6: Rename _unique2 to unique and add int? dim * #18654 Step 5: remove _unque_dim in favor of unique_dim * #18651 Step 4: add support for unique with dim=None * #18650 Step 3: Add support for return_counts to torch.unique for dim not None * #18649 Step 2: Rename _unique_dim2_temporary_will_remove_soon to unique_dim * **#18648 Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance** `unique` is fragile, previously I tried to change it in #18391 and #17097, they all pass OSS tests but finally get reverted due to internal failure. My previous work of refactoring unique #18459 is based on #18391, and after #18391 get reverted, I could not work on #18459. To continue working on #18459, #18391, and #17097 without worrying about internal failures, I am suggesting the following steps for the improvements of `unique` and `unique_dim`. soumith Please take this and there is no need to put #18391 back. The motivation is basically to move forward as much as possible without causing any internal failures. So I will try to divide it into steps and sort from low probability of internal failure to high probability. (I don't know what the internal failure is, so I have to guess). Let's merge these PR stack one by one until we enounter internal failure. Step 1: Create two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon` and keep `_unique` and `_unique_dim` unchanged. The backend of these two functions and `_unique` and `_unique_dim` are all the same, the only difference is the temporary ones support `return_counts` but not the `_unique` and `_unique_dim`. Step one is mostly #18391 + #18459. The cuda8 errors has been fixed. At this point, there is no user visible API change, so no docs are updated. `torch.unique` does not support `return_counts` yet, and `return_counts` is tested through the newly added temporary operators. This step just added two new ATen operators, so there shouldn't be any internal failure. Step 2: Rename `_unique_dim2_temporary_will_remove_soon` to `unique_dim`. This should cause no internal failure either, because no change to existing operators. The only thing to worry about is to delete `unique_dim` from python side because we don't want users to use it. At this point, C++ users now have `return_counts` support for `unique_dim`. Step 3: Update the docs of `torch.unique` and use `unique_dim` inside `torch.unique` to support `return_counts` In the docs, we should say `torch.unique` with None dim support does not support `return_counts` yet. This might cause internal failure. Step 4: Rename `_unique2_temporary_will_remove_soon` to `_unique2` and use `_unique2` inside `torch.unique` to support `return_counts`. Update the docs saying that `torch.unique` with None dim now support `return_counts`. This might cause internal failure. Step 5: Remove `_unique_dim`. This might cause internal failure. Step 6: Rename `_unique2` to `unique`, add optional `dim` argument to make it looks like the signature of Python's `torch.unique`. Inside `torch.unique`, use `unique` and get rid of `unique_dim`. Unbind `unique_dim` totally from Python at codegen. This is likely to cause internal fail. Step 7: Remove `_unique`. This is very likely to cause internal failure. 
This PR ====== This PR is for step 1. This create two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon` and implement `return_counts` inside them and do refactor for performance improvements. Please review ngimel VitalyFedyunin. They are mostly copied from #18391 and #18459, so the review should be easy. Below is a benchmark on a tensor of shape `torch.Size([15320, 2])`: Before --------- ```python print(torch.__version__) %timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize() %timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize() ``` ``` 1.0.1 192 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 548 ms ± 3.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` ```python print(torch.__version__) %timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize() %timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize() ``` ``` 1.0.1 226 µs ± 929 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) 302 µs ± 7.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` After ------- ```python print(torch.__version__) %timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize() %timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize() ``` ``` 1.1.0a0+83ab8ac 190 µs ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 237 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 219 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 263 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` ```python print(torch.__version__) %timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize() %timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize() ``` ``` 1.1.0a0+83ab8ac 232 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 301 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 264 µs ± 7.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 339 µs ± 9.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` Differential Revision: D14730905 fbshipit-source-id: 10026b4b98628a8565cc28a13317d29adf1225cc
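For orientation, the user-facing call targeted by the later steps of this stack is expected to look roughly like the following (hedged sketch; at this step return_counts is only reachable through the temporary operators above):

```python
import torch

a = torch.tensor([1, 3, 2, 3, 3, 1])
values, inverse, counts = torch.unique(
    a, sorted=True, return_inverse=True, return_counts=True)
# values  -> tensor([1, 2, 3])
# inverse -> tensor([0, 2, 1, 2, 2, 0])
# counts  -> tensor([2, 1, 3])
```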
2019-04-03Remove `device_guard: False` from native_functions that don't have a tensor (#18803)Gregory Chanan1-4/+0
Summary: There's nothing to device_guard on. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18803 Reviewed By: ezyang Differential Revision: D14748091 Pulled By: gchanan fbshipit-source-id: ed6f16d6f4d3f07b6d5ad9696f71a14333c228b8
2019-04-03QTensor (#18230)Jerry Zhang22-9/+571
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18230 Implementing minimum qtensor API to unblock other workstreams in quantization Changes: - Added Quantizer which represents different quantization schemes - Added qint8 as a data type for QTensor - Added a new ScalarType QInt8 - Added QTensorImpl for QTensor - Added following user facing APIs - quantize_linear(scale, zero_point) - dequantize() - q_scale() - q_zero_point() Reviewed By: dzhulgakov Differential Revision: D14524641 fbshipit-source-id: c1c0ae0978fb500d47cdb23fb15b747773429e6c
2019-04-03push magma init into lazyInitCUDA (#18527)Soumith Chintala1-0/+3
Summary: Tries to fix the C++ API's usage of MAGMA-based functions. Attempts to fix https://github.com/pytorch/pytorch/issues/18074 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18527 Differential Revision: D14691694 Pulled By: soumith fbshipit-source-id: dd04e74418e486d73ea4a92193ddf79352ed71ba
2019-04-03For some files that are touched by the QTensor diff (#18765)Jerry Zhang1-20/+22
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18765 att Reviewed By: ZolotukhinM Differential Revision: D14733442 fbshipit-source-id: 525002034e6dccc2045da645e1193671fd0474b3
2019-04-03Added indexing for bool tensors and bool Indices (#18583)Iurii Zdebskyi11-83/+107
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18583 ghimport-source-id: 2b1941449827f4ab632fa0f5c8cf0791a6be0845 Stack from [ghstack](https://github.com/ezyang/ghstack): * **#18583 Added indexing for bool tensors and bool Indices** * #18505 Added numpy conversion * #18166 Bool Tensor for CUDA ----------- This PR enables bool tensor indexing and indexing with bool indices. This is a part of Bool Tensor feature implementation work. The whole plan looks like this: 1. Storage Implementation [Done] 2. Tensor Creation. a) CPU [Done] b) CUDA [In review] 3. Tensor Conversions. [In review] 4. Tensor Indexing. [This PR] 5. Tensor Operations. 6. Back compatibility related changes. TODO: as a follow up, we should move nonzero method from TH to Aten to make code cleaner. Change: ``` v = torch.tensor([True, False, True], dtype=torch.bool) boolIndices = torch.tensor([True, False, False], dtype=torch.bool) v[boolIndices] -> tensor([True], dtype=torch.bool) v = torch.randn(5, 7, 3) boolIndices = torch.tensor([True, False, True, True, False], dtype=torch.bool) v[boolIndices] -> tensor([[[ 0.5885, -0.3322, 0.7388], [ 1.1182, 0.7808, -1.1492], [-0.7952, 0.5255, -0.0251], [ 0.7128, 0.8099, 1.2689], [-0.7018, -1.4733, -0.3732], [ 0.4503, 0.4986, -1.1605], [ 0.3348, -1.3767, -0.2976]], [[-2.0303, -0.4720, -0.1448], [-0.1914, -0.6821, 2.0061], [-1.0420, -0.1872, -0.3438], [ 1.7587, -0.4183, -0.7577], [ 1.0094, -0.1950, -0.2430], [ 0.1174, 0.3308, -0.5700], [ 0.1110, -0.2714, 1.3006]], [[-0.1946, -1.4747, -0.4650], [-1.0567, 1.0110, -0.2809], [ 0.3729, -0.5699, 0.0815], [-0.7733, -0.8316, 0.1674], [ 1.2000, -0.3745, -1.1679], [ 1.7105, 0.9851, -0.1907], [-1.1077, 0.2086, -0.0548]]]) ``` Differential Revision: D14673403 fbshipit-source-id: 2b88ec2c7eb26a4f5ef64f8707fb68068d476fc9
2019-04-03add 'abs' builtinMichael Kösel1-0/+1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18502 Differential Revision: D14750173 Pulled By: eellison fbshipit-source-id: 359cf08938ada442ca1a3b3ea14022ce10229499
2019-04-03Fix dense Embedding to work with double backward (#9078)kshitij123451-1/+1
Summary: Fixes : #6469 1. `ATen/native/native_functions.yml` had [dispatch](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/native_functions.yaml#L451-L455) variants for for `embedding_dense_backward` , however `embedding_backward` explicitly made [call](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/Embedding.cpp#L35-L45) to it, thus leading to error. 2. In case of CUDA type tensor, the function crashed used to crash on dereferencing of indices's data [pointer](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/Embedding.cpp#L93). Both have been solved and checked against (on CUDA and CPU) 1. As mentioned in the issue ``` import torch class Test(torch.nn.Module): def __init__(self): super(Test,self).__init__() self.embd = torch.nn.Embedding(1000, 100) self.dense = torch.nn.Linear(100, 1) def forward(self, inp): inp = self.embd(inp) return self.dense(inp) test = Test() inp = torch.tensor([0,1,2,1,1]) out = test(inp) raw_loss = out.mean(dim=0) loss_grad = torch.autograd.grad(outputs=raw_loss, inputs=list(test.parameters()), retain_graph=True, create_graph=True, only_inputs=True) norm = sum([param.norm()**2 for param in loss_grad]) loss = raw_loss + norm loss.backward(retain_graph=True) print(test.embd.weight.grad) ``` 2. Test Script ``` import torch import time start = time.time() l = [1,1]*100 input = torch.tensor([[1,0],[1,0]],device='cpu') embedding_matrix = torch.tensor([[1.0,3.0],[2.0,4]],requires_grad=True,device='cpu') sq = embedding_matrix * embedding_matrix emb = torch.nn.functional.embedding(input, sq,scale_grad_by_freq=False) print('Embedding Matrix') print(embedding_matrix) print('-----------------') sum_ = emb.sum()#prod.sum() loss_grad, = torch.autograd.grad(outputs=sum_,inputs=embedding_matrix,create_graph=True) print('Gradient') print(loss_grad) print('-----------------') sum2_ = sum_ + loss_grad.sum() print(sum2_) sum2_.backward() print(embedding_matrix.grad) print(time.time() - start) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/9078 Reviewed By: ezyang Differential Revision: D14691901 Pulled By: soumith fbshipit-source-id: 78e2612ba39080be564c876311671eb5a0119a0f
2019-04-03Remove THTensor_(newUnfold). (#18773)Gregory Chanan4-16/+0
Summary: It's not used and unfold's use of `device_guard: False` is scary. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18773 Differential Revision: D14736526 Pulled By: gchanan fbshipit-source-id: 6281a284bee45fa5038783e4c1ed4d1ed7ca81ab
2019-04-02Add ability to specialize class types to ArgumentSpec (#18314)Zachary DeVito3-1/+24
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18314 ghimport-source-id: 8cecb768d476ab19c9460f39c8f94a764e4cb052 Stack from [ghstack](https://github.com/ezyang/ghstack): * **#18314 Add ability to specialize class types to ArgumentSpec** * #18226 Add Slot type to abstract the raw pointers being used for slots. Differential Revision: D14574395 fbshipit-source-id: cc3af6e56e9ae52990f4a1ad56ecceaa2d493577
2019-04-02Bool Tensor for CUDA (#18166)Iurii Zdebskyi17-45/+128
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18166 ghimport-source-id: a8e2ba2d966e49747a55701c4f6863c5e24d6f14 Stack from [ghstack](https://github.com/ezyang/ghstack): * **#18166 Bool Tensor for CUDA** * #18165 Resolved comments from Bool Tensor for CPU PR ------ This PR enables bool tensor creation and some basic operations for the CUDA backend. This is a part of Bool Tensor feature implementation work. The whole plan looks like this: 1. Storage Implementation [Done] 2. Tensor Creation. a) CPU [Done] b) CUDA [This PR] 3. Tensor Conversions. 4. Tensor Indexing. 5. Tensor Operations. 6. Back compatibility related changes. Change: Enable bool tensor in CUDA with the following operations: torch.zeros torch.tensor torch.ones torch.rand/rand_like/randint/randint_like torch.full torch.full_like torch.empty torch.empty_like Tested via unit tests and local scripts. Differential Revision: D14605104 fbshipit-source-id: b7d7340a7d70edd03a109222d271e68becba762c
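A few of the newly enabled constructors listed above (a usage sketch; assumes a CUDA device is available):

```python
import torch

a = torch.zeros(3, dtype=torch.bool, device='cuda')
b = torch.tensor([True, False, True], device='cuda')
c = torch.ones_like(b)
d = torch.full((2, 2), True, dtype=torch.bool, device='cuda')
print(a, b, c, d, sep='\n')
```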
2019-04-02torch.cross' dim default changed to c10::optional instead of int=-1 (#17582)Igor Fedan14-84/+173
Summary: Argument dim=-1 doesn't work for torch.cross. The signature of torch.cross has been changed to take c10::optional<int64_t> dim instead of int64_t. So, per the documentation, "If dim is not given, it defaults to the first dimension found with the size 3.", and if dim is specified (even negative) the corresponding dimension is used. Fixes #17229 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17582 Differential Revision: D14483063 Pulled By: ifedan fbshipit-source-id: f9699093ec401cb185fd33ca4563c8a46cdcd746
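A small sketch of the resulting behavior (illustrative shapes):

```python
import torch

a = torch.randn(4, 3)
b = torch.randn(4, 3)

# dim omitted: defaults to the first dimension found with size 3 (here, dim 1).
c1 = torch.cross(a, b)

# dim given explicitly; negative indexing now resolves to the same dimension.
c2 = torch.cross(a, b, dim=-1)

assert torch.equal(c1, c2)
```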
2019-04-02Register operators by passing arguments to RegisterOperators constructor (#18577)Sebastian Messmer3-4/+45
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18577 This is also part of the legacy API and we need to support it if we want to replace it. Reviewed By: dzhulgakov Differential Revision: D14671432 fbshipit-source-id: 007abf4ab816647a509fc08e35d79b6c1aa55b03
2019-04-02Allow registering an operator schema without a kernel (#18551)Sebastian Messmer2-25/+68
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18551 This is helpful for defining a set of operators as an interface but not adding concrete kernels just yet. The registration logic will ensure that any other libraries that add kernels for these schemas exactly match the schema defined here. Reviewed By: dzhulgakov Differential Revision: D14660208 fbshipit-source-id: 7adb5a4876cff5a0ad21d92d8c450cb889f00cc3
2019-04-02Improve compiler error messages of the op registration API (#18550)Sebastian Messmer5-28/+50
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18550 When the operator registration API is used wrongly, in most cases we should now get a nice compiler error instead of weird template error messages. This is done by making the enable_if conditions more broad so they also match error cases, but then having static_asserts against these error cases inside the function. Before that, since the function didn't match, the error message said something like "no function found to match your call", now it will show the error message specified in the static_asserts. Reviewed By: dzhulgakov Differential Revision: D14659178 fbshipit-source-id: 7ca4fb72d9051eadf0a7e2717b962bf1213a52b2
2019-04-02Improve and test error messages for signature mismatches (#18547)Sebastian Messmer8-285/+295
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18547 - Argument indices in the error messages are 1-indexed not 0-indexed. - Add test cases that a mismatching signature actually shows the correct error messages Reviewed By: dzhulgakov Differential Revision: D14656695 fbshipit-source-id: 55e45634baa3117e18b8687ea6b2a2f83715bdf6
2019-04-02Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors. (#18455)Vitaly Fedyunin5-68/+73
Summary: Make it possible to construct a pinned memory tensor without creating a storage first and without calling pin_memory() function. It is also faster, as copy operation is unnecessary. Supported functions: ```python torch.rand_like(t, pin_memory=True) torch.randn_like(t, pin_memory=True) torch.empty_like(t, pin_memory=True) torch.full_like(t, 4, pin_memory=True) torch.zeros_like(t, pin_memory=True) torch.ones_like(t, pin_memory=True) torch.tensor([10,11], pin_memory=True) torch.randn(3, 5, pin_memory=True) torch.rand(3, pin_memory=True) torch.zeros(3, pin_memory=True) torch.randperm(3, pin_memory=True) torch.empty(6, pin_memory=True) torch.ones(6, pin_memory=True) torch.eye(6, pin_memory=True) torch.arange(3, 5, pin_memory=True) ``` Part of the bigger: `Remove Storage` plan. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455 Reviewed By: ezyang Differential Revision: D14672084 Pulled By: VitalyFedyunin fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124
2019-04-02Expose alias multinomial methods to ATen (#17904)vishwakftw7-13/+92
Summary: This PR exposes the multinomialAliasSetup and multinomialAliasDraw methods. cc: neerajprad Pull Request resolved: https://github.com/pytorch/pytorch/pull/17904 Differential Revision: D14700205 Pulled By: ezyang fbshipit-source-id: 16462fb1f1ef1d560fd586632ea356b23e966ee3
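For reference, a minimal pure-PyTorch sketch of Walker's alias method that these functions implement (illustrative only; these are not the signatures of the newly exposed ATen methods):

```python
import torch

def alias_setup(probs):
    # Build alias tables: q holds acceptance probabilities, alias the fallback indices.
    n = probs.numel()
    q = probs * n
    alias = torch.zeros(n, dtype=torch.long)
    small = [i for i in range(n) if q[i] < 1]
    large = [i for i in range(n) if q[i] >= 1]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l
        q[l] = q[l] - (1 - q[s])
        (small if q[l] < 1 else large).append(l)
    return q.clamp(max=1), alias

def alias_draw(q, alias, num_samples):
    # O(1) work per sample once the tables are built.
    idx = torch.randint(q.numel(), (num_samples,))
    accept = torch.rand(num_samples) < q[idx]
    return torch.where(accept, idx, alias[idx])

q, alias = alias_setup(torch.tensor([0.1, 0.2, 0.3, 0.4]))
samples = alias_draw(q, alias, 10000)  # empirical frequencies approach the probabilities
```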
2019-04-02Kill LegacyBridge functions that don't do multiple dispatch. (#18696)Gregory Chanan3-70/+41
Summary: At some point, we needed these functions to deal with autograd dispatching to the sparse or TH version of a backward. But we rewrote all backward definitions in terms of native functions, so this is no longer necessary. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18696 Differential Revision: D14710834 Pulled By: gchanan fbshipit-source-id: b22568c58eefc79d672555bd8832398ccd965cb7