path: root/aten
Age | Commit message | Author | Files | Lines
2019-02-05 | Document hip-clang and its __HIP__ macro (#16771) | Johannes M Dieterich | 1 | -0/+11
Summary: In #16085, we introduced initial hip-clang bring-up code. Document the use of the __HIP__ macro now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16771 Differential Revision: D13961538 Pulled By: ezyang fbshipit-source-id: 67f6226abcbe62e2f4efc291c84652199c464ca6
2019-02-05 | Rename IntList to IntArrayRef. (#16751) | Edward Yang | 97 | -827/+827
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751
This was made more complicated by the fact that ivalue::IntList is a thing. So I had to fix all of the sites where we were referring to IValue post facto.
The following codemods were run, in this order:
```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```
Some manual fixups were done afterwards; they can be reviewed separately at https://github.com/pytorch/pytorch/pull/16752 Reviewed By: dzhulgakov Differential Revision: D13954363 fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
2019-02-05 | dict values(), keys(), and len() (#16629) | David Riazati | 3 | -2/+29
Summary: Adds some operations for dicts to match Python and tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/16629 Differential Revision: D13961144 Pulled By: driazati fbshipit-source-id: b31f27a4320ff62cd118b508fb0a13056535dc7c
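For illustration only (not from the PR), a sketch of the kind of script code this enables, assuming dict literals with `str` keys:
```python
import torch

@torch.jit.script
def dict_ops(x):
    d = {"a": x, "b": x + 1}
    keys = d.keys()      # list of str keys
    values = d.values()  # list of Tensor values
    return len(d)

print(dict_ops(torch.ones(2)))  # 2
```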
2019-02-05 | Add XLA / TPU device type, backend type and type id (#16763) | Alex Şuhan | 5 | -2/+39
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16763 Replicate the easy bits in https://github.com/pytorch/pytorch/pull/15153 with TPU / XLA instead of MSNPU. Also don't initialize the storage for XLA tensors for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16585 Reviewed By: ezyang Differential Revision: D13912118 Pulled By: gchanan fbshipit-source-id: 4889177e2478768fb281ed075b71146d1d850bd9
2019-02-05 | Fixes selection of cuDNN algorithm (#15881) | Syed Tousif Ahmed | 2 | -140/+186
Summary: This PR updates the logic for using cudnnGet* and cudnnFind*. The current version of cudnn find and get (v7) returns a pair of the best algorithm and the convDesc mathType. While we were using the returned algorithm, we didn't update the mathType. As a result, we ended up with a slow choice of algorithm and math type. Without this patch, we are seeing a 10x regression in group convolutions.
Changelist:
- Changed the template arguments to be `perf_t` instead of `algo_t` to unify cudnnFind and cudnnGet. Both cudnnFind and cudnnGet have the same purpose, so it made sense to unify them and get rid of `getAlgorithm`.
- Used cudnnGet*_v7 everywhere cudnnGet* was being used.
- Removed all cudnn6 paths (this PR depends on https://github.com/pytorch/pytorch/pull/15851).
Differential Revision: D13957944 Pulled By: ezyang fbshipit-source-id: a88c39d80ae37f2d686665622302b62b50fab404
2019-02-05 | logsumexp for multiple dimensions (#16475) | Brennan Vincent | 10 | -40/+162
Summary: Move `logsumexp` and `max_values` to `TensorIterator` and use it to make `logsumexp` work for multiple dimensions.
Timings on a tensor of shape `(10,1000000,10)`, for each combination of (cpu, single-threaded cpu, gpu) and dimension:
**before**
208 ms ± 2.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
279 ms ± 5.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
199 ms ± 2.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.11 s ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.25 s ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.11 s ± 6.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 ms ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
132 ms ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.6 ms ± 19.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
**after**
199 ms ± 8.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
307 ms ± 8.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
207 ms ± 7.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.16 s ± 8.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.26 s ± 47.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.13 s ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 ms ± 868 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
132 ms ± 27.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.6 ms ± 21.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16475 Differential Revision: D13855746 Pulled By: umanwizard fbshipit-source-id: aaacc0b967c3f89073487e1952ae6f76b7bd7ad3
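For illustration (not part of the PR), a usage sketch of the multi-dimensional reduction this enables, assuming `dim` accepts a tuple of ints:
```python
import torch

x = torch.randn(10, 1000, 10)
# reduce over dims 0 and 2 in one call instead of chaining two reductions
out = torch.logsumexp(x, dim=(0, 2))
print(out.shape)  # torch.Size([1000])
```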
2019-02-05 | Add an API to set the number of threads in C10 thread pool (#16669) | James Reed | 2 | -8/+20
Summary: Tested locally on machine translation service Pull Request resolved: https://github.com/pytorch/pytorch/pull/16669 Differential Revision: D13927858 Pulled By: jamesr66a fbshipit-source-id: efcb8c21e0c2f76ac37967e6f52967da515595c3
2019-02-04 | Revert "Move outplace ops to ATen (#12413)" (#16731) | Edward Yang | 5 | -88/+0
Summary: This reverts commit f660d3ae19decc64390e894fbaf8de80d87585e0. cc zasdfgbnm Reasoning at https://github.com/pytorch/pytorch/pull/12413#issuecomment-460424129 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16731 Differential Revision: D13948022 Pulled By: ezyang fbshipit-source-id: b10669cf03679e306850314b7b5b08bed0839e19
2019-02-04 | Replace resize_dim() with set_sizes_and_strides() in THTensor_(unsqueeze1d) in aten/src/TH/generic/THTensor.cpp (#16673) | Joshua Meier | 1 | -9/+21
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16673 Replace resize_dim() with set_sizes_and_strides() in THTensor_(unsqueeze1d) in aten/src/TH/generic/THTensor.cpp, as described in T38058642. Reviewed By: ezyang Differential Revision: D13928879 fbshipit-source-id: d593cebcc82589cd362ac78884d4e367d0da0ce6
2019-02-02 | fix conditional in mean workaround (#16686) | James Reed | 1 | -1/+1
Summary: When trying to get a test to pass, I was missing an exclamation mark. Instead, now I just use a different function in the conditional. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16686 Differential Revision: D13935182 Pulled By: jamesr66a fbshipit-source-id: 7525a1a829276641dbafe06734f03f6202df6b22
2019-02-01 | Add @ignore annotation (#16055) | David Riazati | 1 | -0/+1
Summary: Adds a decorator `torch.jit.ignore` for Python functions that tells the compiler to skip over these Python values, putting a `prim::Error` in their place which always throws an exception when run. This lets you have Python-only code in your model in an explicit way, which is useful for debugging, and still be able to save/load the model. Fixes #15815 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16055 Differential Revision: D13797286 Pulled By: driazati fbshipit-source-id: 29d36776608ec101649a702952fc6ff3c27655b1
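A hedged sketch of how such a decorator might be used; the function names here are hypothetical, not from the PR:
```python
import torch

@torch.jit.ignore
def python_only_debug(x):
    # arbitrary Python that the compiler skips over; calling it from a
    # saved-and-loaded model raises an error at runtime
    print("debug:", x.shape)

@torch.jit.script
def fn(x, debug: bool = False):
    if debug:
        python_only_debug(x)
    return x + 1
```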
2019-02-01 | Implement new c10 dispatcher (#16625) | Sebastian Messmer | 24 | -801/+524
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16625 This is a squash of multiple PRs that refactored the old c10 dispatcher into a new one that follows the c10 dispatcher design doc. It is now unboxed and follows the Stack semantics from JIT. It also uses the runtime JIT schema instead of its own compile time schema definitions. Reviewed By: ezyang Differential Revision: D13907069 fbshipit-source-id: edcc4806ccd21474fdfb5a98516219b1956db13d
2019-02-01 | Revert "Fixes selection of cuDNN algorithm (#15881)" (#16484) | Syed Tousif Ahmed | 2 | -188/+143
Summary: There is a regression in cudnnGet*_v7 that causes a slowdown in resnet50 training. I am opening a bug with the cuDNN team about this. This reverts commit 38374468832e307ca741901870914857a836dd5d. ezyang :crying_cat_face: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16484 Differential Revision: D13924755 Pulled By: soumith fbshipit-source-id: 8c719345fc443f1289539bfae630eea9224ba4a5
2019-02-01 | Introduce backend extensions (overriding operators on custom backends) | Roy Li | 9 | -2/+276
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15153 Reviewed By: gchanan Differential Revision: D13445571 fbshipit-source-id: 62e2ebe0a6e81c4983b47cddb57ee5eb78e96708
2019-02-01 | Dispatch factory functions on Type (#15093) | Roy Li | 1 | -25/+14
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15093 Needed for backend extensions. Reviewed By: ezyang Differential Revision: D13427897 fbshipit-source-id: d0b34b0072e597ae599bd3bc25356831d7a18d6a
2019-02-01 | Better bounds checks in ctcloss (#16269) | Asher Mancinelli | 2 | -8/+11
Summary: Adds better bounds checks for target lengths in CTC loss, checks for integral types for target and prediction lengths, and adds tests for each, according to #15946 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16269 Differential Revision: D13847567 Pulled By: ezyang fbshipit-source-id: 5d7a975565e02baf78fe388813a1d1ef56dfb212
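For context, the lengths being validated are the ones passed to `nn.CTCLoss`; a standard usage sketch with illustrative shapes:
```python
import torch

ctc = torch.nn.CTCLoss()
log_probs = torch.randn(50, 16, 20).log_softmax(2)               # (T, N, C)
targets = torch.randint(1, 20, (16, 30), dtype=torch.long)       # (N, S)
input_lengths = torch.full((16,), 50, dtype=torch.long)          # per-sample T
target_lengths = torch.randint(10, 30, (16,), dtype=torch.long)  # per-sample S, now bounds-checked
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```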
2019-01-31 | Workaround unvectorized mean implementation (#16618) | James Reed | 1 | -0/+17
Summary: Workaround for https://github.com/pytorch/pytorch/issues/16617 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16618 Differential Revision: D13904276 Pulled By: jamesr66a fbshipit-source-id: f8b5ea4c5f12dbc405123c9080c55b342c95bcd1
2019-01-31 | Add torch.backends.openmp.is_available(); fix some cmake messages (#16425) | SsnL | 2 | -0/+13
Summary:
1. add `torch.backends.openmp.is_available()`
2. Improve various `cmake` outputs
3. Fix LDFLAGS not respected by `caffe2_pybind11_state_*` targets
4. Fix `MKL` warning message, and QUIET flag.
5. Fix various typos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16425 Differential Revision: D13903395 Pulled By: soumith fbshipit-source-id: d15c5d46f53e1ff1c27fca2887b9d23d0bd85b4d
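A minimal usage sketch, assuming the API lands under the name above:
```python
import torch

# True if this build of PyTorch was compiled with OpenMP support
print(torch.backends.openmp.is_available())
```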
2019-01-31 | Move outplace ops to ATen (#12413) | Xiang Gao | 5 | -0/+88
Summary: So that things like below can be JITable, and available in the C++ API:
```python
import torch

@torch.jit.script
def f(x, y, z):
    x.index_add(0, y, z)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12413 Differential Revision: D13899948 Pulled By: suo fbshipit-source-id: b0006b4bee2d1085c813733e1037e2dcde4ce626
2019-01-31 | Fix a lot of C++ build warnings (#16411) | James Reed | 4 | -9/+8
Summary: I went through my build log and did what I thought were reasonable fixes to all the C++ compilation warnings that came up Pull Request resolved: https://github.com/pytorch/pytorch/pull/16411 Differential Revision: D13901006 Pulled By: jamesr66a fbshipit-source-id: 02df4e3e5a5c8dd9e69ac9f065cd3f2a80645033
2019-01-31 | Add immutable dict support (#16208) | David Riazati | 4 | -1/+202
Summary: This PR adds basic support (creation and indexing) for immutable dictionaries in Script. This includes Python/string frontend support and a `IValue::GenericDict` type backed by a `std::unordered_map`. Only `str`, `int`, and `float` are supported as keys, any type can be a value. Structure is pretty similar to list. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16208 Differential Revision: D13881686 Pulled By: driazati fbshipit-source-id: 29ce9835b953c3456f57bcc2bbdf7fe0cbf941c0
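A minimal creation-and-indexing sketch (illustrative, not from the PR):
```python
import torch

@torch.jit.script
def index_dict(x):
    d = {"weight": x, "bias": x * 0.1}  # keys may be str, int, or float
    return d["weight"] + d["bias"]

print(index_dict(torch.ones(3)))
```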
2019-01-31 | Make the miopen handle part of ConvolutionParams (#16613) | Jithun Nair | 1 | -4/+6
Summary: so that it's included in the hashed key that decides whether to call Find or not. This is required to ensure that Find is run for all devices Pull Request resolved: https://github.com/pytorch/pytorch/pull/16613 Differential Revision: D13901769 Pulled By: bddppq fbshipit-source-id: 7d29ea9e40231cd4eef80847afa1307efeb0945c
2019-01-31 | Back out "Revert D13596031: Improve c2-aten tensor interop and add proper testing" (#16514) | Dmytro Dzhulgakov | 9 | -54/+400
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16514 Original commit changeset: dc371697f14b Relanding https://github.com/pytorch/pytorch/pull/15860; the problem was that layer_norm was using at::empty, which is not yet available on mobile. Reviewed By: ezyang Differential Revision: D13861480 fbshipit-source-id: e2116da32bc117175c96b9151b1beba9b31eff36
2019-01-31 | Get more fusion after autodiff uses SumToSize (#14957) | Thomas Viehmann | 1 | -1/+1
Summary: Here is a fresh attempt at getting some fusion back in autodiff-generated graphs in the presence of SumToSize.
- The sum to size operator is now `aten::_grad_sum_to_size` to allow symbolic script differentiation (and that in turn would need to use this in place of sum_to_size to signal that it strictly operates on gradients). This is also used in the autodiff code, replacing `prim::SumToSize`.
- `_grad_sum_to_size` is now fusable; `cat`s - which are fused afterwards thanks to Adam's simplification of the code - are only fused if there is no `_grad_sum_to_size` in the fusion group.
- I push the `_grad_sum_to_size` out of the fusion group when compiling and record the desired summations in the KernelSpec. The reasoning is the following:
  - As the autodiff is a repeated application of the chain rule, we always have the pattern `grad_in = mm(A, grad_out)`, with A often diagonal for cases interesting to the fuser, whence it is `grad_in = a * grad_out` (a pointwise multiplication). We know that only `grad_out` may have AutodiffGradSumToSize applied, so we can commute AutodiffGradSumToSize with the `mul` (and `div` and `neg` are of similar origin).
  - For `type_as` the gradient might be giving the type, so just skip SumToSize.
  - `add` (which was inserted as `prim::AutogradAdd`) adds gradients when the forward used the same value in several places. This is non-broadcasting, so we know that the two arguments would have the same sizes as inputs - which is good, so we don't have to do bookkeeping of the two parts.
Details:
- During fusion, the Tensor arguments are always kept as the first parameters of the fusion group to accommodate indexing assumptions in the fuser.
- The rewriting of the fusion group to record the necessary output transformation and eliminate `_grad_sum_to_size` from the fusion group is now in the fuser compile step.
- In the execution step, the arguments are split into Tensor / Non-Tensor and the non-tensor args are mostly forgotten about except for doing `sum_to_size` at the end. This would want to be improved if/when we fuse nonconstant scalar arguments.
- In a number of places in the fuser, the non-Tensor arguments to the fusion group needed to be ignored.
Thank you, apaszke for the insightful discussion. All bad ideas and errors are my own.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14957 Differential Revision: D13888173 Pulled By: zou3519 fbshipit-source-id: 071992c876e8b845f2b3e6329ae03a835d39a0ea
2019-01-31 | remove unused capture (#16526) | Brennan Vincent | 1 | -1/+1
Summary: We don't use this in the lambda body anymore. Remove it to fix a warning. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16526 Differential Revision: D13867043 Pulled By: umanwizard fbshipit-source-id: 4c9a9d194fdfcb63fde16823517d2c6c8e2ae93d
2019-01-31 | Fix cuFFT plan cache size on CUDA 10 cannot be set to > 4096 (#16384) | SsnL | 2 | -13/+18
Summary: Doc doesn't need to be changed. Also clarifies two inaccurate comments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16384 Differential Revision: D13886637 Pulled By: soumith fbshipit-source-id: 227385008211a6f3ad9135c54fd2d3754cc9daaf
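For reference, the knob in question is exposed as `torch.backends.cuda.cufft_plan_cache.max_size`; a hedged sketch:
```python
import torch

if torch.cuda.is_available():
    # on CUDA 10 this previously could not be raised above 4096
    torch.backends.cuda.cufft_plan_cache.max_size = 8192
    print(torch.backends.cuda.cufft_plan_cache.max_size)
```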
2019-01-30 | Remove redundant declarations (#16463) | Iurii Zdebskyi | 1 | -4/+0
Summary: As there are no checks that all the functions are actually being used, we can end up with stale entries. This diff removes unused entries from Declarations.cwrap. Testing: successful build via `python setup.py develop`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16463 Differential Revision: D13885815 Pulled By: izdeby fbshipit-source-id: 4e35c2ac9196167af74dff3d4f971210721285f8
2019-01-30 | Use dispatch tensor for device_guard instead of first Tensor argument | Christian Puhrsch | 1 | -8/+6
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16579 Differential Revision: D13886593 Pulled By: cpuhrsch fbshipit-source-id: 0722ec61da13c2541f7de51bf5c1ecfb9a12fad2
2019-01-30 | Fix uninitialized data and broken broadcasting with sparse.mm and sparse.addmm (#16572) | Gregory Chanan | 1 | -2/+4
Summary: Fix uninitialized data and broken broadcasting with sparse.mm and sparse.addmm. Fixes https://github.com/pytorch/pytorch/issues/16543. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16572 Differential Revision: D13884235 Pulled By: gchanan fbshipit-source-id: 308916051364d72f72ec56f0495c6c7c09845131
2019-01-30 | Move Deprecated.h to c10 | Edward Yang | 6 | -22/+6
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16504 Reviewed By: smessmer Differential Revision: D13860570 fbshipit-source-id: 4742dc30c78d49b0f655b4e9536f51b36a595636
2019-01-30 | Allow generic containers as module inputs (#16482) | Elias Ellison | 2 | -0/+34
Summary: Fixes https://github.com/pytorch/pytorch/issues/16326 Previously we didn't handle module inputs which included Generic Lists. When checking whether a generic list is a subvalue of the input arg type, I currently recurse on every element of the list. This shouldn't be too slow since the innermost list will be specialized and we won't have to check its elements. E.g. Tensor[][] -> GenericList[TensorList]. The error message could be improved, but extracting the complete type of nested lists would have to deal with unifying types across lists / empty lists & typevars, so I'm going to save that for a follow-up PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16482 Differential Revision: D13882582 Pulled By: eellison fbshipit-source-id: 3609bc572f0ee9ebf20a77ea5ebc8fa3b165e24b
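A rough sketch of the kind of call this permits, written in later-style TorchScript for illustration; the module and names are hypothetical:
```python
import torch
from typing import List
from torch import Tensor

class Sum(torch.nn.Module):
    def forward(self, xs: List[List[Tensor]]) -> Tensor:
        total = torch.zeros(1)
        for inner in xs:          # nested generic list accepted as a module input
            for t in inner:
                total = total + t.sum()
        return total

m = torch.jit.script(Sum())
print(m([[torch.ones(2)], [torch.ones(3), torch.ones(4)]]))
```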
2019-01-30 | Explicit pdist captures (#16286) | Erik Brinkman | 1 | -2/+2
Summary: Per discussion with cpuhrsch Pull Request resolved: https://github.com/pytorch/pytorch/pull/16286 Differential Revision: D13883001 Pulled By: erikbrinkman fbshipit-source-id: 86f35d35fde5db67e3fbb09abc418da0116c9aac
2019-01-30 | CUDA histogram implementation | Jacie Fan | 4 | -24/+102
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15842 Reviewed By: zou3519 Differential Revision: D13868982 Pulled By: jaciefan fbshipit-source-id: bce81dc121c4538d204047506f8f14d0b4d8f905
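Assuming the new kernel backs the existing `torch.histc` API, a usage sketch:
```python
import torch

if torch.cuda.is_available():
    x = torch.randn(10000, device="cuda")
    # histogram computed on the GPU: 100 bins over [-3, 3]
    hist = torch.histc(x, bins=100, min=-3, max=3)
    print(hist.sum())
```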
2019-01-29 | Kernel gets Stack* instead of ArrayRef<IValue> (#16282) | Sebastian Messmer | 4 | -61/+79
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16282 This changes the core kernel abstraction to be a function taking a stack, popping its arguments from the stack and pushing results to the stack, instead of getting arguments as ArrayRef<IValue> and returning an output IValue. Caffe2 operators need to have a way to pass in preallocated output tensors. The convention for them is to get all inputs *and* outputs on the stack and also return all of them, i.e. a caffe2 op will always have inputs == outputs. This will probably change in later diffs towards making the outputs in-arguments optional in the JIT schema. Reviewed By: ezyang Differential Revision: D13792335 fbshipit-source-id: e9cc2b5e438cc4653e1f701633a154b92b604932
2019-01-29 | try to get rid of tmp_install (#16414) | Zachary DeVito | 1 | -0/+5
Summary: Rehash of previous attempts. This tries a different approach where we accept the install as specified in cmake (leaving bin/ include/ and lib/ alone), and then try to adjust the rest of the files to this more standard layout. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414 Differential Revision: D13863635 Pulled By: zdevito fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828
2019-01-29 | Add stack & cat support for CPU Half (#16389) | SsnL | 5 | -125/+130
Summary: Fixes https://github.com/pytorch/pytorch/issues/6968 Needed for #14705 Pull Request resolved: https://github.com/pytorch/pytorch/pull/16389 Differential Revision: D13861446 Pulled By: gchanan fbshipit-source-id: 7b8700b95aaf252d9669693dbddccb2302e58409
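A small sketch of what this enables:
```python
import torch

a = torch.randn(2, 3).half()
b = torch.randn(2, 3).half()
# these previously failed for Half tensors on CPU
print(torch.cat([a, b], dim=0).shape)   # torch.Size([4, 3])
print(torch.stack([a, b]).shape)        # torch.Size([2, 2, 3])
```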
2019-01-29 | Revert D13596031: Improve c2-aten tensor interop and add proper testing | Edward Yang | 9 | -400/+54
Differential Revision: D13596031 Original commit changeset: d20b601e06ba fbshipit-source-id: dc371697f14b3893a9164380a39e7a49d8d68ecf
2019-01-28 | Improve c2-aten tensor interop and add proper testing (#15860) | Dmytro Dzhulgakov | 9 | -54/+400
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15860
A few changes (which are harder to split into separate diffs, so they are together):
- make the conversions explicit (as they can throw, to avoid surprises)
- fix tensor legacy dispatch not being initialized when a tensor is created on the C2 side
- add a bunch of invariants to enforce
Reviewed By: ezyang Differential Revision: D13596031 fbshipit-source-id: d20b601e06ba47aeff2f6e8e15769840e2d46108
2019-01-28 | Move stack.h to ATen/core (#16247) | Sebastian Messmer | 1 | -0/+105
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16247 Stack is going to be used by the c10 dispatcher. This just moves the file; changing the namespace turned out to be more complicated than I thought, so I'll leave the namespace for now. Reviewed By: ezyang Differential Revision: D13774189 fbshipit-source-id: 66aeee36425e0ea2b3a4f8159604f38572306d57
2019-01-28 | Remove state from schema and calling API (#16180) | Sebastian Messmer | 6 | -63/+167
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16180 Only the kernel knows about its state, the caller doesn't see it anymore. Reviewed By: ezyang Differential Revision: D13744071 fbshipit-source-id: cb00ff1a881508c1b36ac4123bee1f68ca02ca9c
2019-01-28 | Support Tensor alias annotations for native_functions.yaml (#16239) | Christian Puhrsch | 4 | -593/+1033
Summary: Adds Tensor alias annotations. This isn't a full implementation of alias annotations, but that isn't required to increase compliance with the JIT signature schema. There are some sanity checks within native_parse.py for their usage, which can also help overall correctness. Otherwise, this exists solely for further alignment between the JIT signature schema and the native_functions.yaml func schema. This gets us to ~85% matches. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16239 Differential Revision: D13804133 Pulled By: cpuhrsch fbshipit-source-id: aa5750f2c7e0f08b8c35d6d8f38cb148e9629855
2019-01-28 | Annotate the bicubic interpolation kernels (#16449) | Johannes M Dieterich | 1 | -0/+6
Summary: with the correct `__launch_bounds__` for ROCm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16449 Differential Revision: D13844111 Pulled By: bddppq fbshipit-source-id: 07ed8552a630f3a6426d9e5648c415f066991e3d
2019-01-28 | Op-calling API can handle state (#16177) | Sebastian Messmer | 2 | -32/+51
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16177 Change the API for calling operators so that it can store state in an OpKernel object. This diff doesn't store the state there yet, that comes in a follow up diff. Reviewed By: ezyang Differential Revision: D13742889 fbshipit-source-id: 20511a9a1b9f850074e50634d4b4acf87f8c6ecd
2019-01-28 | CPU implementation of torch.cdist (#16168) | Igor Fedan | 6 | -22/+162
Summary: cdist is used for calculating distances between collections of observations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16168 Differential Revision: D13739147 Pulled By: ifedan fbshipit-source-id: 9419c2c166891ac7db40672c72f17848f0b446f9
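A short usage sketch of the op, assuming the `torch.cdist(x1, x2, p)` signature:
```python
import torch

x1 = torch.randn(100, 3)           # 100 observations, 3 features each
x2 = torch.randn(50, 3)
d = torch.cdist(x1, x2, p=2.0)     # pairwise Euclidean distances
print(d.shape)                     # torch.Size([100, 50])
```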
2019-01-28 | Don't initialize a new `std::vector` in a loop. (#15850) | Brennan Vincent | 1 | -19/+25
Summary: Before this diff, we execute `std::vector<optional<acc_t>> buffer((unsigned)max_threads, optional<acc_t> {});` in every iteration of `foreach_reduced_elt`. Change the code to only execute that line if we need it; i.e., we are actually about to parallelize. This overhead is quite significant when we are doing a lot of small reductions in single-threaded code.
```
x = torch.randn((1024, 10, 1024), dtype=torch.float64)
torch.set_num_threads(1)
%timeit x.std(1)
```
Before (with #15845 applied): 708.25 ms
After: 508 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15850 Differential Revision: D13612960 Pulled By: umanwizard fbshipit-source-id: f5e61abfe0027775c97ed81ac09c997fbee741df
2019-01-27 | Fix a typo in Parallel.h (#16419) | Gemfield | 1 | -1/+1
Summary: Fix a typo in Parallel.h. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16419 Differential Revision: D13833705 Pulled By: soumith fbshipit-source-id: 824ebe753e028fc8e2b5d7a51fdba98a365fd29a
2019-01-26 | Trace fork and join calls | James Reed | 1 | -0/+1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16232 Differential Revision: D13772974 Pulled By: jamesr66a fbshipit-source-id: b2db370271809e26d3301f8cc98eec567db5e62b
2019-01-25 | Remove bash from build (#16289) | Zachary DeVito | 2 | -1/+8
Summary: This commit removes the dependency on `build_pytorch_libs.sh` by moving the remaining functionality that is not expressible in cmake into python. Removing the indirection through bash also removes over 300 lines of environment munging code that is incredibly hard to understand because it passes a lot of secret parameters through `os.env`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16289 Reviewed By: ezyang Differential Revision: D13821662 Pulled By: zdevito fbshipit-source-id: d658d26925e3b1169ac1e3d44a159cf8a1f0d9b1
2019-01-25 | Remove caffe2::ShareData (#16139) | Jerry Zhang | 1 | -4/+4
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16139 Original commit changeset: 4b15a4c62995 Reviewed By: dzhulgakov Differential Revision: D13677464 fbshipit-source-id: 1a644a88fac02b44feebac48ccc01bc72cc47edb
2019-01-24 | Add thread-local guard: at::AutoNonVariableTypeMode (#15939) | Will Feng | 9 | -10/+88
Summary: This PR adds a thread-local guard (`at::AutoNonVariableTypeMode`) to make sure that in VariableType.cpp the operations on baseType still dispatch to the non-Variable type, even if the parameters will become Variables after the Tensor/Variable merge. We achieve this by making `legacyTensorType()` and `getType()` check the `at::AutoNonVariableTypeMode` guard to decide whether to return the non-Variable type for a variable. This is part of the VariableImpl/TensorImpl merge work: https://github.com/pytorch/pytorch/issues/13638. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15939 Reviewed By: ezyang Differential Revision: D13640980 Pulled By: yf225 fbshipit-source-id: d12c2543822958558d7d70d36c50999a5eb8783f