path: root/tools
Age | Commit message | Author | Files | Lines
2018-05-01 | Add support for MKLDNN on Windows (#7130) | peterjc123 | 1 | -9/+15
2018-05-01 | Add more warnings to C++ API build (#7123) | Peter Goldsborough | 2 | -24/+34
Enables more warnings in the C++ API build. Fixed a bunch of things in torch/csrc/. Mostly taken from c10.
* Enable -pedantic for C++ build
* Enable more warnings
* Include CUDA and library headers with -isystem
* Fix sign-promo warning
2018-05-01 | Make AT_ASSERT/AT_ERROR non-printf based, other tweaks (#7104) | Edward Z. Yang | 2 | -18/+14
* Make AT_ASSERT/AT_ERROR non-printf based, other tweaks
  - AT_ASSERT/AT_ERROR don't take printf strings anymore; instead, they take a comma-separated list of things you wanted to print (bringing them in line with Caffe2's conventions). Instead of AT_ASSERT(x == 0, "%d is not zero", x) you write AT_ASSERT(x == 0, x, " is not zero"). This is done by way of a new variadic template at::str(), which takes a list of arguments and cats their string reps (as per operator<<) together.
  - A bunch of the demangling logic that was in Error.h is now moved to Error.cpp (better header hygiene). Also, demangle has been moved out to its own helper function, and a new helper demangle_type (from Caffe2) has been added.
  - A bunch of AT_ASSERT calls were converted into AT_CHECK, to more properly convey which checks can be caused by user error and which are due to logic errors in ATen.
* CR
* Fix test failure.
* buildfix
* More fixes.
* One more fix
* Try harder
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-30 | Update C++ API tests to use Catch2 (#7108) | Peter Goldsborough | 2 | -21/+31
* Update C++ API tests to use Catch2
* Update download_mnist.py to be less verbose
2018-04-30 | Merge autogradpp into PyTorch (#7074) | Peter Goldsborough | 2 | -7/+113
* Dump autogradpp into PyTorch
* Fixed up CMake for autogradpp/C++ API
* Made cereal a submodule
* Change search location of autogradpp's mnist directory
* Add test_api to CI
* Download MNIST from the internet instead of storing it in the repo
* Fix warnings
2018-04-30 | clamp now has subgradient 1 at min and max (#7049) | Tongzhou Wang | 1 | -3/+6
* subgradient 1 at min and max for clamp
* clamp max and clamp min too
* add comment
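A small sketch of the behavior change (values chosen for illustration): the gradient now passes through elements sitting exactly at min or max.

```
import torch

x = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)
y = x.clamp(min=1.0, max=1.5)
y.sum().backward()
# 1.0 sits exactly at min, so its subgradient is now 1; the out-of-range
# elements 0.0 and 2.0 still get gradient 0.
print(x.grad)  # tensor([0., 1., 0.])
```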
2018-04-30 | only Tensors of floating point dtype can require gradients (see #7021) (#7034) | Thomas Viehmann | 2 | -0/+6
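A minimal illustration of the new restriction (a sketch; the exact error text may differ):

```
import torch

torch.zeros(3, dtype=torch.float32, requires_grad=True)  # floating point: allowed

try:
    torch.zeros(3, dtype=torch.int64, requires_grad=True)  # integral: rejected
except RuntimeError as e:
    print(e)
```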
2018-04-29 | [jit] Fix handling of IntList[k] parameters (#6965) | Marcin Elantkowski | 1 | -10/+41
* squash commits
* emit additional declarations and handle positional arg. case
* apply minor tweaks
* py-2 fix
* Address Tom's comments
* move logic to gen_jit_dispatch, start adding tests
* add test
* address review comments
* address review comment
* fix build issue; change argument indices to argument names; get rid of deepcopy
* py-2 flake8 fix
2018-04-29 | Add max pooling support to EmbeddingBag (#5725) | Ethan Steinberg | 1 | -1/+1
* Add max mode support to EmbeddingBag
* Lint fix
* Fix compilation issue on other platforms
* Rebase + don't waste memory when not in max mode
* Oops, missed a spot
* Fix whitespace from merge
* less precision
* Lower precision to avoid spurious failures
* Minor typo
* Switch to size()
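A short usage sketch of the new mode (shapes and indices are hypothetical):

```
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, mode='max')
indices = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 4])     # two bags: indices[0:4] and indices[4:8]
out = bag(indices, offsets)        # each bag reduced with an element-wise max
print(out.shape)                   # torch.Size([2, 3])
```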
2018-04-29 | Don't build THD/master_worker if not explicitly requested (#7081) | Luca Antiga | 1 | -0/+9
2018-04-27 | Fixes for interpreter and ONNX export for translation (#7044) | James Reed | 1 | -1/+1
Fixes for interpreter and ONNX export for translation
Address comments
2018-04-28 | [WIP] Enable WERROR in tests (#6539) | Peter Goldsborough | 1 | -0/+6
* Enable WERROR in tests
* Also set WERROR=1 for cpp_build in CI
* Enable Werror after the compiler checks
* Remove -DWERROR because it's picked up from the env var
* Had to fix some errors in aten/contrib/data
* Allow an uninitialized variable in ReduceOpsKernel.cpp
* Use CUDNN_DATA_UINT8 in cuDNN type string conversion
* Fixes and use target_compile_options
* Fix uninitialized variables in THNN
* Include Python.h earlier in tensor_types.cpp
* Use CUDNN_VERSION 7100 instead of 7000?
* More Python.h includes
* Make switch case in common_subexpression_elimination.cpp exhaustive
* Build with WERROR=0 just to see all the warnings
* Remove some Python includes
* Enable WERROR=1 again
* Bring back switch case default
2018-04-27 | Prevent stack overflow on deletion of deep graph (#6873) | Richard Zou | 1 | -1/+1
* Prevent stack overflow on deletion of deep graph

Fixes #5534. Sometimes one can end up with a very big computation graph of Functions and Edges. Each std::shared_ptr<Function> contains a list of Edge, and each Edge contains a std::shared_ptr<Function>. Deleting a std::shared_ptr<Function> can trigger the recursive deletion of other std::shared_ptr<Function>'s: this can stack overflow if the graph is deep enough. Here is an example of such a graph:

shared_ptr<Function> -> Edge -> shared_ptr<Function> -> Edge -> ... -> shared_ptr<Function>

The solution here is to use a custom deleter with each std::shared_ptr<Function>. The custom deleter keeps track of how many nested deleters it is in. When this number exceeds the maximum allowed depth, the Function* to be deleted are accumulated in a per-thread delete queue and handled by one of the deleters.

Example code that could trigger the overflow (set ``depth`` to something > 100000) is below. I also benchmarked the below code before/after the changes to see if there are any significant performance differences.

```
import torch

def scope():
    depth = 80000
    x = torch.randn(9, requires_grad=True)
    y = x.clone()

    # build deeply nested computation graph
    for i in range(depth):
        y = y + y * 0.000001

%timeit -n 100 scope()
376 ms ± 3.94 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

Without changes:
352 ms ± 6.58 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

With the change, the above code is 6.8% slower.

UPDATE: I did some more benchmarking. It looks like it takes 25% more time to free the computation graph in the case of the straight chain graph: https://gist.github.com/zou3519/93cf84d96ae431356ae7f7c1923ef51a

* WIP
* Add custom deleter to PyFunctions created by THPFunction
* Address some comments; pick new value
* Address some more comments
* Add more complicated test; special case the windows depth constant
2018-04-27 | torch.arange: add numpy-style type inference. (#7016) | gchanan | 2 | -1/+82
* torch.arange: add numpy-style type inference. This is a backwards-compatibility breaking change.
* Fix flake8.
* Use at::optional.
* Remove unneeded header files.
* Use reference wrapper.
* Update arange for test.
* Address review comments.
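A sketch of the inference rule (the backwards-compatibility break is that integer arguments now produce an integral dtype instead of the default floating point dtype):

```
import torch

print(torch.arange(5).dtype)     # torch.int64, inferred from the integer argument
print(torch.arange(5.0).dtype)   # the default floating point dtype, e.g. torch.float32
print(torch.arange(0, 5, dtype=torch.float64).dtype)  # an explicit dtype still wins
```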
2018-04-26 | Enhance diagonal (fixes #6479) (#6718) | Thomas Viehmann | 3 | -1/+11
* Enhance diagonal. This patch:
  - adds Tensor.diagonal to complement torch.diagonal
  - implements diagonal natively in ATen
  - makes diagonal a view
  - implements taking arbitrary diagonals
  - implements diagonal backward instead of referring to the (more limited) diag
* add tests, copy diagonal code to backward for double differentiability
* improve tests and doc comment. Thank you, Adam!
* Mark diagonal as view function in gen_autograd.py, use simple backward.
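A brief sketch of the enhanced API (values are illustrative):

```
import torch

a = torch.arange(9.0).view(3, 3)
main = a.diagonal()            # Tensor.diagonal now complements torch.diagonal
upper = a.diagonal(offset=1)   # arbitrary diagonals via the offset argument

# diagonal is now a view, so in-place writes propagate to the original tensor
main.zero_()
print(a)
```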
2018-04-26 | Fix forward and backward for norm/renorm with infty norm (fixes #6817) (#6969) | Thomas Viehmann | 1 | -0/+3
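A small sketch of the infinity-norm case this touches (illustrative values):

```
import torch

x = torch.tensor([1.0, -3.0, 2.0], requires_grad=True)
n = x.norm(float('inf'))   # infinity norm: max(abs(x)) == 3
n.backward()
print(x.grad)              # gradient flows only through the max-magnitude element
```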
2018-04-25 | Consistently check 'out' variants against specified dtype/layout/device parameters. (#6973) | gchanan | 2 | -8/+13
We were previously doing this in the most common cases, but not consistently.
2018-04-25 | implement gamma cuda (#6855) | Thomas Viehmann | 1 | -1/+1
* Refactor standard_gamma and implement CUDA gamma sampling
* Attempt fixes for AT_CUDA_ENABLED changes
* Gamma cuda and cpu forward as ATen native
* implement standard_gamma_grad_cuda
* update native_test.cpp, try to fix windows and various cuda version compiles
* searching a windows fix via CI... use std:: for math
* casting some constants in the calculation, compute at float for half precision
* whitespace fixes
* add acctype to do half->float computation, include HALF in generation, cast locally rather than tensors
* fix cuda8 half compilation
* always use scalar_cast with CUDACC, lock CPU generator, CPU acctype = double
Thank you for your review comments!
2018-04-24 | [aten] Move submodules to third_party (#6866) | Orion Reblitz-Richardson | 2 | -2/+2
* [aten] Move submodules to third_party
* [aten] Update aten_mirror.sh script for third_party
* [aten] Move ATen submodules def to root and rename
* [aten] Update cpuinfo cmake build
* [aten] Fix cpuinfo cmake build
* Update third_party/cpuinfo to d03d5d296063063c66877fb559cf34469734e3e1
* [aten] Fix JIT test reference to catch
2018-04-24 | Fix ATen .travis.yml setup (#6860) | Edward Z. Yang | 1 | -1/+12
- The ATen repo now has a new top-level, so the Travis script has to be adjusted to (1) be moved to the top-level and (2) cd into the aten directory before doing anything.
- Unfortunately, this makes the import script even slower, because I'm banging on the entire index every commit. If anyone has better suggestions for how to twiddle the index, let me know. One possibility is to fold the ATen build into the base .travis.yml but only activate it when a file is missing (and then filter out that file).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-23 | fix SVD backward on non-square matrices when some=False (#6870) | Tongzhou Wang | 1 | -9/+15
2018-04-20 | fixed error message (#6820) | Dr. Kashif Rasul | 1 | -1/+1
2018-04-19 | Check in ATen mirror script. (#6762) | Edward Z. Yang | 1 | -0/+22
Fixes #6556. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-19 | Add a requires_grad_() function to tensors. (#6771) | gchanan | 1 | -0/+22
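A quick sketch of the new in-place toggle:

```
import torch

x = torch.randn(3)       # requires_grad defaults to False
x.requires_grad_()       # trailing underscore: flips the flag in place
print(x.requires_grad)   # True

y = x * 2
print(y.requires_grad)   # True, x now participates in the autograd graph
```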
2018-04-18 | Sort declarations when generating Python bindings (#6701) | Adam Paszke | 1 | -1/+42
* Sort declarations when generating Python bindings. This helps resolve ambiguities in argument parsing according to any rules we will need. For now, this allows us to make scalar operations more conservative wrt. argument types, but makes them commutative again.
* Fix inconsistencies between mod with tensor and scalar
* Fix a stupid mistake
2018-04-18 | Add mutex to THC random number generator (#6527) | Will Feng | 1 | -2/+2
* Add mutex to THC random number generator
* Add test for CUDA RNG multithread
* fix lint
* Rename gen_state to state and remove unnecessary mutex lock
* Remove RNG test from cpp_extensions
* Add CUDA RNG test to libtorch
* Build test_rng only if CUDA exists
* Move test to aten/src/ATen/test/
* Separate ATen build and test, and run ATen test in CI test phase
* Don't test ATen in ASAN build
* Fix bug in ATen scalar_test
* Fix bug in ATen native_test
* Add FIXME to some CUDA tests in scalar_tensor_test
* Valgrind doesn't work well with CUDA, seed the CPU and CUDA RNG separately instead
2018-04-18 | Implement torch.einsum (fixes #1889) (#6307) | Thomas Viehmann | 2 | -0/+2
* start at generic trilinear
* Implement einsum (fixes #1889). This provides a simple implementation of einsum. It is built on top of the work for computing bilinear (#6110). It uses a naive left-to-right resolution at the moment. Autograd is able to differentiate by itself. The obvious unsupported feature is taking diagonals (einsum('ii->i', (a,))).
* add tests and docs
* fix flake8
* clean diff
* rebase on current master to resolve conflicting String wrapping
* clean up after rebase
* better commentary in einsum and sumproduct_pair
* don't say fixme if it's fixed and rename num_outputs to num_output_dims
* adapt python wrapper to use std::string instead of String to avoid typedef at::String
* typos and some vector to array conversion
* fix accidental python<->python3 change
* really fix bad rebase
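A minimal usage sketch (the sequence-of-operands form mirrors the einsum('ii->i', (a,)) example in the commit message):

```
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)

# matrix multiplication written as an einsum
c = torch.einsum('ij,jk->ik', [a, b])
print(torch.allclose(c, a @ b))  # True
```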
2018-04-16 | Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod. (#6573) | gchanan | 5 | -11/+19
* Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod. This adds optional dtypes to torch.sum, torch.prod, torch.cumsum, torch.cumprod. By default, the dtype is torch.float64 for integral types, and the dtype of the input for floating point types.
* Don't use optional<ScalarType>, because the jit can't handle it yet. Instead, we manually build the overloads. This is fairly painful because of default arguments, but should be easy to pull out once the jit can handle optional<ScalarType>.
* Fix keepdim with out parameters.
* Fix _cudnn_rnn_flatten_weight.
* If dtype is provided to an out function, make sure it matches the dtype of the result.
* Fix typo.
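A short sketch of the optional dtype argument (an explicit dtype overrides whatever default applies):

```
import torch

x = torch.ones(4, dtype=torch.int32)

s = x.sum(dtype=torch.float64)                  # accumulate/return in float64
p = torch.prod(x, dtype=torch.int64)
c = torch.cumsum(x, dim=0, dtype=torch.float32)
print(s.dtype, p.dtype, c.dtype)                # torch.float64 torch.int64 torch.float32
```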
2018-04-16 | Fix bilinear performance regression (#6110) | Thomas Viehmann | 2 | -0/+15
The current implementation of bilinear uses a matrix multiplication approach. This creates a large intermediate matrix (batch * output dimension * input dimension). Relative to the previous pure python approach, this caused severe performance regression (600ms vs. 18ms for 300x100x200 weights and a batch of 50 on CPU, and also quadratic memory). The attached change restores the performance using the previous strategy of looping over output features. It implements forward, backward, and double backward as native ATen code.

Credits: Martin Tutek reported the regression and pinpointed the problem; Adam Paszke patiently answered my questions about ATen. I would not have been able to prepare this without you, thank you! I referenced the old python implementation, used a python version of the naive implementation, and coded manual functions etc. The tests have gradgradcheck etc.

* fix memory use of native bilinear
* bilinear double backward
* Move bilinear_double_backward to Functions.cpp. Addresses review comment by Tongzhou Wang. Thank you!
* add WrapDimUtilsMulti.h
* start at generic trilinear
* move to generic trilinear
* catch up on dim_list_to_bitset
* switch bilinear to use _trilinear, implement _trilinear_backward
* add comments to Linear.cpp, move _trilinear in yaml
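A sketch using the shapes from the benchmark above (random data, purely illustrative) to exercise the bilinear forward and backward:

```
import torch
import torch.nn.functional as F

batch, in1, in2, out_features = 50, 100, 200, 300
x1 = torch.randn(batch, in1, requires_grad=True)
x2 = torch.randn(batch, in2, requires_grad=True)
weight = torch.randn(out_features, in1, in2, requires_grad=True)

y = F.bilinear(x1, x2, weight)  # shape: (batch, out_features)
y.sum().backward()              # the native ATen path also supports double backward
```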
2018-04-16 | Make dtype in .to positional rather than kwarg only (#6628) | Tongzhou Wang | 1 | -1/+1
2018-04-16 | Add tensor.to(device) method. (#6588) | gchanan | 1 | -0/+34
* Add tensor.on(device) and tensor.on_device_as(tensor) methods.
* Rename {'on', 'on_device_as'} -> 'to'.
* Fix test ordinal.
* Fix device ordinal again.
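A small sketch of the resulting API, combined with the positional-dtype change from the entry above:

```
import torch

x = torch.randn(2, 3)
x_cpu = x.to('cpu')             # no-op here, but device-agnostic
x_double = x.to(torch.float64)  # dtype can be passed positionally

if torch.cuda.is_available():
    x_cuda = x.to('cuda:0')     # string-style device specification
```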
2018-04-13 | [jit][script] Allow tuples to be re-assigned (#6538) | Zachary DeVito | 1 | -0/+1
* Allow tuples to be re-assigned. This commit improves our support of tuples by making them more first-class. In particular, it allows tuples to be re-assigned across loops and ifs. It does this by making them first-class values in the Graph IR, and then removing the tuples in a LowerTuples pass. An alternative approach would have added more support for desugaring tuples in the Environment object as they were emitted. Instead, the current approach was chosen anticipating a future when tuples are fully supported (including the interpreter). In that future, the current code can be completely reused, with the LowerTuples pass just becoming an optimization that removes unneeded tuple allocations.
2018-04-12 | lowercase tools/cpp_build/libtorch/CMakeLists.txt (#6567) | Peter Goldsborough | 1 | -57/+57
2018-04-12 | Separate cuda-ness from dtype. (#6470) | gchanan | 6 | -31/+50
* Separate cuda-ness from dtype. There are no longer torch.cuda.int64, etc.; only torch.int64, which corresponds to at::ScalarType. At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device). There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types on reduction functions.
* Fix test_autograd.
* Add defaults to randint_like.
* Track is_cuda in py tensor types.
* Fix test_sparse.
* Fix multiprocessing.
* Fix rnn.
* Fix test_nn.
* Fix flake8.
2018-04-12 | [jit][script] Check that each builtin returns the right number of values. (#6492) | Zachary DeVito | 3 | -24/+42
* Fixes to the way script handles multiple values, and other minor fixes.

This commit improves our handling of operators that return multiple values. Builtins are now checked so that they return the right number of values, and support for TupleValue is extended to all things that can return multiple values. This resolves issues where the compiler accepted things like:

a, b = c + c

This would cause the interpreter to crash. Now each operator knows how many results it will produce and can check it against the number of requested inputs.

Notes:
* Allow True/False literals in constant expressions
* make handling of keyword constants more consistent to support True/False
* make parsing constants match the way we construct constants from python
* improve the error messages when accessing bad graph attributes.
* switch findTensorOp to return an optional.
* check that attribute types are correct in findTensorOp
* Check the correct number of outputs for builtins

This also changes emitExpr to return a single SugaredValue. Rather than possibly returning multiple values, emitExpr now always returns a single value, which _might_ be a tuple. This approach more closely follows python, making the code easier to follow. Checks for returning the right number of values are now located in the assignment operator, and occur when unpacking the tuple. We still pass `n_binders` to function calls so that calls into python know how many values they should return.
2018-04-12 | STFT is differentiable out of the box. Fix the regression that marked it as backward-not-implemented (#6541) | Tongzhou Wang | 1 | -3/+0
2018-04-11 | Quote arguments only when possible (#6405) | peterjc123 | 1 | -2/+8
* Quote arguments only when possible
* Minor fix
* Add no quote conditions
2018-04-10 | [fft] [3 of 3] Implements backward of fft ifft rfft irfft (#5537) | Tongzhou Wang | 2 | -1/+112
* change irfft signal_sizes arg to be the last
* add docs for fft, ifft, rfft, irfft; update doc for stft
* fix typo in window function docs
* improve gradcheck error message
* implement backward of fft, ifft, rfft, irfft
* add grad tests for fft, ifft, rfft, irfft
* fix nits and typos from #6118
* address comments
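A gradient-check sketch, assuming the signal_ndim-style torch.rfft(input, signal_ndim, ...) interface of that era (later releases replaced these functions with the torch.fft module):

```
import torch
from torch.autograd import gradcheck

x = torch.randn(4, 8, dtype=torch.float64, requires_grad=True)

# gradcheck exercises the newly implemented backward of rfft
print(gradcheck(lambda t: torch.rfft(t, 1), (x,)))
```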
2018-04-09 | Fixes #6386, Use copies instead of symbolic files (#6396) | peterjc123 | 1 | -1/+0
* Use copies instead of symbolic files
* bug fix
* Remove useless item
2018-04-09 | [pytorch] Fix clamp is missing kwarg out (#6028) (#6418) | Zhou Chang | 2 | -21/+51
torch.clamp is left out of the template-generated code, so add it manually, the same as with the auto-generated code.
2018-04-09 | Use string comparison in OS check (#6420) | Luca Antiga | 1 | -1/+1
2018-04-08 | [pytorch] add static linkage support for CuDNN and NCCL (#6410) | Soumith Chintala | 2 | -4/+15
* when linking static CUDA libs, additional dep on culibos.a
* add USE_STATIC_NCCL option
* add USE_STATIC_CUDNN option
* remove libATen soversion
* add caffe, caffe2 folders to setup.py exclude list
2018-04-07 | Fix typos in docs (#6389) | Kento NOZAWA | 1 | -1/+1
2018-04-07 | Several minor fixes for Windows build (#6332) | peterjc123 | 1 | -1/+1
* Several minor fixes for Windows build
* Use version_info instead of version
2018-04-06 | Simplify and extend cpp build (#6343) | Peter Goldsborough | 5 | -58/+132
* Modify cpp build
* Use absolute path in .jenkins/pytorch/build.sh
2018-04-06 | Add string-style devices to all tensors. (#6283) | gchanan | 2 | -10/+17
* Add string-style devices to all tensors.

Previously, tensors only had a 'get_device' method which would throw an exception on a CPU tensor. This made it necessary to if/else code that was meant to be device agnostic.

This PR implements the following:
1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors. For cpu tensors this is 'cpu'. For cuda tensors this is 'cuda:X', where X is the cuda device ordinal.
2) Adds a DeviceSpec class. This is just a helper class for separating device_type and device_index specification and to allow partial specification. For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1). Also has backwards compatibility support for specifying integers, which are treated as cuda devices. DeviceSpecs have the following properties:
   a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda')
   b) device_index: integer for the device index (None if not specified)
   c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously. I.e. if a function previously took integers for cuda devices, it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`.
3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs. For example: torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1')

TODO in future PRs:
A) Split out cuda from dtype so you don't need to overspecify cuda-ness
B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions. We should have equivalents torch.cuda.device(...), torch.cuda.device_of, etc. at the torch. level that work on strings/DeviceSpecs

* Add deviceInt64 to python arg parser.
* device_str.
* Remove device_str.
* remove device prefix from attributes.
* Use const char * instead of string.
* Move autogpu index out of Device.
* comment on is_default.
* Rename torch.DeviceSpec to torch.device.
* comment.
* Fix tests.
* Fix flake8.
* Fix sparse_coo_tensor parameter name.
* Improve error message.
* Remove device_ prefix from C++ device object.
* Allocate static strings.
* Return not implemented from rich compare.
* Move torch::Device to THPDevice.
* Remove cuda index.
* Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.
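A compact sketch of the string/device forms described above (the device choice is illustrative and falls back to CPU):

```
import torch

d = torch.device('cuda:1') if torch.cuda.device_count() > 1 else torch.device('cpu')
print(d.type, d.index)           # e.g. 'cuda' 1, or 'cpu' None

x = torch.randn(2, 3, device=d)  # strings, integers, or torch.device objects are accepted
print(x.device)                  # every tensor now reports a device, CPU included
```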
2018-04-06 | INCULDE typofix. (#6354) | Edward Z. Yang | 1 | -1/+1
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-05 | [ready] Implement log2 and log10 in PyTorch (#6272) | Vishwak Srinivasan | 1 | -0/+6
* Implemented log2 and log10
* Re-add incorrectly removed files
* Fix minor bugs
* Fix log1p docs
* Add a try-except for python2 math module in log2 test
* Revert changes made to aten/doc/*
* Fix docstring errors
* Fix windows build
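A trivial usage sketch:

```
import torch

x = torch.tensor([1.0, 8.0, 100.0])
print(torch.log2(x))   # tensor([0.0000, 3.0000, 6.6439])
print(torch.log10(x))  # tensor([0.0000, 0.9031, 2.0000])
```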
2018-04-04 | Add max_values and argmax convenience functions to ATen (#6201) | Peter Goldsborough | 1 | -1/+1
* Add max_values and argmax convenience functions to ATen
* Add documentation for torch.argmax/argmin and skip max_values
* Add tests for argmax/argmin
* Don't default the dim argument
* Use dim=0 in test_torch.py for argmax tests
* Implement argmin() and argmax() without dim
* Call .contiguous() before .view(-1)
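A short sketch of the convenience functions:

```
import torch

x = torch.randn(3, 4)
print(torch.argmax(x))         # index into the flattened tensor when dim is omitted
print(torch.argmax(x, dim=1))  # per-row indices, shape (3,)
print(torch.argmin(x, dim=0))  # per-column indices, shape (4,)
```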
2018-04-03 | Make the tensor type torch.Tensor instead of torch.autograd.Variable (#5785) | Sam Gross | 1 | -1/+5
This changes type(tensor) to return `torch.Tensor` instead of `torch.autograd.Variable`. This requires a few implementation changes:
- torch.Tensor is now a regular Python class instead of a pseudo-factory like torch.FloatTensor/torch.DoubleTensor
- torch.autograd.Variable is just a shell with a __new__ function. Since no instances are constructed, it doesn't have any methods.
- Adds torch.get_default_dtype(), since torch.Tensor.dtype returns <attribute 'dtype' of 'torch._C._TensorBase' objects>
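A quick sketch of what the change means in user code:

```
import torch

t = torch.randn(2)
print(type(t) is torch.Tensor)      # True: no longer torch.autograd.Variable
print(isinstance(t, torch.Tensor))  # True

print(torch.get_default_dtype())    # e.g. torch.float32
```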