path: root/setup.py
Age | Commit message | Author | Files | Lines
2018-02-12 | Improve Variable interface (#5127) | Peter Goldsborough | 1 | -0/+1
* Improve Variable interface
* Address comments from @apaszke and @colesbury
* string ::operator= is not noexcept
* Remove ir.h from tracer_state.h to improve build times
* Make Variable a struct and pack SavedVariable fields
* Implement as_variable_ref
* grad_fn_ptr() -> grad_fn_unsafe()
* Reduce hackiness of set_type hack
* Include variable.h and edge.h in tracer_state.h because it uses them
* class Variable -> struct Variable because Windows can't even
* Make Variable::output_nr uint32_t instead of int
* Add comment about tracing state
* Replace more static_cast<Variable&> uses and improve docs
* Remove SavedVariable destructor and construct members in init list
* Clarify docs for Variable
* Variable::set_version -> set_version_counter
2018-02-09 | Enable scalars. (#5158) | gchanan | 1 | -1/+2
* Enable scalars.
* Avoid variable name shadowing in list comprehensions, because the loop variable rebinds the enclosing name in Python 2 but not in Python 3.
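A minimal illustration of the scoping difference referenced above (names and values here are hypothetical, not from the patch): in Python 2 the loop variable of a list comprehension leaks into and rebinds the enclosing scope, while in Python 3 the comprehension has its own scope.

```python
x = 10
squares = [x * x for x in range(5)]

# Python 2: the comprehension variable leaks, so x is now 4.
# Python 3: the comprehension has its own scope, so x is still 10.
print(x)
```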
2018-02-07 | Experimental jit script (#5074) | bddppq | 1 | -0/+3
2018-02-02 | Initial GraphExecutor Implementation. (#4982) | Zachary DeVito | 1 | -0/+1
This adds the initial implementation of the graph executor for the new JIT design. It includes a few Python tests ensuring that the nograd, backward, and double-backward cases work for simple examples and some corner cases. More work is needed to optimize performance, as there are many extra copies and places where we hold onto variables longer than we should. These are noted in the comments.
2018-02-01 | Use distutils.copy_tree/copy_file instead of shutil | Peter Goldsborough | 1 | -11/+5
2018-02-01 | [cpp extensions] Create torch.h and update setup.py | Peter Goldsborough | 1 | -16/+44
2018-01-28 | [JIT] Add simple shape analysis | Zach DeVito | 1 | -0/+1
This quick and dirty shape analysis just makes up fake tensors, and runs them through ATen to do shape propagation.
2018-01-26 | Use variadic templates instead of initializer lists and overloads. (#4772) | Edward Z. Yang | 1 | -0/+1
Suppose you are given a list of arguments, each of which may be Tensor or TensorList. How can you write a function that can treat these arguments uniformly as a list of tensors? This patch solves the problem using variadic templates.

Why variadic templates? Use of variadic templates means anyone working with this code has to understand universal references, perfect forwarding, parameter packs and some idioms of C++ template design. However, I argue that variadic templates are the *right* tool for supporting the implementation of functions which must take an arbitrarily heterogeneous set of inputs. We were able to limp by in old code because, for the most part, tensor inputs were homogeneous, but this is no longer the case for some non-primitively differentiable functions; and with the upcoming cuDNN RNN in ATen PR, will no longer be the case for primitively differentiable functions too.

There are two parts to the PR. First, we add torch/csrc/utils/variadic.h, which defines a mix-in IterArgs that takes any class which supports operator(), and augments it with a new variadic function apply() which calls operator() on each argument passed to it. In an original draft of the patch, I wrote the recursion for each parameter pack from scratch for each function; however, it turns out there are no fewer than seven instances where we need this idiom, and the mix-in reduces the lines of code, and also helps centralize the most important (and easy to forget) boilerplate for perfect forwarding. To verify that IterArgs is compiled away into an unrolled form per call site, I inspected the assembly on some synthetic examples.

Next, we modify the following functions to make use of IterArgs:
- compute_requires_grad
- Function::flags (Variable and Tensor variants)
- flatten
- isTracing
- count_tensors / count_variables

Finally, the tuple packer is rewritten to be variadic, although we cannot make use of IterArgs (since we are given a tuple). It might make sense to refactor the code into a generic piece which invokes a function with the arguments specified by a tuple, and then an appropriate IterArgs, but we leave this for future work.

One thing to note: we cannot write a function with overloads for both Tensor and Variable, because both ArrayRef<Variable> and Tensor have implicit conversions from Variable, making such an overload ambiguous. It may be interesting to remove the implicit conversion from ArrayRef.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-25 | fix binary version scheme to be PEP compliant (#4847) | Soumith Chintala | 1 | -2/+4
2018-01-24 | Enabling Infiniband support for Gloo data channel with auto IB detection (#4795) | Teng Li | 1 | -2/+5
2018-01-23 | [JIT] add create_autodiff_subgraphs (#4822) | Zachary DeVito | 1 | -0/+1
This pass splits differentiable subgraphs into their own Node, similar to a fusion group. This initial implementation does not create optimal subgraphs, but it works well in the case where most things are differentiable, and has the building blocks (`mergeNodes`) to extend to the better implementation.
2018-01-23 | Enable scalars if compiled with WITH_SCALAR environment variable. (#4806) | gchanan | 1 | -0/+5
* Enable scalars if compiled with WITH_SCALAR environment variable. We are pretty close to enabling scalars (0-dimensional arrays); this allows turning them on for development purposes and to be able to write code that works both with and without scalars enabled. WITH_SCALARS is currently broken with distributions, but should work for test_torch, test_autograd, test_nn.
* Fix unsqueeze.
* Fix wrap dim, wrapping with Scalar.
2018-01-21 | Check submodules only in build_deps (#4770) | Adam Paszke | 1 | -10/+10
2018-01-20 | Scaffolding for source-to-source AD in the JIT | Adam Paszke | 1 | -0/+1
2018-01-18 | Move broadcast and broadcast_coalesced to C++ | Adam Paszke | 1 | -0/+3
2018-01-18 | Base for pure C++ NCCL interface | Adam Paszke | 1 | -0/+1
2018-01-17 | Bind functions with out= arguments in VariableType (#4565) | Sam Gross | 1 | -0/+1
This adds overrides in VariableType for the xxx_out ATen functions and implements Python bindings. There is no support for automatic differentiation. If any of the inputs (or outputs) requires grad, then the function will throw an exception unless it's running in "no-grad" mode. The bindings for calling torch.xxx functions on Variables are moved to a different object. Previously, they were static method on VariableBase. This change prevents users from accidentally calling static methods as if they were instance methods.
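A small sketch of the resulting behavior at the Python level (shapes and values arbitrary, written against today's tensor API rather than the Variable wrapper of that era): out= variants work when nothing requires grad, raise otherwise, and are allowed again under no-grad mode.

```python
import torch

x, y = torch.randn(3), torch.randn(3)
z = torch.empty(3)
torch.add(x, y, out=z)          # fine: no input or output requires grad

w = torch.randn(3, requires_grad=True)
try:
    torch.add(w, y, out=z)      # out= ops don't support automatic differentiation
except RuntimeError as err:
    print("raised:", err)

with torch.no_grad():
    torch.add(w, y, out=z)      # allowed while grad mode is off
```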
2018-01-17 | Implement MM fusion (MM with add reduction tree) (#4615) | Adam Paszke | 1 | -0/+1
A tree where leaves are matrix multiplies and inner vertices are adds can be computed as a single mm. Such subgraphs often appear in backward if a single weight is reused multiple times (e.g. in RNNs). NOTE: this seems to be slightly slower on the GPU than the naive implementation, but it's a huge win on the CPU (think 100x lower overhead).
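A small sketch of the algebraic identity such a fusion relies on (shapes chosen arbitrarily, using today's torch API): a sum of matrix multiplies equals a single multiply of the concatenated operands.

```python
import torch

A1, A2 = torch.randn(4, 3), torch.randn(4, 5)
B1, B2 = torch.randn(3, 6), torch.randn(5, 6)

naive = A1 @ B1 + A2 @ B2
# Concatenate along the contracted dimension and do one mm instead of two mms plus an add.
fused = torch.cat([A1, A2], dim=1) @ torch.cat([B1, B2], dim=0)
print(torch.allclose(naive, fused, atol=1e-6))  # True
```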
2018-01-11 | Fixed setup.py to handle CUDNN_LIBRARY envvar with aten (#4597) | Jon Crall | 1 | -17/+20
* Fixed setup.py to handle CUDNN_LIBRARY envvar with aten
* undo changes
* Added CUDNN_LIBRARY to bat file
2018-01-04 | Delete a pile of dead code (#4295) | Edward Z. Yang | 1 | -1/+0
* Delete obsolete basic ops.
* More deletion.
* Delete some unused utilities.
* Delete dead apply_fn
* Delete CppFunction symbolic support.
* Delete ForwardFunction
* Batchnorm is 'working'

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-30 | Enable ninja during python build process for MSVC (#3993) | peterjc123 | 1 | -0/+2
2017-12-29 | Support NO_NNPACK environment variable (#4401) | Edward Z. Yang | 1 | -0/+3
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
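For context, environment toggles like NO_NNPACK are typically read in setup.py with a small helper along these lines (the helper name and accepted values here are an assumption for illustration, not necessarily the exact code added by this commit):

```python
import os

def check_env_flag(name, default=''):
    # Hypothetical helper: treat common truthy spellings as "flag set".
    return os.getenv(name, default).upper() in ['1', 'TRUE', 'ON', 'YES', 'Y']

# Disable the NNPACK build when the user exports NO_NNPACK=1.
WITH_NNPACK = not check_env_flag('NO_NNPACK')
```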
2017-12-21 | Batchnorm in ATen (#4285) | Edward Z. Yang | 1 | -2/+0
* Batchnorm in ATen

This commit moves BatchNorm derivatives into ATen, eliminating torch/csrc/autograd/functions/batch_normalization.cpp.

Some refactoring along the way:
- Functions got renamed to remove _forward from their names
- CuDNN batchnorm forward was modified to return save_mean/save_std instead of taking them as parameters. To avoid returning undefined Variables, these return (small) uninitialized tensors when they are not used.
- THNN batch normalization takes care of resizing save_mean and save_std on forward.
- There are some shenanigans re batchnorm backwards in eval mode. I'm tracking that in #4284
- I decided not to introduce buffers as a proper concept in ATen, which means that tensors like running_mean/running_var are variables in ATen. This meant there needed to be some adjustments to how we *trace* such variables; the new strategy is if we can't find a Value for a variable, we look and see if we have a Value for the buffer pointed to by the variable, before finally falling back on constant.
- This PR finally reliably triggered OOM on Travis builds; I fixed this by reducing the number of parallel jobs.
- Stop using std::string when it's not necessary.
- Remove training parameter from cudnn_batch_norm_backward, because it doesn't make sense; cuDNN doesn't implement the math for evaluation mode batchnorm backwards.
- batchnorm_double_backward is now in an anonymous namespace, as it no longer needs to be called from torch/csrc

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-20 | Convolution derivatives in ATen (#4116) | Edward Z. Yang | 1 | -19/+0
* Convolution derivatives in ATen

This PR introduces an ATen implementation of convolution, which dispatches to THNN/CuDNN/NNPACK based on input parameters. The general strategy is to compose this function out of the various forward-backward pairs of specific implementations, rather than write a monolithic function with backwards (which is what we did before, because the boilerplate of doing it otherwise would have been very high.)

The new API provides the following functions:
- _convolution, which is a fully generic, native convolution implementation that dispatches to various other convolution implementations depending on input characteristics. This is prefixed with an underscore because it explicitly takes benchmark, deterministic and cudnn_enabled, which are implementation details for CuDNN. The intent is to eventually provide a convolution that reads these parameters out of the context using #4104.
- _convolution_nogroup is a convolution implementation for non-CuDNN algorithms which don't support group convolution natively.
- _convolution_double_backward is the generic double-backwards implementation for convolution.

In more detail:
- Most functionality from torch/csrc/autograd/functions/convolution.cpp has been moved into aten/src/ATen/native/Convolution.cpp
- We continue to make use of ConvParams, but we now construct the parameters upon entry to a function from the function signature (which does not use ConvParams; having convolution take ConvParams directly would require teaching the code generator how to accept these as parameters, complicating ATen's API model) and destruct them when making subprocedure calls.
- I introduce a new idiom, input_r, which represents a const Tensor& reference, which will subsequently be assigned to a local Tensor input. This is helpful because a lot of the existing algorithms relied on being able to assign to locals, which is not permitted with a const reference.
- The native argument parser now supports std::array<bool,2> inputs (NB: there MUST NOT be a space; this is the same hack as is applied to derivatives.yaml)
- The native parser now supports Tensor? arguments, which indicates a nullable tensor. Previously this was only used by NN methods.
- Documentation updates on the THNN library
- I added an extra fgradInput argument to VolumetricConvolutionMM_updateOutput and VolumetricConvolutionMM_accGradParameters so that its buffer list lines up with the backward argument list. This makes it possible to write a derivative for conv3d, which previously was not supported (commented out in derivatives.yaml)
- Extra double_backward declarations for all convolution backwards functions were added.
- You can now use the syntax Tensor? in native_functions.yaml to indicate that a tensor argument is nullable. There are adjustments to propagate this to the Python argument parser.
- NNPACK was ported to ATen, and ATen now builds and links against NNPACK if possible. New AT_NNPACK_ENABLED macro. The nnpack functions are nnpack_spatial_convolution.
- Some modest CuDNN convolution refactoring to remove _forward from names.
- There's a new cudnn_convolution_backward function to deal with the fact that CuDNN convolution double backward requires you to have computed all gradients in one go.
- Variable set_flags now checks if the tensor is undefined, fixing a silent memory corruption.
- checkSameType updated to not raise an exception if called with Variable arguments
- The "no ATen declaration found for" error message is improved to say what the available declarations are
- make_variable now accepts undefined tensors, and returns an undefined tensor in this case.
2017-12-20 | Add build support for Python 2.7 using MSVC (#4226) | peterjc123 | 1 | -0/+4
2017-12-18 | Replace Variable.volatile with torch.no_grad() (#3970) | Sam Gross | 1 | -0/+1
This removes volatile from Variable. The functionality is mostly replaced by a global (thread-local) flag, which is controlled by torch.set_grad_enabled() and the context manager torch.no_grad(). In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled(). Fixes #3627
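A brief sketch of the replacement API described above, using the Variable wrapper of that era (values arbitrary):

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)

with torch.no_grad():          # replaces volatile=True
    y = x * 2
print(y.requires_grad)         # False: no graph was recorded

torch.set_grad_enabled(False)  # global (thread-local) switch
z = x * 2
print(z.requires_grad)         # False
torch.set_grad_enabled(True)
```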
2017-12-18 | Enable ext build for Windows (#3935) | peterjc123 | 1 | -1/+1
* Enable ext build for Windows
* Include the static libs to make the compiling of the extension easier
2017-12-18 | Implement Variable.cuda and Variable.type using ATen (#4139) | Sam Gross | 1 | -0/+1
* Implement Variable.cuda using ATen

This adds an optional async flag to Tensor::copy_, which attempts to do a non-blocking copy if one of the tensors is in pinned memory and the other is a CUDA tensor.

* Perform cross-device copy in CopyBackwards

Also call torch.cuda._lazy_init() from Variable.cuda()

* Implement Variable.type via ATen
* Changes from review:
- remove copy_out
- remove unnecessary include
- fix default device for .cuda()
* Combine if statements in dispatch_type
2017-12-15 | Trace ATen native functions as themselves, not their implementations. (#4127) | Edward Z. Yang | 1 | -0/+1
* Trace ATen non-primitive functions as themselves, not their implementations.

Previously, if I invoked an ATen non-primitive function foo, which in turn called subfoo, I would always see 'subfoo' in the trace (e.g., tracing 'inlines' all of these operations.) Such inlining is bad for ONNX (and can be bad for optimization) as it prevents high-level optimizations from taking advantage of the structure. It might be right to inline, but give the optimizer a chance to work before inlining happens!

The implementation here is surprisingly simple, because it uses the "DCE trick". Essentially, it doesn't matter if the constituent calls perform tracing, because you can always trace it again, and override the trace nodes associated with the returned variables. The original trace becomes dead and can be DCE'd.

While implementing this, I also refactored how 'isTracing' and 'trace_outputs' work:
- isTracing was previously a single function with overloads for both Tensor and Variable arguments. Unfortunately, such overloads are not safe, because of how C++ implicit conversions work. You would think that C++ should never confuse an overload for Variable with ArrayRef<Tensor>, but this is exactly what can happen: Tensor is convertible to both Variable and ArrayRef<Tensor>, thus it's ambiguous and C++ doesn't like it. The last time I ran into this problem, I applied initializer lists to everything and called it a day. A more robust fix is to separate out the Variable and Tensor overloads, which I have done in this patch.
- trace_outputs was fed as an initializer list, which doesn't work when you have heterogeneous inputs. So instead we first feed everything through 'flatten', which has overloads for each of the argument patterns in ATen, which then goes on to recordTrace (which takes an ArrayRef). This is *no less efficient*, because we were allocating a vector anyway (to do the conversion from vector of Tensor to vector of Variable).

These fixes mean that 'index' can properly be traced... although the JIT still does not support it. A failing test case has been added to this effect.

Some knock-on effects:
- The fuser now knows about chunk as well as split. They're pretty similar so there is no problem.
- There is a new 'canonicalize' pass in the JIT which renumbers a graph so that all structurally equivalent graphs render the same.
- We run DCE before the fuser tests, to make sure dead nodes don't block fusion.
- There are new ONNX exports for the newly introduced higher level ATen operations. This includes type_as (no-op case only), chunk, select.

Zach didn't like the extra use of 'native' in the new codegen, so we've introduced a new concept, 'abstract'. An abstract function is one that is implemented in derived types (e.g., CPUDoubleType), whereas a concrete one is implemented in the base type (Type).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-15 | Fix issues with Windows 7 & 10 CPU build (#4065) | Will Feng | 1 | -6/+7
2017-12-11 | Implement Variable.new (#4080) | Sam Gross | 1 | -0/+1
2017-12-07 | Implement apply_, map_, and map2_ in Variable (#4057) | Sam Gross | 1 | -0/+1
2017-12-06 | Implement Variable.from_numpy (#4043) | Sam Gross | 1 | -1/+1
Implements from_numpy using ATen tensors. Variable.from_numpy is a convenient placeholder for the variant that returns Variables until we merge Tensor and Variable. The behavior is slightly changed:
- from_numpy() on an empty array now returns an empty tensor instead of throwing an exception. The shape may not be preserved.
- CharTensor(ndarray) used to throw an exception. It now copies the ndarray.

Copying is implemented via ATen toType.
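For reference, a small sketch of from_numpy semantics with today's torch API (values arbitrary): the returned tensor shares memory with the source ndarray rather than copying it.

```python
import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
t = torch.from_numpy(a)   # shares memory with `a`, no copy
a[0] = 42.0
print(t[0])               # tensor(42.) -- the change is visible through t
```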
2017-12-06 | Implement Variable.tolist() (#4038) | Sam Gross | 1 | -0/+1
Tensor.tolist() now dispatches through Variable.tolist() so that we only have one code path to test until we merge Variable and Tensor.
2017-12-05 | Implement Variable.numpy() (#4006) | Sam Gross | 1 | -0/+1
Implement Variable.numpy() and dispatch Tensor.numpy() through Variable.numpy(). Variable.numpy() is disallowed on variables that require grad.
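A quick sketch of the restriction described above, in today's API (values arbitrary): tensors that require grad must be detached before conversion.

```python
import torch

x = torch.randn(3, requires_grad=True)
try:
    x.numpy()                 # disallowed while grad is required
except RuntimeError as err:
    print("raised:", err)

print(x.detach().numpy())     # detach first, then convert
```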
2017-12-04 | Use ninja as the cmake backend as well. | Zachary DeVito | 1 | -0/+6
2017-12-04 | Add a CPU Fuser (single core) | Zach DeVito | 1 | -1/+1
This adds a simple fusion backend for the CPU.
* Refactors CompiledFusionFunction to have two subclasses that handle the compilation details of each backend.
* emit-compile-link-run cycle for the CPU
* simple single core loop to run the operation
* lift CUDA-only restrictions in the fuser, checks that fusion groups are only on a single backend.
2017-12-03 | Fix warnings and add alert to enable ninja when developing. | Zach DeVito | 1 | -0/+4
2017-11-30 | CuDNN bindings rewrite (into ATen) (#3666) | Edward Z. Yang | 1 | -11/+1
* Comprehensive rewrite of Torch CuDNN bindings / a bit of ATen infra

The executive summary is that this moves the torch/csrc/cudnn library into ATen, adding a number of new cudnn_ methods to ATen for batchnorm, convolution, affine grid generator and grid sampler.

ATen infra changes:
- TensorGeometry was moved to ATen
- TensorGeometry was modified to make its interface resemble that of Tensor; in particular, sizes is no longer a field, it's a method.
- AT_CUDA_ENABLED macro is set via ATen/Config.h header which is generated at cmake configure time. Fixes https://github.com/zdevito/ATen/issues/168
- Change AT_CUDA_ENABLED macro to be a function macro, so that we error if it is not defined
- Introduce a new TensorArg class, which is a Tensor plus a little metadata. This helps us give good error messages when checking dimensions/shapes of tensors. Fixes https://github.com/zdevito/ATen/issues/169
- Also introduce a TensorGeometryArg class, for when you don't need the actual tensor data (which is most of the time.)
- Add ATen/Check.h, which contains a number of utility functions for testing shapes, types and devices of input tensors. This will be particularly useful for native methods, which don't get code-generated input testing code. These functions take a 'CheckedFrom' argument, at the moment just a string, which specifies some extra information about what function was doing the actual checking; this greatly improves error messages.
- Many check functions take initializer lists, which let you test that all tensors have some property. This API is peculiar, in that we IGNORE undefined tensors in this case. This is handled by filterDefined.
- Add AT_CUDNN_ENABLED macro
- CuDNN linking from ATen was improved; for example, we now actually add the CuDNN headers to our include path.
- Add some missing override specifiers to some methods
- We now actually build tests with CUDA functionality accessible (previously, AT_CUDA_ENABLED was not defined, meaning that the headers were missing all CUDA-only functionality.)
- Native functions now support giving explicit names to return outputs in yaml. This makes it possible to hook into the NN autogenerated derivatives codepath using native functions.

CuDNN rewrite changes:
- torch/csrc/cudnn now uses ATen (rather than passing around THVoidTensor) and lives in ATen. This lets us remove tensorPointer shenanigans. The functions are exposed to ATen as native functions described in aten/src/ATen/cudnn/cuDNN.yaml
- ATen now builds and links against CuDNN when enabled. The cmake package script was taken from Caffe2.
- Some header reorganization was done to help reduce dependencies on headers (this reorg is no longer used but I've kept it)
- Rename CHECK to CUDNN_CHECK
- Rip out old shape/type testing code in favor of the modern ATen/Check.h interface using TensorArg. In many cases, increase the robustness of the checking code.
- Change the inputs of the public facing functions, so that they can be bound by ATen
- Delete THCState*; this is retrieved from the global ATen context
- Delete cudnnHandle_t, this is retrieved from the global Handles.h
- Delete cudnnDataType_t, this is retrieved from the Tensor type
- Delete Convolution class, instead its constituent arguments are passed individually
- Change functions to return tensors, rather than take an appropriately sized output tensor as an input.
- Redo how transposed convolution / backward convolution is implemented (knock-on effect of returning tensors). Previously it was assumed that you would always pass an appropriately sized output tensor, but we don't want to do this anymore. For backwards, we instead give the desired output tensor (input, really) size, because that is readily available. For *transposed* convolution, however, we take output_padding, and otherwise do the shape calculation.
- Redo how legacy group convolution is implemented (knock-on effect from porting cudnn to ATen.) Previously, group convolution was implemented by manually constructing sizes and strides and then outputting appropriately, with macros switching between individual groups and all-at-once based on CuDNN version. Now, the code looks exactly like what you'd expect: there's a top-level wrapping function that supports group convolution no matter the version of CuDNN, and a low-level wrapper which supports only what CuDNN supports. The top-level function conditions on CuDNN version, and invokes the low-level interface 1 or n times.
- There is now a debugging printer for tensor descriptors.
- Convolution struct is replaced with ConvolutionArgs, which is not part of the public API but is used internally to conveniently pass around all of the arguments needed for Convolution.
- Add some constexprs for well-known dimensions, reduce amount of magic numbers in code.
- Put 'deterministic' into ConvParams. Fixes #3659
- Lots more comments.
- Some pessimizations, in the name of code clarity:
  - The descriptors are initialized on every invocation of convolution forward/backward. Previously, the descriptors were cached, so that you didn't have to initialize them again on backwards. This is difficult to support in the ATen interface so I didn't support it.
  - Legacy group convolution initializes its workspace for *every* group it performs. I did not feel motivated to fix this because the legacy codepath is already quite slow.
- Affine grid generator and grid sampler automatically call contiguous on their arguments as necessary.
- Batchnorm input checking is greatly beefed up; it now checks for the following input characteristics:
  - Definedness
  - GPU location
  - Type
  - Contiguity
  - Size

PyTorch binding code changes:
- batchnorm now uses consistent var/data naming
- batchnorm and convolution make use of new ATen bindings
- Affine grid generator and grid sampler make use of ATen CuDNN bindings via derivatives.yaml. This means I had to restructure the code a little, since the THNN bindings still go through a legacy Python class.
- I fixed some warnings:
  - s/friend class/friend struct/ on InterpreterStateImpl
  - Removed pessimizing move 'detached' in torch/csrc/autograd/variable.cpp
  - Removed unused pack_list on Scalar

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Additional fixups included in this change:
- GCC 4.8 buildfix
- Add TensorGeometry to ATen.h
- CUDNN_CHECK
- Update TODO comment
- Delete return in cudnn_grid_sampler
- s/cudnnSetStreamToCurrent/setCuDNNStreamToCurrent/g
- Don't allocate a new vector when filtering defined.
- Remove Check overloads, convert to pass references.
- Some more microbenchmarking.
2017-11-30 | Add support to emit compile_commands.json from CMake/ninja files. | Zachary DeVito | 1 | -0/+15
2017-11-30 | Significantly speed up the incremental build. | Zachary DeVito | 1 | -79/+64
This commit adds code to setup.py to use ninja to manage C++ and code generator dependencies rather than use raw setuptools. This is based on similar code added to ONNX. Enabled optionally when ninja is installed. On my computer speed for a do-nothing build drops from 10s to 1.5 seconds. Speed of other compilation steps is significantly improved as well. Dependencies are tracked correctly so the need for ccache is reduced.
2017-11-29 | Add interpreter support for Handles/PythonOp/CppOp (#3866) | Zachary DeVito | 1 | -1/+1
* Add interpreter support for Handles/PythonOp/CppOp

This treats Handles as a first-class type in the interpreter since this turned out to be conceptually simpler than treating them as a separate concept, which requires a second channel for register allocating and moving data from one op to the next.

Notes:
* The refcounting nature of tensors is factored into its own base type so that it can be shared with other refcounted types such as handle.
* Some methods redundant with TensorBase have been deleted from Tensor
* The interpreter uses raw refcounted handles. In addition to being able to treat Tensors and Handles as the same base object, it removes a lot of redundant refcounting as objects move from tensors to input/output lists.
* aten_dispatch has been updated to work directly on the raw refcounted lists to avoid refcounting and duplicate lists.
* Removing jit_closure.cpp. The interpreter can now handle all pathways.
* Functions like `unsafeToTensorShare` describe how ownership transfers in the interpreter. The `Steal` variants take rvalue references as arguments, and invalidate those arguments to prevent potential problems.
* Make TensorTemporary not a subtype relationship, because it is too easy to do something horribly unsafe:

```
void foo(at::Tensor bar) {
  // bar destructor calls release on a temporary!
}
foo(TensorTemporary(retainable)); // structure slicing!
```
2017-11-21 | Implement indexing in ATen (#3725) | Sam Gross | 1 | -0/+1
Implements basic and advanced indexing using ATen tensors/variables. Basic indexing is translated at the Python-binding level (python_variable_indexing.cpp) to slice/squeeze/unsqueeze/select calls. Advanced indexing is implemented in ATen in terms of take() and put() calls.
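To illustrate the distinction drawn above (tensor values arbitrary, using today's API): basic indexing uses slices and integers and is decomposed into view operations, while advanced indexing uses index tensors or boolean masks and gathers elements into a copy.

```python
import torch

t = torch.arange(12).reshape(3, 4)

# Basic indexing: translated into select/slice calls, returns a view
row = t[1]          # select
block = t[:, 1:3]   # slice

# Advanced indexing: gathers elements, returns a copy
picked = t[torch.tensor([0, 2]), torch.tensor([1, 3])]  # elements (0,1) and (2,3)
masked = t[t > 5]                                        # boolean mask
```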
2017-11-20 | Reflect renaming of OS X to macOS (#3795) | Scott Stevenson | 1 | -1/+1
2017-11-19 | Correct JIT interpreter autograd function (#3760) | Adam Paszke | 1 | -0/+1
2017-11-16 | Add cudaEvent support to the profiler (#3734) | Zachary DeVito | 1 | -1/+1
* Add cudaEvent support to the profiler

This adds the ability to record CUDA timings using cudaEventRecord in the profiler. Since it doesn't require nvprof it is easier to run than the nvprof path. This also records a thread id for each event, which will make tracing results easier to understand.

* Add flow arrows from cpu to cuda event
* Fix no cuda build
* Review comments
* Move CUDA checks to one place
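A short sketch of how the CUDA-timing path is typically enabled from Python (this uses the later public profiler flag and assumes a CUDA device is available; it is an illustration of the feature, not necessarily the exact interface this commit added):

```python
import torch

x = torch.randn(1024, 1024, device="cuda")
with torch.autograd.profiler.profile(use_cuda=True) as prof:  # also record CUDA events
    y = x @ x
print(prof.key_averages().table(sort_by="cuda_time_total"))
```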
2017-11-15 | fix OSX cuda build (#3722) | Soumith Chintala | 1 | -1/+5
2017-11-13 | Add a JIT interpreter (#3634) | Zachary DeVito | 1 | -0/+1
* Add a JIT interpreter

The separate interpreter is used to run graphs with a lower overhead than converting them to autograd graphs. Some notes:
* does not support Handles/PythonOp/CppOp, these will be in a future commit
* jit_closure.cpp still exists and we fall back to it for now when we cannot handle something because of PythonOp/CppOp
* In order to support retain_graph=True, the interpreter can be cloned, creating a copy that can be run with different arguments. This is assumed to be the non-standard case so cloning is not particularly optimized. No tensor _data_ is copied, but the at::Tensor list in the interpreter is. If we hit problems, there is a lot we could do (such as register allocation) to minimize the stuff that needs to be copied.
* Uses a pImpl pattern to keep implementation details out of its header file.
* Modifies the way getTensorOp works so that it reads/writes to already-existing vectors; this prevents needing to realloc these buffers each time.
* Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127 This reduces overhead to about the same as running it in python. It is about 10us faster to run the same thing using ATen directly.

* Code Mod: Interpreter -> InterpreterState, Function -> Code. Add other requested comments.

* RegList -> ListHandle<T>
Change the RegList functions to be safer by identifying the type of each argument list, and checking that list insert does not try to add to two different lists at once.

* Use exactly equal for interp tests
2017-11-11 | Bump version in master (#3605) | Sam Gross | 1 | -1/+1
2017-11-11 | Fix setup scripts for Windows CUDA builds | peter | 1 | -1/+4