path: root/.gitmodules
Age  Commit message  [Author; files changed, lines -/+]
2018-04-03  Expunge ATen submodule; use the in-tree copy. (#6235)  [Edward Z. Yang; 1 file, -3/+0]
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-30  Merge caffe2 with pytorch.  [Edward Z. Yang; 1 file, -9/+72]
2018-03-27  Reorganize third-party libraries into top-level third_party directory (#6025)  [Edward Z. Yang; 1 file, -3/+3]
- gloo, pybind11, nanopb and nccl now live in third_party.
- ATen builds in aten/build rather than torch/lib/build/aten.
- A bit of faffing about in the scripts was necessary, because they used to assume that everything lived in the same directory. Now you are expected to cd into the correct directory before calling one of the build functions. The actual builder script lives in tools.
- Lint now just unconditionally ignores third_party, rather than enumerating folders explicitly.
2018-03-19  Port ATen and JIT C++ tests to Catch2 (#5788)  [Luca Antiga; 1 file, -0/+3]
This PR addresses #5648. In particular, following the discussion there, it:
- adds Catch2 as a submodule (https://github.com/catchorg/Catch2) in torch/aten/utils
- ports all ATen tests to Catch2
- ports torch/csrc/jit/test_jit.cpp to Catch2 (libtorch only; the Python build is unaffected)
2018-03-15  ATen ReduceOps (#5776)  [cpuhrsch; 1 file, -0/+7]
#5481 was reverted due to a strange test bug; this PR attempts to fix that.

This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents types of 256-bit width; instances can then be treated like regular variables. Using those, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code path depending on the host's capabilities.

The kernels are implemented under native/cpu. Each .cpp file is compiled with -avx, -avx2, and no additional flags; a macro appends AVX, AVX2, or NONE to the function name, and the header then needs to define the functions three times, once per capability. This could be improved by changing the cmake file a bit, or possibly by generating source code with a Python script. For the non-contiguous case this defaults to the current implementation within TH. For CUDA it entirely defaults to the implementation within THC.

There probably needs to be a bit of a debate around the design decisions here: the additional dependencies, parallelization strategy, clarity, etc. The numerical results also diverge from NumPy with larger tensors, which is expected, since we sum, for example, 8 numbers at a time and add the result to the running sum, instead of adding each number one by one. But there might be something to be said about accumulating into a double for floats, the degree of divergence, the behavior with respect to CUDA, etc.

I wrote a [small Python script](https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with NumPy numerically as well as on timing, and ran it to create timings on both master and this branch.

Command for 1 core: `OMP_NUM_THREADS=1 taskset -c 0 python sum_bench.py --enable_numpy 200`
Command for all cores: `python sum_bench.py --enable_numpy 200`

Results: [Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ), [This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w), [Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw), [This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)

To test, the command is `python sum_bench.py --test 200`: [This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw). For this test we look at the average absolute value of the differences; this does not take into account the relative magnitude of the numbers, which are sampled from a standard normal distribution.

In terms of performance, this diff should bring PyTorch on par with NumPy and usually exceed it by 1.5 to 2x.
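The divergence from NumPy described above comes purely from accumulation order. A minimal pure-Python sketch of the effect (the function names and the lane-per-slot accumulation scheme here are illustrative assumptions, not the actual ATen kernel):

```python
def sequential_sum(xs):
    """Strict left-to-right accumulation, like the scalar fallback path."""
    total = 0.0
    for x in xs:
        total += x
    return total

def chunked_sum(xs, width=8):
    """Keep `width` running partial sums (one per SIMD lane) and combine
    them at the end, mimicking a vec256-style reduction."""
    lanes = [0.0] * width
    for i, x in enumerate(xs):
        lanes[i % width] += x
    return sequential_sum(lanes)

xs = [0.1] * 1_000_000
a = sequential_sum(xs)
b = chunked_sum(xs)
# Both approximate 100000.0, but the rounding error differs because the
# additions happen in a different order; with larger inputs the two
# results drift apart, which is the divergence the commit describes.
```

Neither order is "wrong": both are valid floating-point sums, which is why the commit compares average absolute differences against NumPy rather than expecting bit-exact equality.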
2018-03-13  Revert "ATen ReduceOps (#5481)" (#5765)  [Edward Z. Yang; 1 file, -7/+0]
* Revert "ATen ReduceOps (#5481)". This reverts commit 310c3735b9eb97f30cee743b773e5bb054989edc.
* Revert "Check that new cpuinfo and tbb submodules exist (#5714)". This reverts commit 1a23c9901dbfee295bf5b3dad36e4d3ee7e86366.
2018-03-12  ATen ReduceOps (#5481)  [cpuhrsch; 1 file, -0/+7]
This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents types of 256-bit width; instances can then be treated like regular variables. Using those, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code path depending on the host's capabilities.

The kernels are implemented under native/cpu. Each .cpp file is compiled with -avx, -avx2, and no additional flags; a macro appends AVX, AVX2, or NONE to the function name, and the header then needs to define the functions three times, once per capability. This could be improved by changing the cmake file a bit, or possibly by generating source code with a Python script. For the non-contiguous case this defaults to the current implementation within TH. For CUDA it entirely defaults to the implementation within THC.

There probably needs to be a bit of a debate around the design decisions here: the additional dependencies, parallelization strategy, clarity, etc. The numerical results also diverge from NumPy with larger tensors, which is expected, since we sum, for example, 8 numbers at a time and add the result to the running sum, instead of adding each number one by one. But there might be something to be said about accumulating into a double for floats, the degree of divergence, the behavior with respect to CUDA, etc.

I wrote a [small Python script](https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with NumPy numerically as well as on timing, and ran it to create timings on both master and this branch.

Command for 1 core: `OMP_NUM_THREADS=1 taskset -c 0 python sum_bench.py --enable_numpy 200`
Command for all cores: `python sum_bench.py --enable_numpy 200`

Results: [Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ), [This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w), [Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw), [This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)

To test, the command is `python sum_bench.py --test 200`: [This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw). For this test we look at the average absolute value of the differences; this does not take into account the relative magnitude of the numbers, which are sampled from a standard normal distribution.

In terms of performance, this diff should bring PyTorch on par with NumPy and usually exceed it by 1.5 to 2x.
2018-02-23  Caffe2 ARM Compute Library integration (#2015)  [Jerry Zhang; 1 file, -0/+3]
Caffe2 ARM Compute Library Integration
2018-02-21  Add onnx as a submodule (#1998)  [Yinghai Lu; 1 file, -0/+3]
2018-02-08  Vendor Python dependencies of NNPACK  [Marat Dukhan; 1 file, -0/+9]
Summary: Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from the submodules instead of downloading them at configuration time.
Closes https://github.com/caffe2/caffe2/pull/1917
Reviewed By: orionr
Differential Revision: D6938735
Pulled By: Maratyszcza
fbshipit-source-id: 841a6c47a1cd003a19f48f6c256aa4d9eb2cc6e4
2018-02-08  Back out "[caffe2][PR] Vendor Python dependencies of NNPACK"  [Marat Dukhan; 1 file, -9/+0]
Summary: Original commit changeset: d0c1c7681605. Reverting because that commit broke the OSS build.
Reviewed By: bddppq
Differential Revision: D6935666
fbshipit-source-id: 955cfeb6d5a4ed265b2e099094cfb5bfe960ff95
2018-02-07  Vendor Python dependencies of NNPACK  [Marat Dukhan; 1 file, -0/+9]
Summary: Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from the submodules instead of downloading them at configuration time.
Closes https://github.com/caffe2/caffe2/pull/1901
Differential Revision: D6930731
Pulled By: Maratyszcza
fbshipit-source-id: d0c1c7681605d957de6f51bd24fbb25afc0f282f
2018-01-31  Vendor NNPACK dependencies with Caffe2  [Marat Dukhan; 1 file, -4/+7]
2018-01-31  Remove android-cmake submodule  [Yangqing Jia; 1 file, -3/+0]
2017-11-13  Adding zstd to build  [Yangqing Jia; 1 file, -0/+3]
Summary: This is so that we can share the compression ops with OSS.
Closes https://github.com/caffe2/caffe2/pull/1463
Reviewed By: hlu1
Differential Revision: D6319101
Pulled By: Yangqing
fbshipit-source-id: 16c94e71fc3efe256054a648170aaf7702e5bcfe
2017-09-25  Add an aten_op to contrib.  [Zachary DeVito; 1 file, -0/+3]
Summary: This operator allows the use of Torch's underlying TH libraries (TH, THC, THNN, and THCUNN) through the ATen tensor library. Use of the operator is described in the README. The operator itself is generated from ATen's Declarations.yaml file, which describes its public API.
Closes https://github.com/caffe2/caffe2/pull/1235
Reviewed By: dzhulgakov
Differential Revision: D5876944
Pulled By: zdevito
fbshipit-source-id: b558e8563a5e82a0e6278705a4a359bd7df4e70a
2017-09-12  Remove references to cnmem  [Pieter Noordhuis; 1 file, -3/+0]
Summary: TSIA
Reviewed By: Yangqing
Differential Revision: D5815624
fbshipit-source-id: 1a6c0e471eac778aeac80001eac947178fc105ed
2017-09-05  Add nanopb submodule.  [Edward Z. Yang; 1 file, -0/+3]
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05  Add pybind11 as a submodule.  [Zach DeVito; 1 file, -0/+3]
2017-08-30  Add gloo submodule.  [Zach DeVito; 1 file, -0/+3]
We make gloo a submodule because it contains submodules itself, and Git cannot handle subtrees with nested submodules. Fixes https://github.com/pytorch/pytorch/issues/2426
2017-08-09  Add build support for opengl and latest nnpack.  [Yangqing Jia; 1 file, -1/+1]
Summary:
(1) Changed android-cmake to use Yangqing/android-cmake, which supports NEON fp16.
(2) Added cmake scripts to build opengl.
(3) Updated nnpack to master, and changed the corresponding build files.
Closes https://github.com/caffe2/caffe2/pull/1061
Differential Revision: D5591387
Pulled By: Yangqing
fbshipit-source-id: 1d3f28511d33c09df6ecef5041448ac9a3246601
2017-03-24  Add nnpack specific dependencies under third_party  [Yangqing Jia; 1 file, -0/+12]
2017-03-24  CMake support for Gloo dependency  [Pieter Noordhuis; 1 file, -0/+3]
Summary: This also requires a change to cmake/External/nccl.cmake to use the static NCCL binary instead of the shared object. When the Caffe2/Gloo build uses the bundled NCCL version, it should be packaged up in the resulting libraries and not cause another runtime dependency on a library that has to be installed separately.
Closes https://github.com/caffe2/caffe2/pull/218
Differential Revision: D4769926
Pulled By: pietern
fbshipit-source-id: 5c85559992c200d874f4218724823815ffb5adb5
2017-02-01  Add nnpack  [Yangqing Jia; 1 file, -0/+3]
2017-01-27  Add ios-cmake  [Yangqing Jia; 1 file, -0/+3]
2017-01-26  Cmake for android  [Yangqing Jia; 1 file, -0/+3]
Summary: Added a cmake-for-android script under scripts, and set up the travis contbuild target.
Closes https://github.com/caffe2/caffe2/pull/109
Reviewed By: bwasti
Differential Revision: D4468767
Pulled By: Yangqing
fbshipit-source-id: 709f3eb6be24727b0a989d0901dbf377871b122a
2017-01-04  add back third_party/protobuf, but it won't be used in normal builds.  [Yangqing Jia; 1 file, -0/+3]
Pinned protobuf to v3.1.0. Removed the USE_SYSTEM_PROTOBUF option in cmake; it is no longer used.
2017-01-04  Removed protobuf from third_party  [Bram Wasti; 1 file, -3/+0]
2016-12-27  added google/benchmark and tidied up CUDA build  [bwasti; 1 file, -0/+3]
2016-10-07  nervana build files  [Yangqing Jia; 1 file, -0/+3]
2016-09-16  move third_party/gtest to use git submodules. As a result the folder name is now googletest.  [Yangqing Jia; 1 file, -0/+3]
2016-09-16  move third_party/eigen3 to use git submodules. As a result the folder name is now eigen.  [Yangqing Jia; 1 file, -0/+3]
2016-09-16  move third_party/google/protobuf to use git submodules.  [Yangqing Jia; 1 file, -0/+3]
2016-08-04  add submodule cub  [Yangqing Jia; 1 file, -0/+3]
2016-08-02  more build updates:  [Yangqing Jia; 1 file, -0/+6]
(1) nccl submodule, cnmem submodule
(2) mpi ops fallback test
(3) a bit more blob interface
(4) fixed tests
(5) caffe2.python.io -> caffe2.python.dataio to avoid name conflicts
(6) in the build system, autogen __init__.py instead of having manual rules just to copy over an empty __init__.py
2016-07-21  changes to make c2 build.  [Yangqing Jia; 1 file, -0/+3]