Signed-off-by: Edward Z. Yang <ezyang@fb.com>
|
- gloo, pybind11, nanopb and nccl now live in third_party.
- ATen builds in aten/build rather than torch/lib/build/aten
- A bit of faffing about in the scripts was necessary, because they used to assume that everything lived in the same directory. Now you are expected to cd into the correct directory before calling one of the build functions. The actual builder script lives in tools.
- Lint now just unconditionally ignores third_party, rather than enumerating folders explicitly
|
This PR addresses #5648. In particular, following the discussion there:
- it adds Catch as a submodule (https://github.com/catchorg/Catch2) in torch/aten/utils
- it ports all ATen tests to Catch
- it ports torch/csrc/jit/test_jit.cpp to Catch (libtorch only, Python build is unaffected)
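For reference, a test in Catch's style looks like this (a generic sketch, not one of the actual ported tests):

```cpp
// Defining CATCH_CONFIG_MAIN in exactly one translation unit makes Catch
// supply main(), so a test binary needs no extra scaffolding.
#define CATCH_CONFIG_MAIN
#include "catch.hpp"

#include <vector>

TEST_CASE("vector grows when pushed to", "[example]") {
  std::vector<int> v;
  v.push_back(1);
  REQUIRE(v.size() == 1);

  // SECTIONs re-run the enclosing TEST_CASE from the top for each branch.
  SECTION("clearing empties it") {
    v.clear();
    REQUIRE(v.empty());
  }
}
```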
|
#5481 was reverted due to a strange test bug. This PR attempts to fix that.
This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents 256-bit wide types, which can then be treated like regular variables. Using these, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and it chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities.
The kernels are implemented under native/cpu. Each .cpp file is compiled three times: with -avx, with -avx2, and with no additional flags. A macro is used to append AVX, AVX2 or NONE to the function name, so the header needs to declare each function three times, once per capability. This could be improved by tweaking the CMake file a bit, or possibly by generating the source with a Python script, etc.
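As a rough sketch of both ideas (the names and signatures below are hypothetical, not the actual ATen declarations):

```cpp
#include <immintrin.h>
#include <cstdint>

// A vec256-style wrapper: a 256-bit register that can be treated like a
// regular variable (illustrative only).
struct Vec256f {
  __m256 values;
  static Vec256f loadu(const float* ptr) { return {_mm256_loadu_ps(ptr)}; }
  Vec256f operator+(const Vec256f& other) const {
    return {_mm256_add_ps(values, other.values)};
  }
};

// The capability-suffix scheme: the same source is compiled three times and
// token pasting produces e.g. sumImplAVX2, sumImplAVX, sumImplNONE, one of
// which is picked at runtime based on what cpuinfo reports.
#define VEC_PASTE_(name, cap) name##cap
#define VEC_CAPABILITY_NAME(name, cap) VEC_PASTE_(name, cap)
// float VEC_CAPABILITY_NAME(sumImpl, CPU_CAPABILITY)(const float* data, int64_t n);
```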
For the non-contiguous case this defaults to the current implementation within TH. For CUDA it entirely defaults to the implementation within THC.
There probably needs to be a bit of a debate around the design decisions here: the additional dependencies, the parallelization strategy, clarity, etc. The numerical results also diverge from NumPy with larger tensors, which is expected since we're summing, for example, 8 numbers at a time and then adding each partial result to the running sum, instead of adding each number one by one. But there might be something to be said about accumulating into a double for floats, the acceptable degree of divergence, the behavior with respect to CUDA, etc.
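To see why the accumulation order matters, here is a small standalone demonstration (not part of this diff) of blocked versus strictly sequential float summation:

```cpp
#include <cstdio>
#include <vector>

int main() {
  // Float addition is not associative, so summing in 8-wide blocks (as a
  // vectorized kernel effectively does) can differ from summing one element
  // at a time. 0.1f is not exactly representable, which makes this visible.
  std::vector<float> v(1 << 20, 0.1f);

  float seq = 0.0f;
  for (float x : v) seq += x;  // strictly sequential running sum

  float lanes[8] = {0};
  for (size_t i = 0; i < v.size(); i += 8)
    for (int j = 0; j < 8; ++j) lanes[j] += v[i + j];  // 8 running sums
  float blocked = 0.0f;
  for (float l : lanes) blocked += l;  // combine the partial sums

  // Both values differ from the exact sum (104857.6) and from each other.
  std::printf("sequential = %.2f\nblocked    = %.2f\n", seq, blocked);
}
```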
I wrote a [small Python script](https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with NumPy, both numerically and on timing. I ran this script to create timings on both master and this branch.
Here is the command for 1 core:
`OMP_NUM_THREADS=1 taskset -c 0 python sum_bench.py --enable_numpy 200`
Here is the command for all cores:
`python sum_bench.py --enable_numpy 200`
Here are the results of each:
[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)
[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)
[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)
[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)
To test, the command is:
`python sum_bench.py --test 200`
[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)
For this test we look at the average absolute value of the differences, which does not take the relative magnitude of the numbers into account. The numbers are sampled from a standard normal distribution.
In terms of performance, this diff should bring PyTorch on par with NumPy and usually exceed it by 1.5 to 2x.
|
* Revert "ATen ReduceOps (#5481)"
This reverts commit 310c3735b9eb97f30cee743b773e5bb054989edc.
* Revert "Check that new cpuinfo and tbb submodules exist (#5714)"
This reverts commit 1a23c9901dbfee295bf5b3dad36e4d3ee7e86366.
|
This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents 256-bit wide types, which can then be treated like regular variables. Using these, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and it chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities.
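A minimal sketch of the chunked, work-stealing reduction (the grain size and function name here are illustrative, not the actual kernel):

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>
#include <cstdint>

// Illustrative stand-in for the experimentally chosen chunking value.
constexpr int64_t _THRESHOLD = 32768;

// Work-stealing parallel sum: TBB splits [0, n) into chunks no smaller than
// the grain size, sums each chunk, then combines the partial sums.
float parallel_sum(const float* data, int64_t n) {
  return tbb::parallel_reduce(
      tbb::blocked_range<int64_t>(0, n, _THRESHOLD),
      0.0f,
      [=](const tbb::blocked_range<int64_t>& r, float acc) {
        for (int64_t i = r.begin(); i != r.end(); ++i) acc += data[i];
        return acc;
      },
      [](float a, float b) { return a + b; });
}
```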
The kernels are implemented under native/cpu. Each .cpp file is compiled three times: with -avx, with -avx2, and with no additional flags. A macro is used to append AVX, AVX2 or NONE to the function name, so the header needs to declare each function three times, once per capability. This could be improved by tweaking the CMake file a bit, or possibly by generating the source with a Python script, etc.
For the non-contiguous case this defaults to the current implementation within TH. For CUDA it entirely defaults to the implementation within THC.
There probably needs to be a bit of a debate around the design decisions here: the additional dependencies, the parallelization strategy, clarity, etc. The numerical results also diverge from NumPy with larger tensors, which is expected since we're summing, for example, 8 numbers at a time and then adding each partial result to the running sum, instead of adding each number one by one. But there might be something to be said about accumulating into a double for floats, the acceptable degree of divergence, the behavior with respect to CUDA, etc.
I wrote a [small Python script](https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with NumPy, both numerically and on timing. I ran this script to create timings on both master and this branch.
Here is the command for 1 core:
`OMP_NUM_THREADS=1 taskset -c 0 python sum_bench.py --enable_numpy 200`
Here is the command for all cores:
`python sum_bench.py --enable_numpy 200`
Here are the results of each:
[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)
[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)
[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)
[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)
To test, the command is:
`python sum_bench.py --test 200`
[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)
For this test we look at the average absolute value of the differences, which does not take the relative magnitude of the numbers into account. The numbers are sampled from a standard normal distribution.
In terms of performance, this diff should bring PyTorch on par with NumPy and usually exceed it by 1.5 to 2x.
|
Caffe2 ARM Compute Library Integration
|
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from the submodules instead of downloading them at configuration time
Closes https://github.com/caffe2/caffe2/pull/1917
Reviewed By: orionr
Differential Revision: D6938735
Pulled By: Maratyszcza
fbshipit-source-id: 841a6c47a1cd003a19f48f6c256aa4d9eb2cc6e4
|
Summary:
Original commit changeset: d0c1c7681605
Reverting because this commit broke the OSS build
Reviewed By: bddppq
Differential Revision: D6935666
fbshipit-source-id: 955cfeb6d5a4ed265b2e099094cfb5bfe960ff95
|
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from the submodules instead of downloading them at configuration time
Closes https://github.com/caffe2/caffe2/pull/1901
Differential Revision: D6930731
Pulled By: Maratyszcza
fbshipit-source-id: d0c1c7681605d957de6f51bd24fbb25afc0f282f
|
Summary:
This is in order for us to share the compression ops with OSS.
Closes https://github.com/caffe2/caffe2/pull/1463
Reviewed By: hlu1
Differential Revision: D6319101
Pulled By: Yangqing
fbshipit-source-id: 16c94e71fc3efe256054a648170aaf7702e5bcfe
|
Summary:
This operator allows the use of Torch's underlying TH libraries (TH, THC, THNN, and THCUNN)
through the ATen tensor library. Use of the operator is described in the README.
The operator itself is generated from ATen's Declarations.yaml file, which describes ATen's public API.
Closes https://github.com/caffe2/caffe2/pull/1235
Reviewed By: dzhulgakov
Differential Revision: D5876944
Pulled By: zdevito
fbshipit-source-id: b558e8563a5e82a0e6278705a4a359bd7df4e70a
|
Summary: TSIA
Reviewed By: Yangqing
Differential Revision: D5815624
fbshipit-source-id: 1a6c0e471eac778aeac80001eac947178fc105ed
|
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
|
We make gloo a submodule because it contains submodules itself, and
Git cannot handle subtrees with nested submodules.
Fixes https://github.com/pytorch/pytorch/issues/2426
|
Summary:
(1) Changed android-cmake to use Yangqing/android-cmake, which supports NEON fp16.
(2) Added cmake scripts to build opengl.
(3) Updated nnpack to master, and changed the corresponding build files.
Closes https://github.com/caffe2/caffe2/pull/1061
Differential Revision: D5591387
Pulled By: Yangqing
fbshipit-source-id: 1d3f28511d33c09df6ecef5041448ac9a3246601
|
Summary:
This also requires a change to cmake/External/nccl.cmake to use the
static NCCL binary instead of the shared object. When the Caffe2/Gloo
build uses the bundled NCCL version, it should be packaged up in the
resulting libraries and not cause another runtime dependency on a
library that has to be installed separately.
Closes https://github.com/caffe2/caffe2/pull/218
Differential Revision: D4769926
Pulled By: pietern
fbshipit-source-id: 5c85559992c200d874f4218724823815ffb5adb5
|
Summary:
Added a CMake script for Android under scripts, and set up the Travis contbuild target.
Closes https://github.com/caffe2/caffe2/pull/109
Reviewed By: bwasti
Differential Revision: D4468767
Pulled By: Yangqing
fbshipit-source-id: 709f3eb6be24727b0a989d0901dbf377871b122a
|
Pinned protobuf to v3.1.0
Removed the USE_SYSTEM_PROTOBUF option in cmake. It is no longer used.
|
now googletest
|
is now eigen.
|
(1) nccl submodule, cnmem submodule
(2) mpi ops fallback test
(3) a bit more blob interface
(4) fixed tests
(5) caffe2.python.io -> caffe2.python.dataio to avoid name conflicts
(6) In the build system, autogenerate __init__.py instead of having manual
rules just to copy over an empty __init__.py.