|
Instead of checking just USE_XNNPACK, check for all NNPACKS
Change-Id: Idc566383322b0cad201e7d0e4c32b4da0c91c1ea
Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>
|
|
Update to version 1.6.0-rc1
Change-Id: I53e568f805ea7d787c7cc013ed6b858e031160f9
Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18612
optimize BatchMatmulOp
Reviewed By: houseroad
Differential Revision: D14681665
fbshipit-source-id: cf5ea4909ace58fd44fe6fa634531102ac84e851
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19531
Reviewed By: hlu1
Differential Revision: D15024541
fbshipit-source-id: cd8249a6d529afb65fa8afd74a05dbfe73eb1fb0
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19621
The comments for group_norm_op are not accurate (i.e., the math part); this diff fixes them.
Reviewed By: BIT-silence
Differential Revision: D15048695
fbshipit-source-id: 27d41d3ae21054257967815254134849944d56ca
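For reference, the standard GroupNorm normalization that such a comment describes (this is the textbook formula, not a quote of the updated comment): for sample n, channel c belonging to group g, and spatial index i,
```
y_{n,c,i} = \gamma_c \cdot \frac{x_{n,c,i} - \mu_{n,g}}{\sqrt{\sigma_{n,g}^2 + \epsilon}} + \beta_c
```
where \mu_{n,g} and \sigma_{n,g}^2 are the mean and variance over all channels and spatial positions in group g of sample n.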
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19087
att
Reviewed By: jackm321
Differential Revision: D14863112
fbshipit-source-id: 2680161b9f05391e73bb8dac4fbbeabb87a82c05
|
|
Summary:
Oftentimes, we want to experiment with per-element loss (per image, etc.). This changeset allows getting the per-element loss as well; this output is optional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19579
Reviewed By: jerryzh168
Differential Revision: D15035797
Pulled By: prigoyal
fbshipit-source-id: 562dea514f49c1f2f1cbbc083a1938dc019a75c4
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18499
If the init op is not fp16 compatible, it should throw.
However, in the special case where the original init op is UniformFill,
we replace it with Float16UniformFill
Reviewed By: kennyhorror
Differential Revision: D14627209
fbshipit-source-id: eb427772874a732ca8b3a25d06670d119ce8ac14
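A minimal sketch of the decision described above, with hypothetical names (the real change operates on Caffe2 OperatorDef protos rather than plain strings):
```
#include <stdexcept>
#include <string>

// Hypothetical helper illustrating the rule: keep fp16-compatible init ops,
// special-case UniformFill -> Float16UniformFill, and throw otherwise.
std::string Fp16InitOpFor(const std::string& init_op, bool fp16_compatible) {
  if (fp16_compatible) {
    return init_op;
  }
  if (init_op == "UniformFill") {
    return "Float16UniformFill";  // the special-cased replacement
  }
  throw std::runtime_error("init op '" + init_op + "' is not fp16 compatible");
}
```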
|
|
(#19568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19568
Previous import was 83dd62659fc07d5b7fa93b5d1c1879f93509c7db
Included changes:
- **[0e8d2bc5](https://github.com/onnx/onnx/commit/0e8d2bc5)**: [Minor need to be in 1.5]Fix an issue in NMS test data which introduce wrong shape. (#1953) <Hector Li>
- **[9346dd5d](https://github.com/onnx/onnx/commit/9346dd5d)**: adding modulus operator (#1874) <Jeff Saremi>
- **[414dbc73](https://github.com/onnx/onnx/commit/414dbc73)**: Fix shape inference for slice (#1950) <Hariharan Seshadri>
- **[6fb0775d](https://github.com/onnx/onnx/commit/6fb0775d)**: Fix shape inference for ConstantOfShape op (#1951) <Ashwini Khade>
Reviewed By: bddppq, zrphercule, benoitsteiner
Differential Revision: D15033070
fbshipit-source-id: f7eb90b142cbdc9bf1600cfd33e5a8df709045fb
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19442
For cases like CV, some ops like transpose and tile will mangle the batch size, so we don't know how to adjust the output batch size. In this case, the current solution is to just fix the input batch size statically and not adjust the output batch size.
Reviewed By: zrphercule
Differential Revision: D15007237
fbshipit-source-id: a21b943a52ee5462d9d7804dfae44360f579f8cf
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19359
Even with file IO exception handling, some of the sandcastle c2_ref_tests are still failing in the length-check assert, as can be seen here:
https://our.intern.facebook.com/intern/test/844424932589974?ref_report_id=0
This is an attempt to add printing logic to debug what's going on.
Reviewed By: dzhulgakov
Differential Revision: D14966274
fbshipit-source-id: adce6d4780d664c5ef59f9341b6133b0d09324cb
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19567
fix variable shadowing
Reviewed By: bddppq, wx1988
Differential Revision: D15032114
fbshipit-source-id: 895ea21f22b87db8c7c8684f54fa186d22f24d10
|
|
(#19454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19454
Previous import was ad7313470a9119d7e1afda7edf1d654497ee80ab
Included changes:
- **[83dd6265](https://github.com/onnx/onnx/commit/83dd6265)**: Add NonMaxSuppression operator (#1703) <Hector Li>
- **[31ca5d6f](https://github.com/onnx/onnx/commit/31ca5d6f)**: add node tests for quantized ops (#1944) <Ashwini Khade>
- **[e6076c1d](https://github.com/onnx/onnx/commit/e6076c1d)**: Fix test stat coverage script (#1948) <Raymond Yang>
- **[ad036405](https://github.com/onnx/onnx/commit/ad036405)**: Add IsInf to detect infinity values (#1884) <Wei-Sheng Chin>
Reviewed By: benoitsteiner
Differential Revision: D15010015
fbshipit-source-id: 4b29de21de60f8e6a2db75309809a4e619c92532
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18498
Reviewed By: kennyhorror
Differential Revision: D14626755
fbshipit-source-id: d8a0b3c02920ab3835911a21bf05e8956853fcd7
|
|
Summary:
In this PR, the fusion algorithms are improved to support DNNLOWP.
1. Enabled conv fusions for DNNLOWP
2. Fused the order switch op into the following quantize op
3. Improved conv+sum fusion to parse a larger scope/window
4. Reorganized the fusion code to fix a random crash issue caused by the graph changing during fusion
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18843
Differential Revision: D15021030
Pulled By: yinghai
fbshipit-source-id: 88d2199d9fc69f392de9bfbe1f291e0ebf78ab08
|
|
Summary:
There are two corrections in this pull request.
The first is specific to gcc 7.4.0. Compiled with -std=c++14, gcc 7.4.0 has __cplusplus = 201402L. This does not meet the check set in Deprecated.h, which asks for > 201402L. The compiler falls through to the __GNUC__ check, which passes and sets C10_DEPRECATED_MESSAGE to a value that C++14 does not appear to support or even recognize, leading to a compile-time error. My recommended solution, which worked for my case, was to relax the comparison to >=.
The second correction comes in response to this error:
caffe2/operators/crash_op.cc: In member function ‘virtual bool caffe2::CrashOp::RunOnDevice()’:
caffe2/operators/crash_op.cc:14:11: error: ‘SIGABRT’ was not declared in this scope
I am merely committing to the repository the solution suggested here (which worked for me):
https://discuss.pytorch.org/t/building-pytorch-from-source-in-conda-fails-in-pytorch-caffe2-operators-crash-op-cc/42859
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19470
Differential Revision: D15019529
Pulled By: ailzhang
fbshipit-source-id: 9ce9d713c860ee5fd4266e5c2a7f336a97d7a90d
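A minimal sketch of what the two fixes amount to (the surrounding Deprecated.h logic is abbreviated and the macro body shown is illustrative, not a verbatim copy):
```
// Fix 1: relax the standard-version check so gcc 7.4.0 with -std=c++14
// (__cplusplus == 201402L) takes the [[deprecated]] branch instead of
// falling through to the __GNUC__ branch.
#if __cplusplus >= 201402L  // was: > 201402L
#define C10_DEPRECATED_MESSAGE(message) [[deprecated(message)]]
#endif

// Fix 2: crash_op.cc uses SIGABRT, which needs the signal header.
#include <csignal>
```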
|
|
Summary:
This was actually getting pretty poor throughput with respect to memory bandwidth. I used this test to measure the memory bandwidth specifically for the AXPY call: https://gist.github.com/jamesr66a/b27ff9ecbe036eed5ec310c0a3cc53c5
And I got ~8 GB/s before this change, but ~14 GB/s after this change.
This seems to speed up the operator overall by around 1.3x (benchmark: https://gist.github.com/jamesr66a/c533817c334d0be432720ef5e54a4166):
== Before ==
time_per_iter 0.0001298875093460083
GB/s 3.082544287868467
== After ==
time_per_iter 0.00010104801654815674
GB/s 3.9623142905451076
The large difference between the local BW increase and the full-op BW increase likely indicates significant time is being spent elsewhere in the op, so I will investigate that.
EDIT: I updated this PR to include a call into caffe2/perfkernels. This is the progression:
Before
time_per_iter 8.983819484710693e-05
GB/s 4.456723564864611
After no axpy
time_per_iter 7.19951868057251e-05
GB/s 5.56126065872172
After perfkernels
time_per_iter 5.6699180603027346e-05
GB/s 7.061548257694262
After perfkernels no grad
time_per_iter 4.388842582702637e-05
GB/s 9.122769670026413
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19329
Reviewed By: dzhulgakov
Differential Revision: D14969630
Pulled By: jamesr66a
fbshipit-source-id: 42d1015772c87bedd119e33c0aa2c8105160a738
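For context, a self-contained sketch of how numbers in the time_per_iter / GB/s format above can be produced for an axpy-style loop; the buffer size, iteration count, and the 3-bytes-accessed-per-element traffic model are assumptions, not taken from the linked gists:
```
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
  const size_t n = 1 << 24;  // 16M floats per buffer (illustrative)
  const int iters = 100;
  std::vector<float> x(n, 1.0f), y(n, 2.0f);
  const float a = 0.5f;

  auto t0 = std::chrono::steady_clock::now();
  for (int it = 0; it < iters; ++it) {
    for (size_t i = 0; i < n; ++i) {
      y[i] += a * x[i];  // axpy-style kernel
    }
  }
  auto t1 = std::chrono::steady_clock::now();

  double sec = std::chrono::duration<double>(t1 - t0).count();
  double time_per_iter = sec / iters;
  // Per iteration: read x, read y, write y -> ~3 * n * sizeof(float) bytes.
  double gb_per_s = 3.0 * n * sizeof(float) * iters / sec / 1e9;
  std::printf("time_per_iter %g\nGB/s %g\n", time_per_iter, gb_per_s);
  return 0;
}
```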
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19451
Fix relu bug for empty tensor
Reviewed By: xianjiec
Differential Revision: D15009811
fbshipit-source-id: b75e567c3bec08d7d12b950d8f1380c50c138704
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19458
The algorithm in https://fburl.com/ggh9iyvc fails to really ensure topological ordering of nodes. The fix is ugly but effective. I think we need a real topological sort to fix this issue more nicely. cc Mikhail Zolotukhin, Bram Wasti.
Differential Revision: D15011893
fbshipit-source-id: 130c3aa442f5d578adfb14fbe5f16aa722434942
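Since the note above calls for a real topological sort, here is a generic Kahn's-algorithm sketch over an adjacency list; the integer node IDs stand in for NetDef ops, and this is illustrative rather than the code touched by the diff:
```
#include <queue>
#include <vector>

// Kahn's algorithm: returns a topological order of nodes 0..n-1, or an empty
// vector if the graph has a cycle. adj[u] lists the nodes that depend on u.
std::vector<int> TopoSort(const std::vector<std::vector<int>>& adj) {
  const int n = static_cast<int>(adj.size());
  std::vector<int> indegree(n, 0);
  for (const auto& outs : adj) {
    for (int v : outs) {
      ++indegree[v];
    }
  }
  std::queue<int> ready;
  for (int u = 0; u < n; ++u) {
    if (indegree[u] == 0) {
      ready.push(u);
    }
  }
  std::vector<int> order;
  while (!ready.empty()) {
    int u = ready.front();
    ready.pop();
    order.push_back(u);
    for (int v : adj[u]) {
      if (--indegree[v] == 0) {
        ready.push(v);
      }
    }
  }
  if (static_cast<int>(order.size()) != n) {
    order.clear();  // cycle detected
  }
  return order;
}
```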
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19388
The old implementation forced a refcount bump when converting at::Tensor to caffe2::Tensor.
Now, it is possible to move it without a refcount bump.
Reviewed By: dzhulgakov
Differential Revision: D14986815
fbshipit-source-id: 92b4b0a6f323ed38376ffad75f960cad250ecd9b
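A toy illustration of the general idea with stand-in types (the real conversion goes through caffe2/ATen internals, not shared_ptr): converting by copy bumps the reference count, while an rvalue overload can steal the implementation pointer.
```
#include <memory>
#include <utility>

struct ImplSketch {};  // stand-in for a ref-counted tensor implementation
struct ATenTensorSketch { std::shared_ptr<ImplSketch> impl; };

struct Caffe2TensorSketch {
  std::shared_ptr<ImplSketch> impl;
  // Copy conversion: refcount bump.
  explicit Caffe2TensorSketch(const ATenTensorSketch& t) : impl(t.impl) {}
  // Move conversion: takes over the pointer, no refcount traffic.
  explicit Caffe2TensorSketch(ATenTensorSketch&& t) : impl(std::move(t.impl)) {}
};
```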
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19287
Since we now have a string-schema-based op registration API, we can also use it when exposing caffe2 operators.
Reviewed By: dzhulgakov
Differential Revision: D14931925
fbshipit-source-id: ec162469d2d94965e8c99d431c801ae7c43849c8
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18497
Reviewed By: kennyhorror
Differential Revision: D14614738
fbshipit-source-id: beddd8349827dcc8ccae36f21e5d29627056afcd
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19362
`float` type is never used in OnnxifiOp....
Reviewed By: bddppq
Differential Revision: D14977970
fbshipit-source-id: 8fee02659dbe408e5a3e0ff95d74c04836c5c281
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19083
As we have discussed, there are too many AdjustBatch ops; they incur reallocation overhead and affect performance. We will eliminate these ops by
- inlining the input adjust-batch op into Glow
- inlining the output adjust-batch op into OnnxifiOp, and doing that only conditionally.
This is the C2 part of the change and requires a change on the Glow side to work e2e.
Reviewed By: rdzhabarov
Differential Revision: D14860582
fbshipit-source-id: ac2588b894bac25735babb62b1924acc559face6
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19328
Plans changed and we don't want this class anymore.
Reviewed By: dzhulgakov
Differential Revision: D14966746
fbshipit-source-id: 09ea4c95b352bc1a250834d32f35a94e401f2347
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19298
Proper testing for conv_bn_relu folding
Differential Revision: D13998891
fbshipit-source-id: ceb58ccec19885cbbf38964ee0d0db070e098b4a
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19009
Move the definition of `MulFunctor<>::Backward()` into a header file.
Reviewed By: BIT-silence
Differential Revision: D14823230
fbshipit-source-id: 1efaec01863fcc02dcbe7e788d376e72f8564501
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19170
As title
The quantized resnext3d model in production got the following failures without the fix:
```
Caffe2 operator Int8ConvRelu logging error: [enforce fail at conv_pool_op_base.h:463] order == StorageOrder::NCHW. 1 vs 2. Conv3D only supports NCHW on the production quantized model
```
Reviewed By: jspark1105
Differential Revision: D14894276
fbshipit-source-id: ef97772277f322ed45215e382c3b4a3702e47e59
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19118
A bug introduced by D14700576 and reported by Yufei (fixed by D14778810 and D14785256) was not detected by our unit tests.
This diff improves the unit tests to catch such errors (with this diff and without D14778810, we can reproduce the bug Yufei reported).
This improvement also revealed a bug that affects accuracy when we pre-pack weight and bias together and the pre-packed weight/bias are used by multiple nets: we were modifying the pre-packed bias in place, even though it was supposed to be constant.
Reviewed By: csummersea
Differential Revision: D14806077
fbshipit-source-id: aa9049c74b6ea98d21fbd097de306447a662a46d
|
|
Summary:
Fix the return value of ParseFromString.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19262
Differential Revision: D14937605
Pulled By: ezyang
fbshipit-source-id: 3f441086517186a075efb3d74f09160463b696b3
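A hedged sketch of the shape of the fix, using the generic protobuf API (the wrapper name is hypothetical): propagate ParseFromString's boolean result instead of discarding it.
```
#include <string>
#include <google/protobuf/message_lite.h>

// Hypothetical wrapper: report parse failures to the caller rather than
// silently claiming success.
bool ParseProtoFromString(const std::string& data,
                          google::protobuf::MessageLite* msg) {
  return msg->ParseFromString(data);
}
```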
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19176
Add some amenities for debugging.
Reviewed By: llyfacebook
Differential Revision: D14901740
fbshipit-source-id: 2c4018fdbf7e3aba2a754b6b4103a72893c229c2
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19090
Reviewed By: BIT-silence
Differential Revision: D14864737
fbshipit-source-id: 8debd53171f7068726f0ab777a13ca46becbfbdf
|
|
is_variable_ (#19139)
Summary:
Currently, a TensorImpl's `is_variable_` is true if and only if the TensorImpl has AutogradMeta. This PR unifies these two concepts by removing `is_variable_` and change `is_variable()` to check existence of AutogradMeta instead.
Removing `is_variable_` is part of the work in Variable/Tensor merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19139
Differential Revision: D14893339
Pulled By: yf225
fbshipit-source-id: ceb5e22c3c01f79b5d21d5bdbf4a7d1bc397796a
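In spirit, the unification makes is_variable() a query on AutogradMeta presence rather than a separately tracked flag; a stand-in sketch (not the actual TensorImpl code):
```
#include <memory>

struct AutogradMetaSketch {};  // stand-in for the real AutogradMeta

struct TensorImplSketch {
  std::unique_ptr<AutogradMetaSketch> autograd_meta_;
  // Previously backed by a separate is_variable_ flag; now derived directly
  // from whether AutogradMeta exists.
  bool is_variable() const {
    return autograd_meta_ != nullptr;
  }
};
```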
|
|
Summary:
It's not intended that Storages have 'default' CUDA devices, but this is allowable via the Storage::create_legacy codepath.
This also messes with device caching, because the initial cache is obtained from the Storage, which may have a 'default' device.
Instead, we materialize a device by allocating 0 bytes via the allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18605
Differential Revision: D14680620
Pulled By: gchanan
fbshipit-source-id: 6d43383d836e90beaf12bfe37c3f0506843f5432
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19154
I recently saw a weird workflow error due to an empty but set net_type. Maybe we should just fall back to the simple net in this case.
Reviewed By: dzhulgakov
Differential Revision: D14890072
fbshipit-source-id: 4e9edf8232298000713bebb0bfdec61e9c5df17d
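A minimal sketch of the fallback being proposed (helper name hypothetical):
```
#include <string>

// If net_type is set but empty, fall back to the simple net executor.
std::string ResolveNetType(const std::string& requested) {
  return requested.empty() ? "simple" : requested;
}
```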
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17236
Following the paper at https://papers.nips.cc/paper/7141-what-uncertainties-do-we-need-in-bayesian-deep-learning-for-computer-vision.pdf, approximate the classification case with the regression formulation. For the LRLoss, add a penalty based on the variance and a regularization term on the variance with a tunable parameter lambda.
Reviewed By: chocjy
Differential Revision: D14077106
fbshipit-source-id: 4405d8995cebdc7275a0dd07857d32a8915d78ef
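One common way to write the heteroscedastic-uncertainty objective from the referenced paper, shown here only as a sketch; the exact form used in this diff (in particular where lambda enters) is not spelled out in the summary:
```
\mathcal{L}(x, y) = \frac{1}{2\sigma(x)^2}\,\ell\bigl(y, f(x)\bigr) + \lambda \log \sigma(x)^2
```
where \ell is the base loss (the LRLoss here), \sigma(x)^2 is the predicted variance, and \lambda trades off the variance regularization.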
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18635
Optimize SoftmaxOp on CPU
Reviewed By: houseroad
Differential Revision: D14689516
fbshipit-source-id: d2dcee2476d1a3a21f428e99bce9835f1d229d64
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19059
Reviewed By: bwasti
Differential Revision: D14849735
fbshipit-source-id: fefd1887d38e51151c07a8b187e9c7c50ef02c6e
|
|
Summary:
Implement operators for DNNLOWP, including int8_conv, int8_FC, int8_pooling, int8_relu, int8_sum, quantize/dequantize, and order_switch operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18656
Differential Revision: D14767092
Pulled By: yinghai
fbshipit-source-id: 1f3e24929a358a42214da333bd304c593ea4468f
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19085
This is a bug where input_shapes_ and output_shapes_ will grow indefinitely. Fix it here.
Reviewed By: bertmaher, rdzhabarov
Differential Revision: D14861695
fbshipit-source-id: d59116f27c3b54f5cc5a33533de4b9222dbb7afc
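A stand-in sketch of the bug shape (member and method names hypothetical): per-run shape vectors were appended to without being cleared, so they grew across runs.
```
#include <cstdint>
#include <vector>

struct OnnxifiOpSketch {
  std::vector<std::vector<int64_t>> input_shapes_;

  void RecordInputShapes(const std::vector<std::vector<int64_t>>& shapes) {
    input_shapes_.clear();  // the missing step that caused unbounded growth
    input_shapes_.insert(input_shapes_.end(), shapes.begin(), shapes.end());
  }
};
```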
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18902
The fix in D14778810 had an issue: when we fall back to acc32 because the outlier density is too high, W_quantized_ has already been modified. In this diff we first just count the number of outliers (without modifying W_quantized_), and only when the density is low enough and no fallback is needed do we modify W_quantized_ and construct an outlier matrix.
Reviewed By: jspark1105
Differential Revision: D14785256
fbshipit-source-id: 03933110a4ca7409686a06b18a9bb921f8657950
|
|
Summary:
Fixes the problem of #18391.
The issue is that when we codegen the ATenOp, we always generate a static number of outputs for each operator. E.g., if an operator from an old model only requires two outputs, its createOperator will only allocate two output blobs, while the newer version of the operator (`unique` in this case) requires more output blobs to be allocated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18581
Differential Revision: D14865647
Pulled By: wanchaol
fbshipit-source-id: 85f63fe16d6fe408a09eca84798c7e8cab3070e9
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19080
OSS: add a tiny unit test utility function to create tensors given shape and data outside of any workspace. I use it in an internal test
Reviewed By: dzhulgakov
Differential Revision: D14814194
fbshipit-source-id: 6d53b235d99a97da812215f5c7f11fecad363c8c
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19061
remove the deps on interned_string.h
Reviewed By: BIT-silence
Differential Revision: D14850078
fbshipit-source-id: 07e6ad72a7de369049ea56f32b72276fb4c59b32
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19042
show the model saving step in the log.
Reviewed By: kennyhorror
Differential Revision: D14809385
fbshipit-source-id: c7a1e50ff92bb45b16b1c501d9325b304b07fbd3
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18959
ghimport-source-id: a934163fa34cb2019732d5f49dc7290c376bf156
Differential Revision: D14831246
Pulled By: ezyang
fbshipit-source-id: beb92dc4ee8c82f4c8259c081dd72e477fe7a9d0
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18992
Add gelu op
Reviewed By: houseroad
Differential Revision: D14814811
fbshipit-source-id: 00f126b8b83763c57ebbf28fbd2de5a8fab6d491
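For reference, the standard (exact) GELU definition:
```
\mathrm{GELU}(x) = x\,\Phi(x) = \frac{x}{2}\left(1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right)
```
where \Phi is the standard normal CDF; a common tanh-based approximation also exists.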
|
|
Summary:
Almost there, feel free to review.
These c10 operators are exported to the _caffe2 domain.
TODO:
- [x] let the onnx checker pass
- [x] test tensor list as argument
- [x] test caffe2 backend and converter
- [x] check the c10 schema can be exported to onnx
- [x] refactor the test case to share some code
- [x] fix the problem in ONNX_ATEN_FALLBACK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210
Reviewed By: zrphercule
Differential Revision: D14600916
Pulled By: houseroad
fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18886
Expose tensor filler util to Python and add a unit test (both C++/Python)
Reviewed By: salexspb
Differential Revision: D14784470
fbshipit-source-id: bb8e013d1755c27c166e87d5a8491a97c65d3d8d
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19004
Handle the exception case where the data has min 3.40282e+38 and max -3.40282e+38 (i.e., FLT_MAX and -FLT_MAX).
Reviewed By: jspark1105
Differential Revision: D14822193
fbshipit-source-id: b9771d1584fdf8317f5b8c7f5806be5d27314386
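Those values are FLT_MAX and -FLT_MAX, i.e., a min/max pair that was never updated by any element. A stand-in sketch of one way to guard against this degenerate range (an assumption for illustration, not the actual diff):
```
#include <cfloat>

// If min > max (e.g., they are still FLT_MAX / -FLT_MAX because no element
// updated them), collapse to a harmless empty range.
void FixDegenerateRange(float* min, float* max) {
  if (*min > *max) {
    *min = 0.0f;
    *max = 0.0f;
  }
}
```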
|