Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18166
ghimport-source-id: a8e2ba2d966e49747a55701c4f6863c5e24d6f14
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18166 Bool Tensor for CUDA**
* #18165 Resolved comments from Bool Tensor for CPU PR
------
This PR enables bool tensor creation and some basic operations for the CUDA backend. This is part of the Bool Tensor feature implementation work. The whole plan looks like this:
1. Storage Implementation [Done]
2. Tensor Creation.
a) CPU [Done]
b) CUDA [This PR]
3. Tensor Conversions.
4. Tensor Indexing.
5. Tensor Operations.
6. Back compatibility related changes.
Change:
Enable bool tensor in CUDA with the following operations:
torch.zeros
torch.tensor
torch.ones
torch.rand/rand_like/randint/randint_like
torch.full
torch.full_like
torch.empty
torch.empty_like
Tested via unit tests and local scripts.
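For illustration, a usage sketch of the newly enabled factories (requires a CUDA build; shapes and values are arbitrary):
```python
import torch

t = torch.zeros(2, 3, dtype=torch.bool, device='cuda')  # torch.zeros
u = torch.tensor([True, False, True], device='cuda')     # torch.tensor infers bool
v = torch.empty_like(t)                                  # torch.empty_like keeps dtype/device
f = torch.full((2, 2), True, dtype=torch.bool, device='cuda')  # torch.full
```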
Differential Revision: D14605104
fbshipit-source-id: b7d7340a7d70edd03a109222d271e68becba762c
---
Summary:
The argument dim=-1 didn't work for torch.cross. The signature of torch.cross has been changed to take c10::optional<int64_t> dim instead of int64_t. Per the documentation, "If dim is not given, it defaults to the first dimension found with the size 3." If dim is specified (even a negative value), the corresponding dimension is used.
Fixes #17229
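A quick sketch of the fixed behavior (hypothetical shapes):
```python
import torch

a = torch.randn(4, 3)
b = torch.randn(4, 3)

c = torch.cross(a, b, dim=-1)  # negative dim now resolves to the last dimension
d = torch.cross(a, b)          # dim omitted: first dimension of size 3 (dim=1 here)
assert torch.allclose(c, d)
```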
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17582
Differential Revision: D14483063
Pulled By: ifedan
fbshipit-source-id: f9699093ec401cb185fd33ca4563c8a46cdcd746
---
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18727
Differential Revision: D14724887
Pulled By: ifedan
fbshipit-source-id: 8c1db6460303e746e4aea0142302b8d61277c067
---
Summary:
Make it possible to construct a pinned-memory tensor without creating a storage first and without calling the pin_memory() function. It is also faster, as the copy operation is unnecessary.
Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```
Part of the bigger `Remove Storage` plan.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455
Reviewed By: ezyang
Differential Revision: D14672084
Pulled By: VitalyFedyunin
fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124
---
Summary:
This PR exposes the multinomialAliasSetup and multinomialAliasDraw methods.
cc: neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17904
Differential Revision: D14700205
Pulled By: ezyang
fbshipit-source-id: 16462fb1f1ef1d560fd586632ea356b23e966ee3
---
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18639
Differential Revision: D14711565
Pulled By: soumith
fbshipit-source-id: 0063ed138a215b95d6571dcd68b18569714abe19
---
Summary:
Peephole-optimize ops that only require DimensionedTensorType, which is what we specialize graphs on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18549
Differential Revision: D14690827
Pulled By: eellison
fbshipit-source-id: 9d7439eb584f0a5b877f5aa53cf80150f00e7e5f
---
Summary:
If none of the outputs require grad, we don't actually check gradgrad; instead, we check that their numerical gradients are 0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18190
Differential Revision: D14563388
Pulled By: ifedan
fbshipit-source-id: a4eb94c9eb60f14dbe6986cd8cef1fe78a7bc839
---
Summary:
Adds support for string indexing (`"a"[0]`) and slicing (`"abc"[1:3]`)
to script.
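A minimal sketch of what this enables (assuming current script syntax):
```python
import torch

@torch.jit.script
def index_and_slice(s: str):
    return s[0], s[1:3]

assert index_and_slice("abcd") == ("a", "bc")
```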
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18247
Differential Revision: D14574486
Pulled By: driazati
fbshipit-source-id: 4b42aa0881e5398ea7f112be46c0335e6e19dced
---
Summary:
The last time I tried to land this, there was a merge race with the docs coverage test. Re-landing with the fix.
Re-land of https://github.com/pytorch/pytorch/pull/18304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18570
Reviewed By: driazati
Differential Revision: D14707285
Pulled By: eellison
fbshipit-source-id: 3a0265928aa8cad78961723d8bf0fbf871fdb71d
---
Summary:
Fixes breakage introduced in #18188.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18667
Differential Revision: D14700133
Pulled By: driazati
fbshipit-source-id: 4cc26bd579fc1b074b3bef6046cc1030facee130
---
Summary:
If a test triggers autodiff, it must have a `DifferentiableGraph` in its differentiated forward graph, and this subgraph must contain either the original aten node or the corresponding nodes used in the AD formula.
Typically a forward differentiable graph looks like this:
```
graph(%i0 : Float(),
%i1 : Float()):
%3 : Float() = prim::DifferentiableGraph_0(%i0, %i1)
return (%3)
with prim::DifferentiableGraph_0 = graph(%0 : Float(),
%1 : Float()):
%2 : Float() = aten::max(%0, %1)
return (%2)
```
which tells us `aten::max(Tensor self, Tensor other) -> Tensor` is symbolically differentiable.
Update: there are a lot of cases (fusions/ConstantChunk/Python implementations) that break it, so I decided to make the check optionally take node names when they differ from the function name.
~~[OLD]Theoretically I could also check if `aten::max` is in the differentiable block or not to be more precise, but there're also cases like `chunk` where in a differentiable block it's replaced with a prim node (ConstantChunk) and we will have to special case them. Any suggestions here (to be more precise or no) is very welcome!~~
We used to have a list of nn tests that should be run against AD; I moved it to a field set when constructing our tests (either torch or nn). I think it's cleaner this way, and it matches the fact that for the same op we may support one schema but not all; this way we can just turn on the corresponding test that triggers the supported schema.
cc: apaszke zdevito wanchaol ngimel for a review
[Done]:
- Going through a manual second pass of all tests to check if they should enable AD test or not....
- Add a readme about how to add AD for an op and how to add/enable its test in test_jit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18509
Differential Revision: D14696811
Pulled By: ailzhang
fbshipit-source-id: c5e693277baac585cd3aed5ab2c0e7faa5e6f29f
---
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
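For illustration, the pattern being noqa'd looks roughly like this (hypothetical module):
```python
# flake8-3 reports this import as unused (F401) because it only appears in a
# type comment, so it has to be silenced explicitly; flake8-2 does not complain.
from typing import List  # noqa: F401

def head(xs):
    # type: (List[int]) -> int
    return xs[0]
```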
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
---
Summary:
* adds attributes to `ScriptModule.__getattr__` so they can be accessed in Python after re-importing
* full support for all the possible values for an `int64_t`
* this necessitated a bunch more `pushWhatever` functions, so re-introduced a templated version to cut down on duplicate code
* tests to validate references / value sharing works
* adds `torch.jit.Unpickler` which people can use to de-serialize the pickle files into Python / have a quick reference on how to do this without PyTorch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18188
Differential Revision: D14527490
Pulled By: driazati
fbshipit-source-id: efd15579cc04aa2e28c4b2c9490d82d849dee559
---
Summary:
This adds `hash()`, which supports `int`, `str`, and `float`. It relies on `std::hash`, which is implementation-defined, so the result of `hash()` in TorchScript is not the same as in Python, but it should satisfy the same properties.
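A minimal sketch (the tuple return and annotations are assumptions about the test setup):
```python
import torch

@torch.jit.script
def hash_all(i: int, f: float, s: str):
    # Backed by std::hash, so the values differ from CPython's hash(),
    # but equal inputs still produce equal hashes.
    return hash(i), hash(f), hash(s)
```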
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18258
Differential Revision: D14692317
Pulled By: driazati
fbshipit-source-id: 909df5d024bb3feea157d5a203b7de53c72261c9
---
Summary:
Start of breaking up test_jit.py
New files will follow the format test_jit_* so they are easily greppable, but they remain in the same directory so we don't have to go through multiple sources for imports.
I am adding a test that's expected to fail to be sure it's running.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18590
Reviewed By: wanchaol
Differential Revision: D14677094
Pulled By: eellison
fbshipit-source-id: 9782c6aa9525bb6f332fc75cfff004c83a417522
---
Summary:
This defines a generic counters API that users can utilize to provide monitoring functionality in e.g. a production service. We expose both counters for runtime internals as well as a TorchScript API to create user-defined counters. Synopsis of the API:
- `torch/csrc/jit/script/logging.h` specifies the externally-facing API in C++
- `torch/jit/_logging.py` specifies the Python API
We use an interface, `LoggerBase`, to define the interactions between users and a logging backend. Implementing a subclass of `LoggerBase` allows the user to handle these events in a custom way, such as logging into a DB or calling into an infra-specific counters API.
From the frontend perspective, we can create log events in two ways:
1. We provide an `add_stat_value(name, val)` function. This calls into the Logger backend with a key/value pair. For example, we might call `add_stat_value('foo', 1)` to bump an event counter.
2. We provide a `time_point()` function to record a timestamp in nanoseconds. This can be used in conjunction with `add_stat_value` to record runtime wall clock durations.
Examples of frontend usage can be found in `test_jit.py TestLogging`.
We provide a trivial `LockingLogger` implementation as an example and for testing purposes. It is likely not ready for production usage. It demonstrates that a backend implementing the API can do things like specify aggregation types and report these aggregate stats via the `get_counters()` API.
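A rough sketch of the frontend calls described above (the import path is taken from the summary; treat the exact call shapes as assumptions):
```python
import torch.jit._logging as jit_logging

begin = jit_logging.time_point()          # nanosecond timestamp
result = sum(i * i for i in range(1000))  # stand-in for real work
end = jit_logging.time_point()

jit_logging.add_stat_value('work.runs', 1)                   # bump an event counter
jit_logging.add_stat_value('work.duration_ns', end - begin)  # record a duration
```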
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18235
Differential Revision: D14545060
Pulled By: jamesr66a
fbshipit-source-id: 04099543a1898cfdd411511e46e03d5dce9b4881
---
Differential Revision: D14668859
Original commit changeset: 3825a35ddc61
fbshipit-source-id: f3343ec6b63fe8f1f04959adfac4331865990047
---
Summary:
The last time I tried to land this, there was a merge race with the docs coverage test. Re-landing with the fix.
Re-land of https://github.com/pytorch/pytorch/pull/18304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18570
Differential Revision: D14668859
Pulled By: eellison
fbshipit-source-id: 3825a35ddc6179a0d433d70d22b5c1a96c20b21a
---
(Resubmission) (#17830)
Summary:
houseroad - this is the resubmission of https://github.com/pytorch/pytorch/pull/17420, as suggested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17830
Reviewed By: zrphercule
Differential Revision: D14398714
Pulled By: houseroad
fbshipit-source-id: bda475f1ae8a5273ebdb0f6883fc66036c29d326
---
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18152
ghimport-source-id: 1dd5e62c4d93394dcd8d8af2871554575c8d3d1a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18152 Initial implementation of InsertObserverNodes pass.**
* #18151 Add quant-passes stubs.
gh-metadata: pytorch pytorch 18150 gh/zolotukhinm@gmail.com/2/head
Differential Revision: D14584223
fbshipit-source-id: 30896acc1a8901d22c6a167eb87d2fbaafbbeb6f
---
Summary:
Previously, we were not able to assign names to `nn::Sequential`'s submodules. This PR adds this feature to match the Python API. Example use:
```cpp
Sequential sequential(named_submodule({
{"linear", Linear(10, 3)},
{"conv2d", Conv2d(1, 2, 3)},
{"dropout", Dropout(0.5)},
{"batchnorm", BatchNorm(5)},
{"embedding", Embedding(4, 10)},
{"lstm", LSTM(4, 5)}
}));
```
It also enables loading the parameters of a Python `nn.Sequential` module with custom submodule names into the C++ frontend, unblocking https://github.com/pytorch/vision/pull/728#issuecomment-466661344.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17552
Differential Revision: D14246834
Pulled By: yf225
fbshipit-source-id: 3030b5c5d68f6dd5d3e37ac4b4f98dc6d6d9ba72
---
Summary:
Changelog:
- Renames `btriunpack` to `lu_unpack` to remain consistent with the `lu` function interface.
- Rename all relevant tests, fix callsites
- Create a tentative alias for `lu_unpack` under the name `btriunpack` and add a deprecation warning to discourage its use (see the sketch below).
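A usage sketch, assuming the `lu` rename from #18435 is also in place:
```python
import torch

A = torch.randn(2, 3, 3)                 # batch of two 3x3 matrices
A_LU, pivots = torch.lu(A)               # formerly btrifact
P, L, U = torch.lu_unpack(A_LU, pivots)  # formerly btriunpack
assert torch.allclose(P @ L @ U, A)
```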
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18529
Differential Revision: D14683161
Pulled By: soumith
fbshipit-source-id: 994287eaa15c50fd74c2f1c7646edfc61e8099b1
---
Summary:
Fix lint
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18623
Differential Revision: D14686265
Pulled By: eellison
fbshipit-source-id: 4bbe0f5bc58f508cbf4bc1baef2029ce1eaa42d8
---
Summary:
Changelog:
- Renames `btrifact` and `btrifact_with_info` to `lu` to remain consistent with other factorization methods (`qr` and `svd`).
- Now there is only one function and method named `lu`, which performs the LU decomposition. It takes a get_infos kwarg which, when set to True, includes an infos tensor in the returned tuple (sketch below).
- Rename all tests, fix callsites
- Create a tentative alias for `lu` under the names `btrifact` and `btrifact_with_info`, and add a deprecation warning to discourage usage.
- Add a single-batch version of `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`)
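A usage sketch of the renamed interface:
```python
import torch

A = torch.randn(3, 3)       # single square matrix, no unsqueeze needed
A_LU, pivots = torch.lu(A)
A_LU, pivots, infos = torch.lu(A, get_infos=True)  # infos tensor instead of raising
```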
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435
Differential Revision: D14680352
Pulled By: soumith
fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8
---
Summary:
Deleting batch tensor since we are no longer maintaining the project, and keeping it functional blocks other improvements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18575
Differential Revision: D14671126
Pulled By: eellison
fbshipit-source-id: b42d5b699c4d12171ed95e6d3a977532167f0d2c
---
Summary:
Add a way to insert external callbacks into PT's RecordFunction
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17844
Differential Revision: D14399664
Pulled By: ilia-cher
fbshipit-source-id: 76654799811fefd3ffed4abfb46ed95b492cebab
---
Differential Revision: D14652372
Original commit changeset: 7430b9d1dc2b
fbshipit-source-id: fa3d0f68515fe53447746469844d2db20c1292e0
---
Summary:
This implements a cyclical learning rate (CLR) schedule with an optional inverse cyclical momentum. More info about CLR: https://github.com/bckenstler/CLR
This is finishing what #2016 started. Resolves #1909.
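A usage sketch, assuming the scheduler lands as `torch.optim.lr_scheduler.CyclicLR`:
```python
import torch
from torch.optim.lr_scheduler import CyclicLR

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = CyclicLR(optimizer, base_lr=0.001, max_lr=0.1)

for epoch in range(2):
    for batch in range(5):
        optimizer.step()
        scheduler.step()  # CLR steps per batch, cycling LR (and inverse momentum)
```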
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18001
Differential Revision: D14451845
Pulled By: sampepose
fbshipit-source-id: 8f682e0c3dee3a73bd2b14cc93fcf5f0e836b8c9
---
Summary:
Adds the slow-test annotation (see the sketch after the lists below) to the following very slow tests:
70.33s test/test_jit.py::TestScript::test_script_module_script_resnet
32.33s test/test_jit.py::TestBatched::test_beam_search
17.70s test/test_jit.py::TestBatched::test_greedy_search
15.58s test/test_jit.py::TestScript::test_script_module_trace_resnet18
The list of remaining slow tests is below. Let me know if you think any of the others should be added to slow tests as well. Slow tests will only run on master.
15.28s call test/test_jit.py::TestJit::test_export_batchnorm
12.96s call test/test_jit.py::TestEndToEndHybridFrontendModels::test_snli
11.65s call test/test_jit.py::TestEndToEndHybridFrontendModels::test_neural_style
6.38s call test/test_jit.py::TestJitGeneratedModule::test_nn_LocalResponseNorm_1d
5.96s call test/test_jit.py::TestJitGeneratedModule::test_nn_LocalResponseNorm_2d_uneven_pad
5.91s call test/test_jit.py::TestJitGeneratedModule::test_nn_LocalResponseNorm_3d_custom_params
4.76s call test/test_jit.py::TestJit::test_alexnet
3.82s call test/test_jit.py::TestScript::test_number_math
3.81s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_no_bias
3.76s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_groups_thnn
3.65s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_stride_pad1circular
3.49s call test/test_jit.py::TestBatched::test_lstm
3.33s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_pad2circular
3.19s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv1d_stride1_pad2circular
3.11s call test/test_jit.py::TestEndToEndHybridFrontendModels::test_dcgan_models
3.11s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_stride_padding
3.11s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_stride
3.08s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_no_bias
3.08s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv1d_stride1_pad1circular
3.07s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_groups
3.05s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_dilated
3.05s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_depthwise_with_multiplier
3.04s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_groups
3.03s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_dilated
3.02s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv2d_depthwise_dilated
3.02s call test/test_jit.py::TestJitGeneratedModule::test_nn_Conv3d_dilated_strided
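For reference, the annotation pattern is roughly this (assuming the `slowTest` helper in test/common_utils.py, gated by an environment flag):
```python
import unittest
from common_utils import slowTest  # assumed helper location

class TestScript(unittest.TestCase):
    @slowTest  # skipped unless slow tests are enabled (e.g. on master CI)
    def test_script_module_script_resnet(self):
        ...
```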
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18545
Differential Revision: D14656064
Pulled By: eellison
fbshipit-source-id: d17ee23c3b3679276cee983555d43e83ce099356
---
Summary:
This allows you to embed checks in IR, making the test more readable.
E.g.
```
graph_str = '''graph(%0 : Double(5, 5)):
  # CHECK: aten::relu
  %1 : Double(5, 5) = aten::relu(%0)
  return (%1)'''
FileCheck().run(graph_str, parseIR(graph_str))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18304
Differential Revision: D14652372
Pulled By: eellison
fbshipit-source-id: 7430b9d1dc2b7584704375aac02d7392ecec76a0
---
Summary:
Previously we were moving nodes with writers into differentiable subgraphs, without necessarily preserving whether or not they were written to. This can lead to bugs with CSE, which needs that context.
I'm not completely sure if there's anything else we can do to be more aggressive here (inline these nodes and not run CSE, just run constant pooling, or possibly something else), but I think we should land this correctness condition first and then think further.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18491
Differential Revision: D14648562
Pulled By: eellison
fbshipit-source-id: bc1e444774ccdb708e22f0e06a477a221a231f9e
---
Summary:
Trying to reland https://github.com/pytorch/pytorch/pull/18298
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18535
Differential Revision: D14652391
Pulled By: eellison
fbshipit-source-id: 699e30045dd5f14f0a2b98378272045a292e1e2a
---
Summary:
Enable unit tests to work with ROCm 2.3. In particular, these are unit tests that we previously skipped for double data types, plus some tests for multi-GPU setups.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18537
Differential Revision: D14651822
Pulled By: ezyang
fbshipit-source-id: 7dd575504ebe235a91489866c91000e9754b1235
---
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18507
ghimport-source-id: 1c3642befad2da78a7e5f39d6d58732b85c76267
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18507 Upgrade flake8-bugbear to master, fix the new lints.**
It turns out Facebook is internally using the unreleased master
flake8-bugbear, so upgrading it grabs a few more lints that Phabricator
was complaining about but we didn't get in open source.
A few of the getattr sites that I fixed look very suspicious (they're
written as if Python were a lazy language), but I didn't look more
closely into the matter.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14633682
fbshipit-source-id: fc3f97c87dca40bbda943a1d1061953490dbacf8
---
Summary:
Fixes #16448
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18479
Differential Revision: D14635360
Pulled By: ezyang
fbshipit-source-id: 4010319fbce050dd0bdf4da3cd1171b9737f3c4c
---
enforce tensors.rst no longer misses anything (#16057)
Summary:
This depends on https://github.com/pytorch/pytorch/pull/16039.
This prevents people (reviewers, PR authors) from forgetting to add things to `tensors.rst`.
When something new is added to `_tensor_doc.py` or `tensor.py` but intentionally not to `tensors.rst`, it should be manually whitelisted in `test_docs_coverage.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16057
Differential Revision: D14619550
Pulled By: ezyang
fbshipit-source-id: e1c6dd6761142e2e48ec499e118df399e3949fcc
---
Differential Revision: D14605905
Original commit changeset: 555f5a12a8e2
fbshipit-source-id: c7874f5987893e956c022180a37763d88bba38db
---
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18448 and https://github.com/pytorch/pytorch/issues/18450
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18449
Differential Revision: D14611638
Pulled By: soumith
fbshipit-source-id: 4f1f27ab5316a92d2783e734169f599afed743cf
---
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18363
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18462
Differential Revision: D14620263
Pulled By: soumith
fbshipit-source-id: 223524cdda2f5d55c2ca8d4cdcf6f7a05a6c15eb
---
Summary:
Set the value as a tensor of 1 element instead of a scalar, per the ONNX spec.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18199
Reviewed By: dzhulgakov
Differential Revision: D14542588
Pulled By: houseroad
fbshipit-source-id: 70dc978d870ebe6ef37c519ba4a20061c3f07372
---
Summary:
More ops for https://github.com/pytorch/pytorch/issues/394. ~~Also need to rebase after landing #16186, because we need to update the whitelist of the new unit test added in #16186.~~
cc: ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17093
Differential Revision: D14620068
Pulled By: ezyang
fbshipit-source-id: deec5ffc9bf7624e0350c85392ee59789bad4237
---
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18165
ghimport-source-id: 55cb3fb63a25c2faab1725b4ec14c688bf45bd38
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18166 Bool Tensor for CUDA
* **#18165 Resolved comments from Bool Tensor for CPU PR**
------
This is a follow-up PR that resolves some additional feedback on one of the previous Bool Tensor PRs.
gchanan, here is a list of almost all the comments from the original PR with respective fixes and replies:
**[utils/python_scalars.h]** why is this converting from uint8_t and not bool? (comment?)
When I was adding this, I tested by creating a tensor and then calling its .tolist(); it worked equally well for bool and uint8_t, so I left uint8_t, as I thought it made more sense given that we are calling PyBool_FromLong. Changing it to bool.
**[ATen/Dispatch.h]** Better name?
Fixed.
**[test/test_torch.py]** what about other factories, such as full? (and more).
There is a test that goes through the factory methods, test_tensor_factories_empty. I added some bool cases above it and added a comment that once CUDA is done, I will unite them so it iterates not just between CUDA and CPU but over all types. Adding all bool cases now. Will unite in the CUDA PR.
**[generic/THTensorMath.h]** any changes in this file actually needed?
Bad merge. Fixed.
**[TH/THTensor.h]** this generates code for random, clampedRandom, and cappedRandom -- do we have tests for all of these with bool?
Added
**[c10/core/ScalarType.h]** I'm not very confident about the lack of Bool here -- can you look at the call sites and see what makes sense to do here?
Added bool to the macro and created a similar one without it for a single case, which fails the build with errors:
_./torch/csrc/jit/symbolic_variable.h:79:20: error: ambiguous overload for ‘operator*’ (operand types are ‘const torch::jit::SymbolicVariable’ and ‘torch::jit::Value*’)
return (*this) * insertConstant(rhs);_
Differential Revision: D14605105
fbshipit-source-id: abf82d50e8f8c50b386545ac068268651b28496d
---
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18445
ghimport-source-id: 30d018737bf6989bc68b7e3676f44e0ca6141fde
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18445 Unify cudaGetDeviceCount implementations.**
I went about doing this by searching for calls to cudaGetDeviceCount,
and then methodically replacing them with references to c10::cuda::device_count()
or at::cuda::device_count().
There is a point to doing this: the various implementations wildly differed
in their handling of what to do when cudaGetDeviceCount returns an error.
The final standardized behavior is that **all errors are swallowed** and
we return device count of zero. This indirectly fixes running CUDA builds
on CPU, which was broken in #17847.
I added 'noexcept' to the 'deviceCount' virtual method on DeviceGuardImpl.
This is a BC-breaking change for anyone inheriting from DeviceGuardImpl
but all you need to do is put 'noexcept' on your method and it is backwards
compatible with older libtorch.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14612189
fbshipit-source-id: 3c8d186e3dd623c0e27625212c7ce30f75d943cb
---
Summary:
`SobolEngine` is a quasi-random sampler used to sample points evenly within [0, 1]. Here we use direction numbers to generate these samples. The maximum supported dimension for the sampler is 1111.
Documentation has been added, and tests have been added based on Balandat's references. The implementation is an optimized, tensorized version of Balandat's Cython implementation provided in #9332.
This closes #9332.
cc: soumith Balandat
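A usage sketch (`torch.quasirandom.SobolEngine` per the docs added here):
```python
import torch
from torch.quasirandom import SobolEngine

engine = SobolEngine(dimension=3)
points = engine.draw(5)  # 5 quasi-random points in [0, 1)^3, shape (5, 3)
```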
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10505
Reviewed By: zou3519
Differential Revision: D9330179
Pulled By: ezyang
fbshipit-source-id: 01d5588e765b33b06febe99348f14d1e7fe8e55d
---
Summary:
Simplify or eliminate boolean and/or expressions, optimize unwrapping a value that cannot be None, and optimize using `is` with a None and a non-None value.
Since peephole optimization now introduces constants, I added another constant-propagation pass after running it.
Previously I had a PR that did this and also optimized shape ops; I will add the shape optimizations in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18259
Differential Revision: D14602749
Pulled By: eellison
fbshipit-source-id: 1c3f5a67067d8dfdf55d7b78dcb616472ea8a267
---
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/18263
cc: houseroad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18264
Reviewed By: ezyang
Differential Revision: D14559234
Pulled By: houseroad
fbshipit-source-id: c5b8623752d6c6af41c6d715fd9585a65294868d
---
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/12598
This PR was originally authored by ptrblck at https://github.com/pytorch/pytorch/pull/15495, but since there was no update for months after changes were requested, I cloned that branch and resolved the code reviews here. Hope everything is good now. In particular, the implementation of count has been changed from ptrblck's original algorithm to the one ngimel suggested, i.e. using `unique_by_key` and `adjacent_difference`.
The current implementation of `_unique_dim` is VERY slow for computing the inverse index and counts, see https://github.com/pytorch/pytorch/issues/18405. I will refactor `_unique_dim` in a later PR. For this PR, please allow me to keep the implementation as is.
cc: ptrblck ezyang ngimel colesbury
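A usage sketch of the new counts output (kwarg name assumed from #12598):
```python
import torch

x = torch.tensor([1, 2, 2, 3, 3, 3])
values, counts = torch.unique(x, return_counts=True)
# values -> tensor([1, 2, 3]); counts -> tensor([1, 2, 3])
```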
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18391
Reviewed By: soumith
Differential Revision: D14605905
Pulled By: VitalyFedyunin
fbshipit-source-id: 555f5a12a8e28c38b10dfccf1b6bb16c030bfdce
---
Summary:
Dropout is now eligible for fusion, and generated fused kernels are just as fast as dropout in ATen. Change its lowering in symbolic script so that it can actually be fused. It is still special-cased for CUDA, because without fusion this lowering is less efficient than the current one (bernoulli_ * input). Testing is covered by the test case that ailzhang added (test_dropout_cuda).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18375
Differential Revision: D14611938
Pulled By: soumith
fbshipit-source-id: 11b18f4784e6c9265e382a8f8deca7add8df3b37
---
Summary:
related to https://github.com/pytorch/pytorch/issues/16608
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18458
Differential Revision: D14611718
Pulled By: soumith
fbshipit-source-id: 6dc903ff2d32b9c3b76470869d1f4e9a67f706df