|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19515
This is still done by default, but can now be disabled by specifying
`find_unused_parameters=False`. There are use cases where finding
unused parameters results in erroneous behavior, because a subset of
model parameters is used *outside* the `forward` function. One can
argue that doing this is not a good idea, but we should not break
existing use cases without an escape hatch. This configuration
parameter is that escape hatch.
Reviewed By: bddppq
Differential Revision: D15016381
fbshipit-source-id: f2f86b60771b3801ab52776e62b5fd6748ddeed0
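As an illustrative sketch only (module, device, and values are made up, not from the PR), disabling the unused-parameter search looks roughly like this:
```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes the process group has already been initialized and that some of
# this model's parameters are intentionally used outside forward().
model = nn.Linear(10, 10).to("cuda:0")
ddp_model = DDP(model, device_ids=[0], find_unused_parameters=False)
```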
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19360
We'll return the output object verbatim since it is a freeform object.
We need to find any tensors in this object, though, because we need to
figure out which parameters were used during this forward pass, to
ensure we short circuit reduction for any unused parameters.
Before this commit only lists were handled and the functionality went
untested. This commit adds support for dicts and recursive structures,
and also adds a test case.
Closes #19354.
Reviewed By: mrshenli
Differential Revision: D14978016
fbshipit-source-id: 4bb6999520871fb6a9e4561608afa64d55f4f3a8
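For illustration only (not the actual implementation), the kind of recursive traversal described above could look like this:
```python
import torch

def find_tensors(obj):
    # Recursively collect tensors from a freeform output object so the
    # parameters used in this forward pass can be identified.
    if isinstance(obj, torch.Tensor):
        return [obj]
    if isinstance(obj, (list, tuple)):
        return [t for item in obj for t in find_tensors(item)]
    if isinstance(obj, dict):
        return [t for value in obj.values() for t in find_tensors(value)]
    return []
```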
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19271
Allow DDP to take multi-GPU models.
Reviewed By: pietern
Differential Revision: D14822375
fbshipit-source-id: 1eebfaa33371766d3129f0ac6f63a573332b2f1c
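A rough sketch of the multi-GPU-module case (the model layout is made up; when a module already spans devices, `device_ids` is left unset):
```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class TwoGpuModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 10).to("cuda:0")
        self.layer2 = nn.Linear(10, 10).to("cuda:1")

    def forward(self, x):
        return self.layer2(self.layer1(x.to("cuda:0")).to("cuda:1"))

# Assumes an initialized process group; no device_ids for a multi-device module.
ddp_model = DDP(TwoGpuModel())
```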
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18953
This removes Python side bucketing code from DistributedDataParallel
and replaces it with calls to the new C++ based bucketing and reducing
code. To confirm this is working well, we ran a test with both the
previous implementation and the new implementation, and confirmed they
are numerically equivalent.
Performance is improved by a couple of percent or more, including for
single-machine, multi-GPU runs.
Closes #13273.
Reviewed By: mrshenli
Differential Revision: D14580911
fbshipit-source-id: 44e76f8b0b7e58dd6c91644e3df4660ca2ee4ae2
|
|
Summary:
~Sometimes, `init_process_group()`, `store.get()`, and `destroy_process_group()` can take more than a few seconds. Hence, removing thread join timeout.~
The error was due to `Address already in use` when starting the TCP backend. The solution is to catch the error and report it to the `retry_on_address_already_in_use_error` decorator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19114
Reviewed By: ezyang
Differential Revision: D14872680
Pulled By: mrshenli
fbshipit-source-id: fc504d02853ca73f76288c0ade564ab20bc01f7e
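For illustration, a retry decorator along these lines could look like the following; the real `retry_on_address_already_in_use_error` lives in the test utilities and its exact signature here is an assumption:
```python
import functools

def retry_on_address_already_in_use_error(func, retries=3):
    # Re-run the wrapped test when the store loses the race for its port
    # and the backend raises "Address already in use".
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(retries):
            try:
                return func(*args, **kwargs)
            except RuntimeError as err:
                if "Address already in use" not in str(err) or attempt == retries - 1:
                    raise
    return wrapper
```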
|
|
Summary:
closes #16520
Hi pietern, I am not sure if this is the expected way to pass timeout to `Store`, could you please help take a look? Thanks!
Questions:
1. How do I write tests for this? I wanted to do something like `test_barrier_timeout_global`, but it seems I need to set the pg's timeout larger than the `Store`'s default timeout (3 min) to see a difference, which is too long for a unit test. And I do not want to change the `Store`'s default timeout either. Any suggestion?
2. Should I also propagate timeout configuration down to `PrefixStore` in `_new_process_group_helper`?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16571
Differential Revision: D13954527
Pulled By: mrshenli
fbshipit-source-id: 77f2653903f24255207233eb298f7c0321119a87
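For context, a sketch of passing a timeout at group initialization; whether and how it propagates down to the `Store` is exactly what this PR discusses:
```python
from datetime import timedelta
import torch.distributed as dist

# Assumes MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE are set in the environment.
dist.init_process_group(
    backend="gloo",
    init_method="env://",
    timeout=timedelta(seconds=60),
)
```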
|
|
Summary:
See #18659.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18916
Differential Revision: D14808749
Pulled By: pietern
fbshipit-source-id: 9a9c8beddb2dbbb1bf4c5e575743d9e1fa3f07fa
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18845
This adds a few CPU-only test cases for the reducer class.
Reviewed By: mrshenli
Differential Revision: D14768432
fbshipit-source-id: c008a52206826304e634a95bc14167ed94c97662
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18291
ghimport-source-id: d6e95e899bd320407967df41435801e54864ba62
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18292 Add test for #17271 (torch.exp incorrect for 2**31 size tensor)
* **#18291 Correctly call superclass setUp in TestCase subclasses.**
This makes PYTORCH_TEST_SKIP_FAST work correctly for more
tests, reducing the wasted testing effort on our slow_test job.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14567643
fbshipit-source-id: 40cf1d6556e0dd0a0550ff3d9ffed8b6000f8191
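The pattern being enforced, sketched with a hypothetical test class (plain `unittest.TestCase` stands in for PyTorch's own `TestCase` base):
```python
import unittest

class MyOpTest(unittest.TestCase):
    def setUp(self):
        # Always chain to the superclass setUp so shared skip logic
        # (e.g. PYTORCH_TEST_SKIP_FAST handling) still runs.
        super().setUp()
        # ... test-specific setup ...
```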
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16274
Reviewed By: pietern
Differential Revision: D13788036
fbshipit-source-id: a9b7353fb0655908e6d47387cc77af33e9471aed
|
|
Summary:
This PR fixes a race condition for the TCP init method, where the master rank can exit earlier than the slave ranks, so the TCP daemon thread gets shut down before other slaves are able to access it.
This will let every rank (process) write a special key to the store to mark that it has completed (and is thus about to exit). The master rank (which is the server) will always wait until all ranks have completed before completing itself.
This should fix: https://github.com/pytorch/pytorch/issues/15638
Tested using the repro from https://github.com/pytorch/pytorch/issues/15638 and it works fine. Also, test_distributed and test_c10d should already have this coverage.
I had to make the rendezvous test in c10d use a world size of 1, since it is single-process code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15684
Differential Revision: D13570904
Pulled By: teng-li
fbshipit-source-id: 34f3bc471204bbd29320df359347ad5561c6b589
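Conceptually, the handshake looks like the sketch below; the key names are illustrative, not the ones used in the fix:
```python
def mark_done_and_wait(store, rank, world_size):
    # Every rank writes a completion key before exiting; the master rank,
    # which hosts the TCP daemon, waits for all of them before shutting down.
    store.set("finished_{}".format(rank), "1")
    if rank == 0:
        store.wait(["finished_{}".format(r) for r in range(world_size)])
```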
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14916
as titled
Reviewed By: pietern
Differential Revision: D13267832
fbshipit-source-id: 3b89d08af93f74941f17ff892c33fc2a4a023c19
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14917
as titled
Reviewed By: pietern
Differential Revision: D13271560
fbshipit-source-id: 0187a3390f8ebd72a2c074e7a651432159d427c0
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14576
as titled
Reviewed By: pietern
Differential Revision: D13266063
fbshipit-source-id: e262f77d63724a7504a7112907bbfba49612fe75
|
|
Summary:
Otherwise, these tests will fail, even though they are never meant to run on single-GPU machines.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14860
Differential Revision: D13369060
Pulled By: teng-li
fbshipit-source-id: 8a637a6d57335491ba8602cd09927700b2bbf8a0
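The guard in question amounts to a skip condition like the following sketch (the decorator name is illustrative):
```python
import unittest
import torch

skip_if_single_gpu = unittest.skipIf(
    torch.cuda.device_count() < 2,
    "multi-GPU test skipped on single-GPU machines",
)
```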
|
|
Summary:
It is possible that some sort of contention causes process scheduling
delays which in turn cause the timeout to *not* be hit.
Increased sleep here will decrease the probability of this happening.
Fixes #14555.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14814
Differential Revision: D13351924
Pulled By: pietern
fbshipit-source-id: 1222cf0855408dfcb79f30f94694c790ee998cf9
|
|
Summary:
Thanks nairbv for the suggestion.
Also see #14589.
Fixes #14703.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14815
Differential Revision: D13351913
Pulled By: pietern
fbshipit-source-id: d11a4152505d0ce15592b13e417bb80551476a61
|
|
Summary:
If multiple arguments are specified to c10d allreduce, they are
interpreted as if they are expanding the ranks in the process group.
Therefore, not only is every argument to allreduce an input that must
be considered, it is also an output. The problem that this commit
fixes is that they were not correctly considered as outputs.
The upstream problem is tracked in facebookincubator/gloo#152. Once
this is fixed there we can remove the copies that this commit adds.
This fixes #14676.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14688
Differential Revision: D13294405
Pulled By: pietern
fbshipit-source-id: 078a2a0a0ff12d051392461438f1496201ec3cb9
|
|
Summary:
Fixing: https://github.com/pytorch/pytorch/issues/14446
This was a supported behavior in old torch.distributed. We want to support it in the new release.
Tests should cover all combinations of scenarios where we have either env vars or args set up for rank, size, or both.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14494
Differential Revision: D13253433
Pulled By: teng-li
fbshipit-source-id: c05974d84f1bdf969f74ec45763e11a841fe4848
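A sketch of the two initialization styles that should now both work (addresses and values are placeholders):
```python
import os
import torch.distributed as dist

# Explicit arguments:
dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:23456",
    rank=int(os.environ.get("RANK", 0)),
    world_size=int(os.environ.get("WORLD_SIZE", 1)),
)
# Alternatively, omit rank/world_size and rely on the env:// rendezvous
# reading RANK and WORLD_SIZE, or mix the two, which is what the tests cover.
```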
|
|
Summary:
as titled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14443
Reviewed By: pietern
Differential Revision: D13222907
Pulled By: janewangfb
fbshipit-source-id: f418c5d84880196f97089114d02957cf739243f8
|
|
Summary:
Fixed: https://github.com/pytorch/pytorch/issues/14445
Also bumped up the timeout to 30 seconds, since on 8-GPU machines the DDP test sometimes takes more than 15 seconds.
Tested on 8 GPU machines:
```
tengli@learnfair062:~/pytorch/test$ python test_c10d.py --verbose
test_dist_broadcast_coalesced_gloo (__main__.DistributedDataParallelTest) ... ok
test_dist_broadcast_coalesced_nccl (__main__.DistributedDataParallelTest) ... skipped 'Test skipped due to known issues'
test_fp16 (__main__.DistributedDataParallelTest) ... ok
test_gloo_backend (__main__.DistributedDataParallelTest) ... ok
test_nccl_backend (__main__.DistributedDataParallelTest) ... ok
test_queue_reduction (__main__.DistributedDataParallelTest) ... ok
test_sync_params_no_buffers (__main__.DistributedDataParallelTest) ... ok
test_sync_params_with_buffers (__main__.DistributedDataParallelTest) ... ok
test_sync_reduction (__main__.DistributedDataParallelTest) ... ok
test_set_get (__main__.FileStoreTest) ... ok
test_set_get (__main__.PrefixFileStoreTest) ... ok
test_set_get (__main__.PrefixTCPStoreTest) ... ok
test_allgather_basics (__main__.ProcessGroupGlooTest) ... ok
test_allgather_checks (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_basics (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_checks (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_stress (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_basics (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_checks (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_stress (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) ... ok
test_gather_basics (__main__.ProcessGroupGlooTest) ... ok
test_gather_checks (__main__.ProcessGroupGlooTest) ... ok
test_reduce_basics (__main__.ProcessGroupGlooTest) ... ok
test_reduce_checks (__main__.ProcessGroupGlooTest) ... ok
test_scatter_basics (__main__.ProcessGroupGlooTest) ... ok
test_scatter_checks (__main__.ProcessGroupGlooTest) ... ok
test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... ok
test_timeout_kwarg (__main__.ProcessGroupGlooTest) ... ok
test_allgather_ops (__main__.ProcessGroupNCCLTest) ... ok
test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_barrier (__main__.ProcessGroupNCCLTest) ... ok
test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... ok
test_reduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_common_errors (__main__.RendezvousEnvTest) ... ok
test_nominal (__main__.RendezvousEnvTest) ... ok
test_common_errors (__main__.RendezvousFileTest) ... ok
test_nominal (__main__.RendezvousFileTest) ... ok
test_common_errors (__main__.RendezvousTCPTest) ... ok
test_nominal (__main__.RendezvousTCPTest) ... ok
test_unknown_handler (__main__.RendezvousTest) ... ok
test_address_already_in_use (__main__.TCPStoreTest) ... ok
test_set_get (__main__.TCPStoreTest) ... ok
----------------------------------------------------------------------
Ran 46 tests in 162.980s
OK (skipped=1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14452
Differential Revision: D13230652
Pulled By: teng-li
fbshipit-source-id: 88580fe55b3a4fbc7a499ca3b591958f11623bf8
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14386
See #13573, #14142, and #14271 for discussion.
This change updates ProcessGroupGloo to ensure that all prior
operations have completed before executing the barrier.
Reviewed By: manojkris
Differential Revision: D13205022
fbshipit-source-id: 673e7e6ca357dc843874d6dd8da590832e1de7fa
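Behaviorally, the guarantee described above looks like this sketch (assumes an initialized gloo process group):
```python
import torch
import torch.distributed as dist

t = torch.ones(4)
handle = dist.all_reduce(t, async_op=True)
# With this change, barrier() only returns once previously issued work,
# such as the async allreduce above, has completed.
dist.barrier()
```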
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14298
This is a breaking API change for users of the C++ c10d API. The work
object defined wait() to return a boolean. If the work completed
successfully it would return true, if it didn't it would return false.
It was then up to the user to call the exception() function to figure
out what went wrong. This has proven suboptimal as it allows users to
forget about failure handling and errors may be ignored.
The work class is semantically very similar to std::future, where a
call to get() may throw if the underlying std::promise has set an
exception. This commit changes the semantic of the work class to be
similar to this and turns wait() into a void function that throws if
the work completes with an exception.
The exception() function can still be used to retrieve the exception
if isSuccess() returns false, but now returns an std::exception_ptr
instead of a reference to a std::exception.
Reviewed By: manojkris
Differential Revision: D13158475
fbshipit-source-id: 9cd8569b9e7cbddc867a5f34c6fd0b7be85581b8
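From Python, the new failure handling looks roughly like the sketch below (assumes an initialized process group):
```python
import torch
import torch.distributed as dist

handle = dist.all_reduce(torch.ones(2), async_op=True)
try:
    # wait() now surfaces failures as an exception instead of relying on the
    # caller to check a boolean return and call exception() manually.
    handle.wait()
except RuntimeError as err:
    print("collective failed:", err)
```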
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14294
This is the final collective to be ported to the new style where there
is no longer a need to keep a cached algorithm instance around. There
is a follow up change incoming to remove the algorithm caching
functionality in ProcessGroupGloo.
Reviewed By: manojkris
Differential Revision: D13111509
fbshipit-source-id: f3ea0d955a62029fc4e7cfc09055e4957e0943ac
|
|
Summary:
Most likely a typo.
Tested on 8-GPU machine
```
tengli@learnfair062:~/pytorch/test$ python test_c10d.py ProcessGroupNCCLTest.test_barrier
.
----------------------------------------------------------------------
Ran 1 test in 29.341s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14389
Differential Revision: D13207207
Pulled By: teng-li
fbshipit-source-id: aaffe14237076fe19d94e2fa4d9c093397f07bb9
|
|
Summary:
This covers the edge case where we run the same NCCL process group with multiple GPU combinations instead of only the last GPU combination. We always keep track of which GPUs have been used previously in the NCCL process group, and barrier() itself will synchronize on each GPU's NCCL stream.
A test is covered as well. Tested on an 8-GPU machine.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14271
Differential Revision: D13164993
Pulled By: teng-li
fbshipit-source-id: 81e04352740ea50b5e943369e74cfcba40bb61c1
|
|
Summary:
This fixes #13907.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13911
Differential Revision: D13046256
Pulled By: pietern
fbshipit-source-id: bab70cd73ef868e23d4857b06e72830ad29ddb4f
|
|
Summary:
This addresses: https://github.com/pytorch/pytorch/issues/11874
so we now have identical file init_method behavior to the previous THD file init.
Also the FileStore::add bug is pretty annoying.
Two bugs:
(1) Add doesn't append to the end of the file.
(2) Cache doesn't get updated.
Both are fixed and tests are covered.
I examined /tmp to ensure that all temp files are auto-deleted after test_c10d.py runs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13708
Reviewed By: pietern
Differential Revision: D12972810
Pulled By: teng-li
fbshipit-source-id: 917255390aa52845f6b0ad0f283875a7a704da48
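For reference, the file init_method being exercised looks like this sketch (path, rank, and world size are placeholders):
```python
import torch.distributed as dist

dist.init_process_group(
    backend="gloo",
    init_method="file:///tmp/ddp_shared_init_file",  # must be reachable by all ranks
    rank=0,
    world_size=2,
)
```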
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13816
If common.find_free_port() returns the same port over and over again,
and the TCPStore fails to bind to it over and over again, this
function has the potential to loop forever. If we can't find a free
port after 10 tries, we are safe to assume something is wrong...
Differential Revision: D13017700
fbshipit-source-id: 2139a0ea0f30ce08b5571f80ae0551f1fa7ba4a2
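A simplified sketch of the bounded retry described above; `create_store` stands in for constructing the TCPStore:
```python
import socket

def find_free_port():
    # Mirrors the behaviour of common.find_free_port(): bind to port 0, read
    # the assigned port, close the socket. Someone else can grab the port
    # between close() and the store binding to it, hence the retry below.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port

def create_store_with_retry(create_store, retries=10):
    for _ in range(retries):
        try:
            return create_store(find_free_port())
        except RuntimeError as err:
            if "Address already in use" not in str(err):
                raise
    raise RuntimeError("could not bind a store after {} tries".format(retries))
```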
|
|
Summary:
We only need this for backward; for the forward cast, the non-fine-grained bucketing should be better since it's sequential anyway.
Testing should be fully covered by the c10d test; the bucket size was reduced there to make bucketing actually happen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13607
Differential Revision: D12944515
Pulled By: teng-li
fbshipit-source-id: d982e8dca2874c91d39b30b73a85bfbeb768c508
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13623
Moves the bulk of shared argument checkers in the gloo backend to Utils.hpp.
Reviewed By: teng-li
Differential Revision: D12934598
fbshipit-source-id: 7b80e67ccc3425f21498c30fbe7837af314f96f2
|
|
Summary:
Functionality testing shouldn't be affected since we have both backends testing the same thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13606
Differential Revision: D12937185
Pulled By: teng-li
fbshipit-source-id: 03d897b6690f7932654fdb7d11a07016dfffa751
|
|
Summary:
When switching to mixed precision fp16 training, DDP randomly hangs. Initially, I thought this smelled like a similar NCCL bug I filed a while ago. It turns out it's not. Again, I am seeing that different rank processes have different sizes. How could this even happen?
It turns out that take_tensors generates the list of bucketed tensors in a non-deterministic order, because the key to the map is a pointer. An interesting bug to dig into and fix.
fp16 DDP training should be fully working now.
Also added another fine-grained take_tensors helper that aims to improve the performance of DDP, with a TODO to replace DDP's take_tensors with it.
Fixed: https://github.com/pytorch/pytorch/issues/12150
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13496
Differential Revision: D12920985
Pulled By: teng-li
fbshipit-source-id: 26f3edae7be45a80fa7b2410a2e5a1baab212d9c
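To make the ordering issue concrete, here is a Python sketch of deterministic bucketing keyed on (device, dtype) rather than on a pointer; this illustrates the idea only, not the actual C++ fix:
```python
from collections import OrderedDict

def take_tensors_deterministic(tensors, bucket_size_bytes):
    # Group by (device, dtype) in insertion order so every rank produces
    # buckets in the same order, instead of keying on a raw pointer whose
    # value differs across processes.
    groups = OrderedDict()
    for t in tensors:
        groups.setdefault((t.device, t.dtype), []).append(t)

    buckets = []
    for group in groups.values():
        current, current_bytes = [], 0
        for t in group:
            nbytes = t.numel() * t.element_size()
            if current and current_bytes + nbytes > bucket_size_bytes:
                buckets.append(current)
                current, current_bytes = [], 0
            current.append(t)
            current_bytes += nbytes
        if current:
            buckets.append(current)
    return buckets

# e.g. buckets = take_tensors_deterministic(list(model.parameters()), 1 << 20)
```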
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13497
This replaces the existing broadcast implementation with the new style collective call in the gloo backend. The CUDA path copies CUDA tensors to CPU tensors and then runs the CPU broadcast implementation.
Reviewed By: teng-li
Differential Revision: D12890013
fbshipit-source-id: 43f346fb2814f421bedc7babf89169703a46bb9c
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13426
This replaces the existing allreduce implementation with the new style collective call in the gloo backend. This is the first one to include both a CPU and a CUDA path. The CUDA path copies CUDA tensors to CPU tensors and then runs the CPU allreduce implementation. This is not much different from the current situation in the case where there is a single input tensor per call (which is the case when called from DistributedDataParallel).
Reviewed By: teng-li
Differential Revision: D12855689
fbshipit-source-id: 574281d762dd29149fa7f634fb71f8f6a9787598
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13425
This adds support for the new style reduce collective call in the gloo backend.
Reviewed By: teng-li
Differential Revision: D12869404
fbshipit-source-id: 93c641e6aba3b03c796bda80737547c565cfa571
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13424
This adds support for the allgather collective call in the gloo backend. The gloo implementation does not support multiple inputs per rank (nor one or more outputs per rank), so we use a temporary flattened buffer and unflatten once the collective finishes.
Reviewed By: teng-li
Differential Revision: D12832009
fbshipit-source-id: 2f5c1934a338589cef1d3192bd92ada135fecd7a
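The flatten/unflatten workaround, sketched with utilities PyTorch already ships (the collective call itself is elided):
```python
import torch
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

outputs = [torch.empty(3), torch.empty(5)]
flat = _flatten_dense_tensors(outputs)
# ... run the collective on the single flat buffer ...
for out, synced in zip(outputs, _unflatten_dense_tensors(flat, outputs)):
    out.copy_(synced)
```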
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13423
This adds support for the gather collective call in the gloo backend. The gloo implementation does not yet support the mode where the root has multiple output tensors (one per rank), so we use a temporary flattened buffer and unflatten on the root once the collective finishes.
Reviewed By: teng-li
Differential Revision: D12811647
fbshipit-source-id: 90fe8af8c390090b7d4ef43aa74f4e3e67ab9d0b
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13422
This adds support for the scatter collective call in the gloo backend. This is the first of the new style collectives that do not expect to be created once and used many times. This commit contains some shortcuts to make this new style work side by side with the existing implementations (such as the std::tuple with nullptr's). These shortcuts are temporary until we have moved over all collectives to this new style.
Reviewed By: teng-li
Differential Revision: D12310219
fbshipit-source-id: 32e68717f819d5980f0e469d297204948351cefc
|
|
Summary:
1. Refactors `TestTorch` into `TestTorchMixin` (subclass of `object`) and `TestTorch` (subclass of `TestCase`, MRO `(TestCase, TestTorchMixin)`, only defined if `__name__ == '__main__'`). So other scripts won't accidentally run it.
2. Adds an assertion in `load_tests` that each script only runs cases defined in itself.
cc yf225 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13250
Differential Revision: D12823734
Pulled By: SsnL
fbshipit-source-id: 7a169f35fe0794ce76e310d8a137d9a3265c012b
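The resulting structure, sketched with plain unittest (names follow the description above; the real base class is PyTorch's own TestCase):
```python
import unittest

class TestTorchMixin(object):
    def test_add(self):
        self.assertEqual(1 + 1, 2)

if __name__ == '__main__':
    # Only the entry-point script defines the runnable TestCase, so other
    # test scripts importing this module won't accidentally run these tests.
    class TestTorch(unittest.TestCase, TestTorchMixin):
        pass
    unittest.main()
```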
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13160
Reduces pytorch_core build from 2 hours to 30 minutes
Reviewed By: soumith, dzhulgakov
Differential Revision: D10524261
fbshipit-source-id: 97270ac73404b5ea4c264cd0e9d8d4b1be79b0e9
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13159
These lint violations are intentional.
Reviewed By: ezyang
Differential Revision: D10862131
fbshipit-source-id: 70ad4b0a360cb12d050805fd7b1080dfe4566e86
|
|
Summary:
The existing default timeout was set at 10 seconds, which is too low
for asynchronous tasks that depend on a barrier to resynchronize.
Having a single timeout for all operations is not ideal and this will
be addressed in future commits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13056
Reviewed By: teng-li
Differential Revision: D10558746
Pulled By: pietern
fbshipit-source-id: d857ea55b1776fc7d0baf2efd77951b5d98beabb
|
|
memcpy (#12954)
Summary:
- Moved sync_reduction to C++
- Use a dedicated CUDA stream for memcpy
- Also use a dedicated CUDA stream for memcpy in queue_reduction
Added test as well.
CI should cover both DDP and unittest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12954
Differential Revision: D10520069
Pulled By: teng-li
fbshipit-source-id: 64348e4e43c15f9695a4c28b036c232587ecfb65
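The dedicated-stream copy pattern, sketched in Python (the real logic is in C++; assumes a CUDA device is available):
```python
import torch

copy_stream = torch.cuda.Stream()
src = torch.randn(1024, device="cuda")
dst = torch.empty_like(src)

with torch.cuda.stream(copy_stream):
    # Run the memcpy on its own stream so it can overlap with work on the
    # default stream.
    dst.copy_(src, non_blocking=True)

# Make the default stream wait for the copy before consuming dst.
torch.cuda.current_stream().wait_stream(copy_stream)
```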
|
|
Summary:
Fully working version, continuing from goldsborough's initial version.
Waiting on the stream guard to be merged before adding more stream perf logic to the C++ version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12852
Differential Revision: D10468696
Pulled By: teng-li
fbshipit-source-id: 8e46d408796973817abfd9dbd6566e0ca5b7a13f
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12782
We have seen the "Address already in use" error popup a few times when instantiating the TCPStore. The port that it uses is dynamically generated through common.find_free_port(), which binds a new socket to a random port, closes the socket, and returns the port that the OS had assigned. If some other process grabs that port in the time between closing the socket and the TCPStore binding to it, the bind error shows up. This commit changes most tests to use the FileStore instead and includes a retry when testing the TCPStore.
Differential Revision: D10433401
fbshipit-source-id: 8dd575ac91a3cddd1cc41ddb0ff4311ddc58c813
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794
common.py is used as a base module by almost all tests in test/. The
name of this file is so common that it can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid the conflict.
Reviewed By: orionr
Differential Revision: D10438204
fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
|
|
Summary:
A missing environment variable used to raise a missing-key error. Now it
raises a more descriptive error describing the actual problem, for example:
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable WORLD_SIZE expected, but not set
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11782
Differential Revision: D9888962
Pulled By: pietern
fbshipit-source-id: 5947e7a7bf7aa45f13bbd7b5e997529f26cc92d6
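A small repro-style sketch of the improved error (assumes no process group is initialized yet; values are placeholders):
```python
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.pop("WORLD_SIZE", None)

try:
    dist.init_process_group(backend="gloo", init_method="env://")
except ValueError as err:
    # e.g. "... environment variable WORLD_SIZE expected, but not set"
    print(err)
```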
|
|
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11490
Reviewed By: teng-li
Differential Revision: D9828116
Pulled By: pietern
fbshipit-source-id: 98be1ae84b6763ffb329e63c030c5e3ec0e748b7
|
|
Summary:
- fixes #9984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10189
Differential Revision: D9545390
Pulled By: weiyangfb
fbshipit-source-id: 3a6a705437553ba319e9fd4b7f676ff73857a27e
|