author | Dmytro Dzhulgakov <dzhulgakov@fb.com> | 2019-02-12 21:13:25 -0800 |
---|---|---|
committer | Facebook Github Bot <facebook-github-bot@users.noreply.github.com> | 2019-02-12 21:16:34 -0800 |
commit | 51dd2000cdaf6934663260b8f6832a8c2863710d (patch) | |
tree | e9b2c2a174474708342ce8cb908cd420ba83a0ff /third_party | |
parent | f87022bf2f9514c17f876eefc2aeffea4564912a (diff) | |
unify c2 and TH allocator (#16892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16892
Replaces https://github.com/pytorch/pytorch/pull/14517
Merges the caffe2 and TH CPU allocators, mostly reusing the code from the caffe2 allocators.
The `memset` in the caffe2 allocator is gone now. The two allocators should be almost the same.
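For illustration, a unified CPU allocation path without the `memset` could look roughly like the sketch below. This is not the code from the PR: the names `sketch_alloc_cpu`, `sketch_free_cpu` and the 64-byte `kSketchAlignment` are assumptions made for this example, and it relies on POSIX `posix_memalign`.

```cpp
#include <stdlib.h>  // posix_memalign, free (POSIX)

#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical alignment, chosen for illustration only.
constexpr std::size_t kSketchAlignment = 64;

// Returns an aligned, uninitialized buffer of nbytes bytes.
// Returns nullptr for a zero-byte request.
void* sketch_alloc_cpu(std::size_t nbytes) {
  if (nbytes == 0) {
    return nullptr;
  }
  void* data = nullptr;
  if (posix_memalign(&data, kSketchAlignment, nbytes) != 0) {
    throw std::bad_alloc();
  }
  // Deliberately no memset here: the buffer is handed out uninitialized,
  // so callers must not assume zero-filled memory.
  return data;
}

// Releases a buffer obtained from sketch_alloc_cpu.
void sketch_free_cpu(void* data) {
  free(data);
}
```

Skipping the zero-fill avoids touching every byte of a fresh buffer at allocation time; any caller that relied on zero-initialized memory would have to clear the buffer itself.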
Baseline:
```
Running ./tensor_allocation
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
-------------------------------------------------------------------------
Benchmark                               Time        CPU   Iterations
-------------------------------------------------------------------------
BM_MakeStorageImpl                    148 ns     148 ns      4676594
BM_StorageImplCtor                     54 ns      54 ns     12957810
BM_MallocStorageImpl                   62 ns      62 ns     11254745
BM_TensorImplCtor                      22 ns      22 ns     31939472
BM_MallocTensorImpl                   105 ns     105 ns      6505661
BM_Malloc_1                            43 ns      43 ns     16464905
BM_MakeTensorFromStorage              126 ns     126 ns      5586116
BM_MakeVariableFromTensor             236 ns     236 ns      2995528
BM_ATenCPUTensorAllocationSmall1      319 ns     319 ns      2268884
BM_ATenCPUTensorAllocationSmall2      318 ns     318 ns      2163332
BM_ATenCPUTensorAllocationMedium1     403 ns     403 ns      1663228
BM_ATenCPUTensorAllocationMedium2     448 ns     448 ns      1595004
BM_ATenCPUTensorAllocationBig1        532 ns     532 ns      1352634
BM_ATenCPUTensorAllocationBig2       4486 ns    4486 ns       160978
```
Changed:
```
Running ./tensor_allocation
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
-------------------------------------------------------------------------
Benchmark                               Time        CPU   Iterations
-------------------------------------------------------------------------
BM_MakeStorageImpl                    141 ns     141 ns      4803576
BM_StorageImplCtor                     55 ns      55 ns     13129391
BM_MallocStorageImpl                   64 ns      64 ns     11088143
BM_TensorImplCtor                      23 ns      23 ns     31616273
BM_MallocTensorImpl                   101 ns     101 ns      7017585
BM_Malloc_1                            39 ns      39 ns     18523954
BM_MakeTensorFromStorage              118 ns     118 ns      5877919
BM_MakeVariableFromTensor             452 ns     452 ns      1565722
BM_ATenCPUTensorAllocationSmall1      384 ns     384 ns      1819763
BM_ATenCPUTensorAllocationSmall2      389 ns     389 ns      1857483
BM_ATenCPUTensorAllocationMedium1     425 ns     425 ns      1646284
BM_ATenCPUTensorAllocationMedium2     430 ns     430 ns      1561319
BM_ATenCPUTensorAllocationBig1        508 ns     508 ns      1309969
BM_ATenCPUTensorAllocationBig2       3799 ns    3799 ns       173674
```
lstm benchmark:
Before:
```
INFO:lstm_bench:Iter: 1 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 21 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 41 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 61 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 81 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 101 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 121 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 141 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 161 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 181 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 201 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 221 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 241 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 261 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 281 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 301 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 321 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 341 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 361 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 381 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Done. Total EPS excluding 1st iteration: 0.8k
```
After:
```
INFO:lstm_bench:Iter: 1 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 21 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 41 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 61 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 81 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 101 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 121 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 141 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 161 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 181 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 201 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 221 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 241 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 261 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 281 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 301 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 321 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 341 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 361 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 381 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Done. Total EPS excluding 1st iteration: 0.8k
```
Reviewed By: ezyang
Differential Revision: D13202632
fbshipit-source-id: db6d2ec756ed15b0732b15396c82ad42302bb79d
Diffstat (limited to 'third_party')
0 files changed, 0 insertions, 0 deletions