The following commands were run on Ubuntu 16.04, with Python 3.5.
When I ran the Python script without any output redirection:
python3 opt_CNN2_dense.py
the Resource Exhausted error was printed to the screen like this:
/usr/local/lib/python3.5/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from 'float' to 'np.floating' is deprecated. In future, it will be treated as 'np.float64 == np.dtype(float).type'.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Train on 123200 samples, validate on 30800 samples
Epoch 1/10
2018-04-07 11:14:44.279768: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-07 11:14:44.978444: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-07 11:14:44.979036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-04-07 11:14:44.979273: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-04-07 11:14:59.113240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10750 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
2018-04-07 11:15:21.405519: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.88GiB. Current allocation summary follows.
2018-04-07 11:15:21.405695: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (256): Total Chunks: 55, Chunks in use: 55. 13.8KiB allocated for chunks. 13.8KiB in use in bin. 1.2KiB client-requested in use in bin.
2018-04-07 11:15:21.405785: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-07 11:15:21.405804: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (1024): Total Chunks: 13, Chunks in use: 13. 20.8KiB allocated for chunks. 20.8KiB in use in bin. 18.4KiB client-requested in use in bin.
2018-04-07 11:15:21.405850: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-07 11:15:21.405866: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (4096): Total Chunks: 1, Chunks in use: 1. 4.0KiB allocated for chunks. 4.0KiB in use in bin. 4.0KiB client-requested in use in bin.
2018-04-07 11:15:21.405926: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (8192): Total Chunks: 1, Chunks in use: 1. 8.0KiB allocated for chunks. 8.0KiB in use in bin. 8.0KiB client-requested in use in bin.
2018-04-07 11:15:21.405971: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (16384): Total Chunks: 6, Chunks in use: 6. 118.5KiB allocated for chunks. 118.5KiB in use in bin. 117.2KiB client-requested in use in bin.
2018-04-07 11:15:21.406013: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (32768): Total Chunks: 1, Chunks in use: 1. 44.0KiB allocated for chunks. 44.0KiB in use in bin. 44.0KiB client-requested in use in bin.
2018-04-07 11:15:21.406055: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-07 11:15:21.406096: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-07 11:15:21.406135: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-07 11:15:21.406175: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-07 11:15:21.406209: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (1048576): Total Chunks: 7, Chunks in use: 7. 11.92MiB allocated for chunks. 11.92MiB in use in bin. 11.30MiB client-requested in use in bin.
2018-04-07 11:15:21.406261: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-07 11:15:21.406292: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-07 11:15:21.406323: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (8388608): Total Chunks: 6, Chunks in use: 6. 72.66MiB allocated for chunks. 72.66MiB in use in bin. 72.66MiB client-requested in use in bin.
2018-04-07 11:15:21.406354: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-07 11:15:21.406398: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (33554432): Total Chunks: 7, Chunks in use: 6. 399.58MiB allocated for chunks. 366.21MiB in use in bin. 366.21MiB client-requested in use in bin.
2018-04-07 11:15:21.406436: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-07 11:15:21.406467: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-07 11:15:21.406497: I tensorflow/core/common_runtime/bfc_allocator.cc:630] Bin (268435456): Total Chunks: 3, Chunks in use: 2. 10.03GiB allocated for chunks. 9.77GiB in use in bin. 9.77GiB client-requested in use in bin.
2018-04-07 11:15:21.406529: I tensorflow/core/common_runtime/bfc_allocator.cc:646] Bin for 4.88GiB was 256.00MiB, Chunk State:
2018-04-07 11:15:21.406563: I tensorflow/core/common_runtime/bfc_allocator.cc:652] Size: 266.07MiB | Requested Size: 1.72MiB | in_use: 0, prev: Size: 4.88GiB | Requested Size: 4.88GiB | in_use: 1
2018-04-07 11:15:21.406672: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405140000 of size 1280
2018-04-07 11:15:21.406751: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405140500 of size 256
2018-04-07 11:15:21.406803: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405140600 of size 256
2018-04-07 11:15:21.406848: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405140700 of size 20224
2018-04-07 11:15:21.406875: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405145600 of size 256
2018-04-07 11:15:21.406928: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405145700 of size 256
2018-04-07 11:15:21.406950: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405145800 of size 1792
2018-04-07 11:15:21.407015: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405145f00 of size 256
2018-04-07 11:15:21.407027: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405146000 of size 256
2018-04-07 11:15:21.407043: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405146100 of size 256
2018-04-07 11:15:21.407051: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405146200 of size 256
2018-04-07 11:15:21.407114: I tensorflow/core/common_runtime/bfc_allocator.cc:665] Chunk at 0x405146300 of size 256
...
2018-04-07 11:15:21.410385: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 6 Chunks of size 20224 totalling 118.5KiB
2018-04-07 11:15:21.410435: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 45056 totalling 44.0KiB
2018-04-07 11:15:21.410475: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 1698304 totalling 1.62MiB
2018-04-07 11:15:21.410487: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 6 Chunks of size 1800192 totalling 10.30MiB
2018-04-07 11:15:21.410531: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 6 Chunks of size 12697600 totalling 72.66MiB
2018-04-07 11:15:21.410544: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 6 Chunks of size 64000000 totalling 366.21MiB
2018-04-07 11:15:21.410586: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 5242880000 totalling 9.77GiB
2018-04-07 11:15:21.410599: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 10.21GiB
2018-04-07 11:15:21.410644: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats:
Limit: 11272650752
InUse: 10958659072
MaxInUse: 11055887872
NumAllocs: 108
MaxAllocSize: 5242880000
...
2018-04-07 11:15:31.415584: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 1 Chunks of size 1698304 totalling 1.62MiB
2018-04-07 11:15:31.415597: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 6 Chunks of size 1800192 totalling 10.30MiB
2018-04-07 11:15:31.415639: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 6 Chunks of size 12697600 totalling 72.66MiB
2018-04-07 11:15:31.415744: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 6 Chunks of size 64000000 totalling 366.21MiB
2018-04-07 11:15:31.415763: I tensorflow/core/common_runtime/bfc_allocator.cc:674] 2 Chunks of size 5242880000 totalling 9.77GiB
2018-04-07 11:15:31.415771: I tensorflow/core/common_runtime/bfc_allocator.cc:678] Sum Total of in-use chunks: 10.21GiB
2018-04-07 11:15:31.415797: I tensorflow/core/common_runtime/bfc_allocator.cc:680] Stats:
Limit: 11272650752
InUse: 10958659072
MaxInUse: 11055887872
NumAllocs: 108
MaxAllocSize: 5242880000
2018-04-07 11:15:31.415859: W tensorflow/core/common_runtime/bfc_allocator.cc:279] **************************************************************************************************__
2018-04-07 11:15:31.415928: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at cwise_ops_common.cc:70 : Resource exhausted: OOM when allocating tensor with shape[1024,2,128,5000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1024,2,128,5000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: training/Adam/gradients/zeros_4 = Fill[T=DT_FLOAT, _class=["loc:@conv1/Relu"], index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/Shape_5, training/Adam/gradients/zeros_4/Const)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: loss/mul/_129 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_845_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "a.py", line 97, in <module>
return_argmin=True
File "/usr/local/lib/python3.5/dist-packages/hyperopt/fmin.py", line 307, in fmin
return_argmin=return_argmin,
File "/usr/local/lib/python3.5/dist-packages/hyperopt/base.py", line 635, in fmin
return_argmin=return_argmin)
File "/usr/local/lib/python3.5/dist-packages/hyperopt/fmin.py", line 320, in fmin
rval.exhaust()
File "/usr/local/lib/python3.5/dist-packages/hyperopt/fmin.py", line 199, in exhaust
self.run(self.max_evals - n_done, block_until_done=self.async)
File "/usr/local/lib/python3.5/dist-packages/hyperopt/fmin.py", line 173, in run
self.serial_evaluate()
File "/usr/local/lib/python3.5/dist-packages/hyperopt/fmin.py", line 92, in serial_evaluate
result = self.domain.evaluate(spec, ctrl)
File "/usr/local/lib/python3.5/dist-packages/hyperopt/base.py", line 840, in evaluate
rval = self.fn(pyll_rval)
File "a.py", line 48, in f_nn
callbacks=callbacks)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 1705, in fit
validation_steps=validation_steps)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 1235, in _fit_loop
outs = f(ins_batch)
File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 2478, in __call__
**self.session_kwargs)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1024,2,128,5000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: training/Adam/gradients/zeros_4 = Fill[T=DT_FLOAT, _class=["loc:@conv1/Relu"], index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/Shape_5, training/Adam/gradients/zeros_4/Const)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: loss/mul/_129 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_845_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'training/Adam/gradients/zeros_4', defined at:
File "a.py", line 97, in <module>
return_argmin=True
File "/usr/local/lib/python3.5/dist-packages/hyperopt/fmin.py", line 307, in fmin
return_argmin=return_argmin,
File "/usr/local/lib/python3.5/dist-packages/hyperopt/base.py", line 635, in fmin
return_argmin=return_argmin)
File "/usr/local/lib/python3.5/dist-packages/hyperopt/fmin.py", line 320, in fmin
rval.exhaust()
File "/usr/local/lib/python3.5/dist-packages/hyperopt/fmin.py", line 199, in exhaust
self.run(self.max_evals - n_done, block_until_done=self.async)
File "/usr/local/lib/python3.5/dist-packages/hyperopt/fmin.py", line 173, in run
self.serial_evaluate()
File "/usr/local/lib/python3.5/dist-packages/hyperopt/fmin.py", line 92, in serial_evaluate
result = self.domain.evaluate(spec, ctrl)
File "/usr/local/lib/python3.5/dist-packages/hyperopt/base.py", line 840, in evaluate
rval = self.fn(pyll_rval)
File "a.py", line 48, in f_nn
callbacks=callbacks)
File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 1682, in fit
self._make_train_function()
File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 990, in _make_train_function
loss=self.total_loss)
File "/usr/local/lib/python3.5/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/keras/optimizers.py", line 445, in get_updates
grads = self.get_gradients(loss, params)
File "/usr/local/lib/python3.5/dist-packages/keras/optimizers.py", line 78, in get_gradients
grads = K.gradients(loss, params)
File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 2515, in gradients
return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 602, in gradients
out_grads[i] = control_flow_ops.ZerosLikeOutsideLoop(op, i)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1477, in ZerosLikeOutsideLoop
return array_ops.zeros(zeros_shape, dtype=val.dtype)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1570, in zeros
output = fill(shape, constant(zero, dtype=dtype), name=name)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1713, in fill
"Fill", dims=dims, value=value, name=name)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/home/iamshg8/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1024,2,128,5000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: training/Adam/gradients/zeros_4 = Fill[T=DT_FLOAT, _class=["loc:@conv1/Relu"], index_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/Shape_5, training/Adam/gradients/zeros_4/Const)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: loss/mul/_129 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_845_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
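For context, the size of the failed allocation follows directly from the tensor shape in the error message. A quick sanity check of my own (this arithmetic is not part of the program output), assuming 4-byte float32 elements:

# shape [1024, 2, 128, 5000], float32 = 4 bytes per element
print(1024 * 2 * 128 * 5000 * 4)   # 5242880000 bytes, i.e. ~4.88 GiB

That 5242880000 matches the MaxAllocSize in the stats, and two chunks of exactly that size are already resident (the 9.77GiB bin), so a further ~4.88GiB request cannot fit under the allocator's limit of 11272650752 bytes (~10.5GiB).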
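As for the repeated Hint lines, my understanding (an assumption on my part; I have not run this) is that the suggested flag would be attached like this in plain TensorFlow 1.x:

import tensorflow as tf

# RunOptions carrying the flag suggested by the Hint above
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

x = tf.zeros([2, 2])
with tf.Session() as sess:
    # options= is a standard keyword argument of Session.run
    print(sess.run(x, options=run_options))

Since the traceback shows Keras forwarding **self.session_kwargs into session.run, I would guess that passing options=run_options as an extra keyword argument to model.compile() reaches the same place, but I have not verified that.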
Then I wanted to send the error information to a file using the command below, but there was no Resource Exhausted error information in the file opt_CNN2.dense.error:
python3 opt_CNN2_dense.py > opt_CNN2.dense.inf.txt 2> opt_CNN2.dense.error
The contents of the file opt_CNN2.dense.error were:
2018-04-07 11:21:47.313671: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-07 11:21:47.410104: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-07 11:21:47.410530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-04-07 11:21:47.410551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-04-07 11:21:47.704597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10750 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
What I can't understand is: where did the Resource Exhausted error information go?
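For what it's worth, Python writes tracebacks to stderr, and TensorFlow writes its log lines there too, so 2> should normally capture both. A minimal test script (hypothetical, not from the runs above) to confirm that the redirection itself splits the streams as expected:

# streams_test.py -- hypothetical check of stdout/stderr splitting
import sys

print("this line goes to stdout")
print("this line goes to stderr", file=sys.stderr)
raise RuntimeError("tracebacks also go to stderr")

Running python3 streams_test.py > out.txt 2> err.txt should leave the stderr line plus the traceback in err.txt and only the stdout line in out.txt.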
Update:
And I can be sure the program hit the error the second time as well, because inf.txt was empty (see the last line below):
iamshg8@instance-1:~$ !cat
cat rml/py/inf.error
2018-04-07 11:21:47.313671: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-07 11:21:47.410104: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-07 11:21:47.410530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-04-07 11:21:47.410551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-04-07 11:21:47.704597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10750 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
iamshg8@instance-1:~$ cat rml/py/inf.txt
iamshg8@instance-1:~$
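One diagnostic I could still try (my own idea, not something from the runs above) is forcing unbuffered output and merging both streams into one file, to rule out buffered output being lost when the process dies:

python3 -u opt_CNN2_dense.py > opt_CNN2.dense.all.txt 2>&1

With 2>&1 everything arrives in a single file in write order, so if the traceback still never appears there, the second run genuinely did not print it.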