Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
252 views
in Technique by (71.8m points)

python - Keras/Tensorflow 2.3.0 Convolution model poor training performance

I have a TensorFlow/Keras 2.3.0 environment all set up with CUDA on an i7/GTX 1070 machine. I am doing successful U-Net/U-Net++ segmentation tasks on this setup, but with relatively small input sizes like 32x32 or 64x64.

My CUDA acceleration is actually doing its job, because I see a performance drop when I force CUDA off via "visible_devices".
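A minimal sketch of that check, using the standard `tf.config` API (available in TF 2.3); the idea is to hide all GPUs before building the model and compare throughput against a normal run:

```python
import tensorflow as tf

# List physical GPUs; on this machine it should show the GTX 1070.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs found:", gpus)

# Hide all GPUs from TensorFlow *before* any model is built,
# forcing subsequent ops onto the CPU for a timing comparison.
tf.config.set_visible_devices([], "GPU")
print("Visible GPUs:", tf.config.get_visible_devices("GPU"))  # now []
```

Note that `set_visible_devices` must be called before the GPU is initialized (i.e., before any op runs on it), or it raises a `RuntimeError`.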

I also had the TensorBoard profiler run over my model, and it is not input bound. I am using Keras DataGenerators with threaded input-data preparation, and the overall score in TensorBoard is good.
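For reference, a minimal sketch of how such a profiler run can be wired up with the built-in TensorBoard callback (`profile_batch` is the real parameter; the log directory name is a placeholder):

```python
import tensorflow as tf

# Profile batch 5 of the first epoch; the trace shows up under the
# "Profile" tab in TensorBoard, including the input-pipeline analysis
# that reports whether training is input bound.
tb_callback = tf.keras.callbacks.TensorBoard(
    log_dir="logs/profile_run",  # placeholder path
    profile_batch=5,
)

# model.fit(train_generator, epochs=10, callbacks=[tb_callback])
```

The commented-out `fit` call is where the callback would attach in an actual training run.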

But some things make me suspicious:

  1. Low GPU utilization in the Windows 10 profiler - near 0%, but a good pile of GPU memory in use. Also semi-OK CPU utilization of 30-50%, which seems reasonable since data preparation runs on the CPU with multiple threads.
  2. No luck with huge models at 512x512 input size. I have seen a couple of articles on the internet where people train a 512x512 U-Net++ within an hour per epoch. (That really depends on how many samples are in an epoch; mine is hundreds of thousands of samples at ~1 s per sample on a large 512x512 U-Net++.)
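The per-epoch estimate in point 2 can be sanity-checked with simple arithmetic (the sample count and per-sample time below are the rough figures from the question):

```python
# Rough figures from the question: ~100,000 samples at ~1 s/sample on 512x512.
samples_per_epoch = 100_000
seconds_per_sample = 1.0

epoch_hours = samples_per_epoch * seconds_per_sample / 3600
print(f"~{epoch_hours:.1f} hours per epoch")  # ~27.8 hours per epoch

# Throughput needed to match the "1 hour/epoch" articles at the same epoch size:
implied_sps = samples_per_epoch / 3600
print(f"~{implied_sps:.0f} samples/s needed for a 1-hour epoch")
```

So the gap to the articles is roughly a factor of 30, which could come from a smaller epoch size, faster hardware, or a faster per-sample pipeline rather than from a single misconfiguration.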

So, am I doing something wrong? Or is this totally okay and the 1070 really is that slow? Any advice on how to succeed with large 512x512 inputs?



1 Answer

0 votes
by (71.8m points)
Waiting for an expert to reply.
