Accelerate PyTorch training with torch-ort

ONNX Runtime for PyTorch accelerates PyTorch model training using ONNX Runtime.

It is available via the torch-ort Python package.

This repository contains the source code for the package, as well as instructions for running the package.

To see torch-ort in action, check out https://github.com/microsoft/onnxruntime-training-examples, which shows how to train the most popular HuggingFace models.
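
As a rough sketch of how the package is used (based on the project README; the toy model, optimizer, and data below are placeholders), wrapping an existing nn.Module in ORTModule is enough to route the forward and backward passes through ONNX Runtime:

```python
# Minimal sketch, assuming torch and torch-ort are installed
# (pip install torch-ort && python -m torch_ort.configure).
import torch
from torch_ort import ORTModule

# Any regular PyTorch model works; this tiny MLP is just a placeholder.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)
model = ORTModule(model)  # forward/backward now run through ONNX Runtime

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = torch.nn.CrossEntropyLoss()

x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))

# The training loop itself is plain PyTorch and stays unchanged.
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```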

LVIS: A Dataset for Large Vocabulary Instance Segmentation

Progress on object detection is enabled by datasets that focus the research community's attention on open challenges. This process led us from simple images to complex scenes and from bounding boxes to segmentation masks. In this work, we introduce LVIS (pronounced 'el-vis'): a new dataset for Large Vocabulary Instance Segmentation. We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images. Due to the Zipfian distribution of categories in natural images, LVIS naturally has a long tail of categories with few training samples. Given that state-of-the-art deep learning methods for object detection perform poorly in the low-sample regime, we believe that our dataset poses an important and exciting new scientific challenge. LVIS is available at https://www.lvisdataset.org.
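
As a sketch of how the annotations can be read (assuming the `lvis` Python API, `pip install lvis`, and a locally downloaded annotation file; the filename below is a placeholder):

```python
# Minimal sketch of browsing LVIS annotations with the lvis API.
from lvis import LVIS

lvis = LVIS("lvis_v1_val.json")  # hypothetical local annotation path

img_ids = lvis.get_img_ids()
cat_ids = lvis.get_cat_ids()
print(f"{len(img_ids)} images, {len(cat_ids)} categories")

# Instance annotations for one image; each one carries a category id and a
# COCO-style segmentation (polygon or RLE).
ann_ids = lvis.get_ann_ids(img_ids=img_ids[:1])
anns = lvis.load_anns(ann_ids)
print(len(anns), "instance annotations in the first image")
```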

WARNING: `pyenv init -` no longer sets PATH. Run `pyenv init` to see the necessary changes to make to your configuration.

The other day, after running pyenv update and re-sourcing .bashrc, the following WARNING appeared and the Python environments set up with pyenv stopped working.

WARNING: `pyenv init -` no longer sets PATH.
Run `pyenv init` to see the necessary changes to make to your configuration.

In .bashrc I had:

eval "$(pyenv init -)"

Changing it to

eval "$(pyenv init --path)"

resolved the issue.
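
After re-sourcing the shell, a quick way to confirm that the pyenv-managed interpreter is back on PATH (just a sanity check, not part of the pyenv docs):

```python
import sys

# Should point somewhere under ~/.pyenv/versions/ again.
print(sys.executable)
print(sys.version)
```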

"Visual Studio Code is unable to watch for file changes in this large workspace" (error ENOSPC)

"Visual Studio Code is unable to watch for file changes in this large workspace" (error ENOSPC)

When you see this notification, it indicates that the VS Code file watcher is running out of handles because the workspace is large and contains many files. Before adjusting platform limits, make sure that potentially large folders, such as a Python .venv, are added to the files.watcherExclude setting. The current limit can be viewed by running:

cat /proc/sys/fs/inotify/max_user_watches

The limit can be increased to its maximum by editing /etc/sysctl.conf (on Arch Linux, the setting goes in a file under /etc/sysctl.d/ instead) and adding this line to the end of the file:

fs.inotify.max_user_watches=524288

The new value can then be loaded in by running sudo sysctl -p.

While 524,288 is the maximum number of files that can be watched, if you're in an environment that is particularly memory constrained, you may want to lower the number. Each file watch takes up 1080 bytes, so assuming that all 524,288 watches are consumed, that results in an upper bound of around 540 MiB.
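
As a quick sanity check on that estimate (simple arithmetic, not part of the VS Code documentation):

```python
# Rough upper bound on inotify watch memory, assuming ~1080 bytes per watch.
max_watches = 524_288
bytes_per_watch = 1080
print(f"{max_watches * bytes_per_watch / 2**20:.0f} MiB")  # 540 MiB
```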

RuntimeError: CUDA error: no kernel image is available for execution on the device

PyTorch 1.8.0 has been released.

So I updated right away:

pip install --upgrade torch torchvision torchaudio

Then this happened:

Traceback (most recent call last):
  File "/home/satoharu/benchmark_NN/classification/src/train.py", line 95, in <module>
    main(config, dset_config)
  File "/home/satoharu/benchmark_NN/classification/src/train.py", line 77, in main
    trainer.train(
  File "/home/satoharu/benchmark_NN/classification/src/trainers/BaseTrainer.py", line 27, in train
    self._train(
  File "/home/satoharu/benchmark_NN/classification/src/trainers/clf.py", line 78, in _train
    output = self.model(x)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/satoharu/benchmark_NN/classification/src/models/torchhub.py", line 41, in forward
    x = layer(x)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/satoharu/.cache/torch/hub/lukemelas_EfficientNet-PyTorch_master/efficientnet_pytorch/model.py", line 311, in forward
    x = self.extract_features(inputs)
  File "/home/satoharu/.cache/torch/hub/lukemelas_EfficientNet-PyTorch_master/efficientnet_pytorch/model.py", line 286, in extract_features
    x = self._swish(self._bn0(self._conv_stem(inputs)))
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/satoharu/.cache/torch/hub/lukemelas_EfficientNet-PyTorch_master/efficientnet_pytorch/utils.py", line 270, in forward
    x = self.static_padding(x)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/padding.py", line 23, in forward
    return F.pad(input, self.padding, 'constant', self.value)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/functional.py", line 3997, in _pad
    return _VF.constant_pad_nd(input, pad, value)
RuntimeError: CUDA error: no kernel image is available for execution on the device

CUDA error: no kernel image is available for execution on the device????

With an RTX 3090 + CUDA 11.2, install PyTorch with the following command:

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

The RTX 2080 Ti generation runs fine on CUDA 10 builds of PyTorch even under a CUDA 11 driver, but the RTX 30 series apparently does not: Ampere cards have compute capability 8.6, and only the cu111 wheels ship kernels compiled for sm_86.

A bit of a hassle.
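
A quick way to check whether an installed wheel actually ships kernels for your GPU is to compare the device's compute capability with the architectures the build was compiled for (the values in the comments are just what you would expect on an RTX 3090 with a cu111 build):

```python
import torch

print(torch.__version__)                    # e.g. 1.8.0+cu111
print(torch.version.cuda)                   # CUDA version the wheel was built against
print(torch.cuda.get_device_capability(0))  # (8, 6) on an RTX 3090
print(torch.cuda.get_arch_list())           # must include 'sm_86' for the RTX 30 series
```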

↓ A repository that makes it easy to run deep learning: github.com

Installing CUDA on Ubuntu 20.04

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda

RTX3090 vs RTX2080ti

I changed the run command slightly from the previous post.

  • RTX3090(×4)
 time python  -m torch.distributed.launch --nproc_per_node 4 train.py --batch-size 64 --data coco.yaml --weights yolov5s.pt --epochs 1
Evaluating pycocotools mAP... saving runs/train/exp11/_predictions.json...
loading annotations into memory...
Done (t=0.38s)
creating index...
index created!
Loading and preparing results...
DONE (t=6.68s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=72.68s).
Accumulating evaluation results...
DONE (t=15.97s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.313
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.502
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.335
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.183
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.358
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.390
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.472
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.527
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.579
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.655

real    8m44.646s
user    82m46.021s
sys     4m21.447s
  • RTX2080ti(×3) + V100(×1)
time python -m torch.distributed.launch --nproc_per_node 4 train.py --batch-size 64 --data coco.yaml --weights yolov5s.pt --epochs 1
Evaluating pycocotools mAP... saving runs/train/exp16/_predictions.json...
loading annotations into memory...
Done (t=0.41s)
creating index...
index created!
Loading and preparing results...
DONE (t=6.98s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=90.50s).
Accumulating evaluation results...
DONE (t=17.48s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.311
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.501
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.336
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.185
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.355
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.387
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.273
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.471
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.525
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.348
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.577
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.651

real    9m36.666s
user    84m36.826s
sys     8m9.428s
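
For reference, a quick back-of-the-envelope comparison of the two `real` times above (simple arithmetic on the reported wall-clock values, not additional measurements):

```python
# Wall-clock times from the two runs above, in seconds.
rtx3090_4x = 8 * 60 + 44.646  # real 8m44.646s (4x RTX 3090)
mixed_4gpu = 9 * 60 + 36.666  # real 9m36.666s (3x RTX 2080 Ti + 1x V100)

print(f"RTX 3090 box is {mixed_4gpu / rtx3090_4x:.2f}x faster")          # ~1.10x
print(f"wall-clock saved: {(1 - rtx3090_4x / mixed_4gpu) * 100:.1f}%")   # ~9.0%
```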