Accelerate PyTorch training with torch-ort
ONNX Runtime for PyTorch accelerates PyTorch model training using ONNX Runtime.
It is available via the torch-ort Python package.
This repository contains the source code for the package, as well as instructions for running the package.
To see torch-ort in action, visit https://github.com/microsoft/onnxruntime-training-examples, which shows how to train the most popular HuggingFace models.
LVIS: A Dataset for Large Vocabulary Instance Segmentation
Progress on object detection is enabled by datasets that focus the research community's attention on open challenges. This process led us from simple images to complex scenes and from bounding boxes to segmentation masks. In this work, we introduce LVIS (pronounced 'el-vis'): a new dataset for Large Vocabulary Instance Segmentation. We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images. Due to the Zipfian distribution of categories in natural images, LVIS naturally has a long tail of categories with few training samples. Given that state-of-the-art deep learning methods for object detection perform poorly in the low-sample regime, we believe that our dataset poses an important and exciting new scientific challenge. LVIS is available at this http URL.
WARNING: `pyenv init -` no longer sets PATH. Run `pyenv init` to see the necessary changes to make to your configuration.
The other day I ran pyenv update
and, after reloading .bashrc, got the following WARNING. The Python environments I had set up with pyenv were no longer usable.
WARNING: `pyenv init -` no longer sets PATH. Run `pyenv init` to see the necessary changes to make to your configuration.
In .bashrc, I changed the line
eval "$(pyenv init -)"
to
eval "$(pyenv init --path)"
and that resolved the problem.
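The same one-line change can be scripted. The following is a minimal sketch in Python, not part of pyenv itself: the function name patch_pyenv_init is hypothetical, and it only rewrites a line that exactly matches the old form, leaving everything else untouched.

```python
OLD_LINE = 'eval "$(pyenv init -)"'
NEW_LINE = 'eval "$(pyenv init --path)"'


def patch_pyenv_init(bashrc_text: str) -> str:
    """Return bashrc_text with the old pyenv init line replaced (hypothetical helper)."""
    patched = []
    for line in bashrc_text.splitlines(keepends=True):
        # Only touch lines whose content is exactly the old command.
        if line.strip() == OLD_LINE:
            line = line.replace(OLD_LINE, NEW_LINE)
        patched.append(line)
    return "".join(patched)


# Sample .bashrc content, made up for illustration.
sample = 'export PYENV_ROOT="$HOME/.pyenv"\neval "$(pyenv init -)"\n'
print(patch_pyenv_init(sample), end="")
```

To apply this to a real file, one would read ~/.bashrc, keep a backup copy, and write the patched text back.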
"Visual Studio Code is unable to watch for file changes in this large workspace" (error ENOSPC)
When you see this notification, it indicates that the VS Code file watcher is running out of handles because the workspace is large and contains many files. Before adjusting platform limits, make sure that potentially large folders, such as Python .venv, are added to the files.watcherExclude setting (more details below). The current limit can be viewed by running:
cat /proc/sys/fs/inotify/max_user_watches
The limit can be increased to its maximum by editing /etc/sysctl.conf (except on Arch Linux, read below) and adding this line to the end of the file:
fs.inotify.max_user_watches=524288
The new value can then be loaded by running sudo sysctl -p.
While 524,288 is the maximum number of files that can be watched, if you're in an environment that is particularly memory constrained, you may want to lower the number. Each file watch takes up 1080 bytes, so assuming that all 524,288 watches are consumed, that results in an upper bound of around 540 MiB.
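The memory bound quoted above can be checked with a few lines of Python. This is a minimal sketch: the 1080 bytes per watch figure is taken from the text, the helper names are hypothetical, and reading /proc/sys/fs/inotify/max_user_watches only works on Linux (the function returns None elsewhere).

```python
from pathlib import Path

BYTES_PER_WATCH = 1080  # per-watch cost quoted in the text above
MAX_WATCHES = 524_288   # value written to /etc/sysctl.conf above
LIMIT_FILE = Path("/proc/sys/fs/inotify/max_user_watches")


def watch_memory_mib(watches: int, bytes_per_watch: int = BYTES_PER_WATCH) -> float:
    """Upper bound on memory used if every watch is consumed, in MiB."""
    return watches * bytes_per_watch / (1024 * 1024)


def current_limit():
    """Return the current inotify watch limit, or None if it cannot be read."""
    try:
        return int(LIMIT_FILE.read_text().strip())
    except (OSError, ValueError):
        return None


print(f"{MAX_WATCHES} watches -> {watch_memory_mib(MAX_WATCHES):.0f} MiB")
limit = current_limit()
if limit is not None:
    print(f"current limit {limit} -> up to {watch_memory_mib(limit):.0f} MiB")
```

Passing a smaller watch count to watch_memory_mib gives the bound for a lowered limit on a memory-constrained machine.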
RuntimeError: CUDA error: no kernel image is available for execution on the device
PyTorch 1.8.0 has been released.
I updated right away:
pip install --upgrade torch torchvision torchaudio
After the update, I got the following error:
Traceback (most recent call last):
  File "/home/satoharu/benchmark_NN/classification/src/train.py", line 95, in <module>
    main(config, dset_config)
  File "/home/satoharu/benchmark_NN/classification/src/train.py", line 77, in main
    trainer.train(
  File "/home/satoharu/benchmark_NN/classification/src/trainers/BaseTrainer.py", line 27, in train
    self._train(
  File "/home/satoharu/benchmark_NN/classification/src/trainers/clf.py", line 78, in _train
    output = self.model(x)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/satoharu/benchmark_NN/classification/src/models/torchhub.py", line 41, in forward
    x = layer(x)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/satoharu/.cache/torch/hub/lukemelas_EfficientNet-PyTorch_master/efficientnet_pytorch/model.py", line 311, in forward
    x = self.extract_features(inputs)
  File "/home/satoharu/.cache/torch/hub/lukemelas_EfficientNet-PyTorch_master/efficientnet_pytorch/model.py", line 286, in extract_features
    x = self._swish(self._bn0(self._conv_stem(inputs)))
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/satoharu/.cache/torch/hub/lukemelas_EfficientNet-PyTorch_master/efficientnet_pytorch/utils.py", line 270, in forward
    x = self.static_padding(x)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/modules/padding.py", line 23, in forward
    return F.pad(input, self.padding, 'constant', self.value)
  File "/home/satoharu/.pyenv/versions/3.9.2_bench/lib/python3.9/site-packages/torch/nn/functional.py", line 3997, in _pad
    return _VF.constant_pad_nd(input, pad, value)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA error: no kernel image is available for execution on the device
????
With an RTX3090 + CUDA 11.2, install PyTorch with the following command:
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
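RTX2080Ti-class cards work fine with a CUDA 10 build of PyTorch even under CUDA 11, but that does not seem to be possible with the RTX 30 series.
It is a bit of a hassle.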
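The error above typically means the installed PyTorch binary contains no GPU code for the device's architecture. The sketch below illustrates that check under simplifying assumptions: the function names are hypothetical, architecture strings are assumed to be in the sm_XY / compute_XY form that torch.cuda.get_arch_list() returns, and the compatibility rule (same major version with an equal or lower minor, or any older PTX target) is a simplification. The two architecture lists are made up for illustration and are not the actual contents of any PyTorch wheel.

```python
def parse_arch(arch: str):
    """Split a string such as 'sm_86' or 'compute_80' into (kind, major, minor)."""
    kind, rest = arch.split("_", 1)
    digits = "".join(ch for ch in rest if ch.isdigit())
    return kind, int(digits[:-1]), int(digits[-1])


def build_supports_device(arch_list, capability) -> bool:
    """Simplified check: can a build with arch_list run on this device?"""
    dev_major, dev_minor = capability
    for arch in arch_list:
        kind, major, minor = parse_arch(arch)
        # Compiled code runs on the same major version with an equal or higher minor.
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        # PTX can be JIT-compiled for any equal or newer architecture.
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


RTX_3090 = (8, 6)     # compute capability 8.6
RTX_2080_TI = (7, 5)  # compute capability 7.5

# Illustrative architecture lists (assumptions, not real wheel contents).
older_build = ["sm_37", "sm_50", "sm_60", "sm_70"]
newer_build = older_build + ["sm_75", "sm_80", "sm_86"]

print(build_supports_device(older_build, RTX_2080_TI))
print(build_supports_device(older_build, RTX_3090))
print(build_supports_device(newer_build, RTX_3090))
```

With PyTorch installed and a GPU visible, the real inputs would come from torch.cuda.get_arch_list() and torch.cuda.get_device_capability(0).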
↓ A repository that makes it easy to run deep learning. github.com
Installing CUDA on Ubuntu 20.04
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
RTX3090 vs RTX2080ti
I changed the run command slightly since last time.
- RTX3090 (×4)
time python -m torch.distributed.launch --nproc_per_node 4 train.py --batch-size 64 --data coco.yaml --weights yolov5s.pt --epochs 1
Evaluating pycocotools mAP... saving runs/train/exp11/_predictions.json...
loading annotations into memory...
Done (t=0.38s)
creating index...
index created!
Loading and preparing results...
DONE (t=6.68s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=72.68s).
Accumulating evaluation results...
DONE (t=15.97s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.313
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.502
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.335
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.183
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.358
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.390
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.272
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.472
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.527
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.346
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.579
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.655

real 8m44.646s
user 82m46.021s
sys 4m21.447s
- RTX2080ti (×3) + Volta 100 (×1)
time python -m torch.distributed.launch --nproc_per_node 4 train.py --batch-size 64 --data coco.yaml --weights yolov5s.pt --epochs 1
Evaluating pycocotools mAP... saving runs/train/exp16/_predictions.json...
loading annotations into memory...
Done (t=0.41s)
creating index...
index created!
Loading and preparing results...
DONE (t=6.98s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=90.50s).
Accumulating evaluation results...
DONE (t=17.48s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.311
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.501
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.336
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.185
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.355
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.387
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.273
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.471
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.525
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.348
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.577
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.651

real 9m36.666s
user 84m36.826s
sys 8m9.428s
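To compare the two runs, the "real" wall-clock times reported by time can be converted to seconds and divided. The following is a small sketch with a hypothetical helper name; note that the times cover one epoch of training plus the evaluation shown above, so the ratio describes the whole run rather than training alone.

```python
import re


def parse_real_time(text: str) -> float:
    """Convert a bash `time` value such as '8m44.646s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# "real" values copied from the two runs above.
rtx3090_x4 = parse_real_time("8m44.646s")
mixed_rig = parse_real_time("9m36.666s")

speedup = mixed_rig / rtx3090_x4
print(f"RTX3090 x4:             {rtx3090_x4:.3f} s")
print(f"RTX2080ti x3 + Volta:   {mixed_rig:.3f} s")
print(f"speedup (RTX3090 rig):  {speedup:.2f}x")
```

On these numbers the RTX3090 machine finishes the run roughly 10% faster in wall-clock time.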