wujpbb7 - CSDN Blog

import torch
a = torch.randn(3,5)
print(a)
# The next line raises: IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [2], [3]
# b = a[[0,2],[1,3,4]]
# Change it to:
b = a[[0,2],:][:,[1,3,4]]
print(b)
The output is: tensor([[ 0.3627, -0.7073, -0.39...

2022-05-09 15:05:27 1516
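The fix in the snippet above can be checked with a small deterministic example. This is a sketch only: `torch.arange(15).reshape(3, 5)` stands in for the post's `torch.randn(3, 5)` so the values are predictable, and the names `rows`, `cols`, and `c` are illustrative additions, not from the original post. It also shows a single-step alternative that makes the two index tensors broadcastable.

```python
import torch

# Deterministic stand-in for torch.randn(3, 5) (assumption, for readable values).
a = torch.arange(15).reshape(3, 5)
rows = torch.tensor([0, 2])
cols = torch.tensor([1, 3, 4])

# a[rows, cols] would fail: index shapes [2] and [3] cannot be broadcast.

# Chained indexing, as in the post: select the rows first, then the columns.
b = a[rows, :][:, cols]

# Single-step alternative: reshape the row index to (2, 1) so it broadcasts
# against the column index of shape (3,) into a (2, 3) selection grid.
c = a[rows[:, None], cols]

print(b)
print(torch.equal(b, c))
```

Both forms pick the 2 x 3 sub-matrix at rows 0 and 2 and columns 1, 3, and 4.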

Original: [8] Assertion failed: dims.nbDims == 4 || dims.nbDims == 5

The following error appears when converting ONNX to TRT:
[04/22/2022-15:45:13] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/22/2022-15:45:13] [E] [TRT] (Unnamed

2022-04-22 16:18:55 1717

Original: enforce fail at inline_container.cc:222

Calling torch.load fails with "RuntimeError: [enforce fail at inline_container.cc:222] . file not found: archive/data/94479765723472" or "RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading file data/94100453134480: inval

2022-04-14 15:54:40 3443 3

Original: INT8 quantization and inference of ONNX models with TRT

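Contents
Generating the TRT model
1. Code used
2. ONNX model and images
3. Modifying the code
4. Results
Running inference with the TRT model

Generating the TRT model
1. Code used
https://github.com/rmccorm4/tensorrt-utils.git
2. ONNX model and images
Model: dynamic batch input (assume it is mob_w160_h160.onnx, with input [batchsize, 3, 160, 160]).
Images: a set of images (assume there are 1024); no other description files are needed.
In tensorrt-u...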

2022-03-27 22:55:41 8036 2

Original: Plotting the margin curves in ArcFace

The result is shown below:
The code is as follows:
from math import cos, sin, pi
import numpy as np
import matplotlib.pyplot as plt
'''
# https://github.com/deepinsight/insightface/blob/master/recognition/arcface_torch/losses.py
class ArcFace(torch.nn.Module):
    """ ArcFace (https:/..

2022-03-17 20:12:22 493
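Since the post's plotting code is cut off, here is a minimal sketch of the quantity such a plot usually shows: the target-class logit cos(theta + m) under ArcFace's additive angular margin, compared with the unmodified cos(theta). The function names, the margin value m = 0.5, and the 0 to 90 degree range are assumptions for illustration, not taken from the post; the sketch uses only the standard library and leaves the plotting itself out.

```python
from math import cos, pi


def target_logit(theta, m=0.5):
    """Target-class logit cos(theta + m) with additive angular margin m (radians)."""
    return cos(theta + m)


def margin_curve(m=0.5, steps=91):
    """Sample (theta in degrees, cos(theta), cos(theta + m)) for theta in [0, 90] degrees."""
    points = []
    for i in range(steps):
        theta = (pi / 2) * i / (steps - 1)
        points.append((theta * 180 / pi, cos(theta), target_logit(theta, m)))
    return points


curve = margin_curve()
for degrees, plain, with_margin in curve[::30]:
    print(f"{degrees:5.1f}  {plain:7.4f}  {with_margin:7.4f}")
```

The second and third columns can be passed to any plotting library; the margin curve lies below the plain cosine over the whole range, which is what makes the target class harder to satisfy during training.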

Original: Unable to determine the device handle for GPU 0000:02:00.0: GPU is lost.

On a TITAN X (Pascal) GPU, the "GPU is lost" error appears when the batch size is too large and GPU memory runs out.

2022-02-17 18:49:51 1010

Original: unhandled system error, NCCL version 2.7.8

A DDP-based PyTorch training program runs fine on the host, but running it inside Docker fails with "unhandled system error, NCCL version 2.7.8".
Solution: put NCCL_DEBUG=INFO in front of python -m torch.distributed.launch --nproc_per_node=4 ... and the log then shows:
s215:623:649 [3] include/shm.h:48 NCCL WARN Error while cr

2022-02-16 17:58:14 4255 1

Original: Installing K8S on two Ubuntu machines

References:
1. Installing k8s on Ubuntu
2. Error: The connection to the server localhost:8080 was refused - did you specify the right host or port?
3. Fix for "Connecting to raw.githubusercontent.com failed: Connection refused."
When installing flannel, use:
(python38) ai200@ubuntu16:/$ kubectl

2022-02-15 16:27:35 836

Original: Transferring large files between two Ubuntu machines

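Method 1: scp -c [email protected] usrname@ip:dir
Adding -c [email protected] can speed up the transfer.
Method 2: rsync -avP a.tar.gz usrname@ip:dir
References:
1. Why scp file transfers are slow on Linux
2. Why is scp so slow, and how can it be made faster? ...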

2022-01-25 10:38:36 1460

Original: Upgrading ubuntu14.04 to ubuntu16.04

Reference: Upgrading Ubuntu 16.04 LTS to 18.04 LTS | with a summary of problems
# Before the upgrade
(base) root@s215:~# cat /proc/version
Linux version 3.13.0-147-generic (buildd@lcy01-amd64-024) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.4) ) #196-Ubuntu SMP Wed May 2 15:51:34 UTC 2018
# After the upgrade
(bas

2022-01-24 16:21:20 967

Original: Errors during multi-node, multi-GPU training

Error 1: "NCCL WARN Connect to failed : Network is unreachable"
Solution: set the environment variable NCCL_SOCKET_IFNAME=enp (the prefix may be eno instead; check with ifconfig first).

2021-12-23 19:51:50 1682 1
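The NCCL_SOCKET_IFNAME setting above can also be applied from inside the training script. The following is a rough sketch, not the author's method: the helper `pick_interface` and its prefix list are hypothetical, and it assumes the variable is set before the process group is initialized so that NCCL reads it when it starts up.

```python
import os
import socket


def pick_interface(names, prefixes=("enp", "eno")):
    """Return the first interface name that starts with one of the prefixes, or None."""
    for name in names:
        if name.startswith(tuple(prefixes)):
            return name
    return None


# socket.if_nameindex() returns (index, name) pairs for the local network interfaces.
names = [name for _, name in socket.if_nameindex()]
iface = pick_interface(names)
if iface is not None:
    # Assumption: this runs before the distributed process group is created.
    os.environ["NCCL_SOCKET_IFNAME"] = iface
print(iface)
```

Listing the interfaces this way serves the same purpose as checking ifconfig by hand: it confirms whether the machine uses an enp or eno style name before the variable is set.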

Original: onnxruntime-gpu 1.7 warnings such as "Force fallback to CPU execution for node: Gather_191"

Running inference on an ONNX model with onnxruntime-gpu (ORT for short) 1.7 produces the following warnings:
2021-12-01 15:50:30.792327215 [W:onnxruntime:Default, fallback_cpu_capability.cc:135 GetCpuPreferredNodes] Force fallback to CPU execution for node: Gather_191
2021-12-01 15:50:30.792374122 [W:onnxruntime:De

2021-12-01 17:16:00 2092
