Karl Lok - CSDN Blog

Original: Compiling TensorFlow 2.2.0 (GPU version) from source on CentOS 7

Pull the image: `sudo docker pull centos:7.2.1511`
Run the image: `sudo docker run -itd --gpus all --privileged --network host --cap-add=SYS_ADMIN --cap-add=SYS_PTRACE --security-opt seccomp:unconfined $(find /dev/ -regex ".*/nvidia$1$$" | awk '{print " --device "$0}') $(f.`

2020-09-05 12:29:00 184

Original: Installing PyTorch 1.4.0 and Apex for a specific CUDA version with conda on CentOS 7

## Goal

Install PyTorch 1.4.0 with CUDA 10.0 support, plus Apex.

#### 1. Install Conda

```bash
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2019.03-Linux-x86_64.sh
```

or

```bash
# -o names the output file; the URL is truncated in this excerpt
curl -o Anaconda3-2019.03-Linux-x86_64.sh \
  https://mirrors.tuna.tsinghua.ed.
```

2020-08-28 22:15:15 1106

Original: The most complete performance comparison charts for the NVIDIA Jetson edge computing series

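Charts of Jetson hardware and performance up to and including Xavier, useful as a reference for technology selection. P.S. For any GPU-related technical questions, feel free to add me on WeChat (Karl34787333) to discuss.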

2020-06-06 17:04:54 3425

Original: A hands-on guide to viewing autonomous-driving LiDAR point clouds with Docker: the loam_velodyne point cloud visualization tool

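Goal: Deploying ROS on an existing Ubuntu system is time-consuming and prone to conflicts, and installing ROS on Ubuntu 16.04 runs into conflicts between the ROS dependency packages and catkin_make. Deploying ROS inside a Docker container leaves the outer machine unaffected and avoids package conflicts, while the container's output is displayed directly on the host. Result screenshot below. loam_velodyne project address. Install, deploy, and run: first, on the hos...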

2019-08-30 15:24:42 1200 3

NVIDIA A100 Customer Deck.pdf

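NVIDIA A100 product material and technical specifications. A100: the largest die and the strongest performance. How large is the improvement? Recall the Volta-architecture Tesla V100, launched three years ago and still industry-leading: at 300 W it delivers 7.8 TFLOPS (its peak double-precision figure) and has 21 billion transistors. NVIDIA puts the A100's compute at up to 20 times that of its predecessor. "The A100 is the largest 7 nm chip ever made," said Jensen Huang. Built on TSMC's 7 nm process, currently the most advanced available, the A100 has 54 billion transistors; it is a 3D-stacked chip with a die area of 826 mm^2 and a maximum GPU power of 400 W. The GPU carries 40 GB of Samsung HBM2 memory (much faster than DDR5, but expensive) and third-generation Tensor Cores. Multi-GPU scaling has also improved substantially: the new NVLink offers 600 GB/s of bandwidth, nearly 10 times the speed of a PCIe interconnect.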
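The ratios quoted in this description can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text, plus one assumption that is not in the original: the PCIe baseline is taken to be a PCIe 4.0 x16 link counted in both directions (16 GT/s per lane, 128b/130b encoding). All constant and function names are illustrative.

```python
# Figures quoted in the description.
V100_TRANSISTORS_BN = 21.0   # billions of transistors
A100_TRANSISTORS_BN = 54.0   # billions of transistors
V100_POWER_W = 300.0
A100_POWER_W = 400.0
NVLINK_GBPS = 600.0          # GB/s, NVLink bandwidth quoted for the A100

# Assumption (not from the original): PCIe 4.0 x16, both directions.
PCIE4_GT_PER_LANE = 16.0         # giga-transfers per second per lane
PCIE4_ENCODING = 128.0 / 130.0   # 128b/130b line-encoding efficiency
PCIE_LANES = 16


def pcie4_x16_bidirectional_gbps() -> float:
    """Theoretical PCIe 4.0 x16 bandwidth in GB/s, summed over both directions."""
    per_direction = PCIE4_GT_PER_LANE * PCIE4_ENCODING * PCIE_LANES / 8.0
    return 2.0 * per_direction


transistor_ratio = A100_TRANSISTORS_BN / V100_TRANSISTORS_BN
power_ratio = A100_POWER_W / V100_POWER_W
nvlink_vs_pcie = NVLINK_GBPS / pcie4_x16_bidirectional_gbps()

print(f"transistors: {transistor_ratio:.2f}x")
print(f"max power:   {power_ratio:.2f}x")
print(f"NVLink/PCIe: {nvlink_vs_pcie:.1f}x")
```

Under that assumption the NVLink figure comes out at roughly 9.5 times the PCIe link, which is consistent with the "nearly 10 times" in the text. A different PCIe generation or a one-direction figure would change the ratio.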

2020-07-08

Fundamentals of CUDA Code Performance Optimization on NVIDIA GPUs

Fundamental Optimizations in CUDA. Optimization overview: GPU architecture; kernel optimization (memory optimization, latency optimization, instruction optimization); CPU-GPU interaction optimization (overlapped execution using streams).

2020-06-06

Introduction to the Features of NVIDIA's 2020 Ampere Architecture GPUs

NVIDIA A100 Tensor Core GPU Architecture: Unprecedented Acceleration at Every Scale

Introduction

The diversity of compute-intensive applications running in modern cloud data centers has driven the explosion of NVIDIA GPU-accelerated cloud computing. Such intensive applications include AI deep learning training and inference, data analytics, scientific computing, genomics, edge video analytics and 5G services, graphics rendering, cloud gaming, and many more. From scaling up AI training and scientific computing, to scaling out inference applications, to enabling real-time conversational AI, NVIDIA GPUs provide the necessary horsepower to accelerate numerous complex and unpredictable workloads running in today’s cloud data centers.

NVIDIA® GPUs are the leading computational engines powering the AI revolution, providing tremendous speedups for AI training and inference workloads. In addition, NVIDIA GPUs accelerate many types of HPC and data analytics applications and systems, allowing customers to effectively analyze, visualize, and turn data into insights. NVIDIA’s accelerated computing platforms are central to many of the world’s most important and fastest-growing industries.

HPC has grown beyond supercomputers running computationally intensive applications such as weather forecasting, oil & gas exploration, and financial modeling. Today, millions of NVIDIA GPUs are accelerating many types of HPC applications running in cloud data centers, servers, systems at the edge, and even deskside workstations, servicing hundreds of industries and scientific domains.

AI networks continue to grow in size, complexity, and diversity, and the usage of AI-based applications and services is rapidly expanding. NVIDIA GPUs accelerate numerous AI systems and applications including: deep learning recommendation systems, autonomous machines (self-driving cars, factory robots, etc.), natural language processing (conversational AI, real-time language translation, etc.), smart city video analytics, software-defined 5G networks (that can deliver AI-based services at the edge), molecular simulations, drone control, medical image analysis, and more.

2020-06-06

Multi-GPU Training with NCCL

Multi-GPU deep learning training with NCCL, covering multi-node multi-GPU and single-node multi-GPU setups. Optimized inter-GPU communication for DL and HPC. Optimized for all NVIDIA platforms, most OEMs, and cloud. Scales to 100s of GPUs, targeting 10,000s in the near future. Aims at covering all communication needs for multi-GPU computing. Only relies on CUDA. No dependency on MPI or any parallel environment.
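The collective at the heart of multi-GPU training is all-reduce: every GPU contributes a gradient buffer, and every GPU ends up with the elementwise sum. The sketch below simulates a ring all-reduce in plain Python to illustrate the idea. It is not NCCL code and calls no NCCL API; the function name `ring_all_reduce` and the list-of-lists stand-in for per-GPU buffers are hypothetical, and a ring is only one of the algorithms a library such as NCCL may use.

```python
from typing import List


def ring_all_reduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over a ring of len(buffers) ranks.

    Each inner list stands in for one GPU's buffer. To keep the sketch
    short, the buffer length must equal the number of ranks, so every
    chunk is a single element.
    """
    n = len(buffers)
    if any(len(b) != n for b in buffers):
        raise ValueError("this sketch needs buffer length == number of ranks")
    data = [list(b) for b in buffers]  # work on copies, leave inputs intact

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) % n to
    # its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        sends = [(r, (r - step) % n) for r in range(n)]
        values = [data[r][chunk] for r, chunk in sends]  # snapshot before updating
        for (r, chunk), value in zip(sends, values):
            data[(r + 1) % n][chunk] += value

    # Rank r now holds the complete sum for chunk (r + 1) % n.
    # Phase 2: all-gather. In step s, rank r forwards chunk (r + 1 - s) % n
    # and the right neighbour overwrites its copy with the finished value.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n) for r in range(n)]
        values = [data[r][chunk] for r, chunk in sends]
        for (r, chunk), value in zip(sends, values):
            data[(r + 1) % n][chunk] = value

    return data


# Three simulated GPUs, each holding a three-element gradient buffer.
grads = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
reduced = ring_all_reduce(grads)
print(reduced)
```

Each rank sends 2(n - 1) chunks of 1/n of the buffer, so per-rank traffic stays close to twice the buffer size however many ranks join the ring. That property is why ring-style schedules are a common choice for large gradient buffers.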

2020-06-06