???/cy-CSDN Blog

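Even if you have learned very good features, you still need labeled data for fine-tuning when you apply them to a downstream task. The temperature hyperparameter of contrastive learning is designed as a learnable scalar, so it is optimized directly during training instead of being tuned by hand. Earlier self-supervised or unsupervised methods mainly studied feature learning; their goal was to learn features that generalize well. The supervision signal is now a piece of text rather than an n-choose-1 label, so the model's inputs and outputs have much more freedom. The resulting model is large, accurate, simple, and generalizes well, which paves the way for multimodal training.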
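A minimal sketch of the learnable-temperature idea, assuming a CLIP-style symmetric contrastive loss over already normalized embeddings. The names `clip_style_loss`, `grad_wrt_log_scale`, and `log_scale` are hypothetical, and the finite-difference gradient only stands in for the autograd a real training framework would use.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def clip_style_loss(image_emb, text_emb, log_scale):
    """Symmetric contrastive loss; log_scale is the learnable scalar.

    exp(log_scale) plays the role of 1 / temperature.
    """
    scale = math.exp(log_scale)
    n = len(image_emb)
    # Pairwise similarity logits: row i = image i against every text.
    logits = [[scale * sum(a * b for a, b in zip(img, txt)) for txt in text_emb]
              for img in image_emb]
    # Image-to-text: the matching text for image i is text i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: same logits, read column by column.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)

def grad_wrt_log_scale(image_emb, text_emb, log_scale, eps=1e-5):
    """Central finite difference (assumption: a stand-in for autograd)."""
    up = clip_style_loss(image_emb, text_emb, log_scale + eps)
    down = clip_style_loss(image_emb, text_emb, log_scale - eps)
    return (up - down) / (2 * eps)

# Toy batch: two perfectly matched, orthogonal image/text pairs.
emb = [[1.0, 0.0], [0.0, 1.0]]
log_scale = 0.0
for _ in range(3):
    # One plain gradient-descent step on the temperature parameter itself.
    log_scale -= 0.5 * grad_wrt_log_scale(emb, emb, log_scale)
print(log_scale, clip_style_loss(emb, emb, log_scale))
```

Because the temperature is updated by the same gradient signal as the other parameters, no separate hyperparameter search is needed for it in this sketch.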

2024-01-09 16:34:27 961

Original: The temperature parameter in Softmax

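In Softmax, the temperature adjusts how smooth (how "soft" or "hard") the model's output distribution is: a higher temperature flattens the distribution, while a lower temperature sharpens it toward the largest logit. Tuning the temperature therefore balances "soft" and "hard" outputs, which in turn affects the model's robustness and generalization.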
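The effect can be sketched in a few lines of plain Python. The function name `softmax_with_temperature` and the sample logits are illustrative assumptions, not taken from the post.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the largest logit ("hard"), and higher temperatures spread it more evenly ("soft").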

2024-01-09 14:53:42 929

Original: EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

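EVA is a pretraining task for a vanilla Transformer vision model: 50% of the patches of each training image are masked, and the model's task is to predict the image features of the masked patches. Through pretraining, the model learns to reconstruct the masked parts by relying on image-text alignment, which lets it capture the relationship between images and text. This pretraining task makes it possible to scale EVA efficiently to one billion parameters, giving a very large model that performs well on downstream tasks. In short, MIM pretraining scales up a CLIP-based pretrained model into the 1B-parameter EVA, which transfers very well to downstream tasks.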

2024-01-08 17:24:18 505

Original: Gradient checkpointing

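In deep learning, backpropagation is the key step in training a neural network, and it needs the intermediate results of the forward pass in order to compute gradients. Gradient checkpointing significantly reduces memory requirements by keeping only some of those intermediate results during the forward pass instead of all of them. Specifically, it stores checkpoints at selected points of the computation graph during the forward pass and recomputes the discarded intermediate results from them during backpropagation, so not every intermediate result has to stay in memory. The main advantage is that larger models or longer sequences can be trained within a limited memory budget, at the cost of extra computation for the recomputed parts.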
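The store-some, recompute-the-rest idea can be sketched on a chain of scalar layers. This is a toy illustration under stated assumptions: the layer list, the `every` spacing parameter, and all function names are hypothetical, and real frameworks apply the same idea to tensors and whole sub-networks.

```python
import math

def square(x):
    return x * x

def d_square(x):
    return 2 * x

def d_tanh(x):
    return 1 - math.tanh(x) ** 2

# Each layer is a pair (forward function, derivative w.r.t. its input).
LAYERS = [(math.tanh, d_tanh), (square, d_square)] * 3

def forward_checkpointed(x, layers, every):
    """Run the chain, keeping only the input of every `every`-th layer."""
    checkpoints = {0: x}
    for i, (f, _) in enumerate(layers):
        x = f(x)
        if (i + 1) % every == 0 and i + 1 < len(layers):
            checkpoints[i + 1] = x
    return x, checkpoints

def backward_checkpointed(layers, checkpoints):
    """Chain rule from the last segment to the first, recomputing inputs."""
    grad = 1.0
    end = len(layers)
    for start in sorted(checkpoints, reverse=True):
        # Recompute the inputs of layers[start:end] from the checkpoint.
        inputs = [checkpoints[start]]
        for f, _ in layers[start:end - 1]:
            inputs.append(f(inputs[-1]))
        for (_, df), inp in zip(reversed(layers[start:end]), reversed(inputs)):
            grad *= df(inp)
        end = start
    return grad

def backward_full(x, layers):
    """Reference: store every layer input, then apply the chain rule."""
    inputs = []
    for f, _ in layers:
        inputs.append(x)
        x = f(x)
    grad = 1.0
    for (_, df), inp in zip(reversed(layers), reversed(inputs)):
        grad *= df(inp)
    return grad

out, ckpts = forward_checkpointed(0.7, LAYERS, every=2)
print(len(ckpts), "stored values instead of", len(LAYERS))
print(backward_checkpointed(LAYERS, ckpts), backward_full(0.7, LAYERS))
```

Both backward passes give the same gradient; the checkpointed one holds fewer values between the passes but runs parts of the forward computation a second time.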

2024-01-08 16:36:02 420

Original: EVA-CLIP

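EVA-CLIP: a series of models that significantly improve the efficiency and effectiveness of CLIP training. By using the latest representation learning, optimization, and augmentation techniques, EVA-CLIP outperforms previous CLIP models with the same number of parameters while requiring fewer training resources. ImageNet-1k val results when CLIP training is initialized from pre-trained EVA.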

2024-01-08 13:55:19 567

Original: An overview of large models

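A dataset of 420B tokens lets the model understand and carry out downstream tasks through in-context learning, and it unifies almost every form of data: image/video, supervised/unsupervised, synthetic/real, 2D/3D/4D. Large model architecture, large parameter count, and large training data volume.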

2023-12-15 14:29:55 369

Original: Large weather models

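However, ERA5 includes precipitation, whereas HRES does not. The models we label "ERA5" take precipitation as an input and expect ERA5 data as input, while the models labeled "ERA5-HRES" do not take precipitation as an input and are trained specifically to take HRES-fc0 as input. This is because the mean squared error sums and averages the squared differences, giving a single metric for the model's predictive performance. This mean squared error is a weighted average over the vertical levels, so errors at different levels are handled in a more fine-grained way. By adjusting the value of 𝑁, the model's predictive performance can be evaluated over different time horizons, because 𝑁 controls the number of autoregressive steps.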

2023-11-22 14:49:55 1095

Original: Group Convolution / Depthwise Convolution: the killer technique for lightweight models

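As a model gets deeper, the number of channels keeps growing (more filters), while h and w keep shrinking because of pooling and stride. Group Convolution can also increase the number of output channels substantially without requiring a large amount of computation.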
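A short sketch of why grouping is cheap: each filter only sees `c_in / groups` input channels, so the weight count drops by the group factor. The helper name `conv_weight_count` and the channel sizes are illustrative assumptions.

```python
def conv_weight_count(c_in, c_out, k, groups=1):
    """Number of weights (no bias) in a k x k convolution with `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    # Every output filter is connected to only c_in / groups input channels.
    return c_out * (c_in // groups) * k * k

standard = conv_weight_count(128, 256, 3)               # ordinary convolution
grouped = conv_weight_count(128, 256, 3, groups=4)      # group convolution
depthwise = conv_weight_count(128, 128, 3, groups=128)  # one filter per channel

print(standard, grouped, depthwise)  # 294912 73728 1152
```

The multiply-add count at a fixed output resolution scales the same way, since it is the weight count multiplied by the number of output positions.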

2023-11-02 14:42:16 92 1

Original: Why TensorRT is fast: quantization + network structure optimization

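TensorRT helps you deploy a trained AI model to NVIDIA edge devices and quantize and accelerate it. TensorRT is built on CUDA and cuDNN. Think of CUDA as a workbench equipped with many tools, such as hammers and screwdrivers. cuDNN is a CUDA-based GPU acceleration library for deep learning; it is one such hammer. The CUDA workbench does not come with this hammer when you buy it. To run deep neural networks on CUDA you have to install cuDNN, which lets the GPU do deep neural network work much faster than a CPU.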

2023-11-02 11:18:52 162

Original: Videos

Contents: Video Classification; Early Fusion, Late Fusion, 3D CNN; Recognizing Actions from Motion; Next: introducing new techniques; Recap: a very large body of video work. Measuring Motion: Optical Flow; Se

2023-11-01 17:21:17 62

Original: Object Detection: some concepts you need to know

Note: use NMS (Non-Max Suppression) in the post-processing stage to remove overlapping boxes from the network output.
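A minimal sketch of greedy NMS, assuming boxes are given as `(x1, y1, x2, y2)` corners and that an IoU threshold of 0.5 is acceptable. The function names and the sample boxes are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU 81/119, about 0.68, so it is suppressed; the third box does not overlap anything and survives.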

2023-11-01 16:20:30 79

Original: Cross-Entropy Loss (multi-class loss function)

Contents: 1. Network output: score. 2. Cross-Entropy Loss (multi-class loss function). First use the softmax function to turn the scores into probabilities, then compute the loss with the cross-entropy loss function.
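The two steps, softmax followed by cross-entropy, can be sketched for a single sample as below. The function names and the example scores are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Negative log-probability of the correct class index `target`."""
    return -math.log(softmax(scores)[target])

scores = [3.2, 5.1, -1.7]   # network output for three classes
print(softmax(scores))
print(cross_entropy(scores, target=1))  # small: class 1 has the top score
print(cross_entropy(scores, target=2))  # large: class 2 has the lowest score
```

The loss is small when the correct class already has the highest score and grows as its probability shrinks.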

2023-11-01 11:24:35 1921 4

Original: How are Memory, Params, and FLOPs computed for a convolutional model?

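Number of output elements = C * H * W = 64 * 56 * 56 = 200704. Memory in KB = number of output elements * size of each element / 1024. Weight shape = c_out * (c_in * k * k) = 64 * 3 * 11 * 11. Bias = c_out = 64. FLOPs (multiply + add) = number of output elements per layer * operations per element = (c_out * H * W) * (c_in * K * K) = (64 * 56 * 56) * (3 * 11 * 11) = 72855552.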
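The arithmetic above can be checked with a small helper. The function name `conv_layer_stats` is hypothetical, and the 4 bytes per element (float32) is an assumption, since the excerpt does not state the element size.

```python
def conv_layer_stats(c_in, c_out, k, h_out, w_out, bytes_per_element=4):
    """Output size, memory, parameter count, and multiply-adds of one conv layer."""
    outputs = c_out * h_out * w_out
    weights = c_out * c_in * k * k
    biases = c_out
    return {
        "outputs": outputs,
        "memory_kb": outputs * bytes_per_element / 1024,  # assumes float32
        "weights": weights,
        "biases": biases,
        "params": weights + biases,
        # One multiply-add per weight connection for every output element.
        "flops": outputs * (c_in * k * k),
    }

stats = conv_layer_stats(c_in=3, c_out=64, k=11, h_out=56, w_out=56)
for name, value in stats.items():
    print(name, value)
```

For the layer in the excerpt this gives 200704 output elements, 23232 weights plus 64 biases, and 72855552 multiply-adds; under the float32 assumption the output takes 784 KB.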

2023-10-31 09:51:14 102

Original: Batch Normalization

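Batch normalization pulls the data back toward a standardized distribution (zero mean, unit variance). Most blocks in a neural network are matrix operations, and the values of a vector can keep growing as it passes through them, so for the stability of the network the values need to be pulled back in time. At test time batchnorm becomes a linear operator and can be fused with the preceding fully connected or convolutional layer. Specifically, when the input distribution of a layer changes, the layer has to keep adapting to the new distribution, which makes training unstable and also hurts convergence speed and performance. During training, the mean and variance are computed over each mini-batch.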
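A sketch of the test-time fusion, assuming a fully connected layer followed by batchnorm with fixed running statistics. All names and numbers are illustrative; the folded weights are `gamma / sqrt(var + eps)` times the original ones.

```python
import math

def linear(x, weight, bias):
    """y = W x + b for one input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def batchnorm_eval(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batchnorm using fixed running statistics."""
    return [g * (yi - mu) / math.sqrt(v + eps) + bt
            for yi, g, bt, mu, v in zip(y, gamma, beta, mean, var)]

def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batchnorm into the preceding linear layer."""
    folded_w, folded_b = [], []
    for row, b, g, bt, mu, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([scale * w for w in row])
        folded_b.append(scale * (b - mu) + bt)
    return folded_w, folded_b

W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.3]
mean, var = [0.4, -0.1], [2.0, 0.5]

x = [1.0, 2.0, 3.0]
two_step = batchnorm_eval(linear(x, W, b), gamma, beta, mean, var)
W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)
one_step = linear(x, W_f, b_f)
print(two_step, one_step)
```

The fused layer gives the same output with a single matrix multiplication, which is why inference engines commonly merge the two layers.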

2023-10-30 18:32:58 81

Original: Pruning | A systematic introduction to pruning (continuously updated)

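Scaling-based pruning is usually combined with a pruning threshold: the scaling factor of a weight is compared with the threshold, and if it falls below the threshold the corresponding weight is pruned. "Second-Order-based Pruning" is a neural network pruning technique that uses second-derivative information about the network's parameters to decide which parameters to prune. "Magnitude-based pruning" is a common neural network pruning technique that decides which parameters to prune based on the magnitude of the parameters (usually the weights), in order to reduce the size and complexity of the model. Typically, the proportion of zero-valued parameters is compared with a threshold.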
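Magnitude-based pruning is the simplest of the three to sketch: weights with the smallest absolute values are set to zero until a target sparsity is reached. The function name `magnitude_prune` and the `sparsity` parameter are illustrative assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights of a 2D weight matrix.

    sparsity is the fraction of weights to remove (0.0 to 1.0).
    Ties at the threshold are pruned too, so slightly more may be removed.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    magnitudes = sorted(abs(w) for row in weights for w in row)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return [list(row) for row in weights]
    threshold = magnitudes[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]

weights = [[0.1, -0.5], [0.03, 0.9]]
print(magnitude_prune(weights, sparsity=0.5))  # [[0.0, -0.5], [0.0, 0.9]]
```

In practice the pruned model is usually fine-tuned afterwards to recover accuracy; that step is outside this sketch.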

2023-10-26 18:13:35 399

Original: AI algorithm SDK

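Once memory has been allocated for the model's output, we can deserialize the serialized TensorRT engine and obtain the output for the image. This article describes the SDK workflow and its main classes.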

2023-10-19 21:12:31 97

Original: Sending an image to a server in C++

[Code] Sending an image to a server in C++.

2023-03-13 18:57:17 454 2

Original: list(zip(*out))

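# Declare a list
nums = [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
# Passing a list of lists as *nums unpacks it, so zip() pairs up the inner lists element by element
iters = zip(*nums)
# Print the type of the object returned by zip(*zipped)
# print("type of iters is %s" % type(iters))
# zip(*zipped) returns a zip object, so it has to be converted
# Here it is converted to a list
print(list(ite..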
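A complete, runnable version of the same idea is sketched below; the variable names `transposed` and `restored` are additions for illustration. Unpacking with `*` passes each inner list to `zip` as a separate argument, so the result is the transpose of the nested list.

```python
nums = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]

# zip(*nums) is the same call as zip(nums[0], nums[1]).
iters = zip(*nums)
print(type(iters))  # <class 'zip'>

# A zip object is a one-shot iterator, so materialize it as a list.
transposed = list(iters)
print(transposed)  # [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]

# Applying zip(*...) again undoes the transpose (as tuples, so convert back).
restored = [list(row) for row in zip(*transposed)]
print(restored == nums)  # True
```

This is the pattern behind `list(zip(*out))`: a list of per-sample tuples becomes one tuple per field.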

2021-12-26 14:51:56 585

Original: ETH pedestrian detection dataset: converting annotations to XML

import glob
import os
from shutil import copy
import xml.dom.minidom as minidom
with open("F:/pedestrian/seq03-img-left.idl") as file_obj:
    for line in file_obj:
        line_break = line.strip(";\n").split(":")
        img_name = line_break[0].strip(.

2021-10-22 10:03:22 451 1

Original: python|axis

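axis groups the elements of a matrix along one dimension. For an original matrix with shape=[3,4,5], a reduction along axis=0 gives a matrix with shape=[4,5]. A reduction along axis=1 gives shape=[3,5]. A reduction along axis=-1 (axis=2) gives shape=[3,4]. Mastering this helps you keep track of matrix shapes before and after transformations in neural networks or data operations...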
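The shape rule can be verified with a pure-Python sketch on nested lists. The helpers `shape` and `sum_axis` are hypothetical and handle only 3D input; NumPy's `sum(axis=...)` follows the same rule.

```python
def shape(x):
    """Shape of a regular nested list, e.g. [3, 4, 5]."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return dims

def sum_axis(t, axis):
    """Sum a 3D nested list along one axis; that axis disappears."""
    d0, d1, d2 = shape(t)
    axis %= 3  # lets axis=-1 mean the last axis
    if axis == 0:
        return [[sum(t[i][j][k] for i in range(d0)) for k in range(d2)]
                for j in range(d1)]
    if axis == 1:
        return [[sum(t[i][j][k] for j in range(d1)) for k in range(d2)]
                for i in range(d0)]
    return [[sum(t[i][j][k] for k in range(d2)) for j in range(d1)]
            for i in range(d0)]

tensor = [[[1] * 5 for _ in range(4)] for _ in range(3)]  # shape [3, 4, 5]
for axis in (0, 1, -1):
    print(axis, shape(sum_axis(tensor, axis)))
```

The axis passed to the reduction is the one removed from the shape, and each remaining element is the total over that axis.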

2021-07-28 10:03:14 257

Original: C++|Passing arguments by reference

Purpose: While pass by value is suitable in many cases, it has a couple of limitations. First, when passing a large struct or class to a function, pass by value will make a copy of the argument into the function parameter. In many cases, this is a needless per

2021-07-06 15:42:18 355

Original: C++|Const

Classes are an expanded concept of data structures: like data structures, they can contain data members, but they can also contain functions as members. class class_name { access_specifier_1: member1; access_specifier_2: member2; ...} ob...

2021-07-06 13:51:38 63

Original: C++|Other data types

Type aliases (typedef / using)
In C++, any valid type can be aliased so that it can be referred to with a different identifier.
// using new_type_name = existing_type;
using C = char;
using WORD = unsigned int;
using pChar = char *;
using field = char.

2021-06-28 15:47:48 161

Original: C++|Data structures

Data structures: A data structure is a group of data elements grouped together under one name. These data elements, known as members, can have different types and different lengths. Data structures can be declared in C++ using the following syntax: struc...

2021-06-25 20:09:28 90

Original: C++|Dynamic memory

In the programs seen in previous chapters, all memory needs were determined before program execution by defining the variables needed. But there may be cases where the memory needs of a program can only be determined during runtime. For example, when the m

2021-06-25 17:36:34 131

Original: C++|Pointer

Address-of operator (&): The address of a variable can be obtained by preceding the name of a variable with an ampersand sign (&), known as the address-of operator. For example: int *foo; foo = &myvar; // foo must be a pointer in order to store an address. This would assign t..

2021-06-25 16:35:44 105

Original: C++|Pointer arithmetics

*p++   // same as *(p++): increment pointer, and dereference unincremented address
*++p   // same as *(++p): increment pointer, and dereference incremented address
++*p   // same as ++(*p): dereference pointer, and increment the value it points to
(*p)++ .

2021-06-25 11:26:57 122

Original: C++|Array

A typical declaration for an array in C++ is: type name [elements]; where type is a valid type (such as int, float...), name is a valid identifier and the elements field (which is always enclosed in square brackets []), specifies the length of the array i...

2021-06-24 11:27:43 138

Original: C++|Namespaces, Storage classes

Namespaces allow us to group named entities that otherwise would have global scope into narrower scopes, giving them namespace scope. This allows organizing the elements of programs into different logical scopes referred to by names. namespace identifier...

2021-06-24 10:10:04 70

Original: C++|Templates

Defining a function template follows the same syntax as a regular function, except that it is preceded by the template keyword and a series of template parameters enclosed in angle-brackets <>: template <template-parameters> function-declarati..

2021-06-24 09:32:21 72

Original: C++|inline function, Declaring functions, Recursivity

inline function: Calling a function generally causes a certain overhead (stacking arguments, jumps, etc...), and thus for very short functions, it may be more efficient to simply insert the code of the function where it is called, instead of performing the

2021-06-23 16:17:13 100

Original: C++|Efficiency considerations and const references

string concatenate (string a, string b) { return a+b; } This function takes two strings as parameters (by value), and returns the result of concatenating them. By passing the arguments by value, the function forces a and b to be copies of the arguments...

2021-06-23 15:36:16 84
