lan_12138 - CSDN Blog

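Original: Pitfalls Encountered When Installing PyTorch

Note: version matching is important! The error AttributeError: 'NoneType' object has no attribute 'origin' occurs because running pip install torch-sparse==<some version> and pip install torch-scatter==<some version> directly leaves the installed torch-geometric without the -cuda.so shared libraries. For details, see: https://github.com/pyg-team/pytorch_geometric/issues/2304 If you use the following…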

2022-01-04 20:16:24 1496
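The version-matching point in the entry above can be illustrated with a small helper. This is a minimal sketch, not the post's own fix: it assumes the prebuilt PyG wheels are served from a per-version index at data.pyg.org (as described in the PyG installation instructions; check the current documentation before relying on it). The function names and the version values `1.10.0` and `cu113` are hypothetical placeholders.

```python
def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Build the wheel index URL for one exact PyTorch/CUDA combination.

    Assumption: wheels are published under this URL pattern; verify it
    against the current PyG installation documentation.
    """
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


def pip_command(packages, torch_version: str, cuda_tag: str) -> list:
    """Return the pip argument list that installs `packages` from the matching index."""
    return ["pip", "install", *packages, "-f", pyg_wheel_index(torch_version, cuda_tag)]


# Hypothetical example values; substitute the versions actually installed.
cmd = pip_command(["torch-scatter", "torch-sparse"], "1.10.0", "cu113")
print(" ".join(cmd))
```

Pointing pip at an index that matches the installed PyTorch and CUDA versions is meant to select wheels that ship the compiled CUDA libraries, rather than a build that lacks them.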

Original: Driving Behavior Modeling Using Naturalistic Human Driving Data With Inverse Reinforcement Learning

Mathematical modeling. The state $\mathbf{s}_t \in \mathcal{S}$ that the driver observes at timestep $t$ consists of the positions, orientations, and velocities of itself and surrounding vehicles. The action $\mathbf{a}_t \in \mathcal{A}$ the driver takes is compos…

2021-10-11 19:30:43 412 3

Original: Maximum Entropy Inverse Reinforcement Learning

Inverse reinforcement learning: given a set of demonstrations from an expert, $D=\{\tau_i\}_{i=1}^n$, where $\tau_i = \{(s_{i1}, a_{i1}), (s_{i2}, a_{i2}), ..., (s_{i(n-1)}, a_{i(n-1)}), s_n\}$…

2021-08-17 15:13:51 2615

Original: Why Maximize Entropy?

Why maximize entropy? What does entropy represent? The entropy $\mathbf{H}(p)$ of some event probability distribution $p$ is defined as: $$\mathbf{H}(p) = -\sum_{x\in \mathcal{X}}p(x) \log_2 p(x) \tag{1}$$ where $\mathcal{X}$…

2021-08-02 17:05:06 629 1
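Equation (1) in the entry above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `entropy_bits` and the two example distributions are illustrative assumptions, not taken from the post.

```python
import math


def entropy_bits(probs):
    """Shannon entropy H(p) = -sum_x p(x) * log2 p(x), in bits.

    Terms with p(x) == 0 are skipped, following the usual convention
    that 0 * log 0 is treated as 0.
    """
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    return sum(-p * math.log2(p) for p in probs if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]  # four equally likely outcomes
skewed = [0.7, 0.1, 0.1, 0.1]       # one outcome dominates

print(entropy_bits(uniform))  # 2.0
print(entropy_bits(skewed))   # smaller than the uniform case
```

The uniform distribution over four outcomes gives exactly 2 bits, the maximum for four outcomes, and any less even distribution over the same outcomes gives a smaller value.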

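Original: Reproducing Classic Inverse Reinforcement Learning Algorithms (1)

**Preface:** This post reproduces the finite-state-space Grid World experiments from the paper "Algorithms for Inverse Reinforcement Learning", focusing on how to convert the nonlinear programming model into a linear programming model. Environment model: first, construct the Gridworld environment model. The code is as follows: import numpy as np import random import copy class MyGirdWorld(object): size = 5 reward_grid = np.zero…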

2021-07-27 16:32:59 1848

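Original: Inverse Reinforcement Learning Paper Notes (1)

Algorithms for Inverse Reinforcement Learning. Abstract: This paper addresses the problem of inverse reinforcement learning in Markov decision processes, that is, extracting a reward function from given observed, optimal behavior. IRL may help apprenticeship learning acquire skilled behavior, and may help determine the reward function being optimized by a natural system. We first characterize the set of reward functions for which a given policy is optimal, and then derive three IRL algorithms. The first two handle the case where the entire policy is known; we…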

2021-07-05 19:48:56 998

Original: ROS Tutorial Series (3)

Writing a simple publisher and subscriber in C++. Writing the publisher node: initialize the ROS system; advertise to the master that we are going to be publishing std_msgs/String messages on the chatter topic; loop while publishing messages to chatter 10 times a second. # in…

2020-09-07 14:52:41 133

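Original: ROS Tutorial Series (2)

Understanding ROS Topics. The turtlesim_node and turtle_teleop_key nodes communicate with each other over a ROS Topic. turtle_teleop_key publishes keystrokes on a Topic, and turtlesim subscribes to the same Topic to receive them. Let's use rqt_graph, which shows the nodes and Topics that are currently running. Using rqt_graph: rqt_graph creates a dynamic graph of what's going on…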

2020-09-07 10:22:22 413

Original: ROS Tutorial Series (1)

File system concepts. Packages: Packages are the software organization unit of ROS code. Each package can contain libraries, executables, scripts, or other artifacts. Manifests (package.xml): A manifest is a description of a package. It serves to define dependencies bet…

2020-09-01 20:45:40 289

lan_12138's Blog

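Original: C++ Primer Plus: Notes on Dealing with Data

Original: Introduction to Algorithms: Divide and Conquer

Original: C Primer Study Notes (1)

Repost: Notes on the Java Reflection Mechanism

Original: Algorithm Design Notes Series (1)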
