
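Original: The Backpropagation Algorithm

Backpropagation algorithm. Conventions: first, we fix some mathematical notation. Bold lowercase letters denote column vectors, such as $\boldsymbol{x}, \boldsymbol{x}^i, \boldsymbol{\delta}$; bold uppercase letters denote matrices, such as $\boldsymbol{A}, \boldsymbol{W}, \boldsymbol{\Delta}$; regular-weight letters denote scalars or functions, such as $\alpha, i$ …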

2015-10-21 10:49:41 373

Graph-based Natural Language Processing and Information Retrieval

Graphs are ubiquitous. There is hardly any domain in which objects and their relations cannot be intuitively represented as nodes and edges in a graph. Graph theory is a well-studied sub-discipline of mathematics, with a large body of results and a large number of efficient algorithms that operate on graphs. Like many other disciplines, the fields of natural language processing (NLP) and information retrieval (IR) also deal with data that can be represented as a graph. In this light, it is somewhat surprising that only in recent years the applicability of graph-theoretical frameworks to language technology became apparent and increasingly found its way into publications in the field of computational linguistics. Using algorithms that take the overall graph structure of a problem into account, rather than characteristics of single objects or (unstructured) sets of objects, graph-based methods have been shown to improve a wide range of NLP tasks. In a short but comprehensive overview of the field of graph-based methods for NLP and IR, Rada Mihalcea and Dragomir Radev list an extensive number of techniques and examples from a wide range of research papers by a large number of authors. This book provides an excellent review of this research area, and serves both as an introduction and as a survey of current graph-based techniques in NLP and IR. Because the few existing surveys in this field concentrate on particular aspects, such as graph clustering (Lancichinetti and Fortunato 2009) or IR (Liu 2006), a textbook on the topic was very much needed and this book surely fills this gap. The book is organized in four parts and contains a total of nine chapters. The first part gives an introduction to notions of graph theory, and the second part covers natural and random networks. The third part is devoted to graph-based IR, and part IV covers graph-based NLP. 
Chapter 1 lays the groundwork for the remainder of the book by introducing all necessary concepts in graph theory, including the notation, graph properties, and graph representations. In the second chapter, a glimpse is offered into the plethora of graph-based algorithms that have been developed independently of applications in NLP and IR. Sacrificing depth for breadth, this chapter does a great job in touching on a wide variety of methods, including minimum spanning trees, shortest-path algorithms, cuts and flows, subgraph matching, dimensionality reduction, random walks, spreading activation, and more. Algorithms are explained concisely, using examples, pseudo-code, and/or illustrations, some of which are very well suited for classroom examples.
Network theory is presented in Chapter 3. The term network is here used to refer to naturally occurring relations, as opposed to graphs being generated by an automated process. After presenting the classical Erdős–Rényi random graph model and showing its inadequacy to model power-law degree distributions following Zipf's law, scale-free small-world networks are introduced. Further, several centrality measures, as well as other topics in network theory, are defined and exemplified.
Establishing the connection to NLP, Chapter 4 introduces networks constructed from natural language. Co-occurrence networks and syntactic dependency networks are examined quantitatively. Results on the structure of semantic networks such as WordNet are presented, as well as a range of similarity networks between lexical units. This chapter will surely inspire the reader to watch out for networks in his/her own data.
Chapter 5 turns to link analysis for the Web. The PageRank algorithm is described at length, variants for undirected and weighted graphs are introduced, and the algorithm's application to topic-sensitive analysis and query-dependent link analysis is discussed.
This chapter is the only one that touches on core IR, and this is also the only chapter with content that can be found in other textbooks (e.g., Liu 2011). Still, this chapter is an important prerequisite for the chapter on applications. However, it would have been possible to move the description of the algorithms to Chapter 2, omitting this part.
The topic of Chapter 6 is text clustering with graph-based methods, outlining the Fiedler method, the Kernighan–Lin method, min-cut clustering, betweenness, and random walk clustering. After defining measures on cluster quality for graphs, spectral and non-spectral graph clustering methods are briefly introduced. Most of the chapter is to be understood as a presentation of general graph clustering methods rather than their application to language. For this, some representative methods for different core ideas were selected.
Part IV on graph-based NLP contains the chapters probably most interesting to readers working in computational linguistics. In Chapter 7, graph-based methods for lexical semantics are presented, including detection of semantic classes, synonym detection using random walks on semantic networks, semantic distance on WordNet, and textual entailment using graph matching. Methods for word sense and name disambiguation with graph clustering and random walks are described. The chapter closes with graph-based methods for sentiment lexicon construction and subjectivity classification.
Graph-based methods for syntactic processing are presented in Chapter 8: an unsupervised part-of-speech tagging algorithm based on graph clustering, minimum spanning trees for dependency parsing, PP-attachment with random walks over syntactic co-occurrence graphs, and coreference resolution with graph cuts.
In the final chapter, many of the algorithms introduced in the previous chapters are applied to NLP applications as diverse as summarization, passage retrieval, keyword extraction, topic identification and segmentation, discourse, machine translation, cross-language IR, term weighting, and question answering.
As someone with a background in graph-based NLP, I enjoyed reading this book. The writing style is concise and clear, and the authors succeed in conveying the most important points from an incredibly large number of works, viewed from the graph-based perspective. I also liked the extensive use of examples: throughout, almost half of the space is used for figures and tables illustrating the methods, which some readers might perceive as unbalanced, however. With just under 200 pages and a topic as broad as this, it necessarily follows that many of the presented methods are exemplified and touched upon rather than discussed in great detail. Although this sometimes leads to the situation that some passages can only be understood with background knowledge, it is noteworthy that every chapter includes a section on further reading. In this way, the book serves as an entry point to a deeper engagement with graph-based methods for NLP and IR, and it encourages readers to see their NLP problem from a graph-based view.
For a future edition, however, I have a few wishes: It would be nice if the figures and examples were less detached from the text and explained more thoroughly. At times, it would be helpful to present deeper insights and to connect the methodologies, rather than just presenting them next to each other. Also, some of the definitions in Chapter 2 could be less confusing and structured better.
Because this book emphasizes graph-based aspects for language processing rather than aiming at exhaustively treating the numerous tasks that benefit from graph-based methods, it cannot replace a general introduction to NLP or IR: For students without prior knowledge in NLP and IR, a more guided and focused approach to the topic would be required. The target audience is, rather, NLP researchers and professionals who want to add the graph-based view to their arsenal of methods, and to become inspired by this rapidly growing research area. It is equally suited for people working in graph algorithms to learn about graphs in language as a field of application for their work. I will surely consult this volume in the future to supplement the preparation of lectures because of its comprehensive references and its richness in examples.

2014-02-22

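Algorithm Homework (Graduate University of the Chinese Academy of Sciences, Algorithm Design and Analysis)

Selected homework assignments from the Algorithm Design and Analysis course at the Graduate University of the Chinese Academy of Sciences.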

2012-03-18

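Image Restoration Program (mixed C and Matlab mex programming)

C source code for the total variation (TV) image restoration model, written as mixed C and Matlab code. The main computation is done in C; reading and writing images is done in Matlab. Includes the gradient descent flow method, Chambolle's dual method, the alternating half-quadratic algorithm, the linear fixed-point algorithm, and others.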

2011-05-20

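Converting C++ Comments to C Comments

Converts C++ single-line comments into C comments. When writing programs with VC2005 on Windows, a program written in C but using C++ comments still compiles, links, and runs. However, I found that many compilers do not support C++ single-line comments, and I did not want to change all the code by hand, so I wrote a program that automatically replaces C++ single-line comments with C-style comments. The archive contains two files: to_c_style_comment.c is the program itself, and cpptest.c is only a test file, a C source file containing C++ single-line comments. To compile: gcc -o to_c_style_comment to_c_style_comment.c. To test: ./to_c_style_comment cpptest.c or ./to_c_style_comment cpptest.c outputfile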
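The conversion described above can be sketched in a few lines. This is a minimal Python illustration of the idea, not the code of to_c_style_comment.c (which is not shown here); the function name `to_c_style_comments` is hypothetical. It assumes `\n` line endings and does not handle `//` comments continued with a trailing backslash.

```python
def to_c_style_comments(source: str) -> str:
    """Rewrite C++ `//` comments as C `/* ... */` comments (illustrative sketch)."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if ch in "\"'":
            # Copy a string or character literal verbatim, honoring backslash escapes,
            # so that text such as "http://..." is not mistaken for a comment.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(source[i:j])
            i = j
        elif two == "/*":
            # Copy an existing block comment unchanged.
            end = source.find("*/", i + 2)
            j = n if end == -1 else end + 2
            out.append(source[i:j])
            i = j
        elif two == "//":
            # Convert everything up to the end of the line.
            end = source.find("\n", i)
            j = n if end == -1 else end
            # Break up any "*/" inside the comment so it cannot close the new comment early.
            body = source[i + 2:j].rstrip().replace("*/", "* /")
            out.append("/*" + body + " */")
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, `to_c_style_comments("int x = 1; // counter\n")` yields `int x = 1; /* counter */` followed by the newline, while `//` inside string literals and existing block comments is left alone.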

2011-05-20

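Advanced Bash-Scripting Guide: Chinese and English Editions with Source Code

The Chinese and English editions of the Advanced Bash-Scripting Guide with source code. The guides are in HTML and PDF format; the source code is plain text (.sh suffix, one file per script).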

2010-10-15

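Java Program for Converting Between Binary, Decimal, and Hexadecimal

A GUI with three text boxes and one Exit button. When an integer in the corresponding base is typed into any one of the text boxes, the other two text boxes automatically convert it to their own bases and display the result.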
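The conversion logic behind the three text boxes can be sketched without the GUI. The following is a minimal Python illustration, not the uploaded Java program; the names `BASES` and `convert_all` are hypothetical.

```python
# Hypothetical mapping from text-box name to numeric base.
BASES = {"binary": 2, "decimal": 10, "hexadecimal": 16}


def convert_all(text: str, source: str) -> dict:
    """Parse `text` as an integer in the base named by `source` and
    return its representation in all three bases."""
    # int() raises ValueError if the text contains digits invalid for the base.
    value = int(text.strip(), BASES[source])
    return {
        "binary": format(value, "b"),
        "decimal": format(value, "d"),
        "hexadecimal": format(value, "X"),
    }
```

A GUI would call `convert_all` whenever one text box changes and write the other two entries of the result into the remaining boxes; a `ValueError` signals input that is not valid in the edited box's base.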

2010-05-15

C++ GUI Programming with Qt 4, Second Edition

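The best way to learn Qt is to read the official Qt book, C++ GUI Programming with Qt 4, Second Edition (ISBN 0-13-235416-0). From "Hello Qt" to advanced features (such as multithreading, 2D and 3D graphics, networking, item view classes, and XML), the book covers Qt programming thoroughly and in detail.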

2010-04-01

C++ Effective STL

A good choice for learning the C++ STL. This is the English edition, in PDF format.

2009-12-30

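make Manual in Chinese (including makefile syntax)

Chapters 1-5. This book gives a fairly complete account of the GNU make tool, covering its usage and syntax, with a focus on how to write a Makefile for a project. For a Linux programmer, using make and writing Makefiles are essential skills. Systematic, detailed Chinese-language material on make is scarce, so to support Chinese-speaking Linux users I spent more than 18 months of my spare time translating and organizing "info make" into this Chinese manual. It is not a pure translation: drawing on my own work experience, I analyze and explain some of GNU make's syntax and usage in more detail, and add some personal views and practical lessons. All examples in the book run correctly on systems that support version V3.8 of GNU make.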

2009-11-23

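Matlab Math Handbook

1. A math handbook written in a concise style that is easy to master; 2. Logically organized, progressing from simple to advanced, so beginners can quickly learn how to use MATLAB; 3. Commands and problems are easy to look up, giving readers ideas for solving practical problems; 4. Every command is discussed in detail; 5. Nearly every command comes with an easy-to-follow example; 6. The content is organized by area of mathematics.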

2009-09-26

Matlab Command Reference

A collection of commonly used Matlab commands in an .xls file, for easy lookup.

2009-09-26

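Typesetting a Beautiful Thesis with LaTeX

Word, 66KB. I have long felt it necessary to write an article like this, because in terms of format a thesis is more like a book and is typeset differently from an article: it has parts that an article lacks, such as a table of contents, and it usually needs headers and footers to make reading and lookup easier. Universities sometimes impose specific formatting requirements. Fudan's are very simple and in practice not strictly enforced, but a thesis is its author's own child and deserves to be cherished, and everyone wants theirs to look good.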

2009-09-26

LaTeX Typesetting Tutorial (a very good tutorial in my opinion, PDF format)

An excellent LaTeX manual

2009-09-26

A Non-Recursive Algorithm for the Ackermann Function

The Ackermann function
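The uploaded implementation is not shown here, so the following is only a sketch of one common way to evaluate the Ackermann function without recursion: an explicit stack holds the pending first arguments. It assumes the usual two-argument definition A(0, n) = n + 1, A(m, 0) = A(m - 1, 1), A(m, n) = A(m - 1, A(m, n - 1)).

```python
def ackermann(m: int, n: int) -> int:
    """Evaluate the two-argument Ackermann function with an explicit stack."""
    stack = [m]  # pending values of m whose second argument is the current n
    while stack:
        m = stack.pop()
        if m == 0:
            # A(0, n) = n + 1
            n += 1
        elif n == 0:
            # A(m, 0) = A(m - 1, 1)
            stack.append(m - 1)
            n = 1
        else:
            # A(m, n) = A(m - 1, A(m, n - 1)):
            # schedule the outer call, then evaluate the inner call first.
            stack.append(m - 1)
            stack.append(m)
            n -= 1
    return n
```

Because the function grows extremely fast, this is only practical for small arguments (for example m up to 3 with a small n).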

2009-09-23

Mathematical Modeling Paper Template (LaTeX version)

Paper writing in LaTeX

2009-09-22

C Function Reference (syntax-highlighted edition)

Examples for C functions

2009-09-22
