journeyend - CSDN Blog

Repost: 4.2 English Tokenization and Part-of-Speech Tagging

Reposted from: https://datartisan.gitbooks.io/begining-text-mining-with-python/content/%E7%AC%AC4%E7%AB%A0%20%E5%88%86%E8%AF%8D%E4%B8%8E%E8%AF%8D%E6%80%A7%E6%A0%87%E6%B3%A8/4.2%20%E8%8B%B1%E6%96%87%E5%88%86%E8%AF%8D%E5%88%86%E8%A...

2018-02-22 11:36:03 17098

Original: [English Tokenization] Problems Encountered in English Tokenization

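Until now I had worked only on Chinese natural language processing. Recently, for work, I needed to do English NLP, and I assumed that if I could handle the comparatively harder Chinese NLP, English would be a piece of cake :-) It turned out that NLP differs considerably across language families. Chinese is more layered than languages written in the Latin alphabet: pinyin forms the sound of each character, characters form words, and words form sentences and articles. In English and other Latin-alphabet languages, letters form words, and words directly form sentences and articles. Although this seems to skip the Chinese step from characters to words, it is not simply...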

2018-02-22 11:34:14 2982 3

Original: Notes on Common Docker Commands

docker build -t POSTS NAMES local_dir
docker build -t docker-registry.journeyend.com/journeyend/test:1.0 .
Creates a new image that packages the commands needed to install, configure, and run it, and stores it in the local Docker repository.
docker push PORTS NAMES
docker push docker-registry.jour...

2018-02-22 11:19:09 458

End-to-End Task-Completion Neural Dialogue Systems

One of the major drawbacks of modularized task-completion dialogue systems is that each module is trained individually, which presents several challenges. For example, downstream modules are affected by earlier modules, and the performance of the entire system is not robust to the accumulated errors. This paper presents a novel end-to-end learning framework for task-completion dialogue systems to tackle such issues. Our neural dialogue system can directly interact with a structured database to assist users in accessing information and accomplishing certain tasks. The reinforcement learning based dialogue manager offers robust capabilities to handle noises caused by other components of the dialogue system. Our experiments in a movie-ticket booking domain show that our end-to-end system not only outperforms modularized dialogue system baselines for both objective and subjective evaluation, but also is robust to noises as demonstrated by several systematic experiments with different error granularity and rates specific to the language understanding module.

2018-09-10

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

In this paper, we propose a novel method called Rotational Region CNN (R2CNN) for detecting arbitrary-oriented texts in natural scene images. The framework is based on Faster R-CNN [1] architecture. First, we use the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose the texts with different orientations. Second, for each axis-aligned text box proposed by RPN, we extract its pooled features with different pooled sizes and the concatenated features are used to simultaneously predict the text/non-text score, axis-aligned box and inclined minimum area box. At last, we use an inclined non-maximum suppression to get the detection results. Our approach achieves competitive results on text detection benchmarks: ICDAR 2015 and ICDAR 2013.

2018-09-10

HMM-based Script Identification for OCR

While current OCR systems are able to recognize text in an increasing number of scripts and languages, typically they still need to be told in advance what those scripts and languages are. We propose an approach that repurposes the same HMM-based system used f

2018-09-10
