Repost: 4.2 English Tokenization and Part-of-Speech Tagging

Reposted from: https://datartisan.gitbooks.io/begining-text-mining-with-python/content/%E7%AC%AC4%E7%AB%A0%20%E5%88%86%E8%AF%8D%E4%B8%8E%E8%AF%8D%E6%80%A7%E6%A0%87%E6%B3%A8/4.2%20%E8%8B%B1%E6%96%87%E5%88%86%E8%A...

2018-02-22 11:36:03 17098

Original: [English Tokenization] Problems Encountered When Tokenizing English

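I had always worked on Chinese natural language processing, but recently my job required NLP on English text. I assumed that if I could handle Chinese NLP, which is comparatively harder, English would be a piece of cake :-) It turns out that NLP differs considerably across language families. Compared with Latin-script languages, Chinese is more involved: pinyin makes up the sound of each character, characters form words, and words form sentences and texts. English and other Latin-script languages, by contrast, build words from letters, and words directly form sentences and texts. Although this appears to skip the character-to-word step that Chinese needs, it is not simply...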

2018-02-22 11:34:14 2982 3
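The excerpt above notes that English words are made of letters and separated by spaces, yet tokenization is still not trivial. As an illustration only (not the author's method), here is a minimal regex-based tokenizer, assuming Penn Treebank-style rules: punctuation becomes its own token and contractions are split ("don't" becomes "do" + "n't"). The function name `tokenize` and the exact rules are hypothetical; only the ASCII apostrophe is handled.

```python
import re
from typing import List

# Hypothetical rules for illustration; real tokenizers handle many more cases
# (abbreviations such as "e.g.", URLs, curly apostrophes, etc.).
_TOKEN = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*|\d+(?:\.\d+)?|[^\sA-Za-z\d]")
_CONTRACTION = re.compile(r"(?i)^(.+?)(n't|'s|'re|'ve|'ll|'d|'m)$")

def tokenize(text: str) -> List[str]:
    """Split text into words, numbers and punctuation, then split off contractions."""
    tokens: List[str] = []
    for tok in _TOKEN.findall(text):
        m = _CONTRACTION.match(tok)
        if m:
            # e.g. "it's" -> "it", "'s"; "don't" -> "do", "n't"
            tokens.extend([m.group(1), m.group(2)])
        else:
            tokens.append(tok)
    return tokens

print(tokenize("I don't think it's easy."))
```

Splitting on whitespace alone would yield "easy." and "don't" as single tokens; the sketch shows why even space-delimited English needs explicit rules for punctuation and clitics.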

Original: Notes on Common Docker Commands

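docker build it POSTS NAMES local_dir
docker build -t docker-registry.journeyend.com/journeyend/test:1.0 .
Creates a new image by packaging the commands needed to install, configure, and run it, and stores the image in the local Docker repository.
docker push PORTS NAEMS
docker push docker-registry.jour...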

2018-02-22 11:19:09 458
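The tag given to `docker build -t` determines where `docker push` sends the image. A minimal sketch of how a reference such as docker-registry.journeyend.com/journeyend/test:1.0 splits into registry, repository, and tag, assuming simplified rules: the first path component is treated as a registry host when it contains "." or ":" or equals "localhost"; otherwise Docker Hub (docker.io, with a "library/" prefix for single-component names) is assumed, and a missing tag defaults to "latest". Digest references (name@sha256:...) are deliberately not handled. `parse_image_ref` is a hypothetical helper, not Docker's own parser.

```python
from typing import NamedTuple

class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str

def parse_image_ref(ref: str) -> ImageRef:
    """Simplified image-reference parser (hypothetical; no digest support)."""
    if "@" in ref:
        raise ValueError("digest references are not handled in this sketch")
    first, sep, rest = ref.partition("/")
    # A first component that looks like a host name is treated as the registry.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", ref
    # A ':' after the last '/' separates the tag from the repository.
    slash, colon = remainder.rfind("/"), remainder.rfind(":")
    if colon > slash:
        repository, tag = remainder[:colon], remainder[colon + 1:]
    else:
        repository, tag = remainder, "latest"
    # Official Docker Hub images live under the "library/" namespace.
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository
    return ImageRef(registry, repository, tag)

print(parse_image_ref("docker-registry.journeyend.com/journeyend/test:1.0"))
```

For the tag used in the notes, the registry part is docker-registry.journeyend.com, so a push with that tag targets the private registry rather than Docker Hub.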

End-to-End Task-Completion Neural Dialogue Systems

One of the major drawbacks of modularized task-completion dialogue systems is that each module is trained individually, which presents several challenges. For example, downstream modules are affected by earlier modules, and the performance of the entire system is not robust to the accumulated errors. This paper presents a novel end-to-end learning framework for task-completion dialogue systems to tackle such issues. Our neural dialogue system can directly interact with a structured database to assist users in accessing information and accomplishing certain tasks. The reinforcement learning based dialogue manager offers robust capabilities to handle noises caused by other components of the dialogue system. Our experiments in a movie-ticket booking domain show that our end-to-end system not only outperforms modularized dialogue system baselines for both objective and subjective evaluation, but also is robust to noises as demonstrated by several systematic experiments with different error granularity and rates specific to the language understanding module.

2018-09-10
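The abstract's point that downstream modules inherit errors from earlier ones can be made concrete with simple arithmetic. This sketch assumes, purely for illustration and not as a model from the paper, that module errors are independent and that any single error breaks the final result; the end-to-end success probability is then the product of the per-module accuracies. The accuracies and function names are hypothetical.

```python
import math
import random
from typing import Sequence

def pipeline_success(accuracies: Sequence[float]) -> float:
    """Closed form under the assumptions: product of per-module accuracies."""
    return math.prod(accuracies)

def simulate_pipeline(accuracies: Sequence[float], trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity under the same assumptions."""
    rng = random.Random(seed)
    # A trial succeeds only if every module in the chain succeeds.
    successes = sum(all(rng.random() < p for p in accuracies) for _ in range(trials))
    return successes / trials

# Hypothetical accuracies for, e.g., language understanding, dialogue manager, generation.
modules = [0.9, 0.9, 0.9]
print(pipeline_success(modules))   # about 0.729
print(simulate_pipeline(modules))  # close to 0.729
```

Three modules that are each 90% accurate give only about 73% end-to-end success under these assumptions, which is the kind of accumulation the end-to-end framework is meant to be robust to.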

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

In this paper, we propose a novel method called Rotational Region CNN (R2CNN) for detecting arbitrary-oriented texts in natural scene images. The framework is based on Faster R-CNN [1] architecture. First, we use the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose the texts with different orientations. Second, for each axis-aligned text box proposed by RPN, we extract its pooled features with different pooled sizes and the concatenated features are used to simultaneously predict the text/non-text score, axis-aligned box and inclined minimum area box. At last, we use an inclined non-maximum suppression to get the detection results. Our approach achieves competitive results on text detection benchmarks: ICDAR 2015 and ICDAR 2013.

2018-09-10
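The last step in the abstract, inclined non-maximum suppression, keeps the highest-scoring box and discards others that overlap it too much, with overlap measured between rotated rectangles. A minimal sketch, assuming boxes are given as (cx, cy, w, h, angle_deg), a parameterization chosen here for illustration and not necessarily the paper's; the intersection is computed by Sutherland-Hodgman clipping of one convex polygon against the other, and areas by the shoelace formula. The 0.3 threshold and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float, float]  # (cx, cy, w, h, angle_deg), assumed format

def rect_corners(box: Box) -> List[Point]:
    """Corners of a rotated rectangle in counter-clockwise order."""
    cx, cy, w, h, angle_deg = box
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]

def signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise polygons."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0

def _side(a: Point, b: Point, p: Point) -> float:
    """Positive if p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def _cut(p: Point, q: Point, dp: float, dq: float) -> Point:
    """Point where segment p -> q crosses the clipping line."""
    t = dp / (dp - dq)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

def clip_convex(subject: Sequence[Point], clipper: Sequence[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of `subject` by the convex polygon `clipper`."""
    if signed_area(clipper) < 0:
        clipper = list(reversed(clipper))  # the inside test assumes a CCW clipper
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        points, output = output, []
        for j in range(len(points)):
            p, q = points[j - 1], points[j]
            dp, dq = _side(a, b, p), _side(a, b, q)
            if dq >= 0:
                if dp < 0:
                    output.append(_cut(p, q, dp, dq))  # entering the half-plane
                output.append(q)
            elif dp >= 0:
                output.append(_cut(p, q, dp, dq))      # leaving the half-plane
        if not output:
            break
    return output

def rotated_iou(a: Box, b: Box) -> float:
    pa, pb = rect_corners(a), rect_corners(b)
    inter_poly = clip_convex(pa, pb)
    inter = abs(signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(signed_area(pa)) + abs(signed_area(pb)) - inter
    return inter / union if union > 0 else 0.0

def inclined_nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.3) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if rotated_iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(50, 50, 40, 10, 30), (51, 50, 40, 10, 32), (150, 50, 40, 10, -20)]
print(inclined_nms(boxes, [0.9, 0.8, 0.7]))
```

Here the second box almost coincides with the first and is suppressed, while the distant third box is kept. Clipping works because rotated rectangles are always convex.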

HMM-based Script Identification for OCR

While current OCR systems are able to recognize text in an increasing number of scripts and languages, typically they still need to be told in advance what those scripts and languages are. We propose an approach that repurposes the same HMM-based system used f...

2018-09-10
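The abstract is cut off before it describes the method. One generic way to use HMMs to choose among candidate scripts, offered here as an assumption and not as the paper's approach, is to score an observation sequence under one HMM per script with the forward algorithm and pick the most likely script. The models, symbols, and probabilities below are invented for illustration; the computation is done in log space to avoid underflow on long sequences.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Mapping, Sequence

@dataclass
class HMM:
    start: Dict[str, float]             # P(first state = s)
    trans: Dict[str, Dict[str, float]]  # P(next state = s2 | state = s1)
    emit: Dict[str, Dict[str, float]]   # P(symbol = o | state = s)

def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf

def _logsumexp(values: List[float]) -> float:
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(hmm: HMM, obs: Sequence[str]) -> float:
    """log P(obs | hmm) computed with the forward algorithm."""
    if not obs:
        return 0.0
    states = list(hmm.start)
    # alpha[s] = log P(o_1..o_t, state_t = s)
    alpha = {s: _log(hmm.start[s]) + _log(hmm.emit[s].get(obs[0], 0.0)) for s in states}
    for o in obs[1:]:
        alpha = {
            s: _logsumexp([alpha[r] + _log(hmm.trans[r].get(s, 0.0)) for r in states])
            + _log(hmm.emit[s].get(o, 0.0))
            for s in states
        }
    return _logsumexp(list(alpha.values()))

def identify_script(models: Mapping[str, HMM], obs: Sequence[str]) -> str:
    """Return the name of the model under which the observations are most likely."""
    return max(models, key=lambda name: forward_log_likelihood(models[name], obs))

# Hypothetical single-state models over a tiny made-up symbol set.
MODELS = {
    "latin": HMM({"L": 1.0}, {"L": {"L": 1.0}}, {"L": {"a": 0.5, "b": 0.4, "α": 0.1}}),
    "greek": HMM({"G": 1.0}, {"G": {"G": 1.0}}, {"G": {"α": 0.5, "β": 0.4, "a": 0.1}}),
}
print(identify_script(MODELS, "abba"))
```

In a real OCR setting the observations would be image features rather than characters, and each script's HMM would have many states; the sketch only shows the likelihood comparison.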
