parker_ace - CSDN Blog

[Original] Bloom filter: notes on key points

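Bloom filter overview: given a target set A and a set B to be tested, decide for each record in B whether it belongs to A. First, compute a feature value (signature) for every record in A and store all of these feature values in a set C. Then compute the feature value of each record in B in the same way and check whether it is present in C. The book Hadoop in Action gives an example, whose key points are explained here. Obtaining the feature value: the MD5 algorithm is used to compute the feature value, MessageDig…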
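The procedure above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the listing from Hadoop in Action:

- **Assumed API:** the JDK's `java.security.MessageDigest` provides MD5 here. The excerpt is cut off before naming the exact call.
- **Hypothetical names:** the class name `BloomFilterSketch` and the parameters `numBits` (m) and `numHashes` (k) are illustrative.
- **Assumed hashing scheme:** the 16-byte MD5 digest is cut into up to four 4-byte bit indexes.

A Bloom filter stores C as a bit array rather than as a set of full feature values. Each record sets k bits, and a lookup checks those same k bits. The result is that a test can return a false positive but never a false negative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;

/** Hypothetical sketch: the "set C" is a bit array; each record maps to k MD5-derived bit positions. */
public class BloomFilterSketch {
    private final BitSet bits;   // bit array standing in for set C
    private final int numBits;   // m: number of bits in the array
    private final int numHashes; // k: bit positions per record (this sketch supports 1..4)

    public BloomFilterSketch(int numBits, int numHashes) {
        if (numBits < 1 || numHashes < 1 || numHashes > 4) {
            // Assumption: one 16-byte MD5 digest yields at most four 4-byte indexes.
            throw new IllegalArgumentException("need numBits >= 1 and 1 <= numHashes <= 4");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Computes the record's "feature value" with MD5 and turns it into k bit indexes. */
    private int[] indexes(String record) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5")
                    .digest(record.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
        int[] result = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            // Pack 4 consecutive digest bytes into one int, then map it into [0, numBits).
            int h = ((digest[4 * i] & 0xff) << 24)
                  | ((digest[4 * i + 1] & 0xff) << 16)
                  | ((digest[4 * i + 2] & 0xff) << 8)
                  |  (digest[4 * i + 3] & 0xff);
            result[i] = Math.floorMod(h, numBits);
        }
        return result;
    }

    /** Adds a record from the target set A by setting its k bits. */
    public void add(String record) {
        for (int idx : indexes(record)) {
            bits.set(idx);
        }
    }

    /** Tests a record from set B: false means definitely not in A, true means probably in A. */
    public boolean mightContain(String record) {
        for (int idx : indexes(record)) {
            if (!bits.get(idx)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1 << 16, 4);
        for (String a : new String[] {"alice", "bob", "carol"}) {
            filter.add(a);
        }
        System.out.println(filter.mightContain("bob"));     // prints true: added records are always found
        System.out.println(filter.mightContain("mallory")); // usually false, but a false positive is possible
    }
}
```

The design trades exactness for space. The filter stores only m bits instead of every feature value, at the cost of occasional false positives. Both m and k control how often false positives occur.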

2015-10-10 11:06:14 134

Elasticsearch book: Mastering ElasticSearch

Welcome to the world of ElasticSearch and to the Mastering ElasticSearch book. While reading the book, you'll be taken through a range of topics, all connected to ElasticSearch. We will start with an introduction to Apache Lucene and ElasticSearch, because even if you are already familiar with them, this background is crucial to fully understanding what is going on when you form a cluster, send a document for indexing, or run a query. You will learn how Apache Lucene scoring works, how to influence it, and how to tell ElasticSearch to use different scoring algorithms. The book will show you what query rewriting is and why it happens. Apart from that, you'll see how to change your queries to leverage ElasticSearch's caching capabilities and make maximum use of them. After that, we will focus on index control. We will learn how to change the way index fields are written by using different posting formats. We will discuss segment merging, why it is important, and how to adjust it when needed. We'll take a deeper look at the shard allocation mechanism and routing, and finally we'll learn what to do when the data volume and the number of queries grow. The book also covers the garbage collector: how it works, where to start, and when you need to tune its behavior. In addition, it covers functionality that allows us to troubleshoot ElasticSearch, such as how segment merging works, how to see what ElasticSearch does beneath its high-level interface, and how to limit I/O operations. But the book doesn't only pay attention to the low-level aspects of ElasticSearch; it also includes tips for improving the user search experience, such as dealing with spelling mistakes, building a highly effective autocomplete feature, and a tutorial on how to deal with query-related improvements.
In addition, the book you are holding will guide you through the ElasticSearch Java API, showing how to use it not only for CRUD operations but also for cluster and index maintenance and manipulation. Finally, we will take a deep look at ElasticSearch extensions by developing a custom river plugin for data indexing and a custom analysis plugin for data analysis at query and index time.

2016-03-19
