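Original: Bloom filter key points (notes)
Bloom filter overview: there is a target set A and a set B to be tested, and for each record in B we want to decide whether it belongs to A. First, compute a feature value for every record in A and store all of these feature values in a set C. Then, for each record in B, compute its feature value the same way and check whether it exists in C. The book Hadoop in Action gives an example of this. The key points are explained below. Obtaining the feature value: use the MD5 algorithm to compute the feature value, MessageDig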
2015-10-10 11:06:14 134
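The post stores every record's feature value in C. That amounts to an essentially exact set lookup whose memory grows with the size of A. A Bloom filter instead represents C as a fixed-size bit array and accepts occasional false positives in exchange.

Below is a minimal Java sketch under stated assumptions. It uses MD5 via `java.security.MessageDigest`, as the post suggests. The following are illustrative choices, not details taken from the post or from Hadoop in Action:

- the class name `BloomFilter`;
- the parameters `m` (number of bits) and `k` (number of hash positions);
- deriving the k positions by slicing the 16-byte digest into 32-bit chunks.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;

// Minimal Bloom filter sketch (hypothetical class, not from the post).
// Each record's MD5 digest is split into k 32-bit values, each mapped to a bit in [0, m).
class BloomFilter {
    private final BitSet bits;
    private final int m; // number of bits in the filter (hypothetical parameter)
    private final int k; // hash positions per record; a 16-byte digest yields at most 4

    BloomFilter(int m, int k) {
        if (m <= 0 || k <= 0 || k > 4) {
            throw new IllegalArgumentException("need m > 0 and 1 <= k <= 4");
        }
        this.m = m;
        this.k = k;
        this.bits = new BitSet(m);
    }

    // Computes the 16-byte MD5 digest of a record (its "feature value").
    private static byte[] md5(String record) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest(record.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Turns the digest into k bit positions, one per 4-byte slice.
    private int[] positions(String record) {
        byte[] d = md5(record);
        int[] pos = new int[k];
        for (int i = 0; i < k; i++) {
            int h = ((d[4 * i] & 0xff) << 24) | ((d[4 * i + 1] & 0xff) << 16)
                  | ((d[4 * i + 2] & 0xff) << 8) | (d[4 * i + 3] & 0xff);
            pos[i] = Math.floorMod(h, m); // non-negative index even when h < 0
        }
        return pos;
    }

    // Inserts a record from set A by setting its k bits.
    void add(String record) {
        for (int p : positions(record)) {
            bits.set(p);
        }
    }

    // false means "definitely not in A"; true means "probably in A".
    boolean mightContain(String record) {
        for (int p : positions(record)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilter filter = new BloomFilter(1 << 16, 3);
        for (String r : new String[] {"alice", "bob", "carol"}) { // sample set A
            filter.add(r);
        }
        for (String r : new String[] {"alice", "dave"}) { // sample set B
            System.out.println(r + " -> " + filter.mightContain(r));
        }
    }
}
```

With `m = 1 << 16` bits, the filter occupies 8 KB no matter how many records A contains. By contrast, keeping every 16-byte MD5 value in C costs 16 bytes per record. The trade-off is that `mightContain` can return `true` for a record that is not in A, but never `false` for one that is.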
Elasticsearch book: Mastering ElasticSearch
Welcome to the world of ElasticSearch and to the Mastering ElasticSearch book.
While reading the book you'll be taken through a range of topics, all connected
to ElasticSearch. We will start with an introduction to Apache Lucene and
ElasticSearch, because even if you are already familiar with them, this background
is crucial to fully understanding what is going on when you form a
cluster, send a document for indexing, or make a query.
You will learn how Apache Lucene scoring works, how to influence it, and how
to tell ElasticSearch to choose different scoring algorithms. The book will show
you what query rewriting is and why it happens. Apart from that, you'll see how
to change your queries to leverage ElasticSearch's caching capabilities and make
the most of them.
After that, we will focus on index control. We will learn how to change the way
index fields are written by using different posting formats. We will discuss
segment merging, why it is important, and how to adjust it when needed.
We'll take a deeper look at the shard allocation mechanism and routing, and finally
we'll learn what to do when the volume of data and the number of queries grow.
The book also describes the garbage collector: how it works, where to start,
and when you need to tune its behavior. In addition, it covers functionality
that helps us troubleshoot ElasticSearch, such as how segment merging
works, how to see what ElasticSearch does beneath its high-level interface, and how to
limit I/O operations. But the book doesn't only pay attention to low-level aspects
of ElasticSearch; it also includes tips for improving the user's search experience, such as dealing
with spelling mistakes, building a highly effective autocomplete feature, and a tutorial on how
to make query-related improvements.
In addition, the book you are holding will guide you through the ElasticSearch Java
API, showing how to use it not only for CRUD operations but also for
cluster and index maintenance and manipulation. Finally, we will take
a deep look at ElasticSearch extensions by developing a custom river plugin for data
indexing and a custom analysis plugin for data analysis at query and index time.
2016-03-19