Stochastic data acquisition for answering queries as time goes by
Data and actions are tightly coupled. On one hand, data analysis results trigger decision making and actions. On the other hand, the action of acquiring data is the very first step in the whole data processing pipeline. Data acquisition almost always incurs costs, which may be monetary costs or computing resource costs such as sensor battery power, network transfers, or I/O costs. Using outdated data to answer queries avoids the data acquisition costs, but at the penalty of potentially inaccurate results. Given a sequence of incoming queries over time, we study the problem of sequential decision making on when to acquire data and when to use existing versions to answer each query. We propose two approaches to solve this problem using reinforcement learning and tailored locality-sensitive hashing. A systematic empirical study using two real-world datasets shows that our approaches are effective and efficient.
2018-02-01
Online windowed subsequence matching over probabilistic sequences
Windowed subsequence matching over deterministic strings has been studied in previous work in the contexts of knowledge discovery, data mining, and molecular biology. However, we observe that in these applications, as well as in data stream monitoring, complex event processing, and time series data processing in which streams can be mapped to strings, the strings are often noisy and probabilistic. We study this problem in the online setting, where efficiency is paramount. We first formulate the query semantics and propose an exact algorithm. Then we propose a randomized approximation algorithm that is faster and, at the same time, provably accurate. Moreover, we devise a filtering algorithm that further improves efficiency, with an optimization technique that adapts to the contents of the sequence stream. Finally, we propose algorithms for patterns with negations. To verify the algorithms, we conduct a systematic empirical study using three real datasets and some synthetic datasets.
2018-02-01
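Computer Networking: A Top-Down Approach (6th Edition, English)
The sixth edition of Computer Networking: A Top-Down Approach retains the strengths of earlier editions, offering a fresh and up-to-date approach to teaching computer networking, and it also includes substantial revisions and updates: Chapter 1 focuses more on current developments and updates the discussion of access networks; Chapter 2 replaces Java with Python for introducing socket programming; Chapter 3 adds material on TCP splitting for optimizing cloud service performance; Chapter 4 substantially updates the coverage of router architecture; Chapter 5 is reorganized and adds material on data center networks; Chapter 6 updates the wireless networking content to reflect recent advances; Chapter 7 is significantly revised, with an in-depth discussion of streaming video, including adaptive streaming and CDNs; Chapter 8 further discusses endpoint authentication; and so on. The exercises have also been extensively updated.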
2017-09-06
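Introduction to Data Mining: PPT slides (English edition)
Introduction to Data Mining, co-authored by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (USA), is published by Posts & Telecom Press. The book provides a comprehensive introduction to data mining and covers five topics: data, classification, association analysis, clustering, and anomaly detection. Except for anomaly detection, each topic has two chapters: the first covers basic concepts, representative algorithms, and evaluation techniques, while the second discusses advanced concepts and algorithms.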
2017-09-06
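Introduction to Data Mining, with answers to the end-of-chapter exercises (English edition)
Introduction to Data Mining, co-authored by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (USA), is published by Posts & Telecom Press. The book provides a comprehensive introduction to data mining and covers five topics: data, classification, association analysis, clustering, and anomaly detection. Except for anomaly detection, each topic has two chapters: the first covers basic concepts, representative algorithms, and evaluation techniques, while the second discusses advanced concepts and algorithms.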
2017-09-06