Pregel: A System for Large-Scale Graph Processing

Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs (in some cases billions of vertices, trillions of edges) poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
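The iteration model in this abstract can be illustrated with a small single-process sketch. This is not Pregel's API: the function name `run_supersteps`, the dict-based graph, and the choice of task (propagating the maximum value to every reachable vertex) are assumptions made for illustration only. In each iteration a vertex reads the messages sent to it in the previous iteration, updates its own value, and sends messages along its outgoing edges; the run stops when no messages remain.

```python
from collections import defaultdict


def run_supersteps(graph, values, max_supersteps=50):
    """Propagate the maximum value through a directed graph.

    graph:  dict mapping each vertex to a list of out-neighbours
            (hypothetical representation; every neighbour must be a key in values)
    values: dict mapping each vertex to its initial integer value
    """
    values = dict(values)  # work on a copy so the caller's dict is untouched

    # First iteration: every vertex announces its value to its out-neighbours.
    inbox = defaultdict(list)
    for vertex, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[vertex])

    for _ in range(max_supersteps):
        if not inbox:
            break  # no messages in flight, so every vertex is idle
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            # A vertex only changes state (and sends again) if it learned something new.
            if best > values[vertex]:
                values[vertex] = best
                for n in graph.get(vertex, []):
                    outbox[n].append(best)
        # Messages sent in this iteration are delivered in the next one.
        inbox = outbox
    return values


if __name__ == "__main__":
    ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
    print(run_supersteps(ring, {"a": 3, "b": 6, "c": 2}))
```

Swapping `inbox` for `outbox` only at the end of each pass is what gives the sketch the synchronous behaviour the abstract mentions: a message is never read in the same iteration in which it was sent.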

2015-04-17

OPTICS: Ordering Points To Identify the Clustering Structure

2015-04-11

data cube 2

Data Mining: Concepts and Techniques - Data Cube Technology

2015-04-10

Data Cubes

Advanced Data Management - Data Cubes

2015-04-10

Clustering Analysis

Data Mining: Concepts and Techniques, Clustering Analysis

2015-04-08

Aurora: a new model and architecture for data stream management

This paper describes the basic processing model and architecture of Aurora, a new system to manage data streams for monitoring applications. Monitoring applications differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS currently under construction at Brandeis University, Brown University, and M.I.T. We first provide an overview of the basic Aurora model and architecture and then describe in detail a stream-oriented set of operators.

2015-04-08

F1: A Distributed SQL Database That Scales

F1 is a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design. F1 also includes a fully functional distributed SQL query engine and automatic change tracking and publishing.

2015-04-02

Weka (http://blog.csdn.net/quantum_bit/article/details/44665555)

Source files needed when working with the Weka corpus. Details: http://blog.csdn.net/quantum_bit/article/details/44665555

2015-03-28

Raw data (http://blog.csdn.net/quantum_bit/article/details/44665555)

Predict whether a car purchased at auction is a good or bad buy. The dependent variable IsBadBuy (C2) is binary. There are 32 independent variables (C3 to C34).

2015-03-27

paper: Generalized Search Trees for Database Systems

This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on tree extensibility which only supported the traditional set of equality and range predicates. In a single data structure, the GiST provides all the basic search tree logic required by a database system, thereby unifying disparate structures such as B+-trees and R-trees in a single piece of code, and opening the application of search trees to general extensibility. To illustrate the flexibility of the GiST, we provide simple method implementations that allow it to behave like a B+-tree, an R-tree, and an RD-tree, a new index for data with set-valued attributes. We also present a preliminary performance analysis of RD-trees, which leads to discussion on the nature of tree indices and how they behave for various datasets.

2015-03-26

Access Path Selection in a Relational Database Management System

In a high level query and data manipulation language such as SQL, requests are stated non-procedurally, without reference to access paths. This paper describes how System R chooses access paths for both simple (single relation) and complex queries (such as joins), given a user specification of desired data as a boolean expression of predicates. System R is an experimental database management system developed to carry out research on the relational model of data. System R was designed and built by members of the IBM San Jose Research Laboratory.

2015-03-16

Algorithm: recursion

Lecture notes for the algorithms course (CS374) at the University of Illinois at Urbana-Champaign, covering recursion, written by Jeff Erickson.

2015-03-10

Concurrency Control and Recovery

Introduction: Many service-oriented businesses and organizations, such as banks, airlines, catalog retailers, hospitals, etc. have grown to depend on fast, reliable, and correct access to their "mission-critical" data on a constant basis. In many cases, particularly for global enterprises, 7x24 access is required; that is, the data must be available seven days a week, twenty-four hours a day. Data Base Management Systems (DBMS) are often employed to meet these stringent performance, availability, and reliability demands. As a result, two of the core functions of a DBMS are: 1) to protect the data stored in the database and 2) to provide correct and highly available access to that data in the presence of concurrent access by large and diverse user populations, and despite various software and hardware failures. The responsibility for these functions resides in the concurrency control and recovery components of the DBMS software.

2015-03-09

A Comparison of Approaches to Large-Scale Data Analysis

There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system’s performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.

2015-03-08

Data Cube: A Relational Aggregation Operator

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Abstract: Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional answers. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. Aggregation points are represented by an "infinite value", ALL. For example, the point (ALL,ALL,ALL,...,ALL, sum(*)) would represent the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation.
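To make the role of ALL concrete, here is a minimal sketch of a cube computed by brute force over a list of rows. It is not the paper's algorithm or any database's implementation: the function name `cube_sum`, the dict-per-row layout, and the sample columns `model`, `year`, and `sales` are assumptions chosen for illustration. Each row contributes to every combination of kept and aggregated dimensions, and an aggregated dimension is written as ALL in the result key.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # marker for a dimension that has been aggregated away


def cube_sum(rows, dims, measure):
    """Sum `measure` for every subset of `dims`, using ALL for dropped dimensions.

    rows:    list of dicts (hypothetical input format)
    dims:    list of column names to treat as cube dimensions
    measure: name of the numeric column to sum
    """
    totals = defaultdict(int)
    for row in rows:
        # Enumerate all 2^N subsets of the N dimensions.
        for k in range(len(dims) + 1):
            for kept in combinations(dims, k):
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    sales = [
        {"model": "Chevy", "year": 1994, "sales": 10},
        {"model": "Chevy", "year": 1995, "sales": 20},
        {"model": "Ford", "year": 1994, "sales": 5},
    ]
    for key, total in sorted(cube_sum(sales, ["model", "year"], "sales").items(), key=str):
        print(key, total)
```

The key `(ALL, ALL)` holds the global sum, keys with one ALL hold the sub-totals, and keys with no ALL match what a plain GROUP BY over both columns would return. The cost grows as 2^N per row, so this is only suitable as an illustration.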

2015-03-05

Mining Heterogeneous Information Networks

Mining Heterogeneous Information Networks: Principles and Methodologies. Synthesis Lectures on Data Mining and Knowledge Discovery

2015-03-03

RankClus paper

A paper on a clustering analysis algorithm for information networks, by Jiawei Han, an expert in the field.

2015-03-02

SimRank paper

A paper on a clustering analysis algorithm for information networks, from Stanford University.

2015-03-02

Bigtable: A Distributed Storage System for Structured Data

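Bigtable is a distributed storage system for structured data that can scale to petabytes. Many Google projects store their data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place different demands on Bigtable, for example in data size and latency. Whatever the demands, Bigtable has provided a flexible, high-performance solution for Google products. This paper describes Bigtable's data model, design, and implementation.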

2015-03-01

R-trees: a dynamic index structure for spatial searching

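A well-known paper on database indexing, written by Antonin Guttman of the University of California, Berkeley. Abstract: In computer-aided design and spatial data applications, a database system needs an index mechanism that retrieves data quickly by spatial location in order to handle spatial data efficiently. However, traditional indexing methods are not well suited to data objects of finite size in multi-dimensional spaces. In this paper we design a dynamic index structure called an R-tree, which meets this need and comes with algorithms for searching and updating data. A series of tests shows that the structure is highly efficient, and we believe it is suitable for current database systems in spatial data applications.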

2015-02-26

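Introduction to Google's F1 distributed database

Lecture notes for the Advanced Data Management course at the University of Illinois at Urbana-Champaign. This set introduces Google's F1 distributed database. Related paper: http://download.csdn.net/detail/quantum_bit/8556099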

2015-09-22

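Taobao software infrastructure in practice: embracing open source

Open Source Power (开源力量) open class, 20th session celebration: "Embracing open source, the road to enterprise IT independence", on Taobao's software infrastructure in practice.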

2015-09-22

Phrase mining

Lecture notes from Jiawei Han's data mining course at the University of Illinois at Urbana-Champaign. This set covers various phrase mining algorithms.

2015-09-22

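Data mining: network concepts and network modeling

Lecture notes used by Jiawei Han in his data mining course at the University of Illinois at Urbana-Champaign. They mainly introduce data network concepts and network models.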

2015-09-22

Information network analysis

Lecture notes from Jiawei Han's data mining course at the University of Illinois at Urbana-Champaign. This set covers various network analysis algorithms.

2015-09-22

r-tree

R-tree lecture slides from the Advanced Data Management course at the University of Illinois at Urbana-Champaign. The lecture is based on the R-tree paper: http://download.csdn.net/detail/quantum_bit/8458419

2015-09-22

Naiad: A Timely Dataflow System (presentation)

2015-04-29

Naiad: A Timely Dataflow System

2015-04-29

Pig Latin: A Not-So-Foreign Language for Data Processing

2015-04-29

Advanced Data Management: Aurora and Stream Processing

2015-04-29

Advanced Data Management: NewSQL systems and F1

2015-04-29

Advanced Data Management: mapreduce

2015-04-27

Advanced Data Management: data cube

2015-04-27

Implementing data cube efficiently

2015-04-27

Transactions: Concurrency Control and Recovery: Optimistic concurrency control

2015-04-27

On Optimistic Methods for Concurrency Control

2015-04-27

Transactions: Concurrency Control and Recovery

2015-04-27

Indexing: GiST

Advanced Data Management Indexing: GiSTs

2015-04-26

Indexing: R-Trees

CS511 Advanced Data Management, Indexing: R-Trees

2015-04-25
