
Resources (39)

Pro Spark Streaming

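This book explains how to develop applications with Spark Streaming and presents a set of best practices. It is aimed at data scientists, big data specialists, BI analysts and data architects.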

2016-12-18

Cloudera-Hive

Hive data warehouse software enables reading, writing, and managing large datasets in distributed storage. Queries written in the Hive query language (HiveQL), which is very similar to SQL, are converted into a series of jobs that execute on a Hadoop cluster through MapReduce or Apache Spark.

2016-12-16

Cloudera-Spark

Apache Spark is a general framework for distributed computing that offers high performance for both batch and interactive processing. It exposes APIs for Java, Python, and Scala and consists of Spark core and several related projects.

2016-12-16

Mastering Apache Spark

Apache Spark notes

2016-12-11

Advanced Analytics with Spark

Ever since we started the Spark project at Berkeley, I’ve been excited about not just building fast parallel systems, but helping more and more people make use of large-scale computing. This is why I’m very happy to see this book, written by four experts in data science, on advanced analytics with Spark. Sandy, Uri, Sean, and Josh have been working with Spark for a while, and have put together a great collection of content with equal parts explanations and examples. The thing I like most about this book is its focus on examples, which are all drawn from real applications on real-world data sets. It’s hard to find one, let alone ten examples that cover big data and that you can run on your laptop, but the authors have managed to create such a collection and set everything up so you can run them in Spark. Moreover, the authors cover not just the core algorithms, but the intricacies of data preparation and model tuning that are needed to really get good results. You should be able to take the concepts in these examples and directly apply them to your own problems. Big data processing is undoubtedly one of the most exciting areas in computing today, and remains an area of fast evolution and introduction of new ideas. I hope that this book helps you get started in this exciting new field.

2016-12-11

How to Make Mistakes in Python

2016-12-11

Cloudera Impala

Cloudera Impala is an open source project that is opening up the Apache Hadoop software stack to a wide audience of database analysts, users, and developers. The Impala massively parallel processing (MPP) engine makes SQL queries of Hadoop data simple enough to be accessible to analysts familiar with SQL and to users of business intelligence tools, and it’s fast enough to be used for interactive exploration and experimentation.

2016-12-11

Spark2.0 For Beginners

Develop large-scale distributed data processing applications using Spark 2 in Scala and Python

2016-12-11

Python高手之路 (The Hacker's Guide to Python, Chinese edition)

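This is not a conventional introductory Python book. It does not teach Python keywords or how to write for loops, and it does not give a detailed tour of the standard library. Instead, it takes an entirely practical approach, systematically covering what you need to know to build a complete Python application. Notably, the author is one of the PTLs (Project Technical Leads) of the open-source OpenStack project, so the book explains Python through its use in OpenStack. This gives it strong practical value.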

2016-12-11

Spark Cookbook

Over 60 recipes on Spark, covering Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX libraries

2016-12-11

Apress Text Analytics with Python

A Practical Real-World Approach to Gaining Actionable Insights from Your Data

2016-12-11

Python Parallel Programming Cookbook

Master efficient parallel programming to build powerful applications using Python

2016-12-11

Spark for Python Developers

Spark for Python Developers aims to combine the elegance and flexibility of Python with the power and versatility of Apache Spark. Spark is written in Scala and runs on the Java virtual machine. It is nevertheless polyglot and offers bindings and APIs for Java, Scala, Python, and R. Python is a well-designed language with an extensive set of specialized libraries. This book looks at PySpark within the PyData ecosystem. Some of the prominent PyData libraries include Pandas, Blaze, Scikit-Learn, Matplotlib, Seaborn, and Bokeh. These libraries are open source. They are developed, used, and maintained by the data science and Python developer communities. PySpark integrates well with the PyData ecosystem, as endorsed by the Anaconda Python distribution. The book puts forward a journey to build data-intensive apps along with an architectural blueprint that covers the following steps: first, set up the base infrastructure with Spark. Second, acquire, collect, process, and store the data. Third, gain insights from the collected data. Fourth, stream live data and process it in real time. Finally, visualize the information.

2016-12-11

Apache Kudu (incubating) User Guide

Apache Kudu (incubating) is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: it runs on commodity hardware, is horizontally scalable, and supports highly available operation.

2016-12-11

Impala Guide

Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS, HBase, or the Amazon Simple Storage Service (S3). In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Impala query UI in Hue) as Apache Hive. This provides a familiar and unified platform for real-time or batch-oriented queries.

2016-12-11

Spark for Data Science

2016-12-04

Machine Learning in Python: Essential Techniques for Predictive Analysis

2016-12-04

Hadoop with Python

Hadoop is mostly written in Java, but that doesn't exclude the use of other programming languages with this distributed storage and processing framework, particularly Python. With this concise book, you'll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework.

2016-10-24

Spark for Data Science

Analyze your data and delve deep into the world of machine learning with the latest Spark version, 2.0

2016-10-12

Learning Python Design Patterns

2016-10-12

The Hitchhiker's Guide to Python

The Hitchhiker's Guide to Python, complete high-resolution edition.

2018-09-21

SQL Cookbook (Chinese edition, high resolution, with bookmarks)

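Many people use SQL in a careless, half-hearted way, without realizing how powerful a weapon they have in their hands. The purpose of this book is to open readers' eyes to what SQL can really do.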

2018-09-20

Python High Performance (2nd edition, with table of contents)

Python High Performance, 2nd edition (English, with bookmarks, very clear, no watermark)

2018-09-20

Deep Learning for Computer Vision with Python

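An essential book for learning deep learning for computer vision, originally priced at US$70. High-resolution, most complete edition.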

2018-09-20

Feature Engineering for Machine Learning

Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists

2018-09-20

Machine Learning for the Web: Explore the web and make smarter predictions

Data science, and machine learning in particular, are emerging as leading topics in the commercial tech environment as a way to evaluate the ever-increasing amount of data generated by users. This book explains how to use Python to develop a commercial web application with Django, and how to employ specific libraries (sklearn, scipy, nltk, Django, and others) to manipulate and analyze, through machine learning techniques, the data that the application generates or uses.

2018-09-19

Python Data Science Handbook

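Copyright belongs to the author; contact the author before reposting in any form. Author: Tommy (from Douban). Source: https://book.douban.com/review/8367790/ The Jupyter notebooks accompanying this book are hosted on GitHub: https://github.com/jakevdp/PythonDataScienceHandbook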

2018-09-19

Python for Data Analysis, 2nd Edition

The latest English edition of this book on data analysis with Python. It is very helpful for getting started with data analysis.

2018-09-19

Deep Learning with Python

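One of the "big four" deep learning books, clear and easy to follow. It is a good starting point for deep learning enthusiasts, and readers who prefer the original English edition should find it worth a look.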

2018-09-19

Hands-On Machine Learning with Scikit-Learn and TensorFlow

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

2017-09-12

RedHat6.5 Baidu Cloud download link

2016-12-28

RedHat9 Baidu Cloud download link

2016-12-28

RedHat7.3 Baidu Cloud download link

2016-12-27

RedHat7.0 Baidu Cloud download link

2016-12-27

Flask Framework Cookbook

Over 80 hands-on recipes to help you create small-to-large web applications using Flask

2016-12-16

Web Development with Django Cookbook, 2nd Edition

Over 90 practical recipes to help you create scalable websites using the Django 1.8 framework.

2016-12-16

Functional Python Programming

Create succinct and expressive implementations with functional programming in Python

2016-12-16

Fast Data Processing with Spark 2 Third Edition

Learn how to use Spark to process big data at speed and scale for sharper analytics. Put the principles into practice for faster, slicker big data projects

2016-12-16

Cloudera Data Management

This guide describes how to perform data management using Cloudera Navigator. Data management activities include auditing access to data residing in HDFS and Hive metastores, reviewing and updating metadata, and discovering the lineage of data objects.

2016-12-16
