Learning PySpark (learning-pyspark.pdf)
It is estimated that in 2013 the whole world produced around 4.4 zettabytes of data;
that is, 4.4 billion terabytes! By 2020, we (as the human race) are expected to produce
ten times that. With data getting larger literally by the second, and given the growing
appetite for making sense out of it, in 2004 Google employees Jeffrey Dean and
Sanjay Ghemawat published the seminal paper MapReduce: Simplified Data Processing
on Large Clusters. Since then, technologies leveraging the concept have grown
very quickly, with Apache Hadoop initially being the most popular. Hadoop
ultimately spawned an ecosystem that included abstraction layers such as Pig,
Hive, and Mahout, all built on this simple concept of map and reduce.
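To make the idea concrete, here is a minimal word-count sketch of the map-and-reduce pattern in PySpark (the subject of this book). It is only an illustration under stated assumptions: a local PySpark installation, and an application name and sample input lines that are purely made up.

```python
# Minimal sketch of the map-and-reduce idea using PySpark.
# Assumes a local PySpark installation; input lines are illustrative.
from pyspark import SparkContext

sc = SparkContext("local[*]", "wordcount-sketch")

lines = sc.parallelize([
    "simplified data processing on large clusters",
    "map and reduce on large clusters",
])

counts = (
    lines.flatMap(lambda line: line.split())  # map phase: split each line into words
         .map(lambda word: (word, 1))         # emit a (word, 1) pair per occurrence
         .reduceByKey(lambda a, b: a + b)     # reduce phase: sum the counts per word
)

print(counts.collect())
sc.stop()
```

The same two-step shape, mapping records to key-value pairs and then reducing values that share a key, is what Pig, Hive, and Mahout expose through their higher-level abstractions.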