Jules S. Damji, Brooke Wenig, Tathagata Das, and Denny Lee, Learning Spark: Lightning-Fast Data Analytics, 2nd edition, 2020, O’Reilly Media. (code)
We welcome you to the second edition of Learning Spark. It’s been five years since the first edition, authored by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia, was published in 2015. This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Over the years since its first 1.x release, Spark has become the de facto big data unified processing engine. Along the way, it has extended its scope to include support for various analytic workloads. Our intent is to capture and curate this evolution for readers, showing not only how you can use Spark but how it fits into the new era of big data and machine learning. Hence, we have designed each chapter to build progressively on the foundations laid by the previous chapters, ensuring that the content is suited for our intended audience....
Most of the examples in the chapters are written in Scala, Python, and SQL. Where necessary, we have infused a bit of Java.
The ebook is available for download once you fill in your information at Databricks.