Spark Learning

Spark

Posted by PZ on March 8, 2020

Transformations and Actions

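  • Transformations are lazy: calling map, flatMap, filter, or distinct returns a new RDD but computes nothing yet.
  • Actions trigger the computation: collect, count, take, reduce, and foreach run all pending transformations and produce a result.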
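
To make the split concrete, here is a toy model in plain Python (no Spark involved). ToyRDD and its methods are hypothetical: they only borrow the names of the real RDD methods, and only map and filter are modeled. Transformations record a step and return a new object. Actions replay the recorded steps.

```python
class ToyRDD:
    """Toy single-machine model of RDD laziness (not Spark's API)."""

    def __init__(self, source, ops=()):
        self._source = source   # underlying data
        self._ops = tuple(ops)  # recorded transformations, not yet run

    # Transformations: record the step and return a new ToyRDD.
    def map(self, f):
        return ToyRDD(self._source, self._ops + (("map", f),))

    def filter(self, pred):
        return ToyRDD(self._source, self._ops + (("filter", pred),))

    # Actions: build the pipeline from the recorded steps and run it.
    def _run(self):
        it = iter(self._source)
        for kind, f in self._ops:
            # builtin map/filter bind f immediately (no late-binding closures)
            it = map(f, it) if kind == "map" else filter(f, it)
        return it

    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []

def square(x):
    calls.append(x)  # log every real call of the function
    return x * x

rdd = ToyRDD(range(5)).map(square).filter(lambda x: x % 2 == 0)
print(calls)          # []  (only transformations so far)
print(rdd.collect())  # [0, 4, 16]
print(calls)          # [0, 1, 2, 3, 4]
```

Nothing runs until collect() is called. At that point every recorded step executes in a single pass over the data.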

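Example: laziness saves time

Because transformations are lazy, Spark sees the whole chain of operations before executing any of it, and it can optimize the chain as a whole. For example, when a filter is followed by take(10), Spark can stop as soon as it has found 10 matching elements instead of filtering the entire dataset first.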
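
A toy illustration in plain Python, not Spark code: a lazy filter over a large, hypothetical log (log_lines is an invented helper), followed by a request for only the first 10 matches. This is analogous to filter followed by take(10). Generators play the role of the lazy pipeline, so the log is read only until 10 errors have been found. Real Spark scans partition by partition and may compute somewhat more than this single-stream toy.

```python
from itertools import islice

def log_lines(n, stats):
    """Hypothetical log source: n lines, every 100th line is an error."""
    for i in range(n):
        stats["scanned"] += 1  # count how many lines are actually produced
        yield f"ERROR line {i}" if i % 100 == 0 else f"INFO line {i}"

stats = {"scanned": 0}
# Lazy "filter" step: no line has been read yet.
errors = (line for line in log_lines(1_000_000, stats) if line.startswith("ERROR"))
# "take(10)": pull only as many lines as needed to find 10 errors.
first_errors = list(islice(errors, 10))

print(len(first_errors))  # 10
print(stats["scanned"])   # 901, not 1000000
```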

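Evaluation in Spark: unlike Scala collections

  • Eager vs. lazy: operations on standard (strict) Scala collections such as List.map run immediately, while Spark transformations are deferred until an action needs their result.
  • Actions vs. transformations: only actions start a computation; transformations just extend the chain of operations.
  • In-memory iteration: Spark can keep intermediate data in memory across the steps of an iterative algorithm instead of re-reading it from disk on every iteration.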

By default, RDDs are recomputed each time you run an action on them: every action re-runs the full chain of transformations that produced the RDD.
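
Here is a small sketch of this default behavior. It is a hypothetical plain-Python toy, not Spark code: each ToyRDD stores only a recipe (a zero-argument function) for producing its elements, so every action runs the recipe again from the start.

```python
class ToyRDD:
    """Toy model (not Spark's API): every action re-runs the whole lineage."""

    def __init__(self, compute):
        self._compute = compute  # zero-arg function returning an iterator

    def map(self, f):
        parent = self._compute
        return ToyRDD(lambda: (f(x) for x in parent()))

    def count(self):
        return sum(1 for _ in self._compute())

    def collect(self):
        return list(self._compute())


calls = {"n": 0}

def expensive(x):
    calls["n"] += 1  # track how often the function really runs
    return x * 10

rdd = ToyRDD(lambda: iter(range(3))).map(expensive)
rdd.count()        # first action: expensive() runs 3 times
rdd.collect()      # second action: expensive() runs 3 more times
print(calls["n"])  # 6
```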

Cache and persist

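Call persist() or cache() on an RDD you plan to reuse. This call is lazy too: the RDD is computed and stored the first time an action (like reduce()) runs on it, and later actions read the stored data instead of recomputing it. For RDDs, cache() is shorthand for persist() with the default storage level (MEMORY_ONLY); persist() also accepts other levels, such as MEMORY_AND_DISK.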
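
The effect of caching can be sketched with the same kind of hypothetical plain-Python toy (not Spark's API). cache() only sets a flag. The first action materializes the elements into a list, and later actions, as well as any RDD derived from this one, read from that list. Real Spark can evict cached partitions under memory pressure and recompute them from lineage; this toy does not model eviction.

```python
import functools


class ToyRDD:
    """Toy model (not Spark's API): recompute by default, reuse after cache()."""

    def __init__(self, compute):
        self._compute = compute        # zero-arg function returning an iterator
        self._cache_requested = False
        self._cached = None            # filled by the first action after cache()

    def map(self, f):
        parent = self._iter            # children read through the parent's cache
        return ToyRDD(lambda: (f(x) for x in parent()))

    def cache(self):
        self._cache_requested = True   # lazy: nothing is computed here
        return self

    def _iter(self):
        if self._cached is not None:
            return iter(self._cached)             # reuse stored results
        if self._cache_requested:
            self._cached = list(self._compute())  # materialize once
            return iter(self._cached)
        return self._compute()                    # recompute from lineage

    def count(self):
        return sum(1 for _ in self._iter())

    def reduce(self, f):
        return functools.reduce(f, self._iter())


calls = {"n": 0}

def expensive(x):
    calls["n"] += 1
    return x * 10

rdd = ToyRDD(lambda: iter(range(3))).map(expensive).cache()
print(calls["n"])                      # 0 (cache() is lazy)
print(rdd.reduce(lambda a, b: a + b))  # 30 (computes and stores the data)
print(rdd.count())                     # 3 (served from the cache)
print(calls["n"])                      # 3, not 6
```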

How Spark jobs are executed
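
When an action runs, Spark launches a job. The driver splits the work into tasks (one per partition), executors run those tasks in parallel, and the results come back to the driver. For reduce(), each task reduces its own partition and the driver merges the partial results, which is why the function passed to reduce() must be commutative and associative. The sketch below models only that flow, in plain Python. A thread pool stands in for the executors, and split_into_partitions and toy_reduce are hypothetical helpers, not Spark APIs. It ignores stages, shuffles, scheduling, and fault tolerance.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(data, num_partitions):
    """Hypothetical helper: split a list into roughly equal slices."""
    size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def toy_reduce(data, f, num_partitions=4):
    """Toy model of reduce(): one task per partition, then a merge on the 'driver'."""
    if not data:
        raise ValueError("reduce() of empty data")
    partitions = split_into_partitions(data, num_partitions)

    def task(partition):
        # Each task reduces only its own partition.
        return reduce(f, partition)

    # The thread pool stands in for executors running tasks in parallel.
    with ThreadPoolExecutor(max_workers=num_partitions) as executors:
        partial_results = list(executors.map(task, partitions))

    # The driver combines the per-partition results.
    return reduce(f, partial_results)


print(toy_reduce(list(range(1, 101)), lambda a, b: a + b))  # 5050
```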
