Transformations and Actions
- Nothing happens when you apply transformations: map, flatMap, filter, distinct
- Computation starts only when you call an action: collect, count, take, reduce, foreach
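Example: laziness saves time
Because transformations are lazy, Spark can analyze the whole chain of operations before executing any of it, and optimize the work. For example, a map followed by a filter can be applied in a single pass over the data, and take(10) can stop as soon as 10 elements have been found.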
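The effect can be sketched without Spark at all. The snippet below is a plain-Python analogy, not Spark code: Python's built-in map and filter are also lazy, and itertools.islice stands in for take. The names parse and processed are made up for this illustration.

```python
from itertools import islice

processed = []  # records which input elements were actually touched

def parse(x):
    processed.append(x)
    return x * 2

numbers = range(1_000_000)

# "Transformations": building the pipeline runs nothing yet.
pipeline = filter(lambda y: y % 3 == 0, map(parse, numbers))
touched_before_action = len(processed)  # still 0 at this point

# "Action": asking for 3 results pulls only as many inputs as needed.
first_three = list(islice(pipeline, 3))

print(touched_before_action, first_three, len(processed))
```

Only the first 7 of the million inputs are ever parsed, because the pipeline stops once three results exist. This is the same kind of saving that lazy evaluation makes possible in Spark.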
Evaluation in Spark: unlike Scala collections!
- Eager and Lazy
- Actions and Transformations
- In-memory iteration
By default, RDDs are recomputed each time you run an action on them.
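A rough way to picture this, again as a plain-Python analogy rather than Spark code: treat the RDD as a recipe that is re-run for every action. The names expensive and squares are hypothetical.

```python
calls = []  # one entry per invocation of the "expensive" step

def expensive(x):
    calls.append(x)
    return x * x

def squares():
    # Stand-in for an uncached RDD: a recipe that is re-run on demand.
    return map(expensive, range(5))

total = sum(squares())    # first "action": runs expensive 5 times
largest = max(squares())  # second "action": runs it 5 more times

print(total, largest, len(calls))
```

Two actions over five elements cost ten calls to expensive. The work is repeated in full because nothing was kept between the two actions.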
Cache and persist
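Call persist() or cache() on an RDD that you will reuse, to keep it in memory. The call is itself lazy: the RDD is stored the first time an action (like reduce()) computes it, and later actions reuse the stored data instead of recomputing it.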
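The caching behavior can be mimicked with a small toy class. This is a minimal sketch in plain Python, assuming a single in-memory copy and no storage levels. ToyRDD and all of its methods are hypothetical stand-ins, not the Spark API.

```python
from functools import reduce

class ToyRDD:
    """Hypothetical stand-in for an RDD: a source plus a function to apply."""

    def __init__(self, source, fn):
        self.source = source
        self.fn = fn
        self.want_cache = False
        self.stored = None

    def cache(self):
        # Only sets a flag; nothing is computed here (the call is lazy).
        self.want_cache = True
        return self

    def _compute(self):
        if self.stored is not None:
            return self.stored  # reuse the result of an earlier action
        data = [self.fn(x) for x in self.source]
        if self.want_cache:
            self.stored = data
        return data

    def count(self):
        return len(self._compute())

    def reduce(self, op):
        return reduce(op, self._compute())

calls = []  # one entry per invocation of the "expensive" step

def expensive(x):
    calls.append(x)
    return x * x

rdd = ToyRDD(range(5), expensive).cache()
calls_after_cache = len(calls)          # cache() alone triggers no work
total = rdd.reduce(lambda a, b: a + b)  # first action computes and stores
n = rdd.count()                         # second action reuses stored data

print(calls_after_cache, total, n, len(calls))
```

With the cache flag set, the two actions cost five calls to expensive instead of ten. Without the call to cache(), the same class would recompute everything on each action.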