Apache Spark
Shell Commands
Use interactive shells for Spark development. Access spark-shell (Scala), pyspark (Python), and sparkR for exploratory data analysis and prototyping.
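Assuming $SPARK_HOME/bin is on the PATH, the three shells can be launched like this (a sketch; --master local[4] is a standard option for running locally with four threads, swap in your cluster's master URL as needed):

```shell
# Scala REPL
spark-shell --master local[4]

# Python REPL
pyspark --master local[4]

# R REPL
sparkR --master local[4]
```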
Pro Tips
Use ':paste' mode in spark-shell to enter multi-line code blocks; press Ctrl-D to finish and evaluate the block.
Set 'PYSPARK_DRIVER_PYTHON=jupyter' and 'PYSPARK_DRIVER_PYTHON_OPTS=notebook' to launch PySpark inside a Jupyter notebook.
SparkContext is pre-configured as 'sc' and SparkSession as 'spark' in shells.
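For instance, a spark-shell session using the pre-configured 'spark' session together with ':paste' mode might look like this (an illustrative transcript; 'df' and 'doubled' are hypothetical names):

```scala
scala> :paste
// Entering paste mode (ctrl-D to finish)

val df = spark.range(10)                          // uses the pre-built SparkSession
val doubled = df.selectExpr("id * 2 as twice")    // simple derived column

// (press Ctrl-D)
// Exiting paste mode, now interpreting.

scala> doubled.count()
```

Because the whole pasted block is compiled as one unit, multi-line definitions that would confuse line-by-line interpretation work as expected.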
Common Mistakes
Shell sessions hold cluster resources (driver and executors) for as long as they run; exit when done (':quit' in spark-shell, 'exit()' in pyspark).
Calling collect() on a large dataset pulls every row into the driver's memory and can crash it with an OutOfMemoryError; prefer bounded actions like take(n) or show() for inspection.
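A sketch of the safer alternatives in a pyspark shell session ('df' is a hypothetical example dataset; toPandas() assumes pandas is installed in the driver environment):

```python
# In the pyspark shell, 'spark' is pre-configured.
df = spark.range(100_000_000)   # large example dataset

# Risky: collect() materializes every row in driver memory.
# rows = df.collect()

# Safer: fetch only a bounded amount of data.
df.take(5)               # first 5 rows as a list of Row objects
df.show(5)               # prints the first 5 rows to the console
df.limit(5).toPandas()   # small pandas DataFrame for local inspection
```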