Apache Spark
Apache Spark commands for distributed data processing, SQL queries, streaming, and large-scale analytics.
42 commands
Install Apache Spark (macOS)
Install Apache Spark using Homebrew on macOS.
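On macOS, the standard route is the Homebrew formula:

```shell
# Install Apache Spark (and its Java dependency) via Homebrew
brew install apache-spark
```

Homebrew puts `spark-submit`, `spark-shell`, and the other launcher scripts on your `PATH`.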
Check Spark version
Display the installed Apache Spark version.
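Any of the launcher scripts accepts `--version`; `spark-submit` is the usual choice:

```shell
# Print the Spark version banner, Scala version, and build info
spark-submit --version
```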
Launch PySpark shell
Start an interactive PySpark shell for running Spark with Python.
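With no arguments, the shell starts against a local master and pre-creates a `SparkSession` named `spark`:

```shell
# Interactive Python shell with a ready-made SparkSession
pyspark
```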
Submit Python application
Submit a PySpark application to the cluster.
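A minimal submission looks like this; `app.py` is a placeholder for your script:

```shell
# Run a Python application with spark-submit (local master by default)
spark-submit app.py
```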
Submit with master
Submit to a specific Spark standalone master.
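The standalone master URL uses the `spark://` scheme; the host name below is a placeholder:

```shell
# Target a specific standalone master (default master port is 7077)
spark-submit --master spark://master-host:7077 app.py
```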
Submit to YARN cluster
Submit to YARN with driver running on cluster.
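With `--deploy-mode cluster` the driver runs inside the YARN cluster instead of on the submitting machine:

```shell
# Driver runs as a YARN container; the client can disconnect after submission
spark-submit --master yarn --deploy-mode cluster app.py
```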
Configure executor resources
Set executor memory, cores, and count.
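A typical sizing might look like the following; note that `--num-executors` applies to YARN (on standalone you would cap with `--total-executor-cores` instead):

```shell
# 10 executors, each with 4 GiB of heap and 2 cores
spark-submit --master yarn \
  --executor-memory 4g \
  --executor-cores 2 \
  --num-executors 10 \
  app.py
```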
Submit JAR application
Submit Scala/Java application with main class.
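For a JAR, `--class` names the entry point; the class and JAR names below are placeholders:

```shell
# Run the main() of com.example.Main packaged in app.jar
spark-submit --class com.example.Main --master yarn app.jar
```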
Add dependencies
Include Maven packages as dependencies.
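`--packages` takes Maven coordinates (`groupId:artifactId:version`) and resolves them at launch; the Kafka connector here is just an example:

```shell
# Pull the Kafka SQL connector from Maven Central at submit time
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 app.py
```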
Add Python files
Include additional Python files or archives.
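`--py-files` ships extra modules, zips, or eggs to the executors; the file names are placeholders:

```shell
# Distribute helper modules alongside the main script
spark-submit --py-files deps.zip,utils.py app.py
```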
Enable dynamic allocation
Auto-scale executors based on workload.
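A sketch of the relevant settings; on standalone/YARN, dynamic allocation traditionally also requires the external shuffle service:

```shell
# Scale executors between 1 and 20 based on pending tasks
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --conf spark.shuffle.service.enabled=true \
  app.py
```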
Submit to Kubernetes
Submit to Kubernetes cluster.
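The master URL prefixes the API server address with `k8s://`; the API server host, image name, and JAR path are placeholders:

```shell
# Driver and executors run as pods built from the given container image
spark-submit \
  --master k8s://https://k8s-apiserver:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=my-spark-image:latest \
  local:///opt/spark/app.jar
```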
Start Scala shell
Launch interactive Scala REPL with Spark context.
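The Scala REPL comes with `spark` (a `SparkSession`) and `sc` (the `SparkContext`) predefined:

```shell
# Interactive Scala shell with Spark context initialized
spark-shell
```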
Start PySpark shell
Launch interactive Python shell with Spark context.
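As above, the Python shell is a single command and accepts the same flags as `spark-submit`:

```shell
# Interactive Python shell; pass --master, --conf, etc. as needed
pyspark
```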
PySpark with Jupyter
Launch PySpark with Jupyter notebook interface.
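A common pattern is to point PySpark's driver Python at Jupyter via environment variables:

```shell
# Open a Jupyter notebook whose kernels have a SparkSession available
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook pyspark
```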
Shell with custom master
Connect shell to specific Spark master.
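The shell takes the same `--master` flag as `spark-submit`; the host name is a placeholder:

```shell
# Attach the REPL to a running standalone cluster
spark-shell --master spark://master-host:7077
```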
Shell with packages
Start shell with additional Maven dependencies.
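Dependencies resolve at startup and are available on the REPL classpath; the Avro connector is illustrative:

```shell
# Fetch the spark-avro connector from Maven Central for this session
spark-shell --packages org.apache.spark:spark-avro_2.12:3.5.0
```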
Shell with memory config
Start shell with custom memory settings.
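Memory flags work the same way as for submitted applications:

```shell
# 4 GiB for the driver (the REPL itself) and each executor
spark-shell --driver-memory 4g --executor-memory 4g
```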
Start SparkR shell
Launch interactive R shell with Spark context.
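SparkR ships with its own launcher (note that SparkR is deprecated in recent Spark releases):

```shell
# Interactive R shell with a SparkSession initialized
sparkR
```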
Start Spark SQL CLI
Launch interactive SQL command line interface.
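The SQL CLI gives you a `spark-sql>` prompt for running statements directly:

```shell
# Interactive Spark SQL command-line interface
spark-sql
```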
Execute SQL file
Execute SQL statements from a file.
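`-f` runs a script of semicolon-separated statements; the file name is a placeholder:

```shell
# Run all statements in queries.sql and exit
spark-sql -f queries.sql
```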
Execute inline SQL
Execute a SQL statement directly.
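`-e` takes a quoted statement; the table name is a placeholder:

```shell
# Run one statement non-interactively
spark-sql -e "SELECT COUNT(*) FROM my_table"
```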
Connect to Hive metastore
Enable Hive metastore support for table metadata.
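Hive support is usually configured by placing `hive-site.xml` in `$SPARK_HOME/conf`; the catalog implementation can also be set explicitly:

```shell
# Use the Hive catalog for table metadata (reads hive-site.xml if present)
spark-sql --conf spark.sql.catalogImplementation=hive
```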
Set database
Start CLI with specific database selected.
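The CLI accepts a `--database` option; the database name is a placeholder:

```shell
# Equivalent to running USE sales_db on startup
spark-sql --database sales_db
```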
Enable adaptive query
Enable Adaptive Query Execution for optimization.
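AQE re-optimizes query plans at runtime using shuffle statistics (it is on by default since Spark 3.2, so this mainly matters on older versions or where it was disabled):

```shell
# Turn on Adaptive Query Execution
spark-sql --conf spark.sql.adaptive.enabled=true
```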
Set shuffle partitions
Configure number of shuffle partitions.
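The default is 200 partitions; tune it up for large shuffles or down for small data:

```shell
# Use 400 partitions for shuffles in SQL/DataFrame operations
spark-submit --conf spark.sql.shuffle.partitions=400 app.py
```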
Set config via CLI
Pass configuration options via command line.
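Any Spark property can be set with repeated `--conf key=value` flags; the values here are illustrative:

```shell
# Set arbitrary Spark properties at submit time
spark-submit \
  --conf spark.executor.memoryOverhead=1g \
  --conf spark.network.timeout=300s \
  app.py
```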
Use properties file
Load configuration from a properties file.
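The file uses the same `key value`/`key=value` format as `conf/spark-defaults.conf`; the file name is a placeholder:

```shell
# Load default properties from a custom file instead of spark-defaults.conf
spark-submit --properties-file my-spark.conf app.py
```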
Enable Kryo serializer
Use the faster Kryo serializer instead of default Java serialization.
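Kryo is enabled by pointing `spark.serializer` at the Kryo implementation:

```shell
# Serialize shuffled/cached data with Kryo instead of Java serialization
spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  app.py
```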
Configure memory fraction
Tune execution vs storage memory balance.
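The values below are the documented defaults, shown explicitly as a starting point for tuning:

```shell
# fraction: share of heap for execution+storage; storageFraction: storage's share of that
spark-submit \
  --conf spark.memory.fraction=0.6 \
  --conf spark.memory.storageFraction=0.5 \
  app.py
```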
Enable speculation
Enable speculative execution for straggler mitigation.
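Speculation relaunches slow tasks on other executors and keeps whichever copy finishes first:

```shell
# Re-run straggler tasks speculatively
spark-submit --conf spark.speculation=true app.py
```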
Set driver memory
Configure driver resource allocation.
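A sketch of driver sizing; note `--driver-cores` applies in cluster deploy mode:

```shell
# Give the driver 8 GiB of memory and 4 cores (cluster mode)
spark-submit --deploy-mode cluster --driver-memory 8g --driver-cores 4 app.py
```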
Configure broadcast threshold
Set threshold for automatic broadcast joins.
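Tables smaller than the threshold are broadcast to all executors for joins; the default is 10 MB, and `-1` disables broadcasting entirely:

```shell
# Broadcast join tables up to 50 MB
spark-submit --conf spark.sql.autoBroadcastJoinThreshold=50MB app.py
```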
Start standalone master
Start Spark standalone cluster master.
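The master is started from the `sbin` scripts; it prints its `spark://host:7077` URL and serves a web UI on port 8080 by default:

```shell
# Start the standalone master on this machine
$SPARK_HOME/sbin/start-master.sh
```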
Start worker
Start worker and connect to master.
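The worker takes the master's URL as its argument; the host name is a placeholder (older releases call this script `start-slave.sh`):

```shell
# Register this machine as a worker with the given master
$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077
```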
Start all workers
Start workers on all machines listed in the conf/workers file.
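This relies on passwordless SSH to each host listed in `conf/workers` (older releases name the script `start-slaves.sh` and the file `conf/slaves`):

```shell
# Launch a worker on every host in conf/workers
$SPARK_HOME/sbin/start-workers.sh
```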
Stop cluster
Stop master and all workers.
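The matching stop script tears down the whole cluster:

```shell
# Stop the master and all workers started via the sbin scripts
$SPARK_HOME/sbin/stop-all.sh
```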
Submit with supervise
Enable automatic driver restart on failure.
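On a standalone cluster, `--supervise` requires cluster deploy mode; the master and JAR names are placeholders:

```shell
# Restart the driver automatically if it exits with a non-zero status
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  app.jar
```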
Kill application
Kill a running application on standalone cluster.
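For cluster-mode drivers on standalone, `spark-submit --kill` takes the driver/submission ID (the ID and the REST port 6066 below are placeholders; the ID is shown in the master UI):

```shell
# Kill a driver by its submission ID via the standalone REST gateway
spark-submit --master spark://master-host:6066 --kill driver-20240101000000-0000
```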
YARN queue submission
Submit to specific YARN queue.
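The queue name is a placeholder; `--queue` maps the application onto the YARN scheduler queue:

```shell
# Submit into the "production" YARN queue
spark-submit --master yarn --queue production app.py
```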
K8s with service account
Submit to Kubernetes with service account.
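The service account grants the driver pod permission to create executor pods; names below are placeholders:

```shell
# Run the driver under the "spark" Kubernetes service account
spark-submit \
  --master k8s://https://k8s-apiserver:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=my-spark-image:latest \
  local:///opt/spark/app.jar
```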
Check cluster status
Query Spark master REST API for cluster status.
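The standalone master's web UI exposes a JSON view of workers and applications; the host name is a placeholder:

```shell
# Fetch cluster state (workers, running/completed apps) as JSON
curl http://master-host:8080/json/
```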