Apache Spark
SQL Commands
Execute SQL queries with the spark-sql CLI. Connect to a Hive metastore, query data lakes, and run distributed SQL analytics from the command line.
Pro Tips
Use 'SET spark.sql.shuffle.partitions=<n>' to tune shuffle parallelism for your data size. The default is 200, so raise it for large shuffles and lower it for small datasets.
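Enable Hive support by launching with '--conf spark.sql.catalogImplementation=hive'. This is a static setting, so it cannot be changed with SET once the session has started.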
Use EXPLAIN to understand query plans before running expensive operations.
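The tips above can be combined in a single invocation. The following is a minimal sketch, not a definitive recipe: the table name sales.orders, the column customer_id, the partition count 400, and the helper name explain_with_hive are all illustrative assumptions, not values taken from this page.

```shell
# Hypothetical table name, used only for illustration.
TABLE="sales.orders"

# Set shuffle parallelism for the session (400 is an illustrative value),
# then ask for the query plan with EXPLAIN instead of running the aggregation.
QUERY="SET spark.sql.shuffle.partitions=400;
EXPLAIN SELECT customer_id, COUNT(*) FROM ${TABLE} GROUP BY customer_id;"

# Hypothetical helper: launch the CLI with the Hive catalog enabled and
# pass the statements above through the -e option.
explain_with_hive() {
  spark-sql --conf spark.sql.catalogImplementation=hive -e "$QUERY"
}
```

Calling explain_with_hive on a machine with Spark installed should print the plan without executing the aggregation, which makes it a cheap check before a long-running job.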
Common Mistakes
Running SELECT * without LIMIT on a large table returns every row to the driver, which can crash it or take hours.
Hive-compatible behavior can differ from ANSI SQL. For example, with spark.sql.ansi.enabled=false an invalid cast returns NULL instead of raising an error.
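Both mistakes can be checked from the command line. This is a rough sketch under stated assumptions: the table name sales.orders, the row limit of 20, and the helper names preview_table and check_cast are hypothetical and chosen only for illustration.

```shell
# Hypothetical table name, used only for illustration.
TABLE="sales.orders"

# Preview a bounded sample instead of returning the whole table to the driver.
PREVIEW_QUERY="SELECT * FROM ${TABLE} LIMIT 20;"

# With ANSI mode off, casting a non-numeric string to INT yields NULL.
# With spark.sql.ansi.enabled=true the same cast raises an error instead.
CAST_QUERY="SET spark.sql.ansi.enabled=false;
SELECT CAST('abc' AS INT);"

# Hypothetical helpers that pass each statement to the CLI with -e.
preview_table() {
  spark-sql -e "$PREVIEW_QUERY"
}

check_cast() {
  spark-sql -e "$CAST_QUERY"
}
```

Running check_cast once with the setting as written and once with it changed to true is a quick way to see which semantics a given cluster applies before porting queries written for another engine.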