How to Install Spark on Ubuntu

This article walks through installing Apache Spark on the Ubuntu operating system. It is divided into 4 parts.
  1. Install Java on Ubuntu.
  2. Download the Spark package from the official website.
  3. Setting up the path variable.
  4. Verification of the installation.

System requirements:

  1. Ubuntu installed (I am using Ubuntu 18.04.4 LTS).
  2. Minimum 4 GB of RAM.
  3. 10 GB of free disk space.

Install Java

Spark needs Java to run, so Java must be installed before Spark. First check whether Java is already installed.
java -version
If the command prints a Java version, Java is already installed. Otherwise, run the command below in your terminal.
sudo apt-get install default-jdk
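
The check and the install hint can also be combined into one step. The sketch below assumes a POSIX shell; it only reports what it finds and does not install anything, and the variable name java_status is purely illustrative.

```shell
# Report whether a java binary is already on the PATH.
if command -v java >/dev/null 2>&1; then
    java_status="present"
    # java -version writes to stderr, so redirect it to stdout.
    java -version 2>&1 | head -n 1
else
    java_status="missing"
    echo "Java not found; install it with: sudo apt-get install default-jdk"
fi
```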

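Download the Spark package

To download Spark, go to the downloads page on the official Spark website, choose a Spark release and package type, and then click the download link. Save the file to the /opt/spark directory (create it first with mkdir -p /opt/spark, prefixed with sudo if your user cannot write to /opt).

The download is a tar archive, so extract it with the commands below.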

tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
chmod -R 775 spark-2.4.5-bin-hadoop2.7
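
If the archive was saved somewhere other than /opt/spark, tar's -C option can extract it straight into the install directory. This is a minimal sketch; extract_spark is a hypothetical helper name, not something that ships with Spark.

```shell
# Hypothetical helper: unpack a Spark archive into a target directory.
# Usage: extract_spark <archive.tgz> <target-dir>
extract_spark() {
    archive="$1"
    target="$2"
    # Stop early if the archive path is wrong.
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    # Create the target directory if needed, then extract into it.
    mkdir -p "$target"
    tar -xf "$archive" -C "$target"
}
```

For example, extract_spark spark-2.4.5-bin-hadoop2.7.tgz /opt/spark would leave the files under /opt/spark/spark-2.4.5-bin-hadoop2.7, provided the current user can write to /opt/spark.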

Setting up the path variable

Without further setup, starting Spark means going to /opt/spark/spark-2.4.5-bin-hadoop2.7/bin every time. To avoid this, set SPARK_HOME in the .bashrc file and add its bin directory to PATH.

vi ~/.bashrc
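
A minimal sketch of the lines to append to the end of the file, assuming Spark was extracted to /opt/spark/spark-2.4.5-bin-hadoop2.7 as in the previous step (adjust the path if your version or location differs):

```shell
# Assumed install location from the extraction step.
export SPARK_HOME=/opt/spark/spark-2.4.5-bin-hadoop2.7
# Append Spark's bin directory so spark-shell resolves from any directory.
export PATH="$PATH:$SPARK_HOME/bin"
```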

Save the file, then reload it so the changes apply to the current terminal:
source ~/.bashrc

Verification of the installation

To verify that Spark is installed correctly, type spark-shell in the terminal.
You should see the Spark welcome banner followed by the scala> prompt.
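
If the command is not found, the PATH change has probably not taken effect in the current terminal. A quick check, sketched here with an illustrative variable name spark_status:

```shell
# Check whether spark-shell can be resolved through the PATH.
if command -v spark-shell >/dev/null 2>&1; then
    spark_status="on PATH at $(command -v spark-shell)"
else
    spark_status="not on PATH"
fi
echo "spark-shell is $spark_status"
```

If it reports "not on PATH", run source ~/.bashrc again or open a new terminal, and confirm that SPARK_HOME points at the extracted directory.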

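Enter the code below and check that you get the expected DataFrame.

val nums = Array(1, 2, 3, 5, 6)
val rdd = sc.parallelize(nums)  // distribute the array as an RDD
import spark.implicits._        // enables toDF on RDDs
val df = rdd.toDF("num")        // single column named "num"
df.show()                       // print the DataFrame contents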


Congrats! You successfully installed Apache Spark on Ubuntu and used spark-shell to execute several example commands. Please leave me a note in the comments area if you need help setting up. I’ll do my best to answer with a solution.

Happy studying!
