In this article, we will cover how to install Apache Spark on Ubuntu.
This article is divided into four parts:
- Installing Java
- Downloading the Spark package
- Setting up the path variable
- Verifying the installation
System requirements:
- Ubuntu installed (I am using Ubuntu 18.04.4 LTS)
- At least 4 GB of RAM
- 10 GB of free disk space
Installing Java
Spark needs Java to run, so Java must be installed before Spark. Check whether it is already present:
java -version
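If Java is installed, this prints version information along the following lines (the exact version and vendor on your machine will vary):
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed mode, sharing)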
If you see version output like the above, Java is already installed. Otherwise, install the default JDK with the following command in your terminal:
sudo apt-get install default-jdk
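Once the installation finishes, run java -version again; it should now report the installed JDK version.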
Downloading the Spark package
To download Spark, go to https://spark.apache.org/downloads.html, choose your Spark release and package type, and click "Download Spark". Save the archive to the /opt/spark directory (create it first with mkdir -p /opt/spark).
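If you prefer to stay on the command line, you can fetch the archive directly from the Apache release archive instead. The URL below follows the standard archive layout for the 2.4.5 release used in this article; double-check it against the downloads page for your chosen version:
sudo mkdir -p /opt/spark
cd /opt/spark
sudo wget https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz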
Since the download is a gzipped tar archive, extract it and make the files accessible:
tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
chmod -R 775 spark-2.4.5-bin-hadoop2.7
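After extraction, ls /opt/spark should list a spark-2.4.5-bin-hadoop2.7 directory containing the bin and sbin folders used in the next step.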
Setting up the path variable
Right now, starting Spark means changing into /opt/spark/spark-2.4.5-bin-hadoop2.7/bin every time. To overcome this, set SPARK_HOME and add Spark's bin and sbin directories to your PATH in the .bashrc file.
Open the file in an editor:
vi ~/.bashrc
Add the following lines at the end (the path must match the version you extracted):
export SPARK_HOME=/opt/spark/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Save the file, then reload it so the changes apply to your current session:
source ~/.bashrc
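To confirm the variable is set, print it; this should echo the Spark directory you configured above:
echo $SPARK_HOME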
Verifying the installation
To verify that Spark is installed correctly, type spark-shell in the terminal. Spark should start up and leave you at a Scala prompt.
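The startup messages vary by version, but the tail end of the output should look roughly like this (the app id will differ on your machine):
Spark context available as 'sc' (master = local[*], app id = ...).
Spark session available as 'spark'.
scala>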
Run the code below and check that you get the expected DataFrame.
// Create an RDD from a local array of numbers
val nums = Array(1, 2, 3, 5, 6)
val rdd = sc.parallelize(nums)
// Needed for the implicit RDD-to-DataFrame conversion used by toDF
import spark.implicits._
val df = rdd.toDF("num")
df.show()
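df.show() prints the DataFrame as a small table; for the array above, the output should be:
+---+
|num|
+---+
|  1|
|  2|
|  3|
|  5|
|  6|
+---+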
Conclusion
Congrats! You have successfully installed Apache Spark on Ubuntu and used spark-shell to run a few example commands. If you run into trouble setting up, please leave me a note in the comments and I'll do my best to help with a solution.
Happy studying!