[Ubuntu] Install Spark 2.2.1 on Ubuntu 16.04

What is Spark?

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

You can download Spark from the Apache Spark downloads page (https://spark.apache.org/downloads.html), or follow the instructions below to download and install Spark 2.2.1.

Download and install Spark 2.2.1

Note: Apache eventually moves older releases off the main download mirrors; if the wget URL below no longer works, the same archive is kept under https://archive.apache.org/dist/spark/spark-2.2.1/.

$ sudo apt-get update
$ sudo apt-get -y install default-jdk scala    # Spark needs a JDK; Scala is used in the shell examples below
$ su                                           # the remaining steps install into /usr/local, so run them as root
$ wget http://www-us.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
$ tar -zxvf spark-*.tgz
$ mv spark-2.2.1-bin-hadoop2.7/ /usr/local/spark
$ rm spark-2.2.1-bin-hadoop2.7.tgz             # remove the downloaded archive
$ exit                                         # back to the regular user
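
A quick, optional check that the files landed where expected (same path as used above):

$ ls /usr/local/spark/bin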

Setting up the environment for Spark

Append the following line to your ~/.bashrc file:

export PATH=$PATH:/usr/local/spark/bin

Then source the ~/.bashrc file:

$ source ~/.bashrc
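
With the PATH updated, spark-shell should now resolve from any directory. Printing the version is a harmless way to confirm this (the banner should mention 2.2.1):

$ spark-shell --version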

Run the Spark shell

Change into the Spark directory first (the README.md used in the examples below lives there), then start the shell.

$ cd /usr/local/spark
$ spark-shell

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/02/15 10:04:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/15 10:04:57 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 10.211.55.4 instead (on interface enp0s5)
18/02/15 10:04:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context Web UI available at http://10.211.55.4:4040
Spark context available as 'sc' (master = local[*], app id = local-1518642299027).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/
         
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
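
The startup log above already points at sc.setLogLevel; if the WARN lines are too chatty, you can raise the threshold right from the prompt (ERROR is just one possible level):

scala> sc.setLogLevel("ERROR")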

Spark interactive shell

scala> val sakanaFile = sc.textFile("README.md")
sakanaFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[3] at textFile at <console>:24

scala> sakanaFile.count()
res0: Long = 103

scala> val linesWithSpark = sakanaFile.filter(line => line.contains("Spark"))
linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:26

scala> linesWithSpark.count()
res1: Long = 20

We can chain together transformations and actions:
scala> sakanaFile.filter(line => line.contains("Spark")).count()
res3: Long = 20

scala> linesWithSpark.collect()
res4: Array[String] = Array(# Apache Spark, Spark is a fast and general cluster computing system for Big Data. It provides, rich set of higher-level tools including Spark SQL for SQL and DataFrames,, and Spark Streaming for stream processing., You can find the latest Spark documentation, including a programming, ## Building Spark, Spark is built using [Apache Maven](http://maven.apache.org/)., To build Spark and its example programs, run:, You can build Spark using more than one thread by using the -T option with Maven, see ["Parallel builds in Maven 3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3)., ["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html)., For general development tips, including info on developing Spark using an IDE,...

scala> linesWithSpark.collect.foreach(println)
# Apache Spark
Spark is a fast and general cluster computing system for Big Data. It provides
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
and Spark Streaming for stream processing.
You can find the latest Spark documentation, including a programming
## Building Spark
...
...

scala> 
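
Sticking with the same sakanaFile RDD, here is a short word-count sketch in the same spirit (outputs are omitted since the exact counts depend on your README.md):

scala> val wordCounts = sakanaFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> wordCounts.take(5).foreach(println)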

Spark web interface: while the shell is running, the driver's web UI listens on port 4040 (http://10.211.55.4:4040 in the session above).
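
The same address can also be retrieved from inside the shell; sc.uiWebUrl returns it as an Option[String] (the value will reflect your own host and port):

scala> sc.uiWebUrl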

[Screenshot: the Spark 2.2.1 web UI on port 4040]
