[Ubuntu] Install Spark 1.6.0 on Ubuntu 14.04

What is Spark?

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

You can download Spark from the Apache Spark downloads page, or follow the commands below to download and install Spark 1.6.0.

Download and install Spark 1.6.0

$ sudo apt-get update
$ sudo apt-get -y install openjdk-7-jdk    # Spark needs a JDK; OpenJDK 7 is enough for 1.6.0
$ sudo apt-get -y install scala
$ wget http://www.us.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
$ tar zxvf spark-*.tgz
$ mv ~/spark*/ ~/spark                     # rename the extracted directory to ~/spark
$ cd spark
$ ./bin/spark-shell

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.7.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
16/01/13 20:10:12 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 10.211.55.4 instead (on interface eth1)
16/01/13 20:10:12 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context available as sc.
16/01/13 20:10:16 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/13 20:10:17 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/13 20:10:22 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/01/13 20:10:22 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/01/13 20:10:25 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/13 20:10:25 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.
scala>
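The BoneCP and metastore warnings above appear to come from the embedded Hive support and are harmless for a local shell. If you want less console noise, the hint in the startup banner applies; a minimal sketch (sc.setLogLevel is available since Spark 1.4):

scala> sc.setLogLevel("WARN")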

Spark interactive shell

scala> val sakanaFile = sc.textFile("README.md")
sakanaFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:27

scala> sakanaFile.count()
res0: Long = 95

scala> val linesWithSpark = sakanaFile.filter(line => line.contains("Spark"))
linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:29

scala> linesWithSpark.count()
res1: Long = 17

scala> linesWithSpark.collect()
res2: Array[String] = Array(# Apache Spark, Spark is a fast and general cluster computing system for Big Data. It provides, rich set of higher-level tools including Spark SQL for SQL and DataFrames,, and Spark Streaming for stream processing., You can find the latest Spark documentation, including a programming, ## Building Spark, Spark is built using [Apache Maven](http://maven.apache.org/)., To build Spark and its example programs, run:, ["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html)., The easiest way to start using Spark is through the Scala shell:, Spark also comes with several sample programs in the `examples` directory., "    ./bin/run-example SparkPi", "    MASTER=spark://host:7077 ./bin/run-example SparkPi", Testing first requires [building Spark](#b...

scala> linesWithSpark.collect.foreach(println)
# Apache Spark
Spark is a fast and general cluster computing system for Big Data. It provides
rich set of higher-level tools including Spark SQL for SQL and DataFrames,
and Spark Streaming for stream processing.
You can find the latest Spark documentation, including a programming
## Building Spark
Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
The easiest way to start using Spark is through the Scala shell:
Spark also comes with several sample programs in the `examples` directory.
    ./bin/run-example SparkPi
    MASTER=spark://host:7077 ./bin/run-example SparkPi
Testing first requires [building Spark](#building-spark). Once Spark is built, tests
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
Hadoop, you must build Spark against the same version that your cluster runs.
in the online documentation for an overview on how to configure Spark.

scala> 
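The same RDD also works for a quick word count; a minimal sketch against the sakanaFile RDD loaded above (output omitted, since it depends on your README.md):

scala> // split each line into words, pair each word with 1, then sum the counts per word
scala> val wordCounts = sakanaFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> wordCounts.take(10).foreach(println)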

Spark web interface: the driver UI listens on port 4040 (open http://localhost:4040 while the shell is running)

[Screenshot: Spark web UI on port 4040]
