
May 15, 2017 By Training Itversity

Launching Spark Shell


Let us see the different modes in which we can launch the Spark shell.

Execution Modes

  • Local
  • Standalone
  • Mesos
  • YARN
  • In production, we typically use YARN or Mesos
  • Mesos is typically used for dedicated Spark clusters
  • YARN is used in Hadoop distributions such as Cloudera, Hortonworks, MapR, etc.
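Each of these modes corresponds to a value of the --master option when launching spark-shell. As an illustration (the host names and port numbers below are placeholders, not values from this lesson):

  • spark-shell --master local[*] – local mode, using all available cores
  • spark-shell --master spark://<master-host>:7077 – standalone mode
  • spark-shell --master mesos://<mesos-master>:5050 – Mesos mode
  • spark-shell --master yarn – YARN mode (requires Hadoop/YARN configuration on the client)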

On your PC

  • You need to have Spark set up – click here (if you need to set up the environment)
  • sbin contains scripts to start and stop services
  • bin contains commands such as spark-shell
  • We can launch Spark with a Scala context using the spark-shell command
  • By default it will be launched in local mode
  • spark-shell can also be launched in
    • Standalone mode
      • $SPARK_HOME/sbin/start-master.sh
      • $SPARK_HOME/sbin/start-slave.sh
      • Make sure $SPARK_HOME/sbin is appended to PATH
      • Start the master using start-master.sh
      • Start the slave using start-slave.sh spark://Apples-MacBook-Pro.local:7077
      • Go to localhost:8080 to open the Spark master web UI
      • Get the URL for the Spark master from that page
      • Launch spark-shell against that master, like this: spark-shell --master spark://Apples-MacBook-Pro.local:7077 (a quick sanity check from the shell follows after this list)
    • Mesos mode – out of scope
    • YARN mode – we will see that on labs.itversity.com
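Once the shell is up in standalone mode, you can confirm from the Scala prompt that it is attached to the standalone master and that it can run a simple job (the master URL shown here is just the example host name from this lesson; yours will differ):

  scala> sc.master
  res0: String = spark://Apples-MacBook-Pro.local:7077

  scala> sc.parallelize(1 to 100).reduce(_ + _)
  res1: Int = 5050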

On labs.itversity.com

  • Enroll for the lab
  • Login to gw01.itversity.com
  • We can launch the shell in local mode (the default) or YARN mode
  • spark-shell by itself will launch it in local mode
  • For YARN mode, you can check the instructions over here
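As a minimal sketch of launching the shell against YARN on the lab gateway (this assumes Spark on the gateway is already configured to talk to YARN, as the lab instructions describe; on older Spark 1.x setups the master value may appear as yarn-client rather than yarn):

  • spark-shell --master yarn
  • Inside the shell, sc.master should report yarn (or yarn-client), and your application will show up in the YARN ResourceManager UI rather than on localhost:8080
  • On a shared gateway the default Spark UI port 4040 may already be taken; Spark will automatically retry the next available ports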

Using labs.itversity.com

  • For certifications such as HDPCD Spark and CCA 175, it is better to use YARN mode
