Apache Spark - Install Apache Spark 3.x On Ubuntu | Spark Tutorial

00:06:34
https://www.youtube.com/watch?v=ei_d4v9c2iA

Summary

TL;DR: The video provides a step-by-step guide to installing Apache Spark 3.5.0 on Ubuntu in four stages: installing Java, downloading the Spark package from its official website, setting environment variables, and verifying the installation by launching the Spark console. The instructor starts by checking for Java and installing it with terminal commands, then downloads the chosen Spark version pre-built for Apache Hadoop 3.3 and later. The video then shows how to set the SPARK_HOME and PATH variables by editing the '.bashrc' file so Spark can be run from the terminal. Finally, the installation is verified by running the 'spark-shell' command, and a CSV file is loaded as a demonstration that Spark works correctly. The instructor encourages viewers to leave comments with any questions.

Takeaways

  • 💻 Guide to setting up Apache Spark on Ubuntu.
  • ☕ Install Java using 'sudo apt install default-jdk'.
  • 🔥 Download and untar Spark 3.5.0 package.
  • 🔧 Set SPARK_HOME in '.bashrc' file.
  • ✅ Verify the Spark setup with the 'spark-shell' command.
  • 📂 Demonstrate reading a CSV file in Spark.
  • 🌐 Use official Spark website for downloads.
  • 🖥️ Terminal commands are essential.
  • 📄 Add Spark to PATH for environment access.
  • 📝 Engage with the community for support.
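The download step in these takeaways can be sketched as a short shell script. Note this is a sketch rather than the video's exact procedure: the Apache archive URL and the derived archive name are assumptions based on the 3.5.0 / Hadoop 3 choices shown on screen, while the video copies the link from the download page instead of constructing it.

```shell
# Sketch of step 2 (download), assuming Spark 3.5.0 pre-built for
# Hadoop 3.3+ and the Apache archive mirror (an assumption; the video
# copies the link from the official download page instead).
SPARK_VERSION="3.5.0"
PKG="spark-${SPARK_VERSION}-bin-hadoop3.tgz"
URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${PKG}"
echo "$URL"
# Then, as in the video (run as root, e.g. after 'sudo su'):
#   mkdir /opt/spark && cd /opt/spark
#   wget "$URL"
#   tar -xvf "$PKG"
```

Deriving the name from a version variable keeps the later SPARK_HOME path consistent with whatever version you actually downloaded.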

Timeline

  • 00:00:00 - 00:06:34

    In this video, the host from Big Tech Talk guides viewers through setting up Spark 3.5.0 on the Ubuntu operating system in four main steps. First, the setup begins by checking whether Java is installed and installing OpenJDK 11 if it is missing. The second step is downloading the Spark package: version 3.5.0, pre-built for Apache Hadoop 3.3 and later, saved into a newly created directory. In the third step, environment variables are configured by editing the '.bashrc' file to set SPARK_HOME and add it to PATH. The host finalizes the process in the fourth step by launching the Spark console and running a command to read a CSV file, confirming that Spark is set up correctly on the system.

Video Q&A

  • How do I check if Java is installed on my Ubuntu system?

    You can check if Java is installed by typing 'java -version' in the terminal.

  • What command is used to install Java on Ubuntu?

    To install Java, you can use the command 'sudo apt install default-jdk'.

  • How do I download the Spark package?

    You can download the Spark package by visiting the official Spark website and selecting the desired version.

  • How do I set up the Spark HOME environment variable?

    Edit the '.bashrc' file to add 'SPARK_HOME' and 'PATH' variables pointing to the Spark directory.

  • What command verifies the Spark installation?

    You can verify the Spark installation by typing 'spark-shell' in the terminal.

  • How do I read a CSV file using Spark?

    Use the command 'val df = spark.read.format("csv").option("header","true").load("file_path")' to read a CSV file.

  • What do you do if Java is not installed?

    Install Java using the command 'sudo apt install default-jdk'.

  • Where do you add Spark to the path variable?

    Spark is added to the path variable by editing the '.bashrc' file and sourcing it.
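
The two environment-variable answers above amount to appending these lines to '.bashrc'. This is a sketch: the directory and archive name assume the /opt/spark location and the spark-3.5.0-bin-hadoop3 package used in the video; adjust them to wherever you untarred Spark.

```shell
# Lines to append to ~/.bashrc; paths assume the /opt/spark directory
# and the spark-3.5.0-bin-hadoop3 package from the video.
export SPARK_HOME=/opt/spark/spark-3.5.0-bin-hadoop3
export PATH=$PATH:$SPARK_HOME/bin
# Reload the file so the current shell picks the variables up:
#   source ~/.bashrc
```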

Subtitles

  • 00:00:01
    Hi friends, welcome to another video of Big Tech Talk. In today's video I will set up Spark 3.5.0 on the Ubuntu operating system, in four simple steps. Step one is to install Java on Ubuntu, step two is to download the Spark package from its official website, step three is setting up the environment variables, and step four is verification of the Spark installation.
  • 00:00:35
    So let's install Java on our system. First, check whether Java is already installed by typing 'java -version'. Okay, it looks like Java is not installed on my system, so to install it I will run 'sudo apt install default-jdk', provide the sudo password, and confirm with 'y' to install the packages. Java is now installed; checking again with 'java -version' shows we have OpenJDK version 11.
  • 00:01:37
    Let's go to step two, downloading the Spark package. Open your favorite browser and search for "spark download". Click the URL, which takes you to the official Spark website. Select your Spark version (for this video I will select 3.5.0), and for the package type select "Pre-built for Apache Hadoop 3.3 and later". Click on spark-3.5.0-bin-hadoop3.tgz; it takes you to a different page where you can download, so let me copy the URL.
  • 00:02:31
    In the terminal I will create a directory where I will download the Spark package. I switch to the root user with 'sudo su', create the directory with 'mkdir /opt/spark', and download the package into it with 'wget' followed by the URL. Once the download is complete, untar the file using 'tar -xvf' and the package name.
  • 00:03:26
    Once that is done, we move to step three: setting up SPARK_HOME. To add Spark to the PATH variable we need to make changes in the '.bashrc' file. Go to the home folder, type 'vi .bashrc', scroll to the end of the file, hit 'i' for insert, and type 'SPARK_HOME=/opt/spark/spark-3.5.0-bin-hadoop3'. On the next line, type 'export PATH=$PATH:$SPARK_HOME/bin'. By doing this I have put Spark in the PATH variable. Now save the file and source it so the variables can be used in this environment: 'source .bashrc'.
  • 00:04:47
    Once it's done, let's move to step four, verification of the installation. In the terminal, type 'spark-shell'; if everything is okay, we will see a Spark console. Okay, the Spark console is up and running, and we have Spark 3.5.0. Now let's try to execute a Spark command: I will read a CSV file that is in my Downloads folder, mock_data.csv. In the shell I type 'val df = spark.read.format("csv").option("header", "true").load()' with the path of the CSV file ('.../Downloads/mock_data.csv'), and hit enter. Then 'df.show' to show the output. This is the expected output, so it looks like our Spark installation is working fine.
  • 00:06:14
    So friends, we are done with setting up Spark 3.5.0 on the Ubuntu operating system. If you have any questions, let me know in the comment section. Do hit the like button and subscribe for more such videos. Thank you.
Tags
  • Spark
  • Ubuntu
  • Java
  • Installation
  • Environment Variables
  • Verification
  • Terminal
  • Hadoop