Debugging a Spark Application Locally Using a Remote Container

Tags: spark pyspark docker vscode

One of the niftiest features of any development workflow is the ability to debug your application with breakpoints. Repeatedly submitting a Spark job and waiting for it to complete wastes a lot of debugging time. Inside a development container, Spark jobs can be debugged with breakpoints and the usual step-over, step-into, and step-out commands.

Requirements

  1. VS Code

  2. Docker

Setup

Let's create our project.

$ mkdir SparkDemo && cd SparkDemo

We will create a file .devcontainer/devcontainer.json. VS Code will use this file to access (or create) a development container with a well-defined tool and runtime stack.

$ mkdir .devcontainer
$ touch .devcontainer/devcontainer.json

devcontainer.json will look something like this (a minimal sketch; the container name and extension list are choices you can adapt):
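
{
    // Name shown in the VS Code UI
    "name": "SparkDemo",
    // Build the container image from the Dockerfile at the project root
    "build": {
        "dockerfile": "../Dockerfile"
    },
    // Install the Python extension inside the container so we can debug
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python"
            ]
        }
    }
}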


We will need to add a Dockerfile to our project. This file will be used to build the container image. A minimal sketch, assuming a Debian-based Python image (PySpark drives a JVM under the hood, so we also install a Java runtime):
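
# Base image is an assumption; any Python image with a Java runtime works
FROM python:3.8-slim

# Spark runs on the JVM, so PySpark needs a Java runtime
RUN apt-get update \
    && apt-get install -y --no-install-recommends default-jre \
    && rm -rf /var/lib/apt/lists/*

# Install PySpark itself
RUN pip install --no-cache-dir pyspark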


Lastly, we need to create a simple PySpark script, spark_demo.py. A minimal sketch that builds a small DataFrame and filters it, which gives us something to set breakpoints on:
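
from pyspark.sql import SparkSession

# Start a local Spark session
spark = SparkSession.builder.appName("SparkDemo").getOrCreate()

# Build a small DataFrame -- a good line for a breakpoint
data = [("Alice", 34), ("Bob", 45), ("Carol", 29)]
df = spark.createDataFrame(data, ["name", "age"])

# Filter and print the result; step over these lines in the debugger
adults = df.filter(df.age > 30)
adults.show()

spark.stop()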


Running the container

Currently your project structure should look like this:

SparkDemo
├── .devcontainer
│   └── devcontainer.json
├── Dockerfile
└── spark_demo.py

Next, we need to open the SparkDemo folder in VS Code.

$ cd SparkDemo
$ code .

To run the remote container, click the green button in the bottom-left corner of the VS Code window and choose Reopen in Container. VS Code will build the image and reopen the project inside it; you can then set breakpoints in spark_demo.py and start the debugger.

Demo

