Tags: spark pyspark docker vscode
One of the niftiest features in any development workflow is the ability to debug your application with breakpoints. Repeatedly submitting a Spark job and waiting for it to complete wastes a lot of time when you are debugging. Instead, Spark jobs can be debugged with breakpoints and the step over, step into, and step out commands.
Let's create our project:
```bash
$ mkdir SparkDemo && cd SparkDemo
```
We will create a file `.devcontainer/devcontainer.json`. VS Code will use this file to access (or create) a development container with a well-defined tool and runtime stack.
```bash
$ mkdir .devcontainer
$ touch .devcontainer/devcontainer.json
```
`devcontainer.json` will look like this:
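A minimal sketch, assuming the `Dockerfile` sits at the project root and that you want the Microsoft Python extension installed in the container (the name and extension list are assumptions you can adjust):

```json
{
    // Display name for the dev container
    "name": "SparkDemo",
    // Build the container from the Dockerfile at the project root
    "build": {
        "dockerfile": "../Dockerfile"
    },
    // Install the Python extension inside the container
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python"
            ]
        }
    }
}
```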
We will need to add a `Dockerfile` to our project. This file will be used to build the container.
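A minimal sketch, assuming a Debian-based Python image; Spark needs a Java runtime, and the Python version here is an assumption:

```dockerfile
FROM python:3.9-slim

# PySpark needs a Java runtime to drive Spark
RUN apt-get update && \
    apt-get install -y --no-install-recommends default-jre && \
    rm -rf /var/lib/apt/lists/*

# Install PySpark itself
RUN pip install --no-cache-dir pyspark
```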
Lastly, we need to create a simple PySpark script.
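A minimal sketch for `spark_demo.py`; the sample data is a placeholder chosen so there is something to inspect at a breakpoint:

```python
from pyspark.sql import SparkSession

# Start a local Spark session inside the container
spark = SparkSession.builder \
    .appName("SparkDemo") \
    .master("local[*]") \
    .getOrCreate()

# Placeholder data so there is something to inspect in the debugger
data = [("Alice", 34), ("Bob", 45), ("Carol", 29)]
df = spark.createDataFrame(data, ["name", "age"])

# Set a breakpoint here and step over/into the calls below
adults = df.filter(df.age > 30)
adults.show()

spark.stop()
```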
Currently, your project structure should look like this:
```
SparkDemo
├── .devcontainer
│   └── devcontainer.json
├── Dockerfile
└── spark_demo.py
```
Next, we need to open the `SparkDemo` folder in VS Code.
```bash
$ cd SparkDemo
$ code .
```
To reopen the project inside the container, click the green button in the bottom-left corner of the VS Code window and choose "Reopen in Container".
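Once the container is up, open `spark_demo.py`, set a breakpoint, and launch the debugger. A minimal `.vscode/launch.json` sketch (this file and its configuration name are assumptions; with the Python extension installed you can often just press F5):

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            // Hypothetical configuration for debugging the demo script
            "name": "Debug spark_demo.py",
            "type": "python",
            "request": "launch",
            "program": "${workspaceFolder}/spark_demo.py",
            "console": "integratedTerminal"
        }
    ]
}
```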