life is too short for a diary

Posts Tagged: spark

Getting Started With Parquet File Format

Data can be broadly categorized into three types based on its structure: unstructured, semi-structured, and structured data...

Continue reading → parquet spark

Debugging Spark Application Locally using remote container

One of the niftiest features in any code development workflow is the ability to debug your application with breakpoints. Repeatedly submitting a Spark job and waiting for it to complete wastes a lot of debugging time. Instead, Spark jobs can be debugged with `break points` and the `step over`, `step into`, and `step out` commands...

Continue reading → spark pyspark docker vscode
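As a rough illustration of the remote-container setup the post describes, an "attach" configuration in VS Code's `launch.json` can connect the debugger to a PySpark driver running inside a container. This is only a sketch under assumptions not stated in the teaser: the port `5678`, the path mappings, and the use of `debugpy` as the debug adapter are all hypothetical choices, not details taken from the post.

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Attach to PySpark in container",
            "type": "debugpy",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 5678
            },
            "pathMappings": [
                {
                    "localRoot": "${workspaceFolder}",
                    "remoteRoot": "/app"
                }
            ]
        }
    ]
}
```

With a configuration like this, the driver script inside the container would start a `debugpy` listener on the mapped port, and breakpoints set in VS Code then pause the job so `step over`, `step into`, and `step out` work as in any local program.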