Connecting to Oracle database form Scala using JDBC for large data

I was looking to write a scala script to fetch large data from Oracle database. I was earlier using the Alteryx software to fetch data (around 1 million rows) from oracle which took around 7 minutes. Licensed softwares like Alteryx or Informatica, etc are great, but I was looking for a free solution.

We need ojdbc6.jar to connect to the Oracle 12 c database. We will create a Scala script to fetch data from the oracle database & save it to the output file (CSV format).

Lets create a simple function timed to time our execution

Now create function connOracle to connect to Oracle

This took me 150 minutes to fetch data 1 million records. That was not the performance I was looking for 😟

However I realized that I can increase JDBC performance by tweaking optimal fetch value. The default fetch size is 10 for JDBC. Since I have more than million rows in the database, I should increase the fetch size to 1000.

The execution time improved drastically and reduced to approx 8 minutes.

Lastly it’s trivial to save the output to flat file (CSV format)

Also below is the build.sbt for resolving dependency

            name := “Oracle Connection”
Version := “0.1”
scalaVersion := “2.12.4”

libraryDependencies += “au.com.bytecode” % “opencsv” % “2.4”