Tags: jdbc scala oracle database
I was looking to write a scala script to fetch large data from Oracle database. I was earlier using the Alteryx software to fetch data (around 1 million rows) from oracle which took around 7 minutes. Licensed softwares like Alteryx or Informatica, etc are great, but I was looking for a free solution.
We need ojdbc6.jar to connect to the Oracle 12 c database. We will create a Scala script to fetch data from the oracle database & save it to the output file (CSV format).
Lets create a simple function timed
to time our execution
Now create function connOracle
to connect to Oracle
This took me 150 minutes to fetch data 1 million records. That was not the performance I was looking for 😟
However I realized that I can increase JDBC performance by tweaking optimal fetch value. The default fetch size is 10 for JDBC. Since I have more than million rows in the database, I should increase the fetch size to 1000.
The execution time improved drastically and reduced to approx 8 minutes.
Lastly it’s trivial to save the output to flat file (CSV format)
Also below is the build.sbt
for resolving dependency
name := “Oracle Connection”
Version := “0.1”
scalaVersion := “2.12.4”
libraryDependencies += “au.com.bytecode” % “opencsv” % “2.4”