I was looking to write a scala script to fetch large data from Oracle database. I was earlier using the Alteryx software to fetch data (around 1 million rows) from oracle which took around 7 minutes. Licensed softwares like Alteryx or Informatica, etc are great, but I was looking for a free solution.
We need ojdbc6.jar to connect to the Oracle 12 c database. We will create a Scala script to fetch data from the oracle database & save it to the output file (CSV format).
Lets create a simple function
timed to time our execution
Now create function
connOracle to connect to Oracle
This took me 150 minutes to fetch data 1 million records. That was not the performance I was looking for 😟
However I realized that I can increase JDBC performance by tweaking optimal fetch value. The default fetch size is 10 for JDBC. Since I have more than million rows in the database, I should increase the fetch size to 1000.
The execution time improved drastically and reduced to approx 8 minutes.
Lastly it’s trivial to save the output to flat file (CSV format)
Also below is the
build.sbt for resolving dependency
name := “Oracle Connection” Version := “0.1” scalaVersion := “2.12.4” libraryDependencies += “au.com.bytecode” % “opencsv” % “2.4”