Writing to a Database from Spark

by Vitaliy M.

Content Manager

07/20/2021

Database from Spark

One of the great features of Spark is the variety of data sources it can read from and write to. If you already have a database to write to, connecting to that database and writing data from Spark is fairly simple. This example shows how to write to database that supports JDBC connections.

Databases Supporting JDBC Connections

Spark can easily write to databases that support JDBC connections. MySQL, Oracle, and Postgres are common options. In this post we show an example using MySQL.

Downloading the Database JDBC Driver

A JDBC driver is needed to connect your database to Spark. The MySQL JDBC driver can be downloaded at https://dev.mysql.com/downloads/connector/j/. MySQL provides ZIP or TAR archives that contain the database driver. Inside each of these archives will be a mysql-connector-java-...-bin.jar file. This is the JDBC driver that enables Spark to connect to the database.

Write data from Spark to Database

We now have everything we need to connect Spark to our database. If running within the spark-shell use the --jars option and provide the location of your JDBC driver jar file on the command line.
spark-shell --jars ./mysql-connector-java-5.0.8-bin.jar

Once the spark-shell has started, we can now insert data from a Spark DataFrame into our database. A sample of the our DataFrame’s contents can be seen below.

scala> df.show()
+---+----------+----------+--------------------+--------------------+---------------+
| id|first_name| last_name|               email|             country|     ip_address|
+---+----------+----------+--------------------+--------------------+---------------+
|  1|   Barbara|       Day|      [email protected]|                null|   252.148.11.9|
|  2|  Jennifer|   Roberts| [email protected]|               China|           null|
|  3|     Helen|      Gray| [email protected]|               China|167.188.111.177|
|  4|     Maria|       Day|     [email protected]|          Uzbekistan|196.109.242.208|
|  5|    Bonnie|      Diaz|     [email protected]|            Malaysia|  36.75.163.246|

Spark DataFrames (as of Spark 1.4) have a write() method that can be used to write to a database. The write() method returns a DataFrameWriter object. DataFrameWriter objects have a jdbc() method, which is used to save DataFrame contents to an external database table via JDBC. The jdbc() method takes a JDBC URL, destination table name, and a Java Properties object containing other connection information.

Here is an example of putting these various pieces together to write to a MySQL database.

//create properties object
val prop = new java.util.Properties
prop.setProperty("driver", "com.mysql.jdbc.Driver")
prop.setProperty("user", "root")
prop.setProperty("password", "pw") 

//jdbc mysql url - destination database is named "data"
val url = "jdbc:mysql://localhost:3306/data"

//destination database table 
val table = "sample_data_table"

//write data from spark dataframe to database
df.write.mode("append").jdbc(url, table, prop)

Notice in the above example we set the mode of the DataFrameWriter to "append" using df.write.mode("append"). The mode() method specifies how to handle the database insert when then destination table already exists. The default behavior is for Spark to create and insert data into the destination table. If the table already exists, you will get a TableAlreadyExists Exception. In order to write to an existing table you must use mode("append") as in the example above.


Best Custom Software Development Company

Top Web Development Companies

Related Articles

messaging app

People send thousands of messages to relatives, friends, partners, and employees via special apps every day. For the same reason, numerous people want to create their own service and, to succeed, they explore the issue of how to make a messaging app.T ...

Business App

Mobile solutions are available not only to large corporations, as they used to be, but also to small businesses. To gain a competitive advantage and improve service are the main reasons why brands turn ...

Music Streaming App

So many people enjoy listening to music at home, on the road, or on vacation. Not so long ago, we made up our own playlists with downloaded songs. ...

Improve Your Online Presence!

We exceed your expectations! Create a company profile and get noticed by thousands in no time!

Submit Profile