Storing Failed Banks by Year in a SQLite Database
There are many ways to retrieve and store data on the iPhone, from property lists to NSCoding, from SQLite to Core Data.
This two-part tutorial series is going to cover one of those options:
SQLite. SQLite is a simple and easy to use library that implements a simple database engine, treating a database as a simple flat file.
In this series, we’re going to show how to make an app that displays a list of failed US banks from a SQLite database.
In this first part of the series, we will cover what SQLite is and why we’d want to use it, how to use the sqlite3 command-line utility, and how to import data programmatically via Python.
In the
second part of the series, we will cover how to make an iPhone app that reads data from the SQLite database we created, and display data in a table view and a drill down detail view.
This tutorial does not assume any prior knowledge with Python or SQLite, however it assumes a basic knowledge of SQL.
SQLite Overview
Before we begin, let’s give a high level overview of SQLite.
One of the nicest aspects of SQLite is its simplicty. As mentioned above, databases are just flat files!
This means that you don’t need to have a database server running at all. You simply tell the sqlite3 library the name of the file the database is stored in, and it contains all of the smarts about how to handle SQL statements to get the data in and out of the file.
One easy way to interact with sqlite3 databases is through the sqlite3 command line utility. To use it, simply fire up a Terminal and type sqlite3 followed by the name of the database you want to open, such as the following:
sqlite3 test.sqlite3
You will see a sqlite prompt appear, and you can begin to enter in SQL statements to create tables and work with data like the following:
sqlite> create table awesome_video_games(name text, type text);
sqlite> insert into awesome_video_games values('fallout', 'rpg');
sqlite> insert into awesome_video_games values('final fantasy', 'rpg');
sqlite> insert into awesome_video_games values('deus ex', 'rpg');
sqlite> insert into awesome_video_games values('hitman', 'stealth');
sqlite> select * from awesome_video_games where type='rpg';
fallout|rpg
final fantasy|rpg
deus ex|rpg
If you come from another database background such as MySQL, you’ll notice some differences with the syntax and some missing commands. I’d recommend checking out the
SQLite SQL reference and the
SQLite datatype reference for full details.
Note that the sqlite3 command line utility contains some useful “meta” commands to list tables and get table schemas like the following:
sqlite> .tables
awesome_video_games
sqlite> .schema awesome_video_games
CREATE TABLE awesome_video_games(name text, type text);
You can get a full list of “meta” commands by typing .help at the sqlite3 command line.
Why Use SQLite?
So why would we want to use SQLite, anyway?!
Well, SQLite has a lot of advantages over property list and NSCoding serialization when it comes to large sets of data.
First, it’s quite easy to pull out only a subset of the data available from the database at a time, which is good for memory usage and speed.
Secondly, the database engine can do a lot of neat work for you such as giving you counts of rows or sums of fields quite easily.
According to Apple, most of the cases where you find yourself wanting to use SQLite3, you would probably be better served using
Core Data instead because it has a lot of useful built-in features that can both save you time programming, reduce memory usage, and improve speed.
However, I’ve found the learning curve for Core Data to be significantly higher than SQLite3. Since Core Data is an API on top of SQLite3, learning SQLite3 first can be a good stepping stone to Core Data.
I may have some more articles in the future covering Core Data, but for now let’s start simple and work with SQLite!
Obtaining Some Data
Before we can get an app together, first we need to do is get a data set to work with!
If you aren’t already aware, the US government has a neat web site that makes a lot of data obtained by various government agencies publicly available at
data.gov.
I was browsing through the site and came across
a list of banks that have failed since October 1, 2000. I thought this would be a great set of data to work with, since a) it’s quite simple, and b) the data set is interesting to me! :]
So go ahead and visit the above link, and click to download the data in CSV format. Once you download it and take a look, you’ll see that the data contains lines of comma separated entries like so:
Desert Hills Bank,Phoenix,AZ,57060,26-Mar-10,26-Mar-10
Unity National Bank,Cartersville,GA,34678,26-Mar-10,26-Mar-10
Key West Bank,Key West,FL,34684,26-Mar-10,26-Mar-10
Creating Our Database : Via sqlite3?
Once we have the data, the next step we need to do is create a sqlite3 database with that data.
We could load up the sqlite3 command line utility and start creating our table and starting to import the data from it, line by line. However this would be incredibly tedious.
If our CSV was very simple, we could use the sqlite3 command line to create a table, specify the CSV separator value, and import the data from the CSV:
sqlite> create table failed_banks(name text, city text, state text,
zip integer, close_date, updated_date text)
sqlite> .separator ","
sqlite> .import banklist.csv failed_banks
However, unfortunately our CSV value is not that simple. If you look through the data, you will see some lines such as the following:
"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10
In this instance, the quotes around “La Jolla Bank, FSB” are meant to indicate that that entire string is the name of the bank. However, the sqlite3 importer
does not handle embedded quote marks, so we have to find another option.
We could massage the CSV a bit to get rid of those quotes\commas of course. But there’s something else we can do that is pretty easy and a good learning experience too: import the data into our database with Python!
Creating Our Database : Via Python!
Python also comes with a built-in sqlite3 library that makes it super easy to interact with sqlite3 databases.
Python should be pre-installed on your Mac. Create a file named parseBanklist.py in the same directory that your CSV file is in, and start by adding the following code:
import sqlite3; from datetime import datetime, date; conn = sqlite3.connect('banklist.sqlite3') c = conn.cursor() c.execute('drop table if exists failed_banks') c.execute('create table failed_banks(id integer primary key autoincrement, name text, city text, state text, zip integer, close_date text, updated_date text)') |
The first line imports the sqlite3 module, and the second imports some datetime functionality we’ll need later.
Then we open up a connection to our SQLite DB by simply passing in the name of the file. We then create a cursor, which we can use to execute SQL statements or move through rows of results.
Then we call execute on the cursor to run two different SQL statements: one to drop any existing table by the name of failed_banks, and the second to create our table.
It’s good practice to have a primary key for database tables – so you can quickly access data by the primary key, for example. The “name” field wouldn’t be a good choice for a primary key, because banks could have the same name (such as if they were from different states).
So instead we create our own id field that is a unique number for each row. We mark it as auto-increment so the SQLite engine will handle assigning each row an incrementing number as we add data.
Also note that there is no date data type for SQLite. However, according to the
SQLite datatype reference, as long as you put dates in a particular format (one of which is a text string in the format “YYYY-MM-DD HH:MM:SS.SSS”), SQLite’s
date and time functions will be able to handle the values as dates – so that is what we’ll do!
At this point, you can run your Python script with “python parseBanklist.py” and then use the sqlite3 utility to verify that the database table has been created. Next, to add some data!
Inserting Data via Python!
Python has a neat function called string.split() where you can specify a delimiter, and Python will break the string up into an array of substrings. You can see this at work with the following:
>>> a = "Unity National Bank,Cartersville,GA,34678,26-Mar-10,26-Mar-10" >>> a.split(",") ['Unity National Bank', 'Cartersville', 'GA', '34678', '26-Mar-10', '26-Mar-10'] |
However, there is no built-in way (that I know of) to have Python handle quotes as escapes for separation, in order to handle the “La Jolla Bank, FSB” example shown above. So let’s write our own little function to split up a line that handles quote escaping:
def mysplit (string): quote = False retval = [] current = "" for char in string: if char == '"': quote = not quote elif char == ',' and not quote: retval.append(current) current = "" else: current += char retval.append(current) return retval |
This is quite straightforward Python here – we simply go through the string character by character, building up the list of sub-items separated by commas along the way.
Now that we have this, we can go line-by-line through our CSV file, inserting data into our database:
# Read lines from file, skipping first line data = open("banklist.csv", "r").readlines()[1:] for entry in data: # Parse values vals = mysplit(entry.strip()) # Convert dates to sqlite3 standard format vals[4] = datetime.strptime(vals[4], "%d-%b-%y") vals[5] = datetime.strptime(vals[5], "%d-%b-%y") # Insert the row! print "Inserting %s..." % (vals[0]) sql = "insert into failed_banks values(NULL, ?, ?, ?, ?, ?, ?)" c.execute(sql, vals) # Done! conn.commit() |
In the first line, we just open up the CSV in read mode, read all of the lines, and skip the first line (because it contains a header we don’t want to read).
We then iterate through each line, and use our helper function to split the line into pieces.
We use the
Python datetime module to convert the CSV data (in a format such as “26-Mar-10″) into a Python datetime object that we could then easily format however we want. But we’ll just take the default, which happens to output the dates in the format SQLite expects.
Finally, we create our SQL statemen, inserting question marks wherever user-specifie parameters should be, and then execute the statement passing in the users-specified values. We pass NULL for the first parameter (the id field) so that SQLite can generate the auto-incremented ID for us.
And that’s it! You should be able to run sqlite3 to verify the data is correctly in the database:
sqlite> select * from failed_banks limit 3;
1|Desert Hills Bank|Phoenix|AZ|57060|2010-03-26...
2|Unity National Bank|Cartersville|GA|34678|2010-03-26...
3|Key West Bank|Key West|FL|34684|2010-03-26...
Gratuitous Bar Graph
Once I’d gotten this far, I couldn’t stop myself from extending the script to make a little bar graph showing the number of failed banks by the year.
If you’re just interested in learning sqlite3, feel free to skip this step. But if you like making cool bar charts for no apparant reason, here’s how :]
- Install the latest 2.6 version of Python from python.org. Yes, the Mac already comes installed with Python, but this makes it so that the third party libraries we’re about to install install smoothly/easily. At the time of writing, I used the Python 2.65 Mac Installer.
- Install the latest version of Numerical Python. I used the 1.3.0 py26 Mac installer.
- Install the latest version of Matplotlib. I used the 0.99.1 py26 Mac installer.
Once you’ve got that installed, you can add the following to the bottom of your Python script:
# Get failed banks by year, for fun c.execute("select strftime('%Y', close_date), count(*) from failed_banks group b y 1;") years = [] failed_banks = [] for row in c: years.append(row[0]) failed_banks.append(row[1]) # Plot the data, for fun import matplotlib.pyplot as plt import numpy.numarray as na values = tuple(failed_banks) ind = na.array(range(len(values))) + 0.5 width = 0.35 plt.bar(ind, values, width, color='r') plt.ylabel('Number of failed banks') plt.title('Failed banks by year') plt.xticks(ind+width/2, tuple(years)) plt.show() |
If all goes well, you should see a cool chart like this:
I wonder what will this look like by the end of 2010?
Gimme the Code!
Here’s
a ZIP file with the CSV of the failed banks, the Python parsing script, and the resulting sqlite3 file.
source