FauxDB: Making it by faking it

As I've mentioned before, we're moving away from Caspio as our database provider to the extent that it makes sense (not out of utility, it's a function of cost). While we've managed to get some things migrated over, one of the biggest stumbling blocks are the things we use Caspio for the most — simple databases that need to be viewable and searchable online.

We have a number of semi-complex databases (read: more than a single-sheet XLS file) that we're not moving anytime soon (deed transfers database, among others, simply because of how we ingest the data), but there are a number that are little more than spreadsheets that we need to be able to view and search.

We investigated a number of vendor alternatives, but most featured pricing problems similar to Caspio, or had records limits absurdly lower than what we need. (Example: One such service offered 100,000 rows of data for $149/month. For comparison, one of our more popular databases, listing Pennsylvania teachers' salaries, has well over 2 million rows alone.) So, once again, Project Time™.


There is one thing that any aspiring programmer must realize when they set out to replace a tool: YOU CAN'T REPLACE A TOOL AT THE HEART OF A MULTI-MILLION DOLLAR CORPORATION ON YOUR OWN. I knew this academically but, as is often the case when setting out on these adventures, my brain chose to heed that advice only when it was convenient to do so.

I often live by the mantra, "If someone else can do it, that means it's possible." It works well something like 75 percent of the time — it prevents me from feeling daunted when facing large projects, but it can be turned around as well.

My favorite caveat is, "Technically, I could build you a reasonable facsimile of Facebook — it just wouldn't be as good, fast or as useful as the real thing."

It's true in that somebody built Facebook, but (more accurately) thousands of somebodies built Facebook. It's doable, it's just not feasible for one person to replicate it completely on their own.

That being said, Past Me was convinced it couldn't be THAT difficult to take a spreadsheet and present it online, despite the fact that people routinely pay up to and including hundreds/thousands of dollars per month to companies to be able do exactly that.

Ah, hubris.

The first priority involved figuring out how to store the data. The reason the York Daily Record likes Caspio so much is not just its versatility and usefulness, it's how easy it is to use. Caspio spent a lot of time and money into figuring out an interface that, while not everyone can use it and even fewer can take full advantage of all its features, it's easy enough that most people can do basic things with little training. This actually posed the greatest challenge — the data needed to be able to be input and edited in such a way that your average reporter (think 35-year-old metro reporter, not 23-year-old working at The Verge) would be able to do so without having to email/call me every five minutes. That ruled traditional databases out right away. (Which is not to say that you can't build an edit-friendly MySQL frontend, but I didn't have that kind of build time for this project.)

The build

The easiest and cheapest way forward seemed to be (as ever) through Google. Though I'm becoming more wary of Google Docs' live-editing capabilities, for the purpose of "storing data and being able to edit it directly," Sheets fit the bill.

Because our CMS does not allow for server-side code inclusion (another story for another time), inserting the data into articles needs to be accomplished via JavaScript drop-in. Since we're going to be building it in JS anyway (and I'm a firm believer on not doing the same work twice unless I forget to commit something to the repository), I figured we'd just use one codebase for both the widget version and the standalone.

After a little bit of searching (I got burned out going through a dozen different Caspio alternatives), I settled on DataTables as our jQuery plugin of choice.

Here's the part where I always have trouble when trying to relate the struggles of the average newspaper's newsroom to the more digital-focused newsrooms who have multiple app developers and coders on staff — most newspaper reporters do not have the coding ability beyond making a link or typing into the TinyMCE in WordPress.

You can get them to do things like a YouTube embed using a tag interface [Youtube: https://www.youtube.com/watch?v=jvqfEeuRhLY], but only after some heavy-duty brainwashing (and we still struggle with getting Excerpts right).

So while I and probably three or four in our newsroom have no problem using Quartz's excellent ChartBuilder, it's not something we can just send out to the general population with a "use this!" subject line and expect results.

While some might be content with a simple "Use DataTables!" and inserting some code to auto-activate the tables when people set them up properly, asking your average journalist to use JavaScript parameters is a fool's errand, and we're not even within driving distance of, "Oh yeah, and get your Sheet into JSON for DataTables to use."

Which is not to call them stupid — far from it. It's just that these are people who spent a bunch of time (and, likely, money) to learn how to write stories properly. Then they got to work anytime after 2005 and discovered that it wasn't enough — they have to learn Twitter, Facebook, an ever-increasing number of content managements systems and (oh yeah!) they still have to do it while writing their stories. All of this is doable, of course, but to ask them to learn HTML and JavaScript and every new thing someone invents (which even I have given up all hope of keeping up with; there are just too many new things out there) is simply untenable.

Thus, I consider it my number one job to make their jobs easier for them, not just give them something complicated they have to learn just because it does a new thing (or an old thing in a cooler/cheaper way).

For the first version, it's about as simple as can be. People work on their data using their own preferred Google accounts (work or personal), leaving them with a document they can play around with. Once they're to a point where they're ready to present the data to the public, we copy the data into a separate account. This has the advantage of a) keeping the data under our control, in case the reporter quits/leaves/dies/deletes their account, and b) allows the reporter to keep their own copy of the data with the fields they don't want shown to the public (internal notes, personally identifying information, that sort of thing). The reporter then grabs the sheet ID from the URL and puts it in the tool.

Assuming the data passes some very basic tests (every column has a header, only one header row, etc.), they're presented with a list of fields. Because our CMS frontend does not allow for responsive design, all our information lives in 600 pixel-wide boxes. So with a little help from jQuery Modal, I added some functionality to DataTables using the standard hidden fields that hides some columns in the standard presentation, but shows the entire entry's information in a modal if a row is clicked.

For version 1, search is pretty simple: If there's a field, it's searchable. We're hoping to expand on that in later iterations to not search certain fields, as well as create some method of specifically searching fields (as seen in this Caspio implementation). Users then add a title (shown only in the full version; we're assuming wherever the widget drop-in goes, there's already a headline on the article) and customized search text.

They're then taken back to the main screen, where they can find links to the full data page (like this), which we use for our mobile implementation (neither our apps nor our mobile site executes JavaScript, so we always have to include links to a place off our main domain for our mobile readers to view) as well as the drop-in widget code.


Eventually, we hope to add some things like the extended search functionality, a "download data" option and other enhancements. But for now, we feel like we have a tool for basic database work.