Software development in Digital Humanities

One of the topics that greatly interested me at THATCamp 2009 (and which really wasn’t addressed at Digital Humanities 2009) was software development and process in digital humanities projects. I’m interested in questions of workflow and task distribution: what does the team look like, and what does it actually do on a day-to-day basis? I had a lot of really great conversations with people that helped me clarify my thinking, and got a lot of great book recommendations (most of which I can’t seem to find on the shelves, even though the catalog says they are checked in. Sigh.)

Two sessions in particular at THATCamp were of great interest to me: “Picking a platform(s)” and “Software Development.” I also attended a session on Drupal which, though a bit over my head, intrigued me.

Picking a Platform

A few points jumped out at me in this session. One is that it is possible to overanalyze when picking a software solution; often the best approach is to pick two or three likely candidates (the good ones tend to stand out or get recommended a lot) and build a rough prototype in each. What strikes me about this is that we never work this step into the time estimates for a project (when we manage to estimate time on a project at all, which is rare). Another point, made in many different ways, was to avoid building something new whenever possible and to look for stable, well-supported projects to build on.

Another part of the discussion involved frameworks. If no CMS, such as WordPress or Drupal, fits your project, a framework is the next best choice. Omeka is built on the Zend Framework, and CHNM also tried CakePHP and CodeIgniter (and maybe something else; I was having a hard time keeping up at this point). At the CDRH, we recently used CodeIgniter for a project. I liked it because it was lightweight and its documentation seemed friendlier to the non-programmers (i.e., me) who have to use the framework. It worked pretty well for that project, and we will likely try out other frameworks in the future. We’re also looking into Drupal and WordPress for certain sites, but I am still unclear on how to integrate the large number of TEI documents we have into these products (if anyone has any advice, I’d love to hear it). Many of our past projects were built using Cocoon, and we will likely keep using it for the straightforward TEI sites.

Near the end of the session, I asked what people use for the kind of work I do most frequently, i.e., transforming TEI documents into a website. DEAD SILENCE. This was a little surprising, but not completely, since most of the projects discussed were community-building apps, not content-driven sites. Someone recommended I look into code4lib as well as the eXist database. This has led me to question the difference between digital library/e-text work and digital humanities. Is it the focus on content? Is it the tools used? Is there even a reason for a distinction? Many of our projects at the Center could be classified more as digital library projects, and I am starting to wonder whether classifying projects this way might help us with our workflow a bit.
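For anyone unfamiliar with this kind of work, here is a minimal sketch of what “transforming TEI into a website” means at its simplest. It is written in Python with the standard library’s ElementTree purely for illustration (it is not how our Cocoon-based sites work), and the function name tei_to_html and the sample document are hypothetical. It assumes TEI P5 files in the http://www.tei-c.org/ns/1.0 namespace and only pulls out the title and body paragraphs; a real site has to handle far more of the markup than this.

```python
import xml.etree.ElementTree as ET
from html import escape

# TEI P5 namespace; assumes the input files are P5-conformant.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_html(tei_xml: str) -> str:
    """Render the title and body paragraphs of a TEI document as a bare HTML page.

    Hypothetical helper for illustration only: it ignores notes, page breaks,
    and most other TEI markup.
    """
    root = ET.fromstring(tei_xml)

    # The document title lives in the header's title statement.
    title_el = root.find(".//tei:teiHeader//tei:titleStmt/tei:title", TEI_NS)
    title = (title_el.text or "").strip() if title_el is not None else ""
    title = title or "Untitled"

    paragraphs = []
    for p in root.findall(".//tei:text//tei:p", TEI_NS):
        # itertext() flattens inline markup such as <persName> into plain text;
        # split/join collapses the line breaks and indentation common in TEI files.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(f"<p>{escape(text)}</p>")

    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body><h1>{escape(title)}</h1>\n"
        + "\n".join(paragraphs)
        + "\n</body></html>"
    )


if __name__ == "__main__":
    sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader><fileDesc><titleStmt><title>A Sample Letter</title></titleStmt></fileDesc></teiHeader>
      <text><body><p>Dear <persName>Willa</persName>,</p><p>All is well here.</p></body></text>
    </TEI>"""
    print(tei_to_html(sample))
```

Even this toy version shows why the question is hard to answer with a CMS: the content and its structure live in the XML, and the website is something generated from it rather than typed into it.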

CHNM Creative Lead Jeremy Boggs remarked on how hiring a graphic designer, along with his own studies in graphic design, had dramatically changed CHNM’s approach to design. I found this interesting in light of other conversations I had at THATCamp about design; I’ll talk about this more in future blog posts.

A final point from this session was a frequently expressed wish for some central place where we can share this kind of information and ask questions.

Software Development

I was really interested in this session, because at the CDRH we are still trying to figure out how to… well, create software. We have very few processes in place, so it’s up to the development “team” (mostly consisting of me, a programmer, and a text encoder) to choose tools, put them into use, and train anyone else who needs training. On the one hand, this is kind of nice: we get to choose what works for us. On the other hand, it is a little overwhelming, especially since none of us has software development experience to speak of. As a result, I have become really interested in the software development process, especially as it relates to digital humanities.

Several tools were named as essential for a project team of any size: bug tracking software, version control software, and, maybe, communication software. Right now we have a wiki and Subversion, and are working on bug tracking and other communication tools. Sure, we can yell over the cubicle walls at each other, but at some point we need a way to track all the broken stuff (especially as we keep finding more each day).

The group also talked about making more of our code open source, even the code we don’t think is very good, because others may be able to improve on it or, at the very least, learn from it. We need to get away from the “We’ll release it when it is finished” mentality and just get it out there. Another idea was to stop making single-use tools and to focus on broader things that can be reused. I like this idea in the abstract, but when I really think about how it applies to us, I see so many exceptions: little projects that will only happen once, or special cases having to do with a scholar’s esoteric area of inquiry. I think part of what makes digital humanities interesting is that we take on the stuff that doesn’t have broad appeal, and that in any other context just wouldn’t get done. However, many of these one-time projects might be of use to someone else, so the code should still be available.

What I’d like to see is a balance. Every digital humanities center or project is likely to have some code specific to its own work, but there’s also likely to be something that can be given back to enrich the community. Both are important.

A common complaint throughout THATCamp, and especially in the Software Development session, was that documentation is always lacking. This seems to be a universal truth in software development, not just in DH. One suggestion was to pitch the documentation to a technical writing class as a class project. Another was to hold a “documentation sprint” in the spirit of the code sprint.

One thing that was not brought up, but that I have thought about a lot since reading Joel on Software (in book form), is the idea of the spec. We have never written a spec for any of our projects (that I know of), but I think it would be a useful exercise. Joel’s series on specs comes in four parts: I, II, III, IV. I especially like the idea that a detailed spec, done right, can serve both as a basis for testing the end product and as a start on the documentation. What we have in place of a spec is a mess of meeting minutes (sometimes a year’s worth of biweekly meetings), which would be nearly impossible to go through to nail down all the decisions that have been made. Instead, the idea of the website lives in one person’s, or several people’s, heads, which makes it pretty hard to sit down and build.

To be continued

I’ve just scratched the surface of software development. To an experienced software developer, most of these ideas will be old hat, but to me, most of it is completely new. I think this is one of the sometimes frustrating things about working in a digital humanities center: we don’t hire with software projects in mind, even though that is much of what we do. And much of the apparatus for website development I am used to from working at an ad agency (such as an art director) isn’t there either. It’s really no one’s job to tell us how to work effectively as a team.

On the other hand, this is what I absolutely love about my job. I get to be the art director, designer, coder, tester and researcher, which I find much more interesting than just design or just coding. For me, it’s really ideal, and I would not change a thing.


3 Responses to Software development in Digital Humanities

  1. Sherman Dorn says:

    Were you in George Brett’s session on data visualization? He spent a very interesting 90 seconds on “wicked problems,” a concept developed from a software engineer’s perspective (see Cognexus, a website George mentioned in reference to dialog mapping). It strikes me that writing software specs might be an attempt at a neat solution to a wicked problem, which is fine as long as you see the spec as malleable once you’re deep in the process.

  2. Kevin says:

    Sounds intriguing! I’m glad you are really pushing to improve the software development process at the CDRH (or, you know, create one). As you’re well aware by now, nobody has all the answers, and there are about as many philosophies of software development as there are developers. While researching how other people do things can be very helpful, at the end of the day the most important thing is to figure out something that works for your particular domain, developers, and set of projects.

    In this vein, especially with regard to tools and formal processes, I would recommend that you frequently and critically evaluate whether a tool or process is adding value to the project or just taking up time. It is very easy to find a bunch of flashy tools that sound like they’ll solve all kinds of problems, only to end up so bogged down in process that no actual development gets done. Especially with small teams, the amount of online collaboration required should be minimal (although bug tracking is a must-have for any project :) ).

    Also be careful with specs. They can grow out of control and end up taking more time than the development itself, especially if multiple stakeholders are involved, each trying to hammer out their own specific, detailed requirements in the spec, requirements that might change anyway once development has begun.

    That conference sounded pretty awesome, though. I’m selfishly glad you went so I can read these interesting posts :)

  3. Connie says:

    This is not exactly related to your post, but I saw it while looking at “tree” image results on Google and it struck my fancy and reminded me of you:

    http://www.digibarn.com/stories/desktop-history/bushytree.html