Software development in Digital Humanities

One of the topics that greatly interested me from THATCamp 2009 (which really wasn’t addressed at Digital Humanities 2009) was software development/process of digital humanities projects. I’m interested in questions of workflow and task distribution—what does the team look like, and what does it actually do on a day to day basis? I had a lot of really great conversations with people that helped me clarify my thinking, and a lot of great book recommendations (most of which I can’t seem to find on the shelves, even though they say they are checked in. sigh).

Two sessions in particular at THATCamp were of great interest to me: “Picking a platform(s)” and “Software Development.” I also attended an interesting session on Drupal which, though a bit over my head, intrigued me.

Picking a Platform

A few points jumped out at me in this session. One is that it is possible to overanalyze when picking a software solution, and often the best solution is to pick two or three likely candidates (the good ones tend to jump out/get recommended a lot), and work out a rough prototype in each of them. What strikes me about this is that we never work this part of the development of a project into the time estimates for a project (when we actually manage to estimate time on a project, which is rare). Another point, rephrased many different times, is to try to avoid building something new whenever possible, and to look for stable, well-supported projects to build on.

Another part of the discussion involved frameworks. If no CMS like WordPress or Drupal exists that fits your project, a framework is the next best choice. Omeka is built on the Zend framework, and CHNM also tried CakePHP and CodeIgniter (and maybe something else, I was having a hard time keeping up at this point). At the CDRH, we recently used CodeIgniter for a project, which I liked because it was lightweight and the documentation seemed friendlier for non-programmers (i.e. me) who have to use the framework. It worked pretty well for the project we used it on, and we will likely try out other frameworks in the future. We’re also looking into Drupal and WordPress to build certain sites, but I am still unclear on how to integrate the large number of TEI documents we have into these products (if anyone has any advice, I’d love to hear it). Many of our past projects have been built using Cocoon, and we will likely keep using this for the straightforward TEI sites.

Near the end of the session, I asked what people use to do the kinds of things I do most frequently—i.e. transform TEI documents into a website. DEAD SILENCE. This was a little surprising, but not completely so since most of the projects talked about were community building apps, not content driven sites. It was recommended I look into code4lib as well as the eXist database. This has led me to question the difference between digital library/etext stuff and digital humanities—is it the focus on content? is it the tools used? Is there even a reason for a distinction? Many of our projects at the Center could be classified more as a digital library, and I am starting to wonder if clarifying the type of project in this manner might be able to help us with our workflow a bit.

CHNM Creative Lead Jeremy Boggs remarked how the hiring of a graphic designer and his own studies in graphic design have changed CHNM’s approach to design dramatically. I found this interesting in light of other conversations I had at THATCamp regarding design—I’ll be talking about this a bit more in future blog posts.

A final point from this session is the sentiment, expressed frequently, that it would be nice to have some central place where we can share this kind of information and ask questions.

Software Development

I was really interested in this session, because at CDRH we are still trying to figure out how to… well, create software. We have very few processes in place, so it’s up to the development “team” (mostly consisting of me, a programmer, and a text encoder) to determine tools, put them into use, and train anyone else that needs to be trained. On the one hand, this is kind of nice—we get to choose what works for us. On the other hand, it is a little overwhelming, especially since none of us really have software development experience to speak of. As a result, I have become really interested in the software development process, especially as it relates to digital humanities.

Several concepts were brought up as essential to any project team of any size: bug tracking software, version control software, and, maybe, communication software. Right now we have a wiki and subversion, and are working on the bug tracking and other communication helpers. Sure, we can yell over the cubicle walls at each other, but at some point, we need a way to track all the broken stuff (especially as we keep finding more each day).

The group also talked about making more of our code open source, even the code we don’t think is very good, because others may be able to improve on it, or at the very least, use it to learn from. We need to get away from the “We’ll release it when it is finished” mentality and just get it out there. Another idea was to stop making one-use-only tools and to focus on broader things that can be reused. I like this idea in the abstract, but when I really start to think about how it applies to us, I see so many exceptions: little projects that will only happen once, special cases having to do with an esoteric area of inquiry from a scholar. I think part of what makes digital humanities interesting is we take on the stuff that doesn’t have a broader appeal, and therefore in any other context just wouldn’t get done. However, many of these one-time projects might be of use to someone else, so the code should still be available.

What I’d like to see is a balance. Every digital humanities center or project is likely to have some code specific to their own project, but there’s also likely to be something which can be given back to enrich the community.  Both are important.

A common complaint in general at THATCamp, and especially in the Software Development session, was that documentation was always lacking. This seems to be a universal truth in software development, not just DH. One idea someone brought up was to pitch the documentation to a technical writing class for a class project. Another was to have a “documentation sprint” in the spirit of the code sprint.

One thing that was not brought up, but I have thought a lot about since reading Joel on Software (in book form), is the idea of the spec. We have never written a spec for any of our projects (that I know of), but I think it would be a useful exercise. You can read Joel’s series on specs here in 4 parts: I II III IV. I especially like the idea that a detailed spec, done right, can serve as a basis both for testing the end product and as a start to the documentation. What we have in place of a spec is a mess of meeting minutes (sometimes a year’s worth of biweekly meetings), which would be near impossible to go through to nail down all the decisions that have been made. Instead, the idea of a website is in one person’s, or several people’s, heads, which makes it pretty hard to sit down and build.

To be continued

I’ve just scratched the surface on the topic of software development. To an experienced software developer, most of these ideas will be old hat, but to me, most of it is completely new. I think this is one of the sometimes frustrating things about working in a digital humanities center: we don’t hire with the thought of creating software projects, even though that is much of what we do. And much of the apparatus for website development I am used to from working in an ad agency (such as an art director) isn’t there either. It’s really no one’s job to tell us how to work as a team effectively.

On the other hand, this is what I absolutely love about my job. I get to be the art director, designer, coder, tester and researcher, which I find much more interesting than just design or just coding. For me, it’s really ideal, and I would not change a thing.

Digital Humanities and THATCamp 2009

So I have an overdue post from DH09 and THATCamp09. Maybe I should first explain what those are.

Digital Humanities 2009 Conference

Digital Humanities is the web conference for those involved in (wait for it…) digital humanities. This was the first academic conference I’d been to, adhering mostly to 1.5 hour sessions with three papers each, with people generally reading a paper with bullet point slides in the background. This is in contrast to the library conference I’d been to, which generally had less paper reading (though the bullet points, unfortunately, seem to be a staple everywhere).

Many of the sessions I attended had to do with data visualization and tool discussions. In both of these types of presentations, I found myself wishing that the presenters would start with the demo and then go on to talk about it, especially as many weren’t available on the web. Some of the groupings didn’t really make sense – that is, two of the talks would have a lot to do with each other and the other didn’t really.

The first two days I was at DH I felt very out of place, more than I felt at my first ALA, but in that case I was staying with a close friend who is also a librarian, so that helped. I felt acutely my lack of an overarching “research interest.” Also, I didn’t know anyone in person and though I knew a few people from Twitter, I always saw them while they were talking to someone else so I didn’t introduce myself. In hindsight, this was probably my biggest mistake.

By day three, I felt more at ease, and this was also the most interesting day of presentations for me, so that helped. I was finally starting to get the hang of things when DH ended- but I did introduce myself to several people the last day.

I don’t know for sure if I will attend future DH conferences. My position does not get any travel funding (even if I were to present, and I’m not at all sure what I would present on anyway) and this was the last year I could get student pricing.

THATCamp (The Humanities and Technology Camp)

THATCamp Twitter Word Cloud (photo by ghbrett)

On the opposite end of the cost spectrum was THATCamp, which is donation only. I attended THATCamp last year, so I knew a little more what to expect, and I felt more at ease right away. I’m not sure exactly why that is, but I think it has to do with the informality and the mix of people. It’s not that I don’t like hanging out with academics, but my job just doesn’t allow the time to think about the academic-y research questions, and THATCamp addressed some of the more, shall we say, down to earth aspects of digital humanities such as: how can we make this all work? While DH seemed attended by the scholars who told someone what to do, THATCamp seemed to be attended by more of the techies (I don’t consider that term a put down, BTW) themselves, and scholars who took more active roles in the development of their projects. Again, maybe my impressions are completely off, because they have everything to do with the sessions I attended.

THATCamp, for those who don’t know, is (mostly) an unconference, though with a little more structure than many unconferences. The structure comes in the form of a blog, where people can post their ideas ahead of time and others can comment. The first day of camp we signed up for sessions, and the organizers grouped these logically. I like this idea, but in practice a bunch of people posted on the blog the last day, when I didn’t have time to read them all before the camp started, and stuff was grouped together that maybe should have been separate. I think either a 24 hour suggested deadline for the blog might be good, or some of Daniel Chudnov’s great suggestions for improving on THATCamp.

Oh, that's me on the left (photo by ghbrett)

Last year when I attended THATCamp, I did not know how to program, was still getting my Master’s degree in library science, and was a lowly assistant at the CDRH, so while I found everything very interesting, I had a hard time putting things in context and I really couldn’t implement anything I learned as my job was centered around setting meetings and taking minutes. Still, I felt like a part of the crowd and accepted even then, and this was even more true (if possible) this year. Now, I am a graduate with a degree in library science, and working as one of the developers (visual resources designer) at the CDRH. I have also started down the programming road thanks to Steve Ramsay’s class taken last school year. This meant that this year’s THATCamp filled me with ideas I could actually implement, which is a Very Good Thing. I hope more of my co-workers can attend in the future.

THATCamp was notable for being my first conference where it seemed like almost everyone was on Twitter. Tweets came fast and furious, and at some point I couldn’t keep up anymore. But it was a great way to make connections between sessions. Since THATCamp ended, the “#thatcamp” tag has been used to continue discussions started at camp, and to start discussions about regional THATCamps, an idea that just may be the best thing to come out of THATCamp. Travel money is unlikely to get any looser in the next few years, and regional unconferences are a great way to get together at a minimal cost and share ideas. Hopefully more centers and universities will be able to send their staff to regional get-togethers if nothing else. (More on that, probably, in a later post.)

Both DH and THATCamp were enormously beneficial, and I am glad I went. I am a little sad to miss out on ALA this year, but buying a new house (before selling the old one) has limited my funds somewhat. Maybe next year. In the meantime, both conferences have given me a lot to think about and (hopefully) blog about.

Poetry meaning and folksonomy flaws

In my Electronic Texts class this semester, we have decided on a class project: a poem illustrator.

The idea is simple enough: input a poem and the program will pick a flickr picture as an illustration.

But how to pick the picture? You could analyze the poem, remove stop words, find the most common words, and search on that. But that doesn’t return great results because a) there may be quite a few “most common” words, and b) just because a word is the most common, doesn’t mean it will return a meaningful picture.
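The "most common words" approach described above can be sketched in a few lines. This is a rough Python illustration, not the class's actual code; the stop word list is a hypothetical, deliberately tiny one, and `top_words` is a name made up for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "with", "is", "us", "we", "out"}

def top_words(poem, n=5):
    """Return the n most frequent non-stop words in the poem, with their counts."""
    words = re.findall(r"[a-z']+", poem.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

sample = "Rain on the roof, rain in the field, and a bird in the rain."
print(top_words(sample, 3))
```

Even in this toy sample, only one word stands out and the rest are tied at a single occurrence, which is problem (a) in miniature.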

So as an alternate method, the class decided to run each of the words in the poem (minus stop words) through the flickr.tags.getRelated Flickr API method, and then again sort the results and find the most common word.

The idea was that if you have words like “flower” “tree” “field” “bird” you might, using this method, hit upon the common word “nature” and thus be able to use that to help pick a picture to illustrate the poem.

Well, I decided I just had to try this out this weekend (I’m impatient). So last night I wrote something in Ruby that would read in the poem, put the words into a list (I chose to ignore duplicates), feed those words to the Flickr API to get a list of related tags, and then rank the tags. My code is crude and clumsy (for instance, I didn’t even filter out stop words because Flickr does that for me) but I got results.
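For anyone who wants to follow along, the ranking step looks roughly like the sketch below. It is written in Python rather than the Ruby I actually used, and it is not my original script. The lookup is passed in as a function so the example runs without touching the network; `fake_get_related` and its table are invented stand-ins for a real call to flickr.tags.getRelated, not real Flickr data.

```python
from collections import Counter

def rank_related_tags(poem_words, get_related):
    """Count how often each related tag turns up across the poem's unique words."""
    counts = Counter()
    for word in sorted(set(poem_words)):  # duplicates are ignored, as in the post
        counts.update(get_related(word))
    return counts.most_common()

# Made-up lookup table standing in for the API; these are not real Flickr results.
FAKE_RELATED = {
    "flower": ["nature", "macro", "garden"],
    "tree": ["nature", "forest"],
    "field": ["nature", "sky"],
    "bird": ["nature", "sky", "wildlife"],
}

def fake_get_related(word):
    """Hypothetical replacement for a flickr.tags.getRelated request."""
    return FAKE_RELATED.get(word, [])

ranking = rank_related_tags(["flower", "tree", "field", "bird", "tree"], fake_get_related)
print(ranking[:2])
```

With this invented table, "nature" comes out on top, which is the behavior the class was hoping for. Real tag data turned out to be less cooperative.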

The first poem I ran through was T.S. Eliot’s “The Waste Land.” Actually, to be more precise, it was the first part of “The Waste Land,” because it’s a long poem.

“The Waste Land” by T.S. Eliot

APRIL is the cruellest month, breeding
Lilacs out of the dead land, mixing
Memory and desire, stirring
Dull roots with spring rain.
Winter kept us warm, covering
Earth in forgetful snow, feeding
A little life with dried tubers.
Summer surprised us, coming over the Starnbergersee
With a shower of rain; we stopped in the colonnade,
And went on in sunlight, into the Hofgarten,
And drank coffee, and talked for an hour.
Bin gar keine Russin, stamm’ aus Litauen, echt deutsch.
And when we were children, staying at the archduke’s,
My cousin’s, he took me out on a sled,
And I was frightened. He said, Marie,
Marie, hold on tight. And down we went.
In the mountains, there you feel free.
I read, much of the night, and go south in the winter.

by Jon Bradley Photography

Here’s the top of the sorted returned related tags I got back:

102 – <tag>abandoned</tag>
3 – <tag>wood</tag>
3 – <tag>blossoms</tag>
3 – <tag>window</tag>
3 – <tag>blackandwhite</tag>
3 – <tag>trees</tag>
3 – <tag>river</tag>
3 – <tag>rocks</tag>
3 – <tag>scary</tag>
3 – <tag>purple</tag>

“Wow!” I thought. 102 times I got a related tag “abandoned” for the poem The Waste Land. This returned the Flickr picture on the right, which I consider a great illustration for the poem.

At this point I did a little dance (literally, ask my husband) and congratulated myself on Twitter.

But too soon! Because I had not run the code on any other poem. As it turns out, “abandoned” is a REALLY common flickr tag (as is “abigfave”).

Take this example:

“Messy Room” by Shel Silverstein

Whosever room this is should be ashamed!
His underwear is hanging on the lamp.
His raincoat is there in the overstuffed chair,
And the chair is becoming quite mucky and damp.
His workbook is wedged in the window,
His sweater’s been thrown on the floor.
His scarf and one ski are beneath the TV,
And his pants have been carelessly hung on the door.
His books are all jammed in the closet,
His vest has been left in the hall.
A lizard named Ed is asleep in his bed,
And his smelly old sock has been stuck to the wall.
Whosever room this is should be ashamed!
Donald or Robert or Willie or–
Huh? You say it’s mine? Oh, dear,
I knew it looked familiar!

by windy234

Here’s the top of the list of returned tags for this one:

86 – <tag>abandoned</tag>
86 – <tag>adorable</tag>
3 – <tag>d200</tag>
3 – <tag>clothes</tag>
3 – <tag>d300</tag>
3 – <tag>christmas</tag>
3 – <tag>cat</tag>
3 – <tag>city</tag>
3 – <tag>clouds</tag>
3 – <tag>d80</tag>

As it turns out, searching for “abandoned” and “adorable” mostly returns pictures of stray cats.

I am not sure why “abandoned” is such a popular related tag, but this means I am back to the drawing board. To be sure, there are a lot more variations the class can try here; for example, we can run all the words without removing the duplicates. Lots of improvements can be made on my often broken and inconsistent code. I can’t even replicate the results I got the first time for “The Waste Land”!

The big question on my mind is, how can we get an accurate measure of “relatedness” when it comes to words? Going back to the original example, how can we train a program to derive “nature” from the words “flower” “tree” “field” “bird”? Is it possible to use the Flickr tags for this purpose? The idea behind folksonomies let me down here. As it turns out, people add tags for all kinds of reasons other than to describe the picture. I already knew this from my own practice using tags: I often use tags with no semantic purpose, for instance to group a few images together. I guess I thought the most popular use of tagging would be to describe a photo. The tag “abigfave” refers to the flickr group “A Big Fave” which seeks to find and promote good images. The thing is, I shouldn’t feel let down by folksonomies here. The tags are still serving their purpose by helping people find things.
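One variation that might be worth trying (the class has not tried it, and it is my own guess rather than anything from the Flickr documentation) is to down-weight tags that show up for every poem, in the spirit of inverse document frequency. The sketch below is in Python and uses a few of the counts from the two lists above; `distinctive_tags` is a made-up name, and two poems is far too few for the weights to mean much.

```python
import math
from collections import Counter

def distinctive_tags(tag_counts_by_poem, poem):
    """Score each of a poem's tags as count * log(total poems / poems with that tag)."""
    n_poems = len(tag_counts_by_poem)
    poems_with_tag = Counter()
    for counts in tag_counts_by_poem.values():
        poems_with_tag.update(counts.keys())
    scores = {
        tag: count * math.log(n_poems / poems_with_tag[tag])
        for tag, count in tag_counts_by_poem[poem].items()
    }
    # Highest score first; ties keep the order they had in the input.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# A handful of the counts reported above; the rest of each list is left out.
counts = {
    "The Waste Land": {"abandoned": 102, "trees": 3, "river": 3},
    "Messy Room": {"abandoned": 86, "adorable": 86, "clothes": 3},
}
result = distinctive_tags(counts, "The Waste Land")
print(result)
```

A tag that appears for every poem scores zero no matter how often it came back, so “abandoned” drops to the bottom. Whether what is left is any more meaningful is a separate question.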

I’ve been trying to think of other ways to get at the related words of a poem. I thought of mining delicious.com tags, but those are used for even more utilitarian purposes—e.g. items tagged with “flowers” are likely also to be tagged with “wedding.”

One other idea I can think of is to create our own related words list by mining, say, a million books. This is one possible answer to Gregory Crane’s question “What Do You Do with a Million Books?” I’m not sure exactly how this would work; it would probably involve analyzing the texts for words that appeared near each other. Even if we did the analysis, though, we may end up finding the same results as my Flickr search did. The thing is, it’s impossible to know without trying first.
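To make "words that appeared near each other" a little more concrete, here is a minimal Python sketch of counting co-occurrences within a small window. The window size of 4 and the three one-line "books" are arbitrary assumptions for illustration, and `cooccurrences` is a made-up name; this is nowhere near what mining a million books would actually take.

```python
import re
from collections import Counter, defaultdict

def cooccurrences(texts, window=4):
    """For each word, count the other words found within `window` words of it."""
    related = defaultdict(Counter)
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        for i, word in enumerate(words):
            # Look ahead only, and record the pair in both directions.
            for neighbor in words[i + 1 : i + 1 + window]:
                if neighbor != word:
                    related[word][neighbor] += 1
                    related[neighbor][word] += 1
    return related

# Three invented sentences standing in for a corpus.
texts = [
    "the bird sang in the tree",
    "a bird flew over the field",
    "the tree stood in the field",
]
related = cooccurrences(texts)
print(related["bird"].most_common(3))
```

Without a stop word filter, common words like "the" are likely to crowd out everything else, which looks a lot like the “abandoned” problem all over again.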

with a little help from my friends

by Voj

This is a sort of follow up to yesterday’s post. Steve posted a nice comment, assuring me that “In the end, you’re better off developing a relaxed attitude toward the fact that you will *always* feel a bit stupid in this business.”

I completely agree, and on some level I know this—at the same time, the anxious nervousness I get from NOT knowing drives me forward. It’s exciting to have a whole topic in front of me I know almost nothing about. Although this comes with frustration, it’s of a limited kind, because I do have faith in myself to learn this stuff. One thing I want to be sure of is that I don’t go too far down the rabbit hole into things I don’t really need to learn—after all, we do have a programmer. And there are other things I really want to delve deeply into, like data visualization.

Anyway, that’s not really what this post is about. What it is really about is my deep indebtedness to all my friends, techie and not, who provide support and words of encouragement.

First and foremost, of course, is my husband, who is always unwavering in his support for me. Ditto for my parents. Then there’s my real life techie friends, who bring me back to earth when I start to get a little TOO overexcited, and provide encouragement, and sometimes let me bitch over lunch/drinks/boardgames. And then there’s all my friends online (commenters on my blog, friends on Twitter and Friendfeed) who provide encouragement and help right when I need it. I can’t begin to describe how lucky I feel. I don’t know how much of this “I can do it!” attitude I would have if I didn’t have people surrounding me who echo the sentiment.

That’s it, really. Just wanted to get that out there. Thanks, everyone. :)

The learning curve

by Joshua Davis (jdavis.info)

I’ve been busy these last few weeks. Though I dreamed of having lots of free time to read and relax post grad school, free time has been hampered by two things: Geoff and I decided to look for a new house, and I’m having to learn a LOT for the new job. If you want to read about the reasoning behind the house hunt, head over to my other site, OS Agnostic. Here I’m going to talk about steep learning curves and the troubles they present.

My situation is not uncommon- though I knew a lot of what was needed for my new job as Digital Resources Designer, there are a lot of things I need to learn. This is exacerbated by the fact that some of the technology we use/used was only known well by my predecessor.

Some of what I need to learn is pretty straightforward—PHP is an example. I am muddling my way through a few books on PHP, and can puzzle out a lot of stuff. I’m also getting better at XSLT and can do much fancier things with it than before.

The harder things I am learning are tough for a few reasons. One is that there’s no easy learning guide, and another is that the guides that do exist assume Unix/Linux system admin experience, which I don’t have. These technologies include Tomcat, Cocoon, Solr, and XTF. The other problem I have is we use these technologies in a somewhat simplified way, and I don’t need to know how to do everything with them. I really only need to learn to do a small subset of what the program can do, but all that information is bundled in books or websites with a lot MORE information than I need.

by Victor Gregorio

Finally, I feel that, lacking a computer science background, I am missing something vital to the understanding of these technologies. I’m not sure if this is a common feeling or not. I end up feeling lost much of the time.

The learning curve problems are multiplied by two, because the Center hired a programmer who started in January and is also learning these technologies (albeit more quickly than I am). The person we replaced knew all these technologies; unfortunately, he was the only one who seemed to know some of them well. We’re still trying to figure out how he did the work of two, since the programmer’s position and mine are a split of what my former colleague did. I remain in awe of his prolificacy.

So, work is going well, but I can’t help but feel a little lost much of the time. It is really great when things fall into place and I understand something, but sometimes that moment seems all too elusive. And it is hard to balance the learning needed with the other things that need to be done.
