
Day in the life post 1

Syntax Error -  Folded Up Beyond All Recognition

Photo by Simon Pow

I have been quiet on my blogs lately. I think I just needed a bit of a break- not to mention that school and my practicum are taking more time than I thought they would. I am back now to participate in the “A Day in the Life of a Library” meme, started by Bobbi L. Newman at Librarian by Day. You can see a list of other “Day in the Life” posts on the wiki.  (or sign up to participate yourself!)

Today was a little unusual, because I took a vacation day to go to my practicum at the Nebraska Library Commission. For those that don’t know, a practicum is a school requirement where I get to pay tuition to work for 90 hours. Exciting, huh? Usually, a practicum is used to get a sampling of what it might be like to work various jobs in a library. In my case, I was approached to help build a new website for the Nebraska Library Commission to supplement their existing site, and it was too good an experience to pass up. I’m running a little bit behind on hours for the practicum (it’s over August 1st), which is why I am taking vacation days to finish it up.

It was actually really, really wonderful to be able to devote a full 8 hours to a project like this. My usual day (as you’ll see later this week) is a little bit of everything, with a LOT of interruptions. While it makes the day go quickly, it can also make it hard to get things done. Today, on the other hand, I had the luxury of working almost nonstop on one project. So today’s post will be somewhat short. Except I have rambled quite a bit already. Oops.

I got to the Commission a little before 8. My first order of business was to talk to Michael Sauers. I had a favor to ask him having to do with the Other Job, and I wanted to touch base about the presentation we’ll be giving together this fall.

After that, I picked up where I left off last night on my website design. I have finished working out most of the design part, and am down to the nitty gritty of the CSS- not exactly my favorite thing to do. This is where having a lot of time to work on something comes in handy, because I can’t squash all those CSS cross browser bugs very quickly. Around 9:30 I checked the email account for the Other Job, responded to a couple of things, checked my personal email, checked Twitter, and started work on the CSS again.

I made a lot of progress today, which felt great. My next step in my practicum is to show my design proposals to the web committee, and hope they like them. I also have been working on a few suggestions for them. I am trying to leave them with well commented code and some Photoshop files they can use to make their own custom attractive graphics.

So that’s it for today. There will probably be some more variety in tomorrow’s post, but this being summer I can’t guarantee anything. :)

ALA Annual update

Colors of San Pedro

Colors of San Pedro by my hovercraft is full of eels

The last few weeks have been a bit of a blur. Various house issues, preparing for vacation and ALA Annual, work, school, and life have been keeping me very busy. All my poor blogs are neglected. :(

I’m not going to post an ALA schedule yet, because I learned last year that it will just change anyway as ALA draws closer. I will probably post a few tentative plans next week, and will hopefully blog some sessions. Of course I will go to the sessions Cory Doctorow is at.

As for social activities, I will go to the Scholarship Bash Saturday night, and then some of us are trekking to San Pedro for the Rocky Horror Picture Show. I will go to the Blog Salon and the NMRT Social (I’m sad they’re not in the same hotel this year) most likely.

I’m heading for vacation in California before Annual- so if you are there beforehand too and want to do something, email me (karin@nirak.net.) If you want my cell phone # to contact me during Annual, just email me.

I will likely be posting vacation related stuff to my blog at os-agnostic, so check there if you want to read any of that. I’m also going to bring a painting to LA so I can give away a painting during my trip - hopefully to someone at the conference. It worked well at THATCamp.

I think that’s it. If you’re going to Annual, I’ll see you there, and if not, I hope I don’t annoy you too much with my conference postings and tweets. :)

THAT Camp, Day 1

Finally back for good in my hotel after day 1 of THAT Camp. I am exhausted and energized at the same time. The organizers have brought together an absolutely amazing group of people, and I am humbled by the sheer brilliance present. I’m going to do a quick overview, but many of the topics discussed will show up in my blog for weeks to come.

First, though- the DC area is becoming a favorite destination of mine, even though I have only been here twice now. I spent 5 hours yesterday in the National Gallery of Art, and was, of course, awed the entire time. (The only annoying part was listening to people say ‘why is that art? I could do that!’ over and over.) Fairfax is lovely, despite the occasional disappearing sidewalks (seems people don’t walk long distances here very often?)

THAT Camp began with a great breakfast and a whole group meeting where we planned out the schedule for the day. Participants posted their presenting ideas to the blog for a couple of weeks leading up to the unconference, so the task was a bit easier.

Session 1 - Art

The first session was on art- specifically digital art. There were only two others besides me, David Rieder and Susan Harum. We had a great discussion of what digital art might look like and how it might be supported. David and Susan had many, many great links to share, and it was great to hear how other campuses are dealing with the emergence of digital art. I’d love to see more about this topic.

Lunch!

A fantastic lunch was accompanied by Dork Shorts- brief talks on technology topics. Presenters had 5 minutes to show off their site or idea. More link goodness, although some of the sites were in production and not yet available to the public.

Session 2 - Alternative search

I started the session with a brief slide show that addressed some of the points I’ve made in my recent alternative search postings.

After that, I left it up to the group to talk about what we could do to make search better. I was thrilled that the group contained a number of people with much more experience with search than me, and we talked about technologies, what the users want, and how to make search better. Josh Greenburg brought up the excellent point that some of what we think of as search problems are really user interface problems- so I am looking forward to attending the interface design session tomorrow.

One of the developers of Blacklight (Bess Sadler), an open source OPAC enhancement, was there and the work that they have done is absolutely amazing. I particularly liked her ideas for allowing departments to customize search for different disciplines through an easy to use GUI interface. There were a lot of other great links mentioned, which, unfortunately I lost because of an errant keystroke.

Session 3 - Making things

Bill Turkle led two sessions on the Arduino- I attended the second. I managed to make a light blink and alter a few programs, but what I am really excited about is getting an Arduino. I have never done anything with physical computing or electronics before, so it was a steep learning curve for me. I am the proud new owner of an Arduino, though, and I have several ideas for projects I can’t wait to get started with.

Session 4 - Creative Commons/Copyright

I sort of led this session, too, though I felt a bit like an impostor because I am by no means an expert on copyright. I started with a discussion on creative commons, talked about why I use it, and what some of the advantages and disadvantages are. The group talked about some of the copyright issues they have had, and we tried to brainstorm some ways to get around them. I wish I had more answers for the frustrating issue of copyright. I believe in intellectual property, but also share the belief of many that the copyright system as it stands is as much of a hindrance as a help.

One of the frustrations the group expressed was the tendency of institutions to hold back higher resolution images from the web, opting instead to only allow very low resolution images to try and make money by selling higher resolution images. One solid idea we came up with is to try and collect studies that analyze the cost vs benefits of doing this and compile a list of advantages of making higher resolution images available and free to use. I’m going to work on this - I’m wondering if I can make it into an independent study project for school.

Andrea Ferguson talked a little bit about her experiences getting her MFA at the University of South Florida, and I came away much more optimistic about Fine Art in Academia. I have been afraid that digital art was stifled in many places, but many conversations have now led me to believe that that just isn’t so. Makes me want to go for an MFA even more.

Recap and dinner

At the end, the group met again and Josh Greenburg made a few final remarks. Then many of us went to dinner at Minerva, a fantastic Indian restaurant here in Fairfax. The dinner and the conversation were excellent.

I look forward to another great day tomorrow, though my brain feels about full already. I have a beautiful walk to CHNM tomorrow in the morning to look forward to, during which I can clear my thoughts.

Alternative search, part 2: Analyzing a document

Analyzing the metadata in a document is a fairly straightforward process. However, analyzing the document itself is a little messier.

Document analysis is nothing new- people have been programming computers to refine full text analysis since the days when full text first started to appear. As hard drive space became cheaper and computers became more powerful, new documents began to be stored and new ways to analyze them were developed. In Information Storage and Retrieval, Korfhage mentions several of these methods, but things have evolved quite a bit since then. Besides the methods mentioned below, specialized retrieval systems such as face recognition used in law enforcement have been developed, but I will focus on a few technologies available to the general public.

Text Analysis

Text analysis in documents has come a long way since the inclusion of the first full text documents in databases. Search engines have become quite good at parsing the full text of web pages, as well as using hypertext and other measures to determine what a page is about. With the advent of more and more full electronic text, scholars have started to study ways to use text analysis on literary works. One such project is the Mellon funded MONK project. Sites are starting to work text analysis into their search and browsing features as well.

The Willa Cather Archive offers a feature to perform in depth text analysis on all of Cather’s books using a program called TokenX. This process is different than simply searching for a term because you can do new things, such as compare the use of words across books and contextualize the words for the user. These kinds of analyses allow scholars new ways to analyze literature.
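To give a feel for what a "words in context" view does, here is a minimal sketch in Python. It is not how TokenX works internally (I don't know its implementation); the function name, the `width` parameter, and the sample sentence are all made up for illustration.

```python
import re

def keyword_in_context(text, keyword, width=3):
    """Return (left, word, right) for each occurrence of keyword,
    with up to `width` words of context on either side."""
    words = re.findall(r"[A-Za-z']+", text)
    target = keyword.lower()
    hits = []
    for i, word in enumerate(words):
        if word.lower() == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits

# Invented sample text, not a quotation from any book.
sample = "The prairie was wide. She loved the prairie in spring."

for left, word, right in keyword_in_context(sample, "prairie", width=2):
    print(f"{left:>20} [{word}] {right}")
```

Running the same function over several books and comparing the hit lists is one simple way to compare how a word is used across works.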

Screenshots for Information Retrieval paper

Cather Archive text analysis powered by TokenX, search results view.

Screenshots for Information Retrieval paper

Cather Archive text analysis powered by TokenX, words in context view.

Another now common way to analyze documents is to create a word cloud of common words. Word clouds are commonly made up of user entered metadata such as tags, and are less commonly used with entire documents. The reason why is fairly obvious when one sees such a cloud- words like “a,” “an,” “the,” and “that” end up being the largest words in the cloud because they are the most common. However, a word cloud can be a useful way to browse even full text documents. This can be achieved by carefully filtering out words that do not add meaning to the cloud. The website “The Mountain Meadows Massacre in public discourse” does this in one of its visualizations, offering a view of common words used in articles about the Mountain Meadows Massacre. Another site that uses this technique is a search engine called Quintura (Fig. 15). Quintura analyzes the results from a web search and creates a word cloud of corresponding terms. Users can click on words to add or subtract them from a search. This may be more intuitive for users who don’t know how to use an advanced search.
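The filtering step is easy to sketch. The Python below counts words in a text and drops a small, hand-picked stop word list before ranking; the list, the function name `cloud_weights`, and the sample sentence are assumptions for illustration, not taken from either site.

```python
from collections import Counter
import re

# A tiny, hand-picked stop word list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "the", "that", "of", "and", "in", "to", "was", "is"}

def cloud_weights(text, top_n=5):
    """Return the top_n (word, count) pairs after removing stop words.
    The counts could then be mapped to font sizes in a cloud."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Invented sample sentence.
sample_text = ("The wagon train entered the valley and the train camped "
               "in the valley near the train")

print(cloud_weights(sample_text, top_n=2))
```

Without the filter, "the" would be the biggest word in this cloud by a wide margin, which is exactly the problem described above.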

Screenshots for Information Retrieval paper Screenshots for Information Retrieval paper
“The Mountain Meadows Massacre in public discourse” word cloud. Quintura search engine.

Multimedia Document Analysis

Although text analysis has been around for a while, it is only recently that computers have been able to analyze image and sound documents. It is not that such a search is impossible. In fact, Korfhage reported work was already beginning on such analysis in 1997, but it is extremely computer intensive and complex. As Korfhage notes, the transformations something might go through are enormous- a picture of a bridge can be from above, below, from the side, or on the bridge. It might be a sketch or a photograph. Also, there are hundreds of types of bridges (p. 249). Asking a computer to identify a bridge in an image is still a long way off and may never happen. However, other kinds of image analysis are possible and even easy using computers.

Color is one thing that is easy enough to analyze using a computer. The computer can select areas of a picture, average the colors, and match those colors up to a user provided hue. This allows for some interesting image analysis that aids both browsing and finding. One such site is called Flickr Colr Pickr. The navigation in this site is simple: choose a color from the color wheel, and the engine returns results that match the color. Another search that uses the Flickr API is called Retrievr, which allows for an even more complex query: it lets the user draw a picture to return pictures that resemble the drawing. This may work well when looking for photos of a sunset or the ocean, and less well for images of a dog. Retrievr is based on research by Chuck Jacobs, Adam Finkelstein and David Salesin, who created an algorithm which is “simple, requires very little storage overhead for the database of signatures, and is fast” (Jacobs, Finkelstein, & Salesin, 1995, p. 277).
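The "average the colors and match them to a hue" idea can be sketched in a few lines of Python. This is only an illustration of the general approach, not the method used by Colr Pickr or Retrievr; the pixel lists stand in for real images, and the names and the 30 degree tolerance are assumptions.

```python
import colorsys

def average_color(pixels):
    """Average a list of (r, g, b) pixels, each channel 0-255."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def hue_of(rgb):
    """Hue of an RGB color in degrees (0-360)."""
    r, g, b = (c / 255 for c in rgb)
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360

def hue_distance(h1, h2):
    """Distance between two hues on the color wheel, in degrees."""
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)

def match_by_hue(images, target_hue, tolerance=30):
    """Names of images whose average hue is within tolerance, best first."""
    matches = []
    for name, pixels in images.items():
        dist = hue_distance(hue_of(average_color(pixels)), target_hue)
        if dist <= tolerance:
            matches.append((dist, name))
    return [name for dist, name in sorted(matches)]

# Invented two-pixel "images" standing in for real photos.
images = {
    "sunset": [(250, 120, 30), (240, 100, 20)],
    "ocean": [(20, 80, 200), (30, 100, 220)],
    "meadow": [(40, 180, 60), (60, 200, 80)],
}

print(match_by_hue(images, target_hue=220))  # a blue hue
```

One caveat with this simple version: averaging a very mixed picture tends toward gray, where hue means little, which is one reason to average selected areas rather than the whole image.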

Screenshots for Information Retrieval paper Screenshots for Information Retrieval paper
Flickr Color Fields allows searching Flickr photos by color. Retrievr matches photos to a drawing.

The above means of finding photos work well for browsing, but not as well for finding. One application that could prove very useful for finding is demonstrated by Dave Pattern (based on earlier experiments by Tim Hodson) (clarified thanks to Tim’s comment below) in an experimental site which lets you search for a book by color. Tim Hodson explained the usefulness of such a feature in a blog post. Imagine a patron asking “I heard about a book three months ago. I can’t remember who wrote it or what it was called, but it was blue” (Hodson, 2008, para. 3). Pattern goes on to describe the process for searching book covers in the same way Retrievr searches Flickr images: “The search works by comparing the hex colours of the 8×8 version of the search image with the corresponding pixels of the book covers. Each book cover then gets ranked by how well it matches the search image” (Pattern, 2007, para. 8). Etsy has yet another fun way to search for products with its Colors search. Pick a color and Etsy will show you photos of products whose colors match your request. Although this isn’t a perfect method, it is an innovative way to search products.
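Pattern's description translates into a short sketch. The Python below compares a search grid of hex colours with each cover's grid, pixel by pixel, and ranks covers by total difference. The distance measure (summed per-channel differences) is my assumption, since the quote doesn't say how "how well it matches" is scored, and the grids here are 2×2 rather than 8×8 to keep the example readable. The book titles are hypothetical.

```python
def hex_to_rgb(hex_colour):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    hex_colour = hex_colour.lstrip("#")
    return tuple(int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))

def grid_distance(grid_a, grid_b):
    """Sum of per-channel differences between corresponding pixels.
    Assumed scoring; lower means a closer match."""
    total = 0
    for a, b in zip(grid_a, grid_b):
        ra, ga, ba = hex_to_rgb(a)
        rb, gb, bb = hex_to_rgb(b)
        total += abs(ra - rb) + abs(ga - gb) + abs(ba - bb)
    return total

def rank_covers(search_grid, covers):
    """Cover titles ordered from best match to worst."""
    return sorted(covers, key=lambda title: grid_distance(search_grid, covers[title]))

# The patron's "it was blue" query, as a flattened 2x2 grid.
search_grid = ["#0000ff"] * 4

# Hypothetical covers, each already shrunk to the same grid size.
covers = {
    "Red Book": ["#ff0000"] * 4,
    "Blue Book": ["#1010ee", "#0000ff", "#2020dd", "#0000ff"],
    "Teal Book": ["#008080"] * 4,
}

print(rank_covers(search_grid, covers))
```

Because the comparison is position by position, a cover that is blue on top and white below would match a drawing with the same layout better than one with the colours flipped.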

Screenshots for Information Retrieval paper Screenshots for Information Retrieval paper
Dave Pattern’s demo of a book search by color. Etsy Colors.

One website, called like.com, uses several methods to help the user find a good result. Like.com might be one of the first applications of research performed by Wei-Ying Ma and B. S. Manjunath in 1999, promising the ability to “retrieve all images that contain regions that have the color of object A, texture of object B, shape of object C, and lie in the upper of the image” (p. 184). It not only uses existing metadata as mentioned above, it uses image analysis to find similar products. In the example picture, a small box is drawn around part of the product, and the engine finds products similar in style or color. The user can then refine by style, color, and other options. This kind of innovative searching is likely to get more and more common.

Screenshots for Information Retrieval paper
Like.com lets you search by drawing a box around the part of the item you like.

Though full text document analysis is exciting, things really start to get interesting when sites allow for user added metadata and use that data to provide ever better search results. That’ll be the next (and last) part in the series.

Bibliography:

Hodson, T. (2008, March 6). Colourphon: cooking up something interesting. Information Takes Over. Retrieved April 28, 2008, from http://informationtakesover.co.uk/archives/2008/03/06/colourphon-cooking-up-something-interesting/.

Jacobs, C. E., Finkelstein, A., & Salesin, D. H. (1995). Fast multiresolution image querying. Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, 277-286.

Korfhage, R. (1997). Information storage and retrieval. New York: Wiley Computer Pub.

Ma, W. Y., & Manjunath, B. S. (1999). NeTra: A toolbox for navigating large image databases. Multimedia Systems, 7, 184-198.

Pattern, D. (2007, February 1). Michael Stephens = Norman Bates?!? Self-plagiarism is style. Retrieved April 28, 2008, from http://www.daveyp.com/blog/index.php/archives/172/.

Alternative search, part 1: Using existing metadata or data

Yesterday, I mentioned three types of alternative search:

  • Search that uses existing human or computer supplied metadata to find and display information.
  • Search that analyzes a document’s contents to return a result.
  • Search that relies on user added metadata

Today, I’m talking about the first technique: using existing metadata in new ways to facilitate finding and browsing. Most files have some metadata attached: the date the file was created, the date it was last altered, the owner’s name, categories, etc. Many scholarly projects have extra metadata associated with them, expertly researched or generated. Library online catalogs also have rich metadata for holdings. Many systems have rich metadata, but don’t use it in a way that helps users to find what they are looking for.

Mapping

One example of using existing metadata to help a user find what they want is mapping technologies. Take crime statistics. Many police departments are now mapping crime data on interactive, online maps. The Lincoln City Police Department is doing just that:

Screenshots for Information Retrieval paper

Map of crime provided by the Lincoln, Nebraska Police Department.

Here we have already existing data from the police blotters, which has been accessible for quite some time. In years past, one could find police blotter information in the newspaper, and later these were moved online. While it was nice to have the information, and any citizen could scan the pages to find crime in his or her area, it was very difficult to answer the question “what crimes have taken place within a quarter mile of my house in the last 5 days?” By plotting the already existing information on a map, citizens can keep watch on crime in their area.
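The quarter mile question is a nice example of something that is tedious for a person scanning a list but simple once each incident has coordinates and a date. Here is a rough Python sketch using the haversine formula for distance. The incident records, the coordinates, and the field names are invented for illustration; they are not taken from the police department's data or system.

```python
from datetime import date, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

def nearby_recent(incidents, home, today, radius_miles=0.25, days=5):
    """Incidents within radius_miles of home in the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [
        inc for inc in incidents
        if inc["date"] >= cutoff
        and miles_between(home[0], home[1], inc["lat"], inc["lon"]) <= radius_miles
    ]

# Invented data: a made-up home location and three made-up incidents.
home = (40.8136, -96.7026)
today = date(2008, 4, 28)
incidents = [
    {"type": "vandalism", "lat": 40.8150, "lon": -96.7030, "date": date(2008, 4, 27)},
    {"type": "theft", "lat": 40.8500, "lon": -96.7026, "date": date(2008, 4, 27)},
    {"type": "burglary", "lat": 40.8140, "lon": -96.7020, "date": date(2008, 4, 1)},
]

for inc in nearby_recent(incidents, home, today):
    print(inc["type"], inc["date"])
```

The theft is recent but too far away, and the burglary is close but too old, so only the vandalism report comes back. A map does the same filtering visually.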

The previous example helps the user find the answer to a specific question. Other map based systems help the user browse through materials and form new questions. This is often the case in systems for scholarly research papers. As an example, The Willa Cather Archive has a forthcoming feature which maps Willa Cather’s travels across the globe and links them both to time and to objects which include primarily letters and photos.

Screenshots for Information Retrieval paper Screenshots for Information Retrieval paper

Screenshots of Willa Cather Archive production feature. (Feature not yet available.)

Called “Mapping a writer’s world: A Geographic Chronology of Willa Cather’s Life,” this feature allows a Willa Cather scholar to explore the archive’s collections not only through time, but through space as well. This allows the scholar to make connections that would be hard to make otherwise. The time component allows users to be led through Cather’s travels. This new view, a sort of geographic biography, brings a new perspective to Willa Cather’s life and may shatter the stereotype some have of a woman who lived her life on the plains. Similarly, map based views of documents on the site “Envisaging the west: Thomas Jefferson and the roots of Lewis and Clark” help the user find documents in space as well as time, and can help the user put documents together in new ways.

Screenshots for Information Retrieval paper

“Envisaging the West” Map view.

Screenshots for Information Retrieval paper

Etsy Geolocator.

Browsing by geography is not exclusive to scholarly works and crime maps. The commercial website Etsy, where users can buy and sell handmade goods, has several different search and browse methods, one of which lets you find items on a map. This feature may not help you find a specific item, but it can help you find items made in your own city, therefore supporting your local economy. The geography feature has the added benefit of helping sellers find local, dedicated buyers, who can support a small business.

Time

Another kind of existing metadata that can be used to make finding and browsing easier is time. Most objects have some kind of time identifier- either a time stamp added by a computer or recording device (for example, most cameras automatically imprint the time a shot was taken in the metadata) or the date something was created, added later by a scholar.

Mechanically added time stamps are marginally useful for historic objects, but for newer, born digital objects they can be very useful. For example, the aforementioned website Etsy provides another way of browsing items by sorting them by the most recently listed. This could be done through a simple list of items, but Etsy has added a 3-D component and an analog style clock that helps the user browse the items. A photo program called Picasa (offered by Google) sorts photos by date taken in the default view, offering the user a long list of chronologically ordered photos. This view depends not on the filename or title of a shot, but on the metadata embedded in each photo.
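The difference between sorting on a filename and sorting on embedded metadata is easy to see in a small Python sketch. The photo records below are invented, and the `taken` field simply stands in for the time stamp a camera would embed; this is not Picasa's code or data format.

```python
from datetime import datetime

# Invented photo records; "taken" stands in for the embedded time stamp.
photos = [
    {"filename": "IMG_0003.jpg", "taken": datetime(2008, 4, 12, 9, 30)},
    {"filename": "beach.jpg", "taken": datetime(2008, 3, 2, 17, 5)},
    {"filename": "IMG_0001.jpg", "taken": datetime(2008, 4, 20, 8, 0)},
]

def chronological(photos):
    """Order photos by when they were taken, ignoring filenames."""
    return sorted(photos, key=lambda p: p["taken"])

by_name = sorted(photos, key=lambda p: p["filename"])

print([p["filename"] for p in by_name])
print([p["filename"] for p in chronological(photos)])
```

Sorting by name puts the two IMG files first; sorting by the time stamp puts the beach photo first because it was taken earliest, whatever it was later renamed.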

Screenshots for Information Retrieval paper

Etsy Time Machine.

Many documents have a date associated with them that indicates when the item was created. For instance, Zotero, a scholarly citation management system, has a field for “date” where the date of an object can be entered. If the date is not supplied in the metadata, the user can add this information. This allows for a new way to view one’s collected resources: a timeline.

zotero timeline

Zotero timeline, showing highlighted words. The Zotero timeline was created with the help of MIT’s SIMILE project.

The timeline interface also allows users to highlight items containing certain words, which lets the user do a quick check on their own sources to answer questions such as “did scholars stop using a certain term after the turn of the century?” This kind of question would be difficult to answer given the traditional list view. Another scholarly example of mapping existing metadata to a timeline is found in the Envisaging the West website, where documents have been mapped to a timeline and color coded. This allows the user to see at a glance where documents fall on the timeline and what types occurred when.
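The "did scholars stop using a certain term" check boils down to grouping sources by year and counting how many mention the term. Here is a minimal Python sketch of that idea. It is not Zotero's or SIMILE's code; the records are invented, and it assumes each date starts with a four digit year, which a free-form date field may not guarantee.

```python
from collections import defaultdict

def term_use_by_year(items, term):
    """Map each year to (items mentioning term, total items).
    Assumes item["date"] begins with a four digit year."""
    term = term.lower()
    tally = defaultdict(lambda: [0, 0])
    for item in items:
        year = item["date"][:4]
        tally[year][1] += 1
        if term in item["title"].lower():
            tally[year][0] += 1
    return {year: tuple(counts) for year, counts in sorted(tally.items())}

# Invented sources, standing in for a personal citation library.
sources = [
    {"title": "Notes on the horseless carriage", "date": "1895-03-01"},
    {"title": "The horseless carriage in town", "date": "1898-06-15"},
    {"title": "Road building", "date": "1898-09-02"},
    {"title": "The automobile arrives", "date": "1903-01-20"},
]

for year, (hits, total) in term_use_by_year(sources, "horseless carriage").items():
    print(year, f"{hits} of {total}")
```

A timeline shows the same tally visually: highlighted items cluster before the turn of the century and thin out after it.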

Screenshots for Information Retrieval paper

Envisaging the West timeline view.

Faceted Browsing

A final way that existing metadata might be used is to create a method for drilling down through results via faceted browsing. With this method, information about each item is extracted and offered to the viewer so they can navigate through results with ease. Faceted browsing helps both browsing and finding: the rich metadata offers new paths to follow that would otherwise stay hidden, and can also assist in finding a specific item by breaking aspects into categories. A few examples of these systems include a library catalog which uses extensive metadata to allow a user to navigate through items (see McMaster University Library catalog, below), or a shopping site that allows a user to set a number of controls to find exactly the item they want (see screenshots of Buzzillions and Volkswagen UK sites, below).
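At its core, faceted browsing is two operations: count how many items fall under each value of each facet, and filter the list by whatever values the user has picked. The Python below is a bare-bones sketch of that idea, not Endeca's implementation; the records, facet names, and function names are all invented.

```python
from collections import Counter

def facet_counts(items, facets):
    """For each facet, count how many items have each value."""
    return {f: Counter(item[f] for item in items) for f in facets}

def apply_filters(items, selected):
    """Keep only items matching every selected facet value."""
    return [item for item in items
            if all(item[f] == v for f, v in selected.items())]

# Invented catalog records.
catalog = [
    {"title": "Sample record 1", "format": "Book", "language": "English"},
    {"title": "Sample record 2", "format": "Audiobook", "language": "English"},
    {"title": "Sample record 3", "format": "Book", "language": "Spanish"},
    {"title": "Sample record 4", "format": "Book", "language": "English"},
]

# What a sidebar might show before any choice is made.
print(facet_counts(catalog, ["format", "language"]))

# The user clicks "Book", then "English".
narrowed = apply_filters(catalog, {"format": "Book", "language": "English"})
print([item["title"] for item in narrowed])
```

Recomputing the counts on the narrowed list after each click is what gives the drill-down feel: the user always sees how many results each further choice would leave.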

Screenshots for Information Retrieval paper

McMaster University catalog, powered by Endeca.

Screenshots for Information Retrieval paper

Buzzillions website, also powered by Endeca. (via Peter Morville’s Flickr Stream)

Screenshots for Information Retrieval paper

Volkswagen UK site. Users can move the sliders to control what cars are shown. (via Peter Morville’s Flickr Stream)

Tomorrow I will explore search that analyzes a document’s contents to return results.
