Category Archives: Digital History

Digital Humanities Labs and Undergraduate Education

Over the past few months I was lucky enough to do research in Stanford’s Spatial History Lab. Founded three years ago through funding from the Andrew Mellon Foundation, the lab has grown into a multi-faceted space for conducting projects and initiatives dealing with spatial history. Having also worked in the lab as a graduate affiliate over the past nine months, I can attest to what a fantastic environment it provides: computers, a range of software, wonderful staff, and an overarching collaborative setting. There are currently six to eight ongoing projects at various stages in the lab, under the direction of faculty and advanced graduate students, focusing on areas ranging from Brazil to Chile to the American West. Over ten weeks this summer, eight undergraduate research assistants worked on these projects. I had the opportunity to work alongside them from start to finish, and came away fully convinced of the potential of this kind of lab setting for furthering undergraduate humanities education.

The eight students ranged from freshmen to recent graduates, and their majors spanned everything from history to environmental studies to computer science. Some entered the program with experience using ArcGIS software; others had none. Each of them worked under an existing project and was expected both to perform traditional RA duties for the project’s director and to develop a personal research agenda for the summer. Under this second track, they worked towards producing an online publication for the website based on their own original research. Guided by a carefully planned curriculum, each student selected a topic within the first few weeks, conducted research during the bulk of the summer, went through a draft phase followed by peer review, and rolled out a final publication and accompanying visualizations by the end of the ten weeks. Although not all of them reached the point of publication in that time, by the tenth week each of them had produced a coherent historical argument or theme (which is often more than I can say about my own work).

The results were quite impressive, especially given the short time frame. For instance, rising fourth-year Michael DeGroot documented and analyzed shifting national borders in Europe during World War II. His analysis included a dynamic visualization that lets the reader see major territorial changes between 1938 and 1945. DeGroot concludes that one major consequence of these shifts was the creation of broadly ethnically homogeneous states. In “Wildlife, Neoliberalism, and the Pursuit of Happiness,” Julio Mojica, a rising junior majoring in Anthropology and Science, Technology, and Society, analyzed late-twentieth-century survey data from the island of Chiloé to examine links between low civic participation and environmental degradation. Mojica concludes that reliance on the booming salmon industry resulted in greater tolerance for pollution, a pattern that manifested itself more strongly in urban areas. As a final example, senior history major Cameron Ormsby studied late-nineteenth-century land speculation in Fresno County and impressively waded into a historiographical debate over the issue. Rather than seeing speculators as necessary “middle-men” between small farmers and the state, Ormsby convincingly argues that they in fact handicapped the development of rural communities.

The success of the summer program speaks not only to the enthusiasm and quality of Stanford undergraduates, but more centrally to the direction of the lab and its overall working environment. By fostering an attitude of exploration, creativity, and collaboration, the lab not only encouraged but expected students to participate in projects as intellectual peers. The dynamic in the lab was not the traditional one of a faculty member dictating the agenda for the RAs. In many cases, the students had far greater technical skills and knew more about their specific subjects than the project instructor. The program was structured to give the students flexibility and freedom to develop their own ideas, which placed the onus on them to take a personal stake in the wider projects. In doing so, they were exposed to the joys, challenges, and nitty-gritty details of digital humanities research: false starts and dead ends were just as important as the pivotal, rewarding “aha!” moments that come with any project. When I think back on internships or research assistant positions, it’s difficult to imagine another undergraduate setting that would encourage this kind of wonderfully productive hand-dirtying process. And while I think digital humanities labs hold great potential for advancing humanities scholarship, I have grown more and more convinced that some of their greatest potential lies in the realm of pedagogy.

Topic Modeling Martha Ballard’s Diary

In A Midwife’s Tale, Laurel Ulrich describes the challenge of analyzing Martha Ballard’s exhaustive diary, which records daily entries over the course of 27 years: “The problem is not that the diary is trivial but that it introduces more stories than can be easily recovered and absorbed.” (25) This fundamental challenge is the one I’ve tried to tackle by analyzing Ballard’s diary using text mining. There are advantages and disadvantages to such an approach – computers are very good at counting every instance of the word “God,” but less effective at recognizing that “the Author of all my Mercies” should be counted as well. The question remains: how does a reader (computer or human) recognize and conceptualize the recurrent themes that run through nearly 10,000 entries?

One answer lies in topic modeling, a method of computational linguistics that attempts to find words that frequently appear together within a text and then group them into clusters. I was introduced to topic modeling through a separate collaborative project that I’ve been working on under the direction of Matthew Jockers (who also recently topic-modeled posts from Day in the Life of Digital Humanities 2010). Matt, ever generous and enthusiastic, helped me install MALLET (MAchine Learning for LanguagE Toolkit), developed by Andrew McCallum at UMass as “a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.” MALLET lets you feed in a series of text files, which it then processes to generate a user-specified number of word clusters that it treats as related topics. I don’t pretend to have a firm grasp of the inner statistical and computational plumbing of how MALLET produces these topics, but in the case of Martha Ballard’s diary, it worked. Beautifully.
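To try this on a diary, one rough approach is to save each entry as its own text file and point MALLET at the folder. The Python sketch below only writes the files and assembles the two MALLET command lines rather than running them. The folder, file names, and placeholder entries are hypothetical, as is the `bin/mallet` path; `import-dir`, `train-topics`, `--keep-sequence`, `--remove-stopwords`, `--num-topics`, `--num-top-words`, `--output-topic-keys`, and `--output-doc-topics` are standard MALLET options, but check them against the version you install.

```python
import os
import tempfile

def write_entries(entries, out_dir):
    """Write each (date, text) entry to its own .txt file; MALLET treats each file as one document."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for entry_date, text in entries:
        path = os.path.join(out_dir, f"{entry_date}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
        paths.append(path)
    return paths

def mallet_commands(input_dir, num_topics=30, num_words=20, mallet="bin/mallet"):
    """Build (but do not run) the MALLET import and training command lines."""
    import_cmd = [mallet, "import-dir", "--input", input_dir,
                  "--output", "diary.mallet", "--keep-sequence", "--remove-stopwords"]
    train_cmd = [mallet, "train-topics", "--input", "diary.mallet",
                 "--num-topics", str(num_topics),
                 "--num-top-words", str(num_words),
                 "--output-topic-keys", "topic_keys.txt",
                 "--output-doc-topics", "doc_topics.txt"]
    return import_cmd, train_cmd

# Placeholder entries (hypothetical); real input would be the full transcribed diary.
sample = [("1795-11-28", "entry text"), ("1795-11-29", "entry text")]
out_dir = tempfile.mkdtemp()
written = write_entries(sample, out_dir)
import_cmd, train_cmd = mallet_commands(out_dir)
print(" ".join(import_cmd))
print(" ".join(train_cmd))
```

Once MALLET is installed, each list could be passed to `subprocess.run`. One caveat: MALLET's built-in English stop list will not catch the diary's idiosyncratic spellings (ye, famely), so a custom stop list may be worth adding.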

With some tinkering, MALLET generated a list of thirty topics of twenty words each, which I then labeled with descriptive titles. Below is a quick sample of what the program “thinks” are some of the topics in the diary:

  • MIDWIFERY: birth deld safe morn receivd calld left cleverly pm labour fine reward arivd infant expected recd shee born patient
  • CHURCH: meeting attended afternoon reverend worship foren mr famely performd vers attend public supper st service lecture discoarst administred supt
  • DEATH: day yesterday informd morn years death ye hear expired expird weak dead las past heard days drowned departed evinn
  • GARDENING: gardin sett worked clear beens corn warm planted matters cucumbers gatherd potatoes plants ou sowd door squash wed seeds
  • SHOPPING: lb made brot bot tea butter sugar carried oz chees pork candles wheat store pr beef spirit churnd flower
  • ILLNESS: unwell mr sick gave dr rainy easier care head neighbor feet relief made throat poorly takeing medisin ts stomach

When I first ran the topic modeler, I was floored. A human being would intuitively lump words like attended, reverend, and worship together based on their meanings. But MALLET is completely unconcerned with the meaning of a word (which is fortunate, given the difficulty of teaching a computer that, in this text, discoarst actually means discoursed). Instead, the program is only concerned with how the words are used in the text, and specifically what words tend to be used similarly.

Besides a remarkably impressive ability to recognize cohesive topics, MALLET also allows us to track those topics across the text. With help from Matt and using the statistical package R, I generated a matrix with each row as a separate diary entry, each column as a separate topic, and each cell as a “score” signaling the relative presence of that topic. For instance, on November 28, 1795, Ballard attended the delivery of Timothy Page’s wife. Consequently, MALLET’s score for the MIDWIFERY topic jumps up significantly on that day. In essence, topic modeling accurately recognized, in a mere 55 words (many abbreviated into a jumbled shorthand), the dominant theme of that entry:

“Clear and pleasant. I am at mr Pages, had another fitt of ye Cramp, not So Severe as that ye night past. mrss Pages illness Came on at Evng and Shee was Deliverd at 11h of a Son which waid 12 lb. I tarried all night She was Some faint a little while after Delivery.”
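The entry-by-topic matrix described above was built in R; here is a rough Python equivalent of the lookup it enables, assuming the matrix has already been loaded with one row per entry and one column per topic. The topic subset, dates, and scores are invented for illustration and are not MALLET's actual output:

```python
# Toy document-topic matrix: one row per diary entry, one column per topic.
# These scores are invented for illustration; they are not MALLET output.
TOPICS = ["MIDWIFERY", "CHURCH", "GARDENING"]
DOC_TOPICS = {
    "1795-11-27": [0.10, 0.60, 0.30],
    "1795-11-28": [0.75, 0.05, 0.20],
}

def dominant_topic(row, names=TOPICS):
    """Return the name of the highest-scoring topic in one entry's row."""
    best = max(range(len(row)), key=lambda i: row[i])
    return names[best]

def score_change(topic, earlier, later, matrix=DOC_TOPICS, names=TOPICS):
    """How much a topic's score rises (positive) or falls (negative) between two entries."""
    col = names.index(topic)
    return matrix[later][col] - matrix[earlier][col]

for day, row in DOC_TOPICS.items():
    print(day, dominant_topic(row))
print(score_change("MIDWIFERY", "1795-11-27", "1795-11-28"))
```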

The power of topic modeling really emerges when we examine thematic trends across the entire diary. As a simple barometer of its effectiveness, I used one of the generated topics that I labeled COLD WEATHER, which included words such as cold, windy, chilly, snowy, and air. When its entry scores are aggregated into months of the year, it shows exactly what one would expect over the course of a typical year:

[Chart: Cold Weather]

This result made me a lot more confident in MALLET’s accuracy. From there, I looked at other topics; two of them seemed to deal largely with HOUSEWORK:

1. house work clear knit wk home wool removd washing kinds pickt helping banking chips taxes picking cleaning pikt pails

2. home clear washt baked cloaths helped washing wash girls pies cleand things room bak kitchen ironed apple seller scolt

When charted over the course of the diary, these two topics trace how frequently Ballard mentions these kinds of daily tasks:

[Chart: Housework]

Both topics moved in tandem, with a high correlation coefficient of 0.83, and both steadily increased as she grew older (excepting a curious divergence in the last several years of the diary). This is somewhat counter-intuitive, as one would think the household responsibilities for an aging grandmother with a large family would decrease over time. Yet this pattern bolsters the argument made by Ulrich in A Midwife’s Tale, in which she points out that the first half of the diary was “written when her family’s productive power was at its height.” (285) As her children married and moved into different households, and her own husband experienced mounting legal and financial troubles, her daily burdens around the house increased. Topic modeling allows us to quantify and visualize this pattern, a pattern not immediately visible to a human reader.
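A correlation coefficient like the 0.83 above is most likely the Pearson coefficient computed over the two topics' time series (an assumption; the measure isn't named). A minimal sketch of that calculation, using toy yearly values rather than the diary's:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy yearly averages for the two HOUSEWORK topics (invented, not the diary's values)
housework_1 = [0.02, 0.03, 0.05, 0.06]
housework_2 = [0.01, 0.03, 0.04, 0.06]
print(round(pearson(housework_1, housework_2), 2))
```

Python 3.10 and later also ship `statistics.correlation`, which computes the same coefficient.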

Even more significantly, topic modeling allows us a glimpse not only into Martha’s tangible world (such as weather or housework topics), but also into her abstract world. One topic in particular leaped out at me:

feel husband unwel warm feeble felt god great fatagud fatagued thro life time year dear rose famely bu good

The most descriptive label I could assign this topic would be EMOTION – a tricky and elusive concept for humans to analyze, much less computers. Yet MALLET did a largely impressive job of identifying when Ballard was discussing her emotional state. How does this topic appear over the course of the diary?

[Chart: Emotion]

As with the housework topics, there is a broad increase over time. In this chart, the sharp changes are quite revealing. In particular, we see Martha more than double her use of EMOTION words between 1803 and 1804. What exactly was going on in her life at this time? Quite a bit. Her husband was imprisoned for debt and her son was indicted by a grand jury for fraud, causing a cascade effect on Martha’s own life – all of which Ulrich describes as “the family tumults of 1804-1805.” (285) Little wonder that Ballard increasingly invoked “God” or felt “fatagued” during this period.

I am absolutely intrigued by the potential of topic modeling for historical source material. In many ways, Martha Ballard’s diary seems ideally suited for this kind of analysis: short, content-driven entries that usually touch on a limited number of topics appear to produce remarkably cohesive and accurate topics. In some cases (especially the EMOTION topic), MALLET did a better job of grouping words than a human reader would. But the biggest advantage lies in its ability to extract unseen patterns in word usage. For instance, I would not have thought that the words “informed” or “hear” would cluster so strongly into the DEATH topic. But they do, and they do so more strongly within that topic than the words dead, expired, or departed. This speaks volumes about the spread of information: in Martha Ballard’s diary, death is largely written about in the context of news disseminated through face-to-face interactions. When used in conjunction with traditional close reading of the diary and other forms of text mining (for instance, charting Ballard’s social network), topic modeling offers a new and valuable way of interpreting the source material.

I’ll end my post with a topic near and dear to Martha Ballard’s heart: her garden. To a greater degree than any other topic, GARDENING words boast incredible thematic cohesion (gardin sett worked clear beens corn warm planted matters cucumbers gatherd potatoes plants ou sowd door squash wed seeds) and over the course of the diary’s average year they also beautifully depict the fingerprint of Maine’s seasonal cycles:

[Chart: Gardening]
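Charts like COLD WEATHER and GARDENING come from collapsing per-entry topic scores into an “average year.” A rough sketch of that step, assuming scores are keyed by entry date and that averaging (rather than summing) is the aggregation used; the scores are toy values, not the diary's:

```python
from collections import defaultdict
from datetime import date

def monthly_profile(scores):
    """Average per-entry topic scores by calendar month, pooling every year of the diary."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, score in scores.items():
        totals[day.month] += score
        counts[day.month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}

# Toy GARDENING scores for a few entries (invented for illustration)
gardening = {
    date(1790, 1, 5): 0.01, date(1791, 1, 12): 0.03,
    date(1790, 6, 2): 0.40, date(1791, 6, 20): 0.30,
}
print(monthly_profile(gardening))
```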


Note: this post is part of an ongoing series detailing my work on text mining Martha Ballard’s diary.

Chasing the “Perfect Data” Dragon

Whenever I put on my proselytizing robes to explain the potential of digital humanities to a layperson, I usually fall back on the familiar data-deluge trope. “If you read a book a day for the rest of your life, it would take you 30-something lifetimes to read one million books. Google has already digitized several times that number.” And so on. The picture I end up painting is one in which the DH community is better positioned than traditional academics to access, manipulate, and draw meaning out of the growing mountains of digital data. Basically, now that all this information is digitized, we can feed the 1s and 0s into a machine and, presto, innovative scholarship.

Of course, my proselytizing is a bit disingenuous. The dirty little secret is that not all data is created equal. Especially on the humanist’s turf, digitized sources are rarely “machine-ready”. The more projects I work on, the more convinced I become that they share one real constant: I always spend far more time than I expect preparing, cleaning, and improving my data. Why? Because I can.

A crucial advantage of digital information is that it’s dynamic and malleable. You can clean up a book’s XML tags, tweak the coordinates of a georectified map, or expand the shorthand abbreviations in a digitized letter. This is all well and good, but it comes with a price tag. In a way that is fundamentally different from the analog world, perfection is theoretically attainable. And that’s where an addictive element creeps into the picture. When you can see mistakes and know you can fix them, the temptation to find and fix every single one is overwhelming.

In many respects, cleaning your data is absolutely crucial to good scholarship. The historian reading an 18th-century newspaper might know that “Gorge Washington” refers to the first president of the United States, but unless the spelling error gets fixed, that name probably won’t get identified correctly by a computer. Of course, it’s relatively easy to change “Gorge” to “George”, but what happens when you are working with 30,000 newspaper pages? Manually going through and fixing spelling mistakes (or, more likely, OCR mistakes) defeats the purpose and neuters the advantage of large-scale text mining. While there are ways to automate this kind of data cleaning, most methods are going to be surprisingly time-intensive. And once you start down the path of data cleaning, it can turn into whack-a-mole, with five “Thoms Jefferson”s poking their heads up out of the hole for every one “Gorge Washington” you fix.
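One common way to automate part of this cleanup is fuzzy matching against an authority list of known names. The sketch below uses Python's standard `difflib.get_close_matches`; the name list, the two-capitalized-words pattern, and the 0.85 similarity cutoff are all illustrative assumptions, and a real pass would still need review, since fuzzy matching can just as easily “fix” a name that was right:

```python
import difflib
import re

# Hypothetical authority list of names expected in the corpus.
KNOWN_NAMES = ["George Washington", "Thomas Jefferson", "John Adams"]
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")  # two capitalized words in a row

def correct_names(text, known=KNOWN_NAMES, cutoff=0.85):
    """Replace two-word capitalized phrases that closely resemble a known name."""
    def fix(match):
        candidate = match.group(0)
        best = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
        return best[0] if best else candidate
    return NAME_PATTERN.sub(fix, text)

print(correct_names("Gorge Washington wrote to Thoms Jefferson."))
```

Even this toy version shows the whack-a-mole problem: the pattern only catches two-word names, so a garbled middle initial or a three-word name would slip straight past it.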

Chasing the “perfect data” dragon becomes an addictive cycle, one fueled by equal parts optimism and fear. Having a set of flawlessly encoded Gothic novels could very well lead to the next big breakthrough in genre classification. On the other hand, what if all those missed “Gorge Washingtons” are the final puzzle pieces that will illuminate early popular conceptions of presidential power? The problem is compounded by the fact that, in many cases, the specific errors can be fixed. But in breathlessly attempting to meet the “data deluge” problem, we multiply the number and kinds of errors by several orders of magnitude across ever-larger bodies of material, which severely complicates the task of locating and rectifying all of them.

At some point, the digital material simply has to be “good enough”. But breaking out of the “perfect data” dragon-chasing is easier said than done. “How accurate does my dataset have to be in order to be statistically relevant?” “How do I even know how clean my data actually is?” “How many hours of my time is it worth to bump the data accuracy from 96% to 98%?” These are the kinds of questions that DH researchers suddenly struggle with, and a background in the humanities ill-prepares them to answer. As with so many aspects of this kind of work, there is a lot to learn from other disciplines.
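The second question, at least, has a standard answer borrowed from statistics: hand-check a random sample and put a confidence interval around the observed accuracy. A minimal sketch using the Wilson score interval (z = 1.96 for roughly 95% confidence); the sample size and counts in the example are hypothetical:

```python
import math
import random

def sample_for_review(tokens, k, seed=0):
    """Draw a reproducible random sample of k tokens to check by hand."""
    return random.Random(seed).sample(tokens, k)

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for the true accuracy, given `correct` out of `n` checked."""
    p = correct / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# Suppose 384 of 400 randomly sampled words turn out to be correct (96%).
low, high = wilson_interval(384, 400)
print(f"estimated accuracy between {low:.1%} and {high:.1%}")
```

With 400 checked words, an observed 96% is consistent with a true accuracy of roughly 93.6% to 97.5%, so confirming an improvement to 98% would mean checking a larger sample.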

Certain kinds of data quality issues get mitigated by a “safety in numbers” approach. Pinpointing the exact cross-streets of a rail depot is pretty important if you’re creating a map of a small city. But if you’re looking at all the rail depots in, say, the Midwest, the “good enough” margin of locational error gets substantially bigger. Over the course of thirty million words, the number of “George Washingtons” is going to far outweigh the number of “Gorge Washingtons”. With large-scale digital projects, it’s easier to see that chasing the “perfect data” dragon is both impossible and unnecessary. On the other hand, certain kinds of data quality problems get magnified at a larger scale. Small, scattered discrepancies get flattened out in bigger datasets, but foundational or commonly repeated errors get exaggerated, particularly if some errors have been fixed and others not. For instance, if you fixed every “Gorge Washington” but didn’t catch the more frequent misspelling “Thoms Jefferson”, comparing the textual appearances of the two presidents over those thirty million words is going to be heavily skewed in George’s direction.
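A toy calculation makes the skew concrete. The counts below are invented: both presidents truly appear 1,000 times, but OCR has garbled 50 Washingtons and 100 Jeffersons:

```python
# Hypothetical counts, invented for illustration.
TRUE_MENTIONS = {"Washington": 1000, "Jefferson": 1000}
OCR_MISSES = {"Washington": 50, "Jefferson": 100}  # mentions garbled beyond recognition

def findable(fixed):
    """Mentions a search will find, given which names' misspellings were corrected."""
    return {name: TRUE_MENTIONS[name] - (0 if name in fixed else OCR_MISSES[name])
            for name in TRUE_MENTIONS}

for label, fixed in [("no fixes", set()), ("only Washington fixed", {"Washington"})]:
    counts = findable(fixed)
    ratio = counts["Washington"] / counts["Jefferson"]
    print(f"{label}: {counts}, Washington/Jefferson = {ratio:.3f}")
```

Fixing nothing leaves Washington about 5.6% ahead (950 vs. 900); fixing only his misspellings widens the gap to about 11.1% (1,000 vs. 900), even though the true ratio is exactly 1.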

As non-humanities scholars have been demonstrating for years, these problems aren’t new and they aren’t unmanageable. But as digital humanists sort through larger and larger sets of data, it will become increasingly important to know when to ignore the dragon and when to give chase.

Valley of the Shadow and the Digital Database

Since its inception as a website in the early 1990s, the digital history project Valley of the Shadow has received awards from the American Historical Association, been profiled in Wired Magazine, and been termed a “milestone in American historiography” in Reviews in American History. The project is also widely regarded as one of the principal pioneers within the rough-and-tumble wilderness of early digital history.1 Conceived at the University of Virginia as the brainchild of Edward Ayers (historian of the American South and now president of the University of Richmond), the project examines two communities, one Northern and one Southern, in the Shenandoah Valley during the American Civil War. The initiative documented and digitized thousands upon thousands of primary source materials from Franklin County, Pennsylvania and Augusta County, Virginia, including letters, diaries, newspapers, speeches, census and government records, maps, images, and church records.

By any measure, Valley of the Shadow has been a phenomenal success. Over the course of a decade and a half, it has provided the catalyst for a host of books, essays, CD-ROMs, teaching aids, and articles – not to mention more than a few careers. At times it seems that everyone and their mother in the digital history world has some kind of connection to Valley of the Shadow. The impact the project has had, both within and outside of the academy, is a bit overwhelming. In this light, I decided to revisit Valley of the Shadow with a more critical lens and examine how it has held up over the years.

At the bottom of the Valley’s portal, it reads “Copyright 1993-2007.” There aren’t many academic sites that can claim that kind of longevity, but longevity also carries a price: the website already feels a bit dated. Its structure is linear, vertical, and tree-like. The parent portal opens up into a choice between three separate sections: The Eve of War (Fall 1859 – Spring 1861), The War Years (Spring 1861 – Spring 1865), and The Aftermath (Spring 1865 – Spring 1870). Each of these is divided into different repositories of source material, from church records to tax and census data to battle maps. Clicking on a repository leads to further links (for instance, two links leading to the two counties’ letters). A few more clicks can lead to, say, a letter from Benjamin Franklin Cochran to his mother in which he opens with the kind of delicious detail of lived experience that historians love: “I am now writing on a bucket turned wrong side up.”

In this sense, the database is geared towards a vertical experience, in which users “drill down” (largely through hyperlinks) to reach a fine-grained level of detail: Portal -> Time Period -> Source Material Type -> County -> Letter. What this approach lacks is the kind of flexible, horizontal navigation that has become a hallmark of today’s online user experience. If one wanted to jump from Cochran’s letter to see, for instance, battle maps of the skirmishes he was referencing, or whether local newspapers described any of the events he wrote about, the process is disjointed, requiring the user to “drill up” to the appropriate level and then “drill down” again to find battle maps or newspapers. This emphasis on verticality is largely due to the partitioned nature of the website, divided as it is into so many boxed categories. This makes finding a specific source a bit easier, but restricts a user’s ability to explore across the site’s different eras, geographies, and source types.

If different sections of the website are partitioned from one another, what options exist for opening the database itself beyond the website’s own walls? In October 2009, NiCHE held a conference on Application Programming Interfaces (APIs) for the Digital Humanities, outlining the problem it was tackling as follows:

To date, however, most of these resources have been developed with human-friendly web interfaces. This makes it easy for individual researchers to access material from one site at a time, while hindering the kind of machine-to-machine exchange that is required for record linkage across repositories, text and data mining initiatives, geospatial analysis, advanced visualization, or social computing.

This description highlights the major weakness of Valley of the Shadow: its (relative) lack of interactivity and interoperability. A human researcher can access specific information from the website, but it remains a major challenge to apply more advanced digital research techniques to that information. Every database is inherently incomplete, but one way to mitigate this problem is to open up its contents beyond the confines of the database itself. The following scenario might fall under the “pipe-dream” category, but it illustrates the potential of an online database: a researcher writes a script to pull out every letter in Valley of the Shadow written by John Taggart, searches both the Valley’s database and national census records to identify the letters’ recipients, captures each household’s location and income level, and uses that data to plot Taggart’s social world on a georeferenced historical map or in a social network visualization. Again, this might be a pipe-dream, but it does highlight the possibilities of opening up Valley of the Shadow’s phenomenally rich historical content into a more interactive and interoperable database.
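The first step of that scenario (pulling one writer's letters and tallying their recipients) is simple once the records are machine-readable. A sketch under loudly stated assumptions: the letter records, field names, and recipients below are hypothetical placeholders and do not reflect any actual Valley of the Shadow export, and the census linkage and mapping steps are left out entirely:

```python
from collections import Counter

# Hypothetical letter metadata; recipients are placeholders, not real correspondents.
LETTERS = [
    {"author": "John Taggart", "recipient": "Recipient A"},
    {"author": "John Taggart", "recipient": "Recipient B"},
    {"author": "John Taggart", "recipient": "Recipient A"},
    {"author": "Another Writer", "recipient": "Recipient A"},
]

def correspondence_edges(letters, author):
    """Weighted edges for one writer: (author, recipient) -> number of letters sent."""
    return Counter((letter["author"], letter["recipient"])
                   for letter in letters if letter["author"] == author)

edges = correspondence_edges(LETTERS, "John Taggart")
for (sender, recipient), weight in edges.most_common():
    print(f"{sender} -> {recipient}: {weight}")
```

A weighted edge list like this is the usual input for network-visualization tools; attaching a location and income level to each recipient is the hard, and still hypothetical, part.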

At the end of the day, Valley of the Shadow deserves every ounce of acclaim it has received. Beyond making a staggering array of primary sources available and accessible to researchers, educators, and students, it helped pave the way for the current generation of digital humanists. Valley of the Shadow embodies many of the tenets of this kind of scholarship: multi-modal, innovative, and, most importantly, collaborative. Its longevity and success speak to the potential of digital history projects, and it should continue to serve as a resource and model moving forward.


1 I, for one, imagine the early days of digital history to be a rough-and-tumble wilderness, resplendent with modem-wrangling Mosaic cowboys and Usenet bandits.

Text Analysis of Martha Ballard’s Diary (Part 3)

One of the most basic applications of text mining is simply counting words. I began by stripping out punctuation (to avoid treating mend and mend. as two separate words), converting every word to lowercase, and then ignoring a list of stop words (the, and, for, etc.). By writing a program to count occurrences of the 500 most common words, I could get a general (and more quantitative) sense of the topics Martha Ballard wrote about in her diary. Unsurprisingly, her vocabulary followed the steep, long-tailed curve typical of word frequencies: like most people, she used a relatively small number of words with extreme frequency. For example, the most common word (mr) occurred 10,050 times, while her 500th most common word (relief) occurred 67 times:

[Chart: Top 500 Words]
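A minimal sketch of those counting steps, using only Python's standard library. The stop list is a tiny illustrative stand-in, and the regular expression (which keeps only runs of letters, so it also drops digits) is one simple way to strip punctuation, not necessarily the one used here:

```python
import re
from collections import Counter

# A tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"the", "and", "for", "a", "of", "to", "at", "in", "i", "was"}

def top_words(text, n=500, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation, drop stop words, and return the n most common words."""
    words = re.findall(r"[a-z]+", text.lower())  # runs of letters only, so "mend." -> "mend"
    return Counter(w for w in words if w not in stop_words).most_common(n)

print(top_words("Mend. I mend the cloaths and mend stockins."))
```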

Because each word has information attached to it (specifically, the date it was written), we can look at long-term patterns in a particular word’s usage. However, looking only at raw word frequencies can be problematic. For example, if Ballard wrote the word yarn twice as often in 1801 as in 1791, it could mean that she was doing a lot more knitting in her old age. But it could also mean that she was simply writing more words in her diary overall. To address this issue, I normalized the frequency of any word I examined: first by dividing it by the total word count for that year, then by dividing the result by the word’s average usage over the entire diary. This allowed me to visualize how a word’s relative frequency changed from year to year.
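The two-step normalization can be written out as follows. The description leaves one detail open, so this sketch assumes “average usage” means the mean of the word's yearly relative frequencies; using the word's overall rate across the whole diary is an equally reasonable reading. The toy numbers mirror the yarn example: twice the raw count in a year with twice as many words.

```python
def normalize(word_counts, total_words):
    """Per-year relative frequency of a word, scaled so its diary-wide average is 1.0.

    word_counts: {year: occurrences of the word}
    total_words: {year: total words written that year}
    """
    relative = {year: word_counts.get(year, 0) / total_words[year] for year in total_words}
    average = sum(relative.values()) / len(relative)
    if average == 0:
        return {year: 0.0 for year in relative}  # the word never appears
    return {year: r / average for year, r in relative.items()}

yarn = normalize({1791: 10, 1801: 20}, {1791: 10_000, 1801: 20_000})
print(yarn)  # both years come out the same: the raw doubling reflects more writing overall
```

A value of 1.0 means “typical for this word”; 2.0 means twice its usual relative frequency.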

In order to visualize the information, I settled on trying out sparklines: “small, intense, simple datawords” advocated by infographics guru Edward Tufte and meant to give a quick, somewhat qualitative snapshot of information. To test my method, I used a theme that Laurel Ulrich describes in A Midwife’s Tale: land surveying. In particular, during the late 1790s Martha’s husband Ephraim became heavily involved in surveying property. In the raw word count list, both survey and surveying appear in the top 500 words, so I combined the two and looked at how Martha’s use of them in her diary changed over the years (1785-1812):

[Sparkline: survey(ing), 1785-1812]

Looking at the sparkline, we get a visual sense for when surveying played a larger role in Martha’s diary – around the middle third, or roughly 1795-1805, which corresponds relatively well to Ulrich’s description of Ephraim’s surveying adventures. As a basis for comparison, the word clear appeared with numbing regularity (almost always in reference to the weather):

[Sparkline: clear]

Using word frequencies and sparklines, I could investigate and visualize other themes in the diary as well.
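The sparklines in this post are images, but the idea is easy to approximate in plain text. A rough sketch, assuming yearly counts are stored as {year: count} dictionaries: variants like survey and surveying are combined by summing their raw counts (which would then be normalized as described earlier; the example skips that step for brevity), and the result is scaled onto eight Unicode block characters. The values are toy numbers:

```python
BARS = "▁▂▃▄▅▆▇█"

def combine(*yearly_counts):
    """Sum per-year counts for word variants (e.g. survey + surveying), ordered by year."""
    years = sorted(set().union(*yearly_counts))
    return [sum(counts.get(year, 0) for counts in yearly_counts) for year in years]

def sparkline(values):
    """Render numbers as a one-line text sparkline, scaled between their min and max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when every value is the same
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

# Toy yearly counts (invented, not the diary's values)
survey = {1794: 1, 1795: 4, 1796: 7}
surveying = {1794: 0, 1795: 1, 1797: 2}
print(sparkline(combine(survey, surveying)))
```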

Religion

Out of the 500 most frequent words in the diary, only three of them relate directly to religion: meeting (#28), worship (#143), and god (#220).

[Sparkline: meeting]

[Sparkline: worship]

[Sparkline: god]

Meeting, which Martha used largely in a religious context (going to a church meeting) but also in a socio-political one (attending town meetings), had a relatively consistent rate of use, although it trended slightly upwards over time. Worship (which Martha largely used in the sense of “went to publick worship”), meanwhile, was more erratic and trended slightly downwards. Finally, and perhaps most interesting, is Martha’s use of the word god. Almost non-existent in the first third of her diary, it then occurred much more frequently, but also more erratically, over the final two-thirds. Not only was it a relatively infrequent word overall (flax, horse, and apples occur more often), but its usage pattern suggests that Martha Ballard did not directly invoke a higher power on a personal level with any kind of regularity (at least in her diary). Instead, she was much more comfortable referring to the more social, community-based activity of attending a religious service. While a qualitative close reading of the text would give a richer impression of Martha’s spirituality, a quantitative approach demonstrates how little “real estate” she dedicates to religious themes in her diary.

Death

[Sparkline: death]

[Sparkline: dead]

[Sparkline: funeral]

[Sparkline: expired]

[Sparkline: interd]

Most of the words related to death show an erratic pattern. There are peaks and valleys across the years without much correlation between the different words, and the only word that appears with any kind of consistency is interd (interred). In this case, word frequency and sparklines are relatively weak analytical tools. They don’t speak to any kind of coherent pattern, and at most they vaguely point towards additional questions for study: what causes the various extreme peaks in usage? Is there a common context in which Martha uses each of the words? Why was interd so much flatter than the others?

Family

In this final section, I’ll offer a small taste of how analyzing word frequency can reveal interpersonal relationships, using the example of Dolly (Martha’s youngest daughter):

[Sparkline: dolly]

The sparkline does a phenomenal job of driving home a drastic change in how Martha refers to her daughter. In a matter of a year or two in the mid-1790s, she goes from writing about Dolly frequently to almost never mentioning her. Why? Some quick detective work (or reading page 145 of A Midwife’s Tale) shows that the plummet coincides almost perfectly with Dolly’s marriage to a man named Barnabas Lambart in 1795. But why on earth would Martha go from mentioning Dolly all the time to going entire years without writing her name? Did Martha disapprove of her daughter’s marriage? Was it a shotgun wedding?

The answer, while less scandalous, is nonetheless an interesting one, and text analysis and visualization help to elucidate it. In short, Martha still writes about her daughter after 1795, but instead of calling her Dolly, she begins to refer to her as Dagt Lambd (Daughter Lambert). This is a fascinating shift, and one whose full significance might get lost in a traditional reading. A human poring over these detailed entries might get a vague impression that Martha has started calling her daughter something different, but the sparkline above drives home just how abrupt and dramatic that transformation really was. Martha, by and large, stopped calling her youngest daughter by her first name and instead adopted her husband’s surname. Such a vivid symbolic shift opens a window onto an array of broader issues, including marriage patterns, familial relationships, and gender dynamics.
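Catching a shift like this depends on counting each alias separately rather than searching only for the first name. A minimal sketch, assuming a hand-built alias table (hypothetical here; the diary's actual spellings likely vary more) and placeholder entries rather than quotations from the diary:

```python
import re
from collections import Counter

# Hypothetical alias table mapping each spelling to the person it refers to.
ALIASES = {"dolly": "Dolly", "dagt lambd": "Dolly"}

def mentions_by_year(entries, aliases=ALIASES):
    """Count (year, person, alias) mentions, so a change in naming shows up as a shift between aliases."""
    counts = Counter()
    for year, text in entries:
        lowered = text.lower()
        for alias, person in aliases.items():
            hits = len(re.findall(r"\b" + re.escape(alias) + r"\b", lowered))
            if hits:
                counts[(year, person, alias)] += hits
    return counts

# Placeholder entries, not quotations from the diary
toy_entries = [(1794, "Dolly here."), (1794, "Dolly went home."), (1796, "Dagt Lambd here.")]
counts = mentions_by_year(toy_entries)
for key, n in sorted(counts.items()):
    print(key, n)
```

Summing over aliases gives the total attention paid to a person; comparing aliases year by year exposes the change in how she is named.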

Conclusions

Counting word frequency is a somewhat blunt instrument that, if used carefully, can certainly yield meaningful results. In particular, utilizing sparklines to visualize individual word frequencies offers up two advantages for historical inquiry:

  1. Coherently display general trends
  2. Reveal outliers and anomalies

First, sparklines are a great way to get a quick impression of how a word’s use changes over time. For example, we can see above that the frequency of the word expired steadily increases throughout the diary. While this can often simply reiterate suspected trends, it can ground these hunches in refreshingly hard data. By the end of the diary, a reader might have a general sense of how certain themes appear, but a text analysis can visualize meaningful patterns and augment a close reading of the text.

Second, sparklines can vividly reveal outliers. In the course of reading hundreds of thousands of words spread over nearly 10,000 entries, it’s quite easy to lose sight of the forest for the trees (to use a tired metaphor). Visualizing word frequencies allows historians to gain a broader perspective on a piece of the text, and sparklines also act as signposts pointing the viewer towards a specific area for further investigation (such as the red-flag-raising rupture in how frequently Dolly appears). Relatively basic word frequency by itself (such as what I’ve done here) does not necessarily explain anomalies, but it can do an impressive job of highlighting important ones.
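For a quick look without charting software, yearly counts can be rendered as a one-line text sparkline using Unicode block characters. This is a hedged sketch, not how the sparklines in this post were drawn. `sparkline` and `series` are hypothetical helpers, and the counts in the example are made up. Years with no matches are filled in with zeros, so each character always stands for one year and a sudden drop, like Dolly’s, shows up as a break in the line.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest

def sparkline(values):
    """Render a list of numbers as a one-line text sparkline."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series: draw every point at the lowest height.
        return BARS[0] * len(values)
    steps = len(BARS) - 1
    # Scale each value into 0..steps between the series min and max.
    return "".join(BARS[round((v - lo) / (hi - lo) * steps)] for v in values)

def series(counts, first_year=1785, last_year=1812):
    """Yearly counts as a list, using 0 for years with no matches."""
    return [counts.get(year, 0) for year in range(first_year, last_year + 1)]

# Made-up counts for illustration only.
print(sparkline(series({1785: 7, 1790: 14, 1795: 2}, 1785, 1796)))
```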

Text Analysis of Martha Ballard’s Diary (Part 2)

Given Martha Ballard’s profession as a midwife, it is no surprise that she carefully recorded the 814 births she attended between 1785 and 1812. She gave these events precedence over more mundane occurrences by noting them in a separate column from the main entry. Doing so allowed her not only to keep track of the births but also to record payments and restitution for her work. These hundreds of births constituted one of the bedrocks of Ballard’s experience as a skilled and prolific midwife, and this is reflected in her diary.

As births were such a consistent and methodically recorded theme in Ballard’s life, I decided to begin my programming with a basic examination of the deliveries she attended. This examination would take the form of counting the number of deliveries throughout the course of the diary and grouping them by various time-related characteristics, namely: year, month, and day of the week.

Process and Results

The first basic step for performing a more detailed text analysis of Martha Ballard’s diary was to begin cleaning up the data. One step was to take all the words and (temporarily) turn every uppercase letter into a lowercase letter. This kept Python from seeing “Birth” and “birth” as two separate words. For the purposes of this particular program, it was more important to distill words into a basic unit rather than maintain the complexity of capitalized characters.
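The cleanup step can be sketched in a few lines. This is a minimal illustration, assuming words are pulled out with a simple regular expression; the project’s exact tokenizer isn’t described here, so `tokens` is a hypothetical helper. Lowercasing before counting is what lets “Birth” and “birth” land in the same tally.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase `text` and return its runs of letters as words."""
    # Note: this simple pattern drops digits and splits on apostrophes.
    return re.findall(r"[a-z]+", text.lower())

counts = Counter(tokens("Birth of a Son. birth of Twins; BIRTH."))
print(counts["birth"])
```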

Once the data was scrubbed, we could turn to writing a program that would count the number of deliveries recorded in the diary. The program we wrote does the following:

  1. Checks to see if Ballard wrote anything in the “birth” column (the first column of the entries, which she also used to keep track of deliveries)
  2. If she did write anything in that column, checks whether it contains any of the words “birth”, “brt”, or “born”.
  3. Prints the remaining entries that contain text in the “birth” column but none of the above words. From this short list I manually added an additional seven entries to the program, in which she appeared to have attended a delivery but did not record it using the above words.
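The three steps above can be sketched as follows. This is an illustrative reconstruction, not the original program: it assumes the birth column is stored as a string at position 4 of each entry (per the layout described in Part 1), treats “contains” as a simple substring match, and represents the seven hand-checked entries as a hypothetical set of (year, month, day) tuples. All function names and sample entries are invented.

```python
BIRTH_KEYWORDS = ("birth", "brt", "born")

def has_keyword(text, keywords=BIRTH_KEYWORDS):
    """True if any keyword appears (as a substring) in the lowercased text."""
    lowered = text.lower()
    return any(word in lowered for word in keywords)

def is_delivery(entry, manual_dates=frozenset(), keywords=BIRTH_KEYWORDS):
    """Steps 1 and 2, plus the hand-added exceptions found in step 3."""
    (month, day, year), _weekday, _text, _summary, birth_col = entry
    if (year, month, day) in manual_dates:
        return True
    # Step 1: nothing written in the birth column means no delivery.
    if not birth_col.strip():
        return False
    # Step 2: look for one of the keywords.
    return has_keyword(birth_col, keywords)

def unmatched(entries, keywords=BIRTH_KEYWORDS):
    """Step 3: birth-column text without a keyword, listed for manual review."""
    return [e for e in entries if e[4].strip() and not has_keyword(e[4], keywords)]

# Invented entries for illustration.
entries = [
    [[1, 5, 1785], 4, "...", "", "Birth Mr Smiths son"],
    [[1, 6, 1785], 5, "...", "", ""],
    [[1, 7, 1785], 6, "...", "", "Mrs Jones delivd"],
]
for entry in unmatched(entries):
    print(entry[0], entry[4])
```

Keeping the keywords in a tuple also makes revision cheap: adding a newly discovered abbreviation is a one-word change followed by a re-run.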

Using these parameters, the program could iterate through the text and recognize the occurrence of a delivery. Now we could begin to organize these births.

First, we tallied the births for each year of the diary, then inserted the counts into a table and charted them in Excel:

[Chart: deliveries per year]
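A yearly tally like this, exported in a form a spreadsheet can open, might look like the sketch below. It assumes the year sits at `entry[0][2]` (the layout from Part 1) and takes the delivery test as a parameter; the stand-in predicate and sample entries are hypothetical, and `deliveries_by_year` and `to_csv` are invented names.

```python
import csv
import io
from collections import Counter

def deliveries_by_year(entries, is_delivery):
    """Tally deliveries per year; entry[0][2] is the year in the assumed layout."""
    return Counter(entry[0][2] for entry in entries if is_delivery(entry))

def to_csv(counts):
    """Year,Deliveries rows sorted by year, ready to open in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Year", "Deliveries"])
    for year in sorted(counts):
        writer.writerow([year, counts[year]])
    return buf.getvalue()

# Invented entries; the lambda is a stand-in for the real delivery test.
sample = [
    [[1, 5, 1785], 4, "...", "", "Birth"],
    [[2, 1, 1785], 3, "...", "", "Birth"],
    [[3, 1, 1786], 4, "...", "", "Birth"],
    [[3, 2, 1786], 5, "...", "", ""],
]
counts = deliveries_by_year(sample, lambda e: bool(e[4].strip()))
print(to_csv(counts))
```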

At the risk of turning my analysis into a John Henry-esque contest of woman vs. machine, I compared my figures to the chart in A Midwife’s Tale in which Laurel Ulrich tallied the births Ballard attended (on page 232 of the soft-cover edition). The two charts follow the same broad pattern:

[Chart: deliveries per year, program count compared with Ulrich’s count]

Note: I reverse-built her chart by creating a table from the printed chart, then making my own bar graph. Somewhere in the translation I seem to have misplaced one of the deliveries (Ulrich lists 814 total, whereas I keep counting 813 on her graph). Sorry!

However, a closer look reveals small discrepancies in the numbers for each individual year. I calculated each year’s deviation as the absolute difference between the two counts divided by Ulrich’s count, treating her numbers as the “true” figures (she is the acting President of the AHA, after all), and found that the average deviation for a given year was 4.86%. Apologies for the poor formatting; I had trouble inserting tables into WordPress:

Year | Deliveries (Ulrich, manual count) | Deliveries (computer program) | Difference | Deviation (from Ulrich)
1785 | 28 | 24 | 4 | 14.29%
1786 | 33 | 35 | 2 | 6.06%
1787 | 33 | 33 | 0 | 0.00%
1788 | 27 | 28 | 1 | 3.70%
1789 | 40 | 43 | 3 | 7.50%
1790 | 34 | 35 | 1 | 2.94%
1791 | 39 | 39 | 0 | 0.00%
1792 | 41 | 43 | 2 | 4.88%
1793 | 53 | 50 | 3 | 5.66%
1794 | 48 | 48 | 0 | 0.00%
1795 | 50 | 55 | 5 | 10.00%
1796 | 59 | 56 | 3 | 5.08%
1797 | 54 | 55 | 1 | 1.85%
1798 | 38 | 38 | 0 | 0.00%
1799 | 50 | 51 | 1 | 2.00%
1800 | 27 | 23 | 4 | 14.81%
1801 | 18 | 14 | 4 | 22.22%
1802 | 11 | 12 | 1 | 9.09%
1803 | 19 | 18 | 1 | 5.26%
1804 | 11 | 11 | 0 | 0.00%
1805 | 8 | 8 | 0 | 0.00%
1806 | 10 | 11 | 1 | 10.00%
1807 | 13 | 13 | 0 | 0.00%
1808 | 3 | 3 | 0 | 0.00%
1809 | 21 | 22 | 1 | 4.76%
1810 | 17 | 18 | 1 | 5.88%
1811 | 14 | 14 | 0 | 0.00%
1812 | 14 | 14 | 0 | 0.00%
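The deviation column and the 4.86% average can be reproduced with a few lines of Python, as a check on the arithmetic. The numbers are copied from the table; each year’s deviation is the absolute difference divided by Ulrich’s figure, and the average is a plain mean over the 28 years. `deviation` is a hypothetical helper name. Summing the columns also confirms the 813 I counted on Ulrich’s graph and shows the program’s total of 814.

```python
# (Ulrich's manual count, program count) for each year, copied from the table.
COUNTS = {
    1785: (28, 24), 1786: (33, 35), 1787: (33, 33), 1788: (27, 28),
    1789: (40, 43), 1790: (34, 35), 1791: (39, 39), 1792: (41, 43),
    1793: (53, 50), 1794: (48, 48), 1795: (50, 55), 1796: (59, 56),
    1797: (54, 55), 1798: (38, 38), 1799: (50, 51), 1800: (27, 23),
    1801: (18, 14), 1802: (11, 12), 1803: (19, 18), 1804: (11, 11),
    1805: (8, 8), 1806: (10, 11), 1807: (13, 13), 1808: (3, 3),
    1809: (21, 22), 1810: (17, 18), 1811: (14, 14), 1812: (14, 14),
}

def deviation(ulrich, program):
    """Absolute difference as a percentage of Ulrich's figure."""
    return abs(program - ulrich) / ulrich * 100

deviations = {year: deviation(u, p) for year, (u, p) in COUNTS.items()}
average = sum(deviations.values()) / len(deviations)
print(f"average deviation: {average:.2f}%")
print("totals:", sum(u for u, _ in COUNTS.values()), sum(p for _, p in COUNTS.values()))
```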

Keeping in mind that my birth counts differed slightly from Ulrich’s, I went on to break my figures down by other factors, including the frequency of deliveries by month over the course of the diary.

[Chart: deliveries by month]

If we take the results of this chart and assume a standard nine-month pregnancy, we can also roughly determine the months in which Ballard’s neighbors were most likely to be having sex. Unsurprisingly, the warmer period between May and August appears to be a particularly fertile time:

[Chart: estimated conceptions by month]
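Shifting each birth month back nine months is a small piece of modular arithmetic, so that a February birth maps to a May conception and an October birth wraps around to January. A minimal sketch, using the same rough nine-month assumption as above; `conception_month` is a hypothetical helper and the birth months are made up.

```python
from collections import Counter

def conception_month(birth_month, gestation_months=9):
    """Month (1-12) falling `gestation_months` before `birth_month`."""
    # Shift to 0-11, step back, wrap around the year, then shift back to 1-12.
    return (birth_month - 1 - gestation_months) % 12 + 1

# Made-up birth months (1 = January), for illustration only.
birth_months = [2, 2, 3, 11]
births = Counter(birth_months)
conceptions = Counter(conception_month(m) for m in birth_months)
print(sorted(conceptions.items()))
```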

Finally, I looked at how often births occurred on different days of the week. There wasn’t a strong pattern, beyond the fact that Sunday and Thursday seemed to be abnormally common days for deliveries. I’m not sure why that was the case, but would love to hear speculation from any readers.

[Chart: deliveries by day of the week]
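Because Ballard numbered the weekdays herself, with Sunday as 1 (see Part 1), the tally can use her numbers directly. The sketch below also converts a calendar date into her numbering with Python’s `datetime.date.isoweekday()` (Monday = 1 through Sunday = 7), which could serve as a cross-check for transcription slips in the weekday column. `ballard_weekday` and `deliveries_by_weekday` are hypothetical names, and the weekday numbers in the example are invented.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

def ballard_weekday(d):
    """Convert a date to Ballard's numbering (Sunday = 1 through Saturday = 7)."""
    # isoweekday() gives Monday = 1 through Sunday = 7, so Sunday (7) wraps to 1.
    return d.isoweekday() % 7 + 1

def deliveries_by_weekday(weekday_numbers):
    """Tally deliveries by weekday name, given Ballard's 1-7 numbers."""
    return Counter(WEEKDAYS[n - 1] for n in weekday_numbers)

# Invented weekday numbers for a handful of deliveries.
print(deliveries_by_weekday([1, 1, 5, 3]))
print(WEEKDAYS[ballard_weekday(date(1785, 1, 1)) - 1])
```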

Analysis

The discrepancies between the program’s tally of deliveries and Ulrich’s delivery count speak to broader issues in “digital” text mining versus “manual” text mining:

Data Quality

Ulrich’s analysis is the result of countless hours spent eye-to-page with the original text. And as every history teacher drills into their students, looking directly at the primary documents during research minimizes the layers of interpretation that can distort them. In comparison, my analysis is the result of the original text going through several levels of transformation, like a game of telephone:

Original text -> Typed transcription -> HTML tables -> Python list -> Text file -> Excel table/chart

Each level increases the chance of a mistake. For instance, a quick manual check of the online version of the diary for 1785 turns up a delivery (marked by ‘Birth’) that appears in the online HTML but not in the “raw” HTML files our program is processing and analyzing.

On the other hand, a machine doesn’t get tired and miscount a word tally or accidentally skip an entry.

Context

Ulrich brings to bear on her textual analysis years of historical training and experience along with a deeply intimate understanding of Ballard’s diary. This allows her to take into account one of the most important aspects of reading a document: context. Meanwhile, our program’s ability to understand context is limited quite specifically to the criteria we use to build it. If Ballard attended a delivery but did not mark it in the standard “birth” column like the others, she might mention it more subtly in the main body of the entry. Whereas Ulrich could recognize this and count it as a delivery, our program cannot (at least with the current criteria).

The “traditional” skills of a historian come into play with data mining in defining these criteria. Using her deep understanding of the text, Ulrich could create far, far better criteria than I could for counting the number of deliveries Martha Ballard attended. The trick lies in translating a historian’s instinctual eye into a carefully spelled-out list of criteria for the program.

Revision

One area where digital text mining has a clear advantage is revision. Hypothetically, if I realized at a later point that Ballard was also tallying births using another method (maybe a different abbreviated word), it would be fairly simple to add this to the program’s criteria, hit the “Run” button, and immediately see the updated figures for the number of deliveries. In contrast, it would be much, much more difficult to do so manually, especially if the realization came at, say, entry number 7,819. The prospect of re-skimming thousands of entries to update your totals would be fairly daunting.

Text Analysis of Martha Ballard’s Diary (Part 1)

“mr Ballard left home bound for Oxford. I had been Sick with the Collic. mrs Savage went home. mrs foster Came at Evening. it snowd a little.”

This is the first entry in the diary of Martha Ballard. Martha Ballard was a rural Maine midwife who kept an extensive diary between 1785 and 1812 and whose life was immortalized in 1990 by the historian Laurel Thatcher Ulrich’s award-winning A Midwife’s Tale. Over the course of nearly three decades, Ballard kept a meticulous, near-daily accounting of her life spanning nearly 10,000 entries.

When reading A Midwife’s Tale, I was struck by how readily the text would seem to lend itself to digital analysis. In an interview, Ulrich noted, “The very thing that had attracted me to the diary in the first place was also the thing that made it difficult to work with. I mean there’s just so much.” To ground herself, she began by simply counting things: “And I would go day by day for every other year of the diary, and I would tick off what was in each entry: baking or brewing, spinning or washing, or trading, sewing, mending, deliveries, general medical accounts, going to church, visitors, people coming for meals, etc.” Because of the sprawling scope, she took this quantitative approach only for the even-numbered years in the diary. The fact that she was working in the late eighties without a computer makes her work even more impressive.

After poking around online I came across DoHistory.org, a website developed and maintained by the Film Study Center at Harvard University and hosted by (who else, really) George Mason’s CHNM. The website presents the diary to the public in two formats: the viewer can either browse through photographed pages of the diary or read the transcript of the pages (transcribed through a monumental effort by Robert R. McCausland and Cynthia MacAlman McCausland):

[Images: first page of the diary, photograph and transcription]

When I realized the entire diary was online, it got me thinking about possibilities for text mining. As an aspiring digital humanist with few “hard” skills beyond basic GIS, I had been meaning to learn how to program for quite some time. In Martha Ballard’s diary, I had an intriguing source of data with which to learn how to do so. Now I just had to learn how to program. With the patient help of several programming-savvy family members, I gradually learned the basics of Python and how to apply it to Martha Ballard’s diary. What follows are the first steps we took to process the diary’s raw data into an accessible digital format.

Process

At first, I briefly considered learning how to scrape the text of the diary off the website. After some investigation, I decided that was a little beyond my abilities, so I copped out and took the much easier route of emailing Kelly Schrum at CHNM. She kindly forwarded my request to Ammon Shepherd, who emailed me a zip file containing 1,431 html documents, one for each page of the diary. Each html file of the transcribed diary is a basic three-column table. My first step was to find a way to strip out the html tags and organize the text into a systematic database of individual entries. Fortunately, Ballard’s meticulousness and consistency lent themselves well to such an approach.

The diary’s format translates quite nicely into a list of lists: the “main” diary is a list of all the entries, and each entry is a list in and of itself. The first program we wrote opened each html file and began extracting the different sections of text (which were conveniently marked by html tags). Iterating through each entry allowed us to separate the different columns in her diary into different items in the list. Here is the breakdown of our “list of lists”:

  1. Diary
    1. Entry
      1. Date
        1. Month
        2. Day
        3. Year
      2. Day of the Week
      3. Main Text of Entry
      4. Day Summaries (Column 3 of actual diary entry)
      5. Birth(s) (Recorded in Column 1 of actual diary entry)
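The extraction step can be sketched with the standard library’s `html.parser` module. The markup here is an assumption (a plain `<table>` of `<tr>` rows and `<td>` cells); the real files’ tags aren’t reproduced in this post, so mapping each row’s cells onto the date, weekday, text, summary, and birth fields of the breakdown above would have to follow whatever those files actually contain. `TableRowParser` is a hypothetical name.

```python
from html.parser import HTMLParser

class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join fragments (split by tags like <br>) and collapse whitespace.
            self._row.append(" ".join(" ".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

parser = TableRowParser()
parser.feed("<table><tr><td>Birth</td><td>Jan 1<br>text here</td><td></td></tr></table>")
parser.close()
print(parser.rows)
```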

In creating the list, we had to separate out the raw data from the html tags that formatted it. Fortunately, the folks who built the html files originally used an extremely systematic formatting process that actually made the job of distilling one from the other quite straightforward. Python’s pickle module allowed us to export the list of entries as a single manageable file that we could then easily import into future programs to manipulate.
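Saving and reloading the list with `pickle` looks roughly like this. The file name `diary.pkl` is a placeholder, `save_diary` and `load_diary` are hypothetical helpers, and the one-entry diary is a stand-in. A caution worth keeping in mind: only unpickle files you created or trust, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

def save_diary(diary, path):
    """Write the whole list of entries to a single binary file."""
    with open(path, "wb") as f:
        pickle.dump(diary, f)

def load_diary(path):
    """Read the list back in a later program (only load trusted files)."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Stand-in diary with one placeholder entry.
diary = [[[1, 1, 1785], 7, "placeholder text", "", ""]]
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "diary.pkl")  # hypothetical file name
    save_diary(diary, path)
    restored = load_diary(path)
print(restored == diary)
```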

For example, the third entry in the diary would translate into something like this:

  1. Diary
    1. Entry (3)
      1. Date
        1. 1 (January)
        2. 3
        3. 1785
      2. 3 (Tuesday – Ballard numbered the weekdays, beginning with Sunday as 1)
      3. “Tuesday. mrs. Foster went home. I had threats of thee Collic; by takein peper found releif.”
      4. Empty
      5. Empty

The list allows us to access pieces of information by “calling” their position. It helped me to think of the entire diary list as a warehouse containing almost 10,000 boxes (entries), with each box containing five compartments, and the first of those compartments divided into three sub-compartments. If you were to open any of the boxes (entries) and look inside the first compartment, then inside its first sub-compartment, you would always find a number that represented the month of that particular entry. If you were to look inside the third compartment of the entry/box, you would always find the main text for that day’s entry.
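In Python, the warehouse analogy maps onto nested indexing. One detail that trips people up: Python counts positions from 0, so the “first compartment” is index 0 and the “third compartment” is index 2. The constant names below are hypothetical labels for the positions in the breakdown above, and empty columns are assumed to be stored as empty strings.

```python
# Hypothetical names for each position, following the breakdown above.
DATE, WEEKDAY, TEXT, SUMMARY, BIRTH = range(5)
MONTH, DAY, YEAR = range(3)

entry = [
    [1, 3, 1785],  # Date: month, day, year
    3,             # Day of the week (Sunday = 1)
    "Tuesday. mrs. Foster went home. I had threats of thee Collic; by takein peper found releif.",
    "",            # Day summaries (empty)
    "",            # Birth(s) (empty)
]
diary = [entry]

month = entry[DATE][MONTH]  # first compartment, first sub-compartment
text = entry[TEXT]          # third compartment
# The same position can be compared across every entry in the diary.
years = [e[DATE][YEAR] for e in diary]
print(month, years, text[:8])
```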

The advantage of setting up the data in a list structure is the ability to access these specific pieces of information easily and to compare them across entries. In many ways, processing the text to make it readable and programmable is one of the biggest challenges of text mining. Deciding on the most logical way to organize and break down over 1,400 files lays the groundwork for the fun part: writing programs to actually analyze the diary of Martha Ballard.

***Special-edition sneak preview of future posts in this series***

A simple counting program reveals that the main text of Martha Ballard’s diary alone contains 377,315 words, spanning I-couldn’t-make-this-number-up 9,999 entries. That is a lot of data to play with.
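A word count like this can be as short as the sketch below. It assumes the main text sits at position 2 of each entry and defines a “word” as any whitespace-separated chunk; a different definition (for example, dropping punctuation-only tokens) would give a somewhat different total. `count_words` is a hypothetical name, and the entries are invented.

```python
def count_words(diary, text_index=2):
    """Total whitespace-separated words in the main text of every entry."""
    return sum(len(entry[text_index].split()) for entry in diary)

# Invented entries for illustration.
diary = [
    [[1, 1, 1785], 7, "Clear and cold.", "", ""],
    [[1, 2, 1785], 1, "A snowy day here.", "", ""],
]
print(count_words(diary), "words in", len(diary), "entries")
```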

Separating from the Pack

Almost two years ago, I made the decision to go to graduate school. At the time I was basking in what could only be called a history nerd’s dream summer break, spending my workdays as an intern at the New-York Historical Society’s public programs department doing background research on professors and authors we could invite to give talks. While learning about traditional history occupied my days, learning about digital history began to occupy my evenings (at least when I wasn’t occupied with being a 21-year-old enjoying NYC). I had used GIS extensively the previous summer through a research project and had caught passing glimpses of the broader digital history universe, but I hadn’t fully explored it as a possibility for future study or (gasp) a future career. By the end of the summer, I had realized with clear-eyed clarity that digital history is what I wanted to “do” – in the airplane conversation sense of, “So, what do you do?”

By the time I began the actual application process, things had become even clearer, but certainly not easier. I loved history – I loved reading it, researching it, writing it, speaking it, teaching it. The idea that I could potentially spend my life doing these things made me embarrassingly giddy. At the same time, I was endlessly fascinated by the potential that lay in digital scholarship as an exciting frontier with seemingly limitless possibilities. When I sat down to my computer to start looking at schools, I began to feel the intense tug-of-war between these two impulses that would become a constant throughout the next nine months.

On the one side of the spectrum lay the traditional academic program, ivy-wrapped in prestige and brimming with names that fly off the jackets of some of my favorite books. On the other side lay the digital history program, sleekly packaged in technology and humming with voices that build the blogs and websites I trawl. On the one side, my college professors and the academic job market were counseling me to apply to the very best schools I could. On the other side, my own geeky impulses were urging me to take a chance and apply somewhere new and different and exciting. I did my best to split the difference, and in the end, I was lucky enough to be accepted to a school with a fantastic combination of these two sides.

My experience led me to the conclusion that just as digital methodology is shifting the scholarly landscape of historical study, it is also altering the competitive landscape of the field. Schools are rapidly carving out digital niches for themselves, and this will prove increasingly attractive to successive waves of graduate applicants and job candidates. Most of these individuals will be reliant on accessing databases and articles online, many will be familiar with new forms of media and technology, and some will be interested in areas of visual design, data mining, or spatial analysis. However, if any of them were to ask their advisor or mentor for suggestions on programs that are strong in digital history, they’d likely hear a one-name (if any) reply: George Mason. Most advisors wouldn’t be able to point towards Nebraska-Lincoln or the University of Virginia as digital history strongholds the way they would point towards Duke and North Carolina for their strength in African-American history.

This will change. At first glance, the general structure will remain the same. “Top-tier” schools aren’t likely to start hemorrhaging applicants to less-established programs immediately. Innovative schools like UNL will continue to fight the persistent prestige-and-name-recognition battle. Nonetheless, subtler transformations will occur. Even five years ago, it would have been inconceivable that a school like George Mason (whose doctoral program has only been in existence for eight years) would be able to compete on an even footing for history applicants with a school like Stanford. Now, their success in establishing themselves as the dominant industry leader gives them an unparalleled advantage for anyone interested in digital history.

Even for schools on traditionally similar footing, an established track record of integrating new media can easily tip the scales in their favor. Much like special collections and on-campus archives, showing off a sparkling digital infrastructure will emerge as a “sexy” way to pull in both applicants and candidates. Wealthier schools may begin to invest in humanities computing centers and new kinds of software, even if for the simple goal of keeping pace with their competitors. The day is not far off when the mainstream academic history-verse buzzes with the news that an ivy-league school has “poached” a leading history professor in media and technology. Grant proposals for more applicable digital initiatives will cut out bigger slices from an expanding NEH pie.

The schools that will truly separate themselves from the pack, however, will be the ones that demonstrate their support for digital scholarship on an ideological level. Those programs that establish a sustained commitment to encourage, guide, and reward the members of their department (both faculty and students) for digital methodological inquiry will be the ones that will emerge in the best position to attract and train historians eager to tackle the technological opportunities inherent in today’s world.

The Mobile Historian

The rocketing ascent of mobile technology was one of the fundamental shifts of 2008, and many market analysts predict it will only continue throughout 2009. Its rise seems to be following a two-tracked progression: individuals in developing countries are latching onto increasingly affordable mobile phones as a way to log in to a wider network, while wealthier consumers fascinated by the ability to take their online experience on-the-go are snatching up smartphones at a shocking rate (to the point where the smartphone industry appears to be recession resistant). This environment creates an intriguing medium for historians to refine and improve their craft, and the time is ripe for innovation.

Some historians have been leading the charge in utilizing this technology. Bill Turkel has been a pioneer in applying new methods in place-based computing to the field of history. Meanwhile, the majority of similar efforts fall under the sphere of public history. Some museums have long been experimenting with “electronic curators,” or hand-held audio devices that emit information about an aspect of the exhibit depending on where its carrier is standing. Cultural heritage sites, particularly battlefields and/or national parks, have quickly recognized the potential for GPS-enabled devices that guide visitors through a site. Finally, some history educators are experimenting with ways to engage their students using portable technology, including fieldwork and site visits.

Dave Lester, of George Mason University’s CHNM, presented “Mobile Historical Landscapes: Exposing and Crowdsourcing Historical Landmarks” in early April at the American Association for History and Computing conference. Dave is currently working on a project called HistoryPlot to encourage user participation in exploring and contributing to a knowledge bank of historical places. The idea is that roving bands of history enthusiasts could visit sites, pull out their iPhone, learn about some of its history, and possibly add both information and multimedia to the site by snapping pictures and/or uploading content – creating a kind of Yelp for the historically-minded. Dave’s project draws upon two specific advantages: 1) the participatory culture of crowdsourcing, and 2) the increasing ubiquity of mobile technology.

Dan Cohen recently explored the advantage of crowdsourcing when he posted a historical puzzle on his blog at the start of a presentation, which asked people to identify the following picture using minimal clues:

He simultaneously sent out the puzzle via Twitter by asking his 1,600 followers to try to solve it in the next hour. The speed with which Dan got answers was impressive, with an initial correct answer coming in 9 minutes. Although he admits he should have made the puzzle a bit more difficult, the process was successful in highlighting the immense advantages of crowdsourcing historical problems using a fluid and mobile platform such as Twitter.

The growth of a mobile culture in which users are constantly connected magnifies the power of crowdsourcing. Dan’s experiment rested on the assumption that a certain number of his followers would be online and checking their tweets, and enough of them would then be able to use the internet to access his blog, read the clue, and search for the answer online. Two or three years ago, the chances of receiving an answer in 9 minutes would have been much, much slimmer. A mobile culture removes barriers to accessing information and simultaneously raises expectations among users, many of whom no longer tolerate being shackled by outlets, ethernet cords, or wireless signals.

Consequently, mobile technology is redefining our social conception of space and place, and this has corresponding ramifications for historians. It revisits the fundamental relationship between a physical location and what happened in the past within that space, a relationship with which spatial and geographic historians continuously grapple. This shift is opening up a two-way street for historical researchers. On the one hand, a mobile culture allows efforts such as Dave Lester’s to shed light on previously inaccessible areas. Suddenly, a historian researching a far-away site might be able to “travel” there by looking at uploaded pictures and documents, trading emails or tweets with other researchers who have visited the place, or watching the video of a history enthusiast on vacation at the site.

On the other hand, those shifting expectations that accompany a mobile culture can also turn themselves on historical researchers. A mobile society might question the reliability of a solitary historian writing abstractly about a place they have never actually been to. A constantly connected audience will start to expect the kind of intimate access and exploration that can only be gained from hands-on visitation. A readership conditioned to read reviews on Amazon or tourists’ travel blogs will increasingly dismiss the authority of a specialist who has never visited a location they describe, even if they are describing its past. Audiences will continue to tolerate a historian’s inability to time-travel; they will not continue to tolerate an inability to place-travel.

Fortunately, mobile technology can also create a mobile historian. Imagine a historian writing about shifting gender roles on the Oklahoma Chickasaw reservation during the Dust Bowl. Armed with a laptop, digital camera, and smartphone, the historian can travel to Oklahoma and go to the reservation itself. Once there, traditional archival research is greatly enhanced by technology. Instead of lugging around 3×5 index cards, Zotero can speed up and digitize the note-taking process. The digital camera can capture documents for later perusal, allowing them to find more sources in a shorter amount of time. Is the researcher suddenly curious about gender demographics for a particular town near the reservation, or eager to understand the background to a religious ceremony referenced in a court record? They can use their smartphone to look up census data or send out queries to colleagues and likely receive a rapid answer to their question.

Leaving the archives, the historian can dip into oral history by interviewing locals and recording their memories on the smartphone or digital recorder. The smartphone’s GPS capabilities allow him or her to not only locate the homes of the interviewees, but to flag and mark locations to look for spatial patterns at a later date – what if all the traditional “male” venues on a reservation were located on a specific street, while “female” venues were spread over a greater area? The GPS ability of a smartphone can capture these on-the-ground patterns. Finally, the mobile historian can quickly send out updates on their progress, receiving feedback and suggestions from a remote crowd of like-minded researchers, students, assistants, or colleagues.

Mobile technology (like all technology) is not a magic pill that will suddenly transform the historical profession. There are certainly drawbacks. First and foremost, there is a strong economic barrier to entry. Already struggling for travel stipends and fellowship money, many historians won’t be able to afford a brand-new iPhone or high-quality digital camera. Those who aren’t already comfortable with mobile technology will often feel overwhelmed or at an unfair disadvantage. On a more abstract level, technology and its inherent distractions can sometimes construct blinders to one of the most important advantages to visiting a place in person: the ability to feel the sense of place, to listen to the wind and hear the accents and taste the food, a decidedly fuzzy process that adds crucial depth and richness to the historian’s understanding of their subject.

As technology itself becomes more refined and more sophisticated, the possibilities for innovation and exploration will continue to expand. As with any new methodology, the traditional skills and strengths of a historian will not fade into obsolescence. Instead, they’ll be ever more critical to the process of responsibly incorporating new techniques and approaches into the broader historical fold. If this process is even moderately successful, the future of the mobile historian appears bright.

AAHC Recap (Afternoon)

The first session of the afternoon at the AAHC conference was given by Patrick Murray-John, who discussed “Interlinking Who, What, Where, When, Why, and How We Teach into a Giant EduGraph.” He gave an entertaining presentation that described the benefits and challenges of a semantic web within education. One story stood out in particular as an example of the need for better efforts in the field. Two courses might use Frankenstein as an assigned text: a class in the chemistry department that examined the portrayal of scientists in literature and an introductory English literature course. Unfortunately, the only party that knows both classes are using the same text is the school bookstore. If the course syllabi were located within a strong semantic web, there could be tremendous potential for interdisciplinary discovery and interaction within a university.

The second session of the afternoon was “Mapping Our Archives,” by Tim Sherratt of the National Archives of Australia. Sherratt gave an incredible presentation, as he demonstrated his work with the records of Australian servicemembers during the First World War in Mapping Our Anzacs. First up, he used Google Earth to show how he mapped where each of the 300,000-plus soldiers was from. It was remarkably powerful, as he used the platform dynamically in order to visualize the origins and density of this segment of the population. In particular, I was struck by how effectively it took advantage of the interface of Google Earth and its re-population of point data as you zoom in, so even as you get more and more detailed, it seems like the dots just keep multiplying. The map was one point of entry for users of the database, with another critical point of entry being the opportunity for users to upload multimedia to a “scrapbook” of individual servicemembers’ profiles. Although some curators expressed (standard) anxiety about losing credibility and content control, they were overwhelmed by the number and quality of contributions from “laypeople,” such as this one of Donald Addison:

[Image: scrapbook contribution for Donald Addison]

Tim also shared his crafting and tweaking of the database for the servicemembers’ records. Each record was scanned, and many of them included photographic sets of the individuals along with vitals and descriptions of their service. He then used CoolIris (which I discussed in one of my first blog posts) to create a virtual gallery of these archival documents, as you could move along a 3-D wall with rather haunting profile photographs and corresponding information (almost like an early 20th-century version of Facebook). He also showed a similar application of CoolIris for other archival records, including those of immigrants and laborers. I’ve always been somewhat skeptical of CoolIris as a visualization technique, thinking it did little more than offer a cool way to zoom around and look at pictures. But Tim substantially changed my mind by demonstrating its power as an interface – without a doubt, being able to see an array of documents displayed in front of you, seeing the faces of Chinese immigrant children looking out at you, was a singularly powerful use of archival material that significantly enhanced the viewer’s interaction with them.

Tim’s presentation got rave reviews from its audience members, and I think I speak for all of them when I say that his use of dynamic technical interfaces (which included not only Google Earth and CoolIris, but also Tumblr and Greasemonkey scripts) was one of the most elegant, powerful, and accessible examples of bringing archival material to life that I’ve seen yet. Tim offered a fantastic glimpse into what is possible when you combine the power of digital applications with the subtlety and thoughtfulness of historical exploration.

The final session of the afternoon was a presentation by Susan Garfinkel, Judy Graves, and Jurretta Heckscher of the Library of Congress. Although I was a little fried from seven hours of absorbing and processing a host of new ideas, I always love supporting our friends at the LOC, who have put a great deal of time, thought, and effort into advancing and experimenting with digital history. Their presentations raised several interesting points. First, they stressed that a series of surveys has shown that faculty members are less than enthusiastic about actually using the digitization tools and resources their libraries provide. Second, the LOC has decided to expand its earlier initiative of posting photographic content on Flickr Commons by adding audio and video content to iTunes and YouTube. The presentation reinforced my optimism that our national library has the courage to pursue such a variety of digital initiatives.

I drove away from George Mason’s campus feeling exhausted and inspired. The conference did a superb job of bringing together and energizing a community of digital humanists (a sense of community heightened by the constant Twitter activity throughout the day). This community isn’t necessarily representative of either the general public or even the wider historical community; in particular, I was struck by the rather overwhelming whiteness of the conference participants. I think more could be done to encourage a wider diversity of perspectives and backgrounds within the field of digital history (a subject for another blog post). Nevertheless, the range, depth, and passion with which so many people are pursuing their projects give me a strong sense of hope. I know that over the next several years I will struggle to balance the demands of training in a traditional graduate history program with my own passion for digital methodology and exploration. But events such as the AAHC conference bolster my faith in the incredible support system of fellow digital humanists that I can lean on as I walk down that path.