Digital Humanities Centers as Cyberinfrastructure

John Unsworth

Digital Humanities Centers Summit
National Endowment for the Humanities
Washington, DC
Thursday, April 12, 2007

The ACLS report on Cyberinfrastructure for Humanities and Social Sciences (available online at http://www.acls.org/cyberinfrastructure/index.htm) was a response to what is now called the Atkins report, after Dan Atkins, who chaired the NSF-appointed blue-ribbon panel on Cyberinfrastructure that produced it: he also served as an advisor to the ACLS Commission. I want to thank Dan for his leadership in this topic, and especially for the ecumenical breadth of his thinking on the subject, not only in the original NSF report, but also in his more recent position as director of the NSF division of cyberinfrastructure: that kind of openness is extremely important as the humanities and the social sciences work out their relationship to science, engineering, computer science, and commercial interests in an emergent and rapidly changing environment.

The ACLS report noted, in its introduction, that the original NSF document described cyberinfrastructure as consisting of

grids of computational centers;
comprehensive libraries of digital objects;
well-curated collections of scientific data;
online instruments and vast sensor arrays;
convenient software toolkits.

The ACLS report went on to note that

Humanities scholars and social scientists will require similar facilities but, obviously, not exactly the same ones: "grids of computational centers" are needed in the humanities and social sciences, but they will have to be staffed with different kinds of subject-area experts; comprehensive and well-curated libraries of digital objects will certainly be needed, but the objects themselves will be different from those used in the sciences; software toolkits for projects involving data-mining and data-visualization could be shared across the sciences, humanities, and social sciences, but only up to the point where the nature of the data begins to shape the nature of the tools. Science and engineering have made great strides in using information technology to understand and shape the world around us. This report is focused on how these same technologies could help advance the study and interpretation of the vastly more messy and idiosyncratic realm of human experience.

To that end, the ACLS report had eight recommendations: I'd like to look at each of those recommendations with an eye to the critical contributions that digital humanities centers can make in these areas, in order to ensure that the goals outlined in that report are realized. What I'll be arguing here is that digital humanities centers are cyberinfrastructure for humanities and social sciences--not the only kind, but one of the most important kinds, especially given where those disciplines are and where they need to go.

Recommendation 1: Invest in cyberinfrastructure for the humanities and social sciences, as a matter of strategic priority.

Centers are the most efficient way for institutions of higher education to make this investment: the collection of expertise, equipment, software, etc. that is required to facilitate digital humanities and social sciences requires some economy of scale: it can't be supported at the department level, and though it might be supported at the level of college or school, these bureaucratic units are never co-extensive with the humanities or the social sciences, in any university. At Virginia, IATH has drawn faculty participation from history of science in Engineering, architectural history in Architecture, a number of humanities disciplines represented in curriculum and instruction in the School of Education, and of course a wide range of departments in Arts and Sciences; at Illinois, the situation is pretty much the same--the humanities are spread across Liberal Arts and Sciences, Fine and Applied Arts, Education, Library and Information Science, and other schools and colleges, with computing and engineering involved as well, through Computer Science, NCSA, Computational Science and Engineering, and others. If you are going to make an institutional investment in cyberinfrastructure for humanities and social sciences, as a university, you are obviously better off making that investment once, and in a high-impact, high-profile way, than many more times, with less impact, at a higher cost, across more units. Aside from the economies-of-scale argument, there is an argument to be made about the benefits of interdisciplinarity: it is still, in most universities, a relatively rare thing for faculty in humanities and social sciences to have ready access to compelling opportunities for interdisciplinary collaboration within their own institution.

Recommendation 2: Develop public and institutional policies that foster openness and access.

Centers, working closely with the library, can take point within the institution on promoting the development of these policies, and in promulgating them to faculty. The library is obviously a key player, and institutional repositories are an opportunity for what is probably best imagined as broad-and-shallow education of faculty, and these efforts will inevitably focus on the intellectual property that faculty members themselves produce; centers offer an opportunity for narrower and deeper engagements with the rights policies that govern the primary materials on which scholarship is based. In this engagement, the faculty member is the intellectual property (IP) consumer, rather than the IP creator, though access to these primary materials will be a necessary precondition for creation of the faculty member's IP. The library's role and that of the Centers are complementary, and should be coordinated, not least to make sure that a consistent message is being communicated to faculty at both moments--when they are IP consumers, and when they are IP producers. If there's a university press in the neighborhood, they should also be engaged in the discussion.

There's also another way in which Centers can play a particularly useful and important role, with respect to faculty members who are trying to negotiate questions of rights for access to primary source materials. As many of these materials will come from cultural institutions like libraries and museums and archives, and over time these may well be the same libraries, museums, and archives even though the faculty projects will be different, a Center can establish relationships with these institutions that span many years and many projects, providing a basis of trust and prior acquaintance that will ease negotiations in particular cases.

Recommendation 3: Promote cooperation between the public and private sectors.

The university has a hard time, especially in the humanities, in producing effective representatives or partners for the private sector. Humanities and, to a lesser extent, social science departments, have little or no experience, and often little or no interest, in partnering with the private sector. It's actually worse than that: the humanities tend to hold the private sector in contempt, as the culprit in the corporatization of the university. But just as Centers can provide continuity, build trust, and establish a track record with cultural institutions, so that individual faculty members don't have to start that process from scratch, Centers can do the same with private-sector partners: they can identify appropriate collaborators for the humanities, inculcate appropriate expectations for research outcomes, and match those partners with faculty who have congruent interests. Looking at it from the other side, the Center can match a researcher's interests with an appropriate private-sector partner, if one exists, and can create appropriate expectations for the nature and the outcomes of that partnership, on the faculty member's side. Private-sector partners might be interested in promoting cultural heritage, publishing or licensing scholarship for specialist or generalist audiences, or access to users with advanced requirements for general-interest content. Centers can represent the interests of the researcher in collaborations--for example, in something like the Google Book project, libraries represent one set of resources and requirements, but these are not necessarily always those of the faculty researcher. Centers could do all of these things more effectively if they networked with one another.

Recommendation 4: Cultivate leadership in support of cyberinfrastructure from within the humanities and social sciences.

Leadership in the disciplines needs to emerge in an institutional context that provides some support and direction for it--without that context, the impetus to pursue digital work may be perceived as detracting from other work. Leadership in cyberinfrastructure, for the humanities, will no doubt emerge from large projects and from national centers, as it has done in other disciplines. And indeed, we already can see such leadership emerging, in the centers represented here today, and, I would argue, in the membership of the ACLS Commission, all of whom are people who have spent years of their academic lives developing, using, and promoting cyberinfrastructure for the humanities and social sciences. During the work of the Commission, we heard a good deal about the need to change the reward system in the humanities--particularly tenure and promotion--to cultivate digital scholarship. At the same time, though, the Commission recognized that its own members had been rewarded by their disciplines and their home institutions for doing such work, so the situation is not a simple one: there are, for example, individuals who have been tenured for digital scholarship. I was, more than ten years ago, at what was then considered a conservative department and university. Others, like Matt Kirschenbaum, have been tenured in the same discipline, more recently, for work that is significantly (though not entirely) digital. But Matt, if you'll permit me, I'd like to use your case as an example of what I think the real issues are here.

In my entire decade at Virginia, Matt was the only doctoral student who was sufficiently risk-friendly to choose me as his dissertation director. He did his dissertation work in the open, on the web, and he pitched in with a will as the project manager for the William Blake Archive, doing digital work that was not directly in his area, at least in terms of its subject matter and the focus of his dissertation.

I distinctly remember a workshop that Matt and I did for other graduate students in the English department on the subject of electronic dissemination of work in progress, and electronic publishing of work from dissertation research. Most of these students were extremely skeptical of our encouragement to do these things, which they clearly regarded as extremely risky. What did they fear? They were worried that this kind of publication wouldn't count. They were worried that learning how to do this would be a distraction from their real work. They were worried that someone would steal their ideas. We argued that the only way to protect one's claim to an idea was to publish it, but to no avail: they were receiving advice to avoid the web from at least some of my colleagues in the department, particularly (at that time) those responsible for counseling students on how to navigate the job market.

And yet when Matt came on the job market, his work was already known to many on the committees with which he interviewed. He had experience working as a colleague in a collaborative project that included faculty members from several other universities, and all of them took a proprietary interest in his success. He had a professional network and an intellectual profile in the discipline, in other words--things that are still pretty much unheard of for graduate students just completing the dissertation. And while I know there have been times when Matt encountered his own ideas in other people's work, there's no question about the primacy or the originality of the book he's about to publish with MIT Press--and in any case, one measure of success in scholarship is citation, and (though we might prefer citation) imitation is another. Matt's now tenured here at Maryland, and associate director of MITH.

So what's the moral of Matt's story, with respect to cultivating leadership in humanities cyberinfrastructure? Centers like IATH and MITH are important, because they create the context in which students who are not completely risk-averse can find opportunities to collaborate, to pursue their own research and to contribute to the work of others, to establish those intellectual and professional networks that make the difference, ultimately, between moderate success within established boundaries, and boundary-crossing leadership.

Recommendation 5: Encourage digital scholarship.

In 2001, two years before the publication of the Atkins report, Fran Berman (the Director of the San Diego Supercomputer Center) wrote this:

"We hear a lot about the impact on science and engineering of cyberinfrastructure hardware resources (computers, storage, instruments, networks) or software tools and interfaces. Less heard, perhaps, is a discussion of the element most critical to the success of the cyberinfrastructure--its human infrastructure. The cyberinfrastructure's human infrastructure is a synergistic collaboration of hundreds of researchers, programmers, software developers, tool builders, and others who understand the difficulties of developing applications and software for a complex, distributed, and dynamic environment. These people are able to work together to develop the software infrastructure, tools, and applications of the cyberinfrastructure. They provide the critical human network required to prototype, integrate, harden, and nurture ideas from concept to maturity."

Fran Berman, "The Human Side of the Cyberinfrastructure," Envision 17.2 (April-June 2001).

Human infrastructure is key to cyberinfrastructure in the humanities, as well, though we don't yet have (and may never have) "hundreds of researchers, programmers, software developers, tool builders, and others" helping "to prototype, integrate, harden, and nurture ideas from concept to maturity." We do have some such people, though, and they work in the centers represented here. Some also work in libraries and in campus computing organizations, but I would argue that in both of those cases we find human infrastructure that is less exclusively focused on bringing to fruition the concepts of faculty researchers in the disciplines of the humanities. That exclusive focus is important: in my experience there is considerable danger of "mission creep" in under-resourced academic settings, where computers are involved. Since the computer is a general purpose modeling machine, it can do lots of different things, and from the perspective of the person needing support, it isn't really that important whether the activity in question is research, or teaching, or publishing, or something else. But from the point of view of developing the kind of in-depth, long-term engagement with computational methods that actually produces new knowledge acquired by new means, that is a critical difference. Only research represents a long-term commitment on the part of the faculty member, and only that long-term commitment can justify the extremely taxing effort of what Daniel Pitti used to call "ontology and obstetrics"--that is, eliciting from the researcher his or her tacit knowledge of a subject, working with him or her to express that knowledge in an explicit and computable form, trying it on the data for size, and iterating--usually many, many times, before an acceptable computational model, or tool, or resource has been developed.
The long-term engagement of professional staff in this process is key, as well: it takes time to learn to understand the research paradigms, the vocabulary, the motivations, and the intellectual practices of scholars in the humanities--and without understanding these things, it is highly unlikely that a programmer, or tool-builder, or others in the human infrastructure can succeed in making cyberinfrastructure useful at any very high level in the humanities.

Recommendation 6: Establish national centers to support scholarship that contributes to and exploits cyberinfrastructure.

In that same 2001 article, Berman goes on to note that

"The personal networks, knowledge, and relationships of the human infrastructure take a long time to build and are critical to the usability of the resources. In particular, the advances we now enjoy in science and engineering are the fruit of the many years of cooperation in the national effort to unite computational and computer sciences."

Although it is likely that most of the centers represented here have figured out some way to work with faculty members at universities other than the one that houses the center, it is usually an ad hoc and/or an unfunded arrangement, and it is difficult to get real traction on those terms. It's also difficult, on those terms, to be strategic about what projects you support, and there is nothing in such arrangements that would really support the idea of a national network, with faculty directed to centers with appropriate expertise, and so on.

The ad hoc and project-based funding that has, by and large, characterized the work done in digital humanities to date raises some real (and, in other domains, familiar) problems for building cyberinfrastructure. Earlier this year, Paul Edwards, Steven Jackson, Geoffrey Bowker, and Cory Knobel published a very interesting white paper, coming out of some meetings at the University of Michigan, titled "Understanding Infrastructure: Dynamics, Tensions, and Design." In this white paper, the authors write that

Social and historical analyses reveal some base-level tensions that complicate the work of infrastructural development. These include:

Time, e.g. short-term funding decisions vs. the longer time scales over which infrastructures typically grow and take hold
Scale, e.g. disconnects between global interoperability and local optimization
Agency, e.g. navigating processes of planned vs. emergent change in complex and multiply determined systems.

Although the white paper is primarily interested in cyberinfrastructure for computational science, which is still what most people are thinking of when they talk about cyberinfrastructure, the tensions articulated here are the same problems that we face. The authors go on to say:

Such complications challenge simple notions of infrastructure building as a planned, orderly, and mechanical act. They also suggest that boundaries between technical and social solutions are mobile, in both directions: the path between the technological and the social is not static and there is no one correct mapping. Robust cyberinfrastructure will develop only when social, organizational, and cultural issues are resolved in tandem with the creation of technology-based services. Sustained and proactive attention to these concerns will be critical to long-term success.

http://www.si.umich.edu/cyber-infrastructure/UnderstandingInfrastructure_FinalReport25jan07.pdf

This passage suggests why it might be useful to talk not only about centers, but also about a national network or coalition of centers. Some such social structure is probably required if "social, organizational, and cultural issues are [going to be] resolved in tandem with the creation of technology-based services."

Recommendation 7: Develop and maintain open standards and robust tools.

No one wants to fund standards development, in my experience--or if they do fund it, it is for a particular project, not with recurring operating funds. Maybe that's OK--after all, the argument can be made that if a standards organization doesn't have enough community support to survive on volunteer labor, it's not necessarily a good thing to keep it alive on external funding. On the other hand, some of the most profoundly important standards bodies operate on significant funding, with participation from government, private sector, and research communities. A middle road might be for funders to strongly encourage individual projects to write into their budgets membership fees for standards organizations, and funds to travel to and participate in meetings of those organizations.

Developing and maintaining robust tools is a bit more of a challenge--at least, we have examples of humanities open standards that have survived for a long time, and we can point to very few software tools that, at least in their robust form, emerge from academic software development. As I am currently funded to do software development in an academic environment, by one of the funders here today, this might seem an ill-advised observation, on my part, but it is true. And it's OK. I think the role of academic software development is to provide workable proof of concept tools, that serve their intended audience and purpose--albeit perhaps not robustly--but illustratively, at least. In the nora project, the Mellon-funded software development project that I'm involved in, I think we managed to produce a working application that begins to show that text-mining could do some interesting work in the humanities--but some of the hardest work was in figuring out exactly how. Designing and building the application is also a challenge, of course, and it also carries with it some research questions, but in both design and development, what I think distinguishes the research enterprise from its commercial equivalent is that we can imagine failures that are still useful outcomes, in the sense of being informative.

Bowker et al. talk about this too:

How we can learn more about "growing" infrastructures by studying current cyberinfrastructure projects, in an iterative and informative cycle potentially beneficial to those projects and future ones? [. . . .] Anecdotal evidence from many of the workshop participants suggests that standard forms of project reporting, given the incentives of both funder and grantee, will tend to over-report experiences of success and under-report those of difficulty or failure. Efforts to accommodate and encourage the honest reporting of failure could go a long way to supporting long-term and comparative learning across the varieties of cyberinfrastructural experience. As science itself has proceeded through the disciplined and even-handed study of failure, funders and proponents of cyberinfrastructure must learn to stop hiding the bodies.

I hope that, over the next couple of days, we'll have an opportunity to have some genuine discussion about how we structure a funding and research environment so that failures are valued, as long as they advance the enterprise as a whole--as long as they are informative. I've been arguing this for some time, of course, beginning with a piece I published in the Journal of Electronic Publishing in 1997, which began by saying:

If an electronic scholarly project can't fail and doesn't produce new ignorance, then it isn't worth a damn. [....] At a conference at the University of Maryland, Neil Fraistat (whose Romantic Circles Web site some of you may know) asked me if there were any writing on specific humanities hypertext projects that was neither promotional nor anecdotal, but that reported and analyzed and theorized the experience of constructing such a project. I could think of a couple of examples, but only a couple, and none perfectly apt. The conversation with Neil progressed to the topic of the importance of reporting and analyzing failure in any research activity, humanistic or scientific, and to the patterns of funding that discouraged such reporting and analysis. I owe whatever illuminations emerge [on this topic] to that conversation, and I take it as an emblematic instance of a research opportunity: a question for which there should be an answer, for which one could imagine an answer, but for which no very good answer was at present to be found.

http://www.press.umich.edu/jep/03-02/unsworth.html

I do think that existing science cyberinfrastructure, in the sense of tools and environments that support collaboration in large, interdisciplinary research projects, has been oversold, by quite a bit. But what's wrong with that is not the fact that it doesn't work all that well yet--the problem is that when we speak and write about it, and especially when that speaking and writing has funding in view, we pretend that it does work, that it's great, that it's whiz-bang. Happily, we are not far enough along, in developing humanities cyberinfrastructure, to have much to oversell. But let us agree to try to do this one thing better than the sciences have done, and make our difficulties, the shortcomings of our tools, the challenges we haven't yet overcome, something that we actually talk about, analyze, and explicitly learn from.

Recommendation 8: Create extensive and reusable digital collections.

We have left the hardest for last. This is an area where centers can help, to some extent, by being a source of best practices that can be brought to bear on the individual project from the beginning, but even centers won't necessarily provide enough pressure, or have enough experience, to really produce this result. This is an area where centers need to be a point of contact with libraries--the library on the same campus as the center, if there's appropriate interest and expertise there, but libraries elsewhere, if not. If, as Deanna Marcum says, preservation begins at creation, then libraries, who will eventually be faced with collecting the products of digital humanities research, need to be involved as early as possible, in the creation of those products. There is a reciprocal benefit, as well, when library collections are taken out of their domestic context and subjected to expectations and uses that go beyond the ones envisioned by their creators. In my experience, in nora, texts that are prepared with the notion that they will always be used in the same way, for browsing and searching, in the same environment for which they were originally prepared, have a tendency to leave certain kinds of information implicit--it's implicit elsewhere in the system, and not explicit anywhere in the text itself. Once you start to aggregate these resources and combine them in a new context and for a new purpose, you find out, in practical terms, what it means to say that their creators really only envisioned them being processed in their original context--for example, the texts don't carry within themselves a public URL, or any form of public identifier that would allow me to return a user to the public version of that text. They often don't have a proper Doctype declaration that would identify the DTD or schema according to which they are marked up, and if they do, it usually doesn't point to a publicly accessible version of that DTD or schema.
Things like entity references may be unresolvable, given only the text and not the system in which it is usually processed. The list goes on: in short, it's as though the data has suddenly found itself in Union Station in its pajamas: it is not properly dressed for its new environment. So, there's some benefit to the library, and to the long-term survivability and usefulness of their collections, or publishers' collections, to have them used in new ways, in research.
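
The gaps just listed are the kind of thing that can be checked mechanically before texts are aggregated. What follows is only a minimal sketch, and not code from nora: the function name portability_problems, the sample text, and the three checks are illustrative assumptions, and regular expressions stand in, roughly, for a real XML parser.

```python
import re

# Entities that any XML parser resolves without consulting a DTD.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a DOCTYPE declaration, including an optional internal subset in [...].
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^\[>]*(?:\[.*?\])?\s*>", re.DOTALL)


def portability_problems(xml_text):
    """List reasons a marked-up text may be hard to reuse outside its home system.

    Hypothetical helper for illustration; the checks are rough heuristics.
    """
    problems = []

    match = DOCTYPE_RE.search(xml_text)
    doctype = match.group(0) if match else ""
    body = DOCTYPE_RE.sub("", xml_text, count=1)

    # Check 1: is there a DOCTYPE, and does it give a URL for the DTD or schema?
    if not doctype:
        problems.append("no DOCTYPE declaration")
    elif not re.search(r"https?://", doctype):
        problems.append("DOCTYPE gives no public URL for its DTD or schema")

    # Check 2: named entity references that are neither predefined nor declared locally.
    declared = set(re.findall(r"<!ENTITY\s+([\w.-]+)", doctype))
    used = set(re.findall(r"&([A-Za-z_][\w.-]*);", body))
    unresolved = sorted(used - declared - PREDEFINED)
    if unresolved:
        problems.append("unresolvable entities: " + ", ".join(unresolved))

    # Check 3: any URL in the text itself that could lead a user back to the source.
    if not re.search(r"https?://", body):
        problems.append("no public URL or identifier in the text itself")

    return problems


# An invented sample with a local-only DTD reference and an undeclared entity.
sample = """<!DOCTYPE TEI.2 SYSTEM "tei2.dtd">
<TEI.2><text><body><p>caf&eacute; &amp; bar</p></body></text></TEI.2>"""

for problem in portability_problems(sample):
    print(problem)
```

A real check would rely on an XML parser and on whatever identifier scheme the collection actually uses; the point is only that each kind of implicit information can be turned into an explicit test that a text passes or fails before it leaves home.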

Finally, I'd like to close by mentioning a few other benefits of digital humanities centers:

Centers can function as an institution that mentors humanities faculty and graduate students in the fine art of collaboration.
Centers can collect and sustain staff expertise that no individual project could afford.
Centers can inculcate, in humanities faculty, an awareness of external funding opportunities and an understanding of how to pursue those opportunities, and a sense of why it's worth doing so.
Centers can help faculty produce better grant proposals.
Centers provide funders with some long-term stability for individual research projects, and they help to assure that the work funded in a particular project won't be orphaned, institutionally.
Centers can provide graduate students with opportunities to work as part of a collective intellectual enterprise, which is quite unusual for them--and that work can give them valuable experience when they apply for faculty jobs, or open other career opportunities for them.
Centers involve humanities faculty in research projects that are collaborative, rely on staff support and computing infrastructure, and bring in external funding: all of these things make humanities faculty more difficult to relocate from one university to another, so the Center is an effective instrument of retention.
Centers can be a point of connection between humanities faculty and LIS programs, which would be very fruitful. About half of LIS faculty come from other disciplines, and humanities computing is very much about information organization, ontologies, taxonomies, schema, preservation, interface design, and other issues that are studied and taught in LIS programs. The LIS connection also would help to activate the NEH/IMLS connection, as well as the NSF cyberinfrastructure connection.

Thank you very much for your time this afternoon, and I look forward to your comments and questions.