Digital Publication and Scholarly Communication

Digital Library Lecture Series 2005-2006

Kelvin Smith Library

Case Western Reserve University

September 15, 2005

Scholarly communication, for the purposes of this discussion, is a system that includes content producers, end users, publishers, and libraries. Content producers and end users are, generally, the same people at different moments (though it would be a good thing if there were a more general audience for scholarship than there is). Publishers are of two types—university presses and commercial (mostly science/technical/medical) publishers. The economies involved in this system are three: a prestige economy, primary for content-producers, important but secondary for the other players; a cash economy, primary for publishers, not very important to content producers in most cases, and important but not actually primary for libraries; and a subsidy economy, primary for libraries, who are subsidized by universities as a public good, and more important to content producers than they generally know. It shouldn't come as a surprise that a system with three different economies at work inside it is difficult to operate successfully, but when it does work, it has a certain elegance: each party contributes from its own sense of mission, and each gets paid in its own currency. At present, though, there seems to be general agreement that the system of scholarly communication is not working—that it is broken, or breaking.

When things are broken or breaking, one has a chance to remake them, so this might be a good time to ask the question, "What do we want our system of scholarly communication to look like in 2010?" That's not very far off—less than five years—so we're trying to imagine a better system that could actually come about, in a few years. So, what would a better system of scholarly communication look like?

To begin with, I'd like to propose that "digital by default" is the future of scholarly communication: almost all scholarship is already born digital, no matter how it is eventually published. Moreover, I think it's quite reasonable to assume that in the not very distant future, computational methods will have penetrated the humanities and social sciences to the point that there will be many research projects that require electronic dissemination. A better system of scholarly communication would begin from the premise that the form in which research results are conveyed is a question of appropriateness and convenience. In some cases, print is to be preferred as a matter of convenience, but in general, the digital medium offers the most convenience to end users, as well as offering the most expressivity to authors, and (potentially) access to the largest audience. As an aside, on the question of audience, the simplest analysis of the "crisis in scholarly publishing" is that it's a problem of audience: you can't afford to physically manufacture anything—books, televisions, or widgets—in lots of 500 or 1000. On that subject, in a talk delivered at the American Council of Learned Societies, I suggested

...we could enlarge the audience for humanities scholarship, not by dumbing it down, but by making it more readily available. Maybe if we did that, scholars would find an audience first, and a publisher second, instead of the other way around. And maybe in that world, the risk to publishers would be less, because the demand would already be demonstrated. Could we peer review in this world? Of course—and it might then be perfectly clear why we should conduct peer review independent of a decision to publish.

Second, since much of the discussion about the "crisis in scholarly publishing" is really about tenure and promotion, let's stipulate that tenure would be awarded on the basis of the quality and impact of a scholar's work, rather than on its quantity or its form. Let's further stipulate that the criterion of "impact" would include impact on readers outside of one's discipline—in other words, that an ability to convey the significance of specialized research to a general public would count for at least as much as citations resulting from internecine feuds. But tenure is not the whole story: if I were to offer a comprehensive description of the ideal system of scholarly communication, it would go like this:

In a better world, high-quality, peer-reviewed information would be freely available soon after its creation; it would be digital by default, but optionally available in print for a price; it would be easy to find, and it would be available long after its creation, at a stable address, in a stable form.

Starting with this more general set of desiderata, let's work backwards from ends to means. Let's suppose that the most important characteristic of a future system of scholarly communication is that information should be available long after its creation, at a stable address, in a stable form. I would argue, strongly, that digital information isn't going to be easy to find or available long after its creation at a stable address in a stable form unless it is held by libraries; and yet libraries do not hold most of the digital information that we would consider important to scholarship. Some of it is out there in the wild, on the Web, not collected or preserved: consider the study published in January, 2003 that found "40 percent to 50 percent of the URLs referenced in articles in two computing journals were inaccessible within four years" (http://www.washingtonpost.com/wp-dyn/articles/A8730-2003Nov23.html). Much of the journal literature is owned by publishers, and libraries rent access to it, though some libraries do ask for the right to archive the content they license (see for example "Challenges to Licensing from Some Publishers" at http://www.cdlib.org/news/barriers.html). If you wonder why libraries might ask for this right, see the January 10, 2003 article in the Chronicle of Higher Education pointing out Elsevier's practice of silently deleting articles from its database (http://chronicle.com/prm/weekly/v49/i18/18a02701.htm). It's not just preservation that's at issue here, but also collection development: increasingly, commercial publishers of journal literature in science, medicine, and engineering aim to dictate collection policy to libraries, through bundling schemes with penalties for choosing not to buy into the bundle and for opting out of subscriptions to individual titles.

So, what if libraries actually collected the important digital information, rather than renting it? To do this, libraries would have to mount and maintain digital object repositories, which (with very few exceptions) they now don't do—though it's worth noting the rise in interest in online institutional archives. Building and maintaining digital object repositories will cost money, and it will be new money. But I think it could be demonstrated that, if building in-house digital collections were actually the price of freedom from commercial publishing, the cost of building those collections would be less than the millions every major university now pays for subscriptions to commercial STM journals. I also think it could be demonstrated that there's a significant role for university presses in building these collections, and that university presses could thrive in a university economy that rewards the lowering of costs to the system of scholarly communication as a whole. More on both those points later: for now—just for a moment—let's assume that we can pay for this collection-building, and let's assume that libraries carry it out.

This still doesn't explain how the first part of the better world of scholarly communications comes about—the part where high-quality, peer-reviewed information is freely available soon after its creation, online and optionally in print.

For over a decade, Stevan Harnad has been plugging away at this part of the problem, and lately he's getting considerable traction in a movement called the Budapest Open Access Initiative (which makes use of, but is not to be confused with, the Open Archives Initiative, or OAI, a metadata standard). The opening paragraph of the Open Access Initiative's mission statement reads as follows:

An old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds. Removing access barriers to this literature will accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge.

— http://www.soros.org/openaccess/read.shtml

The Initiative goes on to "recommend two complementary strategies" for accomplishing this: "self-archiving," in which scholars "deposit their refereed journal articles in open electronic archives," and "open-access journals" which "use copyright and other tools to ensure permanent open access to all the articles they publish. Because price is a barrier to access, these new journals will not charge subscription or access fees, and will turn to other methods for covering their expenses." As to how these journals will pay for themselves, the Initiative offers some business plans and suggests that "there are many alternative sources of funds for this purpose, including the foundations and governments that fund research, the universities and laboratories that employ researchers, endowments set up by discipline or institution, friends of the cause of open access, profits from the sale of add-ons to the basic texts, funds freed up by the demise or cancellation of journals charging traditional subscription or access fees, or even contributions from the researchers themselves."

I've known Stevan for a long time, and I'll be seeing him later this month in Australia: I agree with what he's trying to do—indeed, I'm a signatory to the Initiative. He's doing more than talking about a solution, too—he's providing software that enables the self-archiving he recommends (freely downloadable at http://www.eprints.org). And it's true, as he suggests, that even without this software, the internet has made it possible for peer-reviewed journals to distribute their contents widely and quickly—in other words, to make high-quality, peer-reviewed information freely available soon after its creation. A journal I've been associated with for nearly fifteen years now has been doing just that, on terms that agree entirely with the Budapest Open Access Initiative: Postmodern Culture is a peer-reviewed journal that distributes the current issue (and text-only versions of all back issues) free of charge, while also licensing the whole collection as part of Project Muse. It uses copyright to ensure access by leaving copyright with the author, and asking only for a non-exclusive right to publish. That strategy allows authors to self-archive, and it also allows them to republish the material elsewhere, as they please. But I also know from the experience of PMC that none of the sources of funding enumerated in the Initiative proved sustainable. What has sustained the journal is its relationship with a publisher, and the steady editorial stipend that comes from Project Muse. By the same token, on the library side, I'd also note that self-archiving and open-access journals, by themselves, do not guarantee "permanent open access." Only libraries can do that (and even then, "permanent" is stretching it).

Part of the problem with the current system is that authors are insulated from the pressures that are shaping their world: they don't pay directly for the costly commercial journals they use. In light of that, the Budapest Open Access Initiative is a step in the right direction, because it engages authors in the fray. It also appears to have struck a responsive chord internationally, as evidenced in the "Berlin Declaration" issued by a number of German research organizations at the conclusion of a conference on open access to knowledge in the sciences and the humanities at the Max Planck Institute in 2003 (http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html). The recent, very public rebellion by some large US university libraries (Cornell, University of California, etc.) against the pricing policies of commercial STM publishers also looks like a piece of the same zeitgeist (see http://chronicle.com/prm/weekly/v49/i04/04a03101.htm and http://chronicle.com/prm/weekly/v50/i10/10a03403.htm respectively).

Open Access is part of a better system, and its significance is that it addresses the content-producers, who have had little incentive, hitherto, to change their publishing behavior. If we can make Open Access prestigious, or perhaps simply "cool," then an important battle has been won. That's a marketing problem, and it's one that universities, funding agencies, foundations, and other interested parties would be well advised to address. Faculty are fashion-conscious, despite what you might assume from looking at us: the fashions that matter are more political and ideological than sartorial, and Open Access is a fashion statement that faculty will embrace, if it is promoted.

The Open Access Initiative presents a solution to the problems with the current system that is focused on the author/user end of the problem, but that's not to say that some publishers and libraries haven't seen the possibilities here: The California Digital Library is hosting an "eScholarship Repository" which currently archives a few thousand papers produced by faculty from 100 departments and units across the University of California system. Oxford University Press and Oxford University Library Services are partnering on an Open Archives project called SHERPA (Securing a Hybrid Environment for Research Preservation and Access) which aims to "investigate the IPR, quality control and other key management issues associated with making the research literature freely available to the research community. It will also investigate technical questions, including interoperability between repositories and digital preservation of e-prints" (http://www.sherpa.ac.uk/). It's encouraging to see library-press collaboration around the concept of open archives, because libraries and university presses represent the perspectives, expertise, and missions on which the open archives movement, despite its strong appeal, is weakest.
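To make "interoperability between repositories" a little more concrete: one common mechanism is the Open Archives Initiative's harvesting protocol (OAI-PMH), in which a harvester asks a repository for its records through an ordinary web request. The following is only a minimal sketch of how such a request might be formed. The repository address is a hypothetical placeholder, and the function name is my own; only the request parameters (verb, metadataPrefix, from, set) come from the protocol.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI endpoint (hypothetical in this sketch).
    from_date and set_spec are optional selective-harvesting arguments.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    if set_spec:
        params["set"] = set_spec    # harvest only one named subset of the archive
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used for illustration only.
url = list_records_url("http://repository.example.edu/oai", from_date="2005-01-01")
print(url)
```

The point of the sketch is simply that any repository answering requests of this shape can be harvested by any library, which is what makes distributed self-archiving collectable.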

Near the beginning of this discussion, I asked you to set aside, for a moment, the question of how libraries would address the requirement that they collect and preserve digital information, rather than just renting access to it. It's time now to return to that question, and in answering it, to bring publishers into the picture.

In a statement on "The Value of University Presses" commissioned by Bill Regier, as President of the Association of American University Presses, a committee of university press publishers enumerated the things that university presses contribute to society, to scholarship, and to the university community. Self-archiving would moot many of the things on that list (for example, the claim to "make available to the broader public the full range and value of research generated by university faculty"), but even in such a world, a number of these things would still need to be done, and would probably not be done by anyone other than a publisher—for example, adding value to scholarly work "through rigorous editorial development, professional copyediting and design," or committing resources "to long-term scholarly editions and multivolume research projects, assuring publication for works with completion dates far in the future." Some of the other things on this list should be done far more than they are done, and would be central to a world in which university presses and open access co-exist, for example making "common cause with libraries and other cultural institutions to promote engagement with ideas and sustain a literate culture," or collaborating "with learned societies, scholarly associations, and librarians to explore how new technologies can benefit and advance scholarship" (for the whole list, see http://www.aaupnet.org/news/value.html). Here are some forms of common cause and collaboration that university presses might take on, in an open-access world:

administering an online authoring and peer-review environment that encourages authors to produce content in forms that lower library costs for collection and preservation
normalizing content produced outside that environment, to lower the cost of collection and preservation
producing standard metadata (OAI, MARC, METS, etc.) for digital information, to make it more searchable
working with authors and rights holders to address intellectual property issues that make it difficult or dangerous for libraries to collect and preserve certain digital content
working with the commercial sector as an advocate for scholarship, to negotiate a common understanding of the fair use of contemporary cultural materials (for example, film, television, music, etc.) in scholarly and educational contexts
working with university researchers to establish pre-commercial licenses for technologies that could broadly benefit the dissemination and use of scholarship, for example, natural language processing and machine translation
providing print on demand for users of free electronic resources in library collections, and managing the income from that activity
licensing scholarly work for commercial purposes, and managing the income from that activity
marketing online scholarship to maximize its impact and its audience
determining when the size of the audience merits more expensive editorial and production work, and when that work should be handled by the scholar or scholarly project or scholarly society
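As an illustration of the metadata item in the list above, here is a minimal sketch of what producing a standard record might involve, assuming the simple Dublin Core format (oai_dc) that OAI repositories exchange. The function name, the sample article, and the repository address are hypothetical; the two namespace addresses and the fifteen element names are those of the published standards.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def build_oai_dc(fields):
    """Return an oai_dc record as an XML string.

    fields maps a Dublin Core element name to a list of values,
    since every element may be repeated.
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical article, used for illustration only.
record = build_oai_dc({
    "title": ["An Example Article"],
    "creator": ["Doe, Jane"],
    "date": ["2005-09-15"],
    "identifier": ["http://repository.example.edu/item/1"],
})
print(record)
```

A record this small is cheap to produce at authoring time and expensive to reconstruct afterwards, which is the argument for having a press generate it as part of normal production.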

I'd submit that all of these are things worth doing, in at least some circumstances, and that many of them contribute directly to the support of authoring or to lowering the cost of collecting and preserving digital content. As such, both might qualify university presses for more subsidy than they have been getting from universities lately, though even without that, I think many of these activities would produce enough value for libraries that they could be paid for in the cash economy in which publishers now largely operate.

A word on subsidies, while we're on that point: for obvious reasons, institutional subsidies work best when the public good they create is consumed locally, within the institution. This has been the case with libraries, and it has not been the case with presses. University presses don't publish local authors exclusively, or even in the main, and the good they produce by publishing is produced for a global, not a local, market. Still, it may be time for institutions to think more broadly about the system of scholarly communication as something cooperatively subsidized across localities.

Even if presses are subsidized to a greater extent, and even if they cooperate as suggested above with libraries and with authors, and even if they act in various ways to lower the cost of collecting digital content in libraries, there is still a significant new cost attached to developing those digital collections, and an even greater cost to maintaining them over time. Universities—by which I mean Provosts—need to recognize that these collections are, in fact, the key that unlocks the problem of scholarly communication. If universities don't own the content they produce, if they don't actually collect it, hold it, and preserve it, then they'll be at the mercy of those who do. If universities do collect, and preserve, and provide open access to the content they produce, then the entire balance of power shifts away from commercial publishing and toward university presses and university libraries. Bill Clinton used to say, "it's the economy, stupid." He was right. We could say, in the same spirit, "it's the content, stupid." We should be using subsidies to both libraries and presses, and perhaps other means as well, to encourage (even require) substantive collaboration, with the goal of creating a system in which there are incentives to lower costs across the entire system, including authoring at one end, and preservation at the other. University presses would have a vital role in this process, and university libraries would be the lynchpin, because that's where the content would reside.

So, what do we have? Several interlocking elements that make up a better system of scholarly communication:

We all admit that most information is born digital now, even if that's not how it's published
Department chairs accept the idea that quality and impact are what matter, not the quantity or the medium or the genre of publication
Open Access becomes the new political correctness among content-producers
Libraries insist on collecting digital content and universities support their effort to do so, out of an awareness that this is the key to a better world
University presses normalize digital information to lower the cost of collecting and preserving it, and they perform a number of other functions that lower the cost of scholarship and increase its impact
Provosts use subsidies, marketing, and university policy to encourage an ethos of open access among faculty, and to support open access principles as being in the best interest of all research organizations.
Provosts use subsidies, marketing, and university policy to encourage collaboration between libraries and presses.

In fact, I think the last point is the most difficult. Libraries have been better at collaborating with their own kind than publishers have been, but neither libraries nor publishers have been very good at collaborating with one another. Authors, once politicized on the topic of scholarly communication, may see both libraries and publishers as unnecessary, as some of the emphasis of the Budapest Initiative suggests. In short, there's a good deal of work to do just to make it clear to authors, university press publishers, and libraries, that we're all on the same team, and that the enemy, while real, is elsewhere. The case will have to be made, publicly and repeatedly, as well as privately and pointedly, that collaboration, mutual respect, and close cooperation are absolutely necessary in order for the system of scholarly communication to survive and prosper. What makes that especially challenging, institution by institution, is that there will have to be (local) cash subsidies to encourage this change in behavior, but these subsidies will have to be justified, in part, with reference to a trans-institutional problem and its solution. In other words, just as each author wants to know he's not out there alone in choosing to publish in a new way, and just as deans want to know that their standards of excellence will be the same that other schools apply, provosts need to know that there are other universities making the same choices, providing the same subsidies, working toward a better, more sustainable future for scholarly communication.

All well and good. Now, what's the reality on the ground in humanities and social science departments: what are the rules that actually govern faculty advancement?

With funding from the Andrew W. Mellon Foundation, the Committee on Institutional Cooperation conducted a study (June-November, 2003) to determine the extent to which publication of a scholarly monograph is essential for faculty to receive tenure in the humanistic disciplines. Further, it sought to understand whether faculty members and their department chairs are open to change of their promotion and tenure standards. The research was carried out by the Library Research Center at the University of Illinois and directed by Leigh Estabrook. It included surveys and focus groups of faculty in Anthropology, History and English at CIC institutions; telephone interviews of department chairs in six of the CIC universities; and a survey of faculty who left before receiving tenure in these departments. Among the major findings are the following:

With the exception of scholars who are doing "creative work" or whose work is in certain subfields of Anthropology, department chairs expect a faculty member to have published (or have in press) a scholarly monograph prior to consideration for tenure.
Department chairs are not willing to abandon the scholarly monograph as a standard for promotion and tenure.
Only in History departments does a majority of faculty believe a book should be required (with rare exceptions) for tenure in their departments. Faculty with tenure and faculty who have not yet achieved tenure are similar in their views about this issue.
Most of the faculty members surveyed do not feel a book length manuscript is necessary to present their scholarship.
Department chairs and junior faculty have different perceptions about the type of support provided by the department to untenured faculty.
The publication record of faculty achieving tenure has increased since the 1970s, suggesting that requirements for promotion and tenure in CIC schools have increased. Nearly one-fourth (24.5 percent) of the faculty report being asked for a subvention for one or more books. Respondents differ in their perspectives about subventions with some quite accepting of the practice and others concerned about the implications of providing subventions.
Junior faculty have numerous concerns about the process of getting their work in print, including issues of market forces, time between submission and response and the changing profile of presses.
Faculty members are beginning to examine electronic publications as an outlet for scholarship. A small number of departments have formally considered how electronic publications should be evaluated.

— http://lrc.lis.uiuc.edu/web/ScholarlyCommunicationsSummitReport_Dec03.pdf

1. Changes in disciplinary practices within the humanities, provoked by networked information technology.

From this point of view, department chairs are the problem. Yet another survey—of faculty who regularly use electronic texts—reveals that there are different problems for achieving the desirable future, and these turn on faculty attitudes. Elaine Toms' survey on humanities scholars' use of electronic texts (carried out with Ray Siemens, Stefan Sinclair, Lynn Siemens, and Geoffrey Rockwell) is a Canadian work in progress, but it recently issued this summary report on a survey targeting computing humanists—presumably, the category of humanities faculty most likely to be friendly to the goals I've been promoting in this talk. Here's what they found:

As of November 15th, ninety-six scholars (half male and half female) from more than a dozen countries had responded to the survey. Three quarters were under the age of 45 and most were long-term and frequent users of computers and the Web. These respondents came from a range of disciplines working in a range of genre (mostly prose) and primarily using textual material for their research (it should be noted that the survey was especially directed at text-based humanities scholars; this focus is something that subsequent surveys may wish to broaden). Over 80% use e-text and about half use text analysis tools. In general they believe that e-text are available for their use and expect to find them downloadable off the Web. They prefer to find them in a stable, legal form that is freely available from a reliable institution. In terms of mark-up, respondents appear to be a bipolar group with half expecting to acquire text with no mark-up and half with rich XML. In general, respondents believe that they need text analysis tools, although not complex tools, and are not happy with the tools that are currently available. Somewhat surprisingly, over 50% did not know about commonly available tools such as TACT, WordCruncher and Concordancer. The one most highly used was TACT but few found it useful. In addition to our list of about ten tools, participants added another two dozen tools that they employ in their work. These included tools such as the Wordsmith Tools as well as common Microsoft Office products such as Word and Access.

Toms et al. continue, in a slightly wistful tone:

We inquired about their collaboration and communication habits. Most use e-mail regularly and subscribe to listservs. But they tend to work as solitary scholars, rarely collaborating with their own graduate students and [they] do not see the need for collaborating with other scholars. That said, they like to communicate with other scholars at various points in the research process. They share some of their materials, but tend not to share notes and tools, although they expect others to share tools.

So, although networked information technology has provoked some highly localized and still significant changes in humanities scholarship, we are not there yet. What might "there" look like, at the level of the individual faculty member and his or her research?

Faculty would expect that adequate information resources should be freely available online, and that exceptional ones should be licensed and provided online by the library
Faculty would make much greater use of images in research and teaching, in previously "textual" disciplines
Faculty would embrace the return to primary/archival sources
Faculty would also embrace collaboration, interdisciplinarity & community (Romantic Circles, H-Net, Blake Archive, Stoa Consortium, Perseus, etc.) including collaboration with computer professionals, librarians, others outside the discipline.
Faculty would become more aware of issues that shape scholarly communication, including copyright, ownership of research results, permanence, authenticity and reliability, audience.
Faculty would collaborate with librarians to develop ontologies for modeling disciplinary knowledge
Faculty would embrace maps and GIS in a range of applications, especially in modeling processes that unfold over time, in a particular space
Faculty would participate in developing high-resolution models and reconstructions of physical structures, building sites, cities, landscapes, and would help to ensure that those models accurately conveyed the difference between the known and the conjectural.

Given this kind of engagement by faculty, we might hope to see:

Demand for a national humanities digital library, and a concomitant demand for tools that would allow us to do things (beyond searching and browsing) with digital libraries, like text-mining and visualization, search and retrieval of non-textual data (images, music, video, etc.).
Demand for tools and standards to allow for the stand-off reprocessing (e.g., markup, annotation, etc.) of content in digital libraries.
Demand for better tools for annotating, comparing, sharing, overlaying, excerpting, and analyzing the semantic content of images. Demand for extremely high-fidelity imaging.
Demand for extremely detailed information about the provenance and production of digital information, and demand for tools to authenticate it.
Demand for uniform terms of access to and use of digital representations of material from libraries, archives, and museums, for education and research. Questions of transferable rights are likely to surface here, too.
Demand for better software for OCR of pre-modern and handwritten materials, and for video transcription and annotation
Demand for better facilities to support online collaboration, e.g. access-grid/conferencing tools, systems for managing project workflow, peer review, and other distributed, collaborative processes, better tools for sharing applications, annotating web-based materials, excerpting with metadata, etc., tools for multilingual collaboration.
Demand for open-access publishing, with value placed on library stewardship of digital content
Demand for standards and tools that allow us to integrate very different types of models (buildings, landscapes, ornate objects) in a single multi-scale environment
Demand for access to semantic content and for pluralistic content management systems.
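The "stand-off" reprocessing mentioned in the list above may be easier to picture with a small example. In stand-off markup, annotations are kept apart from the text they describe and point into it by position, so that a library's copy is never altered and several scholars can annotate the same source independently. What follows is a minimal sketch under that assumption; the class, the function, the labels, and the sample sentence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset where the annotated span begins (inclusive)
    end: int     # character offset where the span ends (exclusive)
    label: str   # what the annotator asserts about the span


def resolve(text, annotations):
    """Pair each annotation's label with the span of text it points to.

    The source text is only read, never modified.
    """
    resolved = []
    for a in annotations:
        if not (0 <= a.start <= a.end <= len(text)):
            raise ValueError(f"annotation out of range: {a}")
        resolved.append((a.label, text[a.start:a.end]))
    return resolved


# The library's text, held unchanged.
source = "Blake engraved the plates in London."

# One scholar's annotations, stored separately from the text.
place_start = source.index("London")
annotations = [
    Annotation(0, len("Blake"), "person"),
    Annotation(place_start, place_start + len("London"), "place"),
]

print(resolve(source, annotations))
```

Because the offsets only make sense against one exact version of the text, this approach depends on the stable form and stable address argued for earlier.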

The bottom line is that institutions should transplant and carry out their traditional missions with respect to information in new media: publishers should publish, libraries should store and provide access, and so on. Their provosts will need to capitalize them to do this. Funding agencies need to look after funding shared infrastructure (the human resources dedicated to shared systems and applications, for example). Science funding agencies need to recognize that they are building not only the science and engineering infrastructure for information, teaching, and research in the 21st century, but the national infrastructure for information, teaching, and research. They need to enfranchise humanities and social sciences in the process from the beginning, because of what we know, because of the problems we bring into focus, and because of what we can contribute to designing an effective, sustainable, and inclusive cyberinfrastructure.

But even more important than all of that is the responsibility of humanities scholars and humanities departments to step up to changed responsibilities in a changing world: what should happen will not happen unless department chairs, tenure committees, and individual faculty members make it happen. It is not a foregone conclusion that the humanities, as we know it, will survive the massive transformation that's currently reshaping everything outside of the academy, and much within it. We have much to offer this process of reshaping, but clearly, reshaping begins at home, and we will have to reshape ourselves before we can reshape the rest of the world.