Budapest Open Access Initiative      

Budapest Open Access Initiative: BOAI Forum Archive

[BOAI] Re: EPrints, DSpace or ESpace?

From: Stevan Harnad <harnad AT>
Date: Tue, 3 Jun 2003 13:57:43 +0100 (BST)

    Publish or Perish: Self-Archive to Flourish

              Stevan Harnad

It is becoming apparent that our main challenge is not creating
institutional repositories, but creating policies and incentives for
filling them.

Universities' "publish or perish" policies are intended to encourage
and reward researchers for doing research and for making their findings
public to all would-be users. It is a natural extension of "publish or
perish" to encourage and reward researchers for maximizing the impact of
their research output by maximizing would-be user access to it.

An article on how this can be done through national and university
research accessibility and assessability policies (with the UK as a model)
will appear in THES Friday, June 6. It will be a condensed version of the
following short article:

    "Enhance UK research impact and assessment by making the RAE 

The institutional-repository movement would also benefit greatly
from clearly separating the 5 quasi-independent aims that currently
constitute its very mixed agenda. All 5 aims are worthwhile and important,
but only the first is urgent, and it is the heart of the challenge for
filling institutional repositories with university research output for the sake of
maximizing its impact by maximizing access to it:

The 5 distinct aims for institutional repositories

    I. (RES) self-archiving institutional research output (preprints,
    postprints and theses)

    II. (MAN) digital collection management (all kinds of digital content)
    III. (PRES) digital preservation (all kinds of digital content)
    IV. (TEACH) online teaching materials

    V. (EPUB) electronic publication (journals and books)

As long as we keep blurring or mixing these 5 distinct aims, the first
and by far the most pressing of them -- the filling of university eprint
archives with all university research output, pre- and post-peer-review,
in order to maximize its impact through open access -- will be needlessly
delayed (and so will any eventual relief from the university serials
budget crisis).

Perhaps the two most counterproductive of the conflations among these
five distinct aims have been that between I and III (research
self-archiving, RES, and digital preservation, PRES) and that between
I and V (research self-archiving, RES, and electronic publication,
EPUB).

The RES/PRES mix-up, much discussed in the American Scientist Forum,
can easily be seen to be a needless and misleading conflation when we
recall that insofar as the peer-reviewed research literature is
concerned, the current preservation burden is on its primary corpus,
which is the published literature (online and on paper). The much-needed
filling of university research-output archives is a *supplement* to this
primary corpus, for the purpose of maximizing its impact by maximizing
access to it; it is not a *substitute* for it. It is simply a mistake
and a needless retardant on the filling of the university archives to imply that
there are preservation problems to solve before they can be filled.

The RES/EPUB mix-up is really two mix-ups. The first is the conflation of
self-archiving with self-publishing: The urgent archive-filling challenge,
RES, concerns the self-archiving of peer-reviewed, *published* research
output. Again, it is a *supplement* to publication, for the purpose of
maximizing its impact by maximizing access to it; it is not a *substitute*
for it.

The second RES/EPUB mix-up has to do with university e-publishing
ambitions (perhaps along the lines of High-Wire Press-wannabes!). It is
fine to have these ambitions, but they should not be conflated in any
way with the completely independent and urgent aim of self-archiving
the university's peer-reviewed, *published* research output.

Most of this is discussed in the thread:

    "EPrints, DSpace or ESpace?"

This is also the source of the slowness in archive-filling
lamented by Michael Day in the article below. The remedy,
again, is clearly distinguishing RES from any other institutional
repository aims, and drafting national and institutional research
self-archiving policies and incentives, as soon and as systematically
as possible.

    Michael Day, Prospects for institutional e-print repositories
    in the United Kingdom, a paper from the ePrints UK project. Abstract: "This study
    introduces ePrints UK, a project funded as part of the JISC's Focus
    on Access to Institutional Resources (FAIR) Programme. It first
    introduces the project and the main features of the FAIR programme
    as it relates to e-print repositories. Then it provides some general
    information on open-access principles, institutional repositories
    and the technical developments that have made their development
    viable. There follows a review of relevant repositories in the UK
    and an indication of what impact ePrints UK might have in supporting
    learning, teaching and research. This is followed by a discussion of
    perceived impediments to the take-up of institutional repositories,
    including both practical and cultural issues. A final section
    investigates the development of ongoing evaluation criteria for
    the project."  Source:

See: "Enhance UK research impact and assessment by making the RAE 

Stevan Harnad

[BOAI] Re: EPrints, DSpace or ESpace?

From: Stevan Harnad <harnad AT>
Date: Tue, 17 Jun 2003 16:19:38 +0100 (BST)

On Tue, 17 Jun 2003, Jan Velterop wrote:

> Probably of interest to readers of this list:

In that article in The Scientist, "UC to launch open-access 
Catherine Zandonella writes:

> In a trend that could permanently alter the nature of scholarly
> publishing, several top research universities are setting up
> electronic superarchives to store and share their researchers'
> data. Some universities see these "institutional repositories"
> simply as a way to capture their intellectual output, but others
> aim to use their repositories as a means of launching open-access
> alternatives to conventional academic journals.

"Simply a way to capture their intellectual output"? Clearly the point
of self-archiving refereed research has been completely missed here!

Unfortunately, Zandonella's article simply propagates the growing
wave of nonspecific euphoria about university repositories, which seems
to be based on freely conflating distinct and not always compatible
potential uses for such repositories.

I suggested:

>sh>    The 5 distinct aims for institutional repositories are:
>sh>    I.   (RES) self-archiving institutional research output (preprints,
>sh>         postprints and theses)
>sh>    II.  (MAN) digital collection management (all kinds of digital
>sh>         content)
>sh>    III. (PRES) digital preservation (all kinds of digital content)
>sh>    IV.  (TEACH) online teaching materials
>sh>    V.   (EPUB) electronic publication (journals and books)
>sh>    As long as we keep blurring or mixing these 5 distinct aims, the
>sh>    first and by far the most pressing of them, RES -- the filling of
>sh>    university eprint archives with all university research output,
>sh>    pre- and post-peer-review, in order to maximize its impact
>sh>    through open access -- will be needlessly delayed (and so will
>sh>    any eventual relief from the university serials budget crisis).

UC seems to be another instance of conflating I. (RES) and V. (EPUB).
It is hard to discern whether this is just a case of (i) misunderstanding
the essential feature of peer review -- which is that it must be an
autonomous, outsourced, neutral-3rd-party service, otherwise it risks
just becoming a house organ or vanity press -- or else a case of (ii)
High (Wire Press) Hopes (Stanford Envy?): Universities seeking to make
a bigger inroad into electronic publishing.

> This fall, the University of California (UC) plans to unveil just
> such an option for its researchers: the ability to create and run
> an open-access, peer-reviewed journal within the framework of its
> eScholarship Repository.

But the question is this: Does the planet really need more peer-reviewed
journals (it has 20,000 already, most of them toll-access)? And is the
best contribution universities can make with their "superarchives" to
create new journals? Or would it be more useful (to both themselves and
other universities) if they instead focused on making their own 
peer-reviewed research publications openly accessible by self-archiving
them in their own eprint archives (RES)? Does it help either
objective to conflate them under the one rubric of "superarchive" (not
UC's word, but a predictable reaction of the press, if we keep freely
admixing I. - V.)? Especially at a time when archive frenzy is growing
fast, but self-archiving is still growing too slowly!

> The repository, which is open to all users, will provide software
> tools to automate the process of sending out papers for peer review;
> the journal editors will determine the editorial policies and the
> publication schedule. "We are trying to provide the continuum
> of publishing alternatives," said Suzanne Samuel, eScholarship
> Program coordinator for the California Digital Library, which runs
> the repository for the UC system. (The eScholarship site already
> contains one open-access journal, Dermatology Online Journal, which
> was launched in 1995 and later moved to the UC site.)

As Gerry McKiernan's recent overview shows, there are *many* new
pieces of software being created to automate peer review and journal
publication, all designed to make journal publishing faster, cheaper,
and more efficient. 
What has this to do with any pressing problem facing the university
(such as research access, research impact, or the serials crisis)?

> The idea for institutional repositories arose out of the need to
> archive the increasing amount of data researchers now store on their
> hard drives or display on their web sites. The data in the repository
> are indexed with meta-tags that allow a variety of search strategies,
> and the repository software provides the framework for checking data
> in, storing it, and retrieving it via a web interface. A repository
> can also serve as a preprint server, where researchers can solicit
> comments on unpublished work.

But what does this research data-archiving -- an excellent idea and a
subset of RES -- have to do with EPUB? And why are unrefereed preprints
(an excellent and welcome bonus) singled out for self-archiving when it
is peer-reviewed, published postprints to which access is most urgently
needed?

> An important development in the creation of repositories came last
> fall with the launch of DSpace, a repository software platform
> developed at the Massachusetts Institute of Technology (MIT) in
> collaboration with Hewlett-Packard. The DSpace software can be
> downloaded for free, and about 3400 individuals and institutions
> have now done so.

And so can a lot of other software, as indicated earlier in this
discussion thread. But what universities need now is not more software
but a much clearer idea of what to do with it, and why!

> A consortium of universities, called the DSpace Federation,
> is beta-testing the software. The Federation includes Columbia
> University, Cornell University, Ohio State University, University
> of Rochester, University of Washington, University of Toronto,
> and Cambridge University.

Meanwhile, at least 72 universities are already running eprint archives,
some for as long as 2 years: So what? The
archives need filling. And to understand why they need filling, and with
what they need filling, I. - V. have to be separated and each dealt with
in its own right, on its own agenda. Conflating the five just keeps
everything at the beta-testing stage!

> The DSpace software contains no rules on who can enter data, what
> kinds of data can be accepted, or who can access them. Instead,
> the DSpace users set up "communities" and establish their own 
> of use.

What the university community needs is a clear idea of what these
archives are for, and how to go about filling them. I may be wrong, but
at this moment the rationale and urgency for RES (I), the self-archiving
of research output, pre- and post-peer-review, seems to vastly outweigh
that of the other four. But, more important, RES is so distinct from the
other four that it would almost be better if we did not think of all
five as just different "superarchive" functions, but as independent
university functions in their own right. And I don't know about the
other four, but I am pretty sure that RES is better served
by having a lot of OAI-interoperable departmental archives rather than one
university monster-archive (especially if the central superarchive would
conflate I - V!): Isn't that sort of integrable distribution of the
burden part of the rationale for OAI interoperability?
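
To make the distributed picture concrete: a harvester that collects records
from several OAI-compliant departmental archives can merge them into one
searchable index, so that no single central archive has to hold everything.
What follows is only a minimal sketch under stated assumptions. The Record
type, the merge_harvests function, the archive names and the identifiers
are all hypothetical, and no real harvesting over the network is done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str  # OAI-style identifier (hypothetical values below)
    title: str
    archive: str     # name of the departmental archive that exposed it


def merge_harvests(harvests):
    """Merge per-archive record lists into one index keyed by identifier.

    If the same identifier turns up in more than one archive, the first
    copy seen is kept and later copies are ignored.
    """
    index = {}
    for records in harvests:
        for rec in records:
            index.setdefault(rec.identifier, rec)
    return index


# Two hypothetical departmental archives, one record exposed by both.
physics = [Record("oai:physics.example.edu:1", "Paper A", "physics")]
cogsci = [
    Record("oai:cogsci.example.edu:7", "Paper B", "cogsci"),
    Record("oai:physics.example.edu:1", "Paper A", "cogsci"),
]

merged = merge_harvests([physics, cogsci])
```

Each department carries only the burden of its own output; the merged
index is what a would-be user searches.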

> One federation member that plans to use DSpace to further its goal of
> providing free access to peer-reviewed content is Cornell University.
> Among the reasons for doing this is the feeling that the existing
> publishing model isn't serving universities well, said J. Robert
> Cooke, professor of agricultural and biological engineering and dean
> of the faculty at Cornell. "Long ago we outsourced publishing to
> [commercial] publishers," said Cooke. "Now we need to take it 

So (to put it graphically): Is Cornell University planning to make its
Science and Nature publications open-access by self-archiving them (RES),
or is it planning to create Cornell House-Journals to publish them in
instead (EPUB), rather than "outsourcing" them to the established
peer-reviewed journals?

> Repositories can serve as a bargaining chip for universities in
> the debate over the future of scholarly publishing, believes Hal
> Abelson, MIT Class of 1922 professor of computer science. "We [the
> universities] have something to bring to the table," said Abelson.

Fine, but what, exactly, are we bargaining about? Open access to our own
peer-reviewed research output? But we can already have that by
self-archiving it in our eprint archives (RES)! What has this to do with
universities trying to get more involved in electronic publication
(EPUB)?

Or does Hal Abelson mean universities should pressure publishers to
make sure they have updated their copyright agreements to formally
support self-archiving? That is a good idea, but there is considerable
momentum there already, with 55% of publishers already formally
supporting self-archiving, and most of the others agreeing if asked on an
individual basis.
By that token, the RES archives should be at least 55% full already!

But I agree that universities have leverage here -- although it has little
to do with EPUB: It is because *authors* want and need maximal research
impact that publishers have little choice but to support self-archiving,
not because universities threaten to become journal-publishers [EPUB].

> But Harold Varmus, president and chief executive officer of Memorial
> Sloan-Kettering Cancer Center in New York City and cofounder of the
> Public Library of Science -- which later this year plans to publish two
> new open-access biomedical journals -- is skeptical about the idea that
> repositories themselves will help to bring about change. He emphasized
> that journals, not repositories, are the primary record of science.
> "They [repositories] are not going to replace the idea of having an
> investigator write up results," said Varmus.

And Hal Varmus is of course right. Self-archived, unrefereed preprints
in one's university eprint archive are merely vanity-press until/unless
they are submitted for independent, expert peer-review by a peer-review
service-provider with established quality-standards that would-be users
of those findings can rely upon. Such a service has to be 
and it happens to be performed at the moment by 20,000 peer-reviewed
journals, with their own established expertise, quality-standards and
known track-records.

The problem is not "repatriating" that peer-review service. It has to
continue to be an autonomous, 3rd-party service. The problem is access
to its *outcome*: The refereed final drafts. Self-archiving solves that
problem, not by providing a substitute for journals but by supplementing
access to their full-text contents (toll-free).

Hal Varmus himself conflated EPUB and RES somewhat in the original
version of his otherwise splendid and timely EBiomed proposal, but it
is clear that this has since been thought through and sorted out.

> Repositories won't make journals go away, agreed Rick Johnson,
> enterprise director at the Scholarly Publishing and Academic Resources
> Coalition (SPARC), a group that advocates an open model of scientific
> publishing. But, said Johnson, "They begin a process of change that
> will bring about emergence of different business models that support
> science communication."

Self-archiving (RES) provides open access, immediately. That's what's
urgently needed by the research community. New business models for
refereed journal publishing may follow, but what is needed *now* is
open access.

> Johnson thinks the availability of preprints, data sets, and images
> will spur communication and feedback among fellow scientists. "People
> will say, 'Gee, my research is hidden behind toll gates today. If
> it was not hidden, imagine what kind of impact it could have.'"

One can hardly disagree, now that SPARC is beginning to come round to
that sensible view! (It is not that long since SPARC's only visible goal
was lower journal prices!)

But it is not just, or even primarily, about (unrefereed) preprints,
data sets and images. It is about toll-free access to *refereed
research.* SPARC needs to be much clearer on that, otherwise they too are
contributing to the gridlock that comes from conflating I. - V.

> At the very least, these superarchives will draw universities into
> the ongoing debate over who should be the gatekeeper of scientific
> information. But Pieter Bolman, vice president and director of
> science, technology, and medical relations for Elsevier Science is
> bullish about the continuing importance of subscription journals. He
> said that although scientists may no longer need journals for
> peer-review -- as they can set up their own systems for reviewing
> papers -- they will continue to seek publication in the journals with
> the best reputation.

I will bet a good deal of money that Pieter Bolman did *not* say anything
as patently nonsensical as that! (Pieter?) This was a journalist's
own contribution to the confusion with which this simple domain is so
rife! Pieter is fully aware that "gate-keeping" has to be outsourced,
and that it is its track-record for gate-keeping that gives a journal
its reputation, not merely its name.

But this absurd picture of universities serving as their own
gate-keepers (EPUB?) along with the idea that this will
co-exist with journals subscribed to purely for their names is
just one facet of the incoherent chimera -- like a 5-dimensional
Escher-drawing -- that comes from conflating I. - V.! It's time
to de-conflate.

> One issue that the emergence of repositories brings to the fore is
> that of copyright. Most scholarly journals acquire copyright from the
> author and grant certain rights in return. The exact terms of this
> agreement vary widely, said Jane Ginsburg, an expert in copyright
> law at Columbia Law School in New York.

Indeed. But the only *relevant* term insofar as the refereed research
literature is concerned is whether or not they allow self-archiving --
and, regarding *that*, journals are quickly, sensibly, and responsibly
converging on the optimal and inevitable outcome:

> Many journals grant authors the right to post the article on a
> personal or university web site. However, "It is one thing if a
> bunch of individual professors put papers on their web sites, but
> it might be another matter if a university does it," said Ginsburg.

No, in the age of OAI-interoperability it does not matter in the
slightest whether it is individual professors or their universities who
self-archive their papers -- as long as the site is OAI-compliant. But
where universities and even governmental research-funding agencies can
help is in extending their existing "publish or perish" carrot/stick
to "publish and self-archive" (for maximal research impact):

> Mary Waltham, a former publisher of the Nature journals and
> now a consultant for the publishing industry, can see that
> happening. "Search tools are becoming better, and my own personal
> view is that at some point, one will be able to search the Internet
> and find copies of these articles in repositories," said Waltham.

Yes, but it is not search tools that will make that day come, but a
systematic institutional policy of self-archiving those articles in those
institutional repositories! To sort that out, II-V have to be
disentangled from the all important I (RES).


Stevan Harnad

[BOAI] Re: Eprints, DSpace or ESpace?

From: Stevan Harnad <harnad AT>
Date: Thu, 31 Jul 2003 11:56:28 +0100 (BST)

On Thu, 31 Jul 2003, Shirley Sullivan wrote:

> We have an repository already established for documents, but 
> interested to see whether it is possible to use the software for metadata 
> records only. or Dspace for this purpose. It appears to be essential for 
> both and dspace to actually load "documents" - it 
> work for records only. Is this the case, or is to "operator error" here,
> does anyone know? We would like to create an OAI compliant catalogue 
> containing metadata records for objects, but not load the objects 
> themselves. Any advice gratefully received.

Yes, EPrints archives (and, for that matter, DSpace archives)
can be configured so they archive only the metadata and not the
full-text (i.e., with null full-text). Both software packages can be used
for many purposes, but the EPrints software was expressly created to
promote open access to full text, and not merely metadata archiving and
interoperability. (The OAI protocol itself was originally created in the
service of open access to full text, but, once the protocol's importance
and potential power became apparent, the OAI was extended to digital
archiving and interoperability in general, not just to open-access
to full-text.)
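
A metadata-only record of the kind asked about can be pictured as an OAI
Dublin Core (oai_dc) description that carries no link to a full text. The
sketch below is only an illustration under stated assumptions: the helper
make_record, its parameters and the sample values are hypothetical, and it
says nothing about how either software package is actually configured.
Only the two XML namespaces are the standard oai_dc and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI Dublin Core metadata.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def make_record(title, creator, date, fulltext_url=None):
    """Build an oai_dc element (hypothetical helper).

    When fulltext_url is None the record is metadata-only: no
    dc:identifier pointing at a document is emitted.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for tag, value in (("title", title), ("creator", creator), ("date", date)):
        ET.SubElement(root, f"{{{DC}}}{tag}").text = value
    if fulltext_url is not None:
        ET.SubElement(root, f"{{{DC}}}identifier").text = fulltext_url
    return root


# A catalogue entry with metadata only (all values are made up).
record = make_record("A hypothetical catalogue entry", "Example, A.", "2003-07-31")

# The same entry with a full text attached (hypothetical URL).
with_text = make_record(
    "A hypothetical catalogue entry",
    "Example, A.",
    "2003-07-31",
    fulltext_url="http://archive.example.edu/1/paper.pdf",
)

xml_text = ET.tostring(record, encoding="unicode")
```

The two records differ only in the presence of the dc:identifier element,
which is the difference between a catalogue entry and an open-access paper.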

Here are the 5 major current categories of uses for institutional
OAI archives. Both software packages (and others) can be used for all 5
purposes, but the EPrints software, to repeat, is dedicated
specifically to number *5 (self-archiving of full-text of refereed
research). We are very anxious to avoid diffusing or slowing the movement
toward the urgent and reachable goal of open access by diverting
the software toward 1-4. 1-4 are implementable, yes, but *5
is the paramount concern; 1-4 are merely distractions from *5 at this time:

The 5 uses for Institutional Digital Archiving Software

1. Institutional Digital Collection Management (both institutional
   output, and bought-in content)
2. Institutional Digital Content Preservation (both institutional
   output and bought-in content)
3. Institutional Digital Courseware
4. Institutional Digital Publishing (e-journals, e-books)

*5. Institutional Self-Archiving of Research Output (pre- and post-
    peer-reviewed publication)*

The purpose of *5 is to maximize institutional research impact:

While we're on the subject, a variant and cousin of 4, namely, open-access
publishing (BOAI-2) has lately been getting so much attention that
the misleading impression has grown that open-access publishing is either
the only form, or the main form, of open access! Here are the facts,
in context:

(1) Most of the refereed research literature is still not open-access.

(2) Of the small but growing portion of the refereed research literature
that is open-access already, by far the largest proportion of that is
open-access via self-archiving (BOAI-1, Archiving-5, above) rather than
via open-access publishing (BOAI-2, related to Archiving-4, above).

(3) Of the small but growing portion of the refereed research literature
that is open-access already, by far the fastest-growing proportion of that is
open-access via self-archiving (BOAI-1, Archiving-5, above) rather than
via open-access publishing (BOAI-2, related to Archiving-4, above).

The two open-access strategies (BOAI-1 and BOAI-2) are complementary,
but it is important for researchers and their institutions to have the
optimal joint strategy clearly in mind:

    If there exists a suitable open-access journal for you to publish
    your research in (about 5% currently), publish it there! If not
    (95%), then publish it in a toll-access journal and self-archive
    the full text (pre- and post-peer-review) in your institutional
    eprint archive. 

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open
access to the peer-reviewed research literature online is available at
the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03):

Discussion can be posted to: september98-forum AT 
