Budapest Open Access Initiative: BOAI Forum Archive

Re: [BOAI] Re: Cliff Lynch on Institutional Archives

From: Christopher Gutteridge <cjg AT>
Date: Thu, 27 Mar 2003 15:59:58 +0000


I agree! Most archives currently existing have, so far as I can tell,
created sets based on their own subject schemes.

Given that sets are *not* part of the metadata, but a way to harvest
a subset of the records, it makes sense to create sets which conform
to the requirements of a service provider.

For example, our archive contains 8000 records, but only 800 of those
have actual documents available online. Some OAI services only want to
deal with records which are available online, so the 800 records are
available as an OAI set.
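
As a rough sketch of how a service provider could ask for only such a
subset: OAI-PMH selective harvesting passes a "set" argument on a
ListRecords request. The base URL and the set name "fulltext" below are
invented for illustration and are not the real values of any archive.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    If set_spec is given, the request is restricted to that set
    (selective harvesting); otherwise the whole archive is requested.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical base URL and set name, for illustration only.
print(list_records_url("https://eprints.example.org/oai", set_spec="fulltext"))
```

A harvester that wants everything simply leaves out the set argument.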

Keeping it simple would be good, for example if harvesters could describe
their scope in terms of popular classification schemes: Dewey, LoC, etc.
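
One way this could look, purely as a sketch: the harvester publishes its
scope as classification-code prefixes per scheme, and picks out the sets
whose names fall under them. The "scheme:code" naming of sets (e.g.
"ddc:530") and the scope format are assumptions made up for this example;
OAI-PMH itself does not prescribe either.

```python
def matching_sets(set_specs, scope):
    """Return the setSpecs that fall inside a harvester's declared scope.

    set_specs: set names offered by an archive, assumed here to look
               like "scheme:code" (a hypothetical naming convention).
    scope:     dict mapping a scheme name to a list of code prefixes
               the harvester is interested in.
    """
    selected = []
    for spec in set_specs:
        scheme, _, code = spec.partition(":")
        wanted_prefixes = scope.get(scheme, [])
        if any(code.startswith(prefix) for prefix in wanted_prefixes):
            selected.append(spec)
    return selected


# A physics-oriented harvester: Dewey 53x and Library of Congress class QC.
physics_scope = {"ddc": ["53"], "lcc": ["QC"]}
offered = ["ddc:530", "ddc:004", "lcc:QC", "fulltext"]
print(matching_sets(offered, physics_scope))
```

Sets that do not follow the assumed naming, such as "fulltext" here, are
simply ignored by the matcher.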

Although there is an argument for making the service provider/
harvester do all the work, as anything which makes it harder to
set up an OAI archive is a Bad Thing.

On Thu, Mar 27, 2003 at 09:17:52 +0200, Hussein Suleman wrote:
> hi
> this may be stating the obvious, but why not use sets for the separate 
> disciplines, aimed at particular service providers? i say it that way 
> because some disciplines are not well-defined (namely, computer science) 
> so such archives may want to play ball with multiple service providers 
> and hence may need different sets.
> in any event, for something like physics, a simple set might do the 
> trick at the source. then, somewhat in keeping with the Kepler model (as 
> published in DLib a while back), the service provider can provide an 
> interface for potential data providers to self-register. i know this 
> sounds dodgy, but think of it as an alternative mechanism for 
> contribution. either individual users submit individual papers or groups 
> submit baseURLS - both go through some kind of review and while one 
> leads to once-off storage, the other leads to periodic harvesting.
> what remains a difficult problem, however, is how to recreate the 
> metadata used by the service provider as its native format. so, for a 
> typical example, if arXiv classifies items using a specific set 
> structure, this is certainly not going to be the default for an 
> institutional archive. does the service provider automatically or 
> manually reclassify? or does it not allow browsing by categories? in 
> either event, the quality of the metadata from the perspective of the 
> service provider may be an impetus for potential users to want to 
> replicate their effort rather than rely on the automated submission from 
> their own institutions ... this needs more thought ...
> ttfn,
> ----hussein
> Christopher Gutteridge wrote:
> >Disciplinary/subject archives vs. Institutional/Organisation/Region 
> >archives. This is going to be a key challenge now open archives begin
> >to gain momentum. 
> >
> >For example; we are planning a University-wide eprints archive. I am 
> >concerned that some physicists will want to place their items in both
> >the university eprints service AND the arXiv physics archive. They may 
> >be required to use the university service, but want to use arXiv as it
> >is the primary source for their discipline. This is a duplication of 
> >effort and a potential irritation.
> >
> >Ultimately, of course, I'd hope that disciplinary archives will be 
> >with subject-specific OAI service providers harvesting from the 
> >institutional archives. But there is going to be a very long transition period in 
> >the solution evolves from our experience.
> >
> >What I'm asking is; has anyone given consideration to ways of 
> >over this duplication of effort? Possibly some negotiated automated 
> >for institutional archives uploading to the subject archive, or at 
> >assisting the author in the process.
> >
> >This isn't the biggest issue, but it'd be good to address it before it
> >becomes more of a problem.
> >
> >  Christopher Gutteridge
> >  GNU EPrints Head Developer
> >
> >
> >On Sun, Mar 16, 2003 at 02:15:56 +0000, Stevan Harnad wrote:
> >
> >>On Sat, 15 Mar 2003, Thomas Krichel wrote:
> >>
> >>
> >>> Stevan Harnad writes:
> >>>
> >>>sh> There is no need -- in the age of OAI-interoperability -- for
> >>>sh> institutional archives to "feed" central disciplinary archives:
> >>>
> >>> I do not share what I see as a  blind faith in 
> >>> through a technical protocol. 
> >>
> >>I am quite happy to defer to the technical OAI experts on this one,
> >>but let us put the question precisely: 
> >>
> >>Thomas Krichel suggests that institutional (OAI) data-archives
> >>(full-texts) should "feed" disciplinary (OAI) 
> >>because OAI-interoperability is somehow not enough. I suggest that
> >>OAI-interoperability (if I understand it correctly) should be enough. No
> >>harm in redundant archiving, of course, for backup and security, but not
> >>necessary for the usage and functionality itself. In fact, if I 
> >>correctly the intent of the OAI distinction between OAI data-providers -- 
> >> 
> >>-- and OAI service-providers --
> >> 
> >>-- it is not the full-texts of data-archives that need to be "fed" to
> >>(i.e., harvested by) the OAI service providers, but only their 
> >>
> >>Hence my conclusion that distributed, interoperable OAI 
> >>archives are enough (and the fastest route to open-access). No 
> >>to harvest their contents into central OAI discipline-based 
> >>(except perhaps for redundancy, as backup). Their OAI 
> >>should be enough so that the OAI service-providers can (among 
> >>things)
> >>do the "virtual aggregation" by discipline (or any other 
> >>criterion) by harvesting the metadata alone, without the need to 
> >>full-text data-contents too.
> >>
> >>It should be noted, though, that Thomas Krichel's excellent RePec
> >>archive and service in Economics -- -- goes
> >>well beyond the confines of OAI-harvesting! RePec harvests non-OAI
> >>content too, along lines similar to the way ResearchIndex/citeseer 
> >> -- harvests non-OAI content in 
> >>science. What I said about there being no need to "feed" institutional OAI
> >>archive content into disciplinary OAI archives certainly does not 
> >>to *non-OAI* content, which would otherwise be scattered 
> >>all over the net and not integrated in any way. Here RePec's and
> >>ResearchIndex's harvesting is invaluable, especially as RePec 
> >>does (and ResearchIndex has announced that it plans to) make all 
> >>harvested content OAI-compliant!
> >>
> >>To summarize: The goal is to get all research papers, pre- and
> >>post-peer-review, openly accessible (and OAI-interoperable) as soon as
> >>possible. (These are BOAI Strategies 1 [self-archiving] and 2
> >>[open-access journals]:
> >>). In principle this can be done by (1) self-archiving them in 
> >>OAI disciplinary archives like the Physics arXiv (the biggest and
> >>first of its kind) --
> >>-- by (2) self-archiving them in distributed institutional OAI
> >>Archives -- -- by 
> >>self-archiving them on arbitrary Web and FTP sites (and hoping 
> >>will be found or harvested by services like Repec or 
> >>or by (4) publishing them in open-access journals (BOAI Strategy 
> >> ).
> >>
> >>My point was only that because researchers and their institutions
> >>(*not* their disciplines) have shared interests vested in 
> >>their joint research impact and its rewards, institution-based
> >>self-archiving (2) is a more promising way to go -- in the age of
> >>OAI-interoperability -- than discipline-based self-archiving (1), 
> >>though the latter began earlier. It is also obvious that both (1) 
> >>(2) are preferable to arbitrary Web and FTP self-archiving (3), 
> >>began even earlier (although harvesting arbitrary Website and FTP 
> >>into OAI-compliant Archives is still a welcome makeshift strategy
> >>until the practise of OAI self-archiving is up to speed). Creating 
> >>open-access journals and converting the established (20,000) 
> >>journals to open-access is desirable too, but it is obviously a 
> >>slower and more complicated path to open access than 
> >>so should be pursued in parallel.
> >>
> >>My conclusion in favor of institutional self-archiving is based on 
> >>evidence and on logic, and it represents a change of thinking,
> >>for I had originally advocated (3) Web/FTP self-archiving --
> >> -- then switched 
> >>to central self-archiving (1), even creating a discipline-based 
> >> But with the advent of OAI in 
> >>plus a little reflection, it became apparent that
> >>institutional self-archiving (2) was the fastest, most direct, and 
> >>natural road to open access: 
> >>And since then its accumulating momentum seems to be confirming that this
> >>is indeed so:
> >>
> >>
> >>
> >>> The primary sense of belonging
> >>> of a scholar in her research activities is with the 
> >>> community of which she thinks herself a part... It certainly
> >>> is not with the institution. 
> >>
> >>That may or may not be the case, but in any case it is irrelevant 
> >>the question of which is the more promising route to open-access. 
> >>primary sense of belonging may be with our family, our community,
> >>our creed, our tribe, or even our species. But our rewards 
> >>grant funding and overheads, salaries, postdocs and students 
> >>to our research, prizes and honors) are intertwined and shared with our
> >>institutions (our employers) and not our disciplines (which are 
> >>in fact the locus of competition for those same rewards!)
> >>
> >>
> >>> Therefore, if you want to fill
> >>> institutional archives---which I agree is the best long-run 
> >>> to enhance access and preservation to scholarly research--- 
> >>> institutional archive has to be accompanied by a 
> >>> aggregation process. 
> >>
> >>But the question is whether this "aggregation" needs to be the "feeding"
> >>of institutional OAI archive contents into disciplinary OAI archives, or
> >>merely the "feeding" of OAI metadata into OAI services.
> >>
> >>
> >>>  The RePEc project has produced such an aggregator
> >>> for economics for a while now. I am sure that other, similar
> >>> projects will follow the same aims, but, with the benefit of
> >>> hindsight, offer superior service. The lack of such services
> >>> in many disciplines,  or the lack of interoperability between
> >>> disciplinary and  institutional archives, are major obstacle 
> >>> the filling  the institutional archives.  There are no
> >>> inherent contradictions between institution-based archives
> >>> and disciplinary aggregators,
> >>
> >>There is no contradiction. In fact, I suspect this will prove to be a
> >>non-issue, once we confirm that (a) we agree on the need for
> >>OAI-compliance and (b) "aggregation" amounts to metadata-harvesting and
> >>OAI service-provision when the full-texts in the institutional
> >>archive are OAI-compliant (and calls for full-text harvesting only
> >>if/when they are not). Content "aggregation," in other words, is a
> >>paper-based notion. In the online era, it merely means digital 
> >>of the pointers to the content.
> >>
> >>
> >>> In the paper that Stevan refers to, Cliff Lynch writes,
> >>> at
> >>>
> >>>cl> But consider the plight of a faculty member seeking only broader
> >>>cl> dissemination and availability of his or her traditional journal
> >>>cl> articles, book chapters, or perhaps even monographs through use of
> >>>cl> the network, working in parallel with the traditional 
> >>>cl> publishing system.
> >>>
> >>> I am afraid, there more and more such faculty members. Much
> >>> of the research papers found over the Internet are deposited
> >>> in the way. This trend is growing not declining.
> >>
> >>You mean self-archiving in arbitrary non-OAI author websites? There is
> >>another reason why institutional OAI archives and official 
> >>self-archiving policies (and assistance) are so important. In 
> >>it is far easier to deposit and maintain one's papers in 
> >>OAI archives like Eprints than to set up and maintain one's own 
> >>All that is needed is a clear official institutional policy, plus
> >>some startup help in launching it. (No such thing is possible at a
> >>"discipline" level.)
> >> 
> >> 
> >>
> >>
> >>
> >>
> >>>cl> Such a faculty member faces several time-consuming problems. He or
> >>>cl> she must exercise stewardship over the actual content and its
> >>>cl> metadata: migrating the content to new formats as they evolve over
> >>>cl> time, creating metadata describing the content, and ensuring the
> >>>cl> metadata is available in the appropriate schemas and formats and
> >>>cl> through appropriate protocol interfaces such as open 
> >>>cl> metadata harvesting.
> >>>
> >>> Sure, but academics do not like their work-, and certainly
> >>> not their publishing-habits, [to] be interfered with by 
> >>> forces. Organizing academics is like herding cats!
> >>
> >>I am sure academics didn't like to be herded into publishing with 
> >>threat of perishing either. Nor did they like switching from paper 
> >>word-processors. Their early counterparts probably clung to the 
> >>tradition, resisting writing too; and monks did not like be herded 
> >>their peaceful manuscript-illumination chambers to the clamour of
> >>printing presses. But where there is a causal contingency -- as there is
> >>between (a) the research impact and its rewards, which academics like as
> >>much as anyone else, and (b) the accessibility of their research 
> >>academics
> >>are surely no less responsive than Prof. Skinner's pigeons and rats to
> >>those causal contingencies, and which buttons they will have to 
> >>in order to maximize their rewards!
> >>
> >>
> >>Besides, it is not *publishing* habits that need to be changed, 
> >>*archiving* habits, which are an online supplement, not a 
> >>for existing (and unchanged) publishing habits.
> >>
> >>
> >>>cl> Faculty are typically best at creating new
> >>>cl> knowledge, not maintaining the record of this process 
> >>>cl> creation. Worse still, this faculty member must not only manage
> >>>cl> content but must manage a dissemination system such as a personal Web
> >>>cl> site, playing the role of system administrator (or the manager of
> >>>cl> someone serving as a system administrator).
> >>>
> >>> There are lot of ways in which to maintain a web site or to 
> >>> access to a maintained one. It is a customary activity these days and
> >>> no longer requires much technical expertise. A primitive 
> >>> of the contents can be done by Google, it requires  no 
> >>> Academics don't care  about long-run preservation, so that 
> >>> remains unsolved. In the meantime, the academic who uploads papers to a web
> >>> site takes steps to resolve the most pressing problem, 
> >>
> >>Agreed. And uploading it into a departmental OAI Eprints Archive 
> >>by far the simplest way and most effective way to do all of that. All it
> >>needs is a policy to mandate it:
> >>
> >>
> >>
> >>>cl> Over the past few years, this has ceased to be a reasonable activity
> >>>cl> for most amateurs; software complexity, security risks, 
> >>>cl> requirements, and other problems have generally relegated effective
> >>>cl> operation of Web sites to professionals who can exploit economies of
> >>>cl> scale, and who can begin each day with a review of recently issued
> >>>cl> security patches.
> >>>
> >>> These are technical concerns. When you operate a linux box
> >>> on the web you simply fire up a script that will download
> >>> the latest version. That is easy enough. Most departments
> >>> have separate web operations. Arguing for one institutional
> >>> archive for digital contents is akin to calling for a single 
> >>> site for an institution. The diseconomies of scale of central
> >>> administration impose other types of costs that the ones that it was to
> >>> reduce. The secret is to find a middle way.
> >>
> >>I couldn't quite follow all of this. The bottom line is this: The 
> >> software (for example) can be installed within a few 
days. It
> >>can then be replicated to handle all the departmental or research 
> >>archives a university wants, with minimal maintenance time or 
costs. The
> >>rest is just down to self-archiving, which takes a few minutes for 
> >>first paper, and even less time for subsequent papers (as the 
> >>metadata -- author, institution, etc., can be "cloned" into each new
> >>deposit template). An institution may wish to impose an 
> >>"look" on all of its separate eprints archives; but apart from that,
> >>they can be as autonomous and as distributed and as many as 
> >>OAI-interoperability works locally just as well as it does 
> >>
> >>
> >>>cl> Today, our faculty time is being wasted, and expended 
> >>>cl> on system administration activities and content curation. And,
> >>>cl> because system administration is ineffective, it places 
> >>>cl> institutions at risk: because faculty are generally not capable of
> >>>cl> responding to the endless series of security exposures and patches,
> >>>cl> our university networks are riddled with vulnerable faculty machines
> >>>cl> intended to serve as points of distribution for scholarly works.
> >>>
> >>> This is the fight many faculty face every day, where they
> >>> want to innovate scholarly communication, but someone
> >>> in the IT department does not give the necessary permission
> >>> for network access...
> >>
> >>I don't think I need to get into this. It's not specific to
> >>self-archiving, and a tempest in a teapot as far as that is concerned. An
> >>efficient system can and will be worked out once there is an 
> >>institutional self-archiving policy. There are already plenty of 
> >>examples, such as CalTech: 
> >> 
> >>See also:
> >>
> >>
> >>Stevan Harnad
> >
> >
> -- 
> =====================================================================
> hussein suleman ~ hussein AT ~
> =====================================================================

    Christopher Gutteridge -- cjg AT -- +44 (0)23 8059 4833

|                                   |                                      |
| Now Playing: "For You" from       | Pessimist by policy, optimist by     |
| Tracy Chapman - Tracy Chapman     | temperament -- it is possible to be  |
|                                   | both. How? By never taking an        |
|                                   | unnecessary chance and by            |
|                                   | minimizing risks you can't avoid.    |
|                                   | This permits you to play out the     |
|                                   | game happily, untroubled by the      |
|                                   | certainty of the outcome. -- From    |
|                                   | "The Notebooks of Lazarus Long" by   |
|                                   | Robert Heinlein                      |
