Budapest Open Access Initiative      

Budapest Open Access Initiative: BOAI Forum Archive

[BOAI] Re: Cliff Lynch on Institutional Archives

From: Hussein Suleman <hussein AT>
Date: Thu, 27 Mar 2003 09:17:52 +0200


this may be stating the obvious, but why not use sets for the separate 
disciplines, aimed at particular service providers? i say it that way 
because some disciplines are not well-defined (namely, computer science) 
so such archives may want to play ball with multiple service providers 
and hence may need different sets.
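
as a rough sketch of what selective harvesting by set could look like on the service provider side: the OAI-PMH `ListRecords` verb takes an optional `set` argument alongside `metadataPrefix`. the base URL and the set name `physics` below are made up for illustration; a real archive advertises its own sets through `ListSets`.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # selective harvesting: only records the archive has placed in this set
        params["set"] = set_spec
    if from_date:
        # incremental harvesting: only records changed since this date
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# hypothetical institutional archive exposing a "physics" set
url = list_records_url("http://archive.example.org/oai", set_spec="physics")
```

an archive that wants to play ball with several service providers would simply expose several such sets, and each provider would pass a different `set` value.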

in any event, for something like physics, a simple set might do the 
trick at the source. then, somewhat in keeping with the Kepler model (as
published in D-Lib a while back), the service provider can provide an
interface for potential data providers to self-register. i know this 
sounds dodgy, but think of it as an alternative mechanism for 
contribution. either individual users submit individual papers or groups 
submit baseURLs - both go through some kind of review and while one
leads to once-off storage, the other leads to periodic harvesting.
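
a minimal sketch of the self-registration idea, assuming a hypothetical in-memory registry (none of these names come from Kepler or any existing service provider): submitted baseURLs wait for review, and only approved ones are picked up once their harvesting interval has elapsed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Registration:
    """One self-registered data provider (hypothetical structure)."""
    base_url: str
    set_spec: Optional[str] = None
    approved: bool = False               # flipped to True after review
    last_harvest: Optional[datetime] = None


def due_for_harvest(registry: List[Registration], now: datetime,
                    interval: timedelta = timedelta(days=1)) -> List[Registration]:
    """Return approved registrations that were never harvested or are overdue."""
    due = []
    for reg in registry:
        if not reg.approved:
            continue  # still waiting for review
        if reg.last_harvest is None or now - reg.last_harvest >= interval:
            due.append(reg)
    return due
```

a single paper submission would skip the scheduling entirely; only the baseURL route needs the `last_harvest` bookkeeping.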

what remains a difficult problem, however, is how to recreate the 
metadata used by the service provider as its native format. so, for a 
typical example, if arXiv classifies items using a specific set 
structure, this is certainly not going to be the default for an 
institutional archive. does the service provider automatically or 
manually reclassify? or does it not allow browsing by categories? in 
either event, the quality of the metadata from the perspective of the 
service provider may be an impetus for potential users to want to 
replicate their effort rather than rely on the automated submission from 
their own institutions ... this needs more thought ...
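
one way to picture the reclassification step, under the assumption that the service provider keeps a hand-built table from each archive's setSpecs to its own categories (the table entries and identifiers below are invented for illustration): anything without a mapping is set aside for manual review rather than guessed.

```python
def reclassify(records, mapping):
    """Sort harvested records into the service provider's native categories.

    records: iterable of (identifier, source_set_spec) pairs
    mapping: dict from a source setSpec to a native category name
    Returns (classified, needs_review).
    """
    classified = {}
    needs_review = []
    for identifier, source_set in records:
        category = mapping.get(source_set)
        if category is None:
            # no automatic mapping known: leave it for a human
            needs_review.append(identifier)
        else:
            classified.setdefault(category, []).append(identifier)
    return classified, needs_review


# hypothetical mapping from one institution's sets to arXiv-style categories
MAPPING = {"dept:physics:hep": "hep-th", "dept:physics:astro": "astro-ph"}

classified, needs_review = reclassify(
    [("oai:example:1", "dept:physics:hep"),
     ("oai:example:2", "dept:compsci"),
     ("oai:example:3", "dept:physics:astro")],
    MAPPING,
)
```

the size of `needs_review` is a crude measure of how much manual effort the service provider is taking on for a given institutional archive.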


Christopher Gutteridge wrote:
> Disciplinary/subject archives vs. Institutional/Organisation/Region based
> archives. This is going to be a key challenge now open archives begin
> to gain momentum. 
> For example; we are planning a University-wide eprints archive. I am 
> concerned that some physicists will want to place their items in both
> the university eprints service AND the arXiv physics archive. They may 
> be required to use the university service, but want to use arXiv as it
> is the primary source for their discipline. This is a duplication of 
> effort and a potential irritation.
> Ultimately, of course, I'd hope that disciplinary archives will be replaced
> with subject-specific OAI service providers harvesting from the 
> archives. But there is going to be a very long transition period in which
> the solution evolves from our experience.
> What I'm asking is; has anyone given consideration to ways of smoothing
> over this duplication of effort? Possibly some negotiated automated [process]
> for institutional archives uploading to the subject archive, or at least
> assisting the author in the process.
> This isn't the biggest issue, but it'd be good to address it before it
> becomes more of a problem.
>   Christopher Gutteridge
>   GNU EPrints Head Developer
> On Sun, Mar 16, 2003 at 02:15:56 +0000, Stevan Harnad wrote:
>>On Sat, 15 Mar 2003, Thomas Krichel wrote:
>>>  Stevan Harnad writes:
>>>sh> There is no need -- in the age of OAI-interoperability -- for
>>>sh> institutional archives to "feed" central disciplinary archives:
>>>  I do not share what I see as a  blind faith in interoperability
>>>  through a technical protocol. 
>>I am quite happy to defer to the technical OAI experts on this one, but let
>>us put the question precisely:
>>Thomas Krichel suggests that institutional (OAI) data-archives
>>(full-texts) should "feed" disciplinary (OAI) data-archives,
>>because OAI-interoperability is somehow not enough. I suggest that
>>OAI-interoperability (if I understand it correctly) should be enough. No
>>harm in redundant archiving, of course, for backup and security, but not
>>necessary for the usage and functionality itself. In fact, if I understand
>>correctly the intent of the OAI distinction between OAI data-providers
>>and OAI service-providers,
>>it is not the full-texts of data-archives that need to be "fed" to
>>(i.e., harvested by) the OAI service providers, but only their metadata.
>>Hence my conclusion that distributed, interoperable OAI institutional
>>archives are enough (and the fastest route to open-access). No need
>>to harvest their contents into central OAI discipline-based archives
>>(except perhaps for redundancy, as backup). Their OAI interoperability
>>should be enough so that the OAI service-providers can (among other things)
>>do the "virtual aggregation" by discipline (or any other
>>criterion) by harvesting the metadata alone, without the need to harvest the
>>full-text data-contents too.
>>It should be noted, though, that Thomas Krichel's excellent RePEc
>>archive and service in Economics goes
>>well beyond the confines of OAI-harvesting! RePEc harvests non-OAI
>>content too, along lines similar to the way ResearchIndex/citeseer
>>harvests non-OAI content in computer
>>science. What I said about there being no need to "feed" institutional OAI
>>archive content into disciplinary OAI archives certainly does not apply
>>to *non-OAI* content, which would otherwise be scattered willy-nilly
>>all over the net and not integrated in any way. Here RePEc's and
>>ResearchIndex's harvesting is invaluable, especially as RePEc already
>>does (and ResearchIndex has announced that it plans to) make all its
>>harvested content OAI-compliant!
>>To summarize: The goal is to get all research papers, pre- and
>>post-peer-review, openly accessible (and OAI-interoperable) as soon as
>>possible. (These are BOAI Strategies 1 [self-archiving] and 2
>>[open-access journals].) In principle this can be done by (1)
>>self-archiving them in central
>>OAI disciplinary archives like the Physics arXiv (the biggest and
>>first of its kind), by (2) self-archiving them in distributed
>>institutional OAI Archives, by (3)
>>self-archiving them on arbitrary Web and FTP sites (and hoping they
>>will be found or harvested by services like RePEc or ResearchIndex)
>>or by (4) publishing them in open-access journals (BOAI Strategy 2).
>>My point was only that because researchers and their institutions
>>(*not* their disciplines) have shared interests vested in maximizing
>>their joint research impact and its rewards, institution-based
>>self-archiving (2) is a more promising way to go -- in the age of
>>OAI-interoperability -- than discipline-based self-archiving (1), even
>>though the latter began earlier. It is also obvious that both (1) and
>>(2) are preferable to arbitrary Web and FTP self-archiving (3), which
>>began even earlier (although harvesting arbitrary Website and FTP contents
>>into OAI-compliant Archives is still a welcome makeshift strategy
>>until the practice of OAI self-archiving is up to speed). Creating new
>>open-access journals and converting the established (20,000) 
>>journals to open-access is desirable too, but it is obviously a much
>>slower and more complicated path to open access than self-archiving,
>>so should be pursued in parallel.
>>My conclusion in favor of institutional self-archiving is based on the
>>evidence and on logic, and it represents a change of thinking,
>>for I had originally advocated (3) Web/FTP self-archiving, then switched
>>to central self-archiving (1), even creating a discipline-based
>>archive. But with the advent of OAI in 1999,
>>plus a little reflection, it became apparent that
>>institutional self-archiving (2) was the fastest, most direct, and most
>>natural road to open access.
>>And since then its accumulating momentum seems to be confirming that this
>>is indeed so.
>>>  The primary sense of belonging
>>>  of a scholar in her research activities is with the disciplinary
>>>  community of which she thinks herself a part... It certainly
>>>  is not with the institution. 
>>That may or may not be the case, but in any case it is irrelevant to
>>the question of which is the more promising route to open-access. Our
>>primary sense of belonging may be with our family, our community,
>>our creed, our tribe, or even our species. But our rewards (research
>>grant funding and overheads, salaries, postdocs and students attracted
>>to our research, prizes and honors) are intertwined and shared with our
>>institutions (our employers) and not our disciplines (which are often
>>in fact the locus of competition for those same rewards!)
>>>  Therefore, if you want to fill
>>>  institutional archives---which I agree is the best long-run way
>>>  to enhance access and preservation to scholarly research--- [the]
>>>  institutional archive has to be accompanied by a discipline-based
>>>  aggregation process. 
>>But the question is whether this "aggregation" needs to be the "feeding"
>>of institutional OAI archive contents into disciplinary OAI archives, or
>>merely the "feeding" of OAI metadata into OAI services.
>>>   The RePEc project has produced such an aggregator
>>>  for economics for a while now. I am sure that other, similar
>>>  projects will follow the same aims, but, with the benefit of
>>>  hindsight, offer superior service. The lack of such services
>>>  in many disciplines, or the lack of interoperability between
>>>  disciplinary and institutional archives, are major obstacles to
>>>  filling the institutional archives. There are no
>>>  inherent contradictions between institution-based archives
>>>  and disciplinary aggregators,
>>There is no contradiction. In fact, I suspect this will prove to be a
>>non-issue, once we confirm that (a) we agree on the need for
>>OAI-compliance and (b) "aggregation" amounts to metadata-harvesting and
>>OAI service-provision when the full-texts in the institutional
>>archive are OAI-compliant (and calls for full-text harvesting only
>>if/when they are not). Content "aggregation," in other words, is a
>>paper-based notion. In the online era, it merely means digital sorting
>>of the pointers to the content.
>>>  In the paper that Stevan refers to, Cliff Lynch writes:
>>>cl> But consider the plight of a faculty member seeking only
>>>cl> dissemination and availability of his or her traditional
>>>cl> articles, book chapters, or perhaps even monographs through use of
>>>cl> the network, working in parallel with the traditional
>>>cl> publishing system.
>>>  I am afraid there are more and more such faculty members. Many
>>>  of the research papers found over the Internet are deposited
>>>  in this way. This trend is growing, not declining.
>>You mean self-archiving in arbitrary non-OAI author websites? There is
>>another reason why institutional OAI archives and official 
>>self-archiving policies (and assistance) are so important. In reality,
>>it is far easier to deposit and maintain one's papers in institutional
>>OAI archives like Eprints than to set up and maintain one's own website.
>>All that is needed is a clear official institutional policy, plus
>>some startup help in launching it. (No such thing is possible at a
>>"discipline" level.)
>>>cl> Such a faculty member faces several time-consuming problems. He or
>>>cl> she must exercise stewardship over the actual content and
>>>cl> metadata: migrating the content to new formats as they evolve over
>>>cl> time, creating metadata describing the content, and ensuring
>>>cl> metadata is available in the appropriate schemas and formats
>>>cl> through appropriate protocol interfaces such as open [archives]
>>>cl> metadata harvesting.
>>>  Sure, but academics do not like their work-, and certainly
>>>  not their publishing-habits, [to] be interfered with by external
>>>  forces. Organizing academics is like herding cats!
>>I am sure academics didn't like to be herded into publishing with the
>>threat of perishing either. Nor did they like switching from paper to
>>word-processors. Their early counterparts probably clung to the oral
>>tradition, resisting writing too; and monks did not like to be herded from
>>their peaceful manuscript-illumination chambers to the clamour of
>>printing presses. But where there is a causal contingency -- as there is
>>between (a) the research impact and its rewards, which academics like as
>>much as anyone else, and (b) the accessibility of their research -- they
>>are surely no less responsive than Prof. Skinner's pigeons and rats to
>>those causal contingencies, and which buttons they will have to press
>>in order to maximize their rewards!
>>Besides, it is not *publishing* habits that need to be changed, but
>>*archiving* habits, which are an online supplement, not a substitute,
>>for existing (and unchanged) publishing habits.
>>>cl> Faculty are typically best at creating new
>>>cl> knowledge, not maintaining the record of this process of
>>>cl> creation. Worse still, this faculty member must not only manage
>>>cl> content but must manage a dissemination system such as a personal Web
>>>cl> site, playing the role of system administrator (or the manager of
>>>cl> someone serving as a system administrator).
>>>  There are a lot of ways in which to maintain a web site or to get
>>>  access to a maintained one. It is a customary activity these days that
>>>  no longer requires much technical expertise. A primitive [indexing]
>>>  of the contents can be done by Google; it requires no metadata.
>>>  Academics don't care about long-run preservation, so that
>>>  remains unsolved. In the meantime, the academic who uploads papers to a web
>>>  site takes steps to resolve the most pressing problem, access.
>>Agreed. And uploading it into a departmental OAI Eprints Archive is
>>by far the simplest and most effective way to do all of that. All one
>>needs is a policy to mandate it.
>>>cl> Over the past few years, this has ceased to be a reasonable [activity]
>>>cl> for most amateurs; software complexity, security risks,
>>>cl> requirements, and other problems have generally relegated
>>>cl> operation of Web sites to professionals who can exploit economies of
>>>cl> scale, and who can begin each day with a review of recently [issued]
>>>cl> security patches.
>>>  These are technical concerns. When you operate a Linux box
>>>  on the web you simply fire up a script that will download
>>>  the latest version. That is easy enough. Most departments
>>>  have separate web operations. Arguing for one institutional
>>>  archive for digital contents is akin to calling for a single web
>>>  site for an institution. The diseconomies of scale of central
>>>  administration impose other types of costs than the ones that it was to
>>>  reduce. The secret is to find a middle way.
>>I couldn't quite follow all of this. The bottom line is this: The free
>>Eprints software (for example) can be installed within a few days. It
>>can then be replicated to handle all the departmental or research group
>>archives a university wants, with minimal maintenance time or costs. The
>>rest is just down to self-archiving, which takes a few minutes for the
>>first paper, and even less time for subsequent papers (as the repeating
>>metadata -- author, institution, etc., can be "cloned" into each new
>>deposit template). An institution may wish to impose an institutional
>>"look" on all of its separate eprints archives; but apart from that,
>>they can be as autonomous and as distributed and as many as desired:
>>OAI-interoperability works locally just as well as it does globally.
>>>cl> Today, our faculty time is being wasted, and expended
>>>cl> on system administration activities and content curation. [And]
>>>cl> because system administration is ineffective, it places our
>>>cl> institutions at risk: because faculty are generally not capable of
>>>cl> responding to the endless series of security exposures and [patches],
>>>cl> our university networks are riddled with vulnerable faculty [machines]
>>>cl> intended to serve as points of distribution for scholarly [works].
>>>  This is the fight many faculty face every day, where they
>>>  want to innovate scholarly communication, but someone
>>>  in the IT department does not give the necessary permission
>>>  for network access...
>>I don't think I need to get into this. It's not specific to
>>self-archiving, and a tempest in a teapot as far as that is concerned. An
>>efficient system can and will be worked out once there is an effective
>>institutional self-archiving policy. There are already plenty of
>>examples, such as CalTech.
>>Stevan Harnad

hussein suleman ~ hussein AT ~
