Budapest Open Access Initiative: BOAI Forum Archive
[BOAI] Re: Cliff Lynch on Institutional Archives
From: "Paul Cummins" <comyn AT utk.edu>
I have thought about trying to make sets for each subject entry, and then ran across the idea of a "home set" identifier that would point to the original association. But I am just beginning to work with OAI and probably need to read all the archives. :)

--Paul Cummins
UT Library, Systems

> hi
>
> this may be stating the obvious, but why not use sets for the separate
> disciplines, aimed at particular service providers? i say it that way
> because some disciplines are not well-defined (namely, computer science) so
> such archives may want to play ball with multiple service providers and
> hence may need different sets.
>
> in any event, for something like physics, a simple set might do the trick
> at the source. then, somewhat in keeping with the Kepler model (as
> published in DLib a while back), the service provider can provide an
> interface for potential data providers to self-register. i know this sounds
> dodgy, but think of it as an alternative mechanism for
> contribution. either individual users submit individual papers or groups
> submit baseURLS - both go through some kind of review and while one leads
> to once-off storage, the other leads to periodic harvesting.
>
> what remains a difficult problem, however, is how to recreate the metadata
> used by the service provider as its native format. so, for a typical
> example, if arXiv classifies items using a specific set
> structure, this is certainly not going to be the default for an
> institutional archive. does the service provider automatically or manually
> reclassify? or does it not allow browsing by categories? in either event,
> the quality of the metadata from the perspective of the service provider
> may be an impetus for potential users to want to replicate their effort
> rather than rely on the automated submission from their own institutions
> ... this needs more thought ...
>
> ttfn,
> ----hussein
>
>
> Christopher Gutteridge wrote:
>> Disciplinary/subject archives vs.
>> Institutional/Organisation/Region based
>> archives. This is going to be a key challenge now open archives begin to
>> gain momentum.
>>
>> For example, we are planning a University-wide eprints archive. I am
>> concerned that some physicists will want to place their items in both the
>> university eprints service AND the arXiv physics archive. They may be
>> required to use the university service, but want to use arXiv as it is the
>> primary source for their discipline. This is a duplication of effort and
>> a potential irritation.
>>
>> Ultimately, of course, I'd hope that disciplinary archives will be replaced
>> with subject-specific OAI service providers harvesting from the
>> institutional archives. But there is going to be a very long transition
>> period in which the solution evolves from our experience.
>>
>> What I'm asking is: has anyone given consideration to ways of smoothing
>> over this duplication of effort? Possibly some negotiated automated
>> process for institutional archives uploading to the subject archive, or at
>> least assisting the author in the process.
>>
>> This isn't the biggest issue, but it'd be good to address it before it
>> becomes more of a problem.
>>
>> Christopher Gutteridge
>> GNU EPrints Head Developer
>> http://software.eprints.org/
>>
>> On Sun, Mar 16, 2003 at 02:15:56 +0000, Stevan Harnad wrote:
>>
>>> On Sat, 15 Mar 2003, Thomas Krichel wrote:
>>>
>>>> Stevan Harnad writes:
>>>>
>>>> sh> There is no need -- in the age of OAI-interoperability -- for
>>>> sh> institutional archives to "feed" central disciplinary archives:
>>>>
>>>> I do not share what I see as a blind faith in interoperability through
>>>> a technical protocol.
>>>
>>> I am quite happy to defer to the technical OAI experts on this one, but
>>> let us put the question precisely:
>>>
>>> Thomas Krichel suggests that institutional (OAI) data-archives
>>> (full-texts) should "feed" disciplinary (OAI) data-archives,
>>> because OAI-interoperability is somehow not enough. I suggest that
>>> OAI-interoperability (if I understand it correctly) should be enough. No
>>> harm in redundant archiving, of course, for backup and security, but not
>>> necessary for the usage and functionality itself. In fact, if I understand
>>> correctly the intent of the OAI distinction between OAI data-providers --
>>> http://www.openarchives.org/Register/BrowseSites.pl
>>> -- and OAI service-providers --
>>> http://www.openarchives.org/service/listproviders.html
>>> -- it is not the full-texts of data-archives that need to be "fed" to
>>> (i.e., harvested by) the OAI service providers, but only their metadata.
>>>
>>> Hence my conclusion that distributed, interoperable OAI institutional
>>> archives are enough (and the fastest route to open-access). No need to
>>> harvest their contents into central OAI discipline-based archives (except
>>> perhaps for redundancy, as backup). Their OAI interoperability should be
>>> enough so that the OAI service-providers can (among other things) do the
>>> "virtual aggregation" by discipline (or any other computable criterion) by
>>> harvesting the metadata alone, without the need to harvest full-text
>>> data-contents too.
>>>
>>> It should be noted, though, that Thomas Krichel's excellent RePec archive
>>> and service in Economics -- http://repec.org/ -- goes
>>> well beyond the confines of OAI-harvesting! RePec harvests non-OAI content
>>> too, along lines similar to the way ResearchIndex/citeseer --
>>> http://citeseer.nj.nec.com/cs -- harvests non-OAI content in computer
>>> science.
>>> What I said about there being no need to "feed" institutional OAI
>>> archive content into disciplinary OAI archives certainly does not apply to
>>> *non-OAI* content, which would otherwise be scattered willy-nilly all over
>>> the net and not integrated in any way. Here RePec's and ResearchIndex's
>>> harvesting is invaluable, especially as RePec already does (and
>>> ResearchIndex has announced that it plans to) make all its harvested
>>> content OAI-compliant!
>>>
>>> To summarize: The goal is to get all research papers, pre- and
>>> post-peer-review, openly accessible (and OAI-interoperable) as soon as
>>> possible. (These are BOAI Strategies 1 [self-archiving] and 2
>>> [open-access journals]: http://www.soros.org/openaccess/read.shtml ). In
>>> principle this can be done by (1) self-archiving them in central OAI
>>> disciplinary archives like the Physics arXiv (the biggest and first of its
>>> kind) -- http://arxiv.org/show_monthly_submissions
>>> -- by (2) self-archiving them in distributed institutional OAI
>>> Archives -- http://www.ecs.soton.ac.uk/~harnad/Temp/tim.ppt -- by (3)
>>> self-archiving them on arbitrary Web and FTP sites (and hoping they will
>>> be found or harvested by services like Repec or ResearchIndex) or by (4)
>>> publishing them in open-access journals (BOAI Strategy 2:
>>> http://www.soros.org/openaccess/journals.shtml ).
>>>
>>> My point was only that because researchers and their institutions (*not*
>>> their disciplines) have shared interests vested in maximizing their joint
>>> research impact and its rewards, institution-based
>>> self-archiving (2) is a more promising way to go -- in the age of
>>> OAI-interoperability -- than discipline-based self-archiving (1), even
>>> though the latter began earlier.
>>> It is also obvious that both (1) and (2)
>>> are preferable to arbitrary Web and FTP self-archiving (3), which began
>>> even earlier (although harvesting arbitrary Website and FTP contents into
>>> OAI-compliant Archives is still a welcome makeshift strategy until the
>>> practice of OAI self-archiving is up to speed). Creating new open-access
>>> journals and converting the established (20,000) toll-access journals to
>>> open-access is desirable too, but it is obviously a much slower and more
>>> complicated path to open access than self-archiving, so should be pursued
>>> in parallel.
>>>
>>> My conclusion in favor of institutional self-archiving is based on the
>>> evidence and on logic, and it represents a change of thinking,
>>> for I had originally advocated (3) Web/FTP self-archiving --
>>> http://www.arl.org/scomm/subversive/toc.html -- then switched allegiance
>>> to central self-archiving (1), even creating a discipline-based archive:
>>> http://cogprints.ecs.soton.ac.uk/ But with the advent of OAI in 1999, plus
>>> a little reflection, it became apparent that
>>> institutional self-archiving (2) was the fastest, most direct, and most
>>> natural road to open access: http://www.eprints.org/
>>> And since then its accumulating momentum seems to be confirming that this
>>> is indeed so:
>>> http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2212.html
>>> http://www.ecs.soton.ac.uk/~harnad/Temp/tim.ppt
>>>
>>>
>>>> The primary sense of belonging
>>>> of a scholar in her research activities is with the disciplinary
>>>> community of which she thinks herself a part... It certainly
>>>> is not with the institution.
>>>
>>> That may or may not be the case, but in any case it is irrelevant to the
>>> question of which is the more promising route to open-access. Our primary
>>> sense of belonging may be with our family, our community, our creed, our
>>> tribe, or even our species.
>>> But our rewards (research grant funding and
>>> overheads, salaries, postdocs and students attracted to our research,
>>> prizes and honors) are intertwined and shared with our institutions (our
>>> employers) and not our disciplines (which are often in fact the locus of
>>> competition for those same rewards!)
>>>
>>>
>>>> Therefore, if you want to fill
>>>> institutional archives---which I agree is the best long-run way to
>>>> enhance access and preservation to scholarly research--- [the]
>>>> institutional archive has to be accompanied by a discipline-based
>>>> aggregation process.
>>>
>>> But the question is whether this "aggregation" needs to be the "feeding"
>>> of institutional OAI archive contents into disciplinary OAI archives, or
>>> merely the "feeding" of OAI metadata into OAI services.
>>>
>>>
>>>> The RePEc project has produced such an aggregator
>>>> for economics for a while now. I am sure that other, similar
>>>> projects will follow the same aims, but, with the benefit of
>>>> hindsight, offer superior service. The lack of such services
>>>> in many disciplines, or the lack of interoperability between
>>>> disciplinary and institutional archives, are major obstacles to
>>>> filling the institutional archives. There are no
>>>> inherent contradictions between institution-based archives
>>>> and disciplinary aggregators,
>>>
>>> There is no contradiction. In fact, I suspect this will prove to be a
>>> non-issue, once we confirm that (a) we agree on the need for
>>> OAI-compliance and (b) "aggregation" amounts to metadata-harvesting and
>>> OAI service-provision when the full-texts in the institutional archive
>>> are OAI-compliant (and calls for full-text harvesting only if/when they
>>> are not). Content "aggregation," in other words, is a paper-based notion.
>>> In the online era, it merely means digital sorting of the pointers to the
>>> content.
>>>
>>>
>>>> In the paper that Stevan refers to, Cliff Lynch writes,
>>>> at http://www.arl.org/newsltr/226/ir.html
>>>>
>>>> cl> But consider the plight of a faculty member seeking only broader
>>>> cl> dissemination and availability of his or her traditional journal
>>>> cl> articles, book chapters, or perhaps even monographs through use of
>>>> cl> the network, working in parallel with the traditional scholarly
>>>> cl> publishing system.
>>>>
>>>> I am afraid there are more and more such faculty members. Many
>>>> of the research papers found over the Internet are deposited
>>>> in this way. This trend is growing, not declining.
>>>
>>> You mean self-archiving in arbitrary non-OAI author websites? There is
>>> another reason why institutional OAI archives and official institutional
>>> self-archiving policies (and assistance) are so important. In reality, it
>>> is far easier to deposit and maintain one's papers in institutional OAI
>>> archives like Eprints than to set up and maintain one's own website. All
>>> that is needed is a clear official institutional policy, plus some startup
>>> help in launching it. (No such thing is possible at a "discipline" level.)
>>> http://www.ecs.soton.ac.uk/~lac/archpol.html
>>> http://www.eprints.org/self-faq/#institution-facilitate-filling
>>> http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
>>> http://paracite.eprints.org/cgi-bin/rae_front.cgi
>>>
>>>
>>>> cl> Such a faculty member faces several time-consuming problems. He or
>>>> cl> she must exercise stewardship over the actual content and its
>>>> cl> metadata: migrating the content to new formats as they evolve over
>>>> cl> time, creating metadata describing the content, and ensuring the
>>>> cl> metadata is available in the appropriate schemas and formats and
>>>> cl> through appropriate protocol interfaces such as open archives
>>>> cl> metadata harvesting.
>>>>
>>>> Sure, but academics do not like their work-, and certainly
>>>> not their publishing-habits, [to] be interfered with by external
>>>> forces. Organizing academics is like herding cats!
>>>
>>> I am sure academics didn't like to be herded into publishing with the
>>> threat of perishing either. Nor did they like switching from paper to
>>> word-processors. Their early counterparts probably clung to the oral
>>> tradition, resisting writing too; and monks did not like to be herded from
>>> their peaceful manuscript-illumination chambers to the clamour of printing
>>> presses. But where there is a causal contingency -- as there is between
>>> (a) the research impact and its rewards, which academics like as much as
>>> anyone else, and (b) the accessibility of their research -- academics are
>>> surely no less responsive than Prof. Skinner's pigeons and rats to those
>>> causal contingencies, and which buttons they will have to press in order
>>> to maximize their rewards!
>>> http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.htm
>>>
>>> Besides, it is not *publishing* habits that need to be changed, but
>>> *archiving* habits, which are an online supplement, not a substitute, for
>>> existing (and unchanged) publishing habits.
>>>
>>>
>>>> cl> Faculty are typically best at creating new
>>>> cl> knowledge, not maintaining the record of this process of
>>>> cl> creation. Worse still, this faculty member must not only manage
>>>> cl> content but must manage a dissemination system such as a personal Web
>>>> cl> site, playing the role of system administrator (or the manager of
>>>> cl> someone serving as a system administrator).
>>>>
>>>> There are a lot of ways in which to maintain a web site or to get access
>>>> to a maintained one. It is a customary activity these days and no
>>>> longer requires much technical expertise. A primitive integration of
>>>> the contents can be done by Google, it requires no metadata.
>>>> Academics
>>>> don't care about long-run preservation, so that problem remains
>>>> unsolved. In the meantime, the academic who uploads papers to a web
>>>> site takes steps to resolve the most pressing problem, access.
>>>
>>> Agreed. And uploading it into a departmental OAI Eprints Archive is by
>>> far the simplest and most effective way to do all of that. All it
>>> needs is a policy to mandate it:
>>> http://www.ecs.soton.ac.uk/~lac/archpol.html
>>>
>>>
>>>> cl> Over the past few years, this has ceased to be a reasonable activity
>>>> cl> for most amateurs; software complexity, security risks, backup
>>>> cl> requirements, and other problems have generally relegated effective
>>>> cl> operation of Web sites to professionals who can exploit economies of
>>>> cl> scale, and who can begin each day with a review of recently issued
>>>> cl> security patches.
>>>>
>>>> These are technical concerns. When you operate a linux box
>>>> on the web you simply fire up a script that will download
>>>> the latest version. That is easy enough. Most departments
>>>> have separate web operations. Arguing for one institutional
>>>> archive for digital contents is akin to calling for a single web site
>>>> for an institution. The diseconomies of scale of central administration
>>>> impose other types of costs than the ones that it was to reduce. The
>>>> secret is to find a middle way.
>>>
>>> I couldn't quite follow all of this. The bottom line is this: The free
>>> Eprints.org software (for example) can be installed within a few days. It
>>> can then be replicated to handle all the departmental or research group
>>> archives a university wants, with minimal maintenance time or costs. The
>>> rest is just down to self-archiving, which takes a few minutes for the
>>> first paper, and even less time for subsequent papers (as the repeating
>>> metadata -- author, institution, etc., can be "cloned" into each new
>>> deposit template).
>>> An institution may wish to impose an institutional
>>> "look" on all of its separate eprints archives; but apart from that, they
>>> can be as autonomous and as distributed and as many as desired:
>>> OAI-interoperability works locally just as well as it does globally.
>>>
>>>
>>>> cl> Today, our faculty time is being wasted, and expended ineffectively,
>>>> cl> on system administration activities and content curation. And,
>>>> cl> because system administration is ineffective, it places our
>>>> cl> institutions at risk: because faculty are generally not capable of
>>>> cl> responding to the endless series of security exposures and patches,
>>>> cl> our university networks are riddled with vulnerable faculty machines
>>>> cl> intended to serve as points of distribution for scholarly works.
>>>>
>>>> This is the fight many faculty face every day, where they
>>>> want to innovate scholarly communication, but someone
>>>> in the IT department does not give the necessary permission
>>>> for network access...
>>>
>>> I don't think I need to get into this. It's not specific to
>>> self-archiving, and a tempest in a teapot as far as that is concerned. An
>>> efficient system can and will be worked out once there is an effective
>>> institutional self-archiving policy. There are already plenty of excellent
>>> examples, such as CalTech:
>>> http://library.caltech.edu/digital/
>>> See also:
>>> http://software.eprints.org/#ep2
>>>
>>> Stevan Harnad
>>
>>
>
>
> --
> =====================================================================
> hussein suleman ~ hussein AT cs.uct.ac.za ~ http://www.husseinsspace.com
> =====================================================================
>
> _______________________________________________
> OAI-general mailing list
> OAI-general AT oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-general