Budapest Open Access Initiative: BOAI Forum Archive

[BOAI] Re: Cliff Lynch on Institutional Archives

From: "Paul Cummins" <comyn AT utk.edu>
Date: Thu, 27 Mar 2003 10:05:54 -0500 (EST)


I have thought about trying to make sets for each subject entry, and then ran
across the idea of a "home set" identifier that would point to the original
association.  But I am just beginning to work with OAI and probably need to
read all the archives. :)
--Paul Cummins
UT Library, Systems
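
A minimal sketch of the idea above, under stated assumptions. The OAI-PMH
verb and parameters (ListRecords, metadataPrefix, set) and the setSpec header
element come from the protocol; the "subject:" and "home:" set names, the
example.org base URL, and both function names are hypothetical conventions
made up for illustration, not anything an existing archive is known to use.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response elements.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build a ListRecords request restricted to a single set."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return f"{base_url}?{query}"


def split_sets(header_xml, home_prefix="home:"):
    """Separate a record's hypothetical "home set" from its other sets.

    Assumes (hypothetically) that the data provider marks the original
    association with a setSpec of the form "home:<name>".
    Returns (home_name_or_None, list_of_other_setSpecs).
    """
    header = ET.fromstring(header_xml)
    specs = [el.text for el in header.findall("oai:setSpec", OAI_NS)]
    home = [s[len(home_prefix):] for s in specs if s.startswith(home_prefix)]
    others = [s for s in specs if not s.startswith(home_prefix)]
    return (home[0] if home else None), others


# A made-up record header carrying one subject set and one home set.
SAMPLE_HEADER = (
    '<header xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<identifier>oai:example.org:1</identifier>"
    "<datestamp>2003-03-27</datestamp>"
    "<setSpec>subject:physics</setSpec>"
    "<setSpec>home:library</setSpec>"
    "</header>"
)

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai", "subject:physics"))
    print(split_sets(SAMPLE_HEADER))
```

A service provider could harvest one subject set at a time with the first
function, while the second shows how a harvester might still recover the
original association if a data provider chose to expose it this way.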


> hi
>
> this may be stating the obvious, but why not use sets for the separate
> disciplines, aimed at particular service providers? i say it that way
> because some disciplines are not well-defined (namely, computer science) so
> such archives may want to play ball with multiple service providers and
> hence may need different sets.
>
> in any event, for something like physics, a simple set might do the trick
> at the source. then, somewhat in keeping with the Kepler model (as
> published in DLib a while back), the service provider can provide an
> interface for potential data providers to self-register. i know this sounds
> dodgy, but think of it as an alternative mechanism for
> contribution. either individual users submit individual papers or groups
> submit baseURLs - both go through some kind of review and while one leads
> to once-off storage, the other leads to periodic harvesting.
>
> what remains a difficult problem, however, is how to recreate the metadata
> used by the service provider as its native format. so, for a typical
> example, if arXiv classifies items using a specific set
> structure, this is certainly not going to be the default for an
> institutional archive. does the service provider automatically or manually
> reclassify? or does it not allow browsing by categories? in either event,
> the quality of the metadata from the perspective of the service provider
> may be an impetus for potential users to want to replicate their effort
> rather than rely on the automated submission from their own institutions
> ... this needs more thought ...
>
> ttfn,
> ----hussein
>
>
> Christopher Gutteridge wrote:
>> Disciplinary/subject archives vs. Institutional/Organisation/Region based
>> archives. This is going to be a key challenge now open archives begin to
>> gain momentum.
>>
>> For example, we are planning a University-wide eprints archive. I am
>> concerned that some physicists will want to place their items in both the
>> university eprints service AND the arXiv physics archive. They may be
>> required to use the university service, but want to use arXiv as it is the
>> primary source for their discipline. This is a duplication of effort and
>> a potential irritation.
>>
>> Ultimately, of course, I'd hope that disciplinary archives will be replaced
>> with subject-specific OAI service providers harvesting from the
>> institutional archives. But there is going to be a very long transition
>> period in which the solution evolves from our experience.
>>
>> What I'm asking is: has anyone given consideration to ways of smoothing
>> over this duplication of effort? Possibly some negotiated automated
>> process for institutional archives uploading to the subject archive, or at
>> least assisting the author in the process.
>>
>> This isn't the biggest issue, but it'd be good to address it before it
>> becomes more of a problem.
>>
>>   Christopher Gutteridge
>>   GNU EPrints Head Developer
>>   http://software.eprints.org/
>>
>> On Sun, Mar 16, 2003 at 02:15:56 +0000, Stevan Harnad wrote:
>>
>>>On Sat, 15 Mar 2003, Thomas Krichel wrote:
>>>
>>>
>>>>  Stevan Harnad writes:
>>>>
>>>>sh> There is no need -- in the age of OAI-interoperability -- for
>>>>sh> institutional archives to "feed" central disciplinary archives:
>>>>
>>>>  I do not share what I see as a blind faith in interoperability through
>>>>  a technical protocol.
>>>
>>>I am quite happy to defer to the technical OAI experts on this one, but
>>>let us put the question precisely:
>>>
>>>Thomas Krichel suggests that institutional (OAI) data-archives
>>>(full-texts) should "feed" disciplinary (OAI) data-archives,
>>>because OAI-interoperability is somehow not enough. I suggest that
>>>OAI-interoperability (if I understand it correctly) should be enough. No
>>>harm in redundant archiving, of course, for backup and security, but not
>>>necessary for the usage and functionality itself. In fact, if I understand
>>>correctly the intent of the OAI distinction between OAI data-providers --
>>>http://www.openarchives.org/Register/BrowseSites.pl
>>>-- and OAI service-providers --
>>>http://www.openarchives.org/service/listproviders.html
>>>-- it is not the full-texts of data-archives that need to be "fed" to
>>>(i.e., harvested by) the OAI service providers, but only their metadata.
>>>
>>>Hence my conclusion that distributed, interoperable OAI institutional
>>>archives are enough (and the fastest route to open-access). No need to
>>>harvest their contents into central OAI discipline-based archives (except
>>>perhaps for redundancy, as backup). Their OAI interoperability should be
>>>enough so that the OAI service-providers can (among other things) do the
>>>"virtual aggregation" by discipline (or any other computable criterion) by
>>>harvesting the metadata alone, without the need to harvest full-text
>>>data-contents too.
>>>
>>>It should be noted, though, that Thomas Krichel's excellent RePEc archive
>>>and service in Economics -- http://repec.org/ -- goes
>>>well beyond the confines of OAI-harvesting! RePEc harvests non-OAI content
>>>too, along lines similar to the way ResearchIndex/citeseer --
>>>http://citeseer.nj.nec.com/cs -- harvests non-OAI content in computer
>>>science. What I said about there being no need to "feed" institutional OAI
>>>archive content into disciplinary OAI archives certainly does not apply to
>>>*non-OAI* content, which would otherwise be scattered willy-nilly all over
>>>the net and not integrated in any way. Here RePEc's and ResearchIndex's
>>>harvesting is invaluable, especially as RePEc already does (and
>>>ResearchIndex has announced that it plans to) make all its harvested
>>>content OAI-compliant!
>>>
>>>To summarize: The goal is to get all research papers, pre- and
>>>post-peer-review, openly accessible (and OAI-interoperable) as soon as
>>>possible. (These are BOAI Strategies 1 [self-archiving] and 2
>>>[open-access journals]: http://www.soros.org/openaccess/read.shtml ). In
>>>principle this can be done by (1) self-archiving them in central OAI
>>>disciplinary archives like the Physics arXiv (the biggest and first of its
>>>kind) -- http://arxiv.org/show_monthly_submissions
>>>-- by (2) self-archiving them in distributed institutional OAI
>>>Archives -- http://www.ecs.soton.ac.uk/~harnad/Temp/tim.ppt -- by (3)
>>>self-archiving them on arbitrary Web and FTP sites (and hoping they will
>>>be found or harvested by services like RePEc or ResearchIndex) or by (4)
>>>publishing them in open-access journals (BOAI Strategy 2:
>>>http://www.soros.org/openaccess/journals.shtml ).
>>>
>>>My point was only that because researchers and their institutions (*not*
>>>their disciplines) have shared interests vested in maximizing their joint
>>>research impact and its rewards, institution-based
>>>self-archiving (2) is a more promising way to go -- in the age of
>>>OAI-interoperability -- than discipline-based self-archiving (1), even
>>>though the latter began earlier. It is also obvious that both (1) and (2)
>>>are preferable to arbitrary Web and FTP self-archiving (3), which began
>>>even earlier (although harvesting arbitrary Website and FTP contents into
>>>OAI-compliant Archives is still a welcome makeshift strategy until the
>>>practice of OAI self-archiving is up to speed). Creating new open-access
>>>journals and converting the established (20,000) toll-access journals to
>>>open-access is desirable too, but it is obviously a much slower and more
>>>complicated path to open access than self-archiving, so should be pursued
>>>in parallel.
>>>
>>>My conclusion in favor of institutional self-archiving is based on the
>>>evidence and on logic, and it represents a change of thinking,
>>>for I had originally advocated (3) Web/FTP self-archiving --
>>>http://www.arl.org/scomm/subversive/toc.html -- then switched allegiance
>>>to central self-archiving (1), even creating a discipline-based archive:
>>>http://cogprints.ecs.soton.ac.uk/ But with the advent of OAI in 1999, plus
>>>a little reflection, it became apparent that
>>>institutional self-archiving (2) was the fastest, most direct, and most
>>>natural road to open access: http://www.eprints.org/
>>>And since then its accumulating momentum seems to be confirming that this
>>>is indeed so:
>>>http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2212.html
>>>http://www.ecs.soton.ac.uk/~harnad/Temp/tim.ppt
>>>
>>>
>>>>  The primary sense of belonging
>>>>  of a scholar in her research activities is with the disciplinary
>>>>  community of which she thinks herself a part... It certainly
>>>>  is not with the institution.
>>>
>>>That may or may not be the case, but in any case it is irrelevant to the
>>>question of which is the more promising route to open-access. Our primary
>>>sense of belonging may be with our family, our community, our creed, our
>>>tribe, or even our species. But our rewards (research grant funding and
>>>overheads, salaries, postdocs and students attracted to our research,
>>>prizes and honors) are intertwined and shared with our institutions (our
>>>employers) and not our disciplines (which are often in fact the locus of
>>>competition for those same rewards!)
>>>
>>>
>>>>  Therefore, if you want to fill
>>>>  institutional archives---which I agree is the best long-run way to
>>>>  enhance access and preservation to scholarly research--- [the]
>>>>  institutional archive has to be accompanied by a discipline-based
>>>>  aggregation process.
>>>
>>>But the question is whether this "aggregation" needs to be the "feeding"
>>>of institutional OAI archive contents into disciplinary OAI archives, or
>>>merely the "feeding" of OAI metadata into OAI services.
>>>
>>>
>>>>  The RePEc project has produced such an aggregator
>>>>  for economics for a while now. I am sure that other, similar
>>>>  projects will follow the same aims, but, with the benefit of
>>>>  hindsight, offer superior service. The lack of such services
>>>>  in many disciplines, or the lack of interoperability between
>>>>  disciplinary and institutional archives, are major obstacles to
>>>>  filling the institutional archives. There are no
>>>>  inherent contradictions between institution-based archives
>>>>  and disciplinary aggregators,
>>>
>>>There is no contradiction. In fact, I suspect this will prove to be a
>>>non-issue, once we confirm that (a) we agree on the need for
>>>OAI-compliance and (b) "aggregation" amounts to metadata-harvesting and
>>>OAI service-provision when the full-texts in the institutional archive
>>>are OAI-compliant (and calls for full-text harvesting only if/when they
>>>are not). Content "aggregation," in other words, is a paper-based notion.
>>>In the online era, it merely means digital sorting of the pointers to the
>>>content.
>>>
>>>
>>>>  In the paper that Stevan refers to, Cliff Lynch writes,
>>>>  at http://www.arl.org/newsltr/226/ir.html
>>>>
>>>>cl> But consider the plight of a faculty member seeking only broader
>>>>cl> dissemination and availability of his or her traditional journal
>>>>cl> articles, book chapters, or perhaps even monographs through use of
>>>>cl> the network, working in parallel with the traditional scholarly
>>>>cl> publishing system.
>>>>
>>>>  I am afraid there are more and more such faculty members. Many
>>>>  of the research papers found over the Internet are deposited
>>>>  in this way. This trend is growing, not declining.
>>>
>>>You mean self-archiving in arbitrary non-OAI author websites? There is
>>>another reason why institutional OAI archives and official institutional
>>>self-archiving policies (and assistance) are so important. In reality, it
>>>is far easier to deposit and maintain one's papers in institutional OAI
>>>archives like Eprints than to set up and maintain one's own website. All
>>>that is needed is a clear official institutional policy, plus some startup
>>>help in launching it. (No such thing is possible at a "discipline" level.)
>>>http://www.ecs.soton.ac.uk/~lac/archpol.html
>>>http://www.eprints.org/self-faq/#institution-facilitate-filling
>>>http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
>>>http://paracite.eprints.org/cgi-bin/rae_front.cgi
>>>
>>>
>>>>cl> Such a faculty member faces several time-consuming problems. He or
>>>>cl> she must exercise stewardship over the actual content and its
>>>>cl> metadata: migrating the content to new formats as they evolve over
>>>>cl> time, creating metadata describing the content, and ensuring the
>>>>cl> metadata is available in the appropriate schemas and formats and
>>>>cl> through appropriate protocol interfaces such as open archives
>>>>cl> metadata harvesting.
>>>>
>>>>  Sure, but academics do not like their work-, and certainly
>>>>  not their publishing-habits, [to] be interfered with by external
>>>>  forces. Organizing academics is like herding cats!
>>>
>>>I am sure academics didn't like to be herded into publishing with the
>>>threat of perishing either. Nor did they like switching from paper to
>>>word-processors. Their early counterparts probably clung to the oral
>>>tradition, resisting writing too; and monks did not like to be herded from
>>>their peaceful manuscript-illumination chambers to the clamour of printing
>>>presses. But where there is a causal contingency -- as there is between
>>>(a) the research impact and its rewards, which academics like as much as
>>>anyone else, and (b) the accessibility of their research -- academics are
>>>surely no less responsive than Prof. Skinner's pigeons and rats to those
>>>causal contingencies, and which buttons they will have to press in order
>>>to maximize their rewards!
>>>http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.htm
>>>
>>>Besides, it is not *publishing* habits that need to be changed, but
>>>*archiving* habits, which are an online supplement, not a substitute, for
>>>existing (and unchanged) publishing habits.
>>>
>>>
>>>>cl> Faculty are typically best at creating new
>>>>cl> knowledge, not maintaining the record of this process of
>>>>cl> creation. Worse still, this faculty member must not only manage
>>>>cl> content but must manage a dissemination system such as a personal Web
>>>>cl> site, playing the role of system administrator (or the manager of
>>>>cl> someone serving as a system administrator).
>>>>
>>>>  There are a lot of ways in which to maintain a web site or to get access
>>>>  to a maintained one. It is a customary activity these days and no
>>>>  longer requires much technical expertise. A primitive integration of
>>>>  the contents can be done by Google; it requires no metadata. Academics
>>>>  don't care about long-run preservation, so that problem remains
>>>>  unsolved. In the meantime, the academic who uploads papers to a web
>>>>  site takes steps to resolve the most pressing problem, access.
>>>
>>>Agreed. And uploading it into a departmental OAI Eprints Archive is by
>>>far the simplest and most effective way to do all of that. All it
>>>needs is a policy to mandate it:
>>>http://www.ecs.soton.ac.uk/~lac/archpol.html
>>>
>>>
>>>>cl> Over the past few years, this has ceased to be a reasonable activity
>>>>cl> for most amateurs; software complexity, security risks, backup
>>>>cl> requirements, and other problems have generally relegated effective
>>>>cl> operation of Web sites to professionals who can exploit economies of
>>>>cl> scale, and who can begin each day with a review of recently issued
>>>>cl> security patches.
>>>>
>>>>  These are technical concerns. When you operate a linux box
>>>>  on the web you simply fire up a script that will download
>>>>  the latest version. That is easy enough. Most departments
>>>>  have separate web operations. Arguing for one institutional
>>>>  archive for digital contents is akin to calling for a single web site
>>>>  for an institution. The diseconomies of scale of central administration
>>>>  impose other types of costs than the ones that it was to reduce. The
>>>>  secret is to find a middle way.
>>>
>>>I couldn't quite follow all of this. The bottom line is this: The free
>>>Eprints.org software (for example) can be installed within a few days. It
>>>can then be replicated to handle all the departmental or research group
>>>archives a university wants, with minimal maintenance time or costs. The
>>>rest is just down to self-archiving, which takes a few minutes for the
>>>first paper, and even less time for subsequent papers (as the repeating
>>>metadata -- author, institution, etc. -- can be "cloned" into each new
>>>deposit template). An institution may wish to impose an institutional
>>>"look" on all of its separate eprints archives; but apart from that, they
>>>can be as autonomous and as distributed and as many as desired:
>>>OAI-interoperability works locally just as well as it does globally.
>>>
>>>
>>>>cl> Today, our faculty time is being wasted, and expended ineffectively,
>>>>cl> on system administration activities and content curation. And,
>>>>cl> because system administration is ineffective, it places our
>>>>cl> institutions at risk: because faculty are generally not capable of
>>>>cl> responding to the endless series of security exposures and patches,
>>>>cl> our university networks are riddled with vulnerable faculty machines
>>>>cl> intended to serve as points of distribution for scholarly works.
>>>>
>>>>  This is the fight many faculty face every day, where they
>>>>  want to innovate scholarly communication, but someone
>>>>  in the IT department does not give the necessary permission
>>>>  for network access...
>>>
>>>I don't think I need to get into this. It's not specific to
>>>self-archiving, and a tempest in a teapot as far as that is concerned. An
>>>efficient system can and will be worked out once there is an effective
>>>institutional self-archiving policy. There are already plenty of excellent
>>>examples, such as CalTech:
>>>http://library.caltech.edu/digital/
>>>See also:
>>>http://software.eprints.org/#ep2
>>>
>>>Stevan Harnad
>>
>>
>
>
> --
> =====================================================================
> hussein suleman ~ hussein AT cs.uct.ac.za ~ http://www.husseinsspace.com
> =====================================================================
>
> _______________________________________________
> OAI-general mailing list
> OAI-general AT oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-general





 E-mail:  openaccess@soros.org .