Budapest Open Access Initiative: BOAI Forum Archive

[BOAI] Free Access vs. Open Access

From: Stevan Harnad <harnad AT>
Date: Mon, 11 Aug 2003 03:39:33 +0100 (BST)

BioMedCentral's "Open Access Now" is a useful newsletter, but its
editorial contains some inadvertently misleading information that needs
to be corrected. What the editorial actually said was this:

>     "Free Access is not Open Access"
>     "There seems to be a general misunderstanding that the aim of the
>     Open Access movement is to make the scientific research literature
>     free online. But there is a difference between "free access"
>     and "open access"...
>     "The benefits and promise of Open Access will only be realized
>     when this distinction is clear in the minds of authors and
>     publishers. Only then can the literature move from being `free'
>     to being truly `open'."

I will quote/comment the full (short) editorial in a moment to show
why I think what it *should* instead have said is this:

    "Open Access Calls for Both Free Access and Open Usage"

    "There seems to be a general misunderstanding that the aim of the Open
    Access movement is *only* to make the scientific research literature
    free online... That is the first aim, but it also aims to make it
    fully usable."

The difference between the two messages is substantial. We are very far
from having free access to the refereed research literature, even though
it is within reach; vast amounts of potential research impact are for
this reason being needlessly lost; and it is free access that is urgently
needed to put an end to this loss. What free access we do have today,
however, is not constrained by any usage constraints. Hence the
difference between "free access" and "open access" is 
hypothetical right now: What is needed is more free access, not an
extension of free access to open access. To imply otherwise is simply to
saddle the research community with yet another red herring, instead of
what it really needs.

Here is the current situation, in rough practical and statistical terms:

    (a) What the BOAI seeks is unrestricted toll-free
    full-text online access to the entire refereed research corpus
    (20,000 journals, 2,000,000 articles per year).

    (b) The way to achieve this is for researchers to (1) publish their
    papers in open-access journals whenever suitable ones exist
    (under 5% currently) and, for the rest of their papers (95%), to
    (2) self-archive them in their own institutional archives. [(1)
    is BOAI Strategy 2, and (2) is BOAI Strategy 1.]

    (c) Any form of restricted, gerrymandered online access (such as
    "ebrary"-based access that prevents down-loading, saving or
    printing-off) would not be open access (but there is none in sight
    so far to speak of).

That is all there is to it! Now, for those who are interested, a more
detailed quote/comment of the full (short) BMC editorial:

> Free Access is not Open Access

Not necessarily, in theory; but in reality and in practice, *all* of the
growing body of research today that is free-access is also open-access:

It can all be downloaded, saved, grepped, printed out, quote/commented,
and the URL can be sent to anyone who wishes to do likewise. All data
therein can also be used, *exactly* as they could be if read and copied
from the on-paper version. (It is simply an error, in other words, to
think of refereed, published articles as analogous to the genome database
or to software. They consist instead of texts, which are written to
be printed off, read, used, applied, built-upon, quoted/commented,
and cited. There is no question -- or need -- of republishing them or
altering them. They are already freely accessible to anyone with access
to the Web, and the only ones to update them are the authors; everyone
else must settle for quote/commenting, applying and citing.)

> There seems to be a general misunderstanding that the aim of the Open
> Access movement is to make the scientific research literature free
> online. But there is a difference between "free access" and
> "open access".

The aim of the Open Access movement *is* to make the scientific
(and scholarly) refereed-journal research literature -- full-text --
accessible toll-free online. Though there may be hypothetical ways
toll-free online access could be constrained so as to prevent
downloading, grepping, or printing, no such thing is happening. All the
free-access literature is also open-access.

> This distinction was part of what motivated the Bethesda definition of
> Open Access Principles that we published in the first issue of Open Access
> Now (July 14, 2003). That definition clearly states that access to the
> information should be free, but in addition the work should be open to
> re-use and redistribution

"Re-use and redistribution" has to be thought out more fully
than it is in the Bethesda definition -- insofar as refereed journal
articles are concerned. We are not talking about shared empirical
databases here but about the articles that appear in the 20,000 existing
(toll-access) peer-reviewed journals. The use one makes of those full
texts is to read them, print them off, quote/comment them, cite them,
and use their *contents* in further research, building on them. What is
"re-use"? And what is "redistribution" (when everyone on the planet with
access to the web has access to the full-text of every such article)?

> and that it should be deposited immediately
> upon publication in a public online repository (such as PubMed Central).

For the 95% solution, BOAI-1, depositing those toll-access articles in
the author's own institutional repository is the *means* by which they are
made free-access, by definition. For the remaining 5% (BOAI-2),
the fact that they are published by an open-access journal *entails*
(again, by definition) that they must be made freely accessible online
*somehow*. Likewise depositing them in a public online repository
(whether in a central one, like PubMed Central, or -- why not? -- in
the author's own institutional repository, this time too) seems like a
congenial solution to providing this essential feature of what it is
that makes an open-access journal open-access!

> Publishers who offer free online access on their own websites still have
> a long way to go before their research articles can be considered Open
> Access. 

I know of no publisher-provided toll-free online full-text access with
"ebrary"-style constraints on downloading, grepping, printing, etc. If
there *are* any such cases (and they can successfully prevent downloading,
grepping, printing, etc.) then that sort of gerrymandered access should
not count as open access, and that publisher certainly doesn't count as
an open-access publisher. 

But what is the point? BOAI-1 is institutional self-archiving, not
publisher self-archiving, and it involves no ebrary-style
gerrymandering; and BOAI-2 *does* guarantee unconstrained
access. The fact that toll-access publishers do *not* provide toll-free
access is the whole point of the BOAI movement! If they did, we could
all go home now (and access it all)!

> The benefits and promise of Open Access will only be realized when
> this distinction is clear in the minds of authors and publishers. 

I think authors know perfectly well when they can and cannot access the
full text of an article (including download, storage, grepping
and printout) toll-free. Toll-access publishers know the difference
too. The difference between unconstrained free access and gerrymandered
ebrary-style access will also be fully felt and appreciated -- if and
when it ever comes to pass. So far, it's nowhere in sight! Hence, at
the moment, *all* the benefits of Open Access reside in free, full-text,
online access of the sort that a growing number of articles already have
(most of them through BOAI-1) but that most of the 2,000,000 articles
published annually still lack. It will not help them get it if we seek the
benefits and promise from promoting the free/open distinction, rather
than from promoting free access!

> Only then can the literature move from being `free' to being truly `open'.

The "move" we should all be dedicating 100% of our energy
to is the move from toll-access to free-access. That's the move that
awaits us impatiently, to at last stem our daily needless
impact-loss. There is no free-access literature straining to move from
free-access to open-access anywhere in sight at the moment.

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open
access to the peer-reviewed research literature online is available at
the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03).

Discussion can be posted to: september98-forum AT 

[BOAI] Re: Free Access vs. Open Access

From: Stevan Harnad <harnad AT>
Date: Tue, 12 Aug 2003 00:10:55 +0100 (BST)

On Mon, 11 Aug 2003, Matthew Cockerill wrote:

>sh>       "The use one makes of those full texts is to read them,
>sh>        print them off, quote/comment them, cite them, and use
>sh>        their *contents* in further research, building on them.
>sh>        What is "re-use"? And what is "redistribution" (when
>sh>        everyone on the planet with access to the web has access
>sh>        to the full-text of every such article)?"
> Having free access to articles on the publisher's website would certainly
> offer progress compared to the current status quo. But it would not offer
> anything like the benefits of true open access. 

Free access to the current 20,000 journals (2 million articles yearly)
would be like the difference between night and day. Compared to that,
the difference between "free" and "true open" access amounts to just a
few degrees of luminosity.

But let me agree at once that if free access were gerrymandered so
all the user could do was to browse the text on-screen, without being
able to download, save, grep, or print-off, then that would indeed
arbitrarily limit free access's usefulness. How many (if any) of the
several million free-access refereed-journal articles currently on the
web, however -- whether BOAI-1, BOAI-2, or otherwise -- are gerrymandered
in that way? If (as I suspect) the answer is "very few" or even "none
that I know of," then this hypothetical constraint is not worth another
moment's thought or energy diverted from the real task at hand, which
is to turn night into day, as soon as possible.

> Here are just some of the
> reasons why re-use and re-distribution rights are vital to open access:
> (1) Digital permanence - it is not enough for the publisher to be the only
> body which curates the full archive of published research content. To ensure
> long term digital permanence of the scientific record, it is vital that
> articles should be deposited with multiple archives, and redistributable
> from and between those archives.

It seems to me that this is conflating (arbitrarily) two completely
independent matters. One is toll-free online *access* to the articles
in the 20K journals that are currently only accessible via tolls. The
other is the *preservation* of that toll-based corpus.

Well, preservation of that toll-based corpus was always a concern, in
on-paper days as in on-line days, and the concern has nothing whatsoever
to do with free (or open) access! We could have a failsafe preservation
system without free access, or we could have a failsafe preservation
system with free access; or we could have an uncertain preservation system
without free access (as we do now) or an uncertain preservation system
with free access (bringing the present system out into the light of day).

The preservation burden has to be (and will be, and is being) faced in
any case. Why on earth should that entirely orthogonal longterm
task be coupled in *any way* to the immediate and urgent problem of free
access today? And why should "open access" be linked with or defined in
terms of the eventual solution to the preservation problem, one way or
the other? (This is not an argument for indifference to preservation: it
is an argument for decoupling two completely independent desiderata.)

> (2) A flexible choice of tools for searching and browsing
> The reason that Google exists is because the web is free for anyone to
> download and index. As a result, there is competition among search engines,
> and Google had the incentive to develop a better system for indexing web
> pages, which has since driven other search engine companies to improve the
> tools they offer.
> Compare this with the situation with scientific research. If the research
> resides only on the publisher's site, you don't have a free choice of what
> tools you use to search and browse it - you are stuck with what that
> particular publisher provides you with.

We are quite squarely in the domain of hypotheticals here. (Which
publisher's free-access corpus, inaccessible to google, are we talking
about?) But let us suppose that a publisher provides free access --
not gerrymandered free access, but free access that allows downloading,
saving, grepping and printing:

First, I will bet that such a publisher will want to maximize the
visibility and impact of his contents by allowing at least the indexing
metadata to be harvested, both by google, and by the OAI search engines
specializing in the refereed journal literature.

But even if we get doubly hypothetical here, and suppose the publisher
does *not* disclose the metadata to harvesters, there is
still a super-simple solution: Every author has an online
CV. Their CV will contain the metadata for every one of their
journal publications. (Such CVs can and will be OAI-compliant.)
Add the URL for the free-access full-text on the publisher's website to
your CV entry and the circle is closed. (Better still, also self-archive
the full text in your own institutional OAI-compliant repository!)
End of story.
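
A minimal sketch of what "harvesting the metadata" could look like in
practice, assuming the CV or repository speaks OAI-PMH 2.0 and exposes
unqualified Dublin Core (oai_dc). The base URL, the sample record and the
helper names are hypothetical, invented for illustration; nothing is
fetched over the network, the code only builds the request URL and reads
one already-retrieved record:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Dublin Core element namespace used inside oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def parse_record(xml_text):
    """Pull title, creators and the full-text URL out of one oai_dc record."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext(DC + "title"),
        "creators": [e.text for e in root.findall(DC + "creator")],
        "fulltext_url": root.findtext(DC + "identifier"),
    }

# Hypothetical CV entry: metadata plus the URL of the free-access full text.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Article</dc:title>
  <dc:creator>Author, A.</dc:creator>
  <dc:identifier>https://publisher.example.org/fulltext/123</dc:identifier>
</oai_dc:dc>"""

url = list_records_url("https://eprints.example.org/oai")
record = parse_record(SAMPLE_RECORD)
```

The point of the sketch is only that the metadata and the full-text URL
travel together in one harvestable record, whichever site hosts the text.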

> This ties in with developments in Grid computing. With open access,
> published research would be available "on tap" via the grid, and
> scientists would be able to use their preferred choice of grid tools to
> access the data, rather than being stuck with the tools provided by the
> publisher.

As stated above, the CV/OAI gambit already trivially takes care of
closing the circle.

I agree, though, that for many research purposes, it is beneficial to
have not just the metadata but the full-text inverted and indexed, as
well as agent-harvestable. Again, if the publisher's free-access site
doesn't do this, the author's institutional site certainly can and will.
In fact, authors and their institutions are the ones with the most
direct interest in making sure their own research output is maximally
usable in this way.

Let us not, however, conflate article-text archiving with
data-archiving. Data-archiving is important too, but it is an extra:
an independent new bonus of the online era, having nothing to do with
the question of toll-free access to article-texts. In the paper era, raw
data were not published, just summarized in what was published. Eventually
data will no doubt be incorporated into online publications in some way,
but until then there is certainly no need for authors to wait! They
can publish their article, as before, and, in addition, self-archive
the data on which their article is based in their own OAI-compliant
institutional research repository (the same repository in which
the full-text of their article can and should be self-archived too,
whether it appears in an open-access journal, a toll-access journal, or a
toll-access journal that offers toll-free access too). Again, the online
CV can close the circle, if it is not already closed of its own accord.

And this way, although it is functionally independent, data-archiving
can help speed the progress toward toll-free full-text access too.

> (3) Datamining
> With a million or so biomedical research articles being published each
> year, the sheer volume of output is an obstacle to the comprehension
> of the results reported in that research. If the XML of the articles can
> be brought together in one place then the tools of datamining can be
> applied to it to extract useful but non-obvious information.

Agreed. See above. But before we get carried away with the potential
perks, let's not forget the still absent basics: Let there be Light
(toll-free full-text access), now! Leave the Solar-Energy and Club-Med
projects for when we already have our daily fill of photons.

> The simplest type of datamining is citation analysis
> Currently you need to pay ISI a lot of money to find out what cites what,
> but with true open access, citation analysis becomes trivial.

Perhaps not quite trivial. (There's still the problem of parsing,
identifying and linking the citations for all those articles without the
ultimate mark-up. But we're working on it.)

But again, this is an independent perk, because you could have universal
citation linking and analysis even *without* toll-free full-text access!
For an article's reference list, like its indexing metadata (and its
accompanying empirical data), can be self-archived by the author (guess
where?). We are in fact promoting this solution for royalty-based books,
whose authors, unlike journal article-authors, are unlikely to want to
make their full-texts accessible toll-free. Their metadata and reference
lists, however, are another matter, and can (and will) be tucked into
the institutional OAI-compliant repository too, with a new indicator of
global book citation impact as the harvestable reward.
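
As a rough illustration of why self-archived reference lists suffice for
citation analysis, assume each archived record carries nothing more than
an identifier and the identifiers it cites. The record layout and the IDs
below are hypothetical, and the hard part (parsing and matching free-text
references) is assumed already done. Citation counts then fall out of
simply inverting the reference lists:

```python
from collections import defaultdict

# Hypothetical self-archived records: each ID maps to the IDs it cites.
reference_lists = {
    "article-A": ["article-B", "article-C"],
    "article-B": ["article-C"],
    "article-C": [],
    "article-D": ["article-C", "article-B"],
}

def invert_references(reference_lists):
    """Map each cited ID to the sorted list of archived IDs that cite it."""
    cited_by = defaultdict(set)
    for citing, references in reference_lists.items():
        for cited in references:
            cited_by[cited].add(citing)
    return {cited: sorted(citers) for cited, citers in cited_by.items()}

def citation_counts(reference_lists):
    """Count how many distinct archived records cite each ID."""
    inverted = invert_references(reference_lists)
    return {cited: len(citers) for cited, citers in inverted.items()}

cited_by = invert_references(reference_lists)
counts = citation_counts(reference_lists)
```

Note that nothing here touches the full text: only the reference lists
are needed, which is why the same approach could work for books.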

> So, for example, if you view a PubMed record:
> ds=11667947&dopt=Abstract
> you already get links to all the full text articles in PubMed Central that
> cite that PubMed item
> d=11667947

And if you look at citebase, you will see how this generalizes to the
entire OAI-compliant literature:

> The more true open access research that is published and archived at PubMed
> Central, the more useful this becomes for biomedical researchers. [Sure,
> "screen-scraping" HTML from free articles displayed on publisher sites could
> give some citation information, but with nothing like the ease, accuracy and
> reliability that it can be obtained with the use of XML data, as at PubMed
> Central].

Fine. But I'd rather have toll-free access to all 20K journals right
now, rather than waiting for these XML perks -- wouldn't you?

Again, toll-free access is one thing -- and extremely important,
already reachable, and already overdue -- and potential perks such as
citation-based navigation are another. Let there be light first; then we
can worry about calibrating the photometers on our Yashicas.

> Beyond citation analysis, there are many other forms of datamining that are
> possible:
> For more information see:
> e.g. Research articles can be mined for details of protein interactions

See above. Right now, it is an indisputable fact that open-access
publishing today (BOAI-2) is the solution only for that 5% of the literature
(of 20K journals) that has a suitable open-access journal today. The
immediate solution for all the rest is self-archiving (BOAI-1), rather
than continuing to wait for more open-access journals to spawn and grow.

(If, in the meanwhile, toll-access publishers also want to help hasten
things along by providing free access, they are certainly welcome
to do so! I still regret -- for the sake of open access -- that the BOAI
was not ready to count it as publisher support of open access if a
toll-access journal supported author self-archiving of their articles.
*Of course* that is publisher support for open access! By the same token,
I would certainly consider it as publisher support for open access if a
toll-access journal made its full-text contents publicly accessible online
toll-free. Even if it was gerrymandered full-text access -- as long as they
also supported author self-archiving.)

> And as scientific content is increasingly marked up using richer forms of
> semantically meaningful XML (e.g. CML for chemical structures, MathML for
> equations), the value of datamining will continue to increase.

All true. And it will all prevail eventually. But we need free access now.

> The BioLINK group are using BioMed Central's open access corpus as the raw
> material for a datamining competition, designed to stimulate progress in the
> development of tools for biological datamining.

That is commendable and welcome. But it must not be forgotten what
percentage of the annual biological journal literature that sample
actually represents. We must not be held back to that small percentage
because we are informed that mere free access is not good enough -- not
"true open access." Such rarefied fussiness does not serve the cause of
either free or open access at this point.

> (4) Derivative works and compilations
> Say that a scientist performs a meta-analysis on a group of published
> clinical trials, and wants to make available the conclusions of that
> research. Or perhaps a datamining researcher has taken a corpus of 1000
> articles on breast cancer, and established some interesting conclusions.

All very welcome and valuable (indeed, inevitable) developments in the
online age. But I'd rather that progress toward free access for all 20K
did not wait for these perks. Indeed, the sooner we have free access,
the sooner the rest will come too.

> In a true open access environment, each is free to post the results of their
> research, *along with* the actual corpus of data which the research was
> based on (effectively, the raw data for that research).
> But in a non-open access environment, that raw data (i.e. the research
> articles) cannot be redistributed, which makes it far more difficult than it
> needs to be for other scientists to reproduce, critique and follow up the
> work.

I am afraid I have to disagree. As already noted above, authors are as
free to self-archive (in their institutional repositories) the empirical
data underlying their toll-access publications as they are to do so with
the data underlying their open-access publications. Data-archiving is
another thing for which there is no point sitting around awaiting the
era of universal open-access publishing. Data-archiving will encourage
article self-archiving, and both will hasten the era of universal open
access.

> Similarly, a scientist may wish to make a point by assembling a collection
> of certain articles or article fragments (perhaps they wish to assemble a
> comparison of the methods used for a certain technique).
> In an open access world, as long as they cite the sources, they are
> completely free to create and redistribute that compilation. Such a
> selective compilation may in itself be extremely useful contribution to
> science.

I can't follow this at all. A compilation is a list of articles, whether
online or on-paper, whether toll-access or open-access. If the
full-texts of the texts are *free* access, all the compilation need list
is their URLs. (Ditto for article "fragments": try section number,
paragraph number, or even [yech!] PDF page number.)

> (5) Print redistribution rights - the National Health Service, for example,
> should be able to redistribute thousands of printed copies of an important
> research article (which it may have funded) to its doctors if it wishes to
> do so. It should not have to pay a hefty copyright fee for the privilege.

I have no views on this, but it has nothing to do with open access,
which even in the strict BOAI definition refers to online access, not
to multiple printing and redistribution rights. Besides, this is all
becoming moot in the online era: Why distribute print copies instead of
URLs, if the texts are publicly accessible online toll-free? 

(I think it is a big mistake, and clouds the issue, to try to link online
toll-free access arguments with paper-printing rights. Don't forget that
those worthy paper-based arguments would have been just as worthy in the
paper era. So surely they are *not* what has changed in the online era.)

> Certainly, print redistribution will likely become less significant in the
> future, but there is no logical reason that the scientific community should
> not be free to exchange and distribute the research that it has created in
> print form, as well as online.

The case for multiple printing rights is *much* weaker than the case
for toll-free online access. Please let us not needlessly weaken
the case for free access by handicapping it with such needless extra
burdens. Free access will erode the need to print, even as it erodes
publisher opposition to printing. But now, all fussing about print
"redistribution" rights does is provoke needless opposition, to no
good purpose. Keep it light, till everyone sees the light.

Stevan Harnad

[BOAI] Re: Free Access vs. Open Access

From: Stevan Harnad <harnad AT>
Date: Sat, 16 Aug 2003 23:52:27 +0100 (BST)

              On the Deep Disanalogy 
              Between Text and Software and 
              Between Text and Data
              Insofar as Free/Open Access is Concerned

              Stevan Harnad

It would be a *great* conceptual and strategic mistake for the movement
dedicated to open access to peer-reviewed research (BOAI) to conflate
its sense of "free vs. open" with the sense of "free vs. open" as it is
used in the
free/open-source software movements. The two senses are not at all the
same, and importing the software-movements' distinction just adds to
the still widespread confusion and misunderstanding that there is in
the research community about toll-free access.

I will try to state it in the simplest and most direct terms possible:
Software is code that you use to *do* things. It may not be enough to
let you use the code for free to do things, because one of the things you
may want to do is to modify the code so it will do *other* things. Hence
you may need not only free use of the code, but the code itself has to
be open, so you can see and modify it.

There is simply *no counterpart* to this in peer-reviewed research
article use. None. Researchers, in using one another's articles, are
using and re-using the *content* (what the articles are reporting), and
not the *code* (i.e., the actual words in the text). Yes, they read the
text. Yes (within limits) they may quote it. Yes, it is helpful to be able
to navigate the code by character-string and boolean searching. But what
researchers are fundamentally *not* doing in writing their own articles
(which build on the articles they have read) is anything faintly analogous
to modifying the code for the original article!

I hope that that is now transparent, having been pointed out and written
in longhand like this. So if it is obvious that what researchers do with
the articles they read is not to modify the text in order to generate a
new text, as programmers may modify a program to generate a new program,
where did this open/free source/access conflation come from?

There is a second conflation inherent in it, namely, a conflation between
research publishing (i.e., peer-reviewed journal articles) and public
data-archiving (scientific and scholarly databases consisting of the
raw and processed data on which the research reports are based).

Digital data archiving (e.g., the various genome databases, astrophysical
databases, etc.) is relatively new, and it is a powerful *supplement*
to peer-reviewed article publishing. In general, the data are not *in*
the published article, they are *associated with* it. In paper days, there
was not the page-allotment or the money to publish all the data. And even
in digital days, there is no standardized practice yet of making the raw
data as public as the research findings themselves; but there is definite
movement in that direction, because of its obvious power and utility.

The point, however, is this: As of today, articles and data are not
the same thing. The 2,000,000 new articles appearing every year in the
planet's 20,000 peer-reviewed journals (the full-text literature that
-- as we cannot keep reminding ourselves often enough, apparently --
the open/free access movement is dedicated to freeing from access-tolls)
consists of articles only, *not* the research data on which the articles
are based.

Hence, today, the access problem concerns toll-access to the article
full-texts of 2,000,000 articles published yearly, not access to the
data on which they are based (most of which are not yet archived online,
let alone published; and, when they *are* archived online, they are often
already publicly accessible toll-free!). No doubt research practices will
evolve toward making all data accessible to would-be users, along with the
articles reporting the research findings. This is quite natural, and in
line with researchers' desire to maximize the use and hence the impact
of their research. What may happen is that journals will eventually include
some or all the underlying data as part of the peer-reviewed publication
itself (there may even be "peer-reviewed data"), but in an online 
supplement only, rather than in the paper edition.

(What is *dead-certain* is that, as this happens, authors will not
be idiotic enough to sign over copyright to their research data to their
publishers, the same way they have been signing over copyright to the
texts of their research reports! So let's not even waste time on that
implausible hypothetical contingency. The research community may be slow
off the mark in reaching for the free-access that is already within its
grasp, but they have not altogether taken leave of their senses!)

But that bridge (digital data supplements), if it ever comes, can be
crossed if/when we get to it. Right now, when we are talking about
the peer-reviewed literature to which we are trying to free access, we
are talking about *articles* and not about *data*. Hence, exactly as
in the conflation of text with software in the incorrect and misleading
open/free source analogy, the conflation of open/free full-text access to
the refereed literature with hypothetical questions about data-access
and data re-use and re-analysis capability is simply incorrect and
misleading. The two are different, and it is only the first that is at
issue today.

Open/free access -- in this flurry of definitional fussiness and fancy
one no longer knows which word to use! -- to the refereed research
literature is already vastly overdue, even though it has been 100%
within our practical reach for several years now.  

Research usage and impact and productivity are still being needlessly
lost daily, in untold quantities, because of access-denial by
toll-barriers. Why on earth do we keep wasting our time, energy
and attention on minor diversions and irrelevancies, while keeping
the solution to the real, pressing problem on hold, as we ponder the
ramifications of incoherent analogies with software and with
data-archiving, when there is a real job to be done: freeing (sic)
full-text access to the planet's yearly 2,000,000 peer-reviewed research
articles, now!

I will now quote/comment this latest variant of that Protean microbe
that keeps on causing us Zeno's Paralysis on the road to the optimal
and inevitable. In the past, the source of this persistent virus
and its ever-mutating variants had been the opponents of free
access (toll-access publishers), as well as its over-timorous
potential beneficiaries (researchers, librarians, administrators). But now the
paralysis-inducing bug is also originating from the ranks of free-access
activists, who risk balkanizing the free-access movement by driving a
conceptual wedge between "free" and "open," despite the fact that nothing
substantive is to be gained, and only more time to be lost thereby. I
will pass to quote/comment mode to illustrate this:

On Thu, 14 Aug 2003, Matthew Cockerill wrote:

> The open source software community [uses] the shorthand 'free, as in beer'

The open/free distinction in software is based on the modifiability of the
code. This is irrelevant to refereed-article full-text. (And the beer
analogy was silly and uninformative in both cases! Lots of laughs, but
little light cast.)

> Sure, if you are given some limited access to something and that access is
> 'free, as in beer', that can be very useful.
> In the world of software, say, that would apply to Windows Media Player,
> which you can download for free from the Microsoft website (even though the
> software itself is highly proprietary, and Microsoft would not take kindly
> to you reverse-engineering it or distributing a modified version).

This is all irrelevant to article-access, except that toll-access
publishers can, like every other product- or service-provider, use partial
or temporary access as a marketing "hook." Temporary access is not free
access (or rather it is free access only while it is free). And partial
access is free only for whatever it is access to, not for what it is
not access to. (We're all "non-smokers" while we are asleep...)

But none of this provides any basis at all for the analogy with
proprietary code, as in software, nor with any need for code
modifiability, whatsoever.

> But free/open source software is more than 'free as in beer', it is 'free as
> in speech', and this offers hugely significant extra freedoms (which is why
> open source software has had such a revolutionary effect on the software
> industry).

This free beer/speech analogy was already dubious in the software case
(not all programmers wish to give away their code [the freedom to produce
non-give-away products/services is a freedom too!], either for use or
for modification, or both; and my speech, whether spoken or written,
is spoken/written for you to hear, not for you to claim to have been
your own words, whether in unaltered or altered form; and we are free
to say or write what we like, as long as it is indeed our own, etc. etc.).

But never mind. We will not try to repair another domain's incoherent
analogy here; but, please, let us not import it where it just sows still
more confusion in an already confused terrain: Refereed-research-article
authors (unlike the authors of most other forms of "written speech")
are not interested in earning access-royalties from the sale or use of
their words. They just want their words *used,* as much as possible. (That's
"research impact.")  But to use their words is not to modify their 
(the code) and then re-issue them, perhaps as the modifier's own. To use
their words is to use their *content*, by incorporating that content
into the user's own content, in his *own* words, with proper source
attribution, so as to produce another text, another "written speech."

It would be nice if all programmers were willing and motivated to make
all their code free, not just for use, but for modification too. It would
also be nice if the writers of all words were willing and motivated to
make their words free, not just for use, but for modification too. But
alas humans and their egos are monadic, not distributed and diffuse,
and their motivation is usually local, and quid pro quo. So there will
always be programmers who program only if it pays, and they may want the
credit as well as the first-dibs at modification and development. Nolo
contendere there. 

But the same is true of writers. Some will always want to be paid for
access to their words, and virtually all will want to keep their own
words as their own.

Refereed-article writers, however, don't want to be paid for access to
their words, because access-tolls reduce the usage of their work, which
is what they really want to maximize (because that research impact is
what brings them their rewards, both financial and
scholarly/scientific). Because the words are in natural language, there
is no question of researchers concealing their code (if they choose to
publish at all). But what they want you freely using is its *content*
(with proper attribution). There is no question of modifying its form. As
software does not have this form/content duality, the analogy simply
does not apply; it is incoherent.

> The Free Software Foundation defines these freedoms as:
> * The freedom to run the program, for any purpose (freedom 0).

Inapplicable to text: "Running the program" is accessing the text.

> * The freedom to study how the program works, and adapt it to your needs
> (freedom 1). Access to the source code is a precondition for this.

Irrelevant to text. You may study and use the content of my (giveaway,
refereed-article) text (with attribution) in any way you like, and you
may quote it (with attribution). That's all. And there all analogy
between text and software ends. 

There are also many new software-based uses (indexing, search,
navigation, digitometric analyses) that one can make of online text,
which refereed-article authors also welcome, but the big hurdle is free
full-text access, and not these perks, which will come with the territory.

But no reprocessing of *my* text code in order to turn it into *your*
text code (other than via its content, as processed by your brain)!
(And remember that data, and data-processing, are not part of
refereed-article text.)

> * The freedom to redistribute copies so you can help your neighbor
> (freedom 2).

Moot for text, when all you need redistribute is the URL of its toll-free
full-text online.

> * The freedom to improve the program, and release your improvements to the
> public, so that the whole community benefits (freedom 3). Access to the
> source code is a precondition for this.
> (see )

Irrelevant to refereed-article text. You may improve on the content, in
text of your own, with proper attribution. (And again, data re-analysis
is an orthogonal matter.)

> This philosophy fits exceptionally well with the needs of the scientific
> community to share and build on each other's research, which is why very many
> academic software development projects are developed using an open source
> model.

Scientific *software*. But we were talking about scientific-article
*text*, and this was supposed to be an analogy! There is no counterpart
to collective software development at the article-code level. It is only
content that the scientific community develops collectively, and even
that, while tracking attribution through citation.

Nor did the collective, cumulative use of scientific content require any
cues from the software community! Open-source *content* has been the
rule with scholarship for centuries: That's why scholars *publish*. The
new question is only about access to their content (via their text)
online. Please let's not forget or obscure that fundamental new question
in this welter of free-associative digital analogies of doubtful
relevance and coherence.

> BioMed Central's policy of Open Access is based on giving the scientific
> community a similarly broad freedom to make use of the research articles
> that we publish. 

The scientific community already has the freedom to make use of
published articles. What it lacks is toll-free access to their texts!

> This includes giving access to the structured form of the articles, 

We're back to XML mark-up again: a perk, a welcome perk, but we first,
and far more urgently, need the basics, namely, toll-free access to the
full-text. Please let us focus on that, rather than getting side-tracked
onto perks, especially those that make it seem as if free access were
somehow not enough, somehow not "truly open." We don't have free access
today. We don't need advice on the short-comings of free access; we need
help in getting free access, as soon as possible.

> and giving the right to redistribute and create derivative works
> from the articles.

I've already replied to this in an earlier posting: When the full-text
is online and toll-free, the only relevant mode of "redistribution" is
to distribute the URL. Ditto for "derivative works." Quotes, as 
require attribution. And text without attribution may be neither 
nor modified. So what is really the point here?

> This isn't just a philosophical issue - it has practical implications:
> e.g. in the August 14 issue of Nature (Vol 424 p727), Donat Agosti, from the
> American Museum of Natural History, New York, laments the fact that the
> database of ant taxonomy is missing much critical
> information because a large fraction of all descriptions of new ant species
> are covered by publisher copyright.

I couldn't follow this. If the database is toll-free, the database is
toll-free. If making the database useful requires toll-free access to
the full-text of refereed-articles, then the full-text of
refereed-articles needs to be made toll-free! We knew that already!
What is the point of all these further free-associations and free-floating
analogies? We are running in circles instead of breaking out of the

> In a true Open Access environment, not only could Antbase link to the
> articles on the publisher's web site, but it could also make use of the images
> and the text within those published descriptions to compile a universal 
> authoritative catalog of Ant taxonomy.

Translation: We need free access not only to the database, but to the
full-text. This can be clearly seen without conflating the two. (Please
jettison this "true open access" locution, or save it for when we have
universal false-but-toll-free full-text access, and we have nothing
more urgent left to do than to optimize it further. My guess is that
the rest will already have come with the territory of its own accord. But
please, let's go for the territory, before the "truth" -- see Keats
quote at end).

> Finally, to respond to Sally's point questioning the benefits of 
> deposition in a standard repository:

I re-read Sally Morris's point, and I now see that (in agreeing on #5)
I misconstrued it as addressing only the trivial differences between
the types of "databases" -- "archives," "repositories": how we unfailingly
prefer to fuss with and multiply terminological trivia instead of
staying focussed on matters of substance! -- in which a full-text might
be deposited (e.g., Eprints vs Dspace, or central vs. institutional). I
now realize that Sally was referring there to BioMedCentral's (BMC's)
[requirement? recommendation?] that BMC authors archive their BMC
full-texts in an open-access database such as PubMed Central. Hence what
my reply to Sally should have been was this:

>sh>    5) Whether the item and/or its metadata are deposited in certain
>sh>    types of databases (this last seems to me supremely irrelevant)

         I agree it's irrelevant, if by "certain type" you mean, say,
         Eprints vs. Dspace.
         But it's certainly not irrelevant whether the item (full-text)
         is deposited in *some* type of database *at all*, for if it
         is not deposited in a free-access database of *some* type,
         it is not free access!

         Whether that database type is institutional and distributed,
         disciplinary and central, or the toll-free access database of an
         open-access or a toll-access publisher is an implementational
         and strategic matter. And whether or not that database is
         OAI-compliant is a matter of functionality and efficiency
         (OAI-compliant databases greatly preferred!).

> Although theoretically it might not matter where something is available, or
> in what format, it should be clear that in practical terms these are
> absolutely vital issues.  

Absolutely vital *relative to what*? In practical terms, we do not
have free full-text online access to most of the refereed literature
(2,000,000 annual articles, in 20,000 refereed journals) today. What
is absolutely vital is getting that free access, now, and putting an
end at last to the needless daily impact-loss that continues until that
happens. Whether that free access is via this type of archive or that,
and has or lacks these perks or those, is certainly not the absolutely
vital issue today. On the contrary, foregrounding such minor details
when we still lack the basics, and thereby raising the goal post for
what we should all be aiming for, slows and diverts rather than speeds

Free access, now! Never mind the rest until we have those long-overdue
basics in hand, at last!

> So for example, theoretically, every DNA sequencing
> lab could put up its own web page and make available the sequences they
> themselves have obtained, using their own choice of format. The scientific
> community would thereby have free access to all those DNA sequences. 

Correct. And this has absolutely *nothing* to do with the free-access
movement, which is about toll-free access to the 2M articles in the 20K
toll-access journals, not about data-archiving, which is a parallel but
independent development that proceeds apace, and does not need
free-access's (or publishers') permission! (Data-archiving, on the
other hand, might help accelerate article-archiving!)

> But in
> fact, the deposition of all DNA sequences in a standard format with Genbank
> has a truly enormous benefit in practical terms, and has served as a 
> foundation for the development of tools to mine the genome. PubMed Central's
> role as a repository for biomedical research articles is very much
> analogous to Genbank's role as a repository for DNA sequence data.

An archive is an archive. There is an analogy (as well as a
complementarity) between data-archives and article-archives, but the
big difference is that both data archiving and data-archives are (1)
new, and (2) do not have a prior tradition and current status quo of
being non-free, whereas articles are (1) old, and (2) do have a prior
tradition and current status quo of being non-free. Publishers' relatively
new toll-based online article-archives are also non-free. So the relevant point
about article archiving is that article-archives should be free.
    "that is all ye know on earth, and all ye need to know"

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open
access to the peer-reviewed research literature online is available at
the American Scientist September Forum (98 & 99 & 00 & 01 & 02 & 03):

Discussion can be posted to: september98-forum AT 

[BOAI] Re: Free Access vs. Open Access

From: Stevan Harnad <harnad AT>
Date: Mon, 27 Oct 2003 14:09:34 +0000 (GMT)

Threading: [BOAI] Re: Free Access vs. Open Access from harnad AT
      • This Message

Re: Free Access vs. Open Access
On Mon, 27 Oct 2003, Jan Velterop wrote:

> If online material is 'open' in the sense of 'free' that is of course a
> great step forward, but if it's only available in pdf...
> that is decidedly sub-optimal...
> Not being optimal... shouldn't be an excuse for not making freely 
> (self)archiving ...but
> ...we shouldn't lose sight of the ultimate... goal, 'open' access 
> (as defined in the Berlin Declaration, the Bethesda principles, 
> by Wellcome, PLoS, BioMed Central, and others) as opposed to merely 
> 'free' access. 
> It doesn't help to be sub-ambitious

I can only repeat that the open/free distinction is a red herring, no
matter how often it is invoked formally and informally.

These matters have not been thought through rigorously; and things
decided in haste have been invoked ever more solemnly without having
been examined for their usefulness or even their coherence.

Please, let's not lose sight of the problem, which is still there,
as pressing as ever, but now being kept at a distance by yet
*another* groundless, confusion-generating, and -- most important --
*inaction-encouraging* reservation.

The problem -- it can never be repeated often enough, apparently -- was
and is this: There are 24,000 journals publishing 2.5 million articles
per year, most of them not accessible to most of their potential users
worldwide because of access-tolls. This was also the problem in the
paper era, but in that medium there was no solution because of the
true costs and limited power of paper. (One could not diffuse paper
over the airwaves, let alone data-mine it!)

Now we are in the online era, which offers many new possibilities,
including online data-mining. But the *relevant* possibility -- relative
to what we do have, and what we still lack, *exactly* as we lacked it in
the paper era -- is the possibility of toll-free access to the full-text
online. That is what was missing then, and that very same access is
missing now.

So what do we do? We start to talk about this *absent* access as *not
enough*, "sub-optimal," not the "ultimate goal"! 

This is rather like declaring (while still sitting in the total darkness):

    "Let there be light -- but let it not be just be the good old
    sunlight we've been deprived of for centuries, but voice-activated,
    computer-controlled, fluorescent/incandescent light, 100K Lux!"

So if someone proposes: "Why don't you just open the curtains and let in
the sunlight?" the reply is "It doesn't help to be sub-ambitious"!

Considering the actual circumstances -- the curtains being still
closed, and most of us still sitting in the dark -- it does seem
rather impractical to be referring to the call to open the curtains
as sub-optimal and sub-ambitious while that simple act is still not
being performed, even though it could be, immediately, because people
still don't understand that it can be, nor what advantages it will
bring. Instead, we get ahead of ourselves, and fixate on the advantages
it will *not* bring!

Yet even those alleged shortfalls are spurious! (See the prior postings
on this thread.) Free online access to the full text (even if only
PDF!) still means being able to do *everything* one could do with paper
(if one could afford the access-tolls!), including reading the printed-off
version! But there is also on-screen browsing, reading and navigation,
downloading, storing, forwarding, *and* the capacity to convert
automatically to html or ascii for text-mining. Free-access online texts
are also harvestable and harvested, invertible and indexable, hence
boolean-searchable and otherwise navigable, singly and collectively. Not
to mention collectible into global virtual archives like oaister, the
google of the refereed research literature.
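
To make the "invertible and indexable, hence boolean-searchable" point
concrete, here is a minimal sketch in Python. It assumes the full texts
have already been converted to plain ascii; the three sample texts, the
document ids, and the function names are hypothetical illustrations, not
the method of any actual harvester or archive.

```python
import re
from collections import defaultdict


def build_inverted_index(texts):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Return the ids of documents that contain every one of the terms."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


# Hypothetical plain-text full texts (assumed already converted from PDF).
texts = {
    "a1": "Toll-free access increases research impact.",
    "a2": "Access tolls reduce research usage.",
    "a3": "Ant taxonomy depends on published descriptions.",
}

index = build_inverted_index(texts)
hits = boolean_and(index, "research", "access")
```

Here `hits` contains the ids of the two sample texts that mention both
"research" and "access". Nothing in the sketch needs more than the
toll-free full text itself.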

Data-mining? First, let us not forget that the text of a journal article
usually does not contain the empirical data on which it is based (in
part because it would have been too expensive to publish all those data
on paper in the paper era!), only the summary tables and analyses. So
the empirical data were and still are a separate database -- one that
should likewise be made freely accessible, alongside the refereed article
literature, certainly, but that is a separate matter, not to be conflated
with the open-access movement's first, second and third goal, which is
to free access to the refereed article literature!

What quantitative data do appear in the text of an article are just text,
like the rest of the article. Even in XML format, the problems of how
to make generic data interoperable remain to be worked out, so let us
not delay opening the curtains on that account either!

> 'free' will come in the wake of the open access movement, but
> I doubt if the reverse is true.

It's not the "wake" that's the problem, but the *wait* (and, yes, it is
indeed beginning to feel ever more funereal!).

Since "free access" is one of the necessary conditions for fulfilling
the definition of "open access", it is tautological that 
"free" will
follow (!) "open": It "follows" it logically, as surely as 
"p" follows
"not-not-p"! But the crucial question is *when* will we have 
For we can already have "free" now, if we just open the curtains! 

So in real-time, we can already have "free" now (by each of us 
self-archiving, one by one, of all 2.5 million of our annual research
articles), whereas to have "open" others first have to create or convert
23,500 more open-access journals, one by one.

I'd rather not wait for that, to get access to the sunlight. 

The reverse is true? Getting free access now will reduce the probability
of later getting any online text-mining powers we may be missing? I would
like to see the logic of that sketched out explicitly, for I don't see it
at all! Surely a free, online, full-text corpus will inspire all manner
of further online optimizations sooner than sitting and waiting for them
in the dark!

I know Jan is not actually recommending that we wait in the dark! But
the fact is that we *are* waiting in the dark, and what we really need
is elucidation of the feasibility and benefits of the opening of the
curtains that is within our immediate reach. It does not help to
encourage our Zeno's Paralysis by suggesting that letting in the light
would merely be "sub-optimal"!

> analyse, data-mine, and text-mine... it has to be in a
> machine-readable format.

Please see the above on the distinction between data-mining and
text-mining, the difference between article texts and their empirical
data, and on the convertibility of PDF to html and ascii. (PDF is a red
herring anyway, for authors with data tables can also self-archive the RTF
and HTML, and eventually no doubt also a vanilla XML generated from
their own word-processor version.)

> Open Access really is more than just an economical goal (although it goes
> without saying that being able to access literature without having to be at
> an institution that can afford the access tolls helps enormously).

Please translate this into the terms of the "let there be light" analogy
introduced here as an intuition pump for the toll-free access that was, is,
and will remain the essence of the open-access movement.

    "Toll-free access to the refereed literature is more than just an
    economic goal"

Of course! And also *other* than an economic goal. It is the toll-curtains
that keep us in the dark, but the light can be let in by opening the
curtains on our own work without changing the overall economics: what we
need and want is light, not economic-change. (It is open-access journals
that offer the light through economic change.)

    "(although it goes without saying that being able to access 
     the refereed literature toll-free helps enormously)"

Is this not another tautology (once it has been freed of the
spurious open/free distinction that I have tried to show to
be functionally empty above)?

> Perhaps the difference in approach between open access publishing and
> self-archiving, while both working in parallel to strengthen one another, is
> the sense of priority of a qualitative (in terms of usability) versus a
> quantitative one. 

Not at all. Both approaches yield *exactly* the same thing, both
quantitatively and qualitatively: toll-free access to the digital
full-text of all refereed journal articles online. The difference is
that one approach requires the founding, funding and filling of 24,000
new open-access journals with the digital texts of the yearly 2,500,000
articles, whereas the other requires only the founding, funding and
filling of institutional eprint archives with the digital texts of the
yearly 2,500,000 articles. You will find that the number, cost and
difficulty of the respective founding/funding/filling steps is far lower
for the one than the other, both quantitatively and qualitatively:

Stevan Harnad

Dual Open-Access Strategy:
    BOAI-2: Publish your article in a suitable open-access journal
            whenever one exists.
    BOAI-1: Otherwise, publish your article in a suitable toll-access
            journal and also self-archive it.

> > -----Original Message-----
> > From: Stevan Harnad [mailto:harnad AT]
> > Sent: 24 October 2003 18:20
> > Subject: [Manifesto] Re: Open Access and Humanities Monographs
> > 
> > On Fri, 24 Oct 2003, Stefan Gradmann wrote:
> > 
> > > [Willard,] as you state, the online version of a
> > > book is not satisfying (and this already has caused the 
> > death of the rather
> > > silly e-book paradigm), and thus self-archiving of book 
> > material (even if it
> > > was available for the authors) may not be a solution at 
> > all. Open access to
> > > electronic information only gets attractive in our context once 
> > > material is published in a way that is appropriate to the 
> > > environment and that makes use of its innovative potential in a 
> > > PDF-documents modeled on the printing analogy simply don't!
> > 
> > I *completely* disagree! Consider the following (I think much more
> > realistic) logic:
> > 
> > (1) It is a *good* thing that online access to full-text monographs is
> > not as attractive as having the book on paper. That removes one
> > prima-facie obstacle to self-archiving them and thereby providing 
> > access for those who cannot afford to buy the monograph yet 
> > might still
> > make some use of the text!
> > 
> > (2) Once open access -- reminder: that means toll-free 
> > full-text online
> > access for anyone on the web -- becomes widespread for 
> > monographs, there
> > will be much more motivation for designing ways to make online access
> > more convenient, useful, effective.
> > 
> > It makes no sense whatsoever *not* to self-archive a monograph merely
> > because online access may not be optimal! It's certainly 100% better
> > than no access! (This reasoning is simply the flip-side of the 
> > self-paralytic reasoning that they should not be self-archived because
> > they *would* be preferred over the paper version! At least the latter
> > would have a publisher, and possibly a royalty-seeking author 
> > to endorse
> > the reasoning; but with the online-is-nonoptimal argument it is 
> > a rationalization for inaction! No losers; no winners.)
> > 
> > >wm> "Open" is a word like "free", whose meaning and import
> > >wm> greatly depends on the preposition that implicitly 
> > > 
> > > You are perfectly right in pointing out some facets of the 
> > connotation aura
> > > of a term like 'open' (and much more could be said here) - 
> > I would only like
> > > to add that the same kind of reflexion could be made 
> > regarding the term
> > > 'access' which may have very different connotative values 
> > depending on
> > > whether you use it with a 'text culture' or with an 
> > 'empiristic' background
> > > ...
> > 
> > It is here that I feel that we non-hermeneuticists and 
> > non-semioticians
> > may have a bit of an advantage, in not getting too wrapped up in
> > far-fetched connotations. Here is a black and white distinction:
> > 
> > (1) 2,500,000 articles in 24,000 journals can only be read 
> > online if the
> > user's institution can afford to pay the access tolls.
> > 
> > (2) Open access means being able to do the same thing as those lucky
> > users, but without having to be at an institution that can afford the
> > access tolls.
> > 
> > Open access is not about access to the printed edition. (But 
> > the online
> > edition can always be printed off, if one wishes.)
> > 
> > No philosophical problem. It is clear what we do not have 
> > now, and what
> > we would have if there were open access to the journal article
> > literature. Ditto for the monograph literature. (And note that nothing
> > was said about the superiority or even parity of online 
> > access compared
> > to on-paper access for monographs. It's only about tolled
> > vs. toll-free online access.)
> > 
> > Cheers, Stevan
> > 
