Budapest Open Access Initiative: BOAI Forum Archive

RE: [BOAI] Launch of SPARC Europe Seal for OA (standards, license and metadata)

From: "David Prosser" <david.prosser AT>
Date: Tue, 13 May 2008 17:33:30 +0100

I think that you are downplaying what data- and text-mining have already
achieved.  There have been some very interesting results from text-mining
just the abstracts of papers in Medline (and abstracts have been used
because the researchers did not have access to all of the full text).  Also,
in the humanities, text-mining has been used to good effect using
newspapers as the 'mines'.  (And to head off a tangent - I know that access
to newspapers is not the aim of OA, but I mention it to show that results
are possible in the humanities.)  Yes, this is an embryonic field, but one
that is already providing results.

So, would an individual working in a commercial setting be allowed to
download copies of all of the papers in your journal and host them locally,
reformat them, and run text-mining programmes over them?  If not, whose
permission would they need:  yours as publisher or that of each author?  If
they needed permission and wanted to run their programme across 100,000 papers
from 2000 publishers, how would they go about getting that permission?

Best wishes


-----Original Message-----
From: owner-boai-forum AT
[mailto:owner-boai-forum AT] On Behalf Of Prof. Tom Wilson
Sent: 11 May 2008 15:45
To: BOAI Forum
Subject: RE: [BOAI] Launch of SPARC Europe Seal for OA (standards, license
and metadata)

Data mining is usually defined as searching for hitherto unrecognized patterns
in collections of data, employing a variety of statistical and AI techniques.
To do this, one needs collections of DATA sets, not simply collections of
papers. Papers are predominantly text and the treatment of text to discover
relationships is not only in its infancy but probably embryonic - the problems
are the usual ones faced by AI - texts, particularly a collection of texts such
as the entire contents of a journal, are notoriously difficult to carry out any
automatic extraction process upon.  Even information retrieval after about [...]
years of development has still not cracked the fundamentally ambiguous nature
of human language.

However, let us suppose that some machine exists that could take the entire
contents of an OA journal and somehow "mine" it, so that the researcher could
discover relationships among concepts of which s/he was previously unaware. How
is this different from the human being carrying out a literature review to do
exactly the same thing?  In neither case is there any infringement of copyright
- research is published precisely to enable this kind of analysis and the
consequent further progression of the field of knowledge. At the level of data
within a document, consider the publication of a new way of calculating a
statistical measure that makes that measure more useful in certain
circumstances to do with the nature of the sample population: and suppose this
datum is discovered in a search (by man or machine) - am I banned from using it
because it is in a copyrighted document?  Of course not. I am banned from
passing it off as my own discovery, but not from using it.  It was published
with the intention that it should be used.

The notion, therefore, that only the BY-CC licence can aid "data mining" -
whatever that turns out to be beyond the usual hype - is untenable. To base the
Seal upon this perception, therefore, is misguided.

Professor T.D. Wilson, PhD, Hon.PhD
Publisher/Editor in Chief
Information Research
e-mail: t.d.wilson AT
Web site:

Quoting David Prosser <david.prosser AT>:

> The confusion for me (and this may just be my misunderstanding) comes from
> the way in which people do data-mining.  It is not just a question of
> searching across a range of articles.  Many data-miners want to copy the
> articles onto a local computer, possibly re-format them so that they are in
> a standard form, and then perform the data-mining.  It is not clear to me
> that a researcher at a commercial organisation could do that to papers that
> are published under a non-commercial license.  If that is so, then they
> would need to contact either all the publishers individually (in the best
> case) or each author (in the worst case).  Surely this would not be
> practical.
> (I would welcome comment from either copyright lawyers or data-miners to
> tell me if I have this wrong.)
> David 
> -----Original Message-----
> From: owner-boai-forum AT
> [mailto:owner-boai-forum AT] On Behalf Of Andras Holl
> Sent: 08 May 2008 15:22
> To: boai-forum AT
> Subject: RE: [BOAI] Launch of SPARC Europe Seal for OA (standards, license
> and metadata)
> V. Sasi Kumar wrote:
> > I have a small doubt here. Can we say that searching a journal is a
> > commercial use of the journal?
> No, you are right, I have to clarify my position. Doing the
> search is not commercial, and there is no need for permission
> if they use the full-text search facilities provided by the
> journal. Nor is there any need to get permission if the journal
> permits transfer of its whole content. Permission is needed only
> where there is no such permission already and they want to transfer the
> whole content to their site for data mining, in my opinion. Also, most
> journals provide their full-text search or other facilities for the
> individual user, for "normal" use. It is advisable, at least, for a
> commercial company (or anyone who is capable of that) to contact the
> journal publisher if they want to run "industrial"
> searches or downloads.
> In my opinion, there is a difference between the "normal" or
> "average" or "fair" use - when an individual researcher pursues his/her
> own scholarly agenda - and a commercial company doing something
> on an industrial scale, and for profit. If nothing else,
> such an activity could overload the server and potentially deny other
> users access to the content.
> On the other hand, whether there is a need for permission from the
> authors when a company would use their work - if they use the
> result, the conclusion, then probably not. If they use the
> paper as a whole, a table, or a figure reproduced in a product which
> is for sale, then I would say yes.
> Andras Holl
