[BOAI] Formats for electronic dissemination

From: Dario Taraborelli <tarabore AT>
Date: Mon, 27 Oct 2003 17:55:55 +0100 (MET)

[Apologies: slightly OT]

As a contribution to the discussion about appropriate formats for
self-archiving, I would like to point out that a much more fundamental
issue has so far been underestimated (at least to my knowledge): the
question of the overall accessibility and interoperability of FORMATS for
archived documents.
This seems to be - at least prima facie - much more urgent than the
problem of which formats allow full-text data mining
(I confess that I don't thoroughly understand the problem with PDFs,
since PDF documents can be indexed by search engines as easily as HTML
documents: it doesn't look like an insuperable technical problem).

As far as I know, the current version of Eprints allows users to deposit a
document in open formats (like HTML, PDF, PS, DVI etc.), semi-open formats
(like RTF) or proprietary formats (like DOC, PPT, XLS etc.).
Correct me if I'm wrong: I have never found any clear statement in the OAI
discussion lists about which formats are appropriate for electronic
archiving and which formats should be avoided. Still, there is a huge
debate in the digital library community about how to guarantee the
accessibility and longevity of electronic content: one of the main
recommendations is that institutions involved in the dissemination of
electronic documents should begin to strongly discourage the use of
proprietary standards and promote the use of accessible and public
standards.
[See for instance UNESCO's Preliminary Draft Charter on the Preservation
of the Digital Heritage -

Consider the following scenario: a growing number of electronic documents
are deposited in open archives in proprietary formats, and one day the
software/plugin required for displaying such formats is suddenly no longer
available (this is already the case with older versions of existing
document formats). The result is that a considerable part of the online
papers made available through Open Access Archives will simply become *no
longer accessible* for technical reasons.

To put it another way, does it make sense to promote *toll-free access* to
electronic papers without considering the crucial but often ignored issue
of ensuring *format accessibility* for this content?

Do we have any statistics about formats used in open access archives?
Has this general issue ever been raised within the OAI community (if it
has, my apologies: could someone please give me some pointers)?  If not,
don't you think it is urgent to work through this kind of problem?
Dario Taraborelli

Institut Jean-Nicod
1bis, avenue de Lowendal
F-75007 Paris
+33 (0)1 53593294

taraborelli AT

