Budapest Open Access Initiative: BOAI Forum Archive
Re: [BOAI] Formats for electronic dissemination
From: "Michael J. O'Donnell" <michael_odonnell AT acm.org>
Dario Taraborelli wrote:

> I would like to point out that a much more fundamental
> issue has been so far underestimated (at least to my knowledge) : the
> question of the overall accessibility and interoperability of FORMATS for
> archived documents.
> This seems to be - at least prima facie - much more urgent than the
> problem of which formats allow full-text data mining
> (I confess that I don't thoroughly understand the problem with pdf's,
> since pdf documents can be indexed by search engines as easily as html
> documents: it doesn't look like an insuperable technical problem).

Of course, getting things archived is more important than the choice of format. But the format matters too, as long as we don't let the choice delay archiving. I wrote about this in 1993:

http://people.cs.uchicago.edu/~odonnell/Scholar/Technical_papers/Electronic_Journal/description.html

Well-crafted PDFs are searchable for words and phrases, but PDF is inherently a format for describing page layouts rather than texts. For example, textual structure (section headings, etc.) is much harder to extract from PDF than from HTML.

My article emphasizes the data-structural qualities of different formats, because we can evaluate these relatively objectively and definitively. Many of the crucial long-term issues, such as indefinite readability, cannot be attacked directly, because they depend on unknown future developments. But a format that is well structured technically will be easier to convert in the future, and it has an advantage in attracting a user community large enough to ensure that it will be converted when necessary.

Mike O'Donnell
The University of Chicago
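To illustrate the point about headings, here is a minimal sketch (not from the original post) using Python's standard `html.parser` module. In HTML, section headings are explicit `<h1>`..`<h6>` elements, so recovering a document outline is a matter of reading the markup. The `HeadingExtractor` class, the `extract_headings` helper and the sample markup are all hypothetical, made up for this example. A PDF page, by contrast, typically describes glyphs placed at positions on a page, so a comparable tool usually has to guess headings from cues such as font size unless the file carries optional structure tags.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collect (level, text) pairs for <h1>..<h6> elements (hypothetical helper)."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []    # list of (level, text) tuples, in document order
        self._current = None  # level of the heading being read, or None
        self._buffer = []     # text fragments seen inside the current heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H2" arrives as "h2".
        if tag in self.HEADING_TAGS:
            self._current = int(tag[1])
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._current is not None:
            # Join fragments (e.g. text split by <em>) and normalize whitespace.
            text = " ".join("".join(self._buffer).split())
            self.headings.append((self._current, text))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)


def extract_headings(html):
    """Return the heading outline of an HTML string as (level, text) pairs."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.headings


# Hypothetical sample document, invented for illustration.
sample = """
<html><body>
<h1>Electronic Journal</h1>
<p>Intro text.</p>
<h2>Formats</h2>
<p>Discussion of <em>HTML</em> and PDF.</p>
<h2>Archiving <em>issues</em></h2>
</body></html>
"""

if __name__ == "__main__":
    # Print an indented outline, one heading per line.
    for level, text in extract_headings(sample):
        print("  " * (level - 1) + text)
```

Run on the sample, this prints an indented outline ("Electronic Journal", then "Formats" and "Archiving issues" one level down). The structure comes straight from the markup, with no layout heuristics involved.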