Budapest Open Access Initiative: BOAI Forum Archive
Re: [BOAI] Formats for electronic dissemination
From: Radu <radu AT monicsoft.net>
At 11:55 AM 10/27/03, Dario Taraborelli wrote:
>(I confess that I don't thoroughly understand the problem with pdf's,
>since pdf documents can be indexed by search engines as easily as html
>documents: it doesn't look like an insuperable technical problem).

There's something else about archived pdfs, much worse than the relative inaccessibility of the semantics for their content, and that's image-based text. I have seen many journal archives which simply dump page scans into pdf format. The resulting documents are huge and totally impenetrable by current classification/data mining tools. It's even impossible to copy/paste text out of these 'archives'.

Yours,
Radu

--
Eastcree.org project
Carleton University
www.monicsoft.net/proj/creeTime.html
(613) 520-2600x2174