XML and Information Retrieval: a SIGIR 2000 Workshop
David Carmel, Yoelle Maarek, Aya Soffer
IBM Research Lab in Haifa

Introduction

XML, the eXtensible Markup Language, has recently emerged as a new standard for data representation and exchange on the Internet. It is believed that it will become a universal format for data exchange on the Web and that in the near future we will find vast amounts of documents in XML format on the Web. As a result, it has become crucial to address the question of how large collections of XML documents can be stored and retrieved efficiently and effectively. To date, most work on storing, indexing, querying, and searching documents in XML has stemmed from the database community's work on semi-structured data. An alternative approach, which has received less attention to date, is to view XML documents as a collection of text documents with additional tags and relations between these tags. IR techniques have traditionally been applied to search large sets of textual data and should thus be extended to encode the structure and semantics inherent in XML documents. Integrating IR and XML search techniques will enable more