RR_03_02
hierarchical structure along with additional information<br />
concerning them. Specific information that can be<br />
harvested from this may include:<br />
• Exact location of each block, in rectangular<br />
coordinates, equivalent to rendition using a<br />
standard browser.<br />
• Size of each block of content.<br />
• Type of content, e.g. text, graphics, etc.<br />
• Weight of content, in terms of size and placement<br />
within a page.<br />
• Continuity information, derived from physical<br />
association in terms of geometrical collocation.<br />
• Classification of content into a set of pre-defined<br />
classes, e.g. main story, sidebars, links and so on.<br />
• Linkage information from the XML representation,<br />
indicating the layers of information that can be<br />
hidden at a level of summary. The content can be<br />
represented at many levels, but more than two or<br />
three levels are unsuitable for easy navigation.<br />
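The attributes listed above could be held in one small record per block. The Python sketch below is illustrative only and is not the authors' implementation: the names `LayoutBlock`, `weight`, `collocated` and `outline`, the weighting heuristic (area share discounted by vertical position), and the 10-unit gap threshold are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LayoutBlock:
    """One block of page content with its harvested layout attributes."""
    x: float             # left edge, in rendered page coordinates
    y: float             # top edge
    width: float
    height: float
    content_type: str    # e.g. "text", "graphics"
    content_class: str   # e.g. "main story", "sidebar", "links"
    label: str = ""
    children: List["LayoutBlock"] = field(default_factory=list)

    @property
    def area(self) -> float:
        return self.width * self.height


def weight(block: LayoutBlock, page_width: float, page_height: float) -> float:
    """Hypothetical weight: share of page area, discounted the lower a block starts."""
    area_share = block.area / (page_width * page_height)
    placement = 1.0 - block.y / page_height  # 1.0 at the top, approaching 0.0 at the bottom
    return area_share * placement


def collocated(a: LayoutBlock, b: LayoutBlock, gap: float = 10.0) -> bool:
    """True if the two rectangles overlap or lie within `gap` units of each other."""
    dx = max(a.x - (b.x + b.width), b.x - (a.x + a.width), 0.0)
    dy = max(a.y - (b.y + b.height), b.y - (a.y + a.height), 0.0)
    return dx <= gap and dy <= gap


def outline(block: LayoutBlock, max_depth: int = 2, depth: int = 0) -> List[str]:
    """List block labels down to `max_depth` levels; deeper layers stay hidden."""
    if depth >= max_depth:
        return []
    lines = ["  " * depth + block.label]
    for child in block.children:
        lines.extend(outline(child, max_depth, depth + 1))
    return lines


# Example page: a main story (with a nested headline), a sidebar, and a footer.
headline = LayoutBlock(0, 0, 600, 50, "text", "main story", "headline")
story = LayoutBlock(0, 0, 600, 300, "text", "main story", "story", [headline])
sidebar = LayoutBlock(610, 0, 190, 300, "text", "sidebar", "sidebar")
footer = LayoutBlock(0, 550, 800, 50, "text", "links", "footer")
page = LayoutBlock(0, 0, 800, 600, "text", "page", "page", [story, sidebar, footer])
```

With the default `max_depth` of 2, `outline(page)` lists the page and its three direct children while the headline stays hidden, which mirrors the remark that more than two or three levels hinder navigation.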
5. Conclusion<br />
This is a concept paper discussing one specific pathway<br />
to the future of web page re-authoring, provided accurate<br />
layout information is available. It in no way represents<br />
a state-of-the-art discussion of the possible uses of<br />
layout information. Rather, it focuses on one small part<br />
within an array of possibilities. It will be interesting to<br />
discuss other possibilities in this space during the DLIA<br />
workshop.<br />