29.05.2013 Views

RR_03_02


hierarchical structure along with additional information concerning them. Specific information that can be harvested from this may include:

• Exact location of each block, in rectangular coordinates, equivalent to its rendition in a standard browser.
• Size of each block of content.
• Type of content, e.g. text or graphics.
• Weight of content, in terms of size and placement within a page.
• Continuity information, derived from physical association in terms of geometrical collocation.
• Classification of content into a set of pre-defined classes, e.g. main story, sidebars, links and so on.
• Linkage information from the XML representation, indicating the layers of information that can be hidden at a given level of summary. The content can be represented at many levels, but more than two or three levels are unsuitable for easy navigation.
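
The attributes listed above could be held in a small record per block. The following is a minimal sketch, not the representation used in the paper: the names `Block`, `weight`, `collocated` and `summarize` are hypothetical, the weight formula (share of page area, discounted by distance from the top of the page) is an assumption chosen only to illustrate "size and placement", and the 10-pixel `gap` threshold for collocation is likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """One block of page content (hypothetical record, not the paper's format)."""
    x: float        # left edge in rendered-page coordinates
    y: float        # top edge in rendered-page coordinates
    width: float
    height: float
    kind: str       # type of content, e.g. "text" or "graphics"
    label: str = "unclassified"   # e.g. "main story", "sidebar", "links"
    children: List["Block"] = field(default_factory=list)

    @property
    def area(self) -> float:
        return self.width * self.height


def weight(block: Block, page_width: float, page_height: float) -> float:
    """Assumed weight: share of page area, scaled down the lower the block sits."""
    area_share = block.area / (page_width * page_height)
    placement = 1.0 - block.y / page_height   # 1.0 at the top, 0.0 at the bottom
    return area_share * placement


def collocated(a: Block, b: Block, gap: float = 10.0) -> bool:
    """True if the two rectangles overlap or lie within `gap` units of each other."""
    dx = max(a.x - (b.x + b.width), b.x - (a.x + a.width), 0.0)
    dy = max(a.y - (b.y + b.height), b.y - (a.y + a.height), 0.0)
    return dx <= gap and dy <= gap


def summarize(block: Block, max_depth: int = 2, depth: int = 0) -> List[str]:
    """List labels down to `max_depth` levels; deeper layers stay hidden."""
    lines = ["  " * depth + block.label]
    if depth + 1 < max_depth:
        for child in block.children:
            lines.extend(summarize(child, max_depth, depth + 1))
    return lines


# Illustrative page: a main story with a nested headline, next to a sidebar.
headline = Block(0, 0, 600, 100, "text", "headline")
main_story = Block(0, 0, 600, 500, "text", "main story", children=[headline])
sidebar = Block(610, 0, 190, 500, "text", "sidebar")
page = Block(0, 0, 800, 1000, "container", "page", children=[main_story, sidebar])

for line in summarize(page, max_depth=2):
    print(line)
```

With `max_depth=2` the headline nested inside the main story stays hidden, which mirrors the observation that two or three levels are about the limit for easy navigation; raising `max_depth` exposes the deeper layers.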

5. Conclusion

This concept paper discusses one specific pathway to the future of web page re-authoring, provided that accurate layout information is available. It is in no way a state-of-the-art survey of the possible uses of layout information; rather, it focuses on one small part of an array of possibilities. It will be interesting to discuss other possibilities in this space during the DLIA workshop.

