
WWW/Internet - Portal do Software Público Brasileiro



IADIS International Conference WWW/Internet 2010

proceed to the next screen that can be seen in figure 5. Here we embed, in applet form, the web site crawler described earlier. The crawler is simplified to the basics; however, the user still has a number of 'Advanced' options. All the necessary settings have default values, so all the user has to do is provide the starting URL of the web site. Instructions are provided in case the user wants to eliminate parts of the site from the crawl, or specify unwanted pages or files. By default the crawler looks only for pages within the same server and under the specified homepage. During the crawl the user can see real-time statistics for the ongoing crawl. After the crawling is finished the data is stored in the database and the user can move on to the next step.

The user can also reach this next, intermediate 'Available Crawls' page from the homepage, by choosing to work on an existing crawl. From this page the web master can see a table containing information on all the web site instances stored during previous sessions, and can also edit basic, descriptive information about these instances or delete them completely. What is important is to select from this page the crawl to work with (from here on referred to as the 'active crawl'), by clicking on the corresponding radio button. This action activates a specific web site instance and allows the user to begin interacting with it. A new, relevant submenu appears on the right part of the tool, titled 'Active Crawl: crawl_name', with two options specific to the selected active crawl. The first option reads 'Visualize' and the second 'Hotlinks Home'.

Figure 6. A small part of a tree-like representation of a dummy web site, crawled by our tool. Icon index: page's 1st appearance (site's DAG); hotlink added by the user; view page's details.

By clicking the 'Visualize' link the user progresses to the web site's map visualization page.
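
The crawler's default scope mentioned above (pages on the same server and only under the specified homepage, minus any parts the user excludes) can be illustrated with a minimal Python sketch. This is not the tool's actual code: the function name in_scope, the excluded_prefixes parameter, and the choice to express exclusions as path prefixes are all assumptions made for illustration.

```python
from urllib.parse import urljoin, urlparse


def in_scope(url, start_url, excluded_prefixes=()):
    """Return True if url is on the start URL's server and under its directory.

    Hypothetical helper: the paper does not describe how its crawler
    implements this check.
    """
    start = urlparse(start_url)
    # Resolve relative links against the starting URL before comparing.
    target = urlparse(urljoin(start_url, url))

    # Same server: scheme and host must match.
    if (target.scheme, target.netloc.lower()) != (start.scheme, start.netloc.lower()):
        return False

    # "Under the homepage": the path must lie below the homepage's directory.
    base_dir = start.path.rsplit("/", 1)[0] + "/"
    target_path = target.path or "/"
    if not target_path.startswith(base_dir):
        return False

    # User-specified exclusions, assumed here to be plain path prefixes.
    return not any(target_path.startswith(p) for p in excluded_prefixes)
```

With a starting URL of http://example.org/site/index.html, a link to docs/page.html is in scope, while links to /other/x.html or to a different host are not.
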
The connectivity data stored by the crawler is fetched and a tree-like structure is created, depicting the web site. At first only the home page appears, awaiting expansion by a click on the corresponding '+' button. Each page that has children can be expanded or collapsed. Some of the web page nodes, depending on whether they are hotlinks or whether they belong to the site's directed acyclic graph (consisting of the first appearances of each page), are accompanied by appropriate symbols. In figure 6 we can see a small part of a web site's map as generated by our tool. Each page is shown in the site's map by its title tag, as retrieved during the parsing phase. If the title for any reason is not successfully retrieved, the complete URL is shown instead. The title or URL shown is a physical link to the web page that opens in a new tab or browser window, so that the user has an immediate view of each page. By clicking on the arrow next to the title or link of the page, the user is transferred to the 'Node Details' page. The index in figure 6 explains the icons that can be seen in a site's map.

In the detailed page of each link-node the web master can view a full connectivity status for the specific page. Apart from the standard information retrieved during the crawl, such as the URL, depth etc., the web master can see, at the click of a corresponding button, a detailed table with all the other appearances of the node in the site, another table with all the possible father nodes, and also all the children nodes. Finally, from this page the web master can initiate the procedure for a hotlink addition that has the current node as its father page and, as its target, a page that he or she will specify next.
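
As a rough illustration of the site map and 'Node Details' data described above, the following Python sketch expands a link graph into a tree of page appearances and marks which appearance of each page is the first. It assumes a breadth-first traversal, so that "first" means shallowest; the paper does not state the traversal order here. The names build_site_map and node_details and the tuple layout are hypothetical.

```python
from collections import deque


def build_site_map(links, home):
    """Expand a link graph into a tree of page appearances, breadth first.

    links maps each page to the ordered list of pages it links to.
    Returns (depth, father, page, is_first_appearance) tuples in
    discovery order.
    """
    appearances = [(0, None, home, True)]
    seen = {home}
    queue = deque([(home, 0)])
    while queue:
        page, depth = queue.popleft()
        for child in links.get(page, []):
            first = child not in seen
            appearances.append((depth + 1, page, child, first))
            if first:
                # Only a first appearance is expanded further, so the
                # first appearances alone form an acyclic structure.
                seen.add(child)
                queue.append((child, depth + 1))
    return appearances


def node_details(links, appearances, page):
    """Collect the connectivity data a 'Node Details' view could show."""
    return {
        "fathers": sorted(p for p, kids in links.items() if page in kids),
        "children": list(links.get(page, [])),
        "other_appearances": [a for a in appearances if a[2] == page and not a[3]],
    }
```

For a dummy site such as {"home": ["a", "b"], "a": ["b", "home"], "b": ["a"]}, page b first appears under home, and its link from a is recorded as an additional, non-expanded appearance.
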
The hotlink addition is completed with the help of a new side menu that appears when the procedure begins and guides the user through this quick process.

The 'Hotlinks Home' page is where the web master can view the hotlinks that have already been added for the active crawl, together with their status, and perform the automated addition of those already specified but not yet physically added to the site. The automatic addition uses information, already specified by the user, about the physical path of the web site's files on the web server. The source code of the links that was retrieved and stored during the crawl is also used in order to provide better formatted links. The links are appended to the bottom of the father page. Although this placement is not the desired final result, it usually only requires moving the link to a suitable place, which makes completing the changes the web master aims for easy, fast and straightforward.

To better illustrate the functionality of the 'Hotlink Visualizer' we present below the steps required for a web master to acquire the exact map of a site, perform the optimizations that he sees fit, make them effective and also maintain the older image of the web site for his own reference and comparison purposes.

1. The user begins by acquiring the initial map of the web site. He does that by choosing to start a new crawl and then using the embedded crawler as described above.
