12.01.2015 Views

Download - Academy Publisher


main types of horizontal search environment: web search and desktop search (though the areas overlap, and systems may address these scenarios as well).

The major challenge faced by enterprise search is the need to index data and documents from a variety of sources, such as file systems, intranets, document management systems, e-mail, and databases, and then present a consolidated list of relevance-ranked resources from these sources. In addition, many applications require the integration of structured data as part of the search criteria and when presenting results back to users. Access controls are also vital if users are to be restricted to the data and documents to which they have been granted access by the various document repositories within the enterprise. These major challenges are unique to enterprise search.
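The consolidation step described above can be illustrated with a minimal sketch. It assumes each source returns hits with a raw relevance score on its own scale, and it uses min-max normalization to make the scores comparable; that normalization is an assumption chosen for simplicity, not a method prescribed here. The `Hit` type, the source labels, and the document ids are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # hypothetical repository label, e.g. "email"
    doc_id: str
    score: float  # raw relevance score, on a source-specific scale


def normalize(hits):
    """Min-max scale one source's scores into [0, 1] (an assumed, simple choice)."""
    if not hits:
        return []
    lo = min(h.score for h in hits)
    hi = max(h.score for h in hits)
    span = hi - lo
    return [
        Hit(h.source, h.doc_id, 1.0 if span == 0 else (h.score - lo) / span)
        for h in hits
    ]


def consolidate(results_by_source, limit=10):
    """Merge per-source result lists into one list ordered by normalized score."""
    merged = []
    for hits in results_by_source.values():
        merged.extend(normalize(hits))
    # Sort by descending score; break ties deterministically by source and id.
    merged.sort(key=lambda h: (-h.score, h.source, h.doc_id))
    return merged[:limit]


results = {
    "file_system": [
        Hit("file_system", "budget.xls", 12.0),
        Hit("file_system", "notes.txt", 3.0),
    ],
    "email": [Hit("email", "msg-17", 0.9), Hit("email", "msg-42", 0.4)],
}
for hit in consolidate(results):
    print(hit.source, hit.doc_id, round(hit.score, 2))
```

Min-max scaling treats the best hit of every source as equally relevant, which is rarely true in practice; a real deployment would likely need a more careful way of comparing scores across repositories.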

C. Internet versus enterprise search<br />

The Internet and enterprise domains differ fundamentally in content, user behavior, and economic motivations.

• The notion of a "good" answer: in Internet search, a good answer is vaguely defined in terms of relevance and popularity. In enterprise search, there is a "right" answer, usually a specific document.

• Social forces behind the creation of Internet and intranet content: Internet content is created to attract and hold the attention of a specific group of users, and it reflects the collective voice of many authors who are free to publish. Intranet content is created to disseminate information. There is no incentive for content creation, and not all users may have permission to publish content.

• Security and copyright: Internet content includes public-domain material, and the user holds the copyright. Enterprise content is governed by access privileges, and the corporation usually holds the copyright.

• Deployment environments: an Internet search engine is controlled by one organization and offered as a service. An enterprise search engine is licensed to and deployed by a variety of organizations in diverse environments.

III. SECURITY REQUIREMENTS AND INFRASTRUCTURE OF TWO SEARCHES

On the public World Wide Web, there is not much of a connection between security and search. You may have logins for certain web sites, and you may use HTTPS when buying something with your credit card, but that has little to do with a general search that you might run on Google or Yahoo. A few subscription sites do limit what non-subscribers can see, or require an account to read the full text of a document [7].

Enterprise security can be heavily tied to search. In larger organizations, your company login may be used to restrict which documents you can see in the search results list. Access control, usually in the form of an access control list (ACL), is often required to restrict access to documents based on individual user identities. There are many types of access control mechanisms for different content sources, making this a complex task to address comprehensively in an enterprise search environment.
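As an illustration of ACL-based restriction of a results list, the following minimal sketch filters an already ranked list down to the documents a user may see. It assumes each document carries a set of permitted principals (user or group names) and that group membership is available as a simple mapping; the `Document` type, the `groups_by_user` mapping, and all names in the sample data are hypothetical. Real repositories expose their ACLs in different ways, which is the complexity noted above.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    # Principals (user names or group names) allowed to read the document.
    acl: set = field(default_factory=set)


def principals_for(user, groups_by_user):
    """Return the user's own identity plus every group they belong to."""
    return {user} | set(groups_by_user.get(user, ()))


def filter_by_acl(ranked_docs, user, groups_by_user):
    """Keep documents whose ACL shares a principal with the user, preserving rank order."""
    allowed = principals_for(user, groups_by_user)
    return [doc for doc in ranked_docs if doc.acl & allowed]


groups_by_user = {"alice": ["finance"], "bob": ["engineering"]}
ranked = [
    Document("q3-forecast", {"finance"}),
    Document("design-spec", {"engineering", "alice"}),
    Document("hr-review", {"hr"}),
]
visible = [d.doc_id for d in filter_by_acl(ranked, "alice", groups_by_user)]
print(visible)  # ['q3-forecast', 'design-spec']
```

This sketch filters after ranking, which keeps the example short. A document with an empty ACL is hidden from everyone, a deny-by-default choice made for this sketch.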

Generally speaking, the security infrastructure and protocols tend to differ between the public web and the enterprise, and when security is needed in search, the search engine must integrate with the available security infrastructure and protocols. The Internet and intranets both use SSL and HTTPS.

But on private networks, Single Sign-On (SSO), LDAP, and Active Directory are still the norm. A search application that cares about security will likely use those protocols. A few enterprise applications still use the older "application-level security"; search engine applications can do likewise. We have not yet seen the distributed identity assurance model take root in corporations; they still seem to prefer a tightly controlled central resource. One exception to this apathy is among government agencies: they are starting to realize that government employees frequently need data from other agencies, and that distributed, cooperative security systems make this much more efficient, so that employees do not have to keep creating new logins for every agency they visit.

IV. CHARACTERIZING ENTERPRISE SEARCH<br />

It is well known that enterprise information has various characteristics, such as text content in electronic form, diversity of content sources and formats, source access, structured and semi-structured search, federated search, content management, and people and behaviors.

Beyond the difference in the kinds of materials being indexed, enterprise search systems also typically include functionality that is not associated with the mainstream web search engines [6]. These include:

• Adapters to index content from a variety of repositories, such as databases and content management systems.

• Federated search, which consists of (1) transforming a query and broadcasting it to a group of disparate databases or external content sources with the appropriate syntax, (2) merging the results collected from the databases, (3) presenting them in a succinct and unified format with minimal duplication, and (4) providing a means, performed either automatically or by the portal user, to sort the merged result set.

• Enterprise bookmarking, collaborative tagging systems for capturing knowledge about structured and semi-structured enterprise data.

• Entity extraction that seeks to locate and classify elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.

• Faceted search, a technique for accessing a collection of information represented using a faceted classification, allowing users to explore by filtering available information.

• Access control, usually in the form of an access control list (ACL), is often required to restrict access to documents based on individual user identities. There are many types of access control mechanisms for different
