Knowledge bases, Ontologies and Key-Value Stores

cbrg.ethz.ch


Contents

1. Knowledge bases
   1.1 Concepts of knowledge bases
2. Ontologies
   2.1 Implementation of ontologies
3. Key-Value Stores
   3.1 NoSQL
4. References

Written and submitted by: Aarti Krishnan


1. Knowledge Bases

1.1 Concepts of Knowledge bases

A knowledge base can be defined as an organized repository of information consisting of concepts, data, rules and specifications for efficient knowledge management. It is a repository where information can be collected, organized, shared and searched. Knowledge bases can be classified into machine-readable knowledge bases, which largely rely on artificial intelligence (AI) or expert-system-based retrieval techniques, and human-readable knowledge bases, which comprise physical documents and textual information such as tutorials and FAQs. [1]

In machine-readable knowledge bases, the knowledge is in a computer-readable form, mainly so that automated deductive reasoning can be applied to it. They consist of a primitive set of data, often in the form of rules that describe the knowledge. These rules are then extended to deduce or derive a logical conclusion. Classical deduction can also be used to reason about the knowledge in the knowledge base. For example:

“All men are mortal. Socrates is a man” → Initial rules
“Hence, Socrates is mortal” → Extended rule and deduced conclusion

A knowledge base comprises a large amount of detail about the configuration and facts pertaining to a specific domain or subject of interest. If the domain knowledge is encoded in a computer, the knowledge-based system should be able to answer some questions that normally require human expertise. Further, the encoding should also be understandable by a human, so that it can be used, verified and expanded. In order to develop such knowledge-based systems, one needs to be able to model the knowledge efficiently and also to represent it.

Creation of a knowledge base involves understanding the concept of ontologies. An ontology is a set of terms or attributes that are assigned to objects and their interrelations, allowing the knowledge base to extract solutions.
A knowledge-based system (KBS) provides intelligent decisions with justifications and uses artificial intelligence for problem solving and to support human learning and action. It is part of an expert system that contains the facts and rules needed to solve a problem. It contains an underlying set of concepts, assumptions and rules (ontology) which a computer system has available to solve any problem. Therefore, knowledge-based systems are basically computer programs that contain large amounts of information, rules and a reasoning system for making intelligent inferences.
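The deduction in the Socrates example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: facts are encoded as (predicate, subject) pairs, rules as (premise, conclusion) pairs, and the function name `forward_chain` is hypothetical, not taken from any particular knowledge-based system.

```python
# Minimal forward-chaining sketch (illustrative encoding, not a real KBS API).
# A fact ("man", "Socrates") reads "Socrates is a man".
# A rule ("man", "mortal") reads "all men are mortal".

def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(derived):
                if predicate == premise and (conclusion, subject) not in derived:
                    derived.add((conclusion, subject))  # extended (deduced) fact
                    changed = True
    return derived


facts = {("man", "Socrates")}   # initial fact
rules = [("man", "mortal")]     # initial rule

knowledge = forward_chain(facts, rules)
print(("mortal", "Socrates") in knowledge)  # True
```

The loop stops as soon as a full pass adds nothing new, so the result contains the initial facts together with every conclusion the rules allow.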


The KBS is an extension of the relational database management system, with many additional features. It has borrowed many important concepts (such as relations, classes, attributes and keys) from relational databases and has evolved drastically from simple file systems over time (Figure 1).

Figure 1 (File System → Relational Database → Knowledge Base): Shows the extension of simple file systems into complex knowledge bases.

Maxim: “The hierarchy of the FS/RDB/KB is continuous. The higher we are; the better!”

Architecture of a Knowledge Base

Figure 2 (components shown: Rules, Definitions etc.; Knowledge Base; Inference Engine; User Interface; Query; Explanation; Conclusion; End User): The architecture of a knowledge base, comprising a database that stores all domain knowledge, an inference engine that processes all the information, and a user interface through which the end user interacts with the expert system.

As depicted in Figure 2, a knowledge base stores all domain knowledge used in the system, the documents’ metadata, and any other structural or semantic data, for example information on the object definitions and relationships. It also stores the general principles or rules that could be applied to any problem. The inference engine searches the knowledge base and applies the relevant rules and relationships when and where required. It processes all the information and also provides a logical justification for its decisions. The interface is where the communication between the user and the expert system takes place. All the queries sent and responses received are managed by the user interface.
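The three components of Figure 2 can be sketched as follows. This is a rough illustration, not a definitive implementation: the class names `KnowledgeBase` and `InferenceEngine`, the function `ask`, and the (predicate, subject) encoding of facts are all assumptions made for this example, and the rules are assumed to contain no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Stores domain facts and general rules (hypothetical encoding)."""
    facts: set = field(default_factory=set)    # e.g. ("man", "Socrates")
    rules: list = field(default_factory=list)  # e.g. ("man", "mortal")


class InferenceEngine:
    """Searches the knowledge base, applies relevant rules and records why."""

    def __init__(self, kb):
        self.kb = kb

    def prove(self, predicate, subject):
        """Return (conclusion, explanation steps); assumes acyclic rules."""
        if (predicate, subject) in self.kb.facts:
            return True, [f"{subject} is {predicate} (stored fact)"]
        for premise, conclusion in self.kb.rules:
            if conclusion == predicate:
                ok, why = self.prove(premise, subject)
                if ok:
                    step = f"every {premise} is {conclusion} (rule), so {subject} is {predicate}"
                    return True, why + [step]
        return False, [f"nothing establishes that {subject} is {predicate}"]


def ask(engine, predicate, subject):
    """User interface: passes the query on and formats the response."""
    ok, why = engine.prove(predicate, subject)
    return ("yes" if ok else "no"), "; ".join(why)


kb = KnowledgeBase(facts={("man", "Socrates")}, rules=[("man", "mortal")])
engine = InferenceEngine(kb)
answer, explanation = ask(engine, "mortal", "Socrates")
print(answer)
print(explanation)
```

Keeping the rules and facts in the knowledge base, and the reasoning in the engine, means new domain knowledge can be added without changing the engine or the interface.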


2. Ontologies

An ontology is a formal description of concepts and relationships that can exist for a particular domain of knowledge. Like a dictionary, an ontology collects and organizes related entities. It accurately describes the structure of a domain and how its entities are inter-related. The backbone of an ontology is most often a taxonomy (a classification of entities in a hierarchical form, Figure 4), but an ontology is not limited to taxonomical hierarchies alone. The structure is usually in the form of a tree (or graph), where the entities are the nodes and the edges specify the relationships between them. The graph is directed and acyclic; the terms follow an “IS A” relation with one another, and their definitions become more specialized as we move down the graph.

Figure 4 (Organism at the top; Plant and Animal on the second level; Mammal and Fish on the third level; levels linked by “IS A”): Example of an ontology (in a hierarchical form) depicting a particular domain of organisms, with terms following an “IS A” relationship.

Ontologies also enable easy sharing and re-use of knowledge by defining the entities and the relationships among them. In other words, as described by Thomas Gruber, an ontology is a description or a formal specification of the concepts and relationships that can exist for a particular domain. Also, the more formal and expressive the description, the faster it can be processed automatically, the better we can capture the intended meaning, and the easier it will be to share an ontology. [4]

In a traditional relational database, concepts can be stored using tables, but the system does not contain any information about what the concepts mean and how they relate to each other. Ontologies provide the means to store such information, allowing for a much richer form of storage. Ontologies are said to be at the "semantic" level of data, whereas database schemas are models at the "logical" or "physical" level.
Since consistent vocabulary is needed for unambiguous querying and unifying information from multiple sources, ontologies are currently being used for communication of knowledge, mainly for integrating heterogeneous databases and enabling inter-operability among different systems.
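The “IS A” hierarchy of Figure 4 can be represented as a directed acyclic graph and queried by following edges upward. The sketch below is an illustration only: the dictionary layout and the names `IS_A`, `ancestors` and `is_kind_of` are assumptions for this example, and placing Mammal and Fish under Animal is a reading of the figure rather than something the text states explicitly.

```python
# Each term maps to its direct parents: an edge child -> parent means "child IS A parent".
IS_A = {
    "Organism": [],
    "Plant": ["Organism"],
    "Animal": ["Organism"],
    "Mammal": ["Animal"],   # assumed placement, read off Figure 4
    "Fish": ["Animal"],     # assumed placement, read off Figure 4
}


def ancestors(term, graph=IS_A):
    """Return every term reachable from `term` by following IS A edges upward."""
    seen = set()
    stack = list(graph.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, []))
    return seen


def is_kind_of(term, candidate, graph=IS_A):
    """True if `term` IS A `candidate`, directly or through intermediate terms."""
    return candidate in ancestors(term, graph)


print(sorted(ancestors("Mammal")))     # ['Animal', 'Organism']
print(is_kind_of("Fish", "Organism"))  # True
print(is_kind_of("Fish", "Plant"))     # False
```

Because a term may list several parents, the same code also handles ontologies that are graphs rather than strict trees.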


2.1 Implementation of ontologies

Reusable domain ontologies: Reusable ontologies allow different information systems to inter-work and cooperate with each other to accomplish larger goals (such as database interoperability, cross-database search, and the integration of web services) and support the interchange of information. Once ontologies are well defined for one or several domains, they can be organized into a library of reusable ontologies and used for effective knowledge acquisition. The key role of ontologies with respect to database systems is to specify a representation for data modeling at a level of abstraction above specific database layers (logical or physical), so that data can be efficiently exported, translated, queried, and unified across independently developed systems and services.

3. Key-Value Stores

A key-value store holds records that each consist of a unique, indexed “key” and a corresponding “value”. The store maintains the data as key/value pairs, and its core operation is fetching a particular value using its key. Key-value stores allow the application developer to store structure-less data, which removes the need for a fixed data model and makes the requirement for properly formatted data less strict. [5] Arbitrary data can be stored and indexed under a single key, which is an effective means of data retrieval. Because data is retrieved by key, key-value stores are said to act as an associative memory. They often attempt to keep most of the data in main memory, thus avoiding expensive input/output operations.

The main advantage of using key-value stores is the increased speed of storing and retrieving data. The other advantages are:

Scalability: Both the data and the load of simple operations are distributed effectively over many servers, with no disk shared among the servers. This kind of distribution is also termed “horizontal scalability”, and storing records on different servers according to some key is called “sharding”. Horizontal scaling allows the system to support a large number of simple read/write operations per second. [6]


Availability and Fault Tolerance: High availability of resources for data and query processing is ensured by distributing and replicating data over many servers, preferably in two or more locations. A key-value store is highly available in the sense that it can effectively replicate and distribute (partition) data over multiple physical servers. This also guards the data against the failure of one or more servers, so the store can continue serving client requests.

Greater performance: Key-value stores use distributed indexes efficiently and utilize main memory for data storage. Also, a key-value store has a very simple API (Application Programming Interface) that performs simple put, get and delete functions. This simplicity shortens the response time, increasing the performance of the application.

3.1 NoSQL

NoSQL stands for “Not Only SQL”. The architecture of NoSQL systems is distributed, meaning that the data is stored, and operations can be performed, on multiple servers. This makes the system fault-tolerant, as no single node is responsible for the entire operation. Many NoSQL systems use the concept of key-value stores to store their data. NoSQL does not adhere to the standard relational database models; it is characterized by being non-structured and is useful for storing information that requires no relations and tables. The term refers to a number of non-relational, distributed database architectures that do not follow a fixed schema and can support numerous ad-hoc data formats. Because of its speed and flexibility, NoSQL is used for quick storage and retrieval of large amounts of data. More recently, it has been used for real-time analysis of data and for rapidly growing data on the Internet, such as Twitter and Facebook feeds. The disadvantage of such systems is that they may not give the ACID guarantees.
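The put, get and delete interface and the sharding described in Section 3 can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: each Python dictionary stands in for one server, the class name `ShardedKeyValueStore` and the key formats are hypothetical, and replication, persistence and networking are left out entirely.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store; each dict plays the role of one server (shard)."""

    def __init__(self, num_shards=3):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash sends the same key to the same shard on every call.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)

    def delete(self, key):
        self._shard_for(key).pop(key, None)


store = ShardedKeyValueStore()
store.put("user:42", {"name": "Ada", "tags": ["admin"]})  # value needs no fixed schema
store.put("page:/home", "<html>...</html>")               # a different shape of value
print(store.get("user:42"))
store.delete("user:42")
print(store.get("user:42"))  # None
```

Note that taking the hash modulo the number of shards moves most keys when a shard is added or removed; this sketch ignores that problem for the sake of brevity.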
ACID properties (Atomicity, Consistency, Isolation and Durability) are a set of properties that guarantee that database transactions are processed reliably. [7]

- Atomic: Every action of the transaction must take place, and if any action cannot be completed, the data must remain exactly the same as it was before the transaction was attempted.
- Consistent: The transaction must perform correctly, within the rules of the application.
- Isolated: Every transaction must run as if there were no other instances of related transactions occurring at the same time. In practice, this means that there must be a mechanism to lock the information to a particular transaction while changes take place.
- Durable: Once a transaction is complete, its effects must be permanent and must survive failures.

4. References

[1] http://en.wikipedia.org/wiki/Knowledge_base
[2] http://en.wikipedia.org/wiki/Transitive_closure
[3] http://www.obitko.com/tutorials/ontologies-semantic-web/ontologies.html
[4] H. V. Jagadish, A compression technique to materialize transitive closure, ACM Transactions on Database Systems (TODS), v.15 n.4, p.558-598, Dec. 1990
[5] Marc Seeger, Key-Value stores: a practical overview, Computer Science and Media Ultra-Large-Sites SS09 Stuttgart, Germany
[6] Rick Cattell, Scalable SQL and NoSQL data stores, ACM SIGMOD Record, v.39 n.4, December 2010
[7] http://en.wikipedia.org/wiki/NoSQL
