
RESEARCH ARTICLE

____________________________ 

Available ONLINE www.vsrdjournals.com 

VSRD-IJCSIT, Vol. 2 (4), 2012, 285-295 

Data Centric Knowledge Management System 

Using Post-Clustering Technique 

1 Asadi Srinivasulu*, 2 Ch.D.V. Subba Rao and 3 M. Sreedevi

ABSTRACT

The purpose of the Data Centric Knowledge Management System (DCKMS) is to centralize knowledge generated
by employees working within and across functional areas and to organize that knowledge so that it can be easily
accessed, searched, browsed and navigated. It is a one-stop shop for finding solutions to problems. It
provides a facility for employees to register themselves as ‘experts’ as well as to search for other ‘experts’
in case of any problem or requirement in their project. The system design is modularized into various
categories. The system has an enriched UI so that a novice user does not face any operational difficulties.
The system mainly concentrates on the various reports requested by users and by higher management, with
export-to-Excel options. This paper addresses the expectations, organizational implications, and information
processing requirements of the emerging knowledge management paradigm. A brief discussion of the enablement
of the individual through the widespread availability of computer and communication facilities is followed
by a description of the structural evolution of organizations and the architecture of a computer-based
knowledge management system. The authors discuss two trends that are driven by the treatment of information
and knowledge as a commodity: increased concern for the management and exploitation of knowledge within
organizations, and the creation of an organizational environment that facilitates the acquisition, sharing
and application of knowledge.

Keywords : Data, Data-Centric, Data Mart, Data Portal, Data Warehouse, Enabled Individual, Information, 

Information-Centric, Information Management, Knowledge, Knowledge Management, Ontology, 

Organizational Structure, Clustering, Data Mining, Fuzzy C-Means Clustering Algorithm, K-Means 

Clustering Algorithm. 

1. INTRODUCTION 

The Data Centric Knowledge Management System is a web-based application that allows employees of a
company to share their knowledge with others in the company. It also allows them to search for knowledge
assets when in need. It provides a facility for employees to register themselves as ‘experts’ as well as to
search for other ‘experts’ in case of any problem or requirement in their project. It is a one-stop shop for
finding solutions to problems.

As information technology begins to permeate all aspects of life and the economy turns decidedly
information-centric, wealth is increasingly defined in terms of information-related products and the
availability of knowledge. Under these conditions employment, whether self-employment or organizational
employment, is becoming singularly focused on the skills and capabilities of the individual. In other words,
knowledge has become a commodity that has value far in excess of the manufactured products that represented
the yardstick of wealth during the industrial age. How this new form of human wealth should be effectively
utilized and nurtured in commercial and government organizations has in recent times become a major
preoccupation of management. Two parallel and related trends have emerged. The first trend is related to the
management and exploitation of knowledge. The question being asked is: How can we capture and utilize the
potentially available knowledge for the benefit of the organization? The phrase “…potentially available” is
appropriate, because much of the knowledge is hidden in an overwhelming volume of computer-based data.
What is not commonly understood is that the overwhelming nature of the stored data is due to current
processing methods rather than volume. These processing methods have to rely largely on manual effort
because only the human user can provide the necessary context for interpreting the computer-stored data into
information and knowledge. If it were possible to capture information (i.e., data with relationships), rather than
data, at the point of entry into the computer, then there would be sufficient context for computer software to
process the information automatically into knowledge. This is not just a desirable

1,3 Associate Professor, 2 Professor, 1,2,3 Department of Information Technology, Sree Vidyanikethan Engineering College,
Tirupathi, Andhra Pradesh, INDIA. *Correspondence : srinu_asadi@yahoo.com

2. RELATED WORK 

The main purpose of the functional requirements within the requirement specification document is to define all
the activities or operations that take place in the system. These are derived through interactions with the
users of the system. Since the Requirements Specification is a comprehensive document and contains a lot of
data, it has been broken down into different chapters in this report. The depiction of the design of the system
in UML is presented in a separate chapter. The Data Dictionary is presented in the Appendix. The general
functional requirements arrived at after the interaction with the users are listed below; a more detailed
discussion is presented in the chapters on the analysis and design of the system. The Administrator can also
view different reports (My Submission report, Ratings report, document status report, etc.).

• The Administrator of this system can add a new employee, delete an existing employee, and view all the
existing users of the system.
• The Administrator can create and delete user logins for different employees.
• A K-User/K-Team Member/Reviewer can search for a document based on chosen criteria (author, technology,
etc.).
• A K-User/K-Team Member/Reviewer can download a document.
• A K-User/K-Team Member/Reviewer can rate a document.
• A K-User/K-Team Member/Reviewer can submit a document.
• A K-User/K-Team Member/Reviewer can register as an expert.
• A K-User/K-Team Member/Reviewer can search for an expert.
• A K-Team Member/Reviewer can evaluate the above documents for initial screening.
• A K-Team Member can manage the reviewers list.
• A K-Team Member can assign a document to a particular reviewer.
• A Reviewer can view the list of documents forwarded to him.
• A Reviewer can publish or reject a document.
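The role-based requirements listed above can be sketched as a simple permission table. This is only an illustrative sketch: the `Action` names, the `PERMISSIONS` table and the `is_allowed` helper are hypothetical names introduced here, not part of the described system, and the mapping reflects one reading of the requirements list.

```python
from enum import Enum, auto


class Action(Enum):
    # Hypothetical action names derived from the requirements list.
    MANAGE_EMPLOYEES = auto()
    MANAGE_LOGINS = auto()
    VIEW_REPORTS = auto()
    SEARCH_DOCUMENT = auto()
    DOWNLOAD_DOCUMENT = auto()
    RATE_DOCUMENT = auto()
    SUBMIT_DOCUMENT = auto()
    REGISTER_AS_EXPERT = auto()
    SEARCH_EXPERT = auto()
    SCREEN_DOCUMENT = auto()
    MANAGE_REVIEWERS = auto()
    ASSIGN_REVIEWER = auto()
    VIEW_ASSIGNED_DOCUMENTS = auto()
    PUBLISH_OR_REJECT = auto()


# Actions shared by K-Users, K-Team Members and Reviewers.
COMMON = frozenset({
    Action.SEARCH_DOCUMENT, Action.DOWNLOAD_DOCUMENT, Action.RATE_DOCUMENT,
    Action.SUBMIT_DOCUMENT, Action.REGISTER_AS_EXPERT, Action.SEARCH_EXPERT,
})

# Assumed role-to-permission mapping (one reading of the list above).
PERMISSIONS = {
    "Administrator": frozenset({
        Action.MANAGE_EMPLOYEES, Action.MANAGE_LOGINS, Action.VIEW_REPORTS,
    }),
    "K-User": COMMON,
    "K-Team Member": COMMON | {
        Action.SCREEN_DOCUMENT, Action.MANAGE_REVIEWERS, Action.ASSIGN_REVIEWER,
    },
    "Reviewer": COMMON | {
        Action.SCREEN_DOCUMENT, Action.VIEW_ASSIGNED_DOCUMENTS,
        Action.PUBLISH_OR_REJECT,
    },
}


def is_allowed(role: str, action: Action) -> bool:
    """Return True if the given role may perform the action; unknown roles get nothing."""
    return action in PERMISSIONS.get(role, frozenset())
```

Keeping the rules in one table, rather than scattering role checks through the code, is one way to make it easier to audit who may do what.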

3. EXISTING SYSTEM

Fig. 1 : Context Level Diagram 

In the existing system, the company maintains all knowledge-based documents in a separate system that is
accessible to all employees over the LAN; employees can post new documents to it and access earlier ones.
Searching for related documents by author, technology, etc. is a time-consuming process. Managing documents
by category and restricting access to them by user type becomes complicated. The system also does not
prevent unnecessary documents from being posted.

DRAWBACKS: 

• Difficulty in maintaining security levels for the documents.
• Difficulty in browsing, navigating and searching for a required document.
• Difficulty in giving ratings for the documents.
• Information made available in this manner is susceptible to damage.
• Difficulty in preventing employees from updating the documents.
• Difficulty in generating different reports.

4. PROPOSED SYSTEM 

The proposed system is fully computerized and is intended to remove the drawbacks of the existing system. It
allows employees of the company to upload their knowledge documents into the system, where they are
verified by next-level users to filter out unnecessary documents. It also allows employees to search for
knowledge assets easily when in need. It provides a facility for employees to register themselves as
‘experts’ as well as to search for other ‘experts’ in case of any problem or requirement in their project. It
also provides a facility for the evaluator to rate the documents posted by employees.
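The upload-and-verify flow described above can be pictured as a small state machine. The sketch below is an assumption-laden illustration: the status names, the `TRANSITIONS` table and the `advance` helper are hypothetical, and the rule that initial screening may itself reject a document is an assumption not stated in the text.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical lifecycle states for a submitted knowledge document.
    SUBMITTED = "submitted"
    SCREENED = "screened"
    ASSIGNED = "assigned"
    PUBLISHED = "published"
    REJECTED = "rejected"


# Allowed moves between states. Rejection at screening time is an assumption.
TRANSITIONS = {
    Status.SUBMITTED: {Status.SCREENED, Status.REJECTED},  # K-Team initial screening
    Status.SCREENED: {Status.ASSIGNED},                    # K-Team assigns a reviewer
    Status.ASSIGNED: {Status.PUBLISHED, Status.REJECTED},  # Reviewer decision
    Status.PUBLISHED: set(),
    Status.REJECTED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the target status if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

For example, `advance(Status.SUBMITTED, Status.SCREENED)` succeeds, while trying to reassign an already published document raises an error.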

ADVANTAGES: 

• It provides a facility to share knowledge documents across the company.
• It allows employees to upload and download documents from their own systems.
• Browsing, navigating and searching for required documents is easy.
• It provides a facility to prevent unnecessary documents from being posted.
• It provides a flexible way of generating different reports.
• With the new approach, information can be accessed from anywhere with a mouse click. This saves users a
lot of time and provides them with up-to-date information. The centralized database helps in avoiding
conflicts.
• The project provides a rich user interface (“look and feel”) that lets the user access information with the
least effort.
• It allows documents to be rated at different levels.
• It allows documents to be published or rejected.

4.1. K-MEANS ALGORITHM 

Step 1) Take the first K feature vectors as the initial centers.

Step 2) Assign each sample vector to the cluster whose center is nearest (minimum-distance assignment principle).

Step 3) Compute the mean of each cluster as its new center.

Step 4) If any center has changed, then go to step 2, else terminate.
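The four steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in the project: the function name `kmeans`, the Euclidean distance, the `max_iter` safety cap and the handling of empty clusters are all assumptions made here.

```python
import math


def kmeans(vectors, k, max_iter=100):
    """Cluster a list of equal-length numeric tuples into k groups.

    Returns (centers, labels), where labels[i] is the cluster index of vectors[i].
    """
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")

    # Step 1: take the first k feature vectors as the initial centers.
    centers = [tuple(v) for v in vectors[:k]]
    labels = []

    for _ in range(max_iter):  # max_iter is an assumed safety cap
        # Step 2: assign each vector to the nearest center (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(v, centers[j]))
            for v in vectors
        ]

        # Step 3: recompute each center as the mean of its assigned vectors.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])

        # Step 4: stop when no center has changed, otherwise repeat from step 2.
        if new_centers == centers:
            break
        centers = new_centers

    return centers, labels
```

Because the initial centers are simply the first k vectors, the result depends on the order of the input data.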


4.2. K-MEANS 


Fig. 5 : Applying Clustering Technique Similarity Weight and Filter Method 

Fig. 6 : Results of Clustering Showing Groups Divided Into Clusters 

Fig. 7 : Initialization and Input 



Fig. 8 : Final EMST Edges Path 

Fig. 1 : Graph for K-Means 

K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem.
It is a popular clustering method that uses prototypes (centroids) to represent clusters by minimizing
within-cluster error. The main idea is to define k centroids, one for each cluster.

These centroids should be placed carefully, because different starting locations lead to different results.
The next step is to take each point in the data set and associate it with the nearest centroid. Once every
point has been assigned, k new centroids are computed as the means of the resulting clusters. After we have
these k new centroids, a new binding has to be done between the same data set points and the nearest new
centroid, and the process repeats until the centroids no longer move. In this way the algorithm aims at
minimizing an objective function.

The objective function : 
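A form consistent with the description above is the standard within-cluster sum of squared errors. The symbols are notation assumed here rather than taken from the text: $x_i^{(j)}$ is the $i$-th data point assigned to cluster $j$, $c_j$ is the centroid of that cluster, and $n_j$ is the number of points in it.

```latex
J = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left\lVert x_i^{(j)} - c_j \right\rVert^{2}
```

Each pass of assignment followed by centroid update cannot increase $J$, which is why the iteration terminates.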



We apply the above algorithm in our project by taking as input attributes such as the number of assignments
submitted, the number of tasks completed successfully, and the number of face-to-face interactions among
team members. Applying the algorithm divides the groups into k clusters. The groups in each cluster show
nearly similar behavior and are therefore placed in the same cluster. It then becomes easy for the facilitator
to give feedback, since feedback can be given to an entire cluster instead of to each individual group.
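Once cluster labels are available, giving feedback per cluster rather than per group amounts to a simple lookup. The following sketch assumes the labels come from some clustering step; the function names, group names and feedback strings are hypothetical.

```python
from collections import defaultdict


def groups_by_cluster(group_names, labels):
    """Map each cluster label to the list of group names assigned to it."""
    clusters = defaultdict(list)
    for name, label in zip(group_names, labels):
        clusters[label].append(name)
    return dict(clusters)


def cluster_feedback(group_names, labels, feedback_by_cluster):
    """Give every group the feedback written once for its whole cluster."""
    return {
        name: feedback_by_cluster[label]
        for name, label in zip(group_names, labels)
    }


# Hypothetical usage: three groups, two clusters.
names = ["Group A", "Group B", "Group C"]
labels = [0, 1, 0]
feedback = {0: "Meet more often in person.", 1: "Keep up the current pace."}
per_group = cluster_feedback(names, labels, feedback)
```

The facilitator writes k feedback messages instead of one per group, which is the saving the paragraph above describes.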

5. RESULTS 

Fig. : This Screen Is Login Page for All Users and Administrator 

Fig. : Administrator Can Find the Experts for Getting the Assistance 

Fig. : Administrator Can Register As Experts 

Fig. : This Screen Shows the K Team Actions

6. CONCLUSION

The new system, the Data Centric Knowledge Management System, has been implemented to cater to the needs of
company employees in sharing different knowledge assets effectively with role-based access. The present
system has been integrated with the already existing system. The database is hosted on a MySQL server and is
accessed through JDBC. It is accessible over the intranet from any location. The system has been found to
meet the requirements of the users and departments satisfactorily. The database system must provide for the
safety of the information stored, despite system crashes or attempts at unauthorized access. If data are to
be shared among several users, the system must avoid possible anomalous results.

As for future enhancement, the system offers a high level of extensibility: it provides all the basic
features and allows them to be extended easily without disturbing the existing code. The application can be
turned into an Internet application if desired, and it can be adapted to other deployment environments just
by changing the deployment files. Further features could be added, such as giving Internet users access so
that they can take part in this process.

