7 - Indira Gandhi Centre for Atomic Research
• Any initial letters that precede a name are transferred to the end of the name,<br />
with a comma inserted between the name and the initials.<br />
All the words / phrases are then alphabetized.<br />
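This normalization step can be sketched in Python as follows (a hypothetical helper, not the authors' code; the example entries are illustrative):

```python
def normalize_entry(entry):
    """Move any leading single-letter initials to the end of the entry,
    separated from the name by a comma, e.g. "J. K. Rowling" -> "Rowling, J. K."
    """
    words = entry.split()
    initials = []
    # Collect leading initials: single letters, optionally followed by a dot.
    while words and len(words[0].rstrip(".")) == 1:
        initials.append(words.pop(0))
    name = " ".join(words)
    if initials:
        return f"{name}, {' '.join(initials)}"
    return name

# Illustrative entries; phrases without leading initials pass through unchanged.
entries = ["J. K. Rowling", "information retrieval", "A. Einstein"]
normalized = sorted(normalize_entry(e) for e in entries)
```

Sorting the normalized forms then yields the alphabetized list described above.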
3.2 Identification of Keywords / Key Phrases:<br />
In OPACs and similar databases the elements representing the subject of the resource<br />
usually take the form of a set of data fields that may include keywords, descriptors, subject<br />
headings, abstracts, classification codes, etc. However, in automatic extraction of keywords<br />
from a document it is necessary to look for appropriate lexical clues. The major type of<br />
lexical clue to the subject of a document is the set of domain terms the document contains,<br />
usually in the form of keywords. The experiment reported in this paper uses a set of<br />
simple heuristics to identify keywords and key phrases in HTML documents.<br />
The module involves the following major inputs:<br />
• One or more HTML files constituting the documentary information resources in a<br />
domain from which keywords / key phrases are to be extracted automatically. The<br />
program requires that all the HTML files be in a single folder.<br />
• A database which is in effect a list of domain terms in the subject area / discipline of the<br />
HTML files. In this study the ASIS thesaurus was employed.<br />
• A database of ‘stop words’ consisting of all non-noun words taken from the Pocket<br />
English Dictionary, which itself is derived from the New Oxford Dictionary of<br />
English.<br />
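Gathering these three inputs might look like the following sketch (folder layout, file names, and one-term-per-line format for the two databases are all assumptions, not details from the paper):

```python
import glob
import os

def load_inputs(folder, thesaurus_file, stopword_file):
    """Collect the HTML files in a single folder, plus the domain-term
    list (e.g. thesaurus terms) and the non-noun stop-word list.
    Both term files are assumed to hold one term per line."""
    html_files = sorted(glob.glob(os.path.join(folder, "*.html")))
    with open(thesaurus_file, encoding="utf-8") as f:
        domain_terms = {line.strip().lower() for line in f if line.strip()}
    with open(stopword_file, encoding="utf-8") as f:
        stop_words = {line.strip().lower() for line in f if line.strip()}
    return html_files, domain_terms, stop_words
```

Lower-casing the term lists at load time makes the later membership tests case-insensitive.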
The principal output of the program is an HTML page consisting of the extracted keywords / key<br />
phrases, with hyperlinks to the HTML pages from which they were extracted.<br />
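One way to render such an output page is sketched below (a minimal illustration; the data structure and sample file names are assumed, not taken from the experiment):

```python
def build_index_page(keyword_map):
    """Render an HTML page listing each extracted keyword / key phrase,
    hyperlinked to the source HTML file(s) it was extracted from.

    keyword_map: dict mapping a keyword to a list of source file names.
    """
    rows = []
    for term in sorted(keyword_map):
        links = ", ".join(
            f'<a href="{fname}">{fname}</a>' for fname in keyword_map[term]
        )
        rows.append(f"<li>{term}: {links}</li>")
    return "<html><body><ul>\n" + "\n".join(rows) + "\n</ul></body></html>"

# Illustrative data only:
page = build_index_page({"information retrieval": ["paper1.html"]})
```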
Keyword Extraction: The major problems involved in keyword extraction (KWE) are the<br />
identification of keywords and the omission of non-significant words. Experience with techniques such as those adopted<br />
by the popular search engines clearly brings out the need for a different approach. In this<br />
study it was decided to experiment with a validation process using two databases of terms to<br />
assist in identifying keywords and non-significant words in the input file. The<br />
validation process employed made certain assumptions:<br />
• It was assumed that a word / phrase in the input HTML file that is also part of a<br />
controlled vocabulary in the relevant subject domain is a keyword / key phrase<br />
with a high probability of indicating the subject content of the input file.<br />
• Non-noun words in the input file are assumed to be non-significant words.<br />
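Under these two assumptions, the validation step can be sketched as follows (the term sets here are tiny stand-ins for the ASIS thesaurus and the non-noun stop-word list):

```python
def classify_terms(candidates, thesaurus_terms, stop_words):
    """Split candidate words / phrases into probable keywords (found in
    the controlled vocabulary) and non-significant words (found in the
    non-noun stop-word list); anything matching neither list is left
    unresolved for later heuristics."""
    keywords, ignored, unresolved = [], [], []
    for term in candidates:
        t = term.lower()
        if t in thesaurus_terms:
            keywords.append(term)    # high probability of indicating subject content
        elif t in stop_words:
            ignored.append(term)     # assumed non-significant (non-noun)
        else:
            unresolved.append(term)  # decided by neither database
    return keywords, ignored, unresolved

# Stand-in vocabularies, illustrative only:
thesaurus = {"information retrieval", "indexing"}
stops = {"the", "very", "quickly"}
kw, ig, un = classify_terms(["Indexing", "very", "cats"], thesaurus, stops)
```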
In the present experiment the following inputs / tools were employed:<br />
• A paper entitled ‘Information Retrieval and Cognitive Research’ was used as the<br />
input HTML document to test the utility and limitations of the program. An idea of<br />
the paper can be had from the details given in Table 1 below.<br />
• For identifying keywords and key phrases in the input file, online tools were used:<br />
• The ASIS thesaurus (http://www.asis.org/Publications/Thesaurus/isframe.htm)<br />
i. Stop-word Terms (ST): Uncontrolled vocabularies have always presented<br />
problems in IR. The most common words in English may account for 50% or<br />
more of any given text, yet their semantic content, measured in terms of their<br />
value in describing / indicating the subject matter of the text, is minimal.<br />
Further, such words tend to lessen the impact of frequency differences among<br />
the less common words, and they necessitate a large amount of<br />
unnecessary processing. In all methods of automatic indexing such less<br />
significant words are ignored on the basis of a stop-word list. As<br />
already mentioned, in the present experiment all non-noun words were<br />