Creating Databases – Importing a Delimited ASCII Text ... - LexisNexis
LexisNexis ® Concordance ® 2007<br />
Creating Databases –<br />
Importing a Delimited ASCII Text (DAT) File<br />
Document Overview<br />
• Before You Begin<br />
• Creating a New Database File<br />
• Configuring Fields for Your Data<br />
• Importing Your Data<br />
• Additional Resources
Concordance ® 2007 Quick Help<br />
Concordance is a registered trademark of Applied Discovery, Inc. © 2007 Concordance. All rights<br />
reserved.<br />
LexisNexis and the Knowledge Burst logo are registered trademarks of Reed Elsevier Properties Inc.,<br />
used under license. Concordance is a registered trademark and FYI is a trademark of Applied Discovery,<br />
Inc. Other products or services may be trademarks or registered trademarks of their respective companies.<br />
© 2007 Concordance. All rights reserved.<br />
Concordance ®<br />
Concordance ® Image<br />
Concordance ® FYI <br />
Copyright © 2007 Concordance. All rights reserved.
Before You Begin<br />
Delimited ASCII text files store two-dimensional arrays of data by separating the values in each row with<br />
specific delimiter characters. Most database and spreadsheet programs can read or save data in a<br />
delimited format. Delimited text files may have extensions such as .DAT, .ASC, .CSV, or even .TXT, as<br />
long as the file is structured properly with text qualifiers, field delimiters, and line breaks.<br />
For many Concordance databases, the delivered files also include optical character recognition (OCR) text and<br />
scanned document images. A DAT file containing the metadata for each document often accompanies<br />
the OCR text and image files.<br />
The procedure outlined in this document describes how to import a delimited ASCII text (.DAT) file.<br />
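As a rough illustration of that structure, the sketch below splits one record into values using a field delimiter and a text qualifier. The function name `split_record` and the sample record are hypothetical. The sample uses ASCII 20 as the field delimiter and þ (ASCII 254) as the text qualifier, which are often cited as the Concordance defaults; treat both as assumptions and confirm them against your own file.

```python
def split_record(line, field_delim, qualifier):
    """Split one delimited record into a list of field values.

    A delimiter that appears between a pair of qualifiers is treated
    as data, not as a field boundary.
    """
    values, current, in_qualified = [], [], False
    for ch in line.rstrip("\r\n"):
        if ch == qualifier:
            in_qualified = not in_qualified      # entering or leaving a qualified value
        elif ch == field_delim and not in_qualified:
            values.append("".join(current))      # field boundary reached
            current = []
        else:
            current.append(ch)
    values.append("".join(current))              # last field on the line
    return values


# Assumed delimiters: chr(20) between fields, chr(254) around each value.
FIELD, QUOTE = chr(20), chr(254)
sample = FIELD.join(QUOTE + v + QUOTE for v in ["DOC0001", "Smith, John", "20070115"]) + "\n"
print(split_record(sample, FIELD, QUOTE))
```

The same function handles a comma-delimited file if you pass `","` and `'"'` instead.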
You will need…<br />
• Concordance<br />
• Text editor program (TextPad, UltraEdit or similar)<br />
• Delimited ASCII Text (DAT) file<br />
Creating a New Database File<br />
1 Open Concordance.<br />
2 In the File menu select New.<br />
Figure 1: Concordance Menu – File<br />
3 In the Create database from template dialog (see figure 2), select the Blank database type.<br />
Figure 2: Create database from template – General tab<br />
4 Click OK.<br />
5 When prompted, choose a file name and directory. You may store your database locally or on a<br />
network drive.<br />
NOTE – You must have full access to the directory.<br />
6 Click Open to save the database and begin creating and customizing your fields.<br />
Configuring Fields for Your Data<br />
The Blank database template creates an empty database containing no fields. It is the best choice<br />
when you are creating a custom structure for a delimited ASCII text (.DAT) file.<br />
Plan your database structure<br />
Open your DAT file with a text editor.<br />
Note the following:<br />
• Delimiters used in the file (text qualifier, field, and new line delimiters)<br />
• Field Headers (the first line will usually contain the field headers)<br />
• Type, format, and length of data<br />
• Date fields (8 digits maximum; the parts may be in any order with slashes, or in the universal “true date” format<br />
without slashes)<br />
• Field(s) database users need to search and sort<br />
• Field (if any) to be linked to an image<br />
• OCR content (if any) to be imported<br />
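The checklist above can be partly automated. The following is a minimal sketch, not a Concordance tool: it reads delimited text with Python's standard `csv` module, takes the first row as field headers, and reports the longest value seen in each column so you can judge field sizes. The function name `profile_dat`, the sample data, and the comma and double-quote delimiters are assumptions; the 60-character threshold comes from the Text field capacity listed in Table 1.

```python
import csv
import io


def profile_dat(text, field_delim=",", qualifier='"'):
    """Return (headers, max_lengths) for delimited text whose first row holds headers."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=field_delim, quotechar=qualifier)
    headers = next(reader)
    max_lengths = [0] * len(headers)
    for row in reader:
        # Ignore any values beyond the header count; they are a separate problem.
        for i, value in enumerate(row[:len(headers)]):
            max_lengths[i] = max(max_lengths[i], len(value))
    return headers, max_lengths


sample = (
    'BEGDOC,AUTHOR,DOCDATE\n'
    '"DOC0001","Smith, John","01/15/2007"\n'
    '"DOC0002","Lee","02/03/2007"\n'
)
headers, lengths = profile_dat(sample)
for name, length in zip(headers, lengths):
    # Values longer than 60 characters will not fit a Text field.
    hint = "fits a Text field" if length <= 60 else "consider a Paragraph field"
    print(f"{name}: longest value {length} characters ({hint})")
```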
Tip - While you have the DAT file open, scroll to the bottom of the file, and ensure that the last record<br />
(the last line) has a new line delimiter (created by pressing Enter on your keyboard) at the end of the<br />
record. Without the final return, the last record will not be imported into your database.<br />
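For large files, scrolling to the end in a text editor can be slow, so the check in this tip can also be scripted. This is a minimal sketch under the assumption that the final return is an ordinary carriage return or line feed; the helper names are hypothetical.

```python
def ends_with_newline(data: bytes) -> bool:
    """True when the data ends with a line feed or carriage return."""
    return data.endswith((b"\n", b"\r"))


def ensure_final_newline(data: bytes, newline: bytes = b"\r\n") -> bytes:
    """Return the data unchanged if it already ends with a return, else append one."""
    return data if ends_with_newline(data) else data + newline


missing = b"DOC0001,Smith\r\nDOC0002,Lee"
print(ends_with_newline(missing))
print(ensure_final_newline(missing))
```

To apply it to a real file, read the file in binary mode with `open(path, "rb")`, pass the bytes through `ensure_final_newline`, and write the result to a copy rather than overwriting the original.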
Immediately after you create a blank database, the New field dialog opens and prompts you to begin<br />
creating and configuring your fields.<br />
1 Type the name of your first field in the Name field (see figure 3).<br />
NOTE – Field names do not need to match field headers specified in the DAT file. They may be up<br />
to 12 characters long and entered in upper or lower case letters. All characters will be converted to<br />
upper case by the system. They must begin with a letter and may contain only alphanumeric<br />
characters and the underscore.<br />
2 Select the field type in the Type drop-down, and select the appropriate attributes for the field.<br />
Types and Attributes - To successfully import your DAT file, you must create fields to match the<br />
data type and size of your data. Refer to Tables 1 and 2 below for information about Field Types and<br />
Attributes.<br />
Field Order - Create your fields in the order in which you will want to view them in Table and<br />
Browse views. Use the Insert and Delete buttons (similar to Paste and Cut, respectively, in MS<br />
Office products) to arrange fields into the desired order as necessary.<br />
3 Click New to confirm your choices and to create the next field.<br />
NOTE – If you accidentally click OK instead of New to create a new field, the New field definition<br />
dialog will close. To access this panel again, select Modify in the File menu.<br />
Figure 3: New field definition dialog<br />
Field Types<br />
Type | Capacity | Notes<br />
Text* | 1-60 alpha or numeric characters, keyed by default | Use for numeric values that are not used mathematically (e.g. phone numbers, social security numbers, and other serial numbers). Note - If you intend to sort records based on this field, zero fill any numeric values stored in it to ensure they sort correctly.<br />
Numeric* | 1-20 digits long (including the decimal place, negative sign, and all digits following the decimal place), keyed by default | Display options available: Currency, Commas, Zero filled, Plain.<br />
Date* (MMDDYYYY, YYYYMMDD, or DDMMYYYY) | 8 bytes in length | The date format selected here controls how the data appears after it is imported into the database. It does not need to match the date format in the DAT file.<br />
Paragraph | 12,000,000 characters (12 MB), indexed by default | Most flexible and variable in size; not ideal for sorting or searching by comparison. Supports rich text formatting.<br />
*Fixed-length field<br />
Table 1: Field Types<br />
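The zero-fill note for Text fields is easier to see with an example. The sketch below uses Python's character-by-character string comparison as a stand-in for sorting a text field; that Concordance sorts Text fields the same way is an assumption, and the width of 5 is arbitrary.

```python
def zero_fill(value: str, width: int) -> str:
    """Left-pad a numeric string with zeros to a fixed width."""
    return value.zfill(width)


raw = ["10", "9", "100", "2"]
print(sorted(raw))      # text order: ['10', '100', '2', '9']

filled = [zero_fill(v, 5) for v in raw]
print(sorted(filled))   # numeric order: ['00002', '00009', '00010', '00100']
```

Without zero filling, "100" sorts before "2" because the comparison stops at the first differing character.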
Field Attributes<br />
Attribute* | Use | Notes<br />
Key | Most commonly applied to fixed-length fields; however, it may be applied to any field (including paragraph fields) to make relational searches faster. | Keying a field creates a .KEY file. As KEY files grow in size, their efficiency decreases, which may slow relational searches. All keyed fields will appear in the default table view.<br />
Image | Used to link Concordance with an Image viewer; it indicates which field contains the image name or alias. | Select only one field per database as an Image field. Identifying multiple fields in a database as an image field will interfere with the linkage between Concordance and the viewer.<br />
Indexed | Enables full text searching. | Places every word in the field into a dictionary file (.NDX and .DCT) for fast retrieval.<br />
System | Special field that is hidden with no read or write access to end-users. | System fields should never be indexed, added, deleted or modified by users. Concordance will create these fields for replication and synchronization information.<br />
Accession | Unique serial numbers internally assigned to each record, managed entirely by Concordance. | Accession numbers may not be edited or modified. Helpful as load order identifier. Note – As records are edited, exported or removed, gaps in numbering may occur.<br />
Optical Character Recognition (OCR) Indexing | Will not index text that is not contained in a defined dictionary. | Not recommended for any fields. Causes increased indexing times, limits the indexing to Webster’s dictionary, and includes only English words. Use Synonyms instead.<br />
*Not every Attribute is available for every field type<br />
Table 2: Field Attributes<br />
4 Repeat steps 1 through 3 as necessary to create a structure to match your DAT file.<br />
5 When you have completed creating all your fields, click OK.<br />
Your database structure is ready for the data import.<br />
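When a DAT file has many columns, it can help to check planned field names against the naming rules from step 1 before typing them into the dialog. This is a minimal sketch of those rules (12 characters maximum, starts with a letter, only alphanumeric characters and the underscore, converted to upper case). The function name is hypothetical, and treating "letter" as an unaccented A-Z letter is an assumption.

```python
import re

# A letter first, then any mix of letters, digits, and underscores.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def normalize_field_name(name: str) -> str:
    """Return the upper-cased field name, or raise ValueError if it breaks a rule."""
    if len(name) > 12:
        raise ValueError(f"{name!r} is longer than 12 characters")
    if not _NAME_PATTERN.fullmatch(name):
        raise ValueError(
            f"{name!r} must begin with a letter and contain only "
            "letters, digits, and underscores"
        )
    return name.upper()


for candidate in ["BegDoc", "doc_date1", "1FIELD", "AUTHOR NAME"]:
    try:
        print(candidate, "->", normalize_field_name(candidate))
    except ValueError as error:
        print("rejected:", error)
```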
Additional Considerations<br />
Embedded Punctuation<br />
Embedded punctuation is provided so that hyphenated words, dates, decimal numbers, and contractions<br />
are not split into two or more words. You may add or delete punctuation as needed. By default,<br />
Concordance includes the ‘ . , / characters as embedded punctuation for all fields.<br />
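To show the effect of embedded punctuation on word boundaries, here is a simplified tokenizer. It is only a sketch of the idea, not Concordance's indexing logic: it assumes that an embedded punctuation character stays inside a word only when it has a letter or digit on both sides, and it uses the plain ASCII apostrophe for the first default character.

```python
import re

DEFAULT_EMBEDDED = "'.,/"   # assumed ASCII equivalents of the default characters


def tokenize(text, embedded=DEFAULT_EMBEDDED):
    """Split text into words, keeping embedded punctuation inside a word."""
    if not embedded:
        return re.findall(r"[A-Za-z0-9]+", text)
    punct = re.escape(embedded)
    # Runs of letters/digits, optionally joined by single embedded characters
    # that are followed by more letters/digits.
    pattern = re.compile(rf"[A-Za-z0-9]+(?:[{punct}][A-Za-z0-9]+)*")
    return pattern.findall(text)


sentence = "Don't pay 3.14 on 01/15/2007, e-mail me."
print(tokenize(sentence))
print(tokenize(sentence, embedded=DEFAULT_EMBEDDED + "-"))
```

With the assumed defaults, "e-mail" splits into two words; adding the hyphen to the embedded set keeps it whole.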
If you will be importing OCR…<br />
Create your OCR fields now, in addition to the fields for your DAT file import.<br />
As a best practice, create at least two OCR fields labeled with ascending numbers (for example, OCR1 and<br />
OCR2). When you use ReadOCR.cpl to import your OCR text, the CPL automatically overflows any text<br />
beyond 12 million characters from the first OCR field into the next sequentially named OCR field.<br />
NOTE – If you do not create a second sequentially named OCR field, you run the risk of losing overflow<br />
data. You will not receive an error on the import if your content exceeds the 12 million character limit.<br />
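The overflow behavior, and the silent loss when no second field exists, can be modeled in a few lines. This sketch is not the ReadOCR.cpl implementation; the function name is hypothetical and the tiny limit in the usage lines is only there to keep the output readable.

```python
OCR_FIELD_LIMIT = 12_000_000   # paragraph field capacity in characters


def distribute_ocr(text, field_names, limit=OCR_FIELD_LIMIT):
    """Fill the named fields in order; return (assignments, lost_characters)."""
    assignments = {}
    remaining = text
    for name in field_names:
        # Each field takes at most `limit` characters; the rest overflows.
        assignments[name], remaining = remaining[:limit], remaining[limit:]
    return assignments, len(remaining)


# Small limit for demonstration only.
print(distribute_ocr("abcdefghij", ["OCR1", "OCR2"], limit=6))
print(distribute_ocr("abcdefghij", ["OCR1"], limit=6))
```

In the second call, four characters have nowhere to go, which mirrors the overflow data that would be lost without an OCR2 field.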
Importing Your Data<br />
1 In the Documents menu, select Import, then Delimited text.<br />
Figure 4: Documents Menu – Import > Delimited text…<br />
2 Select the Import/Overlay Wizard in the Import method dialog, and then click OK.<br />
Figure 5: Import Method<br />
3 Accept the default Load option for your initial import of data, and then click Next.<br />
Figure 6: Import Wizard dialog – Load Method<br />
4 Select the delimited format that matches the one used in your DAT file, then click Next.<br />
NOTE – The Import Wizard defaults to the standard Concordance delimiters, but you may also<br />
select Comma Delimited (CSV), Tab Delimited, or choose the Custom format and specify your<br />
own ASCII character delimiters in the drop-down menu shown in figure 7.<br />
Figure 7: Import Wizard dialog – Format<br />
5 In the Date format window, select a date format that matches the dates in your DAT file, and then<br />
click Next.<br />
NOTE – The date format selected here does not affect how dates display in Table and Browse views. That<br />
preference was set when the date field was created in the New field definition dialog.<br />
Figure 8: Import Wizard – Date format<br />
6 By default, all of the fields you created appear in the Selected Fields box. Make sure the order of<br />
the fields matches the order in your DAT file.<br />
Figure 9: Import Wizard – Fields<br />
If you need to change the order of the fields:<br />
• Move all the Selected Fields to the Available fields list by clicking on the button.<br />
Or<br />
• Click on a field to reorder and use the Up and Down buttons as needed to correct the order.<br />
NOTE – If the first line of the DAT file contains the field headers, select the Skip first<br />
line checkbox so that the header line is not loaded as data and the imported values line up with the<br />
fields in the Selected Fields window.<br />
7 Click Next to confirm the Selected Fields and their order.<br />
8 Click Browse to navigate to and select your DAT file (delimited ASCII), and then click<br />
Next.<br />
Figure 10: Import Wizard – Open<br />
9 Confirm the location of your DAT file in the File field and click Finish to import your data.<br />
Figure 11: Import Wizard – Finish<br />
10 When the import is complete, the dialog will close. Select the Browse view to verify that your data<br />
import was successful.<br />
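One way to support the check in step 10 is to count the records in the DAT file and compare that number with what you see in Concordance, and to confirm that every record carries the number of values you selected in step 6. The sketch below does both with Python's `csv` module. The function name, the sample data, and the comma and double-quote delimiters are assumptions; the `skip_first_line` argument mirrors the Skip first line checkbox.

```python
import csv
import io


def summarize_dat(text, expected_fields, field_delim=",", qualifier='"',
                  skip_first_line=True):
    """Return (record_count, problems) for delimited text.

    problems lists (record_number, field_count) for each data record whose
    number of values differs from expected_fields.
    """
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=field_delim, quotechar=qualifier)
    record_count, problems = 0, []
    for row_number, row in enumerate(reader, start=1):
        if skip_first_line and row_number == 1:
            continue                      # header row, not a data record
        record_count += 1
        if len(row) != expected_fields:
            problems.append((record_count, len(row)))
    return record_count, problems


sample = (
    'BEGDOC,AUTHOR,DOCDATE\n'
    '"DOC0001","Smith, John","01/15/2007"\n'
    '"DOC0002","Lee"\n'
)
count, problems = summarize_dat(sample, expected_fields=3)
print(f"{count} records; mismatched records: {problems}")
```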
If you are not linking to images or loading OCR, you are ready to index your database and get started<br />
searching, tagging, and working with your records.<br />
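Step 5 and Table 1 describe two independent date settings: the format of the dates in the DAT file and the format chosen for display when the field was created. The sketch below illustrates that separation by reading a date in one layout and writing it in another. It is an illustration only, not Concordance's import logic; the function name is hypothetical, and stripping slashes before parsing is an assumption.

```python
from datetime import datetime

# The three layouts named in Table 1, mapped to strptime/strftime patterns.
FORMATS = {
    "MMDDYYYY": "%m%d%Y",
    "DDMMYYYY": "%d%m%Y",
    "YYYYMMDD": "%Y%m%d",
}


def convert_date(value, source_format, display_format):
    """Parse value using source_format and return it in display_format."""
    digits = value.replace("/", "")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"{value!r} does not contain exactly 8 digits")
    parsed = datetime.strptime(digits, FORMATS[source_format])
    return parsed.strftime(FORMATS[display_format])


print(convert_date("01/15/2007", "MMDDYYYY", "YYYYMMDD"))
print(convert_date("20070115", "YYYYMMDD", "DDMMYYYY"))
```

Choosing the wrong source format for a value such as 15/01/2007 raises a `ValueError` here, which is a useful reminder to check the DAT file's date layout before running the import.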
Additional Resources<br />
General Product Information<br />
http://law.lexisnexis.com/concordance<br />
Concordance Technical Support<br />
Phone: 866-495-2397<br />
Email: concordancesupport@lexisnexis.com<br />
Concordance Training<br />
Phone: 425-463-3503<br />
Email: concordancetraining@lexisnexis.com<br />