achadoop

Recommendations

Info

10 Chapter 1 • SAS/ACCESS Interface to Hadoop Naming Conventions for Hive Data Types for Hadoop Overview For general information about this feature, see “SAS Names and Support for DBMS Names” in Chapter 2 of SAS/ACCESS for Relational Databases: Reference. SAS and Hadoop objects include tables, views, table references, columns, and indexes. They follow these naming conventions. • A SAS name must be from 1 to 32 characters long. When Hive column names and table names are 32 characters or less, SAS handles them seamlessly. When SAS reads Hive column names that are longer than 32 characters, a generated SAS variable name is truncated to 32 characters. Hive table names should be 32 characters or less because SAS cannot truncate a table reference. If you already have a table name that is greater than 32 characters, create a Hive table view or use the explicit SQL feature of PROC SQL to access the table. • If truncating would result in identical names, SAS generates a unique name. • Even when it is enclosed in single or double quotation marks, a Hive name does not retain case sensitivity. Hive table and column names can contain uppercase letters A through Z (A-Z), lowercase letters A through Z (a-z), numbers from 0 to 9, and the underscore (_). Hive converts uppercase characters to lowercase. Therefore, such SAS table references as MYTAB and mytab are synonyms—referring to the same table. • A name can begin with a letter or an underscore but not a number. • A name cannot be a Hadoop reserved word. If a name generates a Hadoop error, try to append a number or underscore in an appropriate place. For example, if shipped results in an error, try shipped1 or ship_date. Although the PRESERVE_COL_NAMES= and PRESERVE_TAB_NAMES= options are supported for SAS/ACCESS Interface to Hadoop, you should not need to use them. Hive is a data warehouse that supplies metadata about data that is stored in Hadoop files. Hive includes a data dictionary and an accompanying SQL-like interface called HiveQL or Hive SQL. HiveQL implements data definition language (DDL) and data manipulation language (DML) statements similar to many relational database management systems. Hive tables are defined with a CREATE TABLE statement, so every column in a table has a name and a data type. The data type indicates the format in which the data is stored. Hive exposes data that is stored in HDFS and other file systems through the data types that are described in this section. These are accepted column types in a Hive CREATE TABLE statement. This section includes information about Hive data types and data conversion between Hive and SAS. For more information about Hive, Hadoop, and data types, see these online documents at https://cwiki.apache.org/confluence/display/Hive. • Hive Getting Started
Simple Hive Data Types • Hive Language Manual • Hive Language Manual UDF • Hive Tutorial Complex Hive Data Types STRING specifies variable-length character data. The maximum length is 2 gigabytes. TINYINT specifies a signed one-byte integer. SMALLINT specifies a signed two-byte integer. INT specifies a signed four-byte integer. BIGINT specifies a signed eight-byte integer. FLOAT specifies a signed four-byte, floating-point number. DOUBLE specifies a signed eight-byte, floating-point number. SAS/ACCESS does not yet support the BOOLEAN and BINARY data types that are available in Hive 8. ARRAY specifies an array of integers (indexable lists). MAP specifies an associated array from strings to string (key-value pairs). STRUCT specifies how to access elements within the type—namely, by using the dot (.) notation. Hive Date, Time, and Timestamp Data Data Types for Hadoop 11 Hive does not define data types for dates, times, or timestamps. By convention, these are typically stored in ANSI format in Hive STRING columns. For example, the last day of this millennium is stored as the string ‘2999-12-31’. SAS/ACCESS assumes ANSI format for dates, times, and timestamps that are stored in Hadoop. Therefore, if you use the DBSASTYPE option to indicate that a Hive STRING column contains a date, SAS/ACCESS expects an ANSI date that it converts to SAS date format. For output, SAS DATE, TIME, and DATETIME formats are converted to ANSI format and are stored in Hive STRING columns. SAS/ACCESS does not currently support conversion of Hadoop binary UNIX timestamps to SAS DATETIME format.
Page 1 and 2: SAS ® Reference 9.3 Interface to H
Page 3 and 4: Contents About This Document . . .
Page 5 and 6: About This Document Audience This d
Page 7 and 8: Recommended Reading Here is the rec
Page 9 and 10: Chapter 1 SAS/ACCESS Interface to H
Page 11 and 12: LIBNAME Statement Specifics for Had
Page 13 and 14: Hadoop LIBNAME Statement Examples T
Page 15 and 16: etween Hadoop and SAS, Hadoop might
Page 17: Examples table name, and all files
Page 21 and 22: SAS Table Properties for Hive and H
Page 23 and 24: them. The result is an overly wide
Page 25 and 26: Data Types for Hadoop 17 achieve th
Page 27 and 28: Hadoop Null Values put bgint=; put
Page 29 and 30: Chapter 2 HADOOP Procedure Overview
Page 31 and 32: Submitting Pig Language Code The Ap
Page 33 and 34: Syntax MAPREDUCE Statement HDFS ;
Page 35 and 36: Syntax PIG ; Pig Code Options CODE
Page 37 and 38: and password on the Hadoop server,
Page 39 and 40: Chapter 3 FILENAME, Hadoop Access M
Page 41 and 42: would use the FILEEXT option to rea
Page 43 and 44: mapred.job.tracker xxx.unx.sas.com:
Page 45 and 46: Index A ARRAY data type Hive (Hadoo

achadoop

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?