
A Library for Processing Ad-hoc Data in Haskell

Embedding a Data Description Language

Yan Wang and Veronica Gaspes

Halmstad University, Sweden

IFL2008, Sep 12 2008

Yan Wang, Halmstad University, IFL08


Data Is Everywhere

• Standardized data formats
– HTML
– JPEG & MPEG
– XML
– Databases
– … …

• Tools
– Visualizers
– Languages
– Standard libraries
– Transformers
– … …



Data Is Everywhere

• Ad-hoc data formats (non-standard data formats)
– In geography
– In chemistry
– In genetics
– In finance
– … …

• Tools not available
– Parsers
– Queriers
– Visualizers
– Transformers
– … …



Ad-hoc Data in Business

[Figure: a train e-ticket and a flight e-ticket, with the fields Time, Date, Departure & Arrival, and Transport marked.]



Ad-hoc Binary Data in Networks

YMSG packet: a Yahoo instant message

Src 83.178.165.157<br />

Dst 76.13.15.53<br />



Existing Approaches

• Conventional languages
– C, Java, etc.
– Time-consuming & error-prone

• Traditional parsers
– Yacc, Happy, Parsec
– Heavy-weight

• Data description languages
– PADS, DataScript, PacketTypes
– Difficult to extend



Data Description Calculus

• DDC: the calculus of dependent types for describing data.
– Base types: atomic pieces of data, e.g., intFW(3), stringUtil('.')
– Type constructors: richer structures, e.g., {x:intFW(3) | x …


Our Approach

• Embedding a DDL into Haskell
– Data formats are described in dependent types using
  • Primitive parsers (Base types)
  • Parser combinators (Type constructors)

[Diagram: a type description τ becomes a parser of type AdhocParser t a, which reads ad-hoc data [t] and yields a representation a together with a parse descriptor PD.]



Base Types

• C(e)

class Basetype t a where ... ...

base    :: (Basetype t a) => AdhocParser t a          -- [C()]
baselen :: (Basetype t a) => Int -> AdhocParser t a   -- [C(n)]
baseend :: (Basetype t a) => t -> AdhocParser t a     -- [C(t)]

• Examples

– string(n), e.g. on input 123456

stringlen :: Int -> AdhocParser Char String
stringlen = baselen

stringlen 5

– int(t), e.g. on input 654,321

intend :: Char -> AdhocParser Char Int
intend = baseend

intend ','



Constraint

• {x : τ | e}

constrainp :: (a -> Bool)                 -- e
           -> AdhocParser t a             -- [τ]
           -> AdhocParser t (Either a a)  -- [{x : τ | e}]

• Examples

123,456

654,321

{x : int() | x > 0 && x …
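
The constraint combinator can be sketched in a few lines. This is a minimal illustration, not the library's code: `Parser` is a simplified stand-in for `AdhocParser` with no parse descriptor, `int` is a hypothetical base parser, and tagging a satisfied constraint with `Right` and a violated one with `Left` is an assumption about how `Either a a` is meant to be used.

```haskell
import Data.Char (isDigit)

-- Simplified stand-in for the slides' AdhocParser: no parse descriptor,
-- and failure is just Nothing. (Assumption, not the library's type.)
newtype Parser t a = Parser { runParser :: [t] -> Maybe (a, [t]) }

-- Hypothetical base parser: a non-empty run of decimal digits.
int :: Parser Char Int
int = Parser $ \ts ->
  case span isDigit ts of
    ([], _)    -> Nothing
    (ds, rest) -> Just (read ds, rest)

-- Run the underlying parser, then tag the value: Right when the
-- predicate holds, Left when it is violated. Parsing itself succeeds
-- in both cases, so the caller can still inspect the offending value.
constrainp :: (a -> Bool) -> Parser t a -> Parser t (Either a a)
constrainp ok (Parser p) = Parser $ \ts ->
  case p ts of
    Nothing        -> Nothing
    Just (x, rest) -> Just (if ok x then Right x else Left x, rest)

positive :: Parser Char (Either Int Int)
positive = constrainp (> 0) int

main :: IO ()
main = do
  print (runParser positive "123,456")  -- Just (Right 123,",456")
  print (runParser positive "0;")       -- Just (Left 0,";")
```

Keeping the value on both sides of the `Either` means a violated constraint is reported without discarding the data that was read.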


Dependent Pairs

• Σ x : τ1 . τ2

sigmap :: AdhocParser t a          -- [τ1]
       -> (a -> AdhocParser t b)   -- [τ2 (x)]
       -> AdhocParser t (a, b)     -- [Σ x : τ1 . τ2]

• Examples

– Inputs 'HELLO' and "HELLO"

Σ x : char() . stringend(x)
s1 = sigmap char stringend

– Inputs 5HELLO and 6HELLO.

Σ len : int() . stringlen(len)
s2 = sigmap int stringlen
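
A rough sketch of how such a dependent pair could behave, under stated assumptions: `Parser` is a simplified stand-in for `AdhocParser` (no parse descriptor), and `char`, `int`, `stringlen` and `stringend` are hypothetical re-implementations written only for this example. In particular, having `stringend` consume its terminator is a guess.

```haskell
import Data.Char (isDigit)

-- Simplified stand-in for AdhocParser (assumption: no parse descriptor).
newtype Parser t a = Parser { runParser :: [t] -> Maybe (a, [t]) }

-- Dependent pair: the second parser is built from the first result.
sigmap :: Parser t a -> (a -> Parser t b) -> Parser t (a, b)
sigmap (Parser p) f = Parser $ \ts ->
  case p ts of
    Nothing        -> Nothing
    Just (x, rest) ->
      case runParser (f x) rest of
        Nothing         -> Nothing
        Just (y, rest') -> Just ((x, y), rest')

-- Hypothetical base parsers, for illustration only.
char :: Parser Char Char
char = Parser $ \ts ->
  case ts of
    (c : rest) -> Just (c, rest)
    []         -> Nothing

int :: Parser Char Int
int = Parser $ \ts ->
  case span isDigit ts of
    ([], _)    -> Nothing
    (ds, rest) -> Just (read ds, rest)

-- Exactly n characters.
stringlen :: Int -> Parser Char String
stringlen n = Parser $ \ts ->
  if n >= 0 && length (take n ts) == n then Just (splitAt n ts) else Nothing

-- Characters up to a terminator, which is consumed (an assumption).
stringend :: Char -> Parser Char String
stringend end = Parser $ \ts ->
  case break (== end) ts of
    (s, _ : rest) -> Just (s, rest)
    (_, [])       -> Nothing

s1 :: Parser Char (Char, String)
s1 = sigmap char stringend

s2 :: Parser Char (Int, String)
s2 = sigmap int stringlen

main :: IO ()
main = do
  print (runParser s1 "'HELLO'")  -- Just (('\'',"HELLO"),"")
  print (runParser s2 "5HELLO")   -- Just ((5,"HELLO"),"")
```

In `s1` the opening quote decides which character ends the string; in `s2` the leading number decides how many characters to read.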



Union

• τ1 + τ2

orp :: AdhocParser t a             -- [τ1]
    -> AdhocParser t b             -- [τ2]
    -> AdhocParser t (Either a b)  -- [τ1 + τ2]

• Examples

– 123.45 is a float.
– 100 is not a float.

float() + int()
num = orp float int
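
The union can be illustrated with a left-biased choice. This is only a sketch: `Parser` is a simplified stand-in for `AdhocParser`, `float` and `int` are hypothetical base parsers, and trying the first alternative before the second is an assumption about `orp`'s behaviour.

```haskell
import Data.Char (isDigit)

-- Simplified stand-in for AdhocParser (assumption: no parse descriptor).
newtype Parser t a = Parser { runParser :: [t] -> Maybe (a, [t]) }

-- Try the first parser; if it fails, retry the same input with the second.
orp :: Parser t a -> Parser t b -> Parser t (Either a b)
orp (Parser p) (Parser q) = Parser $ \ts ->
  case p ts of
    Just (x, rest) -> Just (Left x, rest)
    Nothing        ->
      case q ts of
        Just (y, rest) -> Just (Right y, rest)
        Nothing        -> Nothing

-- Hypothetical base parser: a non-empty run of decimal digits.
int :: Parser Char Int
int = Parser $ \ts ->
  case span isDigit ts of
    ([], _)    -> Nothing
    (ds, rest) -> Just (read ds, rest)

-- Hypothetical base parser: digits, a dot, then more digits.
float :: Parser Char Double
float = Parser $ \ts ->
  case span isDigit ts of
    (whole@(_ : _), '.' : more) ->
      case span isDigit more of
        (frac@(_ : _), rest) -> Just (read (whole ++ "." ++ frac), rest)
        _                    -> Nothing
    _ -> Nothing

num :: Parser Char (Either Double Int)
num = orp float int

main :: IO ()
main = do
  print (runParser num "123.45")  -- Just (Left 123.45,"")
  print (runParser num "100")     -- Just (Right 100,"")
```

The order of the alternatives matters here: with `orp int float`, the input 123.45 would be read as the integer 123, leaving ".45" unconsumed.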



Adding Tools

[Diagram: the parser AdhocParser t a, derived from the type description τ, turns ad-hoc data [t] into a representation a and a parse descriptor PD. Two tools are added: a pretty printer that renders the representation as a pretty document (Doc), and an error reporter that turns the parse descriptor into an error report (ErrRep).]



Example: YMSG Packet

40 fe 20 00 06 00 00 00
06 00 00 00 08 00 45 00
00 83 c7 f6 40 00 80 06
e8 ec 53 b2 9a 9d 4c 0d
0f 35 0d 25 13 ba a0 d4
11 3f c7 5d 20 be 50 18
f7 29 dd 17 00 00 59 4d
53 47 00 0f 00 00 00 47
00 06 5a 55 aa 56 00 49
c6 af 31 c0 80 77 61 6e
67 6b 69 74 38 36 c0 80
35 c0 80 74 61 72 65 6b
31 32 61 6c 79 c0 80 31
34 c0 80 48 65 6c 6c 6f
c0 80 39 37 c0 80 31 c0
80 36 33 c0 80 3b 30 c0
80 36 34 c0 80 30 c0 80
32 30 36 c0 80 31 c0 80

[Figure: the dump is annotated with three nested layers: IP, TCP, and YMSG.]



Example: YMSG Packet

Step 1

type HexChar = Char

instance Basetype HexChar Int where
  ... ...

instance Basetype HexChar Char where
  ... ...



Example: YMSG Packet


Step 2

intlen :: Int -> AdhocParser HexChar Int
intlen = baselen

charlen :: Int -> AdhocParser HexChar Char
charlen = baselen

… …



Example: YMSG Packet


Step 3.1

ippacket =
  do version <- constrainp (== 4) (intlen 1)
     ihl     <- intlen 1
     ... ...
     tlen    <- intlen 4
     src     <- seqp unit unit (intlen 2)
                     (\xs -> length xs == 4)
     dest    <- … …
     options <- orp
                  (seqp unit unit (intlen 8)
                        (\xs -> length xs == (ihl-5)))
                  unit
     (port, sender, receiver, msg) <- tcppacket
     return (Ymsg src dest sender receiver msg)



Example: YMSG Packet


Step 3.2

tcppacket =
  do ... ...
     port <- constrainp (== 5050) (intlen 1)
     ... ...
     (sender, receiver, msg) <- ymsgpacket
     return (port, sender, receiver, msg)

ymsgpacket =
  do ... ...
     return (sender, receiver, msg)



Example: YMSG Packet


ippacket

Yahoo msg in IPv4:
from Alice (83.178.165.157)
to Bob (76.13.15.53)
on port 5050
msg is "Hello"
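
To make the link between the hex dump and this result concrete, the following sketch decodes a few byte runs taken from the dump by hand. It does not use the library: `hexByte`, `dottedQuad` and `ascii` are hypothetical helpers written for this illustration, and they assume well-formed two-digit hex input.

```haskell
import Data.Char (chr, digitToInt)
import Data.List (intercalate)

-- Decode one hex byte such as "4c" (assumes valid hex digits).
hexByte :: String -> Int
hexByte = foldl (\acc c -> 16 * acc + digitToInt c) 0

-- Render space-separated hex bytes as a dotted decimal address.
dottedQuad :: String -> String
dottedQuad = intercalate "." . map (show . hexByte) . words

-- Render space-separated hex bytes as ASCII text.
ascii :: String -> String
ascii = map (chr . hexByte) . words

main :: IO ()
main = do
  putStrLn (dottedQuad "4c 0d 0f 35")  -- 76.13.15.53
  putStrLn (ascii "59 4d 53 47")       -- YMSG
  putStrLn (ascii "48 65 6c 6c 6f")    -- Hello
```

The first run is the destination address in the IP header, the second is the marker that opens the YMSG layer, and the third is the message text.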



Thanks for your attention!

Questions & Suggestions?



Implementation

• newtype AdhocParser t a
    = P (([t], PD) -> (Either String a, [t], PD))

• data PD = MkPD Int ErrCode Span Body

  data ErrCode = Ok | Err | Fail

  type Span = (Offset, Offset)

  data Body = Unit | Pair PD PD | Or (Either PD PD)
            | Constrain PD | Seq Int [PD]
            | Scan (Maybe (Int, PD)) | Struct [PD]
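
The packet descriptions use do-notation, so the parser type needs a Monad instance. One plausible way to write it for the shape shown above is sketched here. Everything beyond the `AdhocParser` type itself is an assumption: `PD` is reduced to a hypothetical error counter, `item` and `pair` are illustrative names, and stopping at the first `Left` is a guess at the sequencing behaviour.

```haskell
-- Placeholder for the parse descriptor: only an error count.
-- (Assumption; the slides' PD carries an error code, span and body.)
newtype PD = PD Int deriving (Eq, Show)

-- Same shape as on the slide, with a record accessor added.
newtype AdhocParser t a =
  P { runP :: ([t], PD) -> (Either String a, [t], PD) }

instance Functor (AdhocParser t) where
  fmap f (P p) = P $ \s ->
    case p s of
      (r, ts, pd) -> (fmap f r, ts, pd)

instance Applicative (AdhocParser t) where
  pure x = P $ \(ts, pd) -> (Right x, ts, pd)
  pf <*> px = do { f <- pf; x <- px; return (f x) }

instance Monad (AdhocParser t) where
  -- Thread the remaining input and the descriptor through both steps;
  -- stop at the first error.
  P p >>= f = P $ \s ->
    case p s of
      (Left err, ts, pd) -> (Left err, ts, pd)
      (Right x,  ts, pd) -> runP (f x) (ts, pd)

-- Hypothetical primitive: consume one token, or record an error.
item :: AdhocParser t t
item = P $ \(ts, PD n) ->
  case ts of
    (c : rest) -> (Right c, rest, PD n)
    []         -> (Left "unexpected end of input", [], PD (n + 1))

-- Two tokens in sequence, written in the same style as the packet slides.
pair :: AdhocParser t (t, t)
pair = do
  a <- item
  b <- item
  return (a, b)

main :: IO ()
main = do
  print (runP pair ("abc", PD 0))  -- (Right ('a','b'),"c",PD 0)
  print (runP pair ("a", PD 0))    -- (Left "unexpected end of input","",PD 1)
```

Because the descriptor is returned even on failure, an error reporter can still say how far parsing got and what went wrong.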
