21.08.2013 Views

Give me my drawing back! - The Document Foundation Wiki


Give me my drawing back!

Dragging your Visio, Publisher and CorelDraw files to the free-software world

Fridrich Štrba

Software Engineer, SUSE



Agenda

LibreOffice's contribution to the wider FOSS ecosystem

Visio, CorelDraw, Publisher, ...

Interesting parts of the reverse-engineering

Incremental reverse-engineering

Evolution of file-formats observed



LibreOffice's contribution to the wider FOSS ecosystem


Designed to be re-used

LibreOffice uses technologies available in the FOSS ecosystem

We love to give back and share the fruit of our sweat

libwpg, libvisio, libcdr and libmspub

Standalone libraries

Using the same interface

Internal class generating SVG for lazy hackers :)

More users, more bug reports and (eventually) fixes

Reverse-engineering is in principle a trial-and-error exercise
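
The idea of several standalone libraries sharing one interface, with an SVG generator behind it, can be sketched roughly as follows. This is an illustration only, not the libraries' real API: `DrawingInterface`, `SvgGenerator`, `fake_parser` and all method names are hypothetical, and Python is used here purely for brevity.

```python
from abc import ABC, abstractmethod


class DrawingInterface(ABC):
    """Hypothetical callback interface shared by all import libraries."""

    @abstractmethod
    def start_page(self, width, height):
        ...

    @abstractmethod
    def draw_rectangle(self, x, y, w, h):
        ...

    @abstractmethod
    def end_page(self):
        ...


class SvgGenerator(DrawingInterface):
    """Hypothetical consumer that turns the callbacks into an SVG string."""

    def __init__(self):
        self.parts = []

    def start_page(self, width, height):
        self.parts.append(
            f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}in" height="{height}in" '
            f'viewBox="0 0 {width} {height}">'
        )

    def draw_rectangle(self, x, y, w, h):
        self.parts.append(f'<rect x="{x}" y="{y}" width="{w}" height="{h}"/>')

    def end_page(self):
        self.parts.append("</svg>")

    def svg(self):
        return "".join(self.parts)


def fake_parser(painter):
    """Stand-in for a format parser: it only knows the interface."""
    painter.start_page(8.5, 11.0)
    painter.draw_rectangle(1.0, 1.0, 2.0, 3.0)
    painter.end_page()


generator = SvgGenerator()
fake_parser(generator)
print(generator.svg())
```

Because the parser only talks to the interface, any application can plug in its own consumer in place of the SVG one.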



Visio import filter - libvisio

Google Summer of Code 2011

Eilidh McAdam

Previous reverse-engineering work by re-lab's Valentin Filippov

Started with Visio 2000 – Visio 2010 file-formats

LibreOffice 3.5 release

Visio 2000 and Visio 2002 – version 6 file-format

Visio 2003 to Visio 2010 – version 11 file-format

Extended in 2012 to ALL Visio file-format versions that ever existed

Upcoming LibreOffice 4.0 release

Visio 2013 – OOXML-ish version (*.vsdx)

Visio 1 – 5

Visio XML Drawings (*.vdx)



The team

Valentin Filippov, Fridrich Štrba, Eilidh McAdam



CorelDraw import filter - libcdr

Work started in late 2011

Released in LibreOffice 3.6.x

Still improving

Valek's reverse-engineering work

cdr_explorer

Some of it reused in the sk1 project, which is currently dormant

An interesting challenge after the success of libvisio

Continuation of a fruitful collaboration

Support for ALL CorelDraw file-formats

Starting from version 1 (code name Waldo)

Ending with CorelDraw X6, released in March 2012



Microsoft Publisher import filter - libmspub

Google Summer of Code 2012

Brennan T. Vincent

Flagship feature of LibreOffice 4.0

Reverse-engineering started by Valek Filippov

Completed in tandem

Version support

MS Publisher 97

MS Publisher 98/2000

MS Publisher 2002-2013



Interesting elements:

Incremental reverse-engineering


Progressive development of file-formats

Nobody reinvents a wheel from scratch

It is useful to know the release dates of different versions when doing reverse-engineering

Two consecutive versions of the same file-format will have many things in common

Design the parser to be able to parse lower and higher versions

Open-ended version conditions

Guard assumptions with exceptions and be verbose in debug mode

Try to parse a lower or higher version using the existing parser

Fix issues as they appear

Importance of a small number of reference documents covering many features
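
A minimal sketch of what open-ended version conditions and guarded assumptions could look like in practice. Everything concrete here is an assumption made up for illustration: the record layout (a one-byte character count followed by the text), the version threshold, the encodings, and the names `UnexpectedLayout`, `text_encoding` and `read_text`.

```python
import logging

log = logging.getLogger("parser_sketch")


class UnexpectedLayout(Exception):
    """Raised when an assumption about the file layout does not hold."""


def text_encoding(version):
    # Open-ended condition: "11 and anything newer", not "exactly 11",
    # so an unknown later version falls into the nearest known behaviour.
    if version >= 11:
        return "utf-16-le"
    return "cp1252"


def read_text(record, version):
    # Hypothetical record: 1-byte character count followed by the text.
    if not record:
        raise UnexpectedLayout("empty text record")
    count = record[0]
    width = 2 if version >= 11 else 1
    payload = record[1:1 + count * width]
    # Guard the assumption instead of silently decoding garbage.
    if len(payload) != count * width:
        raise UnexpectedLayout(
            f"text record truncated: expected {count * width} bytes, "
            f"got {len(payload)}"
        )
    log.debug("v%d text record: %d chars, %d bytes each", version, count, width)
    return payload.decode(text_encoding(version))
```

Feeding such a parser a file from an untested version either works, or fails loudly at the exact assumption that broke, which is what makes the "fix issues as they appear" loop practical.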



Extending the CorelDraw version coverage (1)

Starting point

Support for versions 7 to X3

Basically the knowledge from cdr_explorer

Extending the coverage upwards

X4 and X5

Support for RIFF documents inside structured ZIP storage

X6

More complicated structure inside the ZIP storage

Extending the coverage downwards

Version 6 (first 32-bit version)

Only some RIFF names different

Versions 4 and 5 (16-bit versions)

Different way to express coordinates
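
To make the "RIFF inside ZIP" step concrete, here is a rough sketch of a generic RIFF chunk walker plus a helper that unwraps a ZIP container when one is present. It relies only on the general RIFF layout (four-byte identifier, little-endian 32-bit size, payload padded to an even length). The function names are hypothetical, and the name of the entry inside the ZIP is left as a parameter because it is not given here.

```python
import io
import struct
import zipfile


def iter_riff_chunks(data, offset=0, end=None):
    """Yield (fourcc, payload) for each chunk found in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        fourcc, size = struct.unpack_from("<4sI", data, offset)
        start = offset + 8
        if start + size > end:
            raise ValueError(f"chunk {fourcc!r} overruns its container")
        yield fourcc, data[start:start + size]
        # RIFF chunks are padded to an even number of bytes.
        offset = start + size + (size & 1)


def load_riff(blob, entry_name):
    """Return the RIFF bytes, unwrapping a ZIP container if there is one.

    entry_name is an assumption supplied by the caller.
    """
    if zipfile.is_zipfile(io.BytesIO(blob)):
        with zipfile.ZipFile(io.BytesIO(blob)) as archive:
            return archive.read(entry_name)
    return blob
```

With this split, the existing RIFF parser stays unchanged for the newer versions: only the loading step learns about the container.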



Extending the CorelDraw version coverage (2)

Extending the coverage downwards (cont'd)

Version 3

First RIFF-based CDR file-format

but we did not know it at the time

Fill and outline information embedded inside the shape

Shape transform does not accumulate group transforms

Versions 2 and 1

Not RIFF-based at all

Version 2 more structured

With some exception handling both can be parsed alike

A header with pointers to different sequences of chunks

Implementation of linked list (“type 1”) and shape information (“type 2”)

Embedded raster (“type 3” and “6”), group transforms (“type 7”), arrow information (“type 8”),
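
A header that points at sequences of chunks can be followed like a linked list. The sketch below shows only the general technique. The record layout (a 16-bit type followed by a 16-bit offset of the next record, with 0 ending the chain) is invented for the example and is not the real CDR 1/2 layout, and `walk_chain` is a hypothetical name.

```python
import struct


def walk_chain(data, first):
    """Yield (offset, record_type) along a chain of records.

    Hypothetical layout: each record starts with <u16 type, u16 next>,
    where next is the absolute offset of the following record, 0 = end.
    """
    seen = set()
    offset = first
    while offset != 0:
        # Guard against loops and pointers outside the file.
        if offset in seen or offset + 4 > len(data):
            raise ValueError(f"bad pointer {offset:#x}")
        seen.add(offset)
        record_type, next_offset = struct.unpack_from("<HH", data, offset)
        yield offset, record_type
        offset = next_offset
```

The loop and range checks matter more than usual here: while a format is only partly understood, a misread pointer is the most likely failure.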



Extending the Visio version coverage (1)

Starting point

Versions 6 and 11

Difference in some offsets and in text encoding

Common structure

A trailer pointing to “streams”

Some “streams” consist of a hierarchical sequence of “chunks”

Shapes and text content in “chunks”

Bug-driven rewrite

A document (most likely generated by an SDK)

Completely challenged our assumptions and led to a more generalized parser
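
One way to picture a "hierarchical sequence of chunks" is a flat list in which every chunk carries a nesting level. The sketch below rebuilds the tree from such a list. The `(level, chunk_type)` representation, the chunk type names and `build_tree` are assumptions for illustration, not the actual on-disk encoding.

```python
def build_tree(chunks):
    """Turn a flat sequence of (level, chunk_type) pairs into nested dicts."""
    root = {"type": "stream", "children": []}
    # Stack of (level, node); the root sits below every real level.
    stack = [(-1, root)]
    for level, chunk_type in chunks:
        if level < 0:
            raise ValueError(f"negative level for chunk {chunk_type!r}")
        node = {"type": chunk_type, "children": []}
        # Pop until the top of the stack is strictly shallower: the parent.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root
```

A parser written this way makes no assumption about which chunk types may appear at which depth, which is the kind of generalization an unusual document tends to force.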



Extending the Visio version coverage (2)

Microsoft Visio 2013 Preview

We wanted to support it before the official release

XML-based (OOXML-ish) file-format (*.vsdx)

Another rewrite of the parsers

Need to separate the parsing and the information processing more clearly

Side effect: support for Visio XML Drawing (*.vdx)

Versions 1 to 5

Some “chunks” of type list are different

An override for readers of some chunks

“streams” format very similar

Only minor abstractions and generalizations needed

Improved understanding of the file-format

Cleaner and simpler parser
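
Separating parsing from information processing means several front ends can feed the same back end. A rough sketch under invented assumptions: the binary record layout, the XML element names, and the names `Collector`, `parse_binary` and `parse_xml` are all hypothetical and do not describe the real Visio formats or libvisio classes.

```python
import struct
import xml.etree.ElementTree as ET


class Collector:
    """Format-independent processing: it only sees abstract shape events."""

    def __init__(self):
        self.shapes = []

    def collect_shape(self, shape_id, text):
        self.shapes.append((shape_id, text))


def parse_binary(data, collector):
    # Hypothetical record: u16 id, u16 text length, then UTF-8 text.
    offset = 0
    while offset < len(data):
        shape_id, length = struct.unpack_from("<HH", data, offset)
        offset += 4
        text = data[offset:offset + length].decode("utf-8")
        offset += length
        collector.collect_shape(shape_id, text)


def parse_xml(source, collector):
    # Hypothetical XML: <Shapes><Shape ID="1">text</Shape>...</Shapes>
    for element in ET.fromstring(source).iter("Shape"):
        collector.collect_shape(int(element.get("ID")), element.text or "")
```

Once the collector no longer cares where its events come from, adding another XML-based front end is comparatively cheap, which is how a second format can arrive as a side effect.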



Getting involved ...

how you can make a difference


Future file-formats to import?

Google Summer of Code

The possibility for a student to work with outstanding mentors

Valentin Filippov

Yours truly

(Altsys, Aldus, Macromedia & Adobe) Freehand

File-format partially reverse-engineered

The broad outlines of the structure

Ripe to be a successful project

A talented student can make a difference in LibreOffice



Impact within LibreOffice and the known universe

Happy users will reward you

You will be the hero of the people who can now read their documents...

… and they will get on your nerves listing features that are not converted.

Users outside LibreOffice

Inkscape reuses libvisio and libcdr in 0.49

Calligra reuses libvisio and (possibly) libcdr since 2.5



Q&A and stoning session

All text and image content in this document is licensed under the Creative Commons Attribution-Share Alike 3.0 License (unless otherwise specified). "LibreOffice" and "The Document Foundation" are registered trademarks. Their respective logos and icons are subject to international copyright laws. The use of these therefore is subject to the trademark policy.

