Office 12 new XML formats
This is big news, widely reported and live on video. I am increasingly impressed by the evolution of Office. I think Microsoft is finally realising that people don’t want more incremental functions to refine what they already do; they want new ways of working to be enabled. The new features in Office 12 seem to be going in that direction, at least in the collaboration and information management areas, and I can’t wait to see what they do when they can build on top of Longhorn and WinFS. That’s not to say that OOo (OpenOffice.org) is not doing some great creative stuff as well. Of course, the killer value proposition of OOo will be its ubiquity: within 3 years I doubt there will be a corporate desktop anywhere that does not have access to OOo. I don’t think we will be able to say that about Office 12, so Microsoft needs to get creative. Metro is the first glimmer of that. We will have to wait and see!
One of the best places to keep informed seems to be Brian Jones’s blog. A bit about Brian:
Brian is a program manager on the Word team. He’s been at Microsoft for about 6 years, and has been working on XML support in Word and across Office for a good percentage of that time. He set up his blog to talk with people about what Microsoft are doing in the next version of Office around XML.
It seems to be fairly well received so far. The main points are:
- Open Format: These formats use XML and ZIP, and they will be fully documented. Anyone will be able to get the full specs on the formats and there will be a royalty free license for anyone that wants to work with the files.
- Compressed: Files saved in these new XML formats are less than 50% the size of the equivalent file saved in the binary formats. This is because we take all of the XML parts that make up any given file, and then we ZIP them. We chose ZIP because it’s already widely in use today and we wanted these files to be easy to work with. (ZIP is a great container format. Of course I’m not the only one who thinks so… a number of other applications also use ZIP for their files too.)
- Robust: Between the usage of XML, ZIP, and good documentation the files get a lot more robust. By compartmentalizing our files into multiple parts within the ZIP, it becomes a lot less likely that an entire file will be corrupted (instead of just individual parts). The files are also a lot easier to work with, so it’s less likely that people working on the files outside of Office will cause corruptions.
- Backward compatible: There will be updates to Office 2000, XP, and 2003 that will allow those versions to read and write this new format. You don’t have to use the new version of Office to take advantage of these formats. (I think this is really cool. I was a big proponent of doing this work)
- Binary Format support: You can still use the current binary formats with the new version of Office. In fact, people can easily change to use the binary formats as the default if that’s what they’d rather do.
- New Extensions: The new formats will use new extensions (.docx, .pptx, .xlsx) so you can tell what format the files you are dealing with are, but to the average end user they’ll still just behave like any other Office file. Double click & it opens in the right application.
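To make the "XML parts in a ZIP" idea concrete, here is a minimal sketch of my own in Python, not anything taken from Brian's post or the specs. The part names (`document.xml`, `styles.xml`) and their contents are hypothetical placeholders; the real package layout is whatever the published documentation defines. It builds a tiny package in memory, then reads it back with nothing but the standard library, checking each part independently and comparing the compressed size with the raw XML.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical part names and contents, for illustration only; the real
# layout is defined by the published format documentation.
PARTS = {
    "document.xml": "<doc>" + "<p>Hello, Office 12</p>" * 200 + "</doc>",
    "styles.xml": "<styles><style name='Normal'/></styles>",
}


def build_package(parts):
    """Write each XML part into an in-memory ZIP and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, xml_text in parts.items():
            zf.writestr(name, xml_text)
    return buf.getvalue()


def read_package(data):
    """Return ({part name: root tag}, name of first corrupt part or None)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # testzip() verifies the CRC of every part on its own, so damage
        # is reported per part rather than for the file as a whole.
        bad = zf.testzip()
        roots = {n: ET.fromstring(zf.read(n)).tag for n in zf.namelist()}
    return roots, bad


package = build_package(PARTS)
roots, bad_part = read_package(package)
raw_size = sum(len(text.encode("utf-8")) for text in PARTS.values())

print("parts and root elements:", roots)
print("first corrupt part:", bad_part)
print("raw XML bytes:", raw_size, "zipped bytes:", len(package))
```

The repetitive XML here compresses far better than a real document would, so the size ratio it prints says nothing about the "less than 50%" figure above. The point is only that any tool that can read ZIP and parse XML can get at the individual parts.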
There are two whitepapers with more details:
The Microsoft Office Open XML Formats: New File Formats for “Office 12”
This first whitepaper is a general overview of the file format, and is targeted at multiple audiences. It starts off with an introduction about what’s going on and also briefly touches on the history of the current binary formats and how we got to where we are today.
The Microsoft Office Open XML Formats: Preview for Developers
This paper talks more about the architecture of the formats and is targeted at developers. This paper has a similar introduction to the first (but from a slightly different angle). The last 7 or so pages of the paper go into solutions and what people can do with these files. It’s a great way to start thinking about the possibilities, and what types of things you can probably expect to see built on top of the format.