Metadata, in its simplest definition, means “data about data.” So what’s wrong with data about data? Using the metadata contained in a document, such as a contract, created in Microsoft Word, you can learn who originally created the document, the names of the last ten editors (except in Office 2007), and the names of the computer and network drives where the document has been stored. You can also view all changes made to the document, read comments the author thought had been deleted, tell how many revisions have been made and how long they took, and much more.
Incidentally, so can your client, opposing counsel, and even the most amateur computer forensics expert.
Here’s how it happens: you send your client a finished legal document as a Word e-mail attachment. Your client opens the document and learns that it was drafted by your paralegal, who cut and pasted from a prior client’s document, and that it was edited by a first-year associate. The client also discovers that the total editing time tracked in the document doesn’t match the total hours billed by the billing attorney, whose name and billing rate don’t match the names and billing rates of those who actually did the work. Odds are, this won’t be a client for long.
And ultimately that’s why this matters to today’s lawyer. This is a service and ethics issue with far-reaching effects. Information you thought was invisible, and consequently confidential, really isn’t. It is critical that you be familiar with the origins of metadata, what it consists of and, most importantly, how to remove it when necessary.
What Exactly Is the Metadata in Your Document?
- Authors: Everyone (up to 10) who has collaborated on the document
- Comments: Comments from reviewers
- Company or firm name
- Computer name: This is a network-level setting that often identifies the computer used to make changes to the document
- Document revisions: This can include deleted text
- Document versions: How many copies of the document exist
- File location: Where to find the file, whether on the PC or anywhere else on the network it was saved
- File properties: Exact byte size of document
- Headers, footers, and watermarks: Word documents and Excel workbooks can contain information in headers and footers. Additionally, you might have added a watermark to your Word document.
- Hidden text: Word documents can contain text that is formatted as hidden text
- Hyperlinks: Links to websites or other data locations
- Initials: Authors and/or editors
- Tracked changes
- Undo/redo history
- Hidden text, rows, columns, and worksheets: In an Excel workbook, rows, columns, and entire worksheets can be hidden. If you distribute a copy of a workbook that contains these, they can be easily uncovered.
- Off-slide content: PowerPoint presentations can contain objects, including text boxes, clip art, graphics, and tables, that are not immediately visible because they were dragged off the slide into the off-slide area.
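To make a few of the items above concrete, here is a minimal Python sketch that reads the author, last editor, revision count, company, and total editing time from a Word file. It assumes the file is in the Office 2007 `.docx` format, which is a ZIP package that stores these properties in the `docProps/core.xml` and `docProps/app.xml` parts; it does not handle the older binary `.doc` format. The function name `read_docx_properties` and the field labels are illustrative choices, not part of any Microsoft tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the property parts of an Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Illustrative label -> (package part, element to look up in that part).
FIELDS = {
    "author": ("docProps/core.xml", "dc:creator"),
    "last_modified_by": ("docProps/core.xml", "cp:lastModifiedBy"),
    "revision": ("docProps/core.xml", "cp:revision"),
    "created": ("docProps/core.xml", "dcterms:created"),
    "company": ("docProps/app.xml", "ep:Company"),
    "total_edit_minutes": ("docProps/app.xml", "ep:TotalTime"),
}


def read_docx_properties(source):
    """Return the document properties found in a .docx file.

    `source` is a file path or a binary file-like object.
    Properties that are absent or empty are left out of the result.
    """
    found = {}
    with zipfile.ZipFile(source) as package:
        names = set(package.namelist())
        roots = {}  # parse each property part at most once
        for label, (part, path) in FIELDS.items():
            if part not in names:
                continue
            if part not in roots:
                roots[part] = ET.fromstring(package.read(part))
            element = roots[part].find(path, NS)
            if element is not None and element.text:
                found[label] = element.text
    return found
```

Calling `read_docx_properties("engagement_letter.docx")` (a hypothetical file name) returns a dictionary of whatever properties the file still carries, which is the same information a client or opposing counsel could read.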
Options for Handling Metadata
No single, central place addresses all metadata issues in all programs. In Microsoft Word, you can turn off some of the features that create metadata in the “Options” dialog box, which you open by selecting “Options” under the Tools menu. Other programs have an Options menu that includes “Security” settings to address some metadata issues. Good places to look are the pull-down menus under File, Edit, View, Insert, Format, Table and Tools, or the software provider’s website.
Manual Options:
- RTF: One option is to save your document as RTF (Rich Text Format) before attaching it to an e-mail. Under the File menu, select “Save As” and, in the File Type box of the dialog, select Rich Text Format. Documents in this format will show “.rtf” at the end of the file name.
- Scan the Document: Another option is to print your word processing document, scan it, and turn it into a TIFF or PDF (Portable Document Format) file.
- Print to PDF: Printing to PDF will not remove ALL metadata, but it will remove ‘track changes’ type data. Information such as the author and creation date can still be copied into the PDF when the file is created.
- Microsoft Prepare/Inspect Document: Microsoft has included a metadata cleaning tool as part of Office 2003/2007. This tool is said to remove the vast majority of potentially private data.
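Whichever manual option you choose, it can help to check what a Word file still contains before it goes out. The following Python sketch counts leftover tracked insertions, tracked deletions, hidden-text formatting, and reviewer comments. It assumes an Office 2007 `.docx` file and looks only at the main document body and the comments part, so headers, footers, and footnotes are not examined. The counts are rough indicators, and the function name `find_review_leftovers` is a hypothetical one chosen for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace prefix for WordprocessingML elements, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def find_review_leftovers(source):
    """Count review-related markup remaining in a .docx file.

    `source` is a file path or a binary file-like object.
    """
    counts = {"insertions": 0, "deletions": 0, "hidden_text": 0, "comments": 0}
    with zipfile.ZipFile(source) as package:
        names = set(package.namelist())
        if "word/document.xml" in names:
            body = ET.fromstring(package.read("word/document.xml"))
            # w:ins and w:del mark tracked insertions and deletions.
            counts["insertions"] = sum(1 for _ in body.iter(W + "ins"))
            counts["deletions"] = sum(1 for _ in body.iter(W + "del"))
            # w:vanish marks hidden-text formatting. This simple count does
            # not check for an explicit "off" value on the element.
            counts["hidden_text"] = sum(1 for _ in body.iter(W + "vanish"))
        if "word/comments.xml" in names:
            comments = ET.fromstring(package.read("word/comments.xml"))
            counts["comments"] = sum(1 for _ in comments.iter(W + "comment"))
    return counts
```

A result in which every count is zero suggests, but does not prove, that the body of the document carries no tracked changes, hidden text, or comments.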
Automated Options:
You also have the option of investing in an automated tool for keeping your documents clean. The principle behind these metadata cleansing programs is to let you automatically identify and remove metadata before a document leaves your office. These programs integrate with e-mail systems such as Outlook.
Because metadata becomes an issue when you send electronic versions of documents, you should be sure that the metadata cleansing program works with your e-mail program, whether it’s Microsoft Outlook, Eudora or Lotus Notes.
One of the most widely used tools in the legal market today is a product called Payne’s Metadata Assistant. There are several versions of this tool, ranging from a basic retail version ($80) to a customizable Enterprise version with volume licensing options. Like most metadata cleansing systems, this product integrates with your word processing application to clean your documents while you are working on them. It also provides a ‘gatekeeper’ function that keeps files from being sent to outside individuals or contacts without being properly cleaned. As was previously mentioned, there may be times when you wish to leave the metadata in place as part of a collaborative effort with individuals outside the firm.
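The gatekeeper idea can be pictured as a simple rule: a message to an outside address is held back if any attachment has not been cleaned, unless the sender has deliberately chosen to keep the metadata for a collaborative exchange. The Python sketch below illustrates that rule only. It is not how Metadata Assistant or any other product is implemented, and the names `may_send`, `firm_domain`, `is_clean`, and `keep_metadata` are all hypothetical.

```python
def external_recipients(recipients, firm_domain):
    """Return the addresses that fall outside the firm's e-mail domain."""
    suffix = "@" + firm_domain.lower()
    return [address for address in recipients
            if not address.lower().endswith(suffix)]


def may_send(recipients, attachments, firm_domain, is_clean, keep_metadata=False):
    """Decide whether a message may leave the office.

    `is_clean` is a caller-supplied function that takes an attachment name
    and returns True if that file has already been cleaned.
    Returns a pair: (allowed, list of attachments that blocked the send).
    """
    # Internal-only mail, or a deliberate choice to keep metadata, passes.
    if keep_metadata or not external_recipients(recipients, firm_domain):
        return True, []
    blocked = [name for name in attachments if not is_clean(name)]
    return not blocked, blocked
```

In this sketch the decision to keep metadata is an explicit flag rather than a default, so sending an uncleaned file outside the firm is always a deliberate act.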