Monday, 23 May 2011

Extracting PDF Bookmark Information with iText

iText is an open source Java and C# library for manipulating PDFs, for example building PDF documents from scratch or inserting new content into an existing PDF. It can also be used simply to extract information from PDF documents. While the API seems well structured and easy to understand, good documentation on iText is rare. This short blog post shows how to extract the bookmarks of a PDF document with iText and export them as XML, which turns out to be quite simple once you know which classes to use.

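import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.SimpleBookmark;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.List;

public class Test {

  public static void main(String[] args) throws IOException {
    PdfReader pr = new PdfReader("myDocument.pdf");
    // The outline as a list of nested maps; null if the PDF has no bookmarks
    List<HashMap<String, Object>> bookmarks = SimpleBookmark.getBookmark(pr);
    pr.close();
    if (bookmarks == null) {
      System.out.println("myDocument.pdf contains no bookmarks.");
      return;
    }
    // Write the bookmarks as XML. The writer's charset matches the encoding
    // declared in the XML header; false = non-ASCII characters are not escaped.
    try (Writer fw = new OutputStreamWriter(new FileOutputStream("myBookmarks.xml"), "UTF-8")) {
      SimpleBookmark.exportToXML(bookmarks, fw, "UTF-8", false);
    }
  }

}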
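If you need the bookmarks in your own code rather than as an XML file, note that getBookmark() does not return a flat list: each entry is a HashMap, and sub-bookmarks sit in a nested list under the "Kids" key. Below is a minimal sketch of walking that tree to build an indented outline. It assumes the key names "Title", "Page" and "Kids" used by iText 5's SimpleBookmark, and that "Page" (present for GoTo bookmarks) is a string whose first token is the page number. So that it runs without a PDF or iText on the classpath, the class BookmarkOutline and its sample() data are hypothetical stand-ins for the list returned by SimpleBookmark.getBookmark(pr).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class BookmarkOutline {

  // Recursively adds one line per bookmark, indented by two spaces per level.
  // Assumption: the map layout of iText's SimpleBookmark ("Title", "Page", "Kids").
  @SuppressWarnings("unchecked")
  public static void collect(List<HashMap<String, Object>> bookmarks, int depth, List<String> out) {
    if (bookmarks == null) return;
    for (HashMap<String, Object> bm : bookmarks) {
      StringBuilder line = new StringBuilder();
      for (int i = 0; i < depth; i++) line.append("  ");
      line.append(bm.get("Title"));
      Object page = bm.get("Page");
      if (page != null) {
        // Keep only the leading page number, e.g. "2" from "2 XYZ 0 842 0"
        line.append(" (page ").append(page.toString().trim().split("\\s+")[0]).append(")");
      }
      out.add(line.toString());
      // Children, if any, are stored as a nested list under "Kids"
      collect((List<HashMap<String, Object>>) bm.get("Kids"), depth + 1, out);
    }
  }

  public static List<String> outline(List<HashMap<String, Object>> bookmarks) {
    List<String> out = new ArrayList<>();
    collect(bookmarks, 0, out);
    return out;
  }

  // Hypothetical helper that builds one bookmark map by hand
  public static HashMap<String, Object> bookmark(String title, String page,
                                                 List<HashMap<String, Object>> kids) {
    HashMap<String, Object> bm = new HashMap<>();
    bm.put("Title", title);
    if (page != null) bm.put("Page", page);
    if (kids != null) bm.put("Kids", kids);
    return bm;
  }

  // Hypothetical sample data standing in for SimpleBookmark.getBookmark(pr)
  public static List<HashMap<String, Object>> sample() {
    List<HashMap<String, Object>> kids = new ArrayList<>();
    kids.add(bookmark("Section 1.1", "2 XYZ 0 842 0", null));
    List<HashMap<String, Object>> top = new ArrayList<>();
    top.add(bookmark("Chapter 1", "1 Fit", kids));
    top.add(bookmark("Appendix", null, null));
    return top;
  }

  public static void main(String[] args) {
    for (String line : outline(sample())) {
      System.out.println(line);
    }
  }
}
```

Run as is, this prints "Chapter 1 (page 1)", then "  Section 1.1 (page 2)" indented one level, then "Appendix". With a real document, you would pass the list from SimpleBookmark.getBookmark(pr) to outline() instead of sample(); outline() also tolerates the null that getBookmark() returns for PDFs without bookmarks.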