Converting HTML to PDF using iText

I am posting this question because many developers ask more or less the same question in different forms. I will answer this question myself (I am the Founder/CTO of iText Group), so that it can be a "Wiki-answer." If the Stack Overflow "documentation" feature still existed, this would have been a good candidate for a documentation topic.

The source file:

I am trying to convert the following HTML file to PDF:
  Colossal (movie) .poster < width: 120px;float: right; >.director < font-style: italic; >.description < font-family: serif; >.imdb < font-size: 0.8em; >a  

Colossal (2016)

Directed by Nacho Vigalondo
Gloria is an out-of-work party girl forced to leave her life in New York City, and move back home. When reports surface that a giant creature is destroying Seoul, she gradually comes to the realization that she is somehow connected to this phenomenon.
Read more about this movie on IMDB

enter image description here

In a browser, this HTML looks like this:

The problems I encountered:

HTMLWorker doesn't take CSS into account at all When I used HTMLWorker , I need to create an ImageProvider to avoid an error that informs me that the image can't be found. I also need to create a StyleSheet instance to change some of the styles:

public static class MyImageFactory implements ImageProvider < public Image getImage(String src, Maph, ChainedProperties cprops, DocListener doc) < try < return Image.getInstance( String.format("resources/html/img/%s", src.substring(src.lastIndexOf("/") + 1))); >catch (DocumentException e) < e.printStackTrace(); >catch (IOException e) < e.printStackTrace(); >return null; > > public static void main(String[] args) throws IOException, DocumentException < Document document = new Document(); PdfWriter.getInstance(document, new FileOutputStream("results/htmlworker.pdf")); document.open(); StyleSheet styles = new StyleSheet(); styles.loadStyle("imdb", "size", "-3"); HTMLWorker htmlWorker = new HTMLWorker(document, null, styles); HashMapproviders = new HashMap(); providers.put(HTMLWorker.IMG_PROVIDER, new MyImageFactory()); htmlWorker.setProviders(providers); htmlWorker.parse(new FileReader("resources/html/sample.html")); document.close(); > 

The result looks like this: enter image description here For some reason, HTMLWorker also shows the content of the tag. I don't know how to avoid this. The CSS in the header isn't parsed at all, I have to define all the styles in my code, using the StyleSheet object. When I look at my code, I see that plenty of objects and methods I'm using are deprecated: enter image description here So I decided to upgrade to using XML Worker. Images aren't found when using XML Worker I tried the following code:

public static final String DEST = "results/xmlworker1.pdf"; public static final String HTML = "resources/html/sample.html"; public void createPdf(String file) throws IOException, DocumentException

enter image description here

This resulted in the following PDF: Instead of Times-Roman, the default font Helvetica is used; this is typical for iText (I should have defined a font explicitly in my HTML). Otherwise, the CSS seems to be respected, but the image is missing, and I didn't get an error message. With HTMLWorker , an exception was thrown, and I was able to fix the problem by introducing an ImageProvider . Let's see if this works for XML Worker. Not all CSS styles are supported in XML Worker I adapted my code like this:

public static final String DEST = "results/xmlworker2.pdf"; public static final String HTML = "resources/html/sample.html"; public static final String IMG_PATH = "resources/html/"; public void createPdf(String file) throws IOException, DocumentException < Document document = new Document(); PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file)); document.open(); CSSResolver cssResolver = XMLWorkerHelper.getInstance().getDefaultCssResolver(true); HtmlPipelineContext htmlContext = new HtmlPipelineContext(null); htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory()); htmlContext.setImageProvider(new AbstractImageProvider() < public String getImageRootPath() < return IMG_PATH; >>); PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer); HtmlPipeline html = new HtmlPipeline(htmlContext, pdf); CssResolverPipeline css = new CssResolverPipeline(cssResolver, html); XMLWorker worker = new XMLWorker(css, true); XMLParser p = new XMLParser(worker); p.parse(new FileInputStream(HTML)); document.close(); > 

enter image description here

My code is much longer, but now the image is rendered: The image is larger than when I rendered it using HTMLWorker which tells me that the CSS attribute width for the poster class is taken into account, but the float attribute is ignored. How do I fix this?

The remaining question:

So the question boils down to this: I have a specific HTML file that I try to convert to PDF. I have gone through a lot of work, fixing one problem after the other, but there is one specific problem that I can't solve: how do I make iText respect CSS that defines the position of an element, such as float: right ?