I am posting this question because many developers ask more or less the same question in different forms. I will answer this question myself (I am the Founder/CTO of iText Group), so that it can be a "Wiki-answer." If the Stack Overflow "documentation" feature still existed, this would have been a good candidate for a documentation topic.
Colossal (movie) .poster < width: 120px;float: right; >.director < font-style: italic; >.description < font-family: serif; >.imdb < font-size: 0.8em; >a Colossal (2016)
Directed by Nacho Vigalondo Gloria is an out-of-work party girl forced to leave her life in New York City, and move back home. When reports surface that a giant creature is destroying Seoul, she gradually comes to the realization that she is somehow connected to this phenomenon. Read more about this movie on IMDB
In a browser, this HTML looks like this:
HTMLWorker doesn't take CSS into account at all When I used HTMLWorker , I need to create an ImageProvider to avoid an error that informs me that the image can't be found. I also need to create a StyleSheet instance to change some of the styles:
public static class MyImageFactory implements ImageProvider < public Image getImage(String src, Maph, ChainedProperties cprops, DocListener doc) < try < return Image.getInstance( String.format("resources/html/img/%s", src.substring(src.lastIndexOf("/") + 1))); >catch (DocumentException e) < e.printStackTrace(); >catch (IOException e) < e.printStackTrace(); >return null; > > public static void main(String[] args) throws IOException, DocumentException < Document document = new Document(); PdfWriter.getInstance(document, new FileOutputStream("results/htmlworker.pdf")); document.open(); StyleSheet styles = new StyleSheet(); styles.loadStyle("imdb", "size", "-3"); HTMLWorker htmlWorker = new HTMLWorker(document, null, styles); HashMapproviders = new HashMap(); providers.put(HTMLWorker.IMG_PROVIDER, new MyImageFactory()); htmlWorker.setProviders(providers); htmlWorker.parse(new FileReader("resources/html/sample.html")); document.close(); >
The result looks like this: For some reason, HTMLWorker also shows the content of the tag. I don't know how to avoid this. The CSS in the header isn't parsed at all, I have to define all the styles in my code, using the StyleSheet object. When I look at my code, I see that plenty of objects and methods I'm using are deprecated: So I decided to upgrade to using XML Worker. Images aren't found when using XML Worker I tried the following code:
public static final String DEST = "results/xmlworker1.pdf"; public static final String HTML = "resources/html/sample.html"; public void createPdf(String file) throws IOException, DocumentException
This resulted in the following PDF: Instead of Times-Roman, the default font Helvetica is used; this is typical for iText (I should have defined a font explicitly in my HTML). Otherwise, the CSS seems to be respected, but the image is missing, and I didn't get an error message. With HTMLWorker , an exception was thrown, and I was able to fix the problem by introducing an ImageProvider . Let's see if this works for XML Worker. Not all CSS styles are supported in XML Worker I adapted my code like this:
public static final String DEST = "results/xmlworker2.pdf"; public static final String HTML = "resources/html/sample.html"; public static final String IMG_PATH = "resources/html/"; public void createPdf(String file) throws IOException, DocumentException < Document document = new Document(); PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file)); document.open(); CSSResolver cssResolver = XMLWorkerHelper.getInstance().getDefaultCssResolver(true); HtmlPipelineContext htmlContext = new HtmlPipelineContext(null); htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory()); htmlContext.setImageProvider(new AbstractImageProvider() < public String getImageRootPath() < return IMG_PATH; >>); PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer); HtmlPipeline html = new HtmlPipeline(htmlContext, pdf); CssResolverPipeline css = new CssResolverPipeline(cssResolver, html); XMLWorker worker = new XMLWorker(css, true); XMLParser p = new XMLParser(worker); p.parse(new FileInputStream(HTML)); document.close(); >
My code is much longer, but now the image is rendered: The image is larger than when I rendered it using HTMLWorker which tells me that the CSS attribute width for the poster class is taken into account, but the float attribute is ignored. How do I fix this?
So the question boils down to this: I have a specific HTML file that I try to convert to PDF. I have gone through a lot of work, fixing one problem after the other, but there is one specific problem that I can't solve: how do I make iText respect CSS that defines the position of an element, such as float: right ?