I want to convert the below HTML to PDF using iTextSharp but don't know where to start:
Chris Haas
5 Answers
First, HTML and PDF are not related although they were created around the same time. HTML is intended to convey higher level information such as paragraphs and tables. Although there are methods to control it, it is ultimately up to the browser to draw these higher level concepts. PDF is intended to convey documents and the documents must 'look' the same wherever they are rendered.
In an HTML document you might have a paragraph that's 100% wide and depending on the width of your monitor it might take 2 lines or 10 lines and when you print it it might be 7 lines and when you look at it on your phone it might take 20 lines. A PDF file, however, must be independent of the rendering device, so regardless of your screen size it must always render exactly the same.
Because of the musts above, PDF doesn't support abstract things like 'tables' or 'paragraphs'. There are three basic things that PDF supports: text, lines/shapes and images. (There are other things like annotations and movies but I'm trying to keep it simple here.) In a PDF you don't say 'here's a paragraph, browser do your thing!'. Instead you say, 'draw this text at this exact X,Y location using this exact font and don't worry, I've previously calculated the width of the text so I know it will all fit on this line'. You also don't say 'here's a table' but instead you say 'draw this text at this exact location and then draw a rectangle at this other exact location that I've previously calculated so I know it will appear to be around the text'.
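To make the "exact X,Y location" idea concrete, here is a minimal Java sketch of the calculation a PDF producer has to do before it can draw a paragraph. It is an illustration only and does not use iText: the class names, the fixed `charWidth` (a stand-in for real per-glyph font metrics) and the `leading` parameter are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** One positioned piece of text, the kind of instruction a PDF page ends up holding. */
final class PlacedLine {
    final double x;
    final double y;
    final String text;

    PlacedLine(double x, double y, String text) {
        this.x = x;
        this.y = y;
        this.text = text;
    }
}

final class FixedLayoutSketch {
    /**
     * Breaks a paragraph into lines that fit maxWidth and gives each line an
     * exact position. charWidth is a hypothetical fixed advance per character;
     * a real font has per-glyph widths.
     */
    static List<PlacedLine> layout(String paragraph, double left, double top,
                                   double maxWidth, double charWidth, double leading) {
        List<PlacedLine> placed = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        double y = top;
        for (String word : paragraph.trim().split("\\s+")) {
            int candidateLength = line.length() == 0
                    ? word.length()
                    : line.length() + 1 + word.length();
            // Start a new line when adding this word would overflow the width.
            if (line.length() > 0 && candidateLength * charWidth > maxWidth) {
                placed.add(new PlacedLine(left, y, line.toString()));
                line.setLength(0);
                y -= leading; // PDF's y axis grows upward, so the next line sits lower
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            placed.add(new PlacedLine(left, y, line.toString()));
        }
        return placed;
    }

    public static void main(String[] args) {
        for (PlacedLine p : layout("HTML reflows but PDF does not", 72, 720, 60, 6, 14)) {
            System.out.println("draw \"" + p.text + "\" at (" + p.x + ", " + p.y + ")");
        }
    }
}
```

Once the positions are computed they never change, which is why the same file looks identical on a phone, a monitor and a printer.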
Second, iText and iTextSharp parse HTML and CSS. That's it. ASP.Net, MVC, Razor, Struts, Spring, etc, are all HTML frameworks but iText/iTextSharp is 100% unaware of them. Same with DataGridViews, Repeaters, Templates, Views, etc. which are all framework-specific abstractions. It is your responsibility to get the HTML from your choice of framework, iText won't help you. If you get an exception saying 'The document has no pages' or you think that 'iText isn't parsing my HTML', it is almost certain that you don't actually have HTML, you only think you do.
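A cheap way to catch this early is to inspect the string before it ever reaches the parser. The Java sketch below is only a sanity check under stated assumptions: the class and method names are hypothetical, it uses no iText API, and "contains at least one tag" is a deliberately crude definition of HTML.

```java
import java.util.regex.Pattern;

final class HtmlInputCheck {
    // Matches an opening tag such as <p> or <table border="1">.
    private static final Pattern TAG = Pattern.compile("<[a-zA-Z][^>]*>");

    /** Returns true when the string is non-blank and contains at least one tag. */
    static boolean looksLikeHtml(String candidate) {
        if (candidate == null || candidate.trim().isEmpty()) {
            return false;
        }
        return TAG.matcher(candidate).find();
    }

    /** Fails fast with a readable message instead of an empty document later on. */
    static String requireHtml(String candidate) {
        if (!looksLikeHtml(candidate)) {
            throw new IllegalArgumentException(
                    "Expected rendered HTML but got: " + candidate);
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHtml("<p>Hello</p>"));          // real markup
        System.out.println(looksLikeHtml("~/Views/Report.cshtml")); // a view path, not HTML
        System.out.println(looksLikeHtml(""));                      // a view that rendered nothing
    }
}
```

Logging the exact string that is about to be parsed serves the same purpose and is often enough to spot a view path or an empty render.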
Third, the built-in class that's been around for years is the HTMLWorker; however, this has been replaced with XMLWorker (Java / .Net). Zero work is being done on HTMLWorker, which doesn't support CSS files, has only limited support for the most basic CSS properties, and actually breaks on certain tags. If you do not see the HTML attribute or CSS property and value in this file then it probably isn't supported by HTMLWorker. XMLWorker can be more complicated sometimes but those complications also make it more extensible.
Below is C# code that shows how to parse HTML tags into iText abstractions that get automatically added to the document that you are working on. C# and Java are very similar so it should be relatively easy to convert this. Example #1 uses the built-in HTMLWorker to parse the HTML string. Since only inline styles are supported, the class='headline' gets ignored but everything else should actually work. Example #2 is the same as the first except it uses XMLWorker instead. Example #3 also parses the simple CSS example.
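Because HTMLWorker only honours inline styles, one possible workaround is to turn a class rule into a style attribute before parsing. The Java sketch below is separate from the three examples and is not an iText feature: the helper name and the CSS declarations are hypothetical, it handles only an attribute that holds exactly one class name, it does not merge with an existing style attribute, and the declarations must not contain double quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InlineClassSketch {
    /**
     * Replaces class="name" (or class='name') with style="declarations".
     * Regex-based and intentionally narrow; a real HTML parser is more robust.
     */
    static String inlineClass(String html, String className, String declarations) {
        Pattern attribute = Pattern.compile(
                "class\\s*=\\s*([\"'])" + Pattern.quote(className) + "\\1");
        String replacement = "style=\"" + declarations + "\"";
        // quoteReplacement keeps '$' and '\' in the declarations literal.
        return attribute.matcher(html).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "<p class='headline'>Quarterly report</p>";
        // The declarations are made up for this example.
        System.out.println(inlineClass(html, "headline", "font-size:20pt;color:red"));
    }
}
```

For anything beyond a handful of simple rules, a parser that reads the CSS itself is the more dependable route.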
2017's update
There is good news for HTML-to-PDF needs. As this answer showed, the W3C standard css-break-3 will solve the problem... It is a Candidate Recommendation, with plans to become a definitive Recommendation this year, after tests.
There are also less standard solutions, with plugins for C#, as shown by print-css.rocks.
@Chris Haas has explained very well how to use iTextSharp to convert HTML to PDF; very helpful.
My addition is:
By using HtmlTextWriter, I put the HTML tags inside an HTML table with inline CSS, and I got my PDF as I wanted without using XMLWorker.
Edit: adding sample code:
ASPX page:
C# code:
Of course, include the iTextSharp references in the .cs file.
Hope this helps!
Thank you
As of 2018, there is also iText 7 (the next iteration of the old iTextSharp library) and its HTML-to-PDF package: itext7.pdfhtml
Usage is straightforward:
The method has many more overloads.
Update: the iText* family of products has a dual licensing model: free for open source, paid for commercial use.
Here's the link I used as a guide. Hope this helps!
You can download the sample file. Just place the HTML you want to convert in the files folder and run. It will automatically generate the PDF file and place it in the same folder. But in your case, you can specify your HTML path in the htmlFileName variable.
Itextsharp Convert Html String To Pdf
I am posting this question because many developers ask more or less the same question in different forms. I will answer this question myself (I am the Founder/CTO of iText Group), so that it can be a 'Wiki-answer.' If the Stack Overflow 'documentation' feature still existed, this would have been a good candidate for a documentation topic.
I am trying to convert the following HTML file to PDF:
In a browser, this HTML looks like this:
HTMLWorker doesn't take CSS into account at all
When I used HTMLWorker, I needed to create an ImageProvider to avoid an error informing me that the image couldn't be found. I also needed to create a StyleSheet instance to change some of the styles:
The result looks like this:
For some reason, HTMLWorker also shows the content of the <title> tag. I don't know how to avoid this. The CSS in the header isn't parsed at all; I have to define all the styles in my code, using the StyleSheet object.
When I look at my code, I see that plenty of objects and methods I'm using are deprecated:
So I decided to upgrade to using XML Worker.
Images aren't found when using XML Worker
I tried the following code:
This resulted in the following PDF:
Instead of Times-Roman, the default font Helvetica is used; this is typical for iText (I should have defined a font explicitly in my HTML). Otherwise, the CSS seems to be respected, but the image is missing, and I didn't get an error message.
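One thing worth checking when an image silently goes missing is how its src is resolved: a relative src only makes sense relative to some base location, and a converter working from a string or stream may not know that base. The Java sketch below shows the resolution step using only java.net.URI. It is an assumption-laden illustration, not iText code: the class name, the base URL and the image path are all hypothetical.

```java
import java.net.URI;

final class ImageSrcSketch {
    /**
     * Resolves an image src against the location of the HTML document.
     * Absolute src values come back unchanged; relative ones are completed.
     */
    static URI resolveSrc(String baseLocation, String src) {
        return URI.create(baseLocation).resolve(src);
    }

    public static void main(String[] args) {
        // Hypothetical locations, chosen only for illustration.
        String base = "http://example.com/pages/index.html";
        System.out.println(resolveSrc(base, "img/poster.jpg"));
        System.out.println(resolveSrc(base, "/img/poster.jpg"));
        System.out.println(resolveSrc(base, "http://cdn.example.com/poster.jpg"));
    }
}
```

An image provider plays a comparable role during conversion: it tells the converter where a given src actually lives.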
With HTMLWorker, an exception was thrown, and I was able to fix the problem by introducing an ImageProvider. Let's see if this works for XML Worker.
Not all CSS styles are supported in XML Worker
I adapted my code like this:
My code is much longer, but now the image is rendered:
The image is larger than when I rendered it using HTMLWorker, which tells me that the CSS attribute width for the poster class is taken into account, but the float attribute is ignored. How do I fix this?
So the question boils down to this: I have a specific HTML file that I try to convert to PDF. I have gone through a lot of work, fixing one problem after the other, but there is one specific problem that I can't solve: how do I make iText respect CSS that defines the position of an element, such as float: right?
When my HTML contains form elements (such as <input>), those form elements are ignored.
2 Answers
As explained in the introduction of the HTML to PDF tutorial, HTMLWorker was deprecated many years ago. It wasn't intended to convert complete HTML pages. It doesn't know that an HTML page has a <head> and a <body> section; it just parses all the content. It was meant to parse small HTML snippets, and you could define styles using the StyleSheet class; real CSS wasn't supported.
Then came XML Worker. XML Worker was meant as a generic framework to parse XML. As a proof of concept, we decided to write some XHTML to PDF functionality, but we didn't support all of the HTML tags. For instance: forms weren't supported at all, and it was very hard to support CSS that is used to position content. Forms in HTML are very different from forms in PDF. There was also a mismatch between the iText architecture and the architecture of HTML + CSS. Gradually, we extended XML Worker, mostly based on requests from customers, but XML Worker became a monster with many tentacles.
Eventually, we decided to rewrite iText from scratch, with the requirements for HTML + CSS conversion in mind. This resulted in iText 7. On top of iText 7, we created several add-ons, the most important one in this context being pdfHTML.
Using the latest version of iText (iText 7.1.0 + pdfHTML 2.0.0) the code to convert the HTML from the question to PDF is reduced to this snippet:
The result looks like this:
As you can see, this is pretty much the result you'd expect. Since iText 7.1.0 / pdfHTML 2.0.0, the default font is Times-Roman. The CSS is being respected: the image is now floating on the right.
Developers are often reluctant to upgrade when I advise them to move to iText 7 / pdfHTML 2. Allow me to answer the top 3 arguments I hear:
I need to use the free iText, and iText 7 isn't free / the pdfHTML add-on is closed source.
iText 7 is released using the AGPL, just like iText 5 and XML Worker. The AGPL allows free use in the sense of free of charge in the context of open source projects. If you are distributing a closed source / proprietary product (e.g. you use iText in a SaaS context), you can't use iText for free; in that case, you have to purchase a commercial license. This was already true for iText 5; this is still true for iText 7. As for versions prior to iText 5: you shouldn't use these at all.
Regarding pdfHTML: the first versions were indeed only available as closed source software. We had heated discussions within iText Group. On the one hand, there were the people who wanted to avoid the massive abuse by companies who don't listen to their developers when those developers tell the powers that be that open source isn't the same as free. Developers were telling us that their boss forced them to do the wrong thing, and that they couldn't convince their boss to purchase a commercial license. On the other hand, there were the people who argued that we shouldn't punish developers for the wrong behavior of their bosses. Eventually, the people in favor of open sourcing pdfHTML, that is, the developers at iText, won the argument. Please prove that they weren't wrong, and use iText correctly: respect the AGPL if you're using iText for free; make sure that your boss purchases a commercial license if you're using iText in a closed source context.
I need to maintain a legacy system, and I have to use an old iText version.
Seriously? Maintenance also involves applying upgrades and migrating to new versions of the software you're using. As you can see, the code needed when using iText 7 and pdfHTML is very simple, and less error-prone than the code needed before. A migration project shouldn't take too long.
I've only just started and I didn't know about iText 7; I only found out after I finished my project.
That's why I'm posting this question and answer. Think of yourself as an eXtreme Programmer. Throw away all of your code, and start anew. You'll notice that it's not as much work as you imagined, and you'll sleep better knowing that you've made your project future-proof because iText 5 is being phased out. We still offer support to paying customers, but eventually, we'll stop supporting iText 5 altogether.
Bruno Lowagie