PDF to HTML Converter

The Documentize PDF to HTML Converter for .NET converts PDF documents into HTML. Beyond changing the file format, the plugin is intended to make documents more accessible and easier to adapt for the web.

How to Convert PDF to HTML

To convert a PDF document into HTML, follow these steps:

  1. Create an instance of the PdfHtml class.
  2. Create an instance of the PdfToHtmlOptions class to configure the conversion.
  3. Add the input PDF file to the options using the AddInput method.
  4. Add the output HTML file path to the options using the AddOutput method.
  5. Call the Process method on the PdfHtml instance, passing the options, to convert the PDF to HTML.

var pdfHtml = new PdfHtml();
var options = new PdfToHtmlOptions(PdfToHtmlOptions.SaveDataType.FileWithEmbeddedResources);

// Set input and output file paths
options.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));
options.AddOutput(new FileDataSource(@"C:\Samples\output.html"));

// Process the PDF to HTML conversion
pdfHtml.Process(options);

How to Convert HTML to PDF

The converter also works in the other direction: pass an HtmlToPdfOptions instance instead of PdfToHtmlOptions to convert an HTML file to PDF.

var pdfHtml = new PdfHtml();
var options = new HtmlToPdfOptions();

// Set input and output file paths
options.AddInput(new FileDataSource(@"C:\Samples\input.html"));
options.AddOutput(new FileDataSource(@"C:\Samples\output.pdf"));

// Process the HTML to PDF conversion
pdfHtml.Process(options);

Customizing PDF to HTML Conversion

You can customize the conversion process by specifying encoding, fonts, or other settings. Here’s an example of setting UTF-8 encoding and the Arial font for the conversion:

using System.Text; // needed for Encoding.UTF8

var pdfHtml = new PdfHtml();
var options = new PdfToHtmlOptions(PdfToHtmlOptions.SaveDataType.FileWithEmbeddedResources);

// Set encoding and font
options.Encoding = Encoding.UTF8;
options.Font = "Arial";

// Add input and output files
options.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));
options.AddOutput(new FileDataSource(@"C:\Samples\output.html"));

// Process the conversion
pdfHtml.Process(options);

Batch Conversion from PDF to HTML

This plugin also supports batch processing, enabling you to convert multiple PDFs into HTML files in one go.

var pdfHtml = new PdfHtml();
var options = new PdfToHtmlOptions(PdfToHtmlOptions.SaveDataType.FileWithEmbeddedResources);

// Add multiple input PDF files
options.AddInput(new FileDataSource(@"C:\Samples\file1.pdf"));
options.AddInput(new FileDataSource(@"C:\Samples\file2.pdf"));

// Set output file paths for each conversion
options.AddOutput(new FileDataSource(@"C:\Samples\output_file1.html"));
options.AddOutput(new FileDataSource(@"C:\Samples\output_file2.html"));

// Process the batch conversion
pdfHtml.Process(options);
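
When a batch is larger than a couple of files, it may be easier to derive the output paths than to type them out. The following is a minimal sketch using only standard .NET APIs; BatchPaths and PairWithHtmlOutputs are hypothetical names chosen for illustration and are not part of Documentize.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of Documentize: pairs each PDF path with an
// HTML output path in the given folder, keeping the original base file name.
public static class BatchPaths
{
    public static List<(string Input, string Output)> PairWithHtmlOutputs(
        IEnumerable<string> pdfPaths, string outputDirectory)
    {
        return pdfPaths
            .Select(pdf => (
                Input: pdf,
                Output: Path.Combine(
                    outputDirectory,
                    Path.GetFileNameWithoutExtension(pdf) + ".html")))
            .ToList();
    }
}
```

Each resulting pair could then be passed to AddInput and AddOutput in a loop, in the same way as the batch example above. This assumes inputs and outputs are matched in the order they are added, which that example suggests but does not state.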

Key Features

  • Convert PDF to HTML: Convert PDF documents into HTML files.
  • Embedded Resources: Choose whether to embed resources (such as images and fonts) directly into the HTML or link them externally.
  • Bidirectional Conversion: Convert PDF to HTML and HTML back to PDF.
  • Maintain Layout: Preserve the original layout and formatting during conversion.
  • Custom Encoding: Specify the text encoding, such as UTF-8, for the converted HTML.