HTML Converter

The Documentize HTML Converter for .NET converts documents between PDF and HTML in both directions, which makes it useful for web applications, archiving, and report generation. Options for resource handling and layout let you adapt the output to your project's requirements.

Key Features

PDF to HTML Conversion

Convert PDF files to HTML to make documents accessible for web-based viewing or integration into applications where HTML format is preferred.

HTML to PDF Conversion

Transform HTML content into high-quality PDFs, perfect for generating printable reports, archiving web content, or creating shareable document formats.


Detailed Guide

Converting PDF to HTML

To convert a PDF to HTML:

  1. Initialize the Converter: Create an instance of HtmlConverter.
  2. Set Conversion Options: Use PdfToHtmlOptions to customize output, choosing either embedded or external resources.
  3. Define Input and Output Paths: Set the paths for your input PDF and output HTML.
  4. Execute the Conversion: Call the Process method to convert the file.

Example: Convert PDF to HTML with Embedded Resources

// Step 1: Initialize the HTML Converter
var converter = new HtmlConverter();

// Step 2: Configure options for PDF to HTML conversion
var options = new PdfToHtmlOptions(PdfToHtmlOptions.SaveDataType.FileWithEmbeddedResources);

// Step 3: Set file paths
options.AddInput(new FileDataSource("input.pdf"));
options.AddOutput(new FileDataSource("output.html"));

// Step 4: Run the conversion
converter.Process(options);

Available Options for PDF to HTML Conversion

  • SaveDataType:

    • FileWithEmbeddedResources: Generates a single HTML file with all resources embedded.
    • FileWithExternalResources: Saves resources as separate files instead of embedding them, which keeps the HTML file itself smaller for large documents.
  • Output Customization:

    • BasePath: Set the base path for resources in the HTML document.
    • IsRenderToSinglePage: Optionally render all PDF content on a single HTML page.
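
As a minimal sketch of the FileWithExternalResources option described above, the embedded-resources example can be adapted by changing only the SaveDataType value. The using directive assumes the types live in a Documentize namespace, and the file names are placeholders; neither is stated in this guide.

```csharp
using Documentize; // assumption: namespace that contains HtmlConverter and the option types

// Create the converter, as in the embedded-resources example
var converter = new HtmlConverter();

// Ask for resources to be saved separately instead of being embedded in the HTML
var options = new PdfToHtmlOptions(PdfToHtmlOptions.SaveDataType.FileWithExternalResources);

// Placeholder paths: replace with your own input PDF and output HTML
options.AddInput(new FileDataSource("input.pdf"));
options.AddOutput(new FileDataSource("output.html"));

// Run the conversion
converter.Process(options);
```

Because resources are saved separately from the HTML file, writing the output to a dedicated folder can make the result easier to move or publish as a unit.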

Converting HTML to PDF

To convert an HTML document to a PDF, follow these steps:

  1. Initialize the Converter: Create an instance of HtmlConverter.
  2. Configure PDF Options: Use HtmlToPdfOptions to define layout and media settings.
  3. Specify Paths: Set input HTML and output PDF file paths.
  4. Execute the Conversion: Run the Process method to complete the conversion.

Example: Convert HTML to PDF

// Step 1: Initialize the HTML Converter
var converter = new HtmlConverter();

// Step 2: Configure options for HTML to PDF conversion
var options = new HtmlToPdfOptions();

// Step 3: Set file paths
options.AddInput(new FileDataSource("input.html"));
options.AddOutput(new FileDataSource("output.pdf"));

// Step 4: Execute the conversion
converter.Process(options);

Additional Options for HTML to PDF Conversion

  • Media Type:

    • HtmlMediaType.Print: Ideal for generating PDFs suited for printing.
    • HtmlMediaType.Screen: Use when converting content designed for digital viewing.
  • Layout Adjustments:

    • PageLayoutOption: Controls how HTML content fits the PDF page. For example, ScaleToPageWidth scales the content to the width of the PDF page.
    • IsRenderToSinglePage: Renders the entire HTML content on a single PDF page instead of splitting it across several pages.
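
The options above might be combined as in the following sketch. It assumes that the listed options are settable properties on HtmlToPdfOptions with the same names, that the layout enum is called HtmlPageLayoutOption, and that the types live in a Documentize namespace. None of this is confirmed by this guide, so check the API reference before relying on it.

```csharp
using Documentize; // assumption: namespace that contains HtmlConverter and the option types

// Create the converter, as in the basic HTML to PDF example
var converter = new HtmlConverter();

var options = new HtmlToPdfOptions();

// Assumption: the options listed above are exposed as properties with these names
options.HtmlMediaType = HtmlMediaType.Print;                      // apply print-oriented styling
options.PageLayoutOption = HtmlPageLayoutOption.ScaleToPageWidth; // assumed enum type name
options.IsRenderToSinglePage = false;                             // keep normal pagination

// Placeholder paths: replace with your own input HTML and output PDF
options.AddInput(new FileDataSource("input.html"));
options.AddOutput(new FileDataSource("output.pdf"));

// Run the conversion
converter.Process(options);
```

Switching HtmlMediaType to Screen, or setting IsRenderToSinglePage to true, would follow the same pattern under the same assumptions.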

The converter works in both directions, so it covers use cases ranging from generating PDF reports from web content to converting archives of PDF documents for viewing on the web. For more advanced configurations, refer to the full Documentize documentation.