Developer's Guide

PDF ChatGPT

Integrate ChatGPT API with .NET PDF applications

PDF Merger

Merge multiple PDF documents into a single file using C# .NET

PDF Optimizer

Reduce file sizes, rotate pages, crop content, and resize documents

PDF Security

Encrypt and decrypt PDF documents using C# .NET

PDF Signature

.NET plugin offers a streamlined process for adding signatures, ensuring authenticity, and securing PDF content

PDF Splitter

.NET tool that simplifies the process of splitting large PDF documents into smaller, more manageable files

PDF Text Extractor

.NET plugin allows you to extract text efficiently while preserving formatting or omitting it based on your needs

PDF Timestamp Adder

Add secure timestamps to your PDF documents using C# .NET

PDF to DOC Converter

.NET tool that converts PDF documents into DOC or DOCX formats

PDF to XLS Converter

.NET plugin allows seamless conversion of PDF documents into Excel spreadsheets (XLS/XLSX)

PDF/A Converter

.NET plugin converts PDF documents into the PDF/A format, ensuring that your content remains compliant with long-term archiving standards

HTML Converter

Comprehensive guide to Documentize HTML Converter's PDF to HTML and HTML to PDF features.

Sep 12, 2024

Subsections of Developer's Guide

PDF ChatGPT

The Documentize ChatGPT for .NET plugin is a powerful tool designed to integrate the ChatGPT API with PDF applications. This plugin allows developers to generate chat responses based on input messages and save the output in PDF format, making it suitable for creating conversational interfaces or analysis reports directly within PDF documents.

Key Features:

  • Chat Completions: Generate responses using the ChatGPT API based on custom input.
  • System & User Messages: Provide both system context and user input to create dynamic conversations.
  • PDF Output: Save generated chat completions in a structured PDF file for further use.
  • Asynchronous Processing: Ensure responsive applications by processing chat completions asynchronously.

Generate Chat Responses

To generate chat responses and save them to a PDF file using the ChatGPT plugin, follow these steps:

  1. Create an instance of the PdfChatGptRequestOptions class to configure the request options.
  2. Add input and output PDF files.
  3. Set the API key and specify parameters such as maximum token count and the query for the ChatGPT model.
  4. Run the ProcessAsync method to generate the chat completion.

var options = new PdfChatGptRequestOptions();
options.ApiKey = "sk-******";  // Set your API key
options.MaxTokens = 1000;  // Set the maximum number of tokens
options.Query = "Analyze this text for key themes.";

// Add the input PDF file
options.AddInput(new FileDataSource("input.pdf"));

// Specify where to save the output PDF with chat responses
options.AddOutput(new FileDataSource("output.pdf"));

// Create an instance of the PdfChatGpt plugin
var plugin = new PdfChatGpt();

// Run the process asynchronously
var result = await plugin.ProcessAsync(options);
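
Hard-coding the API key, as the sample does, makes it easy to leak the key through source control. The following is a minimal sketch, not part of the plugin, that reads the key from an environment variable and fails early when it is missing. The helper name ReadApiKey and the variable name OPENAI_API_KEY are assumptions made for illustration.

```csharp
using System;

// Hypothetical helper (not part of Documentize): reads the API key from an
// environment variable. "OPENAI_API_KEY" is an assumed name.
static string ReadApiKey(string variableName = "OPENAI_API_KEY")
{
    var key = Environment.GetEnvironmentVariable(variableName);
    if (string.IsNullOrWhiteSpace(key))
    {
        // Fail early instead of sending a request with an empty key
        throw new InvalidOperationException(
            $"Environment variable '{variableName}' is not set.");
    }
    return key.Trim();
}
```

The returned value could then be assigned to options.ApiKey in place of the string literal.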

Adding System and User Messages

To create a more interactive conversation, you can add both system and user messages. These messages help shape the conversation context.

  1. Add a system message that sets the context for ChatGPT.
  2. Add a user message that represents the user’s input for the conversation.

var options = new PdfChatGptRequestOptions();
options.ApiKey = "sk-******";  // Set your API key

// Add system message for context
options.AddSystemMessage("You are an AI trained to summarize text.");

// Add user message to query the ChatGPT model
options.AddUserMessage("Please summarize the attached document.");

// Add input and output PDFs
options.AddInput(new FileDataSource("input.pdf"));
options.AddOutput(new FileDataSource("output.pdf"));

// Process the request asynchronously
var plugin = new PdfChatGpt();
var result = await plugin.ProcessAsync(options);

PDF Merger

The Documentize PDF Merger for .NET is a versatile tool designed to merge multiple PDF documents into a single file. It simplifies consolidating PDF files, merging them efficiently while keeping content consistent. The plugin handles internal resources such as fonts and images to optimize the merged document.

Key Features:

  • Merge Multiple PDFs: Easily combine multiple PDF files into one.
  • Resource Optimization: Removes duplicate fonts and images during merging.
  • Batch Processing: Merge large batches of PDF documents in one go.
  • Secure Merging: Ensure document integrity without data loss or content corruption.

How to Merge PDF Documents

To merge multiple PDF documents into a single file, follow these steps:

  1. Create an instance of the Merger class.
  2. Create an instance of MergeOptions to configure the merging process.
  3. Add input PDF files using the AddInput method.
  4. Set the output file path using AddOutput.
  5. Execute the merge using the Process method.

var merger = new Merger();
var mergeOptions = new MergeOptions();

// Add input PDF files to merge
mergeOptions.AddInput(new FileDataSource(@"C:\Samples\file1.pdf"));
mergeOptions.AddInput(new FileDataSource(@"C:\Samples\file2.pdf"));
mergeOptions.AddInput(new FileDataSource(@"C:\Samples\file3.pdf"));

// Specify the output file path
mergeOptions.AddOutput(new FileDataSource(@"C:\Samples\mergedOutput.pdf"));

// Merge the PDFs
merger.Process(mergeOptions);

How to Merge PDFs with Page Range

You can also merge specific page ranges from input PDF files using the MergeOptions class. This allows you to combine selected pages into the final output document.

  1. Create an instance of the Merger class.
  2. Configure page ranges using MergeOptions.
  3. Add the input files with specified page ranges.
  4. Set the output path.
  5. Call the Process method.

var merger = new Merger();
var mergeOptions = new MergeOptions();

// Merge specific pages from input PDFs
mergeOptions.AddInput(new FileDataSource(@"C:\Samples\file1.pdf"), new PageRange(1, 3));
mergeOptions.AddInput(new FileDataSource(@"C:\Samples\file2.pdf"), new PageRange(2, 5));

// Specify the output file path
mergeOptions.AddOutput(new FileDataSource(@"C:\Samples\outputWithSpecificPages.pdf"));

// Merge the PDFs
merger.Process(mergeOptions);

How to Handle Batch Merging

The PDF Merger plugin is optimized for handling large batches of PDF documents. By leveraging the batch processing feature, you can merge hundreds of PDFs in a single operation, ensuring efficient and fast document management.

  1. Instantiate the Merger class.
  2. Add all input PDF files to the MergeOptions class.
  3. Specify the output path.
  4. Call the Process method to merge all files in the batch.

var merger = new Merger();
var mergeOptions = new MergeOptions();

// Add a large batch of PDFs for merging
for (int i = 1; i <= 100; i++)
{
    mergeOptions.AddInput(new FileDataSource($@"C:\Samples\file{i}.pdf"));
}

// Specify the output file path
mergeOptions.AddOutput(new FileDataSource(@"C:\Samples\batchMergedOutput.pdf"));

// Process the batch merging
merger.Process(mergeOptions);
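
The loop above assumes that file1.pdf through file100.pdf all exist. When the batch is simply "every PDF in a folder", the input list can be built from the folder contents instead. The sketch below uses only the .NET base class library; FindPdfFiles is a hypothetical helper name, not part of the plugin.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of Documentize): returns the PDF files in a
// folder, sorted by file name so the merge order is predictable.
static List<string> FindPdfFiles(string folder)
{
    return Directory.EnumerateFiles(folder)
        // Compare the extension case-insensitively so "A.PDF" is included
        .Where(path => string.Equals(
            Path.GetExtension(path), ".pdf", StringComparison.OrdinalIgnoreCase))
        .OrderBy(path => Path.GetFileName(path), StringComparer.OrdinalIgnoreCase)
        .ToList();
}
```

Each returned path could then be wrapped in a FileDataSource and passed to AddInput, as in the loop above. Note that the sort is character by character, so file10.pdf comes before file2.pdf; zero-padded names avoid that.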

PDF Optimizer

The Documentize PDF Optimizer is a comprehensive plugin that enhances PDF documents through advanced optimization techniques. It is designed to help reduce file sizes, rotate pages, crop content, and resize documents. These operations improve the quality and manageability of PDF files, making them easier to store, share, and view.

Key Features:

  • Optimization: Reduce PDF file size without losing quality.
  • Rotation: Adjust the orientation of PDF pages.
  • Cropping: Remove unnecessary margins or content from the document.
  • Resizing: Resize pages to specific dimensions (e.g., A4, Letter).

Optimize PDF Document

The following steps demonstrate how to optimize a PDF document by reducing its file size while maintaining quality.

  1. Create an instance of the Optimizer class.
  2. Create an OptimizeOptions object to configure optimization settings.
  3. Add input PDF file(s) and set an output location for the optimized file.
  4. Run the Process method to execute the optimization.

var optimizer = new Optimizer();
var optimizeOptions = new OptimizeOptions();
optimizeOptions.AddInput(new FileDataSource("input.pdf"));
optimizeOptions.AddOutput(new FileDataSource("output.pdf"));
optimizer.Process(optimizeOptions);

Resize PDF Document

To resize a PDF document, the ResizeOptions class is used to specify the new page size for the document.

  1. Instantiate the Optimizer class.
  2. Create a ResizeOptions object to define the page size.
  3. Add the input file and set the desired output location.
  4. Use the SetPageSize method to specify the new size (e.g., A4).
  5. Call the Process method to apply the changes.

var optimizer = new Optimizer();
var resizeOptions = new ResizeOptions();
resizeOptions.AddInput(new FileDataSource("input.pdf"));
resizeOptions.SetPageSize(PageSize.A4);
resizeOptions.AddOutput(new FileDataSource("output.pdf"));
optimizer.Process(resizeOptions);

Rotate PDF Pages

Use the RotateOptions class to adjust the orientation of pages in a PDF file.

  1. Instantiate the Optimizer class.
  2. Create a RotateOptions object.
  3. Add the input PDF file and specify the output file location.
  4. Set the rotation angle (e.g., 90 degrees) using the SetRotation method.
  5. Execute the rotation with the Process method.

var optimizer = new Optimizer();
var rotateOptions = new RotateOptions();
rotateOptions.AddInput(new FileDataSource("input.pdf"));
rotateOptions.SetRotation(90);
rotateOptions.AddOutput(new FileDataSource("output.pdf"));
optimizer.Process(rotateOptions);

Crop PDF Document

Cropping removes unwanted content or margins from a PDF document. The CropOptions class can be used to define the crop area.

  1. Create an instance of the Optimizer class.
  2. Create a CropOptions object.
  3. Add the input file and specify the output file location.
  4. Use the SetCropBox method to define the crop area.
  5. Execute the cropping with the Process method.

var optimizer = new Optimizer();
var cropOptions = new CropOptions();
cropOptions.AddInput(new FileDataSource("input.pdf"));
cropOptions.SetCropBox(new Rectangle(50, 50, 500, 700)); // Defines the crop area
cropOptions.AddOutput(new FileDataSource("output.pdf"));
optimizer.Process(cropOptions);

PDF Security

The Documentize PDF Security for .NET is a powerful tool designed to enhance the security of your PDF documents by providing encryption and decryption capabilities. It ensures that your sensitive information remains confidential and protected from unauthorized access.

Key Features:

  • Encrypt PDF Documents: Secure your PDF files by adding user and owner passwords.
  • Decrypt PDF Documents: Remove encryption from PDFs when needed.
  • Set Permissions: Control permissions such as printing, copying, and modifying content.
  • Automation: Integrate encryption and decryption into your .NET applications for automated workflows.
  • Compliance: Ensure your documents meet industry standards for document security.

How to Encrypt a PDF Document

To encrypt a PDF document, follow these steps:

  1. Create an instance of the Security class.
  2. Create an instance of EncryptionOptions with the desired user and owner passwords.
  3. Add the input PDF file using the AddInput method.
  4. Set the output file path using AddOutput.
  5. Execute the encryption using the Process method.

// Instantiate the Security plugin
var plugin = new Security();

// Configure the encryption options
var opt = new EncryptionOptions("user_password", "owner_password");

// Add input PDF file
opt.AddInput(new FileDataSource("path_to_pdf"));

// Specify the output encrypted PDF file
opt.AddOutput(new FileDataSource("path_to_encrypted_pdf"));

// Perform the encryption process
plugin.Process(opt);

How to Decrypt a PDF Document

To decrypt a PDF document, follow these steps:

  1. Create an instance of the Security class.
  2. Create an instance of DecryptionOptions with the necessary password.
  3. Add the encrypted PDF file using the AddInput method.
  4. Set the output file path using AddOutput.
  5. Execute the decryption using the Process method.

// Instantiate the Security plugin
var plugin = new Security();

// Configure the decryption options
var opt = new DecryptionOptions("user_password");

// Add input encrypted PDF file
opt.AddInput(new FileDataSource("path_to_encrypted_pdf"));

// Specify the output decrypted PDF file
opt.AddOutput(new FileDataSource("path_to_decrypted_pdf"));

// Perform the decryption process
plugin.Process(opt);

Setting Permissions on PDF Documents

When encrypting a PDF, you can set various permissions to control how the document can be used.

  • Printing: Allow or disallow printing of the document.
  • Copying: Allow or disallow copying of content.
  • Modifying: Allow or disallow modifications to the document.

To set permissions, you can configure the EncryptionOptions accordingly.

PDF Signature

The Documentize PDF Signature for .NET plugin allows users to digitally sign PDF documents. It offers a streamlined process for adding signatures, ensuring authenticity, and securing PDF content. The plugin supports both visible and invisible signatures and provides options to customize the signature’s position, reason, contact information, and more.

Key Features:

  • Digitally Sign PDF Documents: Secure your documents with visible or invisible digital signatures.
  • PFX Support: Sign PDF files using a PFX certificate.
  • Customizable Options: Configure signature settings like reason, location, and contact details.
  • Visible and Invisible Signatures: Choose whether the signature is visible on the document.

How to Sign PDF Documents

To sign a PDF document using a PFX file, follow these steps:

  1. Create an instance of the Signature class.
  2. Instantiate the SignOptions class with the PFX file path and password.
  3. Add the input PDF and the output file to the options.
  4. Run the Process method to apply the signature.

var signature = new Signature();
var signOptions = new SignOptions(@"C:\certificates\myCertificate.pfx", "pfxPassword");

// Add the input PDF and specify the output file
signOptions.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));
signOptions.AddOutput(new FileDataSource(@"C:\Samples\signedOutput.pdf"));

// Configure signature options
signOptions.Reason = "Contract Agreement";
signOptions.Contact = "johndoe@example.com";
signOptions.Location = "New York";
signOptions.PageNumber = 1;
signOptions.Visible = true;
signOptions.Rectangle = new Rectangle(100, 100, 200, 150);

// Apply the signature to the document
signature.Process(signOptions);

How to Use Stream for PFX File

You can also sign a PDF using a PFX certificate provided as a stream instead of a file path. This allows more flexible handling of certificate storage.

  1. Create an instance of the Signature class.
  2. Instantiate SignOptions with a stream containing the PFX and the password.
  3. Add the input and output files.
  4. Run the Process method to apply the signature.

using var pfxStream = File.OpenRead(@"C:\certificates\myCertificate.pfx");
var signature = new Signature();
var signOptions = new SignOptions(pfxStream, "pfxPassword");

// Add input and output files
signOptions.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));
signOptions.AddOutput(new FileDataSource(@"C:\Samples\signedOutput.pdf"));

// Apply signature
signature.Process(signOptions);
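
A stream also lets the certificate come from somewhere other than the file system, for example a Base64-encoded secret injected by a deployment pipeline. The sketch below shows one way to turn such a value into a read-only stream using only the .NET base class library. PfxStreamFromBase64 is a hypothetical helper, and it is an assumption that SignOptions accepts any readable Stream rather than only a FileStream.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Documentize): decodes a Base64-encoded PFX
// and exposes it as a read-only in-memory stream.
static MemoryStream PfxStreamFromBase64(string base64)
{
    // Throws FormatException when the input is not valid Base64
    byte[] pfxBytes = Convert.FromBase64String(base64);
    return new MemoryStream(pfxBytes, writable: false);
}
```

The resulting stream would take the place of pfxStream in the sample above and should be disposed once signing has finished.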

How to Apply Invisible Signatures

To add an invisible signature (one that secures the document without displaying the signature on the document), simply set the Visible property to false.

  1. Create instances of Signature and SignOptions.
  2. Set Visible to false.
  3. Add input and output files.
  4. Call Process to apply the invisible signature.

var signature = new Signature();
var signOptions = new SignOptions(@"C:\certificates\myCertificate.pfx", "pfxPassword");

// Configure invisible signature
signOptions.Visible = false;

// Add input and output files
signOptions.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));
signOptions.AddOutput(new FileDataSource(@"C:\Samples\invisiblySigned.pdf"));

// Process signature
signature.Process(signOptions);

PDF Splitter

The Documentize PDF Splitter for .NET is a powerful tool that simplifies the process of splitting large PDF documents into smaller, more manageable files. Whether you need to extract individual pages or divide a document into specific sections, this plugin allows you to achieve it efficiently and with minimal effort.

Key Features:

  • Split PDF by Page: Break down a PDF document into individual pages.
  • Batch Processing: Split large batches of PDFs in one go.
  • Custom Split Options: Configure the splitting process based on your requirements.
  • Organized Output: Easily manage the output files for each split page or section.

How to Split PDF Documents

To split a PDF document into individual pages, follow these steps:

  1. Create an instance of the Splitter class.
  2. Create an instance of SplitOptions to configure the splitting options.
  3. Add the input PDF file using the AddInput method.
  4. Add output files for each split page using the AddOutput method.
  5. Run the Process method to split the document.

var splitter = new Splitter();
var splitOptions = new SplitOptions();

// Add the input PDF file
splitOptions.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));

// Specify output files for each page
splitOptions.AddOutput(new FileDataSource(@"C:\Samples\output_page_1.pdf"));
splitOptions.AddOutput(new FileDataSource(@"C:\Samples\output_page_2.pdf"));
splitOptions.AddOutput(new FileDataSource(@"C:\Samples\output_page_3.pdf"));

// Process the split operation
splitter.Process(splitOptions);
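
Listing one AddOutput call per page works for three pages but becomes tedious for longer documents. As a sketch, the output paths can be generated in a loop. BuildPageOutputPaths is a hypothetical helper that uses only the .NET base class library, and it assumes the caller already knows the page count, since the sample does not show how to obtain it from the plugin.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of Documentize): builds one output path per
// page, e.g. "<folder>/<baseName>_page_1.pdf".
static List<string> BuildPageOutputPaths(string folder, string baseName, int pageCount)
{
    if (pageCount < 1)
    {
        throw new ArgumentOutOfRangeException(nameof(pageCount), "At least one page is required.");
    }

    var paths = new List<string>(pageCount);
    for (int page = 1; page <= pageCount; page++)
    {
        paths.Add(Path.Combine(folder, $"{baseName}_page_{page}.pdf"));
    }
    return paths;
}
```

Each generated path could then be wrapped in a FileDataSource and passed to AddOutput, replacing the three explicit calls above.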

Splitting PDF by Page Ranges

You can also split a PDF by specifying page ranges. This allows you to extract specific sections or multiple pages from a PDF into separate documents.

var splitter = new Splitter();
var splitOptions = new SplitOptions();

// Add the input PDF
splitOptions.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));

// Define output for page ranges (e.g., pages 1-3)
splitOptions.AddOutput(new FileDataSource(@"C:\Samples\output_pages_1_to_3.pdf"));

// Process the split
splitter.Process(splitOptions);

How to Handle Batch Splitting

The PDF Splitter plugin is optimized to handle large batches of PDF documents. You can split hundreds of PDFs into individual pages or sections by leveraging batch processing.

var splitter = new Splitter();
var splitOptions = new SplitOptions();

// Add input PDF files in a batch
splitOptions.AddInput(new FileDataSource(@"C:\Samples\file1.pdf"));
splitOptions.AddInput(new FileDataSource(@"C:\Samples\file2.pdf"));

// Define the output for each file
splitOptions.AddOutput(new FileDataSource(@"C:\Samples\output_file1_page1.pdf"));
splitOptions.AddOutput(new FileDataSource(@"C:\Samples\output_file2_page1.pdf"));

// Process the batch split
splitter.Process(splitOptions);

PDF Text Extractor

The Documentize PDF Text Extractor for .NET simplifies extracting text from PDF documents. Whether you need pure, raw, or plain text, this plugin allows you to extract text efficiently while preserving formatting or omitting it based on your needs.

Key Features:

  • Pure Mode: Extract text while preserving its original formatting.
  • Raw Mode: Extract text without any formatting.
  • Plain Mode: Extract text without special characters or formatting.
  • Batch Processing: Extract text from multiple PDFs at once.

How to Extract Text from PDF Documents

To extract text from a PDF document, follow these steps:

  1. Create an instance of the TextExtractor class.
  2. Create an instance of TextExtractorOptions to configure the extraction options.
  3. Add the input PDF file using the AddInput method.
  4. Run the Process method to extract the text.
  5. Read the extracted text from ResultContainer.ResultCollection.

using var extractor = new TextExtractor();
var textExtractorOptions = new TextExtractorOptions();

// Add the input PDF
textExtractorOptions.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));

// Process the text extraction
var resultContainer = extractor.Process(textExtractorOptions);

// Print the extracted text
var extractedText = resultContainer.ResultCollection[0];
Console.WriteLine(extractedText);

Extracting Text from Multiple PDFs

The plugin allows you to extract text from multiple PDFs simultaneously, ensuring quick and efficient processing.

using var extractor = new TextExtractor();
var textExtractorOptions = new TextExtractorOptions();

// Add multiple input PDFs
textExtractorOptions.AddInput(new FileDataSource(@"C:\Samples\input1.pdf"));
textExtractorOptions.AddInput(new FileDataSource(@"C:\Samples\input2.pdf"));

// Process the extraction
var resultContainer = extractor.Process(textExtractorOptions);

// Output the extracted text
foreach (var result in resultContainer.ResultCollection)
{
    Console.WriteLine(result);
}

Text Extraction Modes

The TextExtractor plugin offers three extraction modes, providing flexibility based on your needs.

  1. Pure Mode: Preserves the original formatting, including spaces and alignment.
  2. Raw Mode: Extracts the text without formatting, useful for raw data processing.
  3. Plain Mode: Extracts text without special characters or additional formatting.

using var extractor = new TextExtractor();
var textExtractorOptions = new TextExtractorOptions();

// Set to Pure mode
textExtractorOptions.Mode = ExtractionMode.Pure;
textExtractorOptions.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));

// Process and output
var resultContainer = extractor.Process(textExtractorOptions);
Console.WriteLine(resultContainer.ResultCollection[0]);

How to Handle Batch Processing

For large document sets, you can leverage batch processing, enabling you to extract text from multiple PDFs at once.

using var extractor = new TextExtractor();
var textExtractorOptions = new TextExtractorOptions();

// Add multiple input PDFs
textExtractorOptions.AddInput(new FileDataSource(@"C:\Samples\batch1.pdf"));
textExtractorOptions.AddInput(new FileDataSource(@"C:\Samples\batch2.pdf"));

// Process the extraction
var resultContainer = extractor.Process(textExtractorOptions);

// Handle extracted text
foreach (var result in resultContainer.ResultCollection)
{
    Console.WriteLine(result);
}
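
For a large batch, printing to the console is rarely the end goal; the text usually needs to be stored per document. The sketch below writes each extracted text to a .txt file next to its source PDF. SaveTextResults is a hypothetical helper built on the .NET base class library only. It assumes that the results come back in the same order as the inputs were added and that each result has already been converted to a string (for example with ToString()); neither assumption is confirmed by the samples here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of Documentize): saves each extracted text
// beside its source PDF, replacing the ".pdf" extension with ".txt".
static List<string> SaveTextResults(
    IReadOnlyList<string> inputPdfPaths,
    IReadOnlyList<string> extractedTexts)
{
    if (inputPdfPaths.Count != extractedTexts.Count)
    {
        throw new ArgumentException("Expected exactly one result per input file.");
    }

    var writtenFiles = new List<string>(inputPdfPaths.Count);
    for (int i = 0; i < inputPdfPaths.Count; i++)
    {
        // Assumes results are in input order (see the note above)
        string target = Path.ChangeExtension(inputPdfPaths[i], ".txt");
        File.WriteAllText(target, extractedTexts[i]);
        writtenFiles.Add(target);
    }
    return writtenFiles;
}
```

The count check catches the case where a file failed to produce a result, which would otherwise pair texts with the wrong documents.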

PDF Timestamp Adder

The Documentize PDF Timestamp Adder for .NET is a powerful tool designed to add secure timestamps to your PDF documents. It enhances the integrity and authenticity of your documents by providing a trusted time reference, ensuring compliance with digital signature standards.

Key Features:

  • Add Secure Timestamps: Effortlessly add secure timestamps to your PDF documents.
  • Customizable Timestamp Servers: Use custom timestamp server URLs and authentication credentials.
  • Automation: Integrate timestamping into your .NET applications for automated workflows.
  • Compliance: Ensure your documents meet industry standards for digital signatures and timestamps.

How to Add a Timestamp to PDF Documents

To add a secure timestamp to a PDF document, follow these steps:

  1. Create an instance of the Timestamp class.
  2. Create an instance of AddTimestampOptions to configure the timestamping process.
  3. Add the input PDF file using the AddInput method.
  4. Set the output file path using AddOutput.
  5. Execute the timestamping using the Process method.

// Instantiate the Timestamp plugin
var plugin = new Timestamp();

// Configure the timestamping options
var opt = new AddTimestampOptions("path_to_pfx", "password_for_pfx", "timestamp_server_url");

// Add input PDF file
opt.AddInput(new FileDataSource("path_to_pdf"));

// Specify the output PDF file
opt.AddOutput(new FileDataSource("path_to_result_pdf"));

// Perform the timestamping process
plugin.Process(opt);

How to Use Custom Authentication with Timestamp Server

You can provide basic authentication credentials when connecting to the timestamp server. This allows you to authenticate with servers that require a username and password.

  1. Create an instance of the Timestamp class.
  2. Create an instance of AddTimestampOptions, including the serverBasicAuthCredentials.
  3. Add the input file and output file paths.
  4. Call the Process method.

// Instantiate the Timestamp plugin
var plugin = new Timestamp();

// Configure the timestamping options with authentication
var opt = new AddTimestampOptions("path_to_pfx", "password_for_pfx", "timestamp_server_url", "username:password");

// Add input PDF file
opt.AddInput(new FileDataSource("path_to_pdf"));

// Specify the output PDF file
opt.AddOutput(new FileDataSource("path_to_result_pdf"));

// Perform the timestamping process
plugin.Process(opt);
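
The sample passes the credentials as a single "username:password" string. Because the colon is the separator, a user name that itself contains a colon would be split in the wrong place. The sketch below builds the string from separate values and rejects that case. BuildBasicAuthCredentials is a hypothetical helper, not part of the plugin, and the exact format the timestamp server expects beyond what the sample shows is an assumption.

```csharp
using System;

// Hypothetical helper (not part of Documentize): joins a user name and a
// password into the "username:password" form used in the sample above.
static string BuildBasicAuthCredentials(string username, string password)
{
    if (string.IsNullOrEmpty(username))
    {
        throw new ArgumentException("User name must not be empty.", nameof(username));
    }
    if (username.IndexOf(':') >= 0)
    {
        // A colon in the user name would make the separator ambiguous
        throw new ArgumentException("User name must not contain ':'.", nameof(username));
    }
    return $"{username}:{password}";
}
```

The two values could be read from environment variables or a secret store rather than written into the source file.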

Handling PFX Files and Passwords

The AddTimestampOptions class takes a PFX file and its password, which are used for digital signing.

  • PFX Stream or File Path: You can provide a stream or file path to the PFX file.
  • Password Protection: Ensure you securely manage the password for the PFX file.

PDF to DOC Converter

The Documentize PDF to DOC Converter for .NET is a powerful tool designed to convert PDF documents into DOC or DOCX formats. This plugin seamlessly transforms PDF pages into editable Microsoft Word documents, making it easy to reuse, edit, and share content across multiple platforms.

Key Features:

  • DOC/DOCX Conversion: Convert PDF documents to editable Microsoft Word formats (DOC or DOCX).
  • Maintain Formatting: Retain the original layout, text, and formatting during the conversion process.
  • Batch Processing: Convert multiple PDF files at once.
  • Custom Conversion Options: Fine-tune the conversion process with different modes, like Enhanced Flow, for better layout.

How to Convert PDF to DOC/DOCX

To convert a PDF document to DOC/DOCX format, follow these steps:

  1. Create an instance of the PdfDoc class.
  2. Create an instance of PdfToDocOptions to configure the conversion process.
  3. Add the input PDF file using the AddInput method.
  4. Add the output file path for the resulting DOC/DOCX file using the AddOutput method.
  5. Run the Process method to execute the conversion.

var pdfToWord = new PdfDoc();
var options = new PdfToDocOptions()
{
    SaveFormat = SaveFormat.DocX,       // Output format as DOCX
    ConversionMode = ConversionMode.EnhancedFlow // Optimize layout and formatting
};

// Add the input PDF file
options.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));

// Add the output Word document path
options.AddOutput(new FileDataSource(@"C:\Samples\output.docx"));

// Process the conversion
pdfToWord.Process(options);

Converting PDF to DOC with Custom Options

The PDF to DOC Converter plugin provides several options to customize your conversion process. You can choose between different modes to control how the layout and structure of the PDF are handled during conversion.

var pdfToWord = new PdfDoc();
var options = new PdfToDocOptions()
{
    SaveFormat = SaveFormat.Doc,        // Output format as DOC
    ConversionMode = ConversionMode.Precise // Maintain original PDF layout as closely as possible
};

// Add the input PDF file
options.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));

// Add the output Word document path
options.AddOutput(new FileDataSource(@"C:\Samples\output.doc"));

// Process the conversion
pdfToWord.Process(options);

Batch Processing PDF to DOC/DOCX Conversion

The PDF to DOC Converter supports batch processing, allowing you to convert multiple PDF files at once. Here’s an example of batch conversion:

var pdfToWord = new PdfDoc();
var options = new PdfToDocOptions()
{
    SaveFormat = SaveFormat.DocX
};

// Add multiple input PDF files
options.AddInput(new FileDataSource(@"C:\Samples\file1.pdf"));
options.AddInput(new FileDataSource(@"C:\Samples\file2.pdf"));

// Add output file paths for the resulting DOCX files
options.AddOutput(new FileDataSource(@"C:\Samples\output_file1.docx"));
options.AddOutput(new FileDataSource(@"C:\Samples\output_file2.docx"));

// Process the batch conversion
pdfToWord.Process(options);
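
In the batch sample each output path is typed by hand to match its input. For longer lists, the output name can be derived from the input name so the two stay in step. The sketch below uses only the .NET base class library; PairWithOutputs is a hypothetical helper, not part of the plugin.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of Documentize): pairs every input PDF with an
// output path in outputFolder that keeps the file name but swaps the extension.
static List<(string Input, string Output)> PairWithOutputs(
    IEnumerable<string> pdfPaths, string outputFolder, string extension)
{
    var pairs = new List<(string Input, string Output)>();
    foreach (var input in pdfPaths)
    {
        // "file1.pdf" with extension ".docx" becomes "file1.docx"
        string outputName = Path.GetFileNameWithoutExtension(input) + extension;
        pairs.Add((input, Path.Combine(outputFolder, outputName)));
    }
    return pairs;
}
```

For each pair, Input could be passed to AddInput and Output to AddOutput, in the same order, mirroring the sample above. The same helper would work for other target formats by passing a different extension such as ".xlsx".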

PDF to XLS Converter

The Documentize PDF to XLS Converter for .NET is a powerful tool that allows seamless conversion of PDF documents into Excel spreadsheets (XLS/XLSX). This plugin enhances the accessibility and usability of your PDF content, making it easy to manipulate and analyze data in spreadsheet format.

Key Features:

  • Convert PDF to Excel: Transform PDF files into XLS/XLSX spreadsheets for easy data management.
  • Custom Output Options: Configure the output format, page range, worksheet name, and more.
  • High-Fidelity Conversion: Retain layout, formatting, and content accuracy during conversion.
  • Batch Processing: Convert multiple PDF files in one go for large-scale operations.

How to Convert PDF to XLS

To convert a PDF document into an Excel file (XLS/XLSX), follow these steps:

  1. Create an instance of the PdfXls class.
  2. Create an instance of PdfToXlsOptions to configure the conversion settings.
  3. Add the input PDF file using the AddInput method.
  4. Specify the output Excel file using the AddOutput method.
  5. Run the Process method to initiate the conversion.

var pdfXlsConverter = new PdfXls();
var options = new PdfToXlsOptions();

// Add input and output file paths
options.AddInput(new FileDataSource(@"C:\Samples\sample.pdf"));
options.AddOutput(new FileDataSource(@"C:\Samples\output.xlsx"));

// Run the conversion process
pdfXlsConverter.Process(options);

Customizing the PDF to Excel Conversion

You can customize the conversion by setting properties on the PdfToXlsOptions object. For instance, to convert the PDF to XLSX format, insert a blank first column, and name the worksheet, you can use the following code:

var pdfXlsConverter = new PdfXls();
var options = new PdfToXlsOptions();

// Set the output format to XLSX
options.Format = PdfToXlsOptions.ExcelFormat.XLSX;

// Insert a blank column at the first position
options.InsertBlankColumnAtFirst = true;

// Set the worksheet name
options.WorksheetName = "MySheet";

// Add input and output files
options.AddInput(new FileDataSource(@"C:\Samples\sample.pdf"));
options.AddOutput(new FileDataSource(@"C:\Samples\output.xlsx"));

// Process the conversion
pdfXlsConverter.Process(options);

Handling Conversion Results

After processing, the Process method returns a ResultContainer object that holds the result of the conversion. You can retrieve the converted file path or other output details:

var resultContainer = pdfXlsConverter.Process(options);

// Access and print the result file path
var result = resultContainer.ResultCollection[0];
Console.WriteLine(result);

Batch Processing for PDF to XLS Conversion

The PDF to XLS Converter plugin also supports batch processing, enabling the conversion of multiple PDF files at once.

var pdfXlsConverter = new PdfXls();
var options = new PdfToXlsOptions();

// Add multiple input PDFs
options.AddInput(new FileDataSource(@"C:\Samples\file1.pdf"));
options.AddInput(new FileDataSource(@"C:\Samples\file2.pdf"));

// Add the output Excel files
options.AddOutput(new FileDataSource(@"C:\Samples\output1.xlsx"));
options.AddOutput(new FileDataSource(@"C:\Samples\output2.xlsx"));

// Process the batch conversion
pdfXlsConverter.Process(options);
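
When the PDFs live in one folder, the input/output pairs can be built in a loop instead of being listed by hand. The following is a minimal sketch, not part of the Documentize API: the PdfToXlsBatch class, its BuildOutputPath and ConvertFolder helpers, and the `using Documentize;` namespace are assumptions made for illustration. Only PdfXls, PdfToXlsOptions, FileDataSource, AddInput, AddOutput, Format, and Process are taken from the examples in this guide.

```csharp
using System;
using System.IO;
using Documentize; // assumption: namespace that exposes PdfXls, PdfToXlsOptions, FileDataSource

// Hypothetical helper class, not part of the library.
public static class PdfToXlsBatch
{
    // Maps an input such as "report.pdf" to "<outputDir>/report.xlsx".
    public static string BuildOutputPath(string inputPdf, string outputDir) =>
        Path.Combine(outputDir, Path.GetFileNameWithoutExtension(inputPdf) + ".xlsx");

    // Registers every PDF in inputDir as an input with a matching .xlsx output,
    // then runs a single batch conversion.
    public static void ConvertFolder(string inputDir, string outputDir)
    {
        Directory.CreateDirectory(outputDir);

        var pdfXlsConverter = new PdfXls();
        var options = new PdfToXlsOptions();
        options.Format = PdfToXlsOptions.ExcelFormat.XLSX;

        var pdfFiles = Directory.GetFiles(inputDir, "*.pdf");
        Array.Sort(pdfFiles, StringComparer.OrdinalIgnoreCase);

        // Inputs and outputs are added in the same order so that they pair up,
        // mirroring the hand-written batch example.
        foreach (var pdf in pdfFiles)
        {
            options.AddInput(new FileDataSource(pdf));
            options.AddOutput(new FileDataSource(BuildOutputPath(pdf, outputDir)));
        }

        // Skip the call entirely when the folder contains no PDFs.
        if (pdfFiles.Length > 0)
        {
            pdfXlsConverter.Process(options);
        }
    }
}
```

A call such as PdfToXlsBatch.ConvertFolder(@"C:\Samples", @"C:\Samples\xlsx") would then stand in for the explicit AddInput and AddOutput lines.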

PDF/A Converter

The Documentize PDF/A Converter for .NET is a powerful tool designed to convert PDF documents into the PDF/A format, ensuring that your content remains compliant with long-term archiving standards. This plugin also supports validating existing PDF documents for PDF/A compliance, offering both conversion and validation features in a single solution.

Key Features:

  • Convert to PDF/A: Seamlessly transform PDF files into the PDF/A format (such as PDF/A-1a, PDF/A-2b, PDF/A-3b) to ensure compliance with archiving standards.
  • Validate PDF/A Compliance: Check existing PDF documents for conformance with PDF/A standards and identify issues if they do not comply.
  • Batch Processing: Process multiple files at once for conversion or validation.
  • Efficient Workflow: Minimize time and effort with fast and reliable conversion processes.

How to Convert PDF to PDF/A

To convert a PDF document into PDF/A format, follow these steps:

  1. Create an instance of the PdfAConverter class.
  2. Create an instance of PdfAConvertOptions to configure the conversion.
  3. Specify the desired PDF/A version (e.g., PDF/A-3b).
  4. Add the input PDF file using the AddInput method.
  5. Add the output file for the resulting PDF/A using the AddOutput method.
  6. Call the Process method to execute the conversion.

var pdfAConverter = new PdfAConverter();
var pdfAOptions = new PdfAConvertOptions
{
    PdfAVersion = PdfAStandardVersion.PDF_A_3B
};

// Add the input PDF file
pdfAOptions.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));

// Specify the output PDF/A file
pdfAOptions.AddOutput(new FileDataSource(@"C:\Samples\output_pdfa.pdf"));

// Process the conversion
pdfAConverter.Process(pdfAOptions);

Validating PDF/A Compliance

You can validate existing PDF files for PDF/A compliance using the PdfAValidateOptions class.

var pdfAConverter = new PdfAConverter();
var validationOptions = new PdfAValidateOptions
{
    PdfAVersion = PdfAStandardVersion.PDF_A_1A
};

// Add the PDF file to be validated
validationOptions.AddInput(new FileDataSource(@"C:\Samples\input.pdf"));

// Run the validation process
var resultContainer = pdfAConverter.Process(validationOptions);

// Check the validation result
var validationResult = (PdfAValidationResult)resultContainer.ResultCollection[0].Data;
Console.WriteLine("PDF/A Validation Passed: " + validationResult.IsValid);

Batch Processing for PDF/A Conversion

This plugin supports batch processing, allowing you to convert or validate multiple PDF files for PDF/A compliance at once.

var pdfAConverter = new PdfAConverter();
var pdfAOptions = new PdfAConvertOptions
{
    PdfAVersion = PdfAStandardVersion.PDF_A_3B
};

// Add multiple input PDFs
pdfAOptions.AddInput(new FileDataSource(@"C:\Samples\file1.pdf"));
pdfAOptions.AddInput(new FileDataSource(@"C:\Samples\file2.pdf"));

// Specify output files for the converted PDF/As
pdfAOptions.AddOutput(new FileDataSource(@"C:\Samples\file1_pdfa.pdf"));
pdfAOptions.AddOutput(new FileDataSource(@"C:\Samples\file2_pdfa.pdf"));

// Process the batch conversion
pdfAConverter.Process(pdfAOptions);
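
Batch validation might follow the same pattern as single-file validation, with several inputs added to one PdfAValidateOptions instance. The sketch below rests on assumptions that this guide does not confirm: that Process returns one result per input in the order the inputs were added, and that the classes live in a `Documentize` namespace. The PdfABatchValidation class and its DescribeResult and ValidateAll helpers are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using Documentize; // assumption: namespace that exposes PdfAConverter and related types

// Hypothetical helper class, not part of the library.
public static class PdfABatchValidation
{
    // Formats one report line for a validated file.
    public static string DescribeResult(string path, bool isValid) =>
        $"{path}: {(isValid ? "compliant" : "NOT compliant")}";

    public static void ValidateAll(IReadOnlyList<string> pdfPaths)
    {
        var pdfAConverter = new PdfAConverter();
        var validationOptions = new PdfAValidateOptions
        {
            PdfAVersion = PdfAStandardVersion.PDF_A_1A
        };

        // Add every file to be validated
        foreach (var path in pdfPaths)
        {
            validationOptions.AddInput(new FileDataSource(path));
        }

        var resultContainer = pdfAConverter.Process(validationOptions);

        // Assumption: ResultCollection[i] corresponds to the i-th input added.
        for (var i = 0; i < pdfPaths.Count; i++)
        {
            var validationResult = (PdfAValidationResult)resultContainer.ResultCollection[i].Data;
            Console.WriteLine(DescribeResult(pdfPaths[i], validationResult.IsValid));
        }
    }
}
```

If the library reports results in a different order or count, the loop should be keyed on whatever identifier the result object actually carries rather than on the input index.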

HTML Converter

The Documentize HTML Converter for .NET provides robust capabilities for converting documents between PDF and HTML formats, ideal for web applications, archiving, and report generation. With multiple options for handling resources and layouts, the converter adapts to various project requirements.

Key Features

PDF to HTML Conversion

Convert PDF files to HTML to make documents accessible for web-based viewing or integration into applications where HTML format is preferred.

HTML to PDF Conversion

Transform HTML content into high-quality PDFs, perfect for generating printable reports, archiving web content, or creating shareable document formats.


Detailed Guide

Converting PDF to HTML

To convert a PDF to HTML:

  1. Initialize the Converter: Create an instance of HtmlConverter.
  2. Set Conversion Options: Use PdfToHtmlOptions to customize output, choosing either embedded or external resources.
  3. Define Input and Output Paths: Set the paths for your input PDF and output HTML.
  4. Execute the Conversion: Call the Process method to convert the file.

Example: Convert PDF to HTML with Embedded Resources

// Step 1: Initialize the HTML Converter
var converter = new HtmlConverter();

// Step 2: Configure options for PDF to HTML conversion
var options = new PdfToHtmlOptions(PdfToHtmlOptions.SaveDataType.FileWithEmbeddedResources);

// Step 3: Set file paths
options.AddInput(new FileDataSource("input.pdf"));
options.AddOutput(new FileDataSource("output.html"));

// Step 4: Run the conversion
converter.Process(options);

Available Options for PDF to HTML Conversion

  • SaveDataType:

    • FileWithEmbeddedResources: Generates a single HTML file with all resources embedded.
    • FileWithExternalResources: Saves resources separately, ideal for large HTML files.
  • Output Customization:

    • BasePath: Set the base path for resources in the HTML document.
    • IsRenderToSinglePage: Optionally render all PDF content on a single HTML page.
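
For comparison with the embedded-resources example, a conversion that writes resources to separate files would presumably differ only in the SaveDataType value passed to the constructor. This is a sketch under that assumption; the `using Documentize;` namespace is also assumed, and BasePath and IsRenderToSinglePage are left unset because their exact types are not shown in this guide.

```csharp
using Documentize; // assumption: namespace that exposes HtmlConverter and related types

// Initialize the HTML Converter
var converter = new HtmlConverter();

// Request an HTML file whose resources are saved separately
// (enum member named in the options list above).
var options = new PdfToHtmlOptions(PdfToHtmlOptions.SaveDataType.FileWithExternalResources);

// Set file paths
options.AddInput(new FileDataSource("input.pdf"));
options.AddOutput(new FileDataSource("output.html"));

// Run the conversion
converter.Process(options);
```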

Converting HTML to PDF

To convert an HTML document to a PDF, follow these steps:

  1. Initialize the Converter: Create an instance of the HtmlConverter.
  2. Configure PDF Options: Use HtmlToPdfOptions to define layout and media settings.
  3. Specify Paths: Set input HTML and output PDF file paths.
  4. Execute the Conversion: Run the Process method to complete the conversion.

Example: Convert HTML to PDF

// Step 1: Initialize the HTML Converter
var converter = new HtmlConverter();

// Step 2: Configure options for HTML to PDF conversion
var options = new HtmlToPdfOptions();

// Step 3: Set file paths
options.AddInput(new FileDataSource("input.html"));
options.AddOutput(new FileDataSource("output.pdf"));

// Step 4: Execute the conversion
converter.Process(options);

Additional Options for HTML to PDF Conversion

  • Media Type:

    • HtmlMediaType.Print: Ideal for generating PDFs suited for printing.
    • HtmlMediaType.Screen: Use when converting content designed for digital viewing.
  • Layout Adjustments:

    • PageLayoutOption: Adjusts how HTML content fits the PDF layout, like ScaleToPageWidth to ensure the content scales to the PDF width.
    • IsRenderToSinglePage: Enables rendering the entire HTML content on a single PDF page if needed for concise presentations.
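
The options above might be applied as in the following sketch. The property names are assumptions: the guide names the HtmlMediaType.Print value and the IsRenderToSinglePage option but does not show how they are set, so the code assumes a settable HtmlMediaType property and a settable bool IsRenderToSinglePage property on HtmlToPdfOptions. PageLayoutOption is omitted because the type that holds ScaleToPageWidth is not named here. The `using Documentize;` namespace is likewise assumed.

```csharp
using Documentize; // assumption: namespace that exposes HtmlConverter and related types

// Initialize the HTML Converter
var converter = new HtmlConverter();

var options = new HtmlToPdfOptions
{
    // Assumption: the media type is exposed as a property named HtmlMediaType.
    HtmlMediaType = HtmlMediaType.Print,

    // Assumption: settable bool; renders the whole HTML document on one PDF page.
    IsRenderToSinglePage = true
};

// Set file paths
options.AddInput(new FileDataSource("input.html"));
options.AddOutput(new FileDataSource("output.pdf"));

// Execute the conversion
converter.Process(options);
```

Print is the natural choice when the page ships a print stylesheet; Screen keeps the layout closer to what a browser shows.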

This converter is versatile for a variety of applications, from generating PDF reports based on web content to converting archives of PDF documents for web-based accessibility. For more advanced configurations, refer to the full Documentize documentation.
