Add tags to existing PDF

This article demonstrates how to open an existing PDF document, read its visual content, and tag that content based on its type and text. In particular, a text shape is tagged as an H1 if its text equals “Creating tagged PDF” and as a Span otherwise. Image shapes are tagged as Figure. This code sample is included in the evaluation download.

Here is the end result:

[Image: Tagged PDF]

First, we open an existing PDF document and extract the visual content of its first page as shapes:

using (FileStream fs = new FileStream("NotTagged.pdf", FileMode.Open))
{
  Document document = new Document(fs);
  Page sourcePage = document.Pages[0];
  ShapeCollection shapes = sourcePage.CreateShapes();
 
  ...
}

Next, we create a new tagged document and set up the root hierarchy:

// create a new tagged document and set up the root hierarchy
Document taggedDocument = new Document();
taggedDocument.LogicalStructure = new LogicalStructure();
Tag documentTag = new Tag("Document", taggedDocument.LogicalStructure.RootTag);
Tag paragraphTag = new Tag("P", documentTag);

// copy the visual content to the new document
taggedDocument.Pages.Add(new Page(sourcePage.Width, sourcePage.Height));
taggedDocument.Pages[0].Overlay.Add(shapes);

SetTag(shapes, paragraphTag);

The SetTag method walks the shapes recursively and tags each one by type, as follows:

static void SetTag(Shape shape, Tag paragraphTag)
{
  // tag text
  TextShape textShape = shape as TextShape;
  if (null != textShape)
  {
    if (textShape.Text == "Creating tagged PDF")
      textShape.ParentTag = new Tag("H1", paragraphTag);
    else
      textShape.ParentTag = new Tag("Span", paragraphTag);

    return;
  }

  // tag images
  ImageShape imageShape = shape as ImageShape;
  if (null != imageShape)
  {
    imageShape.ParentTag = new Tag("Figure", paragraphTag);
    return;
  }

  // recurse into nested shape collections
  ShapeCollection shapeCollection = shape as ShapeCollection;
  if (null != shapeCollection)
  {
    foreach (Shape item in shapeCollection)
    {
      SetTag(item, paragraphTag);
    }
  }
}
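
To see the traversal in isolation, here is a minimal sketch that can be run without PDFKit.NET. The types DemoShape, DemoText, DemoImage and DemoGroup and the class TagDemo are hypothetical stand-ins invented for this illustration; they are not part of the library. The sketch applies the same decision rule as SetTag (H1 for the title text, Span for other text, Figure for images, recursion into collections) and records the tag names in visiting order.

```csharp
using System;
using System.Collections.Generic;

// Stand-in types for illustration only; these are NOT the PDFKit.NET classes.
abstract class DemoShape { public string Tag; }
class DemoText : DemoShape { public string Text; }
class DemoImage : DemoShape { }
class DemoGroup : DemoShape { public List<DemoShape> Items = new List<DemoShape>(); }

static class TagDemo
{
    // Same decision rule as SetTag above, written with a type-pattern switch.
    public static void Assign(DemoShape shape, List<string> log)
    {
        switch (shape)
        {
            case DemoText text:
                // the title becomes a heading, all other text a span
                text.Tag = text.Text == "Creating tagged PDF" ? "H1" : "Span";
                log.Add(text.Tag);
                break;
            case DemoImage image:
                image.Tag = "Figure";
                log.Add(image.Tag);
                break;
            case DemoGroup group:
                // a group carries no tag itself; only its leaves are tagged
                foreach (DemoShape item in group.Items)
                    Assign(item, log);
                break;
        }
    }

    // Builds a small nested shape tree and returns the tags in visiting order.
    public static List<string> Run()
    {
        var nested = new DemoGroup();
        nested.Items.Add(new DemoImage());
        nested.Items.Add(new DemoText { Text = "Some body text" });

        var root = new DemoGroup();
        root.Items.Add(new DemoText { Text = "Creating tagged PDF" });
        root.Items.Add(nested);

        var log = new List<string>();
        Assign(root, log);
        return log;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "H1, Figure, Span"
    }
}
```

The traversal is depth-first, so tags are created in the order in which the shapes appear in the collection, including shapes inside nested collections. In the SetTag method above, that is the order in which the H1, Span and Figure tags are added under the P tag.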