In this post, I’ll show you how to parse a PDF document in Angular.

We will extract the full text content of a PDF file using the pdfjs-dist library.

Generate a new Angular application, if you do not have one already:

ng new PdfReader

Install the latest version of pdfjs-dist from npm:

npm install pdfjs-dist

Let’s create a service that will contain the logic to parse the PDF:

ng generate service PdfReader

Replace the contents of the generated service with the following:

import { Injectable } from '@angular/core';
import * as pdfjsLib from 'pdfjs-dist';

@Injectable({
  providedIn: 'root'
})
export class PdfReaderService {
  constructor() {
    // Tell pdf.js where to load its worker script from
    pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';
  }

  public async readPdf(pdfUrl: string): Promise<string> {
    // getDocument returns a loading task; the document comes from its promise
    const pdf = await pdfjsLib.getDocument(pdfUrl).promise;
    const pageContents: string[] = []; // text of each page, in order
    for (let i = 1; i <= pdf.numPages; i++) {
      const page = await pdf.getPage(i);
      const textContent = await page.getTextContent();
      // items can be TextItem or TextMarkedContent; only TextItem has str
      pageContents.push(textContent.items.map((s) => ('str' in s ? s.str : '')).join(''));
    }
    return pageContents.join('');
  }
}

In its constructor, our service tells pdfjs-dist where to load its worker script from.
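The worker URL in the service is not pinned to the version of pdfjs-dist installed from npm, and pdf.js reports an error when the library and worker versions differ. One possible way to keep them in step, sketched here, is to derive the URL from the library's own version string. The helper name `buildWorkerSrc`, the jsDelivr URL layout, and the default worker file name are assumptions for illustration; check the `build` folder of your installed package for the actual file name.

```typescript
// Hypothetical helper: builds a worker URL pinned to a given pdfjs-dist version.
// The CDN layout and default file name are assumptions, not taken from the post.
function buildWorkerSrc(version: string, fileName: string = 'pdf.worker.min.js'): string {
  // Guard against an empty or malformed version string
  if (!/^\d+\.\d+\.\d+/.test(version)) {
    throw new Error(`Unexpected pdfjs-dist version string: ${version}`);
  }
  return `https://cdn.jsdelivr.net/npm/pdfjs-dist@${version}/build/${fileName}`;
}

// Possible usage inside the service constructor
// (pdfjsLib.version is exported by pdfjs-dist):
// pdfjsLib.GlobalWorkerOptions.workerSrc = buildWorkerSrc(pdfjsLib.version);
```

Copying the worker file into your own assets folder and pointing workerSrc at it would serve the same purpose without relying on a CDN.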

Given a URL pointing to a PDF document, the readPdf method loads the file, reads the text content of each page in turn, and returns everything as a single string.
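The text-joining step inside readPdf can also be isolated as a small pure function, which makes it easier to test and to adjust. The sketch below is one way to do that, under some assumptions: the interfaces are simplified stand-ins for the pdfjs-dist item types, the names `isTextItem` and `joinPageItems` are made up for this example, and the `hasEOL` flag is only present in newer pdfjs-dist versions. Unlike the service code, it inserts a line break wherever an item reports an end of line.

```typescript
// Simplified stand-ins for the pdfjs-dist item types (assumption: only the
// fields used here are modelled; the real types carry more properties).
interface TextItemLike {
  str: string;
  hasEOL?: boolean;
}
interface MarkedContentLike {
  type: string;
}
type PageItem = TextItemLike | MarkedContentLike;

// Type guard: only text items carry a `str` property.
function isTextItem(item: PageItem): item is TextItemLike {
  return 'str' in item;
}

// Joins the text of one page, skipping marked-content entries and
// adding a newline after any item flagged as ending a line.
function joinPageItems(items: PageItem[]): string {
  return items
    .filter(isTextItem)
    .map((item) => item.str + (item.hasEOL ? '\n' : ''))
    .join('');
}
```

In the service, the loop body could then call something like joinPageItems(textContent.items) instead of mapping over the items inline.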

For demonstration, I am using app.component.ts to read the contents of the PDF file:

import { Component, OnInit } from '@angular/core';
import { PdfReaderService } from './pdf-reader.service';

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.css']
})
export class AppComponent implements OnInit {
  constructor(private pdfReader: PdfReaderService) { }

  ngOnInit() {
    this.pdfReader.readPdf('./assets/sample.pdf')
      .then(text => alert('PDF parsed: ' + text), reason => console.error(reason));
  }
}

In the example above, I have a sample PDF document under src/assets called sample.pdf.

Run the application with ng serve, and the file contents appear in an alert dialog.

Umut Esen

Software Engineer specialising in full-stack web application development.

This Post Has 8 Comments

  1. Artyom

    Don’t know when this was written, but I found an error: pdfjsLib.getDocument returns a PDFDocumentLoadingTask, and the promise has to be taken from that. So it should be something like
    const pdfLoadingTask = pdfjsLib.getDocument(pdfUrl);
    pdfLoadingTask.promise.then((pdf) => {
    for(let i = 1; i <= pdf._pdfInfo.numPages; i++) {
    ….

    or const pdf = await pdfjsLib.getDocument(pdfUrl).promise;
    which was missed in your code.

    Anyways great article and great example. Thanks for showing this.

  2. Sajan

    Thank you for the detailed tutorial.

    I am facing an issue in:
    countPromises.push(textContent.items.map((s) => s.str).join(''));
    ‘s.str’ throws error – “Property ‘str’ does not exist on type ‘TextItem | TextMarkedContent’.
    Property ‘str’ does not exist on type ‘TextMarkedContent’.”

    Any help would be appreciated.

    1. Artyom

      Hey, Sajan.
      I was able to fix this.
      As far as I could tell, the elements of textContent.items are of type TextItem | TextMarkedContent, so if we use (s as TextItem).str instead of just s.str, everything compiles and runs.

  3. Ciprian

    Hi,
    I tried to use the code in the article, and ng serve throws an error related to _pdfInfo and getPage, saying that they do not exist on type PDFDocumentLoadingTask.

    Could you give any suggestions on how to fix that?

    Thanks

    1. c

      use pdf._worker.getPage() to fix the issue

    2. Artyom

      As I wrote in my comment you have to add .promise in PdfReaderService class. So that you’d have something like this

      const pdf = await pdfjsLib.getDocument(pdfUrl).promise;

      with that everything works like a charm

  4. Igor

    Hi! How can I get the file in the form?

    1. Umut Esen

      You would need to save it somewhere accessible via a URL, for example to a backend server. Then pass the URL to pdfjs to read, good luck!