In this post, I’ll show you how to parse a PDF document in Angular.

We will extract the full text content of a PDF file using the pdfjs-dist library.

Generate a new Angular application, if you do not have one already:

ng new PdfReader

Install the latest version of pdfjs-dist from npm:

npm install pdfjs-dist

Let’s create a service that will contain the PDF parsing logic:

ng generate service PdfReader

Replace the contents of the generated service with the following:

import { Injectable } from '@angular/core';
import * as pdfjsLib from 'pdfjs-dist';

@Injectable({
  providedIn: 'root'
})
export class PdfReaderService {

  constructor() {
    // The worker script must come from the same pdf.js version as the installed pdfjs-dist.
    pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';
  }

  public async readPdf(pdfUrl: string): Promise<string> {
    // getDocument returns a loading task; the document itself is behind its promise.
    const pdf = await pdfjsLib.getDocument(pdfUrl).promise;
    const pageContents: string[] = [];

    // Pages are numbered from 1.
    for (let i = 1; i <= pdf.numPages; i++) {
      const page = await pdf.getPage(i);
      const textContent = await page.getTextContent();
      // Items are TextItem or TextMarkedContent; only TextItem has a str property.
      pageContents.push(textContent.items.map((s) => ('str' in s ? s.str : '')).join(''));
    }

    return pageContents.join('');
  }
}

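The constructor of our service sets the worker source, which pdfjs-dist uses to parse documents in a web worker.

Given a URL pointing to a PDF document, the readPdf method loads the file, reads the text content of each page in turn, and returns the text of all pages joined into a single string.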
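To make the page-by-page extraction easier to follow, here is a minimal sketch of that loop on its own. It does not import pdfjs-dist: the `DocumentLike`, `PageLike`, `TextItemLike` and `MarkedContentLike` interfaces are assumptions that mirror only the members the loop needs, and `isTextItem`, `extractText` and `fakeDocument` are hypothetical helper names. The type guard is there because text content can include marked-content entries that carry no `str`.

```typescript
// Assumed minimal shapes, mirroring only the members of pdfjs-dist used in this sketch.
interface TextItemLike { str: string; }
interface MarkedContentLike { type: string; }
type TextContentItem = TextItemLike | MarkedContentLike;

interface PageLike {
  getTextContent(): Promise<{ items: TextContentItem[] }>;
}

interface DocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}

// Type guard: only text items carry a `str` property.
function isTextItem(item: TextContentItem): item is TextItemLike {
  return 'str' in item;
}

// Walks the pages in order (1-based) and joins the text of every text item.
async function extractText(doc: DocumentLike): Promise<string> {
  const pages: string[] = [];
  for (let i = 1; i <= doc.numPages; i++) {
    const page = await doc.getPage(i);
    const content = await page.getTextContent();
    pages.push(content.items.filter(isTextItem).map((item) => item.str).join(''));
  }
  return pages.join('');
}

// Hypothetical in-memory stand-in for a loaded PDF, handy for trying the loop out.
function fakeDocument(pageItems: TextContentItem[][]): DocumentLike {
  return {
    numPages: pageItems.length,
    getPage: async (pageNumber: number) => ({
      getTextContent: async () => ({ items: pageItems[pageNumber - 1] }),
    }),
  };
}
```

Because the function only depends on these structural types, a document returned by `pdfjsLib.getDocument(url).promise` should fit it as well, though that is worth checking against the typings of the pdfjs-dist version you have installed.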

For demonstration, I am using app.component.ts to read the contents of the PDF file:

import { Component, OnInit } from '@angular/core';
import { PdfReaderService } from './pdf-reader.service';

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.css']
})
export class AppComponent implements OnInit {

  constructor(private pdfReader: PdfReaderService) { }

  ngOnInit() {
    this.pdfReader.readPdf('./assets/sample.pdf')
      .then(text => alert('PDF parsed: ' + text), reason => console.error(reason));
  }
}

The example above reads a PDF document called sample.pdf, which I placed under src/assets.

Run the application with ng serve and the file contents appear in an alert dialog.

Umut Esen

I am a software developer and blogger with a passion for the world wide web.


This Post Has 5 Comments

  1. Igor

    Hi! How can I get the file in the form?

    1. Umut Esen

      You would need to save it somewhere accessible via a URL, for example to a backend server. Then pass the URL to pdfjs to read, good luck!
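For a file picked in a form, another option is to read it in the browser: pdfjs-dist's getDocument also accepts raw bytes through a `data` option, so no upload is needed first. Below is a minimal sketch of the reading step only. `readFileBytes` and `looksLikePdf` are hypothetical helper names, and the sketch assumes an environment where `Blob.arrayBuffer()` is available (modern browsers; a `File` from an `<input type="file">` is a `Blob`).

```typescript
// Reads a user-selected file (a File is a Blob) into a byte array.
async function readFileBytes(file: Blob): Promise<Uint8Array> {
  const buffer = await file.arrayBuffer();
  return new Uint8Array(buffer);
}

// Cheap sanity check: PDF files normally start with the ASCII bytes "%PDF-".
function looksLikePdf(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d];
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}
```

The resulting bytes could then be handed to pdf.js with `pdfjsLib.getDocument({ data: bytes }).promise` in place of a URL.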

  2. Ciprian

    Hi,
    I tried to use the code in the article, and ng serve throws an error related to _pdfInfo and getPage, saying that they do not exist on type PDFDocumentLoadingTask.

    Could you give any suggestions on how to fix that?

    Thanks

    1. c

      use pdf._worker.getPage() to fix the issue

  3. Sajan

    Thank you for the detailed tutorial.

    I am facing issue in :
    countPromises.push(textContent.items.map((s) => s.str).join(''));
    ‘s.str’ throws error – “Property ‘str’ does not exist on type ‘TextItem | TextMarkedContent’.
    Property ‘str’ does not exist on type ‘TextMarkedContent’.”

    Any help would be appreciated.