In this post, I’ll show you how to parse a PDF document in Angular.

We will extract the full text content of a PDF file using the pdfjs-dist library.

Generate a new Angular application, if you do not have one already:

```bash
ng new PdfReader
```

Install the latest version of pdfjs-dist from npm:

```bash
npm install pdfjs-dist
```

Next, create a service to hold the PDF parsing logic:

```bash
ng generate service PdfReader
```

Replace the contents of the generated service with the following:

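```typescript
import { Injectable } from '@angular/core';
import * as pdfjsLib from 'pdfjs-dist';

@Injectable({
  providedIn: 'root'
})
export class PdfReaderService {
  constructor() {
    // pdf.js parses documents in a web worker. Load a worker build that matches
    // the installed pdfjs-dist version: v4+ ships pdf.worker.min.mjs,
    // older versions ship pdf.worker.min.js instead.
    pdfjsLib.GlobalWorkerOptions.workerSrc =
      `https://unpkg.com/pdfjs-dist@${pdfjsLib.version}/build/pdf.worker.min.mjs`;
  }

  public async readPdf(pdfUrl: string): Promise<string> {
    // getDocument() returns a loading task; its `promise` resolves to the document.
    const pdf = await pdfjsLib.getDocument(pdfUrl).promise;

    // Request the text of every page (page numbers are 1-based) and collect
    // the promises so the pages are processed concurrently.
    const pagePromises: Promise<string>[] = [];
    for (let i = 1; i <= pdf.numPages; i++) {
      pagePromises.push(
        pdf.getPage(i)
          .then(page => page.getTextContent())
          // Items are TextItem | TextMarkedContent; only TextItem has `str`.
          .then(content => content.items
            .map(item => ('str' in item ? item.str : ''))
            .join(''))
      );
    }

    // Promise.all keeps the results in page order.
    const pageContents = await Promise.all(pagePromises);
    return pageContents.join('\n');
  }
}
```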

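The constructor sets GlobalWorkerOptions.workerSrc, which tells pdf.js where to load its worker script from. pdf.js does the actual parsing in that worker, off the main thread.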
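
pdf.js refuses to run with a worker whose version differs from the library's, so it can help to derive the worker URL from the installed version instead of hard-coding one. A minimal sketch, assuming the worker is fetched from the unpkg CDN; `pdfWorkerUrl` is a hypothetical helper, not part of pdfjs-dist:

```typescript
// Hypothetical helper: builds a CDN URL for the pdf.js worker that matches
// a given pdfjs-dist version (assumption: the package is served by unpkg).
export function pdfWorkerUrl(version: string): string {
  const major = Number.parseInt(version.split('.')[0], 10);
  // pdfjs-dist 4.x and later ship the worker as an ES module (.mjs);
  // earlier releases ship a classic script (.js).
  const file = major >= 4 ? 'pdf.worker.min.mjs' : 'pdf.worker.min.js';
  return `https://unpkg.com/pdfjs-dist@${version}/build/${file}`;
}
```

In the service constructor this would be used as `pdfjsLib.GlobalWorkerOptions.workerSrc = pdfWorkerUrl(pdfjsLib.version);`.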

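Given a URL pointing to a PDF document, the readPdf method loads the file, reads the text content of every page with getTextContent, and returns the text of all pages combined into a single string.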
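
The page loop can be pulled apart into plain functions that are easier to test in isolation. This is an illustration, not the pdf.js API: `TextItemLike`, `TextMarkedContentLike`, `pageText` and `readAllPages` are hypothetical names, and the two interfaces only mirror the fields of pdf.js's `TextItem` and `TextMarkedContent` that are used here. The `'str' in item` check is what narrows the union so that `item.str` type-checks.

```typescript
// Minimal local stand-ins for the item shapes getTextContent() can return
// (assumption: only the fields used here; the real pdf.js types carry more).
export interface TextItemLike {
  str: string;
  hasEOL?: boolean;
}
export interface TextMarkedContentLike {
  type: string;
  id?: string;
}
export type ContentItem = TextItemLike | TextMarkedContentLike;

// Type guard: only text items carry `str`, so `s.str` does not compile
// on the union until it has been narrowed.
function isTextItem(item: ContentItem): item is TextItemLike {
  return 'str' in item;
}

// Joins the text items of one page, turning end-of-line flags into newlines.
export function pageText(items: ContentItem[]): string {
  return items
    .filter(isTextItem)
    .map(item => item.str + (item.hasEOL ? '\n' : ''))
    .join('');
}

// Requests every page before awaiting any of them; Promise.all then returns
// the results in page order, whichever page finishes first.
// `loadItems` is a hypothetical callback standing in for the pdf.js calls.
export async function readAllPages(
  numPages: number,
  loadItems: (pageNumber: number) => Promise<ContentItem[]>,
): Promise<string[]> {
  const pending: Promise<string>[] = [];
  for (let n = 1; n <= numPages; n++) { // pdf.js page numbers are 1-based
    pending.push(loadItems(n).then(pageText));
  }
  return Promise.all(pending);
}
```

With a loaded document, `loadItems` could be `n => pdf.getPage(n).then(p => p.getTextContent()).then(c => c.items)`. Awaiting each page inside the loop would also work, but it waits for one page to finish before requesting the next.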

For demonstration, I use app.component.ts to read the contents of a PDF file:

```typescript
import { Component, OnInit } from '@angular/core';
import { PdfReaderService } from './pdf-reader.service';

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.css']
})
export class AppComponent implements OnInit {
  constructor(private pdfReader: PdfReaderService) { }

  ngOnInit() {
    this.pdfReader.readPdf('./assets/sample.pdf')
      .then(text => alert('PDF parsed: ' + text),
            reason => console.error(reason));
  }
}
```

In the example above, sample.pdf is a PDF document placed under src/assets.
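
If the PDF comes from an `<input type="file">` rather than from the assets folder, pdf.js can also load it from raw bytes via `getDocument({ data })` instead of a URL. A minimal sketch, assuming the selected `File` is passed in directly; `BinarySource`, `looksLikePdf` and `toPdfData` are hypothetical names, and the `%PDF-` header check is an optional sanity test rather than something pdf.js requires:

```typescript
// Structural type for anything exposing arrayBuffer(), such as a browser
// File or Blob (assumption: this avoids depending on DOM typings).
export interface BinarySource {
  arrayBuffer(): Promise<ArrayBuffer>;
}

// A well-formed PDF file starts with the ASCII header "%PDF-".
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

export function looksLikePdf(bytes: Uint8Array): boolean {
  return bytes.length >= PDF_MAGIC.length && PDF_MAGIC.every((b, i) => bytes[i] === b);
}

// Reads the selected file into memory so it can be handed to pdf.js as data,
// e.g. pdfjsLib.getDocument({ data: bytes }).promise, instead of a URL.
export async function toPdfData(source: BinarySource): Promise<Uint8Array> {
  const bytes = new Uint8Array(await source.arrayBuffer());
  if (!looksLikePdf(bytes)) {
    throw new Error('Selected file does not look like a PDF');
  }
  return bytes;
}
```

In a component, the `File` would typically come from `(event.target as HTMLInputElement).files?.[0]` in a `change` handler.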

Run the application with ng serve, and the text of the PDF appears in an alert dialog.

Umut Esen

I am a software developer and blogger with a passion for the world wide web.


This Post Has 4 Comments

  1. Igor

    Hi! How can I get the file in the form?

    1. Umut Esen

      You would need to save it somewhere accessible via a URL, for example to a backend server. Then pass the URL to pdfjs to read, good luck!

  2. Ciprian

    Hi,
    I tried to use the code in the article, and ng serve throws an error related to _pdfInfo and getPage, saying that they do not exist on type PDFDocumentLoadingTask.

    Could you give any suggestions on how to fix that?

    Thanks

  3. Sajan

    Thank you for the detailed tutorial.

    I am facing an issue in:
    countPromises.push(textContent.items.map((s) => s.str).join(''));
    ‘s.str’ throws error – “Property ‘str’ does not exist on type ‘TextItem | TextMarkedContent’.
    Property ‘str’ does not exist on type ‘TextMarkedContent’.”

    Any help would be appreciated.