Parse PDF Documents in Angular with PDFjs

Generate a new Angular application, if you do not have one already:

ng new PdfReader

Download pdfjs-dist from npm:

npm install pdfjs-dist
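
One thing to watch after installing: pdf.js does its parsing in a separate worker script, and that script has to come from the same release as the installed pdfjs-dist package, otherwise loading fails with a version mismatch error. Below is a minimal sketch of deriving a worker URL from a version string. The function name workerUrlFor and the cdnjs path layout are assumptions for illustration, not part of the original setup. In the app, the version string would typically come from pdfjsLib.version.

```typescript
// Hypothetical helper: builds a worker URL for a given pdfjs-dist version.
// The cdnjs path layout below is an assumption; verify it for your version
// (newer releases may ship the worker under a different file name).
function workerUrlFor(version: string): string {
  // Accept only plain "major.minor.patch" strings to avoid malformed URLs.
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    throw new Error(`Unexpected pdfjs-dist version: ${version}`);
  }
  return `//cdnjs.cloudflare.com/ajax/libs/pdf.js/${version}/pdf.worker.min.js`;
}
```

The result could then be assigned to GlobalWorkerOptions.workerSrc in place of a fixed URL.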

Create a service with the Angular CLI (it is generated under src/app):

ng generate service PdfReader

Replace the contents of the generated service with the following:

import { Injectable } from '@angular/core';
import * as pdfjsLib from 'pdfjs-dist';

@Injectable({
  providedIn: 'root'
})
export class PdfReaderService {

  constructor() {
    // The worker script must come from the same pdf.js release as the installed pdfjs-dist.
    pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';
  }

  public async readPdf(pdfUrl: string): Promise<string> {
    // getDocument returns a loading task; its promise resolves to the document.
    const pdf = await pdfjsLib.getDocument(pdfUrl).promise;
    const pageTexts: string[] = []; // extracted text, one entry per page

    // Pages are numbered from 1; numPages is the public page count.
    for (let i = 1; i <= pdf.numPages; i++) {
      const page = await pdf.getPage(i);
      const textContent = await page.getTextContent();
      // Each text item carries its characters in the str property.
      pageTexts.push(textContent.items.map((s: any) => s.str).join(''));
    }

    return pageTexts.join('');
  }
}
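
Note that joining with an empty string runs words together whenever the PDF stores them as separate text items. Here is a minimal sketch of a more forgiving join, assuming each item exposes str and, in newer pdf.js releases, an optional hasEOL flag. TextItemLike and itemsToText are hypothetical names, not part of pdfjs-dist:

```typescript
// Minimal structural type for the items returned by page.getTextContent().
// hasEOL is assumed to be present only in newer pdf.js releases.
interface TextItemLike {
  str: string;
  hasEOL?: boolean;
}

// Hypothetical helper: joins text items with spaces and keeps line breaks.
function itemsToText(items: TextItemLike[]): string {
  let text = '';
  for (const item of items) {
    text += item.str;
    // End the line where pdf.js reports one, otherwise separate with a space.
    text += item.hasEOL ? '\n' : ' ';
  }
  // Collapse runs of spaces, tidy spaces around line breaks, trim the ends.
  return text.replace(/ +/g, ' ').replace(/ ?\n ?/g, '\n').trim();
}
```

In readPdf, a call to itemsToText(textContent.items) would take the place of the map(...).join('') expression.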

Because the service is registered with providedIn: 'root', it can be injected into any component or service in the application.
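
As a sketch of what that looks like from another file, the snippet below counts the words in a parsed PDF. It depends only on an object with a readPdf method, so an injected PdfReaderService fits. PdfTextSource, countWords and countWordsInPdf are hypothetical names introduced for illustration:

```typescript
// Anything with a readPdf method, such as the PdfReaderService above.
interface PdfTextSource {
  readPdf(pdfUrl: string): Promise<string>;
}

// Counts whitespace-separated words in a block of text.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Reads the PDF through the given source and counts its words.
async function countWordsInPdf(source: PdfTextSource, pdfUrl: string): Promise<number> {
  const text = await source.readPdf(pdfUrl);
  return countWords(text);
}
```

Keeping the counting logic in a plain function means it can be checked without loading a real PDF.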

For demonstration, I am using app.component.ts to read the contents of a PDF file:

import { Component, OnInit } from '@angular/core';
import { PdfReaderService } from './pdf-reader.service';

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.css']
})
export class AppComponent implements OnInit {

  constructor(private pdfReader: PdfReaderService) { }

  ngOnInit() {
    this.pdfReader.readPdf('./assets/sample.pdf')
      .then(text => alert('PDF parsed: ' + text), reason => console.error(reason));
  }
}

The example above assumes a PDF document called sample.pdf under src/assets.

Run the application with ng serve, and the file contents appear in an alert dialog.

Umut Esen

Umut is a Microsoft Certified Solutions Developer and has an MSc in Computer Science. He is currently working as a senior software developer for Royal London. He is the primary author and the founder of onthecode.