These days, most patent researchers expect patent information providers to provide access to PDF patent documents.

Doing anything less seems laggardly. For many years now, patent PDFs have been downloadable for free from services like Google Patents and Espacenet. The opening up of Adobe PDF as an ISO standard has also made it easier and cheaper for individuals to create their own searchable, bookmarked PDF documents.

Does the PDF Collection in the TotalPatent® Patent Research Tool Offer Unique Value?

According to TotalPatent product literature, its PDF patent collection is distinguished by its sheer size (60 million+), small file sizes (compressed PDFs), searchable full text (most documents) and built-in bookmarking (tables of contents).

These characteristics are meant to let TotalPatent users save time and effort sourcing patent documents while saving hard drive and server space. Several things contribute to the time savings: going directly to specific sections using the built-in bookmarks, searching and highlighting text within the PDF files, copying and pasting text into other documents and applications, and working directly with the PDF page images without having to skip between extracted full-text files and PDF images.

Now, depending upon your role, you may work extensively with the patent documents yourself, or you may be providing PDF copies to your downstream clients as a service.

So do you know what you are providing? Have you examined your PDF files to see if they help your users, or frustrate them?

As part of my mission to get to know TotalPatent a little better, I decided to take a closer look at its PDF collection, given that it is highlighted as a key feature of the product.

My mini-test was highly informal and fairly limited, but it may be a useful test to do when evaluating a patent research platform. Not all PDF files are created equal, and it helps to understand what you are supplying to your end users and clients.

This, in outline, was my process:

  1. Create a small search result set in a topic area of interest.
  2. Choose a list of patent documents to download. I made sure to include a range of authorities, languages and publication times.
  3. Select a few patent information providers to compare. I recommend trying both free and fee-based services.
  4. Search for the same patent documents on each service and download the patent PDFs to a folder specific to the document supplier.
  5. Look at each set in turn. Examine each PDF file. Here are things to check:
  • File size
  • The ability to highlight the embedded text: Can you do this?
  • How consistently can you highlight all of the text?
  • Searchability: Can you find words in the text using the “Find” function in your PDF reader?
  • Look at the properties of the PDF file.

The file properties can tell you who produced the PDF file, the version of the PDF standard used and even the name and version of the software used to produce it.
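
For anyone who wants to repeat these checks across a folder of downloads, the sketch below shows one rough way to script them in Python using only the standard library. It is not the method I used (I looked at each file by hand in a PDF reader), and the function name `inspect_pdf` is a hypothetical of my own. It is heuristic: it scans the raw bytes for the version header, a plain-text `/Producer` entry, an `/Outlines` entry (bookmarks) and any `/Font` entry (a hint that a text layer may be present). Newer PDFs often compress these structures, so a missing value means "not found in the raw bytes", not "absent from the file".

```python
import re
from pathlib import Path


def inspect_pdf(path):
    """Return rough, heuristic facts about one PDF file (hypothetical helper)."""
    data = Path(path).read_bytes()

    # A PDF starts with a header such as "%PDF-1.4"; look near the start of the file.
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])

    # Only matches a /Producer entry stored as a plain literal string.
    # Hex-encoded, UTF-16 or compressed metadata will not be found.
    producer = re.search(rb"/Producer\s*\(([^)]*)\)", data)

    return {
        "size_kb": round(len(data) / 1024, 1),
        "pdf_version": header.group(1).decode("ascii") if header else None,
        "producer": producer.group(1).decode("latin-1") if producer else None,
        # An /Outlines entry suggests the file carries bookmarks.
        "has_bookmarks": b"/Outlines" in data,
        # Image-only scans usually have no fonts; a text layer needs them.
        "has_fonts": b"/Font" in data,
    }
```

Calling `inspect_pdf` on each downloaded file and printing the result gives a quick first pass; anything surprising is still worth opening in a PDF reader to confirm by hand.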

What I Found

My own mini-test was tiny and extremely limited, but I found it useful.

Availability of Patent Documents: Three of the four services I tried had all the documents available as PDFs.

I expected this result. I had a very small sample size (fewer than 20 documents), and had only retrieved relatively recent documents from major patent authorities.

The other provider has only recently begun offering PDFs for a wider range of authorities, and the platform is free and not core to the provider’s business. I was surprised by a couple of bad links, but otherwise I did not truly expect much.

Bookmarking of PDFs: Every downloaded PDF I looked at had chapter heading bookmarks for the major sections of the patent document (e.g., specification, claims). I was pleasantly surprised.

Full-Text Searchability of PDF Files: All the Roman-script PDF files I looked at from TotalPatent were searchable. The one file that did not contain embedded full text was a Chinese-language patent document.

I did not expect to find searchable full-text PDFs from other providers, but I did find a few. This was not a consistent result (about one-third to one-half of the files from one provider), but it may occur if a provider has sourced patent PDFs from multiple document suppliers.

PDF File Size: TotalPatent PDF files are compressed in order to speed download times and reduce storage requirements.

Interestingly, in my tiny sample, the TotalPatent U.S. patent documents were generally small (about 200 to 800 KB), but not always the smallest documents available.

The most consistent difference was with WO and EP PDF files. The TotalPatent files I examined were, on average, about half the size of the same files retrieved from other sources, even with the inclusion of searchable text.
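
If you have saved each provider's downloads in its own folder, as in step 4 of the process, a size comparison like this one can be roughed out with a few lines of Python. This is only a sketch under that folder-layout assumption, and the function name `average_pdf_size_kb` is hypothetical. It reports the mean file size per provider folder and says nothing about whether the files contain searchable text.

```python
from pathlib import Path
from statistics import mean


def average_pdf_size_kb(root):
    """Map each provider subfolder under root to the mean size of its PDFs, in KB."""
    averages = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        # Assumes downloads were saved with a lowercase ".pdf" extension.
        sizes = [p.stat().st_size / 1024 for p in folder.glob("*.pdf")]
        if sizes:
            averages[folder.name] = round(mean(sizes), 1)
    return averages
```

With a sample as small as mine, a single unusually large or small document can shift the average noticeably, so it is worth looking at the individual file sizes as well.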

Overall: LexisNexis TotalPatent does seem to be delivering an extensive collection of consistently searchable bookmarked PDFs with relatively small file sizes.

Of course, more extensive, systematic testing using a wider range of dates and patent authorities would be more definitive, but if I were evaluating a product for my organization, running a few mini-tests like this would probably give me confidence about my recommendations.

Consistent, Built-In Searchability Simplifies Workflow for Patent Researchers and Attorneys

While it is possible to use readily available software packages to generate your own searchable, compressed PDF files, the process is still time-consuming. Being able to offer clients access to bookmarked, searchable PDF files without extra effort is a huge plus, and the searchability of the PDF files should translate into time savings for everyone who works with patent documents.

Consistency and Quality Control Inspire Confidence

Ultimately, what really impressed me about the TotalPatent PDF collection was the consistency and quality of the results, even with a very tiny sample size.

I retrieved what I expected, and when I examined the files in greater detail, I saw a consistent level of uniformity and document quality. I felt that, if I used the TotalPatent repositories, I could assure clients that they would have access to output with consistent characteristics and features. Being able to assure them that the files would be consistently searchable was another huge plus.