Machine Learning for Patents

April 7, 2022


Episode Highlights

Chemistry was the only subject in school that I ever got a C in. Specifically, organic chemistry. But I found it fascinating, and I found it a challenge. I actually synthesized the first clinical batch of the drug that went on to become Viracept.

All patent people have some sort of technical background, and mine just happened to be chemistry. Chemistry is a central science. I went from the lab to working with chemical information, then with patent information, patent analytics and patent intelligence.

Then, in just the past few years, with the advent of machine learning and its real surge in patent-related areas, I started to work along those lines as well.

The history behind ML4Patents 

There seemed to be a fair amount of misinformation.

There were some vendors going around saying, “Well, look, you don’t need to read patents anymore. All of this searching that you’ve been doing, all this manual review, basically what amounts to 50 years of patent-related practice, you can just throw out the window. It’s old school, it’s not the way things ought to be done anymore. [And] you should be using these machine learning based methods.”

On the flip side, you also had a number of, for lack of a better term, old-school Boolean searchers who were quite adamant: you don’t know what’s going on in the black box, you can’t trust the system, you can’t evaluate the results.

The reality is somewhere in the middle. I thought that it was about time that there were resources available that took a middle ground, or at least provided an unbiased view of what was going on. 

The impact of Machine Learning on patent analytics

There are activities that used to take a month that can now be completed in a couple of days. There are activities that would once have been impossible for anyone without advanced training and access to very expensive databases, and that can now be done in a few hours.

All of that has been driven forward by the advent of these machine learning algorithms and technologies. 

The patent corpus is challenging. It’s different from ordinary written text, and it’s different from other types of documents.

But now enough time has passed, and enough organizations have gotten involved, that you’re seeing real headway, real progress, in applying what’s been learned in those other areas to the patent world.

Measuring the accuracy of Machine Learning

One of the big things that people still talk about is trust. People say, “I couldn’t possibly use a machine learning tool, because everybody says you can’t trust it, that there isn’t any transparency, that it’s a black box.” But what we attempted to do was create a gold standard collection in a couple of different technologies.

That allows you to make meaningful comparisons and do meaningful evaluations. It demonstrates to people that this is for real, and that they should be investing more time and effort into getting involved.

Cipher is hugely grateful for the collaboration with Tony. He took on the very difficult job of reading 1,500 patents relating to qubits or cannabinoids and sorting them into piles, so that we could run our algorithm against an independent test set.
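To make the idea of scoring an algorithm against a hand-built gold standard more concrete, here is a minimal Python sketch. It assumes the gold standard for one technology is simply the set of patent IDs a human reviewer placed in that pile, and compares it with the set a classifier returned using precision, recall and F1. These metrics are a common choice for this kind of evaluation, not necessarily the ones Cipher or ML4Patents used, and the IDs are hypothetical placeholders rather than real data.

```python
# Minimal sketch: score a classifier's picks against a hand-labelled gold standard.
# Assumption: both the gold pile and the classifier output are sets of patent IDs.
# The IDs used in the example below are hypothetical placeholders, not real data.


def evaluate(predicted: set[str], gold: set[str]) -> dict[str, float]:
    """Return precision, recall and F1 of `predicted` relative to `gold`."""
    # Documents the classifier picked that the human reviewer also picked.
    true_positives = len(predicted & gold)
    # Precision: share of the classifier's picks that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the gold pile the classifier managed to find.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall (0.0 when both are zero).
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical gold-standard pile for one technology.
    gold_pile = {"US-A", "US-B", "US-C", "US-D"}
    # Hypothetical set of documents the classifier placed in that pile.
    classifier_pile = {"US-A", "US-B", "US-E"}
    print(evaluate(classifier_pile, gold_pile))
```

Because the gold standard is built independently of the algorithm, the same piles can be reused to compare different tools, or different versions of one tool, on equal terms.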

Adopters of Machine Learning

Automation is coming to everybody’s jobs, and you can either look at that fearfully, or you can embrace it and get excited about the efficiencies it creates and the opportunity it gives you to do more value-added work.

That’s another underlying idea behind ML4Patents. The people in this industry have more to contribute than just searching, or creating these buckets and putting documents into piles.

There’s so much more they can provide if they can get those really tedious, manual, time-consuming tasks off their plate.

Now we’re going through another one of those changes: instead of relying on a process that would take four weeks to put together valuable business insight, people can now do it in a much shorter time.

It’s also opening up additional avenues for insight because more visualization types are available, more types of analysis results are available. Then you apply that to decision making and now you start feeling much, much more confident about the direction that you’re about to take your organization because you’re coming at it from really great data, really great analysis, and lots of great insight.

What the future holds

Within the next five years, patent searching with machine learning-based tools is going to feel the same way Boolean searching felt to the people who used to search with punch cards or printed indexes.

We’re really getting there. Tools like Cipher are bringing us to the point where people have these resources available, can access them quickly, get valuable insight quickly, and then use it for more and more of their decision making.

Message from the CEO

There’s so much scientific knowledge locked within patents that you have to ask why so few people have access to what is often described, as we’ve discussed today, as the largest library of scientific information in the world.

The conundrum is all the more puzzling when you realize that there is a whole profession of analysts, like Tony, trained to extract the insight hidden in plain sight. I think the answer lies in the reality that analyzing patents has, until recently, been a specialist sport.

A range of AI and machine learning technologies has now opened up the data to those who need it and can benefit from it, which, Tony would say, is everyone. This is the time, as we say at Cipher, to unleash the strategic value of patents as a treasure chest of information available to all of us.

Looking for new ways to classify patent data and create bespoke technology clusters?

Find out how you can use the LexisNexis Cipher Classification system to read 44m+ patents globally, pull the relevant ones into a classifier you define, and avoid the hard work of finding the right patents yourself.
