Every company today wants to be “data-driven”. But when it comes to patents, the lifeblood of innovation strategy, many executives underestimate a fundamental truth: analytics are only as good as the data beneath them. In patent data, even a single misspelling, mistranslation, or outdated ownership record can send millions of dollars in R&D or licensing decisions down the wrong path.
Patent data is notoriously messy. It spans decades, languages, jurisdictions, subsidiaries, acquisitions, and legal nuances. Without rigorous data hygiene, “insights” from patent analytics can mislead more than they guide. In our internal survey, companies reported that switching to a harmonized patent database saved them more than two working years annually on applicant data cleaning in their patent benchmarking work.
“Our company does regular benchmarking against competitors. By using PatentSight+, we save more than two man-years per year just for applicant data cleaning.”
A multinational health technology company
Yet despite this reality, too many organizations chase advanced AI dashboards or buzzworthy analytics before addressing the basics: clean, reliable, and verified patent data.
Patent data isn’t like financial reporting, where regulators enforce consistent standards. Instead, patent offices worldwide collect filings in varying formats, languages, and levels of quality. Corporate mergers, divestitures, and name changes are not always reflected in patent filings, which further complicates the landscape.
Consider Siemens: in our database, more than 4,000 legal entities are connected to the company’s corporate tree, more than 100 subsidiaries have been acquired, and 90 have been sold. Mapping those entities into a single, coherent picture of “Siemens innovation” took our harmonization team over 100 hours of manual patent data cleanup, for just one company.
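To make the corporate-tree problem more concrete, here is a minimal Python sketch of an ultimate-owner roll-up. It assumes a hypothetical table of parent-subsidiary links, each valid for a date range, so that acquisitions and divestitures change who owns an entity over time. The entity names and dates are invented for illustration; they do not describe Siemens or any real company, and this is not a description of how PatentSight+ works internally.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OwnershipLink:
    """A period during which `entity` was a direct subsidiary of `parent`."""
    entity: str
    parent: str
    start: date
    end: Optional[date] = None  # None means the link is still active


# Hypothetical, invented corporate tree (not real company data).
LINKS = [
    OwnershipLink("Acme Sensors GmbH", "Acme Holding AG", date(2015, 3, 1)),
    OwnershipLink("Acme Sensors Inc", "Acme Sensors GmbH", date(2016, 5, 1)),
    # Acquired in 2010, divested mid-2019, then owned by another group.
    OwnershipLink("Acme Energy Ltd", "Acme Holding AG", date(2010, 1, 1), date(2019, 6, 30)),
    OwnershipLink("Acme Energy Ltd", "Volt Group plc", date(2019, 7, 1)),
]


def parent_on(entity: str, when: date) -> Optional[str]:
    """Return the direct parent of `entity` on date `when`, or None if it had none."""
    for link in LINKS:
        active = link.start <= when and (link.end is None or when <= link.end)
        if link.entity == entity and active:
            return link.parent
    return None


def ultimate_owner(entity: str, when: date) -> str:
    """Follow parent links upward until reaching an entity with no parent on `when`."""
    visited = set()
    current = entity
    while (parent := parent_on(current, when)) is not None:
        if current in visited:
            raise ValueError(f"Cycle in corporate tree at {current!r}")
        visited.add(current)
        current = parent
    return current


def portfolio_by_owner(patents: list[tuple[str, str]], when: date) -> dict[str, int]:
    """Count patents per ultimate owner; `patents` holds (patent_id, applicant) pairs."""
    counts: dict[str, int] = {}
    for _patent_id, applicant in patents:
        owner = ultimate_owner(applicant, when)
        counts[owner] = counts.get(owner, 0) + 1
    return counts
```

In this toy data, Acme Energy Ltd rolls up to Acme Holding AG on 1 January 2018 but to Volt Group plc on 1 January 2020, so the same set of patents yields different owner counts depending on the reference date. Whether to attribute patents by filing date or by current ownership is a choice an analysis should make explicitly.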
Mistakes can also be deeply misleading. For example, patents belonging to Japan’s ROHM Co., Ltd., a semiconductor manufacturer, were at times misattributed to Germany’s Röhm GmbH, a chemical manufacturer. Fixing this misattribution required human oversight. Any analysis built on the uncorrected data would have suggested that Germany’s Röhm was also developing semiconductor technologies. Similar mix-ups are common when organizations rely on raw, unverified datasets.
Figure: Portfolio visualization of Germany’s Röhm GmbH before and after removing the semiconductor patents owned by Japan’s ROHM Co., Ltd. that had been incorrectly assigned to it.
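A Röhm/ROHM-style mix-up is easy to reproduce. The Python sketch below shows, under simplified assumptions, how a naive name-matching key (accents stripped, case folded, legal-form suffixes such as “GmbH” and “Co., Ltd.” dropped) collapses both companies onto the same key, and how one extra signal, here a hypothetical applicant-country field, can at least flag the collision for a human reviewer. The suffix list and the country field are illustrative assumptions, not a production matching rule.

```python
import unicodedata
from collections import defaultdict

# Illustrative, deliberately short list of legal-form tokens to ignore.
LEGAL_FORMS = {"gmbh", "ag", "co", "ltd", "inc", "plc"}


def naive_key(name: str) -> str:
    """Naive matching key: strip accents, fold case, drop punctuation and legal forms."""
    # NFKD splits "ö" into "o" plus a combining diaeresis, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = no_accents.casefold().replace(",", " ").replace(".", " ")
    return " ".join(tok for tok in cleaned.split() if tok not in LEGAL_FORMS)


def flag_collisions(applicants: list[tuple[str, str]]) -> dict[str, set[tuple[str, str]]]:
    """Group (name, country) records by naive key and return the groups that span
    more than one country, i.e. candidates for human review."""
    groups: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for name, country in applicants:
        groups[naive_key(name)].add((name, country))
    return {
        key: members
        for key, members in groups.items()
        if len({country for _, country in members}) > 1
    }


applicants = [
    ("Röhm GmbH", "DE"),
    ("ROHM Co., Ltd.", "JP"),
    ("Siemens AG", "DE"),
]
print(naive_key("Röhm GmbH"), naive_key("ROHM Co., Ltd."))  # rohm rohm
print(flag_collisions(applicants))
```

The sketch only flags rather than auto-splits, because a country mismatch is not proof of two different companies: a multinational can legitimately file from subsidiaries in many countries. Resolving such cases is where human verification comes in.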
External experts echo this challenge. Analysts covering 5G patent ownership found that up to 40% of patents declared to standards bodies couldn’t be directly matched to clean records without extensive processing, and another 20% resulted in false positives. These errors can distort rankings, skew licensing valuations, and even mislead courts in billion-dollar disputes.
The consequences of unreliable patent data extend far beyond spreadsheets.
As William Mansfield, Head of Data Strategy at LexisNexis® Intellectual Property Solutions, notes in the Innovation Momentum 2025 report, “Reliable patent data and analytics are no longer optional ‘nice-to-haves’ but essential for informed business decisions in today’s competitive innovation landscape.”
Cleaning patent data is not a one-click solution. It is labor-intensive, requires multilingual capabilities, and demands constant monitoring of mergers, reassignments, and corporate histories. LexisNexis employs a dedicated harmonization team across multiple countries and languages to manually verify over 35 million patent families, update more than 2,000 ultimate owners annually, and track up to 2,000 M&A deals every year.
This effort transforms patent analytics from guesswork into a strategic asset. Companies relying on our harmonized data can confidently:
C-level executives often gravitate to eye-catching analytics dashboards. But when it comes to patent intelligence, the unglamorous work of data hygiene determines whether those insights are trustworthy.
Patent data is messy by nature, but it doesn’t have to stay that way. By investing in machine learning algorithms and trained experts that perform systematic cleaning, harmonization, and validation, we help companies turn their patent analytics from risky speculation into a competitive edge. As the pace of innovation accelerates, leaders must resist the temptation of “fancy” analytics built on shaky foundations and instead prioritize the one thing that ensures every downstream decision is sound: clean patent data.
The Innovation Momentum 2025: The Global Top 100 report celebrates the top 100 companies that are breaking boundaries and setting new benchmarks in technology and industry through visionary advancements.