Welcome to the first in our series of blog posts, titled “Conversations With the Team That Built Protégé in PatentSight+,” based on our talks with the team behind LexisNexis® PatentSight+™ with Protégé™ about developing an agentic AI system for patent analysis. The purpose of these articles is to take you on a journey with us, a team of AI engineers, data scientists, product managers, and patent analysts, as we build and improve this agentic system. This series will explore what it takes to develop a trustworthy, agentic AI in IP analytics that delivers actionable intelligence in response to natural-language questions.
Across the series, the focus is on our approach to solving the challenges we see in AI systems for patent analytics: combining trusted patent data, structured analytics, and clear product design to deliver fast, decision-ready patent insights that drive strategic business decisions.
We believe that the value of an agentic AI system for patent analytics is directly tied to users’ trust in the results, because the insights from such analyses inform various business decisions, such as M&A deals, licensing negotiations, and expansion plans. However, trust is gained only when the system is dependable and consistently produces reliable analyses.
Studies show that most people are wary of trusting AI, citing issues such as misinformation and inconsistent results, and that non-expert users are more willing to accept AI advice than experts. In other words, you, as an expert in the patents of your technology field, are highly likely to question the responses that the average AI tool delivers. Among IP teams, this reluctance to trust AI in IP analytics is even more pronounced because, in most cases, the insights derived from patent analytics form the foundation for multi-million-dollar decisions that can significantly impact the business.
So how can we increase the trustworthiness of AI in patent analytics? Our approach begins by ensuring that the AI system’s results align with patent analysts’ expectations.
Jakub Hudak, Machine Learning Manager working on our AI solution, PatentSight+™ with Protégé™, puts it like this: “There are three layers of trust: trust that the tool will deliver its promise of basic functionality, trust that the interface filters right queries and visualizations, and trust that the tool delivers valuable, decision-ready insights.”
A key component of patent analysis using tools like LexisNexis® PatentSight+™ is defining a query using search syntax. This can range from simple queries, such as retrieving all data for an owner (owner=(RELX)), to more complex queries that capture a technology space using IPC/CPC classifications and full-text keywords.
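To make the range from simple to complex queries concrete, here is a minimal sketch of composing such a query string in Python. The `owner=(RELX)` form comes from the example above; the `cpc` and `text` field names are hypothetical placeholders, not PatentSight+'s actual syntax.

```python
# Illustrative sketch of composing a search query string.
# owner=(RELX) follows the example in the text; the cpc and text
# field names are hypothetical, not the product's real syntax.

def owner(name: str) -> str:
    return f"owner=({name})"

def cpc(code: str) -> str:
    return f"cpc=({code})"       # hypothetical field name

def text(keywords: str) -> str:
    return f"text=({keywords})"  # hypothetical field name

def AND(*clauses: str) -> str:
    return "(" + " AND ".join(clauses) + ")"

def OR(*clauses: str) -> str:
    return "(" + " OR ".join(clauses) + ")"

# Simple: all data for one owner
simple = owner("RELX")

# Complex: a technology space defined by classifications plus keywords
tech_space = AND(OR(cpc("C12N15/11"), cpc("C12N9/22")),
                 text("CRISPR OR gene editing"))
print(simple)
print(tech_space)
```

The point is less the helper functions than the shape of the output: a nested, bracketed Boolean expression that a downstream validator can check mechanically.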
Figure: When prompted to identify the top owners in CRISPR and gene-editing technology, Protégé begins by generating a search syntax to define the analysis’s scope. You can validate the correctness of the search query created by Protégé by reviewing the syntax.
Patent analysts know that defining the scope of patent data at the right level of detail is critical, and the process involves carefully calibrating the precision and recall of the search query to ensure the resulting dataset is neither too noisy nor missing any relevant owners, patents, or jurisdictions.
Good searches come from testing and refining the syntax. Without a good search syntax, it is impossible to generate an answer that you can trust and use to discover meaningful insights. Protégé’s agentic AI system iteratively refines queries, evaluates the resulting datasets, and reasons over them until it identifies the right patent scope. It then shares the search syntax behind the analysis, so you can confirm that it matches the scope you intended in your original prompt.
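The refine-evaluate loop described above can be sketched in a few lines. Everything here is a hypothetical stand-in for the agent's real tools: the "search" just counts results and the "refinement" appends a narrowing clause, to show the control flow rather than the actual system.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    acceptable: bool
    feedback: str = ""

# Hypothetical stand-ins for the agent's real tools.
def run_search(query: str) -> int:
    return 1_000_000 if "AND" not in query else 5_000

def evaluate_scope(result_count: int) -> Verdict:
    if result_count > 100_000:
        return Verdict(False, "too noisy, narrow the scope")
    return Verdict(True)

def refine(query: str, feedback: str) -> str:
    return f"({query}) AND text=(gene editing)"  # hypothetical narrowing step

def iterative_scoping(initial_query: str, max_rounds: int = 5) -> str:
    """Refine the query until the resulting dataset passes evaluation."""
    query = initial_query
    for _ in range(max_rounds):
        verdict = evaluate_scope(run_search(query))
        if verdict.acceptable:
            break
        query = refine(query, verdict.feedback)
    return query  # the final syntax is shown to the user for validation

print(iterative_scoping("text=(CRISPR)"))
```

Returning the final query, rather than just the answer, is what lets you confirm the scope matches your original prompt.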
Search syntax has rules. You must close every open bracket, and you can use only a limited set of valid Boolean operators. That poses an interesting challenge. How can we ensure the system follows all these rules and does not waste your time by generating typos and syntax errors? Our solution is to constrain the model’s output. We define the shape of a valid search query so that, when the AI model generates a new query, it cannot deviate from this definition. As Jakub notes, “The search syntax simply has to conform to certain rules, and if the syntax is invalid, it is not executable.”
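Two of the rules mentioned above, closed brackets and a limited operator set, can be checked mechanically. The sketch below is illustrative only, assuming a whitelist of `AND`/`OR`/`NOT` and a `field=(value)` clause shape; it is not PatentSight+'s actual validator.

```python
import re

VALID_OPERATORS = {"AND", "OR", "NOT"}   # assumed operator set

def is_valid_syntax(query: str) -> bool:
    # Rule 1: every open bracket must be closed, in order.
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:            # closing bracket with no open partner
                return False
    if depth != 0:                   # unclosed bracket
        return False
    # Rule 2: outside field=(...) clauses, only whitelisted operators may appear.
    remainder = re.sub(r"\w+=\([^()]*\)", " ", query)  # drop field clauses
    remainder = remainder.replace("(", " ").replace(")", " ")
    return all(tok in VALID_OPERATORS for tok in remainder.split())
```

Checks like these run after generation; constrained decoding, discussed below in the series' own terms, prevents invalid output from being produced in the first place.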
Jakub Hudak, Machine Learning Manager, LexisNexis Intellectual Property Solutions
Tech Note: This approach of restricting the system is sometimes called grammar-constrained decoding. When a language model produces text token by token, grammar-constrained decoding masks, at each step, any token that would violate a predefined schema, so the sampled output always conforms to it. Defining syntax rules as a verifiable schema is not easy, but for us, the improvement in user experience and the confidence that the system will not break are worth it. It also means the AI can focus on what matters, the patent analysis at hand, instead of worrying about the details of the search syntax grammar.
Most of the actions in our agentic system focus on processing your question, conducting the necessary research and analysis, and presenting and interpreting the results. Visualizations are the exception. In our experience, asking the system to generate a key part of your narrative is too risky, because a hallucination at that stage would directly undermine trust in the response and make it harder to confidently share findings with stakeholders.
We have also learned through user meetings, thought leadership events, training sessions, and webinars that PatentSight+ visualizations play a major role in reporting analysis to non-IP audiences. Because those visualizations are already well-trusted, we chose not to interfere with what is already working well. Instead, Protégé determines the visualization type and the patent data needed to answer a query, then requests them directly from the same database that underpins all PatentSight+ analyses. By keeping the analysis grounded in reliable data and delivering the visualization directly to the user without routing it through the AI, we reduce the risk of hallucinated values.
Figure: The previous prompt for identifying the top owners in CRISPR and gene-editing resulted in Protégé returning the above chart. The chart itself is not created by Protégé; instead, a tool within Protégé requests this information from the PatentSight+ database to avoid introducing the potential for hallucination at this step.
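The separation described above can be sketched as follows: the agent emits only a chart specification with no numbers in it, and the application fills in the values from the database. All names here (`ChartSpec`, `agent_plan`, `render_chart`) are hypothetical, not the product's real API.

```python
# Sketch: the agent chooses the chart type and the query; the application,
# not the model, supplies the numeric values. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class ChartSpec:
    chart_type: str   # e.g. "bar"
    query: str        # search syntax defining the dataset
    metric: str       # e.g. "patent family count"

def agent_plan(question: str) -> ChartSpec:
    # In the real system an LLM would produce this spec;
    # note that the spec contains no numbers.
    return ChartSpec("bar", "text=(CRISPR)", "patent family count")

def render_chart(spec: ChartSpec, database: dict) -> dict:
    values = database[spec.query]   # numbers come from the database
    return {"type": spec.chart_type, "metric": spec.metric, "data": values}

fake_db = {"text=(CRISPR)": {"Owner A": 120, "Owner B": 95}}
chart = render_chart(agent_plan("Who leads in CRISPR?"), fake_db)
```

Because the model never touches the values, a hallucination can affect which chart is requested, but not the numbers that end up in it.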
We also reduce the risk of hallucinations in responses by limiting the sources of patent information the system can access. At a recent LexisNexis event discussing “AI in the IP Profession,” which included in-house IP counsel from leading foundational AI model companies, the consensus among the panelists was: “Better insights don’t come from AI alone; they come from AI applied to clean, structured, and trusted data.” Using a curated dataset with verified, harmonized ownership information and updated patent validity information helps reduce the risk that the system will build its analysis on incomplete, inconsistent, or unverified patent data.
Together, these choices reflect a simple principle. If we want users to trust AI in IP analytics, we need to reduce the number of opportunities the system has to generate or amplify errors.
As Jakub puts it, “The risk of letting AI generate the numeric values in charts is that it introduces an unnecessary hallucination vector.” In high-stakes situations where patent analytics are usually relied upon, that is not good enough.
One of the early testers of Protégé, a senior IP analyst at a leading manufacturing company, explained that they trusted the results because “I know exactly which database it accesses.”
We don’t assume trust in AI for patent analytics; we prove it. We start by building a testable system that consistently supports patent analysts, reduces the risk of hallucinations in patent search, constrains query generation, and grounds visualizations in curated patent data rather than invented values. That approach reflects the broader goal of PatentSight+ with Protégé: helping teams make high-value strategic decisions by combining trusted patent data and structured analytics with purpose-built AI for strategic patent analysis.
Just as important is the foundation the system works from. Protégé operates on the curated patent data that underpins PatentSight+, because our research shows that trust depends on more than just fluent, polished answers. It is achieved through data quality, reproducibility, and outputs that best meet a patent analyst’s expectations.
This is only one part of how we are approaching trustworthy AI for patent analytics. In the next blog post, we will look at how transparency helps users understand what the system is doing and why that matters for building trust. After that, we will turn to the difference between agentic systems and pre-defined workflows, and then to what it took to build the AI development team behind this work. Stay tuned and follow our journey by subscribing to our newsletter.