Machine Learning Manager
Jakub leads the AI engineering team at LexisNexis Intellectual Property Solutions. This is the team behind Protégé's agentic patent analysis system. His background spans data science and engineering roles at companies including Google/YouTube, Essentia Analytics, and Trilodocs, and he holds an MSc in Data Science and AI from the University of London.
Welcome to the third article in our series, “Conversations With the Team That Built Protégé in PatentSight+.” In the first post, we discussed how we manage hallucination risk by constraining the agentic system, and in the second, we explored how transparency lets users inspect the analysis as it develops.
This post turns to a deeper architectural question: the difference between AI systems constrained by predetermined workflows and those that rely on autonomous planning and execution, and why LexisNexis® Protégé™ in PatentSight+™ takes a hybrid approach that combines structure and autonomy in a way well-suited to the exploratory nature of patent analysis.
Most of the software tools we use day to day are deterministic. When you click the close button in a browser window, the window closes; if it doesn’t, something is broken. There are pre-defined user interactions and clear expectations about the intended results. AI systems, by contrast, are probabilistic. The same input will not always generate the same output.
This poses an interesting question: should we limit AI to a predetermined set of tasks, or should we let the agent develop its own plan and sequence of actions? There are pros and cons to each strategy. In fact, the two poles form a spectrum, and most AI implementations fall somewhere between pre-defined orchestration and autonomous action.
At one end of the spectrum, we risk falling into the well-known trap of trying to build knowledge into AI to circumvent the shortcomings of current machine learning models. This tends to help in the short term, but eventually reaches a plateau and inhibits further progress. A highly restricted system can produce more predictable results, but the product’s usefulness is limited by the designers’ assumptions about user behavior.
In our experience, patent analysis is not a strictly deterministic sequence of actions. Every step in the process depends on the findings from the previous step. When you set out to understand a technology landscape, should you look at company A or company B? The answer depends on what you find when you explore the data available. Maybe companies A and B do not appear in the technology landscape at all. Maybe they do, but under an unexpected technology cluster that needs further investigation. The direction of analysis can change at every step. Predicting the exact right sequence of events for every user’s goals is simply not realistic.
On the opposite end, we have completely unrestrained agentic systems that can act without limitations. This would perhaps be acceptable if AI knew everything and never made mistakes, but we all know that’s not the case.
Protégé in PatentSight+ lives between these two worlds. Feedback from internal subject matter experts (SMEs) on IP and technology informs the approach without constraining it. Protégé therefore acts as a cognitive core: it makes plans based on well-defined patent analysis strategies, selects tools, initiates actions, and reflects on the results. Not every question fits neatly into a recognized analysis category, and Protégé is designed to help regardless.
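The plan, act, and reflect cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Protégé's implementation: the toy dataset, `count_by_owner`, `stub_planner`, and `run_agent` are all hypothetical names, and the planner is a deterministic stand-in for what would be a model call in a real system. Structure comes from the tool allowlist and the step budget; autonomy comes from letting the planner choose each next step based on earlier findings.

```python
from typing import Callable, Optional

# Toy in-memory data, invented for illustration only.
TOY_PATENTS = [
    {"owner": "Company A", "cluster": "batteries"},
    {"owner": "Company A", "cluster": "batteries"},
    {"owner": "Company B", "cluster": "sensors"},
]


def count_by_owner(cluster: str) -> dict[str, int]:
    """Hypothetical tool: count toy patents per owner within one cluster."""
    counts: dict[str, int] = {}
    for patent in TOY_PATENTS:
        if patent["cluster"] == cluster:
            counts[patent["owner"]] = counts.get(patent["owner"], 0) + 1
    return counts


# Structure: the agent may only call tools listed here.
ALLOWED_TOOLS: dict[str, Callable[[str], dict[str, int]]] = {
    "count_by_owner": count_by_owner,
}


def stub_planner(question: str, findings: list[dict]) -> Optional[tuple[str, str]]:
    """Stand-in for a model call: choose the next (tool, argument), or None to stop."""
    explored = {finding["cluster"] for finding in findings}
    for cluster in ("batteries", "sensors"):
        if cluster not in explored:
            return ("count_by_owner", cluster)
    return None


def run_agent(question: str, planner, max_steps: int = 5) -> list[dict]:
    findings: list[dict] = []
    for _ in range(max_steps):  # structure: hard step budget
        step = planner(question, findings)  # autonomy: next step depends on findings
        if step is None:
            break
        tool_name, argument = step
        if tool_name not in ALLOWED_TOOLS:
            raise ValueError(f"unknown tool: {tool_name}")
        result = ALLOWED_TOOLS[tool_name](argument)
        # Recording the result lets the planner reflect on it in the next iteration.
        findings.append({"cluster": argument, "result": result})
    return findings


findings = run_agent("Who is active in this landscape?", stub_planner)
```

Because the planner sees all earlier findings on every iteration, the direction of the analysis can change at each step, while the allowlist and step budget keep the loop bounded.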
If Protégé can attempt any patent analysis task, regardless of whether it was trained to do so, how can we ensure the quality of its responses? This is perhaps the biggest challenge with agentic systems in general. If a tool can do everything, does it do anything well?
Our approach was to learn from both industry leaders and published research on agentic system evaluation. Both sources converge on a few points:
With that in mind, we identified several standard use cases to establish a performance baseline. These included patent search and retrieval, competitive analysis, and trend detection. We ran tests with internal SMEs to identify performance gaps, then designed automated evaluations to address them. Testing against a suite of standard tasks acts as a static benchmark, helping AI engineers deliver improvements to well-understood capabilities. A static benchmark alone, however, is not sufficient for an open-ended agentic system. This is why we also rely on a human-in-the-loop iteration cycle:
This process also feeds into our automated evaluation system, which guards us against regressions in capabilities and ensures that measurable quality only ever goes up.
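One way to picture a static benchmark acting as a regression guard is the sketch below. It is an assumption-laden illustration rather than a description of the actual evaluation system: the benchmark cases, the exact-match scoring, and the names `accuracy` and `passes_regression_gate` are hypothetical, and the three "systems" are lookup-table stubs.

```python
from typing import Callable

# Toy benchmark cases, invented for illustration only.
BENCHMARK = [
    {"question": "portfolio size of Company A", "expected": "2"},
    {"question": "portfolio size of Company B", "expected": "1"},
    {"question": "owner of patent P-3", "expected": "Company B"},
]

System = Callable[[str], str]


def accuracy(system: System, benchmark: list[dict]) -> float:
    """Fraction of benchmark cases the system answers with an exact match."""
    correct = sum(
        1 for case in benchmark if system(case["question"]) == case["expected"]
    )
    return correct / len(benchmark)


def passes_regression_gate(candidate: System, baseline: System,
                           benchmark: list[dict]) -> bool:
    """A candidate is accepted only if it scores at least as well as the baseline."""
    return accuracy(candidate, benchmark) >= accuracy(baseline, benchmark)


def make_stub(answers: dict[str, str]) -> System:
    """Build a stand-in system that answers from a fixed table."""
    return lambda question: answers.get(question, "")


baseline_system = make_stub({
    "portfolio size of Company A": "2",
    "portfolio size of Company B": "1",
})
improved_system = make_stub({
    "portfolio size of Company A": "2",
    "portfolio size of Company B": "1",
    "owner of patent P-3": "Company B",
})
regressed_system = make_stub({"portfolio size of Company A": "2"})

improved_ok = passes_regression_gate(improved_system, baseline_system, BENCHMARK)
regressed_ok = passes_regression_gate(regressed_system, baseline_system, BENCHMARK)
```

In this sketch, a gap found during human review would become a new entry in `BENCHMARK`, so that a later change cannot silently reintroduce it.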
It is also worth noting that not all of this is equally hard. Some questions have exactly one correct answer: a patent number, a portfolio size, or the owner of a specific filing. For these, Protégé retrieves values directly from the PatentSight+ database rather than generating them, so correctness is not a matter of model quality.
How do we design tools?
Tools are code functions that an agent is allowed to invoke. Protégé in PatentSight+ knows which tools it has access to and can decide how and when to use them to answer a user’s question. We give the agent a small number of powerful tools rather than a large set of simpler ones. Each is designed to handle significant complexity behind the scenes, keeping the agent focused on the analysis. This is consistent with broader industry experience and published research. Both indicate that too many tools overwhelm the agent, leading to confused reasoning, competing priorities, and degraded results.
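To make "a small number of powerful tools" concrete, here is a hedged sketch of a tool registry with a single flexible query tool in place of several narrow ones. Everything in it is hypothetical: the `Tool` dataclass, `query_patents`, its parameters, the toy data, and the shape of the tool-call dictionary are assumptions made for illustration and do not reflect the real PatentSight+ interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Toy in-memory data, invented for illustration only.
TOY_PATENTS = [
    {"owner": "Company A", "cluster": "batteries"},
    {"owner": "Company A", "cluster": "sensors"},
    {"owner": "Company B", "cluster": "sensors"},
]


@dataclass(frozen=True)
class Tool:
    name: str
    description: str  # the text the agent reads when deciding whether to call the tool
    func: Callable[..., Any]


def query_patents(owner: Optional[str] = None,
                  cluster: Optional[str] = None,
                  group_by: Optional[str] = None) -> Any:
    """One flexible tool: optional filters plus optional grouping into counts."""
    rows = [
        patent for patent in TOY_PATENTS
        if (owner is None or patent["owner"] == owner)
        and (cluster is None or patent["cluster"] == cluster)
    ]
    if group_by is None:
        return rows
    counts: dict[str, int] = {}
    for patent in rows:
        counts[patent[group_by]] = counts.get(patent[group_by], 0) + 1
    return counts


# A deliberately small registry: one tool covers filtering, listing, and counting.
REGISTRY = {
    tool.name: tool
    for tool in [
        Tool(
            name="query_patents",
            description=(
                "Filter patents by owner and/or technology cluster. "
                "Set group_by to 'owner' or 'cluster' to get counts instead of rows."
            ),
            func=query_patents,
        )
    ]
}


def invoke(call: dict) -> Any:
    """Dispatch a structured tool call of the form {'name': ..., 'arguments': {...}}."""
    tool = REGISTRY.get(call["name"])
    if tool is None:
        raise KeyError(f"tool not available: {call['name']}")
    return tool.func(**call.get("arguments", {}))


by_owner = invoke({"name": "query_patents", "arguments": {"group_by": "owner"}})
```

The design choice illustrated here is that the agent chooses among very few entry points, each with a clear description, while the filtering and aggregation logic stays inside the tool.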
Protégé’s tools mainly help it query our trusted patent datasets and generate visualizations from them. Agentic tools for patent analysis need carefully designed, intuitive interfaces. Research shows that even something as minor as a poorly written tool description can cause unreliable performance.
Our experience building Protégé in PatentSight+ confirmed this. Initially, the tools could do very little. As time went on, we added more and more capabilities. One day, we found that the quality of analysis had degraded despite all the new functionality. At that point, we had to take a step back and rethink our approach. We constrained what our tools could do and significantly reduced the complexity of using them. This shifted Protégé further toward the autonomous end of the agency spectrum and meaningfully improved overall patent analysis capabilities.
Not every AI system needs the same balance of structure and autonomy. The right position on that spectrum depends on the work to be done. For IP analysts, the answer is concrete. You need a system reliable enough to stake a business decision on, and flexible enough to be a genuine thinking partner.
The defined structure that Protégé operates within makes it dependable for high-stakes analysis. Every tool that accesses patent data is built and controlled by us. The charts, metrics, and portfolio comparisons you see come directly from the PatentSight+ database. These values are never generated by AI. This keeps the data side of the analysis deterministic, even when the reasoning around it is not.
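The separation between deterministic data and non-deterministic reasoning can be illustrated with a small sketch. It assumes a design in which generated narrative carries only placeholders and every number is substituted from a lookup; that design, the toy `PORTFOLIO_SIZES` table, and the names `portfolio_size`, `render`, and `UnknownOwner` are hypothetical and are not taken from Protégé itself.

```python
# Toy stand-in for a trusted database; the values are invented for illustration.
PORTFOLIO_SIZES = {"Company A": 120, "Company B": 45}


class UnknownOwner(LookupError):
    """Raised when an owner is missing, instead of guessing a value."""


def portfolio_size(owner: str) -> int:
    """Deterministic lookup: the same owner always yields the same stored value."""
    try:
        return PORTFOLIO_SIZES[owner]
    except KeyError:
        raise UnknownOwner(owner) from None


def render(narrative_template: str, owner: str) -> str:
    """Fill a narrative that contains placeholders only.

    The template could come from a model, but the number itself is always
    substituted from the lookup, so it cannot be altered by generation.
    """
    return narrative_template.format(owner=owner, size=portfolio_size(owner))


sentence = render("{owner} holds {size} active patent families.", "Company A")
```

If the owner is not in the table, the sketch raises `UnknownOwner` rather than producing a plausible-looking figure.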
Because Protégé is purpose-built for patent analytics, prompting it is also much easier. Protégé is designed to interpret your intent rather than wait for a perfectly framed question. It approaches analysis the way an experienced IP analyst would. It considers multiple angles, follows the data wherever it leads, and forms its own view of what is significant. If you ask a broad question, it will help you scope it. If the data points to an unexpected place, it will follow that thread. It brings its own analytical perspective to the work rather than simply executing what it was told to do.
Protégé uses a multi-model approach rather than committing to a single foundation model. For each task, it selects the most appropriate model available. This means it is not tied to any one model family and can adopt improvements as they emerge. As the underlying models improve, Protégé improves with them. So does the quality of analysis available to the IP teams that rely on it.
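A per-task model choice of this kind is often implemented as a routing table. The sketch below shows the general idea only; the task names, the two stub "models", and the fallback rule are assumptions for illustration and say nothing about which models Protégé actually uses or how it selects them.

```python
from typing import Callable

Model = Callable[[str], str]


def small_model(prompt: str) -> str:
    """Stub standing in for a fast, inexpensive model."""
    return f"[small] {prompt}"


def large_model(prompt: str) -> str:
    """Stub standing in for a more capable model."""
    return f"[large] {prompt}"


# Hypothetical task types mapped to the model assumed to suit them best.
ROUTES: dict[str, Model] = {
    "summarize": small_model,
    "plan_analysis": large_model,
}
DEFAULT_MODEL: Model = large_model  # fallback for task types without a route


def route(task_type: str, prompt: str) -> str:
    """Send the prompt to the model registered for this task type."""
    model = ROUTES.get(task_type, DEFAULT_MODEL)
    return model(prompt)


summary = route("summarize", "Summarize the sensor cluster findings.")
```

Swapping in a newer model then means changing one entry in `ROUTES`, which is one way a system can avoid being tied to a single model family.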
See how Protégé combines structure and autonomy to deliver patent insights you can act on. Talk to an expert for a full demo.
This is the third post in our series. Next, we will look at what it took to build the AI engineering team behind Protégé. Subscribe to our newsletter to follow the journey.