Robbie, Bobby, Rob, and Bob derive from Robert. Johnny, John, and Jon derive from Jonathan.
When dealing with person names, nicknames can make it hard to tell whether two names refer to the same person unless you have a tool that recognizes the variants. But should you use a custom stemming dictionary? A thesaurus? Are there other options? Here, we compare options for stemming person names in MarkLogic to help you decide which approach is right for you.
When stemming names using a dictionary, all of the following apply:
When stemming names using a thesaurus, consider:
It would be overkill for this person name stemming use case, but it is worth pointing out a trick using entity extraction. Feed query strings to cts:parse with function bindings to turn a query string into a tagged query, which you can then expand and interpret according to whatever criteria you like, whether or not you do entity extraction on the actual content.
Using an entity extraction approach:
If you have a large set of alternatives, or care about language context, go with the stemming dictionary.
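To make the difference between the two main approaches concrete, here is a minimal sketch in plain Python. It is not MarkLogic code and does not use any MarkLogic API: the `NICKNAME_TO_STEM` table and the `stem`, `same_person_name`, and `expand` helpers are hypothetical names invented for illustration. The idea it shows is that a stemming dictionary maps every variant to one canonical stem, so two names match when their stems match, while a thesaurus expands a single query term into the full set of alternatives.

```python
# Hypothetical nickname data for illustration only.
# This is NOT a MarkLogic dictionary or thesaurus format.
NICKNAME_TO_STEM = {
    "robbie": "robert",
    "bobby": "robert",
    "rob": "robert",
    "bob": "robert",
    "johnny": "jonathan",
    "john": "jonathan",
    "jon": "jonathan",
}


def stem(name: str) -> str:
    """Dictionary-style lookup: map a name to its canonical stem.

    Names that are not in the table are returned lowercased and unchanged.
    """
    key = name.lower()
    return NICKNAME_TO_STEM.get(key, key)


def same_person_name(a: str, b: str) -> bool:
    """Two names are treated as equivalent when they share a stem."""
    return stem(a) == stem(b)


def expand(name: str) -> set[str]:
    """Thesaurus-style expansion: one query term becomes all known variants."""
    target = stem(name)
    variants = {nick for nick, s in NICKNAME_TO_STEM.items() if s == target}
    variants.add(target)
    variants.add(name.lower())
    return variants


if __name__ == "__main__":
    print(same_person_name("Bob", "Robbie"))
    print(sorted(expand("Rob")))
```

In this sketch the dictionary path compares a single normalized value per name, whereas the expansion path produces a list of alternatives that grows with the number of known variants, which is one way to picture why a large set of alternatives favors the dictionary.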