Name Disambiguation

notebooks/name_disambiguation.livemd

Mix.install([
  {:akin, "~> 0.1.8"}
])

Match

UNDER DEVELOPMENT

Identity is the challenge of author name disambiguation (AND). The aim of AND is to match an author’s name to the correct author when that name appears in a list of many authors. Complexity arises from homonymy (many people sharing the same name) and synonymy (one person using different forms or spellings of their name across publications).

Given an author’s name divided into given, middle, and family parts (e.g. “Virginia”, nil, “Woolf”) and a list of candidate author names, find and return the candidates that match the author. If the name being matched (the left name) contains initials, a separate comparison is performed between those initials and the sets of each candidate (right) string.
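The split into given, middle, and family parts can be illustrated with a minimal sketch. The module NamePartsSketch and its functions are hypothetical and written only for this notebook; they are not part of Akin, whose internal handling of name parts and initials may differ.

```elixir
defmodule NamePartsSketch do
  # Hypothetical helper, not part of Akin: split a full name into
  # {given, middle, family}, using nil for any part that is absent.
  def split(name) do
    parts =
      name
      |> String.replace(".", "")
      |> String.split()

    case parts do
      [] ->
        {nil, nil, nil}

      [family] ->
        {nil, nil, family}

      [given, family] ->
        {given, nil, family}

      [given | rest] ->
        # Everything between the first and last word is treated as the middle name.
        {middle, [family]} = Enum.split(rest, -1)
        {given, Enum.join(middle, " "), family}
    end
  end

  # Lowercased initials of the parts that are present.
  def initials({given, middle, family}) do
    [given, middle, family]
    |> Enum.reject(&is_nil/1)
    |> Enum.flat_map(&String.split/1)
    |> Enum.map(&String.first/1)
    |> Enum.map(&String.downcase/1)
  end
end

NamePartsSketch.split("Virginia Woolf")
NamePartsSketch.split("V. Woolf") |> NamePartsSketch.initials()
```

Under these assumptions, “Virginia Woolf” and “V. Woolf” share the same initials, which is the kind of agreement a separate initials comparison could reward.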

If the comparison metrics produce a score greater than or equal to 0.9, the names are considered a match and the candidate is returned in the list.
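The thresholding step can be sketched as follows. This is not how Akin computes its score: as a stand-in for Akin’s comparison metrics, the sketch uses a single Jaro distance from the Elixir standard library (String.jaro_distance/2). The module ThresholdSketch and the match_at argument name are assumptions made for illustration.

```elixir
defmodule ThresholdSketch do
  # Hypothetical stand-in for Akin's comparison metrics: one Jaro
  # distance from the standard library, computed on downcased names.
  def score(left, right) do
    String.jaro_distance(String.downcase(left), String.downcase(right))
  end

  # Keep only the candidates whose score meets the threshold (0.9 by default).
  def matches(name, candidates, match_at \\ 0.9) do
    Enum.filter(candidates, fn candidate -> score(name, candidate) >= match_at end)
  end
end

ThresholdSketch.matches("Virginia Woolf", ["V Woolf", "Virginia Woolf", "V White"])
```

Raising the threshold narrows the result: with a threshold of 1.0 only an exact (case-insensitive) match survives in this sketch.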

We want to find possible matches for the name “Virginia Woolf”

name = "Virginia Woolf"

in a list of other names

other_names = [
  "V Woolf",
  "V Woolfe",
  "Virginia Woolf",
  "V White",
  "Viginia Wolverine",
  "Virginia Woolfe"
]

The most likely matches are returned.

Akin.match_names(name, other_names)

Use options to require stricter matching.

other_names = [
  "Victor Woolf",
  "V Woolf",
  "V Woolfe",
  "Virginia Woolf",
  "V White",
  "Viginia Wolverine",
  "Virginia Woolfe"
]
opts = [match_at: 0.99]

Akin.match_names(name, other_names, opts)

Initials

The results are good even if we only have an initial for part of the name we are disambiguating.

name = "V. Woolf"
Akin.match_names(name, other_names)

Not Perfect

The results are imperfect and can lead to unwanted matches. See how “Victor” fares.

other_names = [
  "Victor Woolf",
  "V Woolfe",
  "Virginia Woolf",
  "V White",
  "Viginia Wolverine",
  "Virginia Woolfe"
]
Akin.match_names(name, other_names)

opts = [match_at: 0.99, algorithms: ["bag_distance", "jaccard", "jaro_winkler"]]

Akin.match_names(name, other_names, opts)