Disclosed herein are various exemplary systems and methods for linking
entity references to entities and identifying associations between
entities. In particular, a method for identifying an entity from a
plurality of entity references, each entity reference being linked with a
separate ghost entity, is provided. The method comprises the steps of:
comparing an entity reference of a first ghost entity with an entity
reference of a second ghost entity to determine a match probability
between the two entity references; linking the entity reference of the
first ghost entity additionally with the second ghost entity, and the
entity reference of the second ghost entity additionally with the first
ghost entity, when the match probability is greater than or equal to a
match threshold; and repeating the steps of comparing and linking for one
or more of the ghost entity pairings possible from the ghost entities. The
method further comprises determining, for one or more entity references
linked to a ghost entity, a score for the entity reference based at least
in part on a match probability between the entity reference and a value
representing the one or more entity references linked to the ghost entity,
and identifying the ghost entity as an actual entity based at least in
part on one or more scores for the one or more entity references linked
to the ghost entity.
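
By way of illustration only, the comparing, linking, scoring and identifying steps described above may be sketched as follows. The sketch rests on several assumptions that are not part of the disclosure: entity references are plain strings; the match probability is approximated by the similarity ratio of Python's difflib.SequenceMatcher; the value representing the references linked to a ghost entity is taken to be the linked reference most similar on average to the others; and a ghost entity is identified as an actual entity when the mean score of its linked references meets a hypothetical entity_threshold. All function and parameter names are illustrative.

```python
from difflib import SequenceMatcher
from itertools import combinations


def match_probability(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; the actual measure is an assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_references(refs, match_threshold):
    """Compare every ghost entity pairing and cross-link matching references."""
    # Each reference i starts out linked only to its own ghost entity i.
    ghosts = {i: {i} for i in range(len(refs))}
    for i, j in combinations(range(len(refs)), 2):
        if match_probability(refs[i], refs[j]) >= match_threshold:
            ghosts[j].add(i)  # reference of the first ghost also joins the second
            ghosts[i].add(j)  # reference of the second ghost also joins the first
    return ghosts


def representative(refs, linked):
    """Assumed representative value: the linked reference with the highest
    total similarity to all references linked to the same ghost entity."""
    return max(
        sorted(linked),
        key=lambda k: sum(match_probability(refs[k], refs[m]) for m in linked),
    )


def score_references(refs, linked):
    """Score each linked reference against the representative value."""
    rep = representative(refs, linked)
    return {k: match_probability(refs[k], refs[rep]) for k in linked}


def identify_entities(refs, match_threshold=0.8, entity_threshold=0.8):
    """Return the reference sets of ghost entities accepted as actual entities."""
    entities = set()
    for linked in link_references(refs, match_threshold).values():
        scores = score_references(refs, linked)
        # Assumed acceptance rule: mean score must meet entity_threshold.
        if sum(scores.values()) / len(scores) >= entity_threshold:
            # Ghost entities with identical linked references collapse into one.
            entities.add(frozenset(refs[k] for k in linked))
    return entities


if __name__ == "__main__":
    references = ["Jon Smith", "John Smith", "Acme Corp"]
    for entity in identify_entities(references):
        print(sorted(entity))
```

In this sketch, two ghost entities that end up linked to the same set of references are treated as a single actual entity, and a reference that matches nothing else remains an entity on its own. Both behaviours are design choices of the example rather than requirements of the method.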