Identifying missing dictionary entries with frequency-conserving context models

Jake Ryland Williams, Eric M. Clark, James P. Bagrow, Christopher M. Danforth, and Peter Sheridan Dodds
Phys. Rev. E 92, 042808 – Published 12 October 2015

Abstract

In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we are interested here in text and have framed our treatment appropriately, our work is potentially applicable to other areas of research (e.g., speech, genomics, and mobility patterns) where one has ordered categorical data (e.g., sounds, genes, and locations). Our approach focuses on the phrase (whether a single word or larger) as the primary meaning-bearing lexical unit and object of study. To do so, we employ our previously developed framework for generating word-conserving phrase-frequency data. Upon training our model with the Wiktionary, an extensive, online, collaborative, and open-source dictionary that contains over 100 000 phrasal definitions, we develop highly effective filters for the identification of meaningful, missing phrase entries. With our predictions we then engage the editorial community of the Wiktionary and propose short lists of potential missing entries for definition, developing a breakthrough lexical extraction technique and expanding our knowledge of the defined English lexicon of phrases.
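The abstract does not spell out the word-conserving framework. The sketch below assumes it is a random-partitioning model: each space between adjacent words in a clause is cut independently with probability q (q = 1/2 is assumed as the default here), and clause edges such as punctuation always act as boundaries. A phrase is then produced as a unit when its interior spaces survive and its non-clause edges are cut. The function names and the clause-as-list-of-words input format are hypothetical choices for illustration, not the authors' code. Because every word falls into exactly one phrase under any partition, summing phrase length times expected frequency recovers the total word count, which is presumably the sense of "word-conserving" here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]


def expected_phrase_frequencies(clauses: List[List[str]], q: float = 0.5) -> Dict[Phrase, float]:
    """Expected phrase counts under an assumed random-partitioning model.

    Each interior space in a clause is cut independently with probability q;
    clause edges are always boundaries.
    """
    freqs: Dict[Phrase, float] = defaultdict(float)
    for clause in clauses:
        length = len(clause)
        for start in range(length):
            for end in range(start + 1, length + 1):
                n = end - start
                # All n - 1 spaces inside the phrase must stay uncut.
                p = (1.0 - q) ** (n - 1)
                # A phrase edge that is not a clause edge must be cut.
                if start > 0:
                    p *= q
                if end < length:
                    p *= q
                freqs[tuple(clause[start:end])] += p
    return dict(freqs)


def conserved_word_count(freqs: Dict[Phrase, float]) -> float:
    """Sum of phrase length times expected frequency (should equal the word count)."""
    return sum(len(phrase) * f for phrase, f in freqs.items())


if __name__ == "__main__":
    clauses = [["the", "big", "dog", "barked"], ["it", "ran", "away"]]
    freqs = expected_phrase_frequencies(clauses, q=0.5)
    # Word conservation: 4 + 3 words in total.
    print(conserved_word_count(freqs))
    print(freqs[("the",)], freqs[("big", "dog")])
```

Any value of q conserves words under this model; q only shifts expected mass between short and long phrases.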

Received 5 March 2015

DOI: https://doi.org/10.1103/PhysRevE.92.042808

©2015 American Physical Society

Authors & Affiliations

Jake Ryland Williams*, Eric M. Clark, James P. Bagrow, Christopher M. Danforth§, and Peter Sheridan Dodds

  • Department of Mathematics & Statistics, Vermont Complex Systems Center, Computational Story Lab, and The Vermont Advanced Computing Core, The University of Vermont, Burlington, Vermont 05401, USA

  • *jake.williams@uvm.edu
  • eric.clark@uvm.edu
  • james.bagrow@uvm.edu
  • §chris.danforth@uvm.edu
  • peter.dodds@uvm.edu

Issue

Vol. 92, Iss. 4 — October 2015
