Lexemes

A lexeme is a dictionary entry: a word or phrase in a specific language, together with its associated linguistic information. LangDex aggregates lexemes from multiple sources, including Wiktionary (via Kaikki), JMDict, and regional dictionaries.

Lexeme Object

{
  "id": 56789,
  "lemma": "水",
  "language_id": 1234,
  "language": "jpn",
  "pos": "noun",
  "reading": "みず",
  "romanization": "mizu",
  "source": "kaikki",
  "senses": [
    {
      "id": 111,
      "sense_order": 1,
      "gloss": "water",
      "meaning_id": 98765,
      "definitions": [
        {
          "language": "eng",
          "text": "The liquid form of H2O"
        }
      ],
      "examples": [
        {
          "text": "水を飲む",
          "translation": "to drink water"
        }
      ]
    }
  ],
  "pronunciations": [
    {
      "ipa": "mizɯ",
      "audio_url": "https://cdn.langdex.co/audio/jpn/mizu.mp3"
    }
  ],
  "etymology": {
    "text": "From Old Japanese 水 (mi₁du)"
  },
  "frequency": {
    "rank": 342,
    "corpus": "opensubtitles"
  }
}

Core Fields

Field	Type	Description
`id`	integer	Internal LangDex ID
`lemma`	string	Dictionary form / headword
`language_id`	integer	Foreign key to language
`language`	string	ISO 639-3 code
`pos`	string	Part of speech
`reading`	string	Phonetic reading (for logographic scripts)
`romanization`	string	Latin transliteration
`source`	string	Data source (`kaikki`, `jmdict`, etc.)
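The following is a minimal client-side model of the core fields and the sense fields that appear in the lexeme object above. It is only a sketch. The class names `Lexeme` and `Sense` and the `from_json` helper are hypothetical and not part of any official LangDex SDK. Because the table does not say which fields are optional, the sketch assumes that `reading` and `romanization` may be missing.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Sense:
    """Subset of the sense object: the fields shown in the lexeme example."""
    id: int
    sense_order: int
    gloss: str
    meaning_id: Optional[int] = None  # link to the cross-lingual meaning


@dataclass
class Lexeme:
    """Core fields of a lexeme plus its senses (hypothetical client model)."""
    id: int
    lemma: str
    language_id: int
    language: str  # ISO 639-3 code, e.g. "jpn"
    pos: str
    source: str    # e.g. "kaikki", "jmdict"
    reading: Optional[str] = None       # assumption: absent for non-logographic scripts
    romanization: Optional[str] = None  # assumption: may be absent
    senses: list[Sense] = field(default_factory=list)

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> Lexeme:
        senses = [
            Sense(
                id=s["id"],
                sense_order=s["sense_order"],
                gloss=s["gloss"],
                meaning_id=s.get("meaning_id"),
            )
            for s in data.get("senses", [])
        ]
        return cls(
            id=data["id"],
            lemma=data["lemma"],
            language_id=data["language_id"],
            language=data["language"],
            pos=data["pos"],
            source=data["source"],
            reading=data.get("reading"),
            romanization=data.get("romanization"),
            # Keep senses in their documented order.
            senses=sorted(senses, key=lambda s: s.sense_order),
        )


sample = {
    "id": 56789, "lemma": "水", "language_id": 1234, "language": "jpn",
    "pos": "noun", "reading": "みず", "romanization": "mizu", "source": "kaikki",
    "senses": [{"id": 111, "sense_order": 1, "gloss": "water", "meaning_id": 98765}],
}
lex = Lexeme.from_json(sample)
print(lex.lemma, lex.reading, lex.senses[0].gloss)  # 水 みず water
```

Unknown keys such as `pronunciations` and `etymology` are ignored here, so you can extend the model one section at a time.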

Senses

Each lexeme has one or more senses - distinct meanings of the word.

{
  "id": 111,
  "lexeme_id": 56789,
  "sense_order": 1,
  "gloss": "water",
  "meaning_id": 98765,
  "pos": "noun",
  "register": "neutral",
  "domain": "nature",
  "definitions": [...],
  "examples": [...]
}

Sense Fields

Field	Type	Description
`sense_order`	integer	Position within the lexeme
`gloss`	string	Short definition/translation
`meaning_id`	integer	Link to cross-lingual meaning
`pos`	string	Part of speech (can differ from lexeme)
`register`	string	`formal`, `informal`, `slang`, etc.
`domain`	string	Subject area (medicine, law, etc.)
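Senses are ordered by `sense_order` and can be filtered by `register`. The sketch below shows one way a client might list glosses in order. The function name `ordered_glosses` is hypothetical, and the sample senses use placeholder glosses rather than real dictionary data.

```python
from __future__ import annotations

from typing import Any, Iterable, Optional


def ordered_glosses(senses: Iterable[dict[str, Any]],
                    register: Optional[str] = None) -> list[str]:
    """Glosses sorted by sense_order, optionally restricted to one register."""
    kept = [s for s in senses if register is None or s.get("register") == register]
    return [s["gloss"] for s in sorted(kept, key=lambda s: s["sense_order"])]


# Hypothetical senses with placeholder glosses, for illustration only.
senses = [
    {"id": 2, "sense_order": 2, "gloss": "gloss B", "register": "slang"},
    {"id": 1, "sense_order": 1, "gloss": "gloss A", "register": "neutral"},
]
print(ordered_glosses(senses))           # ['gloss A', 'gloss B']
print(ordered_glosses(senses, "slang"))  # ['gloss B']
```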

Word Forms

Lexemes can have multiple word forms representing inflected variants. This example, like the ones that follow, uses the English word "water" for illustration:

{
  "lexeme_id": 56789,
  "word_forms": [
    {
      "form": "waters",
      "features": ["noun", "plural"]
    },
    {
      "form": "watered",
      "features": ["verb", "past"]
    },
    {
      "form": "watering",
      "features": ["verb", "present-participle"]
    }
  ]
}

Morphological data comes from UniMorph, covering 150+ languages.
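Each word form carries a list of features, so a client can find a particular inflection by testing for a subset of features. The sketch below assumes that feature labels look like the ones in the example above (`"verb"`, `"past"`, and so on) rather than raw UniMorph tag strings. The helper name `forms_with` is hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable


def forms_with(word_forms: Iterable[dict[str, Any]], *features: str) -> list[str]:
    """Return every form whose feature list contains all requested features."""
    wanted = set(features)
    return [wf["form"] for wf in word_forms if wanted <= set(wf["features"])]


word_forms = [
    {"form": "waters", "features": ["noun", "plural"]},
    {"form": "watered", "features": ["verb", "past"]},
    {"form": "watering", "features": ["verb", "present-participle"]},
]
print(forms_with(word_forms, "verb"))            # ['watered', 'watering']
print(forms_with(word_forms, "noun", "plural"))  # ['waters']
```

If you call it with no features, every form matches, because the empty set is a subset of any feature list.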

Pronunciations

{
  "pronunciations": [
    {
      "ipa": "/ˈwɔːtə/",
      "variety": "Received Pronunciation",
      "audio_url": "https://cdn.langdex.co/audio/eng/water-rp.mp3"
    },
    {
      "ipa": "/ˈwɑːtɚ/",
      "variety": "General American",
      "audio_url": "https://cdn.langdex.co/audio/eng/water-ga.mp3"
    }
  ]
}

Pronunciation data comes from:

IPA-Dict (1M+ entries)
Kaikki/Wiktionary pronunciations
Wiktionary audio files
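A lexeme can list several pronunciations, one per variety, and some entries (like the Japanese example at the top of this page) have no `variety` at all. The sketch below picks one pronunciation by preferring certain varieties and otherwise falling back to the first entry. The function name and the default preference order are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any, Optional, Sequence


def pick_pronunciation(
    pronunciations: Sequence[dict[str, Any]],
    preferred: Sequence[str] = ("General American", "Received Pronunciation"),
) -> Optional[dict[str, Any]]:
    """Return the first entry matching a preferred variety, else the first entry."""
    for variety in preferred:
        for p in pronunciations:
            if p.get("variety") == variety:
                return p
    return pronunciations[0] if pronunciations else None


prons = [
    {"variety": "Received Pronunciation",
     "audio_url": "https://cdn.langdex.co/audio/eng/water-rp.mp3"},
    {"variety": "General American",
     "audio_url": "https://cdn.langdex.co/audio/eng/water-ga.mp3"},
]
print(pick_pronunciation(prons)["variety"])  # General American
print(pick_pronunciation(prons, ["Scottish English"])["variety"])  # Received Pronunciation
```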

Etymology

{
  "etymology": {
    "text": "From Middle English water, from Old English wæter, from Proto-Germanic *watōr",
    "cognates": [
      {"language": "deu", "lemma": "Wasser"},
      {"language": "nld", "lemma": "water"},
      {"language": "swe", "lemma": "vatten"}
    ]
  }
}
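The etymology `text` follows the Wiktionary-style "From A, from B, from C" pattern, and `cognates` is a flat list. The sketch below splits the text into stages and groups cognates by language. The splitting rule is a heuristic that assumes this exact pattern, so it will not handle free-form etymologies. Both function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def etymology_chain(text: str) -> list[str]:
    """Heuristically split 'From A, from B, from C' into ['A', 'B', 'C']."""
    body = text.strip()
    if body.startswith("From "):
        body = body[len("From "):]
    return [stage.strip() for stage in body.split(", from ")]


def cognates_by_language(cognates: Iterable[dict[str, str]]) -> dict[str, list[str]]:
    """Group cognate lemmas by ISO 639-3 language code."""
    grouped = defaultdict(list)
    for c in cognates:
        grouped[c["language"]].append(c["lemma"])
    return dict(grouped)


text = "From Middle English water, from Old English wæter, from Proto-Germanic *watōr"
cognates = [
    {"language": "deu", "lemma": "Wasser"},
    {"language": "nld", "lemma": "water"},
    {"language": "swe", "lemma": "vatten"},
]
print(etymology_chain(text))
# ['Middle English water', 'Old English wæter', 'Proto-Germanic *watōr']
print(cognates_by_language(cognates))
# {'deu': ['Wasser'], 'nld': ['water'], 'swe': ['vatten']}
```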

Frequency

Word frequency data comes from the OpenSubtitles and Leipzig corpora:

{
  "frequency": {
    "rank": 342,
    "count": 1847293,
    "per_million": 3421.5,
    "corpus": "opensubtitles-en"
  }
}
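The page does not define `per_million`. The sketch below assumes the usual definition, which is the raw `count` normalized to occurrences per million tokens of the corpus. Under that assumption, `count` and `per_million` together imply the corpus size. For the example above, that works out to roughly 540 million tokens. Both helper names are hypothetical.

```python
def per_million(count: int, corpus_tokens: float) -> float:
    """Occurrences per million tokens (assumed definition of `per_million`)."""
    return count / corpus_tokens * 1_000_000


def implied_corpus_size(count: int, per_million_value: float) -> float:
    """Invert per_million(): the corpus size in tokens that both fields imply."""
    return count / per_million_value * 1_000_000


size = implied_corpus_size(1_847_293, 3421.5)
print(f"{size / 1e6:.1f} million tokens")  # 539.9 million tokens
```

Normalized values like `per_million` are the ones to compare across corpora of different sizes. Raw `count` and `rank` values only make sense within a single corpus.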

Proficiency Levels

Lexemes in languages with standardized proficiency tests also carry a proficiency level:

{
  "proficiency": {
    "standard": "JLPT",
    "level": "N5",
    "is_core_vocabulary": true
  }
}

Supported standards:

JLPT (Japanese) - N5 to N1
HSK (Chinese) - 1 to 9
CEFR (language-independent, used mainly for European languages) - A1 to C2
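The `proficiency=JLPT:N5` query parameter in the API examples combines a standard and a level. The sketch below parses and validates such strings against the level scales listed above, ordering each scale from easiest to hardest. The function names are hypothetical, and accepting lowercase input is an assumption about what a client might want. It is not documented API behavior.

```python
from __future__ import annotations

# Level scales from the supported-standards list, ordered easiest to hardest.
LEVELS: dict[str, list[str]] = {
    "JLPT": ["N5", "N4", "N3", "N2", "N1"],
    "HSK": [str(n) for n in range(1, 10)],
    "CEFR": ["A1", "A2", "B1", "B2", "C1", "C2"],
}


def parse_proficiency(value: str) -> tuple[str, str]:
    """Parse a 'STANDARD:LEVEL' string such as 'JLPT:N5' and validate it."""
    standard, sep, level = value.partition(":")
    standard, level = standard.strip().upper(), level.strip().upper()
    if not sep or standard not in LEVELS:
        raise ValueError(f"unknown proficiency standard in {value!r}")
    if level not in LEVELS[standard]:
        raise ValueError(f"{level!r} is not a {standard} level")
    return standard, level


def difficulty_index(standard: str, level: str) -> int:
    """0 for the easiest level of a standard, increasing with difficulty."""
    return LEVELS[standard].index(level)


print(parse_proficiency("jlpt:n3"))                                  # ('JLPT', 'N3')
print(difficulty_index("JLPT", "N1") > difficulty_index("JLPT", "N5"))  # True
```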

API Examples

Get a lexeme with full details

curl "https://api.langdex.co/v1/lexemes/56789?include=senses,pronunciations,etymology,frequency" \
  -H "Authorization: Bearer YOUR_API_KEY"

Search lexemes

curl "https://api.langdex.co/v1/lexemes/search?q=water&lang=eng&pos=noun" \
  -H "Authorization: Bearer YOUR_API_KEY"

Get word forms

curl "https://api.langdex.co/v1/lexemes/56789/forms" \
  -H "Authorization: Bearer YOUR_API_KEY"

Get lexemes by proficiency level

curl "https://api.langdex.co/v1/lexemes?lang=jpn&proficiency=JLPT:N5&limit=100" \
  -H "Authorization: Bearer YOUR_API_KEY"
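The same four requests can be built from Python using only the standard library. The sketch below constructs them without sending anything. The helper names `build_request` and `fetch_json` are hypothetical and not part of an official LangDex SDK. `fetch_json` needs network access and a valid API key, and it returns whatever JSON the endpoint sends back.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Any

BASE_URL = "https://api.langdex.co/v1"


def build_request(path: str, api_key: str, **params: str | int) -> urllib.request.Request:
    """Build an authenticated GET request for a LangDex endpoint (not sent yet)."""
    url = BASE_URL + path
    if params:
        # Keep ',' and ':' literal so the URLs match the curl examples above.
        url += "?" + urllib.parse.urlencode(params, safe=",:")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_json(request: urllib.request.Request) -> Any:
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


API_KEY = "YOUR_API_KEY"
req = build_request("/lexemes/56789", API_KEY,
                    include="senses,pronunciations,etymology,frequency")
search = build_request("/lexemes/search", API_KEY, q="water", lang="eng", pos="noun")
forms = build_request("/lexemes/56789/forms", API_KEY)
by_level = build_request("/lexemes", API_KEY, lang="jpn", proficiency="JLPT:N5", limit=100)
print(by_level.full_url)
# https://api.langdex.co/v1/lexemes?lang=jpn&proficiency=JLPT:N5&limit=100
```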

Data Sources

Source	Languages	Lexemes
Kaikki (Wiktionary)	200+	~10M
JMDict	Japanese	~200K
Regional dictionaries	Various	~100K

Related Concepts

Meanings - Cross-lingual semantic hub
Translations - How lexemes translate
