Query WordNet dictionary in TypeScript/JS - definitions, synonyms, hypernyms. Zod-validated parser with CLI

king8fisher/synset

synset

WordNet dictionary parser with Zod validation, query utilities, and CLI.

Usage

CLI

# Run without installing
bunx synset define dog

# List synonyms
bunx synset synonyms happy

# Show hypernyms (more general terms)
bunx synset hypernyms dog

# Show all relations
bunx synset related computer

# Pre-download WordNet data
bunx synset fetch

# Export to SQLite database
bunx synset export-sqlite dictionary.db

# Use local file instead of cache
bunx synset define dog --file ./path/to/english-wordnet-{YEAR}.xml

Or install globally:

bun install -g synset
synset define dog

Library

npm install synset
# or
bun add synset

import {
  fetchWordNet,
  loadWordNet,
  buildIndex,
  getDefinitions,
  getSynonyms,
  getHypernyms,
  findSynsets,
} from 'synset'

// Fetch WordNet data (auto-discovers latest version, downloads & caches ~100MB XML)
const { lexicon, version } = await fetchWordNet()
console.log(`Loaded WordNet ${version}`)

// Or load from a local file instead (pick one; `lexicon` can only be declared once)
// const lexicon = await loadWordNet('./path/to/english-wordnet-{YEAR}.xml')

// Build index for fast lookups
const index = buildIndex(lexicon)

// Query
getDefinitions(index, 'dog')
// [{ text: "a member of the genus Canis...", synset, partOfSpeech: "n" }, ...]

getSynonyms(index, 'happy')
// [{ word: "glad", entry, synset }, ...]

getHypernyms(index, 'dog')
// [Synset for "canine", Synset for "domestic animal", ...]

findSynsets(index, 'bank')
// [Synset for "financial institution", Synset for "river bank", ...]
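Since a word usually has several senses, it can help to group the results by part of speech. The sketch below assumes only the `text` and `partOfSpeech` fields shown in the example output above; the `DefinitionLike` type and the `groupByPartOfSpeech` helper are hypothetical and not part of synset's API, and the sample texts are placeholders.

```typescript
// Hypothetical minimal shape, based on the example output above.
// The example output also shows a `synset` field, which is omitted here.
interface DefinitionLike {
  text: string
  partOfSpeech: string
}

// Group definition texts by part of speech, preserving input order.
function groupByPartOfSpeech(defs: DefinitionLike[]): Map<string, string[]> {
  const groups = new Map<string, string[]>()
  for (const def of defs) {
    const bucket = groups.get(def.partOfSpeech) ?? []
    bucket.push(def.text)
    groups.set(def.partOfSpeech, bucket)
  }
  return groups
}

// Placeholder data; in real use this would come from getDefinitions(index, word).
const sample: DefinitionLike[] = [
  { text: 'first noun sense (placeholder)', partOfSpeech: 'n' },
  { text: 'a verb sense (placeholder)', partOfSpeech: 'v' },
  { text: 'second noun sense (placeholder)', partOfSpeech: 'n' },
]
const grouped = groupByPartOfSpeech(sample)
```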

SQLite Export

import { exportToSQLite } from 'synset'

// Export to SQLite
exportToSQLite(lexicon, 'dictionary.db', {
  onProgress: ({ phase, current, total }) => {
    // phases: words, synsets, word_synsets, synset_relations, sense_relations
    console.log(`${phase}: ${current}/${total}`)
  }
})
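For long exports a percentage can be easier to read than raw counters. The following is a minimal sketch, assuming the progress event carries exactly the `phase`, `current`, and `total` fields destructured above; the `ExportProgress` type name and the `formatProgress` helper are made up for this example.

```typescript
// Assumed shape of the progress event, inferred from the destructuring above.
interface ExportProgress {
  phase: string
  current: number
  total: number
}

// Format one progress event, e.g. "synsets: 50/200 (25%)".
function formatProgress({ phase, current, total }: ExportProgress): string {
  // Guard against total === 0 so the percentage never becomes NaN.
  const percent = total > 0 ? Math.floor((current / total) * 100) : 100
  return `${phase}: ${current}/${total} (${percent}%)`
}

const line = formatProgress({ phase: 'synsets', current: 50, total: 200 })
console.log(line)
```

A callback such as `onProgress: (e) => console.log(formatProgress(e))` could then be passed to the export call.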

Schema is available as:

  • import { SCHEMA } from 'synset' - SQL string constant
  • synset/schema.sql - standalone file via package exports

Tables:

  • words - unique words with display form
  • synsets - definitions with part of speech
  • word_synsets - word → synset mappings
  • synset_relations - hypernym, hyponym, meronym, etc. links between synsets
  • sense_relations - antonym, derivation, pertainym, etc. links between word senses

Example queries:

-- Hypernyms via synset relations (dog → canine, domestic animal)
SELECT w2.word_display, s2.definition
FROM words w
JOIN word_synsets ws ON w.id = ws.word_id
JOIN synset_relations sr ON ws.synset_id = sr.source_id
JOIN synsets s2 ON sr.target_id = s2.id
JOIN word_synsets ws2 ON s2.id = ws2.synset_id
JOIN words w2 ON ws2.word_id = w2.id
WHERE w.word = 'dog' AND sr.rel_type = 'hypernym';

-- Antonyms via sense relations (happy → unhappy)
SELECT w2.word_display, s2.definition
FROM words w
JOIN sense_relations sr ON w.id = sr.source_word_id
JOIN words w2 ON sr.target_word_id = w2.id
JOIN synsets s2 ON sr.target_synset_id = s2.id
WHERE w.word = 'happy' AND sr.rel_type = 'antonym';
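To make the join path of the hypernym query easier to follow, here is the same chain of joins written as an in-memory lookup. This is only an illustration: the row types list just the columns used in the queries above, the numeric id types are an assumption, and `Tables`, `relatedWords`, and the toy rows are hypothetical.

```typescript
// Row shapes assumed from the columns used in the queries above.
// Id types are a guess; the real schema may define more columns.
interface WordRow { id: number; word: string; word_display: string }
interface SynsetRow { id: number; definition: string }
interface WordSynsetRow { word_id: number; synset_id: number }
interface SynsetRelationRow { source_id: number; target_id: number; rel_type: string }

interface Tables {
  words: WordRow[]
  synsets: SynsetRow[]
  word_synsets: WordSynsetRow[]
  synset_relations: SynsetRelationRow[]
}

interface Hit { word_display: string; definition: string }

// Each nested loop corresponds to one JOIN in the hypernym query.
function relatedWords(t: Tables, word: string, relType: string): Hit[] {
  const hits: Hit[] = []
  for (const w of t.words.filter((r) => r.word === word)) {
    for (const ws of t.word_synsets.filter((r) => r.word_id === w.id)) {
      const relations = t.synset_relations.filter(
        (r) => r.source_id === ws.synset_id && r.rel_type === relType,
      )
      for (const sr of relations) {
        for (const s2 of t.synsets.filter((r) => r.id === sr.target_id)) {
          for (const ws2 of t.word_synsets.filter((r) => r.synset_id === s2.id)) {
            for (const w2 of t.words.filter((r) => r.id === ws2.word_id)) {
              hits.push({ word_display: w2.word_display, definition: s2.definition })
            }
          }
        }
      }
    }
  }
  return hits
}

// Toy rows for illustration only, not real WordNet data.
const toy: Tables = {
  words: [
    { id: 1, word: 'dog', word_display: 'dog' },
    { id: 2, word: 'canine', word_display: 'canine' },
  ],
  synsets: [
    { id: 10, definition: 'placeholder definition for dog' },
    { id: 20, definition: 'placeholder definition for canine' },
  ],
  word_synsets: [
    { word_id: 1, synset_id: 10 },
    { word_id: 2, synset_id: 20 },
  ],
  synset_relations: [{ source_id: 10, target_id: 20, rel_type: 'hypernym' }],
}
const hits = relatedWords(toy, 'dog', 'hypernym')
```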

Runtime

  • Bun: Full support
  • Node.js 18+: Full support

Development

bun install
bun test
bun run check  # typecheck
bun run build  # build dist/

Dictionary Module

WordNet XML Source Structure

(Originally generated with xmlstarlet against the 2023 XML file; Unicode symbols were added manually.)

$ xmlstarlet el data/english-wordnet-2023.xml | sort | uniq | sort

📂 LexicalResource               root node
   📂 Lexicon                    desc of the database: id prefix, language, version, ...
      📂 LexicalEntry            an id for grouping children that can be referenced by a Synset.
         📄 Form
         📂 Lemma
            📄 Pronunciation
         📂 Sense
            📄 SenseRelation
      📂 Synset
         📄 Definition
         📄 Example
         📄 ILIDefinition
         📄 SynsetRelation
      📄 SyntacticBehaviour
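The nesting shown in the tree can be written down as a parent-to-children table, which is handy for sanity-checking a parser. The sketch below transcribes only what the tree shows; `ALLOWED_CHILDREN` and `isAllowedChild` are hypothetical names and not part of the library.

```typescript
// Parent element to allowed child elements, transcribed from the tree above.
// Elements that do not appear as keys are leaves.
const ALLOWED_CHILDREN = new Map<string, readonly string[]>([
  ['LexicalResource', ['Lexicon']],
  ['Lexicon', ['LexicalEntry', 'Synset', 'SyntacticBehaviour']],
  ['LexicalEntry', ['Form', 'Lemma', 'Sense']],
  ['Lemma', ['Pronunciation']],
  ['Sense', ['SenseRelation']],
  ['Synset', ['Definition', 'Example', 'ILIDefinition', 'SynsetRelation']],
])

// True when `child` may appear directly under `parent` according to the tree.
function isAllowedChild(parent: string, child: string): boolean {
  const children = ALLOWED_CHILDREN.get(parent) ?? []
  return children.indexOf(child) !== -1
}

console.log(isAllowedChild('Lemma', 'Pronunciation')) // true
console.log(isAllowedChild('Synset', 'Lemma')) // false
```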

Schema Verification

Stats accumulated with:

  • $ xmlstarlet el data/english-wordnet-2023.xml | sort | uniq -c | sort -n
  • $ grep -oE '<[A-Za-z]+' data/english-wordnet-2024.xml | sed 's/<//g' | sort | uniq -c | sort -n
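The grep-based count can be approximated in TypeScript with the same regular expression. This is a rough equivalent rather than the method used for the stats: `countElements` is a hypothetical helper, and it expects the XML as a single in-memory string, which may not be practical for the full file.

```typescript
// Count opening tags by element name, similar in spirit to
// `grep -oE '<[A-Za-z]+' | sed 's/<//g' | sort | uniq -c`.
function countElements(xml: string): Map<string, number> {
  const counts = new Map<string, number>()
  // Closing tags (`</...`), `<?xml` and `<!DOCTYPE` do not match,
  // because `<` must be followed directly by a letter.
  const pattern = /<([A-Za-z]+)/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(xml)) !== null) {
    const name = match[1]
    if (name === undefined) continue
    counts.set(name, (counts.get(name) ?? 0) + 1)
  }
  return counts
}

// Tiny hand-written fragment for illustration, not real WordNet data.
const fragment =
  '<?xml version="1.0"?><Lexicon><LexicalEntry><Lemma/><Sense/><Sense/></LexicalEntry></Lexicon>'
const elementTotals = countElements(fragment)
```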

Element counts comparison (schema unchanged between releases):

Element                 2023      2024
LexicalResource            1         1
Lexicon                    1         1
SyntacticBehaviour        39        39
ILIDefinition           2700      3216
Form                    4474      4474
Pronunciation          44671     44669
Example                49638     49723
Synset                120135    120630
Definition            120141    120635
SenseRelation         122041    122018
LexicalEntry          161338    161705
Lemma                 161338    161705
Sense                 212071    212478
SynsetRelation        293864    297150
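The comparison can also be expressed as per-element differences. The snippet below is only an illustration: the numbers are copied from the counts above, and the variable names are made up for this example.

```typescript
// [element, count in 2023, count in 2024], copied from the counts above.
const elementCounts: [string, number, number][] = [
  ['LexicalResource', 1, 1],
  ['Lexicon', 1, 1],
  ['SyntacticBehaviour', 39, 39],
  ['ILIDefinition', 2700, 3216],
  ['Form', 4474, 4474],
  ['Pronunciation', 44671, 44669],
  ['Example', 49638, 49723],
  ['Synset', 120135, 120630],
  ['Definition', 120141, 120635],
  ['SenseRelation', 122041, 122018],
  ['LexicalEntry', 161338, 161705],
  ['Lemma', 161338, 161705],
  ['Sense', 212071, 212478],
  ['SynsetRelation', 293864, 297150],
]

// Difference per element (2024 minus 2023).
const deltas = elementCounts.map(([element, y2023, y2024]) => ({
  element,
  delta: y2024 - y2023,
}))

// Elements whose count did not change between the two releases.
const unchanged = deltas.filter((d) => d.delta === 0).map((d) => d.element)

for (const { element, delta } of deltas) {
  console.log(`${element}: ${delta >= 0 ? '+' : ''}${delta}`)
}
```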

Zod Test Coverage

📂 LexicalResource                  [-] (basic test for a parent)
   📂 Lexicon                       [-] (basic test for a parent)
      📂 LexicalEntry               [x]
         📄 Form                    [x]
         📂 Lemma                   [x]
            📄 Pronunciation        [x]
         📂 Sense                   [x]
            📄 SenseRelation        [x]
      📂 Synset                     [x]
         📄 Definition              [x]
         📄 Example                 [x]
         📄 ILIDefinition           [x]
         📄 SynsetRelation          [x]
      📄 SyntacticBehaviour         [x]

TODO

  • Support Open English Namenet - proper nouns (people, places, etc.) were moved to a separate resource starting with the 2025 release
  • Option to fetch "2025+" edition which includes curated proper nouns from Namenet
  • CLI --json flag for JSON output
  • CLI colored output (disable with --no-color)
