Key Titles for Licensing

English Dictionaries
Bilingual Dictionaries
Soundfiles
Natural Language Lexicons
Reference Resources

Dictionary Resources

The dictionaries described below incorporate features designed to make them readily accessible to computational analysis:

  • Full tagging. Allows extraction of specific items relative to a knowledge area. Lexicons are readily built up, showing phoneticized lemmas and inflected forms.
  • Text disambiguated into separate sense records. Disambiguation allows extraction of senses: useful for natural language processing and translation tools.
  • Place-holders and collocational information. Place-holders reflect the structure of the language; collocational information adds to general disambiguation.
  • Numerous example sentences demonstrating typical usage, syntax, and collocation.
  • Parsed multi-word expressions. Provides a valuable source for identification and manipulation of these flexible structures.
  • Syntax information. Grammatical information is given wherever usage and structure affect the lexical item.
  • Lexical set markers and domain information. Lexical set markers provide a basis for taxonomy-building; subject domain labels reinforce this feature.
  • A DTD to standardize presentation across a range of dictionaries. It allows for many attributes that add to the linguistic picture: indicators of subject domain, regional use, register, style, restricted use, sense disambiguation, gender, and number, as well as syntax pointers showing subject/object collocates and other modifiers.
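
As a rough illustration of how tagged, sense-level records support extraction by knowledge area, the following Python sketch pulls out every sense carrying a given subject domain. The element and attribute names (entry, hw, sense, domain, def) and the sample entries are invented for this example; they are assumptions and are not taken from the actual DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only,
# not those defined by the publisher's DTD.
SAMPLE = """
<dictionary>
  <entry>
    <hw>mouse</hw>
    <sense n="1" domain="zoology"><def>a small rodent</def></sense>
    <sense n="2" domain="computing"><def>a hand-held pointing device</def></sense>
  </entry>
  <entry>
    <hw>bus</hw>
    <sense n="1" domain="transport"><def>a large road vehicle</def></sense>
    <sense n="2" domain="computing"><def>a set of conductors carrying data</def></sense>
  </entry>
</dictionary>
"""


def senses_in_domain(xml_text, domain):
    """Return (headword, sense number, definition) for each sense tagged with `domain`."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.iter("entry"):
        headword = entry.findtext("hw")
        for sense in entry.iter("sense"):
            if sense.get("domain") == domain:
                results.append((headword, sense.get("n"), sense.findtext("def")))
    return results


for row in senses_in_domain(SAMPLE, "computing"):
    print(row)
```

Because each sense is a separate record with its own label, the filter works at sense level and does not return whole entries.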

English Dictionaries

Oxford Dictionary of English
The enhanced version of the Oxford Dictionary of English (SGML/XML) includes all the information present in the published dictionary (170,000 entries), such as definition text, example sentences, grammatical indicators, and encyclopedic material (12,000 entries). Alongside this, there are several formal data types suitable for language engineering and NLP applications:

  • full listing of all possible syntactic forms, tagged to show relationships to headword
  • encoding of morphological behaviour at individual sense level
  • IPA pronunciation for every form
  • full morphological data for spelling variants, plus straightforward links to standard forms
  • flexible codification of over 10,000 phrasal verbs and other multi-word units, allowing easy identification of real-world variations
  • classification of over 80,000 words/senses under 200 subject domains
  • semantic relationships between nouns and senses codified under WordNet-compatible taxonomy

 

Oxford Thesaurus of English
The New Oxford Thesaurus of English (SGML) is the largest one-volume print thesaurus available, with its 630,000 alternative words including over 570,000 synonyms, plus high numbers of antonyms, hyponyms and related terms. The synonyms are arranged in order of relevance, with grouping of terms with limited currency, such as informal, regional or technical terms. The set also includes corpus-based examples for most senses, nearly 38,000 in total. Alongside this, enhancements are planned along the lines of the New Oxford Dictionary of English (NODE):

  • extending coverage of hyponyms using WordNet-compatible semantic taxonomy
  • mapping onto dictionary definitions in NODE, to enable context-sensitive applications
  • using dictionary mapping to generate inflections for synonyms
  • linking individual synonyms to their context label, allowing subset look-ups

Shorter Oxford English Dictionary
For more scholarly purposes, the Shorter Oxford English Dictionary (new edition published autumn 2002) contains all the features of the Oxford English Dictionary, with 220,000 entries, 500,000 definitions and 83,000 quotations on historical, literary, scientific and current English.

Also available are a range of smaller dictionary and thesaurus sets, with reduced headword coverage and shorter definitions/synonym lists, and a selection of combined dictionary and thesaurus sets.

 

Bilingual Dictionaries

Alongside its world-renowned range of English dictionaries, OUP currently holds professional-level bilingual dictionary sets for French, Spanish, German and Russian, as well as smaller holdings in Italian and other languages. These feature wide and up-to-the-minute coverage of the written and spoken language, with full details of grammatical usage and pronunciation, and inflected and variant forms:

Oxford-Hachette French Dictionary
  • 360,000 words/phrases
  • 550,000 translations

Oxford Spanish Dictionary
  • 275,000 words/phrases
  • 450,000 translations
  • 24 regional varieties of Spanish included

Oxford-Duden German Dictionary
  • 320,000 words/phrases
  • 520,000 translations
  • full integration of new German spelling system

Oxford Russian Dictionary
  • 180,000 words/phrases
  • over 290,000 translations

Pocket Oxford Italian Dictionary
  • 80,000 words/phrases
  • over 115,000 translations

As with the English dictionaries, we also have a range of smaller sets available for each of these languages, from Concise (XML or SGML) down to Mini (SGML only).

For more specialized vocabulary, the Oxford Business French Dictionary and Oxford Business Spanish Dictionary offer comprehensive bilingual coverage of words and phrases from the general language of business to specific areas such as marketing and the Internet:

Format: SGML
Coverage: over 50,000 words and phrases; over 80,000 translations
Features:
  • up-to-date coverage of finance, marketing and other business areas
  • unrivalled coverage of Internet and e-commerce terminology

Soundfiles

We hold two extensive sets of high-quality soundfiles matching the headwords of the Shorter Oxford English Dictionary (4th ed) and the Concise Oxford Dictionary (9th ed):

Format: 8-bit 11 kHz WAV for Shorter; 16-bit 22 kHz WAV for Concise
Coverage: 95,000 files for Shorter; 60,000 for Concise
Features:
  • accurate coverage of different homographs, variant forms and inflections
  • clear linking of soundfiles to phonetic information
  • full information on parts of speech and subsenses covered
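
For anyone integrating the soundfiles, a quick check of bit depth and sample rate can be sketched with Python's standard wave module. This is only an illustration: the exact sample rates (11,025 Hz and 22,050 Hz, the usual values behind "11 kHz" and "22 kHz") and the mono channel layout are assumptions, and the helper make_silent_wav exists only so that the sketch runs without real data.

```python
import io
import wave


def describe_wav(source):
    """Return (bits per sample, sample rate in Hz, channels) for a WAV path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getsampwidth() * 8, wav.getframerate(), wav.getnchannels()


def make_silent_wav(sample_width_bytes, rate_hz, n_frames=100):
    """Build a short mono WAV in memory (hypothetical stand-in for a real soundfile)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(rate_hz)
        # 8-bit WAV samples are unsigned (silence is 0x80); wider samples are signed.
        silence = b"\x80" if sample_width_bytes == 1 else b"\x00" * sample_width_bytes
        wav.writeframes(silence * n_frames)
    buffer.seek(0)
    return buffer


# Assumed rates: 11025 Hz for the "11 kHz" set, 22050 Hz for the "22 kHz" set.
print(describe_wav(make_silent_wav(1, 11025)))
print(describe_wav(make_silent_wav(2, 22050)))
```

With real data, describe_wav can be given a file path in place of the in-memory buffer.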

Natural Language Lexicons

Alongside our formal dictionary resources, we hold extensive fully-tagged SGML databases of morphological and phonetic data purpose-built for natural language applications. These source lexicons currently exist for general vocabulary in English (UK and US), French, Spanish and Italian, with a smaller database in preparation for German. Further lexicons are also available in specialist reference areas, e.g. medical.

Each headword lemma is provided with a full listing of its possible syntactic forms and spelling variants, along with information on their relationship to the headword form. In addition, a keyboard representation of the IPA pronunciation is given for every form. There is also information on domains in which the headwords are used, e.g. computing, engineering, zoology.
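
One way to picture a lexicon record of this kind is sketched below in Python. The class and field names are hypothetical, the sample entry is invented, and the ASCII pronunciation strings use a SAMPA-style notation purely as an assumed stand-in for the keyboard representation mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class WordForm:
    spelling: str
    relation: str        # relationship to the headword, e.g. "plural"
    pronunciation: str   # keyboard (ASCII) rendering of the IPA; notation is assumed


@dataclass
class LexiconEntry:
    lemma: str
    pronunciation: str
    forms: list = field(default_factory=list)
    domains: list = field(default_factory=list)


def build_form_index(entries):
    """Map every spelling (lemma or listed form) to the set of lemmas it belongs to."""
    index = {}
    for entry in entries:
        index.setdefault(entry.lemma, set()).add(entry.lemma)
        for form in entry.forms:
            index.setdefault(form.spelling, set()).add(entry.lemma)
    return index


# Invented sample entry, for illustration only.
mouse = LexiconEntry(
    lemma="mouse",
    pronunciation="maUs",
    forms=[WordForm("mice", "plural", "maIs")],
    domains=["zoology", "computing"],
)

index = build_form_index([mouse])
print(sorted(index))
```

An index of this shape lets an application go from any wordform found in running text back to its headword lemma.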

English

Sources: Shorter Oxford English Dictionary, Oxford Dictionary of English, New Oxford American Dictionary
Additional features:
  • exclusive US or World English orthographic forms
  • phonetic variants and primary and secondary stress information
  • clear potential for subset generation using listed sources as benchmarks
  • up-to-date coverage of special-interest domains
  • extensive coverage of encyclopedic (real-world) terms, proper names, and brand names
  • extensive coverage of compound nouns and other compound expressions
Coverage:
  • UK: over 220,000 headwords; 340,000 wordforms; 55,000 proper nouns; 3,000 abbreviations
  • US: over 165,000 headwords; 255,000 wordforms; 25,000 proper nouns; 2,000 abbreviations

French, Spanish, German, Italian

Sources: high-profile sources from European partners reinforcing OUP resources
Additional features:
  • clear indication of preferred orthographic forms, with links from variants
  • exceptional coverage of placenames
  • fully tagged coverage of regional usage
Coverage:
  • French: over 90,000 headwords; 400,000 wordforms; 35,000 proper nouns; 1,000 abbreviations
  • Spanish: 90,000 headwords; 575,000 wordforms; 25,000 proper nouns; 1,000 abbreviations
  • Italian: 115,000 headwords; over 925,000 wordforms; full stress marking
  • German: 25,000 entries; over 180,000 wordforms; 15,000 stress markers

Reference Resources

Reference books from Oxford are renowned the world over for their quality, authority, and reliability. The Oxford reference range provides excellent coverage of specialized terminology and topics in a wide variety of subjects, from music, art, literature and religion, to science, warfare, and wine. A-Z subject reference (e.g. science), language reference (e.g. grammar) and general reference (e.g. placenames) are available in the Oxford Paperback Reference series, and more discursive coverage comes from the Oxford Companion range, adding up to well over 100 titles available in XML or SGML. The following are just a few of the titles on offer:

  • Dictionary of Law (XML): 3,500 entries
  • Dictionary of Business (XML): 6,000 entries
  • Dictionary of Physics (XML): 4,000 entries
  • Concise Medical Dictionary (XML): 10,000 entries
  • Dictionary of Writers/Works (XML): 3,000 authors; 2,000 characters; 26,000 titles
  • Dictionary of Quotations (SGML): 20,000 quotes; 3,000 authors; 65,000 keywords

For wordgame or general knowledge applications, we also offer crossword-solver lists featuring 220,000 items grouped by word length, a further 10,000 abbreviations listed with 15,000 possible expansions, and over 30,000 encyclopedic items grouped under 280 subject headings, from political leaders (with their dates) and world currencies (with countries of use), to mountain ranges and galaxies, to human bones and phobias.
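
To show why grouping by word length is convenient for a crossword solver, here is a minimal Python sketch. The function names, the "?" wildcard convention, and the sample words are all assumptions made for illustration; they do not describe the format of the actual lists.

```python
import re
from collections import defaultdict


def group_by_length(words):
    """Bucket words by length so a lookup only scans candidates of the right size."""
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word.lower())
    return groups


def solve(groups, pattern):
    """Return words matching `pattern`, where '?' stands for any single unknown letter."""
    regex = re.compile(
        "".join("." if ch == "?" else re.escape(ch) for ch in pattern.lower())
    )
    candidates = groups.get(len(pattern), [])
    return sorted(word for word in candidates if regex.fullmatch(word))


# Invented sample list, for illustration only.
groups = group_by_length(["galaxy", "femur", "tibia", "phobia", "patella"])
print(solve(groups, "?????a"))
print(solve(groups, "t?b?a"))
```

Only the bucket for the pattern's length is searched, which keeps each lookup small even when the full list is large.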

If you would like more details and samples of our data, please contact us.