pywiktionary API

The library provides classes that third-party tools can use.

Wiktionary Class

class pywiktionary.Wiktionary(lang=None, XSAMPA=False)

Wiktionary class for IPA extraction from a Wiktionary XML dump or through the MediaWiki API.

To extract IPA for a specific language, set the lang parameter; by default, IPA is extracted for all available languages.

To convert IPA text to X-SAMPA text, set the XSAMPA parameter.

Parameters:
  • lang (string) – Name of the language to extract.
  • XSAMPA (boolean) – Option for IPA to X-SAMPA conversion.
extract_IPA(dump_file)

Extract a list of IPA results from a Wiktionary XML dump.

Parameters: dump_file (string) – Path of the Wiktionary XML dump file.
Returns: List of extracted IPA results in {"id": "", "title": "", "pronunciation": ""} format.
Return type: list
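extract_IPA walks the dump for you. For readers curious about what that input looks like, here is a minimal sketch of streaming (id, title, wiki text) triples out of a MediaWiki XML dump with the standard library's iterparse. The helper names iter_pages and local_name are hypothetical, and this is not pywiktionary's own implementation.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that MediaWiki export XML puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (page_id, title, wiki_text) for each <page> in a MediaWiki XML dump.

    `source` is a file path or a binary file object. Each <page> element is
    cleared after use so memory stays bounded on large dumps.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        page_id = title = text = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "id" and page_id is None:
                # The first <id> under <page> is the page id; later ones
                # belong to <revision> or <contributor>.
                page_id = child.text
            elif name == "text":
                text = child.text or ""
        yield page_id, title, text
        elem.clear()


SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>change</title>
    <ns>0</ns>
    <id>42</id>
    <revision><id>7</id><text>==English==</text></revision>
  </page>
</mediawiki>"""

for page in iter_pages(io.BytesIO(SAMPLE)):
    print(page)  # ('42', 'change', '==English==')
```

Streaming with iterparse avoids loading a multi-gigabyte dump into memory at once, which is the main reason to prefer it over ET.parse here.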
get_entry_pronunciation(wiki_text, title=None)

Extract IPA for an entry in a Wiktionary XML dump.

Parameters:
  • wiki_text (string) – Wiki text of the XML entry.
  • title (string) – Title of the wiki entry.
Returns: Dict of the word’s IPA results. Key: language name; Value: list of IPA text.
Return type: dict

lookup(word)

Look up the IPA of a word through the Wiktionary API.

Parameters: word (string) – Word to look up.
Returns: Dict of the word’s IPA results. Key: language name; Value: list of IPA text.
Return type: dict
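Both lookup and get_entry_pronunciation return a dict keyed by language name. The hypothetical helper below pulls the first transcription for one language out of such a dict. Because parse_detail documents its entries as {"IPA": ..., "X-SAMPA": ..., "lang": ...} dicts while this entry describes plain IPA text, the sketch accepts either shape; the sample data is hand-written, not real API output.

```python
def first_ipa(result, language):
    """Return the first IPA string recorded for `language`, or None.

    `result` maps language name -> list of entries. An entry may be a plain
    IPA string or a dict with an "IPA" key (the parse_detail format); the
    helper accepts both because we are not certain which one a given
    version returns.
    """
    for entry in result.get(language, []):
        ipa = entry.get("IPA") if isinstance(entry, dict) else entry
        if ipa:
            return ipa
    return None


# Hand-written stand-in for a lookup() result (not real API output).
sample = {"English": ["/t͡ʃeɪnd͡ʒ/"], "French": []}
print(first_ipa(sample, "English"))  # /t͡ʃeɪnd͡ʒ/
print(first_ipa(sample, "French"))   # None
```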
set_XSAMPA(XSAMPA)

Set the X-SAMPA conversion option.

Parameters: XSAMPA (boolean) – Option for IPA to X-SAMPA conversion.
set_lang(lang)

Set the language.

Parameters: lang (string) – Name of the language.
set_parser()

Set up the parser for the Wiktionary instance.

The parser uses the lang and XSAMPA parameters of the Wiktionary instance.

Parser Class

class pywiktionary.Parser(lang=None, XSAMPA=False)

Wiktionary parser that extracts IPA text from the pronunciation section.

To extract IPA for a specific language, set the lang parameter; by default, IPA is extracted for all available languages.

To convert IPA text to X-SAMPA text, set the XSAMPA parameter.

Parameters:
  • lang (string) – Name of the language to extract.
  • XSAMPA (boolean) – Option for IPA to X-SAMPA conversion.
expand_template(text)

Expand an IPA template through the Wiktionary API.

Used by the parser to expand {{*-IPA}} templates; returns a list of IPA text.

Parameters: text (string) – Template text enclosed in “{{” and “}}”.
Returns: List of expanded IPA text.
Return type: list of string

Examples

>>> parser = Parser()
>>> template = "{{la-IPA|eccl=yes|thēsaurus}}"
>>> parser.expand_template(template)
['/tʰeːˈsau̯.rus/', '[tʰeːˈsau̯.rʊs]', '/teˈsau̯.rus/']
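Templates such as {{la-IPA|eccl=yes|thēsaurus}} follow MediaWiki's template syntax: a name followed by |-separated positional and name=value arguments. The hypothetical helper below splits one apart, which can help when deciding which language a template call belongs to. It is a naive sketch, not how expand_template works internally; that method delegates the expansion itself to the Wiktionary API.

```python
def split_template(template):
    """Split MediaWiki template text into (name, positional, named).

    Accepts the text with or without the surrounding "{{" and "}}". This
    naive version does not handle nested templates or "|" inside links.
    """
    body = template.strip()
    if body.startswith("{{") and body.endswith("}}"):
        body = body[2:-2]
    name, *params = body.split("|")
    positional, named = [], {}
    for param in params:
        key, sep, value = param.partition("=")
        if sep:  # name=value argument
            named[key.strip()] = value.strip()
        else:    # positional argument
            positional.append(param.strip())
    return name.strip(), positional, named


print(split_template("{{la-IPA|eccl=yes|thēsaurus}}"))
# ('la-IPA', ['thēsaurus'], {'eccl': 'yes'})
```

The template name ("la-IPA") carries the language code, so a caller could use it to route the call to a matching per-language converter.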
parse(wiki_text, title=None)

Parse Wiktionary wiki text.

Split the wiki text into per-language sections and return the parsed IPA results.

Parameters:
  • wiki_text (string) – Wiktionary wiki text, from an XML dump or the Wiktionary API.
  • title (string) – Title of the wiki entry.
Returns: Dict of parsed IPA results. Key: language name; Value: list of IPA text.
Return type: dict
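parse starts by splitting the page by language. On Wiktionary each language section begins with a level-2 heading such as ==English==, so one way to sketch that split is a regular expression over headings. split_languages is a hypothetical helper; the library may split the text differently.

```python
import re

# A level-2 heading ("==English==") starts each language section.
# [^=\n] rejects deeper headings such as "===Pronunciation===".
LANG_HEADING = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)


def split_languages(wiki_text):
    """Return {language name: section text} for each level-2 section."""
    matches = list(LANG_HEADING.finditer(wiki_text))
    sections = {}
    for i, match in enumerate(matches):
        # A section ends where the next language heading begins.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wiki_text)
        sections[match.group(1)] = wiki_text[match.end():end].strip("\n")
    return sections


page = "==English==\n===Pronunciation===\n* {{IPA|en|/t͡ʃeɪnd͡ʒ/}}\n\n==French==\n===Noun===\n"
print(list(split_languages(page)))  # ['English', 'French']
```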

parse_detail(wiki_text, depth=3)

Parse the section for a single language in the wiki text.

Recursively parse the pronunciation sections of that language.

Parameters:
  • wiki_text (string) – Wiki text of one language section.
  • depth (int) – Integer indicating the depth of the pronunciation section.
Returns: List of extracted IPA text in {"IPA": "", "X-SAMPA": "", "lang": ""} format.
Return type: list of dict
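Assuming depth counts the = signs of a heading (so the default depth=3 would correspond to ===Pronunciation===), the section lookup can be sketched as below. find_sections is hypothetical: it returns each matching section body up to the next heading of the same or a shallower level, and it does not reproduce the recursion parse_detail performs. On pages with several etymologies, Wiktionary nests pronunciation one level deeper, so a caller might retry with depth=4.

```python
import re

# Any heading line: group 1 is the run of "=", group 2 the heading title.
HEADING = re.compile(r"^(=+)[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def find_sections(wiki_text, title, depth=3):
    """Return the bodies of every heading named `title` at exactly `depth`.

    A section runs until the next heading of the same or a shallower level,
    so deeper subsections (e.g. ====Homophones====) stay inside it.
    """
    matches = list(HEADING.finditer(wiki_text))
    bodies = []
    for i, match in enumerate(matches):
        if len(match.group(1)) != depth or match.group(2) != title:
            continue
        end = len(wiki_text)
        for later in matches[i + 1:]:
            if len(later.group(1)) <= depth:
                end = later.start()
                break
        bodies.append(wiki_text[match.end():end].strip("\n"))
    return bodies


page = ("===Etymology===\nfrom Latin\n===Pronunciation===\n* {{IPA|en|/a/}}\n"
        "====Homophones====\n* b\n===Noun===\n# x\n")
print(find_sections(page, "Pronunciation"))
```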

parse_pronunciation(wiki_text)

Parse the pronunciation section in wiki text.

Parse IPA text from the pronunciation section and convert it to X-SAMPA.

Parameters: wiki_text (string) – Pronunciation section of the wiki text.
Returns: List of extracted IPA text in {"IPA": "", "X-SAMPA": "", "lang": ""} format.
Return type: list of dict
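Inside a pronunciation section, transcriptions often appear in the general {{IPA|<language code>|<transcription>...}} template. A minimal sketch that collects them into the {"IPA", "X-SAMPA", "lang"} shape documented above follows. parse_ipa_templates is hypothetical: it ignores the language-specific {{*-IPA}} templates (which expand_template handles through the API) and leaves X-SAMPA empty.

```python
import re

# Matches a single, non-nested {{IPA|...}} call; "{{IPAchar|...}}" is not matched.
IPA_TEMPLATE = re.compile(r"\{\{IPA\|([^{}]*)\}\}")


def parse_ipa_templates(section_text):
    """Collect transcriptions from {{IPA|<lang>|<pron>...}} calls.

    Named arguments (anything containing "=") are skipped. The X-SAMPA
    field is left empty; a converter such as IPA_to_XSAMPA could fill it.
    """
    results = []
    for match in IPA_TEMPLATE.finditer(section_text):
        lang, *args = match.group(1).split("|")
        for arg in args:
            if "=" in arg:
                continue  # named parameter, not a transcription
            results.append({"IPA": arg.strip(), "X-SAMPA": "", "lang": lang.strip()})
    return results


section = "* {{a|UK}} {{IPA|en|/ˈpɹɛz.ənt/|/ˈpɹɛzn̩t/}}\n* {{audio|en|x.ogg}}"
for item in parse_ipa_templates(section):
    print(item)
```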

Utilities

IPA- and X-SAMPA-related variables and functions, partially modified from the Module:IPA Lua module (https://en.wiktionary.org/wiki/Module:IPA).

IPA.IPA.IPA_to_CMUBET(text)

Convert IPA to CMUBET for US English.

Uses the IPA symbol set from Wiktionary and the CMUBET symbol set from CMUDict.

Parameters: text (string) – IPA text parsed from Wiktionary.
Returns: Converted CMUBET text.
Return type: string
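CMUBET here refers to the phone set used in CMUDict (an ARPAbet variant). One common way to implement a conversion like IPA_to_CMUBET is a table lookup with greedy longest match, which matters because multi-character IPA units such as the affricate t͡ʃ or the diphthong eɪ must win over their individual characters. The table below is a tiny hypothetical subset and stress digits are omitted, so this is not the library's mapping.

```python
# Tiny illustrative subset of an IPA -> CMUBET table (hypothetical values;
# stress digits such as EY1 are omitted).
IPA_TO_CMU = {
    "t\u0361\u0283": "CH",  # t͡ʃ
    "d\u0361\u0292": "JH",  # d͡ʒ
    "e\u026A": "EY",        # eɪ
    "\u00E6": "AE",         # æ
    "k": "K",
    "n": "N",
    "t": "T",
}


def ipa_to_cmubet_sketch(ipa):
    """Greedy longest-match conversion; unmapped characters are dropped."""
    text = ipa.strip("/[]")
    longest = max(len(key) for key in IPA_TO_CMU)
    phones = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in IPA_TO_CMU:
                phones.append(IPA_TO_CMU[chunk])
                i += size
                break
        else:
            i += 1  # no mapping for this character: skip it
    return " ".join(phones)


print(ipa_to_cmubet_sketch("/t\u0361\u0283e\u026And\u0361\u0292/"))  # CH EY N JH
```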
IPA.IPA.IPA_to_XSAMPA(text)

Convert IPA to X-SAMPA.

Uses the IPA and X-SAMPA symbol sets from Wiktionary.

Parameters: text (string) – IPA text parsed from Wiktionary.
Returns: Converted X-SAMPA text.
Return type: string

Notes

  • Use _j for palatalization instead of '
  • Use = for syllabic instead of _=
  • Use ~ for nasalization instead of _~
  • Please refer to IPA <-> X-SAMPA Symbol Set for more details.
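The conventions above amount to a symbol-by-symbol substitution. A minimal sketch of that idea: decompose the input with Unicode NFD so precomposed letters such as ã expose their combining marks, then map each code point through a table. The table is a small hypothetical subset; the "__" entry for the tie bar is inferred from this section's example (t͡ʃ becomes t__S), not taken from the library.

```python
import unicodedata

# Small illustrative IPA -> X-SAMPA table following the conventions above.
IPA_TO_XSAMPA = {
    "\u0283": "S",   # ʃ
    "\u0292": "Z",   # ʒ
    "\u026A": "I",   # ɪ
    "\u0361": "__",  # tie bar (inferred from the example in this section)
    "\u02B2": "_j",  # ʲ palatalization
    "\u0329": "=",   # combining vertical line below: syllabic
    "\u0303": "~",   # combining tilde: nasalization
}


def ipa_to_xsampa_sketch(ipa):
    """Map each code point through the table; unmapped ones pass through."""
    decomposed = unicodedata.normalize("NFD", ipa)  # "ã" -> "a" + U+0303
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in decomposed)


print(ipa_to_xsampa_sketch("/t\u0361\u0283e\u026And\u0361\u0292/"))  # /t__SeInd__Z/
```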

Examples

>>> IPA_text = "/t͡ʃeɪnd͡ʒ/" # en: [[change]]
>>> XSAMPA_text = IPA_to_XSAMPA(IPA_text)
>>> XSAMPA_text
'/t__SeInd__Z/'

Convert the spelling text in {{*-IPA}} templates to IPA pronunciations.

Most functions are modified from the corresponding Wiktionary Lua modules.

IPA.fr_pron.to_IPA(text, pos='')

Generates French IPA from spelling.

Implements template {{fr-IPA}}.

Parameters:
  • text (string) – fr-IPA text parsed from {{fr-IPA}} in Wiktionary.
  • pos (string) – Value of the |pos= parameter parsed from {{fr-IPA}}.
Returns: Converted French IPA.
Return type: string

Notes

Examples

>>> fr_text = "hæmorrhagie" # fr: [[hæmorrhagie]]
>>> fr_IPA = fr_pron.to_IPA(fr_text)
>>> fr_IPA
'e.mɔ.ʁa.ʒi'
IPA.ru_pron.to_IPA(text, adj='', gem='', bracket='', pos='')

Generates Russian IPA from spelling.

Implements template {{ru-IPA}}.

Parameters:
  • text (string) – ru-IPA text parsed from {{ru-IPA}} in Wiktionary.
  • adj (string) – Value of the |noadj= parameter parsed from {{ru-IPA}}.
  • gem (string) – Value of the |gem= parameter parsed from {{ru-IPA}}.
  • bracket (string) – Value of the |bracket= parameter parsed from {{ru-IPA}}.
  • pos (string) – Value of the |pos= parameter parsed from {{ru-IPA}}.
Returns: Converted Russian IPA.
Return type: string

Notes

Examples

>>> ru_text = "счастли́вый" # ru: [[счастли́вый]]
>>> ru_IPA = ru_pron.to_IPA(ru_text)
>>> ru_IPA
'ɕːɪs⁽ʲ⁾ˈlʲivɨj'
IPA.hi_pron.to_IPA(text)

Generates Hindi IPA from spelling.

Implements template {{hi-IPA}}.

Parameters: text (string) – hi-IPA text parsed from {{hi-IPA}} in Wiktionary.
Returns: Converted Hindi IPA.
Return type: string

Notes

Examples

>>> hi_text = "मैं" # hi: [[मैं]]
>>> hi_IPA = hi_pron.to_IPA(hi_text)
>>> hi_IPA
'mɛ̃ː'
IPA.es_pron.to_IPA(word, LatinAmerica=False, phonetic=True)

Generates Spanish IPA from spelling.

Implements template {{es-IPA}}.

Parameters:
  • word (string) – es-IPA text parsed from {{es-IPA}} in Wiktionary.
  • LatinAmerica (bool) – Value of the |LatinAmerica= parameter parsed from {{es-IPA}}.
  • phonetic (bool) – Value of the |phonetic= parameter parsed from {{es-IPA}}.
Returns: Converted Spanish IPA.
Return type: string
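Spelling-to-IPA converters like these apply context-sensitive letter rules. A deliberately tiny sketch for Spanish shows the shape of such rules and what a LatinAmerica-style switch changes (seseo: c before e/i and z become /s/ instead of /θ/). es_to_ipa_sketch is hypothetical: it produces no stress marks, syllable breaks, or allophones, so for baca it yields baka rather than the ˈbaka in this entry's example, and it ignores many spellings (ll, rr, y, gue/gui, and more).

```python
FRONT_VOWELS = {"e", "é", "i", "í"}
# Accented vowels lose the accent; stress is not modelled in this toy.
PLAIN_VOWELS = {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u", "ü": "u"}


def es_to_ipa_sketch(word, latin_america=False):
    """Toy Spanish spelling-to-phoneme rules (no stress, syllables, or allophones)."""
    sibilant = "s" if latin_america else "θ"  # seseo vs. distinción
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "c" and nxt == "h":
            out.append("t͡ʃ")
            i += 2
            continue
        if ch == "q" and nxt == "u":
            out.append("k")  # the u in "qu" is silent
            i += 2
            continue
        if ch == "c":
            out.append(sibilant if nxt in FRONT_VOWELS else "k")
        elif ch == "z":
            out.append(sibilant)
        elif ch == "g" and nxt in FRONT_VOWELS:
            out.append("x")
        elif ch == "j":
            out.append("x")
        elif ch == "v":
            out.append("b")
        elif ch == "ñ":
            out.append("ɲ")
        elif ch == "h":
            pass  # h is silent
        else:
            out.append(PLAIN_VOWELS.get(ch, ch))
        i += 1
    return "".join(out)


print(es_to_ipa_sketch("cielo"))                      # θielo
print(es_to_ipa_sketch("cielo", latin_america=True))  # sielo
```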

Notes

Examples

>>> es_text = "baca" # es: [[baca]]
>>> es_IPA = es_pron.to_IPA(es_text)
>>> es_IPA
'ˈbaka'
IPA.cmn_pron.to_IPA(text, IPA_tone=True)

Generates Mandarin IPA from Pinyin.

Implements the |m= parameter of the {{zh-pron}} template.

Parameters:
  • text (string) – Value of the |m= parameter parsed from {{zh-pron}} in Wiktionary.
  • IPA_tone (bool) – Whether to include IPA tone marks in the result.
Returns: Converted Mandarin IPA.
Return type: string
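The tone values in this entry's example output (⁵⁵) are Chao tone numbers. One way to derive them from tone-marked Pinyin is to decompose each syllable with NFD and look for the combining tone diacritic. pinyin_tone is a hypothetical helper that works on a single, already-segmented syllable; it does not convert initials (p to pʰ), split words like pīnyīn into syllables, or apply tone sandhi.

```python
import unicodedata

# Pinyin tone diacritics (as combining marks after NFD) -> Chao tone numbers.
TONE_MARKS = {
    "\u0304": "⁵⁵",   # macron: tone 1
    "\u0301": "³⁵",   # acute: tone 2
    "\u030C": "²¹⁴",  # caron: tone 3
    "\u0300": "⁵¹",   # grave: tone 4
}


def pinyin_tone(syllable):
    """Return (toneless syllable, Chao tone string) for one Pinyin syllable.

    A syllable without a tone mark (neutral tone) gets an empty tone string.
    Other diacritics, such as the diaeresis on ü, are kept.
    """
    tone = ""
    base = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            base.append(ch)
    return unicodedata.normalize("NFC", "".join(base)), tone


print(pinyin_tone("pīn"))  # ('pin', '⁵⁵')
print(pinyin_tone("hǎo"))  # ('hao', '²¹⁴')
```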

Notes

Examples

>>> cmn_text = "pīnyīn" # zh: [[拼音]]
>>> cmn_IPA = cmn_pron.to_IPA(cmn_text)
>>> cmn_IPA
'pʰin⁵⁵ in⁵⁵'