pywiktionary API

The library provides classes that third-party tools can use.

Wiktionary Class
Parser Class
Utilities

`Wiktionary` Class

class pywiktionary.Wiktionary(lang=None, XSAMPA=False)

Wiktionary class for IPA extraction from XML dump or MediaWiki API.

To extract IPA for a single language, specify the lang parameter; by default, IPA is extracted for all available languages.

To convert IPA text to X-SAMPA text, set the XSAMPA parameter to True.

Parameters:
lang (string) – Language to extract IPA for.
XSAMPA (boolean) – Whether to convert IPA to X-SAMPA.

extract_IPA(dump_file)

Extract a list of IPA results from a Wiktionary XML dump.

Parameters:	dump_file (string) – Path of Wiktionary XML dump file.
Returns:	List of extracted IPA results in `{"id": "", "title": "", "pronunciation": ""}` format.
Return type:	list
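
As a rough illustration of working with that result format, the sketch below indexes a list of records by title. The helper name `index_by_title` and the sample records are made up for this example; real records would come from `extract_IPA`, and because the exact type of the `pronunciation` value is not specified here, the helper treats it as opaque.

```python
def index_by_title(records):
    """Map each entry title to its pronunciation value.

    `records` is assumed to follow the documented
    {"id": ..., "title": ..., "pronunciation": ...} layout.
    """
    index = {}
    for record in records:
        # Keep the first record seen for a title; later duplicates are ignored.
        index.setdefault(record["title"], record["pronunciation"])
    return index


# Illustrative data only, not real output from a dump.
sample = [
    {"id": "1", "title": "change", "pronunciation": "/t͡ʃeɪnd͡ʒ/"},
    {"id": "2", "title": "baca", "pronunciation": "ˈbaka"},
]
by_title = index_by_title(sample)
```

Keeping the first record per title is an arbitrary choice for the sketch; a dump may well contain cases where merging would be more appropriate.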

get_entry_pronunciation(wiki_text, title=None)

Extract IPA for an entry in a Wiktionary XML dump.

Parameters:
wiki_text (string) – Wiki text of the XML entry.
title (string) – Title of the wiki entry.
Returns:	Dict of word’s IPA results. Key: language name; Value: list of IPA text.
Return type:	dict

lookup(word)

Look up the IPA of a word through the Wiktionary API.

Parameters:	word (string) – String of a word to be looked up.
Returns:	Dict of word’s IPA results. Key: language name; Value: list of IPA text.
Return type:	dict
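
The returned dict can be consumed as in the minimal sketch below, which picks the first result for one language. `first_ipa` and `FakeClient` are hypothetical names introduced here: `FakeClient` stands in for a `Wiktionary` instance so the snippet runs offline, and the key `"English"` is an assumption about how language names appear in the result.

```python
def first_ipa(client, word, language):
    """Return the first IPA item for `word` in `language`, or None.

    `client` is any object with a `lookup(word)` method returning a dict
    of language name -> list of IPA text, as documented above.
    """
    results = client.lookup(word)
    candidates = results.get(language) or []
    return candidates[0] if candidates else None


class FakeClient:
    """Stand-in for a Wiktionary instance so the sketch runs offline."""

    def lookup(self, word):
        # Illustrative data only.
        return {"English": ["/t͡ʃeɪnd͡ʒ/"]} if word == "change" else {}


client = FakeClient()
found = first_ipa(client, "change", "English")
missing = first_ipa(client, "change", "French")  # language absent -> None
```

Passing the client in as an argument keeps the helper independent of network access, which also makes it easy to test.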

set_XSAMPA(XSAMPA)

Set X-SAMPA conversion option.

Parameters:	XSAMPA (boolean) – Option for IPA to X-SAMPA conversion.

set_lang(lang)

Set language.

Parameters:	lang (string) – String of language name.

set_parser()

Set the parser for Wiktionary.

The parser uses the Wiktionary lang and XSAMPA parameters.

`Parser` Class

class pywiktionary.Parser(lang=None, XSAMPA=False)

Wiktionary parser that extracts IPA text from the pronunciation section.

To extract IPA for a single language, specify the lang parameter; by default, IPA is extracted for all available languages.

To convert IPA text to X-SAMPA text, set the XSAMPA parameter to True.

Parameters:
lang (string) – Language to extract IPA for.
XSAMPA (boolean) – Whether to convert IPA to X-SAMPA.

expand_template(text)

Expand an IPA template through the Wiktionary API.

Used by the parser to expand a {{*-IPA}} template and return the resulting IPA list.

Parameters:	text (string) – String of template text inside “{{” and “}}”.
Returns:	List of expanded IPA text.
Return type:	list of string

Examples

>>> parser = Parser()
>>> template = "{{la-IPA|eccl=yes|thēsaurus}}"
>>> parser.expand_template(template)
['/tʰeːˈsau̯.rus/', '[tʰeːˈsau̯.rʊs]', '/teˈsau̯.rus/']
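
To make the structure of such a template string concrete, here is a simplified sketch that splits it into a name, positional arguments, and named arguments. `split_template` is a hypothetical helper written for this illustration, not the library's parser, and it assumes there are no nested templates or links inside the arguments.

```python
def split_template(template):
    """Split '{{name|arg|key=value}}' into (name, positional, named).

    Simplified: assumes no nested templates or links inside the arguments.
    """
    inner = template.strip()
    # Accept the text with or without the surrounding braces.
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]
    name, *args = inner.split("|")
    positional, named = [], {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep:
            named[key.strip()] = value.strip()
        else:
            positional.append(arg.strip())
    return name.strip(), positional, named


name, positional, named = split_template("{{la-IPA|eccl=yes|thēsaurus}}")
```

For the template above, this yields the name `la-IPA`, one positional argument (the spelling), and the named argument `eccl` set to `yes`.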

parse(wiki_text, title=None)

Parse Wiktionary wiki text.

Split Wiktionary wiki text into language sections and return the parsed IPA results.

Parameters:
wiki_text (string) – Wiktionary wiki text, from an XML dump or the Wiktionary API.
title (string) – Title of the wiki entry.
Returns:	Dict of parsed IPA results. Key: language name; Value: list of IPA text.
Return type:	dict
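
The language-splitting step can be pictured with the sketch below, which relies on the Wiktionary convention that each language section opens with a level-2 heading such as `==English==`. The function name `split_languages` and the sample wiki text are illustrative assumptions; the library's own splitting logic may differ.

```python
import re

# A level-2 heading such as "==English==" opens a language section.
# Deeper headings ("===Pronunciation===") do not match this pattern.
LANGUAGE_HEADING = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)


def split_languages(wiki_text):
    """Return a dict of language name -> text of that language section."""
    sections = {}
    matches = list(LANGUAGE_HEADING.finditer(wiki_text))
    for i, match in enumerate(matches):
        # A section runs until the next language heading or the end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wiki_text)
        sections[match.group(1).strip()] = wiki_text[match.end():end].strip()
    return sections


# Illustrative wiki text only.
sample = """==English==
===Pronunciation===
* {{IPA|en|/t͡ʃeɪnd͡ʒ/}}

==French==
===Pronunciation===
* {{IPA|fr|/ʃɑ̃ʒ/}}
"""
sections = split_languages(sample)
```

Each section body could then be handed to a per-language step such as `parse_detail`.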

parse_detail(wiki_text, depth=3)

Parse the section for a single language in wiki text.

Recursively parse the pronunciation section of that language.

Parameters:
wiki_text (string) – Wiki text of a language section.
depth (int) – Integer indicating the depth of the pronunciation section.
Returns:	List of extracted IPA text in `{"IPA": "", "X-SAMPA": "", "lang": ""}` format.
Return type:	list of dict

parse_pronunciation(wiki_text)

Parse the pronunciation section in wiki text.

Parse IPA text from the pronunciation section and convert it to X-SAMPA.

Parameters:	wiki_text (string) – String of pronunciation section in wiki text.
Returns:	List of extracted IPA text in `{"IPA": "", "X-SAMPA": "", "lang": ""}` format.
Return type:	list of dict
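
One way to relate this record list to the per-language dict returned by `parse` is sketched below. `group_by_language` is a hypothetical helper, the sample records are illustrative, and the values used for `lang` are an assumption; the sketch is not the library's implementation.

```python
from collections import defaultdict


def group_by_language(records):
    """Group {"IPA", "X-SAMPA", "lang"} records into lang -> list of IPA."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["lang"]].append(record["IPA"])
    return dict(grouped)


# Illustrative records only.
sample = [
    {"IPA": "/t͡ʃeɪnd͡ʒ/", "X-SAMPA": "/t__SeInd__Z/", "lang": "en"},
    {"IPA": "ˈbaka", "X-SAMPA": '"baka', "lang": "es"},
]
grouped = group_by_language(sample)
```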

Utilities

IPA and X-SAMPA related variables and functions, partially modified from the Lua module at https://en.wiktionary.org/wiki/Module:IPA.

IPA.IPA.IPA_to_CMUBET(text)

Convert IPA to CMUBET for US English.

Uses the IPA symbol set used in Wiktionary and the CMUBET symbol set used in CMUDict.

Parameters:	text (string) – String of IPA text parsed from Wiktionary.
Returns:	Converted CMUBET text.
Return type:	string

IPA.IPA.IPA_to_XSAMPA(text)

Convert IPA to X-SAMPA.

Uses the IPA and X-SAMPA symbol sets used in Wiktionary.

Parameters:	text (string) – String of IPA text parsed from Wiktionary.
Returns:	Converted X-SAMPA text.
Return type:	string

Notes

Use _j for palatalized instead of '
Use = for syllabic instead of _=
Use ~ for nasalization instead of _~
Please refer to IPA <-> X-SAMPA Symbol Set for more details.
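
The conventions listed above can be sketched as a small character lookup table. The table is deliberately partial: it covers only the three diacritics named in the notes plus the symbols needed for the word "change", and the `__` mapping for the tie bar is inferred from this page's documented example output. `ipa_to_xsampa_sketch` is a hypothetical name; the real `IPA_to_XSAMPA` covers the full symbol set and may handle ordering differently.

```python
# Partial table; anything not listed passes through unchanged.
IPA_TO_XSAMPA = {
    "\u0283": "S",   # ʃ
    "\u0292": "Z",   # ʒ
    "\u026a": "I",   # ɪ
    "\u0361": "__",  # tie bar, inferred from the documented example
    "\u02b2": "_j",  # palatalized (superscript j)
    "\u0329": "=",   # syllabic (combining vertical line below)
    "\u0303": "~",   # nasalization (combining tilde)
}


def ipa_to_xsampa_sketch(text):
    """Convert IPA to X-SAMPA one code point at a time (simplified)."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in text)


# "/t͡ʃeɪnd͡ʒ/" written with escapes so the combining tie bar is explicit.
converted = ipa_to_xsampa_sketch("/t\u0361\u0283e\u026and\u0361\u0292/")
```

A per-character lookup is enough here because every listed symbol is a single code point; multi-character IPA sequences would need a longest-match approach instead.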

Examples

>>> IPA_text = "/t͡ʃeɪnd͡ʒ/" # en: [[change]]
>>> XSAMPA_text = IPA_to_XSAMPA(IPA_text)
>>> XSAMPA_text
"/t__SeInd__Z/"

The following functions convert spelling text in {{*-IPA}} templates to IPA pronunciation.

Most are modified from Wiktionary Lua modules.

IPA.fr_pron.to_IPA(text, pos='')

Generates French IPA from spelling.

Implements template {{fr-IPA}}.

Parameters:
text (string) – String of fr-IPA text parsed in {{fr-IPA}} from Wiktionary.
pos (string) – String of `|pos=` parameter parsed in {{fr-IPA}}.
Returns:	Converted French IPA.
Return type:	string

Notes

Partially modified from the Wiktionary fr-pron Lua module.
Rewritten from the Lua module originally written by Kc kennylau and rewritten by Benwing.
Test cases are modified from Wiktionary fr-pron/testcases.

Examples

>>> fr_text = "hæmorrhagie" # fr: [[hæmorrhagie]]
>>> fr_IPA = fr_pron.to_IPA(fr_text)
>>> fr_IPA
"e.mɔ.ʁa.ʒi"

IPA.ru_pron.to_IPA(text, adj='', gem='', bracket='', pos='')

Generates Russian IPA from spelling.

Implements template {{ru-IPA}}.

Parameters:
text (string) – String of ru-IPA text parsed in {{ru-IPA}} from Wiktionary.
adj (string) – String of `|noadj=` parameter parsed in {{ru-IPA}}.
gem (string) – String of `|gem=` parameter parsed in {{ru-IPA}}.
bracket (string) – String of `|bracket=` parameter parsed in {{ru-IPA}}.
pos (string) – String of `|pos=` parameter parsed in {{ru-IPA}}.
Returns:	Converted Russian IPA.
Return type:	string

Notes

Partially modified from the Wiktionary ru-pron Lua module.
Rewritten from the Lua module originally written by Wyang and rewritten by Benwing, with additional contributions from Atitarev and a bit from others.
Test cases are modified from Wiktionary ru-pron/testcases.

Examples

>>> ru_text = "счастли́вый" # ru: [[счастли́вый]]
>>> ru_IPA = ru_pron.to_IPA(ru_text)
>>> ru_IPA
"ɕːɪs⁽ʲ⁾ˈlʲivɨj"

IPA.hi_pron.to_IPA(text)

Generates Hindi IPA from spelling.

Implements template {{hi-IPA}}.

Parameters:	text (string) – String of hi-IPA text parsed in {{hi-IPA}} from Wiktionary.
Returns:	Converted Hindi IPA.
Return type:	string

Notes

Partially modified from the Wiktionary hi-IPA Lua module.
Test cases are modified from Wiktionary hi-IPA/testcases.

Examples

>>> hi_text = "मैं" # hi: [[मैं]]
>>> hi_IPA = hi_pron.to_IPA(hi_text)
>>> hi_IPA
"mɛ̃ː"

IPA.es_pron.to_IPA(word, LatinAmerica=False, phonetic=True)

Generates Spanish IPA from spelling.

Implements template {{es-IPA}}.

Parameters:
word (string) – String of es-IPA text parsed in {{es-IPA}} from Wiktionary.
LatinAmerica (bool) – Value of `|LatinAmerica=` parameter parsed in {{es-IPA}}.
phonetic (bool) – Value of `|phonetic=` parameter parsed in {{es-IPA}}.
Returns:	Converted Spanish IPA.
Return type:	string

Notes

Partially modified from the Wiktionary es-pronunc Lua module.
Test cases are modified from Wiktionary es-pronunc/testcases.

Examples

>>> es_text = "baca" # es: [[baca]]
>>> es_IPA = es_pron.to_IPA(es_text)
>>> es_IPA
"ˈbaka"

IPA.cmn_pron.to_IPA(text, IPA_tone=True)

Generates Mandarin IPA from Pinyin.

Implements the |m= parameter of template {{zh-pron}}.

Parameters:
text (string) – String of `|m=` parameter parsed in {{zh-pron}} from Wiktionary.
IPA_tone (bool) – Whether to add IPA tones to the result.
Returns:	Converted Mandarin IPA.
Return type:	string

Notes

Partially modified from the Wiktionary cmn-pron Lua module.

Examples

>>> cmn_text = "pīnyīn" # zh: [[拼音]]
>>> cmn_IPA = cmn_pron.to_IPA(cmn_text)
>>> cmn_IPA
"pʰin⁵⁵ in⁵⁵"
