pywiktionary API
The library provides classes that third-party tools can use.
Wiktionary Class
class pywiktionary.Wiktionary(lang=None, XSAMPA=False)
Wiktionary class for IPA extraction from an XML dump or the MediaWiki API.
To extract IPA for a certain language, specify the lang parameter; by default, IPA is extracted for all available languages. To convert IPA text to X-SAMPA text, use the XSAMPA parameter.
Parameters:
- lang (string) – String of language type.
- XSAMPA (boolean) – Option for IPA to X-SAMPA conversion.
extract_IPA(dump_file)
Extract IPA list from Wiktionary XML dump.
Parameters: dump_file (string) – Path of Wiktionary XML dump file.
Returns: List of extracted IPA results in {"id": "", "title": "", "pronunciation": ""} format.
Return type: list
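As a minimal sketch of how the list returned by extract_IPA might be consumed, the snippet below indexes records by entry title. The helper name index_by_title and the sample records are hypothetical; only the three keys come from the format described above, and the pronunciation value is treated as opaque.

```python
def index_by_title(records):
    """Map each entry title to its pronunciation value.

    `records` is assumed to be a list of dicts carrying the "id", "title"
    and "pronunciation" keys. If a title occurs more than once, the last
    record wins.
    """
    return {rec["title"]: rec["pronunciation"] for rec in records}


# Hypothetical sample data standing in for the result of extract_IPA(dump_file).
sample = [
    {"id": 1, "title": "change", "pronunciation": "/t͡ʃeɪnd͡ʒ/"},
    {"id": 2, "title": "baca", "pronunciation": "ˈbaka"},
]

index = index_by_title(sample)
print(index["baca"])
```

A dict keyed by title makes repeated lookups cheap once a large dump has been processed.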
get_entry_pronunciation(wiki_text, title=None)
Extract IPA for an entry in Wiktionary XML dump.
Parameters:
- wiki_text (string) – String of XML entry wiki text.
- title (string) – String of wiki entry title.
Returns: Dict of word’s IPA results. Key: language name; Value: list of IPA text.
Return type: dict
lookup(word)
Look up IPA of word through Wiktionary API.
Parameters: word (string) – String of a word to be looked up.
Returns: Dict of word’s IPA results. Key: language name; Value: list of IPA text.
Return type: dict
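The dict described above (language name mapped to a list of pronunciations) can be post-processed with plain Python. The sketch below keeps the first pronunciation per language. The helper first_pronunciations and the stand-in fake_lookup are hypothetical and return made-up data; nothing here calls the Wiktionary API.

```python
def first_pronunciations(lookup, word):
    """Return the first listed pronunciation for each language.

    `lookup` is any callable that takes a word and returns a dict mapping
    a language name to a list of pronunciations. Languages with an empty
    list are skipped.
    """
    result = lookup(word)
    return {lang: prons[0] for lang, prons in result.items() if prons}


def fake_lookup(word):
    # Hypothetical stand-in for a real lookup; the data is made up.
    return {"English": ["/t͡ʃeɪnd͡ʒ/"], "French": []}


picked = first_pronunciations(fake_lookup, "change")
print(picked)
```

With the library installed, passing the bound method Wiktionary().lookup in place of fake_lookup should behave the same way, assuming it returns a dict of the shape documented above.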
Parser Class
class pywiktionary.Parser(lang=None, XSAMPA=False)
Wiktionary parser to extract IPA text from pronunciation section.
To extract IPA for a certain language, specify the lang parameter; by default, IPA is extracted for all available languages. To convert IPA text to X-SAMPA text, use the XSAMPA parameter.
Parameters:
- lang (string) – String of language type.
- XSAMPA (boolean) – Option for IPA to X-SAMPA conversion.
expand_template(text)
Expand an IPA template through the Wiktionary API.
Used to expand {{*-IPA}} templates in the parser and return an IPA list.
Parameters: text (string) – String of template text inside “{{” and “}}”.
Returns: List of expanded IPA text.
Return type: list of string
Examples
>>> from pywiktionary import Parser
>>> parser = Parser()
>>> template = "{{la-IPA|eccl=yes|thēsaurus}}"
>>> parser.expand_template(template)
['/tʰeːˈsau̯.rus/', '[tʰeːˈsau̯.rʊs]', '/teˈsau̯.rus/']
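To make the structure of a {{*-IPA}} template concrete, here is a minimal sketch that splits template text into its name, positional arguments, and named arguments. The helper split_template is hypothetical and is not part of pywiktionary; it assumes a flat template and does not handle nested templates or wiki links containing pipes.

```python
def split_template(template):
    """Split '{{name|arg|key=value}}' into (name, positional, named)."""
    inner = template.strip()
    # Accept the text with or without the surrounding braces.
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]
    name, *parts = inner.split("|")
    positional, named = [], {}
    for part in parts:
        key, sep, value = part.partition("=")
        if sep:
            # "key=value" form, e.g. "eccl=yes"
            named[key.strip()] = value.strip()
        else:
            positional.append(part.strip())
    return name.strip(), positional, named


name, positional, named = split_template("{{la-IPA|eccl=yes|thēsaurus}}")
print(name, positional, named)
```

The named arguments recovered this way correspond to the |key= parameters that the per-language converters in this reference accept, such as |pos=.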
parse(wiki_text, title=None)
Parse Wiktionary wiki text.
Split Wiktionary wiki text into different languages and return the parsed IPA result.
Parameters:
- wiki_text (string) – String of Wiktionary wiki text, from XML dump or Wiktionary API.
- title (string) – String of wiki entry title.
Returns: Dict of parsed IPA results. Key: language name; Value: list of IPA text.
Return type: dict
parse_detail(wiki_text, depth=3)
Parse the section of a certain language in wiki text.
Recursively parse the pronunciation section of that language.
Parameters:
- wiki_text (string) – String of wiki text in a language section.
- depth (int) – Integer indicating the depth of the pronunciation section.
Returns: List of extracted IPA text in {"IPA": "", "X-SAMPA": "", "lang": ""} format.
Return type: list of dict
parse_pronunciation(wiki_text)
Parse pronunciation section in wiki text.
Parse IPA text from pronunciation section and convert to X-SAMPA.
Parameters: wiki_text (string) – String of pronunciation section in wiki text.
Returns: List of extracted IPA text in {"IPA": "", "X-SAMPA": "", "lang": ""} format.
Return type: list of dict
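Records in the {"IPA": "", "X-SAMPA": "", "lang": ""} format are easy to regroup. The following sketch collects IPA strings per lang value. The helper group_ipa_by_lang and the sample records are hypothetical; in particular, the lang values "en" and "es" are assumptions about what that field holds.

```python
from collections import defaultdict


def group_ipa_by_lang(records):
    """Group the "IPA" field of each record under its "lang" value."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["lang"]].append(rec["IPA"])
    # Return a plain dict so missing keys raise KeyError as usual.
    return dict(grouped)


# Hypothetical sample data in the documented record format.
records = [
    {"IPA": "/t͡ʃeɪnd͡ʒ/", "X-SAMPA": "/t__SeInd__Z/", "lang": "en"},
    {"IPA": "ˈbaka", "X-SAMPA": '"baka', "lang": "es"},
]

grouped = group_ipa_by_lang(records)
print(grouped)
```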
Utilities
IPA and X-SAMPA related variables and functions. Partially modified from the https://en.wiktionary.org/wiki/Module:IPA Lua module.
IPA.IPA.IPA_to_CMUBET(text)
Convert IPA to CMUBET for US English.
Uses the IPA symbol set used in Wiktionary and the CMUBET symbol set used in CMUDict.
Parameters: text (string) – String of IPA text parsed from Wiktionary.
Returns: Converted CMUBET text.
Return type: string
IPA.IPA.IPA_to_XSAMPA(text)
Convert IPA to X-SAMPA.
Uses the IPA and X-SAMPA symbol sets used in Wiktionary.
Parameters: text (string) – String of IPA text parsed from Wiktionary.
Returns: Converted X-SAMPA text.
Return type: string
Notes
- Use _j for palatalized instead of '
- Use = for syllabic instead of _=
- Use ~ for nasalization instead of _~
- Please refer to IPA <-> X-SAMPA Symbol Set for more details.
Examples
>>> IPA_text = "/t͡ʃeɪnd͡ʒ/" # en: [[change]]
>>> XSAMPA_text = IPA_to_XSAMPA(IPA_text)
>>> XSAMPA_text
"/t__SeInd__Z/"
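The three conventions in the notes above can be illustrated with a small table-driven sketch. This is not the library's implementation: the table SKETCH_TABLE and the function ipa_to_xsampa_sketch are hypothetical, cover only a handful of symbols, and map one code point at a time, so multi-character IPA sequences are out of scope.

```python
import unicodedata

# Illustrative subset only; a real conversion table is far larger.
SKETCH_TABLE = {
    "\u0283": "S",   # ʃ
    "\u0292": "Z",   # ʒ
    "\u026a": "I",   # ɪ
    "\u025b": "E",   # ɛ
    "\u02b2": "_j",  # ʲ palatalized, written _j rather than '
    "\u0329": "=",   # combining syllabic mark, written = rather than _=
    "\u0303": "~",   # combining tilde (nasalization), written ~ rather than _~
}


def ipa_to_xsampa_sketch(text):
    """Convert IPA to X-SAMPA for the few symbols in SKETCH_TABLE."""
    # NFD splits precomposed letters such as "ã" into base letter plus
    # combining mark, so the diacritic can be mapped on its own.
    decomposed = unicodedata.normalize("NFD", text)
    # Symbols missing from the table pass through unchanged.
    return "".join(SKETCH_TABLE.get(ch, ch) for ch in decomposed)


print(ipa_to_xsampa_sketch("t\u02b2"))       # palatalized t
print(ipa_to_xsampa_sketch("n\u0329"))       # syllabic n
print(ipa_to_xsampa_sketch("\u025b\u0303"))  # nasalized ɛ
```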
Convert spelling text in {{*-IPA}} to IPA pronunciation.
Most of these converters are modified from Wiktionary Lua modules.
IPA.fr_pron.to_IPA(text, pos='')
Generates French IPA from spelling.
Implements template {{fr-IPA}}.
Parameters:
- text (string) – String of fr-IPA text parsed in {{fr-IPA}} from Wiktionary.
- pos (string) – String of |pos= parameter parsed in {{fr-IPA}}.
Returns: Converted French IPA.
Return type: string
Notes
- Partially modified from the Wiktionary fr-pron Lua module.
- Rewritten from the Lua module, which was rewritten by Benwing from the original by Kc kennylau.
- Testcases are modified from Wiktionary fr-pron/testcases.
Examples
>>> fr_text = "hæmorrhagie" # fr: [[hæmorrhagie]]
>>> fr_IPA = fr_pron.to_IPA(fr_text)
>>> fr_IPA
"e.mɔ.ʁa.ʒi"
IPA.ru_pron.to_IPA(text, adj='', gem='', bracket='', pos='')
Generates Russian IPA from spelling.
Implements template {{ru-IPA}}.
Parameters:
- text (string) – String of ru-IPA text parsed in {{ru-IPA}} from Wiktionary.
- adj (string) – String of |noadj= parameter parsed in {{ru-IPA}}.
- gem (string) – String of |gem= parameter parsed in {{ru-IPA}}.
- bracket (string) – String of |bracket= parameter parsed in {{ru-IPA}}.
- pos (string) – String of |pos= parameter parsed in {{ru-IPA}}.
Returns: Converted Russian IPA.
Return type: string
Notes
- Partially modified from the Wiktionary ru-pron Lua module.
- Rewritten from the Lua module, which was originally written by Wyang and rewritten by Benwing, with additional contributions from Atitarev and a bit from others.
- Testcases are modified from Wiktionary ru-pron/testcases.
Examples
>>> ru_text = "счастли́вый" # ru: [[счастли́вый]]
>>> ru_IPA = ru_pron.to_IPA(ru_text)
>>> ru_IPA
"ɕːɪs⁽ʲ⁾ˈlʲivɨj"
IPA.hi_pron.to_IPA(text)
Generates Hindi IPA from spelling.
Implements template {{hi-IPA}}.
Parameters: text (string) – String of hi-IPA text parsed in {{hi-IPA}} from Wiktionary.
Returns: Converted Hindi IPA.
Return type: string
Notes
- Partially modified from the Wiktionary hi-IPA Lua module.
- Testcases are modified from Wiktionary hi-IPA/testcases.
Examples
>>> hi_text = "मैं" # hi: [[मैं]]
>>> hi_IPA = hi_pron.to_IPA(hi_text)
>>> hi_IPA
"mɛ̃ː"
IPA.es_pron.to_IPA(word, LatinAmerica=False, phonetic=True)
Generates Spanish IPA from spelling.
Implements template {{es-IPA}}.
Parameters:
- word (string) – String of es-IPA text parsed in {{es-IPA}} from Wiktionary.
- LatinAmerica (bool) – Value of |LatinAmerica= parameter parsed in {{es-IPA}}.
- phonetic (bool) – Value of |phonetic= parameter parsed in {{es-IPA}}.
Returns: Converted Spanish IPA.
Return type: string
Notes
- Partially modified from the Wiktionary es-pronunc Lua module.
- Testcases are modified from Wiktionary es-pronunc/testcases.
Examples
>>> es_text = "baca" # es: [[baca]]
>>> es_IPA = es_pron.to_IPA(es_text)
>>> es_IPA
"ˈbaka"
IPA.cmn_pron.to_IPA(text, IPA_tone=True)
Generates Mandarin IPA from Pinyin.
Implements the |m= parameter for template {{zh-pron}}.
Parameters:
- text (string) – String of |m= parameter parsed in {{zh-pron}} from Wiktionary.
- IPA_tone (bool) – Whether to add IPA tones to the result.
Returns: Converted Mandarin IPA.
Return type: string
Notes
- Partially modified from the Wiktionary cmn-pron Lua module.
Examples
>>> cmn_text = "pīnyīn" # zh: [[拼音]]
>>> cmn_IPA = cmn_pron.to_IPA(cmn_text)
>>> cmn_IPA
"pʰin⁵⁵ in⁵⁵"