Patents by Inventor Achraf Chalabi

Achraf Chalabi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Resolving out-of-vocabulary words during machine translation

Patent number: 8990066

Abstract: Some implementations provide techniques and arrangements to perform automated translation from a source language to a target language. For example, an out-of-vocabulary word may be identified and a morphological analysis may be performed to determine whether the out-of-vocabulary word reduces to at least one stem. If the out-of-vocabulary word reduces to a stem, the stem may be translated. The translated stem may be inflected if the out-of-vocabulary word is inflected. If the out-of-vocabulary word has any affixes, the affixes may be translated. In some cases, the translated affixes may be reordered before being combined with the inflected and translated stem. If the out-of-vocabulary word is misspelled, the spelling of the out-of-vocabulary word may be corrected before performing the morphological analysis. If the out-of-vocabulary word is a colloquial form of a formal word, the out-of-vocabulary word may be replaced with the formal word before performing the morphological analysis.

Type: Grant

Filed: January 31, 2012

Date of Patent: March 24, 2015

Assignee: Microsoft Corporation

Inventors: Achraf Chalabi, Ahmed Said Morsy, Hany Awadalla, Mohamed El-Sharqwi, Sayed Hassan
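
The flow described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the Turkish-like lexicon, the suffix tables, and the function names analyze and translate_oov are all assumptions made up for the example, and the English plural rule is deliberately naive.

```python
# All tables are toy data invented for this sketch (Turkish-like source, English target).
SPELLING_FIXES = {"evlerdde": "evlerde"}   # misspelling -> corrected spelling
COLLOQUIAL_TO_FORMAL = {}                  # colloquial -> formal form (left empty here)
STEM_TRANSLATIONS = {"ev": "house", "okul": "school"}
PLURAL_SUFFIXES = ("ler", "lar")
CASE_SUFFIXES = {"den": "from", "dan": "from", "de": "in", "da": "in"}


def analyze(word):
    """Toy morphological analysis: return (stem, is_plural, case_suffix) or None."""
    by_length = sorted(CASE_SUFFIXES, key=len, reverse=True)
    case = next((s for s in by_length if word.endswith(s)), None)
    body = word[: -len(case)] if case else word
    plural = next((s for s in PLURAL_SUFFIXES if body.endswith(s)), None)
    stem = body[: -len(plural)] if plural else body
    if stem in STEM_TRANSLATIONS:
        return stem, plural is not None, case
    return None


def translate_oov(word):
    """Resolve one out-of-vocabulary word, or return it unchanged if no stem is found."""
    # Normalize first: spelling correction, then colloquial-to-formal replacement.
    word = SPELLING_FIXES.get(word, word)
    word = COLLOQUIAL_TO_FORMAL.get(word, word)
    analysis = analyze(word)
    if analysis is None:
        return word
    stem, is_plural, case = analysis
    target = STEM_TRANSLATIONS[stem]          # translate the stem
    if is_plural:
        target += "s"                         # inflect the translated stem (naive rule)
    if case:
        # Translate the affix and reorder it: a source suffix becomes a target preposition.
        target = f"{CASE_SUFFIXES[case]} {target}"
    return target
```

With these toy tables, translate_oov("evlerde") returns "in houses", and the misspelled "evlerdde" resolves to the same result after correction. A word with no recognizable stem is returned unchanged.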

Syntax-based augmentation of statistical machine translation phrase tables

Patent number: 8874433

Abstract: Machine translation phrase table augmentation embodiments are described that employ an automatic syntax-based scheme to produce additional phrase pairs and insert them into a phrase table. One general process implementing this augmentation involves inputting one or more syntactic transfer patterns, and for each pattern synthesizing phrases in a source language of the type associated with the pattern using a source language lexicon. Phrases, such as those not found in a monolingual corpus of the source language, are eliminated from the synthesized phrases. Each of the remaining synthesized phrases is then translated into the target language using the syntactic transfer pattern, a bilingual source-to-target language dictionary, and a morphological synthesizer. Those translated phrases not found in a monolingual corpus of the target language are then eliminated. Phrase pairs made up of a remaining translated phrase and its corresponding source language phrase are then added to the phrase table being augmented.

Type: Grant

Filed: May 20, 2011

Date of Patent: October 28, 2014

Assignee: Microsoft Corporation

Inventors: Achraf Chalabi, Waleed Ammar, Mostafa Ashour
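
The augmentation steps in this abstract can be sketched as a small pipeline. Everything below is an assumption for illustration: the English-to-Spanish lexicon, the two tiny corpora, the single ADJ NOUN pattern, and the names inflect and augment are invented, and the gender-agreement rule stands in for a real morphological synthesizer.

```python
from itertools import product

# Toy data invented for this sketch (English source, Spanish target).
SOURCE_LEXICON = {"ADJ": ["red", "big"], "NOUN": ["car", "house"]}
SOURCE_CORPUS = {"red car", "big house", "red house"}
TARGET_CORPUS = {"coche rojo", "casa roja"}
DICTIONARY = {"red": "rojo", "big": "grande", "car": "coche", "house": "casa"}
FEMININE_NOUNS = {"casa"}

# One syntactic transfer pattern: source ADJ NOUN maps to target NOUN ADJ.
PATTERN = {"source": ("ADJ", "NOUN"), "target_order": (1, 0)}


def inflect(adjective, noun):
    """Toy morphological synthesizer: make the adjective agree with the noun's gender."""
    if noun in FEMININE_NOUNS and adjective.endswith("o"):
        return adjective[:-1] + "a"
    return adjective


def augment(phrase_table):
    """Add synthesized phrase pairs to phrase_table and return it."""
    slots = (SOURCE_LEXICON[tag] for tag in PATTERN["source"])
    for words in product(*slots):                 # synthesize source phrases
        source = " ".join(words)
        if source not in SOURCE_CORPUS:           # drop phrases unseen in the source corpus
            continue
        translated = [DICTIONARY[w] for w in words]
        translated[0] = inflect(translated[0], translated[1])
        target = " ".join(translated[i] for i in PATTERN["target_order"])
        if target not in TARGET_CORPUS:           # drop translations unseen in the target corpus
            continue
        phrase_table.setdefault(source, target)   # keep any existing entry
    return phrase_table
```

Calling augment({}) with this data yields two pairs, "red car" with "coche rojo" and "red house" with "casa roja". The candidate "big car" is removed by the source-corpus filter and "big house" by the target-corpus filter.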

Universal text input

Patent number: 8738356

Abstract: The universal text input technique described herein addresses the difficulties of typing text in various languages and scripts, and offers a unified solution that combines character conversion, next word prediction, spelling correction and automatic script switching to make it extremely simple to type any language from any device. The technique provides a rich and seamless input experience in any language through a universal IME (input method editor). It allows a user to type in any script for any language using a regular QWERTY keyboard via phonetic input and at the same time allows for auto-completion and spelling correction of words and phrases while typing. The technique also provides a modeless input that automatically turns on and off an input mode that changes between different types of script.

Type: Grant

Filed: May 18, 2011

Date of Patent: May 27, 2014

Assignee: Microsoft Corp.

Inventors: Hisami Suzuki, Vikram Dendi, Christopher Brian Quirk, Pallavi Choudhury, Jianfeng Gao, Achraf Chalabi
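
One way to picture phonetic conversion combined with modeless switching and completion is the sketch below. It is only an illustration under stated assumptions: the four-entry romaji table, the small vocabulary, and the rule "convert a token only if all of it maps to kana, otherwise leave it in Latin script" are invented here; the abstract does not say how the technique decides when to switch scripts.

```python
# Toy romaji-to-hiragana table and vocabulary, invented for this sketch.
SYLLABLES = {"ka": "か", "na": "な", "su": "す", "shi": "し"}
VOCABULARY = ["すし", "かな"]


def to_kana(token):
    """Greedy longest-match conversion; return None if any part cannot be converted."""
    out, i = [], 0
    while i < len(token):
        for size in (3, 2, 1):
            chunk = token[i : i + size]
            if chunk in SYLLABLES:
                out.append(SYLLABLES[chunk])
                i += size
                break
        else:
            return None  # no syllable matched at position i
    return "".join(out)


def convert_line(text):
    """Assumed modeless rule: convert tokens that fully map to kana, keep the rest as typed."""
    return " ".join(to_kana(token) or token for token in text.split())


def complete(prefix):
    """Suggest vocabulary entries that start with the converted prefix."""
    return [word for word in VOCABULARY if word.startswith(prefix)]
```

For example, convert_line("sushi bar") gives "すし bar": the first token converts fully, while "bar" has no match in the table and stays in Latin script without the user changing modes. complete("す") suggests "すし".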

Transliterating Semitic languages including diacritics

Patent number: 8612206

Abstract: The present disclosure describes a system and method of transliterating Semitic languages with support for diacritics. An input module receives and pre-processes Romanized characters and forwards the pre-processed Romanized characters to a transliteration engine. The transliteration engine selects candidate transliteration rules, applies the rules, and scores and ranks the results for output. To optimize the search for candidate transliteration rules, the transliteration engine may apply word-stemming strategies to process inflections indicated by affixes. The present disclosure further describes optimizations such as pre-processing emphasis text, caching, dynamic transliteration rule pruning, and buffering/throttling input. The system and methods are suitable for multiple applications including but not limited to web applications, Windows applications, client-server applications and input method editors such as those via the Microsoft Text Services Framework (TSF™).

Type: Grant

Filed: December 8, 2009

Date of Patent: December 17, 2013

Assignee: Microsoft Corporation

Inventors: Achraf Chalabi, Hany Grees, Mostafa Ashour, Roaa Mohammed
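
The select, apply, score and rank loop in this abstract can be illustrated with a very small rule table. This is a sketch under assumptions, not the engine described in the patent: the rules, their weights, and the function name transliterate are invented, scoring is a plain product of rule weights, and none of the stemming, caching, or pruning optimizations are modeled.

```python
from itertools import product
from math import prod

KAF, TEH, BEH = "\u0643", "\u062A", "\u0628"   # Arabic letters kaf, teh, beh
FATHA, ALEF = "\u064E", "\u0627"               # short-vowel diacritic, long-vowel letter

# Toy rules invented for this sketch: Roman character -> [(output, weight), ...].
RULES = {
    "k": [(KAF, 1.0)],
    "t": [(TEH, 1.0)],
    "b": [(BEH, 1.0)],
    "a": [(FATHA, 0.7), (ALEF, 0.3)],          # "a" may be a diacritic or a long vowel
}


def transliterate(word, top_k=3):
    """Return up to top_k (candidate, score) pairs, best first; [] if a character has no rule."""
    try:
        options = [RULES[ch] for ch in word]   # select candidate rules per character
    except KeyError:
        return []
    scored = [
        ("".join(out for out, _ in combo), prod(w for _, w in combo))
        for combo in product(*options)         # apply every combination of rules
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

For the input "kataba", the top-ranked candidate under these weights is the fully diacritized form with a fatha after each consonant; candidates that use alef for one or more vowels rank lower.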

Resolving out-of-vocabulary words during machine translation

Publication number: 20130197896

Abstract: Some implementations provide techniques and arrangements to perform automated translation from a source language to a target language. For example, an out-of-vocabulary word may be identified and a morphological analysis may be performed to determine whether the out-of-vocabulary word reduces to at least one stem. If the out-of-vocabulary word reduces to a stem, the stem may be translated. The translated stem may be inflected if the out-of-vocabulary word is inflected. If the out-of-vocabulary word has any affixes, the affixes may be translated. In some cases, the translated affixes may be reordered before being combined with the inflected and translated stem. If the out-of-vocabulary word is misspelled, the spelling of the out-of-vocabulary word may be corrected before performing the morphological analysis. If the out-of-vocabulary word is a colloquial form of a formal word, the out-of-vocabulary word may be replaced with the formal word before performing the morphological analysis.

Type: Application

Filed: January 31, 2012

Publication date: August 1, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Achraf Chalabi, Ahmed Said Morsy, Hany Awadalla, Mohamed El-Sharqwi, Sayed Hassan

Universal text input

Publication number: 20120296627

Abstract: The universal text input technique described herein addresses the difficulties of typing text in various languages and scripts, and offers a unified solution that combines character conversion, next word prediction, spelling correction and automatic script switching to make it extremely simple to type any language from any device. The technique provides a rich and seamless input experience in any language through a universal IME (input method editor). It allows a user to type in any script for any language using a regular QWERTY keyboard via phonetic input and at the same time allows for auto-completion and spelling correction of words and phrases while typing. The technique also provides a modeless input that automatically turns on and off an input mode that changes between different types of script.

Type: Application

Filed: May 18, 2011

Publication date: November 22, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Hisami Suzuki, Vikram Dendi, Christopher Brian Quirk, Pallavi Choudhury, Jianfeng Gao, Achraf Chalabi

Syntax-based augmentation of statistical machine translation phrase tables

Publication number: 20120296633

Abstract: Machine translation phrase table augmentation embodiments are described that employ an automatic syntax-based scheme to produce additional phrase pairs and insert them into a phrase table. One general process implementing this augmentation involves inputting one or more syntactic transfer patterns, and for each pattern synthesizing phrases in a source language of the type associated with the pattern using a source language lexicon. Phrases, such as those not found in a monolingual corpus of the source language, are eliminated from the synthesized phrases. Each of the remaining synthesized phrases is then translated into the target language using the syntactic transfer pattern, a bilingual source-to-target language dictionary, and a morphological synthesizer. Those translated phrases not found in a monolingual corpus of the target language are then eliminated. Phrase pairs made up of a remaining translated phrase and its corresponding source language phrase are then added to the phrase table being augmented.

Type: Application

Filed: May 20, 2011

Publication date: November 22, 2012

Applicant: Microsoft Corporation

Inventors: Achraf Chalabi, Waleed Ammar, Mostafa Ashour

Transliterating Semitic languages including diacritics

Publication number: 20110137635

Abstract: The present disclosure describes a system and method of transliterating Semitic languages with support for diacritics. An input module receives and pre-processes Romanized characters and forwards the pre-processed Romanized characters to a transliteration engine. The transliteration engine selects candidate transliteration rules, applies the rules, and scores and ranks the results for output. To optimize the search for candidate transliteration rules, the transliteration engine may apply word-stemming strategies to process inflections indicated by affixes. The present disclosure further describes optimizations such as pre-processing emphasis text, caching, dynamic transliteration rule pruning, and buffering/throttling input. The system and methods are suitable for multiple applications including but not limited to web applications, Windows applications, client-server applications and input method editors such as those via the Microsoft Text Services Framework (TSF™).

Type: Application

Filed: December 8, 2009

Publication date: June 9, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Achraf Chalabi, Hany Grees, Mostafa Ashour, Roaa Mohammed

Method and system for theme-based word sense ambiguity reduction

Patent number: 7184948

Abstract: Word sense ambiguity reduction, for “thematic” words in a sentence, is achieved based on thematic prediction. The senses of “thematic” words are disambiguated in a sentence by determining and weighting possible themes for that sentence. Possible themes are determined for that sentence based on thematic information associated with the different senses of each word in the sentence. A highly deterministic thematic-based word sense disambiguation method is used to preprocess the sentence prior to further syntactic and semantic analysis, thereby enhancing accuracy and decreasing the demand for computational resources (memory and CPU) by reducing input ambiguities.

Type: Grant

Filed: June 15, 2001

Date of Patent: February 27, 2007

Assignee: Sakhr Software Company

Inventor: Achraf Chalabi
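
The idea of weighting candidate themes for a sentence and then choosing word senses accordingly can be sketched as follows. The sense inventory, the theme labels, and the weighting rule (each word splits one vote evenly across the themes of its senses) are assumptions invented for this illustration; the abstract does not specify how themes are weighted.

```python
from collections import Counter

# Toy sense inventory invented for this sketch: word -> {sense: theme}.
SENSES = {
    "bank": {"financial institution": "finance", "river edge": "geography"},
    "interest": {"charge on a loan": "finance", "curiosity": "psychology"},
    "loan": {"borrowed money": "finance"},
}


def theme_weights(words):
    """Assumed weighting: every word spreads one vote evenly over its senses' themes."""
    weights = Counter()
    for word in words:
        senses = SENSES[word]
        for theme in senses.values():
            weights[theme] += 1 / len(senses)
    return weights


def disambiguate(sentence):
    """Pick, for each known word, the sense whose theme has the highest sentence weight."""
    words = [w for w in sentence.lower().split() if w in SENSES]
    weights = theme_weights(words)
    return {
        word: max(SENSES[word], key=lambda sense: weights[SENSES[word][sense]])
        for word in words
    }
```

For "The bank charged interest on the loan", the finance theme collects the most weight, so "bank" resolves to "financial institution" and "interest" to "charge on a loan" before any syntactic or semantic analysis takes place.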

Method and system for theme-based word sense ambiguity reduction

Publication number: 20030028367

Abstract: Word sense ambiguity reduction, for “thematic” words in a sentence, is achieved based on thematic prediction. The senses of “thematic” words are disambiguated in a sentence by determining and weighting possible themes for that sentence. Possible themes are determined for that sentence based on thematic information associated with the different senses of each word in the sentence. A highly deterministic thematic-based word sense disambiguation method is used to preprocess the sentence prior to further syntactic and semantic analysis, thereby enhancing accuracy and decreasing the demand for computational resources (memory and CPU) by reducing input ambiguities.

Type: Application

Filed: June 15, 2001

Publication date: February 6, 2003

Inventor: Achraf Chalabi