Patents by Inventor Hongtao Dai
Hongtao Dai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12153624
Abstract: Ideogram character analysis includes partitioning an original ideogram character into strokes and mapping each stroke to a corresponding stroke identifier (id) to create an original stroke id sequence that includes stroke identifiers. A candidate ideogram character that has a candidate stroke id sequence within a threshold distance to the original stroke id sequence is selected. One or more embodiments may create a new phrase by replacing the original ideogram character with the candidate ideogram character in a search phrase. One or more embodiments perform a search using the search phrase and the new phrase to obtain a result and present the result. One or more embodiments may replace an original ideogram character in a character recognized document with the candidate ideogram character and store the character recognized document.
Type: Grant
Filed: April 4, 2022
Date of Patent: November 26, 2024
Assignee: OPEN TEXT CORPORATION
Inventors: Chao Chen, Kunwu Huang, Hongtao Dai, Jingjing Liu
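The stroke-sequence matching described in this abstract can be illustrated with a minimal sketch. It assumes a tiny, hypothetical stroke table (`STROKE_IDS`, with illustrative ids 1 = horizontal, 2 = vertical, 3 = left-falling, 4 = right-falling) and uses Levenshtein edit distance as the "threshold distance"; the patent itself may define both differently.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stroke table; the ids and sequences are illustrative only.
STROKE_IDS: Dict[str, Tuple[int, ...]] = {
    "土": (1, 2, 1),
    "士": (1, 2, 1),
    "王": (1, 1, 2, 1),
    "木": (1, 2, 3, 4),
}


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two stroke id sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def candidates(original: str, threshold: int) -> List[str]:
    """Characters whose stroke sequence is within `threshold` of the original."""
    seq = STROKE_IDS[original]
    return [ch for ch, other in STROKE_IDS.items()
            if ch != original and edit_distance(seq, other) <= threshold]


def expand_phrase(phrase: str, threshold: int = 1) -> List[str]:
    """New phrases formed by swapping one character for a close candidate."""
    variants = []
    for i, ch in enumerate(phrase):
        if ch in STROKE_IDS:
            for cand in candidates(ch, threshold):
                variants.append(phrase[:i] + cand + phrase[i + 1:])
    return variants
```

A search front end could then issue both the original phrase and each variant from `expand_phrase`, which is one way to tolerate look-alike characters produced by OCR.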
-
Patent number: 11763102
Abstract: Embodiments of the present disclosure provide a method and apparatus for processing a multi-language text. According to embodiments of the present disclosure, the multi-language text including contents in a plurality of languages may be encoded with a Unicode. The method further comprises splitting the multi-language text into a plurality of parts based on the Unicode of the multi-language text, contents of the plurality of parts having different languages. In addition, the multi-language text may also be processed based on the plurality of parts.
Type: Grant
Filed: February 26, 2021
Date of Patent: September 19, 2023
Assignee: EMC IP Holding Company, LLC
Inventors: Kun Wu Huang, Winston Lei Zhang, Chao Chen, Jingjing Liu, Duke Hongtao Dai
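As a rough illustration of splitting text by Unicode code point, the sketch below groups characters into runs by script. The code point ranges and the script labels (`han`, `kana`, `latin`) are simplifying assumptions made here; a script is only a proxy for a language, and the patented method may segment differently.

```python
from typing import List, Optional, Tuple


def script_of(ch: str) -> Optional[str]:
    """Coarse script label from the code point; None for neutral characters."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:        # CJK Unified Ideographs
        return "han"
    if 0x3040 <= cp <= 0x30FF:        # Hiragana and Katakana
        return "kana"
    if ch.isalpha() and cp < 0x250:   # Basic Latin through Latin Extended-B
        return "latin"
    return None                        # spaces, digits, punctuation, other


def split_by_script(text: str) -> List[Tuple[Optional[str], str]]:
    """Split text into consecutive runs that share a script label."""
    parts: List[List] = []
    for ch in text:
        script = script_of(ch)
        if not parts:
            parts.append([script, ch])
        elif script is None or script == parts[-1][0]:
            parts[-1][1] += ch          # neutral characters join the current run
        elif parts[-1][0] is None:
            parts[-1][0] = script       # a leading neutral run adopts the first script
            parts[-1][1] += ch
        else:
            parts.append([script, ch])
    return [(script, run) for script, run in parts]
```

Each run could then be handed to a tokenizer suited to that script, which is the kind of per-part processing the abstract mentions.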
-
Patent number: 11500943
Abstract: A method for servicing document search requests. The method includes receiving, by a document management service, a document search query from a requesting user, and injecting, into the document search query, a user access vector. The user access vector specifies, for the requesting user, access control lists that are associated with the requesting user. The method further includes identifying, in a document repository, documents that match the document search query with the injected user access vector. A matching document requires a match of terms in the search query with terms in the matching document, and a match of at least one access control list specified in the matching document and at least one of the access control lists specified in the user access vector.
Type: Grant
Filed: November 15, 2019
Date of Patent: November 15, 2022
Assignee: EMC IP HOLDING COMPANY LLC
Inventors: Chao Chen, Jingjing Liu, Lei Zhang, Kunwu Huang, Hongtao Dai, Ying Teng
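The matching rule in this abstract (query terms must match, and the document's access control lists must overlap the user's) can be sketched as follows. The `Document` and `Query` types and the function names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    terms: FrozenSet[str]
    acls: FrozenSet[str]  # access control lists attached to the document


@dataclass(frozen=True)
class Query:
    terms: FrozenSet[str]
    access_vector: FrozenSet[str] = frozenset()


def inject_access_vector(query: Query, user_acls: Iterable[str]) -> Query:
    """Return a copy of the query carrying the requesting user's ACLs."""
    return Query(query.terms, frozenset(user_acls))


def search(repository: Iterable[Document], query: Query) -> List[str]:
    """A document matches when all query terms are present and at least
    one of its ACLs also appears in the injected access vector."""
    return [doc.doc_id for doc in repository
            if query.terms <= doc.terms and doc.acls & query.access_vector]
```

Folding the access check into the query means unauthorized documents never enter the result set, rather than being filtered out afterwards.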
-
Publication number: 20220222292
Abstract: Ideogram character analysis includes partitioning an original ideogram character into strokes and mapping each stroke to a corresponding stroke identifier (id) to create an original stroke id sequence that includes stroke identifiers. A candidate ideogram character that has a candidate stroke id sequence within a threshold distance to the original stroke id sequence is selected. One or more embodiments may create a new phrase by replacing the original ideogram character with the candidate ideogram character in a search phrase. One or more embodiments perform a search using the search phrase and the new phrase to obtain a result and present the result. One or more embodiments may replace an original ideogram character in a character recognized document with the candidate ideogram character and store the character recognized document.
Type: Application
Filed: April 4, 2022
Publication date: July 14, 2022
Inventors: Chao Chen, Kunwu Huang, Hongtao Dai, Jingjing Liu
-
Patent number: 11321384
Abstract: Ideogram character analysis includes partitioning an original ideogram character into strokes, and mapping each stroke to a corresponding stroke identifier (id) to create an original stroke id sequence that includes stroke identifiers. A candidate ideogram character that has a candidate stroke id sequence within a threshold distance to the original stroke id sequence is selected. One or more embodiments may create a new phrase by replacing the original ideogram character with the candidate ideogram character in a search phrase. One or more embodiments perform a search using the search phrase and the new phrase to obtain a result, and present the result. One or more embodiments may replace an original ideogram character in a character recognized document with the candidate ideogram character and store the character recognized document.
Type: Grant
Filed: September 30, 2015
Date of Patent: May 3, 2022
Assignee: OPEN TEXT CORPORATION
Inventors: Chao Chen, Kunwu Huang, Hongtao Dai, Jingjing Liu
-
Patent number: 11256691
Abstract: In general, in one aspect, the invention relates to a method for servicing requests. The method includes receiving, from a client system, a request comprising a query, where the query includes a first plurality of terms. The method further includes generating, using a thesaurus library, a related query including a second plurality of terms, where at least one term in the second plurality of terms is present in the first plurality of terms. The method further includes issuing the query to a content repository to obtain a first result, issuing the related query to the content repository to obtain a second result, processing the first result and the second result to generate a final result, and providing the final result to the client system.
Type: Grant
Filed: June 29, 2016
Date of Patent: February 22, 2022
Assignee: EMC Corporation
Inventors: Kunwu Huang, Lei Zhang, Chao Chen, Jingjing Liu, Hongtao Dai, Ying Teng
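A minimal sketch of the query-expansion flow in this abstract is shown below. The `THESAURUS` mapping, the in-memory repository, and the merge rule (original results first, then new results from the related query) are all assumptions made for illustration and are not taken from the patent.

```python
from typing import Dict, List, Optional, Set

# Hypothetical thesaurus library: term -> one synonym.
THESAURUS: Dict[str, str] = {"car": "automobile"}


def related_query(terms: List[str]) -> Optional[List[str]]:
    """Swap exactly one term for a synonym so at least one original term remains."""
    if len(terms) < 2:
        return None
    for i, term in enumerate(terms):
        if term in THESAURUS:
            return terms[:i] + [THESAURUS[term]] + terms[i + 1:]
    return None


def run_query(terms: List[str], repository: Dict[str, Set[str]]) -> List[str]:
    """Ids of documents that contain every query term."""
    return [doc_id for doc_id, doc_terms in repository.items()
            if set(terms) <= doc_terms]


def service_request(terms: List[str], repository: Dict[str, Set[str]]) -> List[str]:
    """Issue the query and its related query, then merge without duplicates."""
    first = run_query(terms, repository)
    related = related_query(terms)
    second = run_query(related, repository) if related else []
    final = list(first)
    for doc_id in second:
        if doc_id not in final:
            final.append(doc_id)
    return final
```

Ranking original matches ahead of synonym matches is only one possible way to "process" the two results; a production system would more likely blend relevance scores.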
-
Patent number: 11068536
Abstract: Embodiments of the present disclosure relate to a method and apparatus for managing a document index. The method comprises determining an independently updatable field in a plurality of documents, the independently updatable field comprising at least one item. The method further comprises creating an index for an item in the independently updatable field, the index containing an identifier of a document comprising the item, the document being included in the plurality of documents. Furthermore, the method further comprises storing the identifier of the document in blocks such that the index is updatable without modifying the identifier of the document.
Type: Grant
Filed: June 22, 2017
Date of Patent: July 20, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Kun Wu Huang, Winston Lei Zhang, Chao Chen, Jingjing Liu, Duke Hongtao Dai
-
Patent number: 11048763
Abstract: Techniques for searching a character string involve: determining a first set of documents including a first token in the character string, and a second set of documents including a second token in the character string; and generating a third set of documents based on the first and second sets of documents, in the third set of documents: i) a document being included in the first and second sets of documents, and ii) a distance between the first and second tokens in the document being equal to a distance between the first and second tokens in the character string.
Type: Grant
Filed: December 31, 2019
Date of Patent: June 29, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Duke Hongtao Dai, Winston Lei Zhang, Chao Chen, Kun Wu Huang, Jingjing Liu
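The two conditions in this abstract (the document contains both tokens, and the tokens sit the same distance apart as in the search string) can be sketched with a small positional index. Whitespace tokenization and the function names below are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

PositionalIndex = Dict[str, Dict[int, List[int]]]


def build_index(docs: Dict[int, str]) -> PositionalIndex:
    """Map each token to {doc_id: [positions]} using whitespace tokenization."""
    index: PositionalIndex = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.split()):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def search_pair(index: PositionalIndex, first: str, second: str,
                distance: int) -> List[int]:
    """Documents containing both tokens with `second` exactly `distance`
    positions after `first`, as in the search string."""
    first_docs = index.get(first, {})
    second_docs = index.get(second, {})
    hits = []
    # Condition i): the document is in both per-token document sets.
    for doc_id in sorted(first_docs.keys() & second_docs.keys()):
        second_positions = set(second_docs[doc_id])
        # Condition ii): the in-document distance equals the query distance.
        if any(pos + distance in second_positions for pos in first_docs[doc_id]):
            hits.append(doc_id)
    return hits
```

For the search string "new york", the tokens are one position apart, so `search_pair(index, "new", "york", 1)` keeps only documents where "york" directly follows "new".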
-
Publication number: 20210182506
Abstract: Embodiments of the present disclosure provide a method and apparatus for processing a multi-language text. According to embodiments of the present disclosure, the multi-language text including contents in a plurality of languages may be encoded with a Unicode. The method further comprises splitting the multi-language text into a plurality of parts based on the Unicode of the multi-language text, contents of the plurality of parts having different languages. In addition, the multi-language text may also be processed based on the plurality of parts.
Type: Application
Filed: February 26, 2021
Publication date: June 17, 2021
Inventors: Kun Wu Huang, Winston Lei Zhang, Chao Chen, Jingjing Liu, Duke Hongtao Dai
-
Patent number: 10936829
Abstract: Embodiments of the present disclosure provide a method and apparatus for processing a multi-language text. According to embodiments of the present disclosure, the multi-language text including contents in a plurality of languages may be encoded with a Unicode. The method further comprises splitting the multi-language text into a plurality of parts based on the Unicode of the multi-language text, contents of the plurality of parts having different languages. In addition, the multi-language text may also be processed based on the plurality of parts.
Type: Grant
Filed: June 21, 2017
Date of Patent: March 2, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Kun Wu Huang, Winston Lei Zhang, Chao Chen, Jingjing Liu, Duke Hongtao Dai
-
Patent number: 10860590
Abstract: The present disclosure provides method and apparatus of information processing. The method comprises: in response to a request of a first user for first information, searching a database to obtain second information; determining a first relevance between a second user associated with the second information and the first user; determining a second relevance between the second information and the first information based on the first relevance; and presenting the second information to the first user based at least in part on the second relevance.
Type: Grant
Filed: April 17, 2018
Date of Patent: December 8, 2020
Assignee: EMC IP Holding Corporation LLC
Inventors: Duke Hongtao Dai, Winston Lei Zhang, Kun Wu (Sheperd) Huang, Charlie Chao Chen, Jingjing Liu
-
Patent number: 10762139
Abstract: A method for managing a document search index. The method includes obtaining indexing terms for documents in a document repository to generate search index fragments, storing the search index fragments in a document search index, and constructing a hierarchical structure from a first set of stored search index fragments. The selected first set of search index fragments is arranged by size, in the hierarchical structure. The method further includes selecting, based on a minimal size, a second set of stored search index fragments from the hierarchical structure, merging the second set of stored search index fragments to obtain a larger search index fragment, storing the larger search index fragment in the document search index, and serving at least one search request using the larger search index fragment.
Type: Grant
Filed: September 29, 2016
Date of Patent: September 1, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Hongtao Dai, Lei Zhang, Chao Chen, Kunwu Huang, Jingjing Liu, Ying Teng
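One way to picture the fragment merging in this abstract is the sketch below. It stands in a binary min-heap for the size-ordered hierarchical structure and treats a fragment as a mapping from term to sorted document ids; both choices, and the `max_size` cutoff used to pick the small fragments, are assumptions made here rather than details from the patent.

```python
import heapq
from typing import Dict, List

Fragment = Dict[str, List[int]]  # term -> sorted document ids


def fragment_size(fragment: Fragment) -> int:
    """Size measured as the total number of postings in the fragment."""
    return sum(len(ids) for ids in fragment.values())


def merge_fragments(fragments: List[Fragment]) -> Fragment:
    """Union the postings of several fragments into one larger fragment."""
    merged: Dict[str, set] = {}
    for fragment in fragments:
        for term, ids in fragment.items():
            merged.setdefault(term, set()).update(ids)
    return {term: sorted(ids) for term, ids in merged.items()}


def compact(fragments: List[Fragment], max_size: int) -> List[Fragment]:
    """Merge every fragment no larger than `max_size` into a single fragment."""
    # The index i breaks ties so the dicts themselves are never compared.
    heap = [(fragment_size(f), i, f) for i, f in enumerate(fragments)]
    heapq.heapify(heap)
    small = []
    while heap and heap[0][0] <= max_size:
        small.append(heapq.heappop(heap)[2])
    if len(small) < 2:
        return list(fragments)  # nothing worth merging
    return [merge_fragments(small)] + [f for _, _, f in sorted(heap, key=lambda e: e[:2])]
```

Merging the smallest fragments first keeps the number of fragments a search must consult low while limiting how much data each merge rewrites.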
-
Patent number: 10713305
Abstract: A method for document search in a structured document repository. The method includes obtaining a document search query from a client, obtaining location constraints for documents to be identified in a structured document repository based on the document search query, identifying, in a document search index associated with the structured document repository, a document that matches the search query and the location constraints, and providing information associated with the identified document to the client.
Type: Grant
Filed: September 29, 2016
Date of Patent: July 14, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Hongtao Dai, Lei Zhang, Chao Chen, Kunwu Huang, Jingjing Liu, Ying Teng
-
Patent number: 10691757
Abstract: A method for servicing document search requests. The method includes receiving, by a document management service, a document search query from a requesting user, identifying, in a document repository, by the document management service, a document that matches the search query, and obtaining a permission level by the document management service, from an access control cache, based on a combination of the requesting user and an access control list required by the document. The access control cache is located on the document management service, and the access control cache is populated using content in an access control repository located on a repository server, separate from the document management service. The method further includes making a determination that the permission level is sufficient and based on the determination, returning the document to the requesting user, as a search result.
Type: Grant
Filed: September 29, 2016
Date of Patent: June 23, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Chao Chen, Jingjing Liu, Lei Zhang, Kunwu Huang, Hongtao Dai, Ying Teng
-
Patent number: 10671652
Abstract: Embodiments of the present disclosure generally relate to a method and device for creating an index. For example, the embodiments of the present disclosure propose a method for creating an index, comprising: dividing a document into a plurality of regions; determining the number of times that a token appears in the plurality of regions, the token including at least one character in the document; assigning respective weights to the plurality of regions; and creating an inverted document linked list directed to the token based on the number of times that the token appears in the plurality of regions and respective weights of the plurality of regions. In addition, the embodiments of the present disclosure propose a corresponding device and computer program product for creating an index.
Type: Grant
Filed: December 19, 2017
Date of Patent: June 2, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Winston Lei Zhang, Charlie Chen, Kun Wu (Sheperd) Huang, Jingjing Liu, Duke Hongtao Dai
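The region-weighted index in this abstract can be sketched as below. The region names, the weights, the whitespace tokenizer, and the scoring rule (occurrence count multiplied by region weight, summed over regions) are assumptions chosen for illustration, and a plain Python list stands in for the inverted document linked list.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_postings(docs: Dict[str, Dict[str, str]],
                   weights: Dict[str, float]) -> Dict[str, List[Tuple[str, float]]]:
    """Build token -> [(doc_id, score)], highest score first.

    `docs` maps a document id to its regions (for example title and body);
    each occurrence of a token adds the weight of the region it appears in.
    """
    postings: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for doc_id, regions in docs.items():
        scores: Dict[str, float] = defaultdict(float)
        for region, text in regions.items():
            for token in text.lower().split():
                scores[token] += weights[region]
        for token, score in scores.items():
            postings[token].append((doc_id, score))
    for entries in postings.values():
        entries.sort(key=lambda entry: -entry[1])  # stable sort keeps ties in input order
    return dict(postings)
```

With a title weight of 3.0 and a body weight of 1.0, a token that appears once in a title and twice in a body scores 5.0 for that document, so title matches rank ahead of body-only matches.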
-
Publication number: 20200133981
Abstract: Techniques for searching a character string involve: determining a first set of documents including a first token in the character string, and a second set of documents including a second token in the character string; and generating a third set of documents based on the first and second sets of documents, in the third set of documents: i) a document being included in the first and second sets of documents, and ii) a distance between the first and second tokens in the document being equal to a distance between the first and second tokens in the character string.
Type: Application
Filed: December 31, 2019
Publication date: April 30, 2020
Inventors: Duke Hongtao Dai, Winston Lei Zhang, Chao Chen, Kun Wu Huang, Jingjing Liu
-
Patent number: 10606902
Abstract: A method for servicing document search requests. The method includes receiving, by a document management service, a document search query from a requesting user, and injecting, into the document search query, a user access vector. The user access vector specifies, for the requesting user, access control lists that are associated with the requesting user. The method further includes identifying, in a document repository, documents that match the document search query with the injected user access vector. A matching document requires a match of terms in the search query with terms in the matching document, and a match of at least one access control list specified in the matching document and at least one of the access control lists specified in the user access vector.
Type: Grant
Filed: September 29, 2016
Date of Patent: March 31, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Chao Chen, Jingjing Liu, Lei Zhang, Kunwu Huang, Hongtao Dai, Ying Teng
-
Publication number: 20200081925
Abstract: A method for servicing document search requests. The method includes receiving, by a document management service, a document search query from a requesting user, and injecting, into the document search query, a user access vector. The user access vector specifies, for the requesting user, access control lists that are associated with the requesting user. The method further includes identifying, in a document repository, documents that match the document search query with the injected user access vector. A matching document requires a match of terms in the search query with terms in the matching document, and a match of at least one access control list specified in the matching document and at least one of the access control lists specified in the user access vector.
Type: Application
Filed: November 15, 2019
Publication date: March 12, 2020
Inventors: Chao Chen, Jingjing Liu, Lei Zhang, Kunwu Huang, Hongtao Dai, Ying Teng
-
Patent number: 10546024
Abstract: Techniques for searching a character string involve: determining a first set of documents including a first token in the character string, and a second set of documents including a second token in the character string; and generating a third set of documents based on the first and second sets of documents, in the third set of documents: i) a document being included in the first and second sets of documents, and ii) a distance between the first and second tokens in the document being equal to a distance between the first and second tokens in the character string.
Type: Grant
Filed: March 20, 2017
Date of Patent: January 28, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Duke Hongtao Dai, Winston Lei Zhang, Chao Chen, Kun Wu Huang, Jingjing Liu
-
Patent number: 10489466
Abstract: A method for document similarity analysis. The method includes obtaining a document to be archived, obtaining indexing terms for the document to be archived, and identifying document categories similar to the document to be archived, based on indexing terms and corresponding term frequencies. The method further includes making a determination that a weakly similar document category exists and based on the determination: generating a new document category and registering the document in the new document category, and registering the document in the weakly similar document category.
Type: Grant
Filed: September 29, 2016
Date of Patent: November 26, 2019
Assignee: EMC IP Holding Company LLC
Inventors: Lei Zhang, Chao Chen, Kunwu Huang, Hongtao Dai, Jingjing Liu, Ying Teng