Patents by Inventor Hai Cheng Wang
Hai Cheng Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250131759
Abstract: In an approach, a processor performs document layout analysis on a document generating a plurality of textual regions; extracts characteristics from each of the plurality of textual regions and associates the respective characteristics to the respective textual region as metadata; classifies each of the plurality of textual regions as an optical character recognition (OCR) region, non-OCR valuable region, or non-OCR non-valuable region using a classifier; performs OCR on each OCR region generating an OCR output; identifies associated constant OCR data from a constant OCR data repository for each non-OCR valuable region; merges the associated constant OCR data with the OCR output generating a complete OCR data for the received document; performs data extraction on the complete OCR data to identify data fields and key-value pairs generating extracted data; and determines whether the extracted data is valid based on a set of rules.
Type: Application
Filed: October 24, 2023
Publication date: April 24, 2025
Inventors: Jun Hong Zhao, Dong Rui Li, Ang Yi, Jing Zhang, Hai Cheng Wang, Yang Zhong Li
-
Patent number: 12259920
Abstract: Disclosed embodiments provide techniques for monitoring and evaluating the effectiveness of key value pairs (KVPs) used in a document processing system. In embodiments, KVPs are obtained from multiple extractors of a document processing system. A score is computed for the KVPs by computing an effectiveness metric for each KVP from the multiple KVPs. In response to the computed score being below a predetermined threshold, a model retraining process is performed to generate a new set of KVP extractors, and provide the new set of KVPs to the document processing system.
Type: Grant
Filed: September 7, 2023
Date of Patent: March 25, 2025
Assignee: International Business Machines Corporation
Inventors: Ang Yi, Jing Zhang, Hai Cheng Wang, Jun Hong Zhao, Yang Zhong Li, Rajesh M. Desai, Xue Lan Zhang
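Illustrative sketch (not taken from the patent): the score-then-threshold check described in the abstract above could look roughly like the following. The `KVP` record, the confidence-based effectiveness metric, the mean aggregation, and the 0.7 threshold are all assumptions made for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class KVP:
    """Hypothetical record for one extracted key-value pair."""
    key: str
    value: str
    confidence: float  # assumed extractor-reported confidence in [0, 1]


def effectiveness(kvp: KVP) -> float:
    """Assumed per-KVP metric: the confidence, or 0.0 when the value is empty."""
    return kvp.confidence if kvp.value.strip() else 0.0


def overall_score(kvps: Sequence[KVP]) -> float:
    """Assumed aggregate: mean of the per-KVP metrics (0.0 when there are no KVPs)."""
    return mean([effectiveness(k) for k in kvps]) if kvps else 0.0


def needs_retraining(kvps: Sequence[KVP], threshold: float = 0.7) -> bool:
    """True when the aggregate score falls below the (assumed) threshold."""
    return overall_score(kvps) < threshold


if __name__ == "__main__":
    sample = [KVP("invoice_no", "A-17", 0.9), KVP("total", "", 0.8)]
    print(overall_score(sample), needs_retraining(sample))
```

In this sketch a `True` result from `needs_retraining` would be the signal to start the retraining step; the retraining itself is outside the scope of the example.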
-
Publication number: 20250086222
Abstract: Disclosed embodiments provide techniques for monitoring and evaluating the effectiveness of key value pairs (KVPs) used in a document processing system. In embodiments, KVPs are obtained from multiple extractors of a document processing system. A score is computed for the KVPs by computing an effectiveness metric for each KVP from the multiple KVPs. In response to the computed score being below a predetermined threshold, a model retraining process is performed to generate a new set of KVP extractors, and provide the new set of KVPs to the document processing system.
Type: Application
Filed: September 7, 2023
Publication date: March 13, 2025
Inventors: Ang Yi, Jing Zhang, Hai Cheng Wang, Jun Hong Zhao, Yang Zhong Li, Rajesh M. Desai, Xue Lan Zhang
-
Patent number: 12056948
Abstract: In an approach, a processor identifies a plurality of text separators in a borderless table, a text separator of the plurality of text separators defining a non-text region between two consecutive text lines in the borderless table. A processor classifies the plurality of text separators into a number of target clusters comprised in a target group based on property information related to the plurality of text separators, the number of target clusters corresponding to a number of separator types. A processor provides indication information to indicate respective separator types of the plurality of text separators based on a result of the classifying.
Type: Grant
Filed: July 19, 2021
Date of Patent: August 6, 2024
Assignee: International Business Machines Corporation
Inventors: Ang Yi, Nazrul Islam, Rajesh M. Desai, Jing Zhang, Dong Rui Li, Xue Mei Deng, Ye Chen, Hai Cheng Wang
-
Publication number: 20240193978
Abstract: Computer implemented methods, systems, and computer program products include program code executing on a processor(s) that merges a document comprising multiple pages into a single document image. The program code processes the single document image to identify structural elements and textual content. The program code compares the structural elements of the single document image to other structural elements of a group of document templates stored in a database to identify a subset of the group of documents templates with a threshold number of similarities to the single document image. The program code generates, from the single document image, a graph structure representing the document, where the graph structure comprises visual information and connections related to the structural elements and concepts comprising the textual content. The program code uses the structure to identify a document template that is a closest match to the document.
Type: Application
Filed: December 13, 2022
Publication date: June 13, 2024
Inventors: Ang Yi, Jing Zhang, Hai Cheng Wang, Jun Hong Zhao, Rajesh M. Desai, Yang Zhong Li, Ye Chen
-
Publication number: 20240046677
Abstract: A computer-implemented method for text block segmentation includes determining a first text block segmentation pattern utilized to generate a segmented text block based, at least in part, on a comparison of semantic information associated with the segmented text block and a plurality of predefined types of text block segmentation patterns indicated by a graph; calculating a first degree of confidence in a size of the segmented text block based, at least in part, on comparing semantic entities associated with the segmented text block with semantic entities indicated by leaf nodes stemming from a first non-leaf node included in the graph and representative of the first type of text block segmentation pattern; and determining that the size of the segmented text block is non-optimal based on the calculated degree of confidence in the size of the segmented text block being below a predetermined threshold.
Type: Application
Filed: July 26, 2022
Publication date: February 8, 2024
Inventors: Ang Yi, Jing Zhang, Hai Cheng Wang, Jun Hong Zhao, Rajesh M. Desai, Yang Zhong Li, Xue Xu
-
Publication number: 20230012784
Abstract: In an approach, a processor identifies a plurality of text separators in a borderless table, a text separator of the plurality of text separators defining a non-text region between two consecutive text lines in the borderless table. A processor classifies the plurality of text separators into a number of target clusters comprised in a target group based on property information related to the plurality of text separators, the number of target clusters corresponding to a number of separator types. A processor provides indication information to indicate respective separator types of the plurality of text separators based on a result of the classifying.
Type: Application
Filed: July 19, 2021
Publication date: January 19, 2023
Inventors: Ang Yi, Nazrul Islam, Rajesh M. Desai, Jing Zhang, Dong Rui Li, Xue Mei Deng, Ye Chen, Hai Cheng Wang
-
Patent number: 11514121
Abstract: Embodiments of the present disclosure relate to a method, system, and computer program product for webpage customization. In some embodiments, a method is disclosed. According to the method, a webpage to be provided to a user is obtained. The webpage comprises at least a first element having a first set of style attributes. A second element matching the first element is determined from a set of elements customized for the user. The second element has a second set of style attributes. The webpage is customized for the user by applying at least part of the second set of style attributes to the first element. The customized webpage is provided to the user. In other embodiments, a system and a computer program product are disclosed.
Type: Grant
Filed: August 10, 2020
Date of Patent: November 29, 2022
Assignee: International Business Machines Corporation
Inventors: Dong Rui Li, Ang Yi, Hai Cheng Wang, Jun Hong Zhao, Ye Chen, Xiao Jian Lian, Jing Chen
-
Publication number: 20220309072
Abstract: A computer transforms content of a composite table into structured data objects. The computer receives a composite table and identifying a data zone characterized by data columns, and a header zone. The computer identifies first header cells arranged coextensive with a single data column and second header cells arranged coextensive with a set of data columns. The computer generates a hierarchical representation of said header cells, based at least in part, on the header cell arrangements. The computer generates a revised table based on the hierarchical representation, with the first header cells identifying a data column and the second header cells identify a first header cell. The computer generates structured data objects representing the zones and being arranged based, at least in part, on the revised table and where the structured data objects are keyed to the first header cells.
Type: Application
Filed: March 26, 2021
Publication date: September 29, 2022
Inventors: Xue Lan Zhang, Hai Cheng Wang, Jing Zhang, Jun Hong Zhao, Ang Yi, Dong Rui Li
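Illustrative sketch (not taken from the patent): one simple way to picture the header hierarchy described in the abstract above is to treat header cells that span a single column as "first" header cells and wider cells as "second" header cells that group them. The `(label, first column, last column)` cell encoding, the function names, and the shape of the output objects are assumptions made for this example only.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical encoding of a header cell: (label, first column, last column), inclusive.
HeaderCell = Tuple[str, int, int]


def build_hierarchy(header_cells: List[HeaderCell]) -> Dict[int, Tuple[str, Optional[str]]]:
    """Map each data column to (first header label, enclosing second header label or None)."""
    leaves = [c for c in header_cells if c[1] == c[2]]  # span exactly one column
    groups = [c for c in header_cells if c[1] < c[2]]   # span a set of columns
    hierarchy: Dict[int, Tuple[str, Optional[str]]] = {}
    for label, col, _ in leaves:
        parent = next((g[0] for g in groups if g[1] <= col <= g[2]), None)
        hierarchy[col] = (label, parent)
    return hierarchy


def rows_to_objects(header_cells: List[HeaderCell], rows: List[List[Any]]) -> List[Dict[str, Dict[str, Any]]]:
    """Turn each data row into a dict keyed by the first header labels."""
    hierarchy = build_hierarchy(header_cells)
    objects = []
    for row in rows:
        obj = {}
        for col, (label, parent) in sorted(hierarchy.items()):
            obj[label] = {"group": parent, "value": row[col]}
        objects.append(obj)
    return objects


if __name__ == "__main__":
    header = [("Name", 0, 0), ("Sales", 1, 2), ("Q1", 1, 1), ("Q2", 2, 2)]
    print(rows_to_objects(header, [["Acme", 10, 12]]))
```

Here the "Sales" cell spans columns 1 and 2, so "Q1" and "Q2" are grouped under it while "Name" has no group.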
-
Patent number: 11436249
Abstract: A computer transforms content of a composite table into structured data objects. The computer receives a composite table and identifying a data zone characterized by data columns, and a header zone. The computer identifies first header cells arranged coextensive with a single data column and second header cells arranged coextensive with a set of data columns. The computer generates a hierarchical representation of said header cells, based at least in part, on the header cell arrangements. The computer generates a revised table based on the hierarchical representation, with the first header cells identifying a data column and the second header cells identify a first header cell. The computer generates structured data objects representing the zones and being arranged based, at least in part, on the revised table and where the structured data objects are keyed to the first header cells.
Type: Grant
Filed: March 26, 2021
Date of Patent: September 6, 2022
Assignee: International Business Machines Corporation
Inventors: Xue Lan Zhang, Hai Cheng Wang, Jing Zhang, Jun Hong Zhao, Ang Yi, Dong Rui Li
-
Publication number: 20220043870
Abstract: Embodiments of the present disclosure relate to a method, system, and computer program product for webpage customization. In some embodiments, a method is disclosed. According to the method, a webpage to be provided to a user is obtained. The webpage comprises at least a first element having a first set of style attributes. A second element matching the first element is determined from a set of elements customized for the user. The second element has a second set of style attributes. The webpage is customized for the user by applying at least part of the second set of style attributes to the first element. The customized webpage is provided to the user. In other embodiments, a system and a computer program product are disclosed.
Type: Application
Filed: August 10, 2020
Publication date: February 10, 2022
Inventors: Dong Rui Li, Ang Yi, Hai Cheng Wang, Jun Hong Zhao, Ye Chen, Xiao Jian Lian, Jing Chen