Patents by Inventor Hai Cheng Wang

Hai Cheng Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250131759
    Abstract: In an approach, a processor performs document layout analysis on a document, generating a plurality of textual regions; extracts characteristics from each of the plurality of textual regions and associates the respective characteristics with the respective textual region as metadata; classifies each of the plurality of textual regions as an optical character recognition (OCR) region, non-OCR valuable region, or non-OCR non-valuable region using a classifier; performs OCR on each OCR region, generating an OCR output; identifies associated constant OCR data from a constant OCR data repository for each non-OCR valuable region; merges the associated constant OCR data with the OCR output, generating complete OCR data for the received document; performs data extraction on the complete OCR data to identify data fields and key-value pairs, generating extracted data; and determines whether the extracted data is valid based on a set of rules.
    Type: Application
    Filed: October 24, 2023
    Publication date: April 24, 2025
    Inventors: Jun Hong Zhao, Dong Rui Li, Ang Yi, Jing Zhang, Hai Cheng Wang, Yang Zhong Li
  • Patent number: 12259920
    Abstract: Disclosed embodiments provide techniques for monitoring and evaluating the effectiveness of key value pairs (KVPs) used in a document processing system. In embodiments, KVPs are obtained from multiple extractors of a document processing system. A score is computed for the KVPs by computing an effectiveness metric for each KVP from the multiple KVPs. In response to the computed score being below a predetermined threshold, a model retraining process is performed to generate a new set of KVP extractors, and the new set of KVPs is provided to the document processing system.
    Type: Grant
    Filed: September 7, 2023
    Date of Patent: March 25, 2025
    Assignee: International Business Machines Corporation
    Inventors: Ang Yi, Jing Zhang, Hai Cheng Wang, Jun Hong Zhao, Yang Zhong Li, Rajesh M. Desai, Xue Lan Zhang
  • Publication number: 20250086222
    Abstract: Disclosed embodiments provide techniques for monitoring and evaluating the effectiveness of key value pairs (KVPs) used in a document processing system. In embodiments, KVPs are obtained from multiple extractors of a document processing system. A score is computed for the KVPs by computing an effectiveness metric for each KVP from the multiple KVPs. In response to the computed score being below a predetermined threshold, a model retraining process is performed to generate a new set of KVP extractors, and the new set of KVPs is provided to the document processing system.
    Type: Application
    Filed: September 7, 2023
    Publication date: March 13, 2025
    Inventors: Ang Yi, Jing Zhang, Hai Cheng Wang, Jun Hong Zhao, Yang Zhong Li, Rajesh M. Desai, Xue Lan Zhang
  • Patent number: 12056948
    Abstract: In an approach, a processor identifies a plurality of text separators in a borderless table, a text separator of the plurality of text separators defining a non-text region between two consecutive text lines in the borderless table. A processor classifies the plurality of text separators into a number of target clusters comprised in a target group based on property information related to the plurality of text separators, the number of target clusters corresponding to a number of separator types. A processor provides indication information to indicate respective separator types of the plurality of text separators based on a result of the classifying.
    Type: Grant
    Filed: July 19, 2021
    Date of Patent: August 6, 2024
    Assignee: International Business Machines Corporation
    Inventors: Ang Yi, Nazrul Islam, Rajesh M. Desai, Jing Zhang, Dong Rui Li, Xue Mei Deng, Ye Chen, Hai Cheng Wang
  • Publication number: 20240193978
    Abstract: Computer-implemented methods, systems, and computer program products include program code executing on a processor(s) that merges a document comprising multiple pages into a single document image. The program code processes the single document image to identify structural elements and textual content. The program code compares the structural elements of the single document image to other structural elements of a group of document templates stored in a database to identify a subset of the group of document templates with a threshold number of similarities to the single document image. The program code generates, from the single document image, a graph structure representing the document, where the graph structure comprises visual information and connections related to the structural elements and concepts comprising the textual content. The program code uses the graph structure to identify a document template that is a closest match to the document.
    Type: Application
    Filed: December 13, 2022
    Publication date: June 13, 2024
    Inventors: Ang Yi, Jing Zhang, Hai Cheng Wang, Jun Hong Zhao, Rajesh M. Desai, Yang Zhong Li, Ye Chen
  • Publication number: 20240046677
    Abstract: A computer-implemented method for text block segmentation includes determining a first text block segmentation pattern utilized to generate a segmented text block based, at least in part, on a comparison of semantic information associated with the segmented text block and a plurality of predefined types of text block segmentation patterns indicated by a graph; calculating a first degree of confidence in a size of the segmented text block based, at least in part, on comparing semantic entities associated with the segmented text block with semantic entities indicated by leaf nodes stemming from a first non-leaf node included in the graph and representative of the first type of text block segmentation pattern; and determining that the size of the segmented text block is non-optimal based on the calculated degree of confidence in the size of the segmented text block being below a predetermined threshold.
    Type: Application
    Filed: July 26, 2022
    Publication date: February 8, 2024
    Inventors: Ang Yi, Jing Zhang, Hai Cheng Wang, Jun Hong Zhao, Rajesh M. Desai, Yang Zhong Li, Xue Xu
  • Publication number: 20230012784
    Abstract: In an approach, a processor identifies a plurality of text separators in a borderless table, a text separator of the plurality of text separators defining a non-text region between two consecutive text lines in the borderless table. A processor classifies the plurality of text separators into a number of target clusters comprised in a target group based on property information related to the plurality of text separators, the number of target clusters corresponding to a number of separator types. A processor provides indication information to indicate respective separator types of the plurality of text separators based on a result of the classifying.
    Type: Application
    Filed: July 19, 2021
    Publication date: January 19, 2023
    Inventors: Ang Yi, Nazrul Islam, Rajesh M. Desai, Jing Zhang, Dong Rui Li, Xue Mei Deng, Ye Chen, Hai Cheng Wang
  • Patent number: 11514121
    Abstract: Embodiments of the present disclosure relate to a method, system, and computer program product for webpage customization. In some embodiments, a method is disclosed. According to the method, a webpage to be provided to a user is obtained. The webpage comprises at least a first element having a first set of style attributes. A second element matching the first element is determined from a set of elements customized for the user. The second element has a second set of style attributes. The webpage is customized for the user by applying at least part of the second set of style attributes to the first element. The customized webpage is provided to the user. In other embodiments, a system and a computer program product are disclosed.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: November 29, 2022
    Assignee: International Business Machines Corporation
    Inventors: Dong Rui Li, Ang Yi, Hai Cheng Wang, Jun Hong Zhao, Ye Chen, Xiao Jian Lian, Jing Chen
  • Publication number: 20220309072
    Abstract: A computer transforms content of a composite table into structured data objects. The computer receives a composite table and identifies a data zone, characterized by data columns, and a header zone. The computer identifies first header cells arranged coextensive with a single data column and second header cells arranged coextensive with a set of data columns. The computer generates a hierarchical representation of said header cells based, at least in part, on the header cell arrangements. The computer generates a revised table based on the hierarchical representation, with the first header cells identifying a data column and the second header cells identifying a first header cell. The computer generates structured data objects representing the zones and being arranged based, at least in part, on the revised table, where the structured data objects are keyed to the first header cells.
    Type: Application
    Filed: March 26, 2021
    Publication date: September 29, 2022
    Inventors: Xue Lan Zhang, Hai Cheng Wang, Jing Zhang, Jun Hong Zhao, Ang Yi, Dong Rui Li
  • Patent number: 11436249
    Abstract: A computer transforms content of a composite table into structured data objects. The computer receives a composite table and identifies a data zone, characterized by data columns, and a header zone. The computer identifies first header cells arranged coextensive with a single data column and second header cells arranged coextensive with a set of data columns. The computer generates a hierarchical representation of said header cells based, at least in part, on the header cell arrangements. The computer generates a revised table based on the hierarchical representation, with the first header cells identifying a data column and the second header cells identifying a first header cell. The computer generates structured data objects representing the zones and being arranged based, at least in part, on the revised table, where the structured data objects are keyed to the first header cells.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: September 6, 2022
    Assignee: International Business Machines Corporation
    Inventors: Xue Lan Zhang, Hai Cheng Wang, Jing Zhang, Jun Hong Zhao, Ang Yi, Dong Rui Li
  • Publication number: 20220043870
    Abstract: Embodiments of the present disclosure relate to a method, system, and computer program product for webpage customization. In some embodiments, a method is disclosed. According to the method, a webpage to be provided to a user is obtained. The webpage comprises at least a first element having a first set of style attributes. A second element matching the first element is determined from a set of elements customized for the user. The second element has a second set of style attributes. The webpage is customized for the user by applying at least part of the second set of style attributes to the first element. The customized webpage is provided to the user. In other embodiments, a system and a computer program product are disclosed.
    Type: Application
    Filed: August 10, 2020
    Publication date: February 10, 2022
    Inventors: Dong Rui Li, Ang Yi, Hai Cheng Wang, Jun Hong Zhao, Ye Chen, Xiao Jian Lian, Jing Chen
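
As a rough illustration of the webpage customization abstract above (publication 20220043870 / patent 11514121), the following is a minimal sketch and not the patented implementation. It assumes that elements can be modeled as a tag, an optional id, and a dictionary of style attributes, and that "matching" means equal tag and id. The names Element, find_match, customize, and the allowed parameter are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Element:
    """Hypothetical page element: a tag, an optional id, and style attributes."""
    tag: str
    element_id: Optional[str] = None
    styles: Dict[str, str] = field(default_factory=dict)


def find_match(first: Element, customized: List[Element]) -> Optional[Element]:
    """Return the first user-customized element with the same tag and id.

    Assumption: the abstract does not say how matching works, so equality
    of tag and id is used here purely for illustration.
    """
    for candidate in customized:
        if candidate.tag == first.tag and candidate.element_id == first.element_id:
            return candidate
    return None


def customize(page: List[Element], customized: List[Element],
              allowed: Set[str]) -> List[Element]:
    """Build a customized copy of the page without mutating the input.

    For each page element (the "first element"), look up a matching
    customized element (the "second element") and apply only the style
    attributes named in `allowed`, i.e. part of the second set of styles.
    """
    result = []
    for element in page:
        styles = dict(element.styles)  # copy so the original page is untouched
        match = find_match(element, customized)
        if match is not None:
            styles.update({k: v for k, v in match.styles.items() if k in allowed})
        result.append(Element(element.tag, element.element_id, styles))
    return result
```

Calling customize with a page containing a "button" element with id "submit" and a customized element of the same tag and id would override only the style attributes listed in allowed (for example "font-size") and leave the page's other styles as they were.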