Patents by Inventor Andrius Kuksta

Andrius Kuksta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12339914
    Abstract: Advanced Response Processing in Web Data Collection discloses processor-implemented apparatuses, methods, and systems for processing unstructured raw HTML responses collected in the context of a data collection service. In one embodiment, the method comprises receiving raw unstructured HTML documents and extracting text data with associated meta information that may comprise style and formatting information. In some embodiments, data field tags and values may be assigned to the extracted text blocks, classifying the data using Machine Learning algorithms. Additionally, blocks of extracted data may be grouped and re-grouped and presented as a single data point. In another embodiment, the system may aggregate the text data with the associated meta information and present it in a structured format. In certain embodiments, the Machine Learning model may be trained on a pre-created training data set labeled manually or automatically.
    Type: Grant
    Filed: July 1, 2022
    Date of Patent: June 24, 2025
    Assignee: Oxylabs, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta
  • Patent number: 12287837
    Abstract: Disclosed herein are system, method, and computer program product embodiments for improving web scraping technology by using machine learning to generate parsing expressions. A system receives a request to identify an element in a first document at a target web page. The system downloads and modifies the first document by adding an index value as an attribute to a tag for the element. A query is submitted to a large language model (LLM), including the modified first document, a description of the element, and a request asking the LLM to identify the element based on the description. The system obtains, from the LLM, the index value assigned to the element. The system generates an expression defining a path to the element in the first document using the index returned by the large language model. The system downloads a second document, and parses data of a second element using the expression.
    Type: Grant
    Filed: September 10, 2024
    Date of Patent: April 29, 2025
    Assignee: Oxylabs, UAB
    Inventors: Karolis Kluonaitis, Martynas Juravicius, Andrius Kuksta
  • Patent number: 12086209
    Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The classification process employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contain no data, mangled data, or a block, for assigning a corresponding classification, and for feeding the classification decision back to a data collection platform.
    Type: Grant
    Filed: April 24, 2023
    Date of Patent: September 10, 2024
    Assignee: Oxylabs, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta
  • Publication number: 20240241923
    Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The classification process employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contain no data, mangled data, or a block, for assigning a corresponding classification, and for feeding the classification decision back to a data collection platform.
    Type: Application
    Filed: March 28, 2024
    Publication date: July 18, 2024
    Applicant: Oxylabs, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta
  • Publication number: 20230259586
    Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The classification process employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contain no data, mangled data, or a block, for assigning a corresponding classification, and for feeding the classification decision back to a data collection platform.
    Type: Application
    Filed: April 24, 2023
    Publication date: August 17, 2023
    Applicant: Oxylabs, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta
  • Publication number: 20230214588
    Abstract: Systems and methods to intelligently adapt parsing rules according to layout changes occurring in multiple targets are disclosed. Specifically, the disclosure provides a solution for detecting layout changes in a target domain and updating parsing templates or parsing rules. In one aspect, the disclosed embodiments describe methods and systems to receive and store parsing templates or parsing rules, and monitoring tables or a list of related URLs, within an internal storage facility. Also described are methods and systems to scrape and parse data by following parsing rules or using parsing templates. The methods and systems describe how the parsed data and the actual data are analyzed to detect any changes in the layout of the target domain(s), and how to decide whether to update the parsing rules or parsing templates depending on those layout changes.
    Type: Application
    Filed: January 6, 2022
    Publication date: July 6, 2023
    Applicant: Coretech LT, UAB
    Inventors: Andrius Kuksta, Martynas Juravicius
  • Patent number: 11669588
    Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The classification process employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contain no data, mangled data, or a block, for assigning a corresponding classification, and for feeding the classification decision back to a data collection platform.
    Type: Grant
    Filed: August 30, 2022
    Date of Patent: June 6, 2023
    Assignee: Oxylabs, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta
  • Publication number: 20230018387
    Abstract: The current application discloses processor-implemented methods and systems for processing unclassified HTML responses collected in the context of a data collection service. In one embodiment, the method comprises receiving unclassified HTML documents, isolating elements relevant for category identification, deriving classification attributes from the isolated elements, and applying a Machine Learning-based classification model, resulting in HTML data items that are classified and labeled accordingly. In certain embodiments, the Machine Learning model may be trained on a pre-created training data set labeled manually or automatically.
    Type: Application
    Filed: July 6, 2021
    Publication date: January 19, 2023
    Applicant: Metacluster LT, UAB
    Inventors: Andrius Kuksta, Jurijus Gorskovas, Martynas Juravicius
  • Publication number: 20220414397
    Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The classification process employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contain no data, mangled data, or a block, for assigning a corresponding classification, and for feeding the classification decision back to a data collection platform.
    Type: Application
    Filed: August 30, 2022
    Publication date: December 29, 2022
    Applicant: Metacluster LT, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta
  • Publication number: 20220414166
    Abstract: Advanced Response Processing in Web Data Collection discloses processor-implemented apparatuses, methods, and systems for processing unstructured raw HTML responses collected in the context of a data collection service. In one embodiment, the method comprises receiving raw unstructured HTML documents and extracting text data with associated meta information that may comprise style and formatting information. In some embodiments, data field tags and values may be assigned to the extracted text blocks, classifying the data using Machine Learning algorithms. Additionally, blocks of extracted data may be grouped and re-grouped and presented as a single data point. In another embodiment, the system may aggregate the text data with the associated meta information and present it in a structured format. In certain embodiments, the Machine Learning model may be trained on a pre-created training data set labeled manually or automatically.
    Type: Application
    Filed: July 1, 2022
    Publication date: December 29, 2022
    Applicant: Metacluster LT, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta
  • Publication number: 20220318564
    Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The classification process employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contain no data, mangled data, or a block, for assigning a corresponding classification, and for feeding the classification decision back to a data collection platform.
    Type: Application
    Filed: March 30, 2021
    Publication date: October 6, 2022
    Applicant: Metacluster LT, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta
  • Patent number: 11461588
    Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The classification process employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contain no data, mangled data, or a block, for assigning a corresponding classification, and for feeding the classification decision back to a data collection platform.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: October 4, 2022
    Assignee: Metacluster LT, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta
  • Patent number: 11379542
    Abstract: Advanced Response Processing in Web Data Collection discloses processor-implemented apparatuses, methods, and systems for processing unstructured raw HTML responses collected in the context of a data collection service. In one embodiment, the method comprises receiving raw unstructured HTML documents and extracting text data with associated meta information that may comprise style and formatting information. In some embodiments, data field tags and values may be assigned to the extracted text blocks, classifying the data using Machine Learning algorithms. Additionally, blocks of extracted data may be grouped and re-grouped and presented as a single data point. In another embodiment, the system may aggregate the text data with the associated meta information and present it in a structured format. In certain embodiments, the Machine Learning model may be trained on a pre-created training data set labeled manually or automatically.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: July 5, 2022
    Assignee: Metacluster LT, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta
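The abstract of patent 12287837 describes a concrete pipeline: tag each element of a first document with an index attribute, let an LLM pick the index matching a description, turn that index into a path expression, and reuse the expression on a second document. The following is only a minimal Python sketch of that idea under stated assumptions, not the patented implementation. It assumes well-formed XHTML that `xml.etree.ElementTree` can parse (real pages would need a lenient HTML parser), uses a hypothetical attribute name `data-idx`, replaces the LLM query with a hypothetical offline stand-in `fake_llm_pick_index`, and emits ElementTree's limited XPath subset rather than full XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute name; the abstract only says "an index value as an attribute".
INDEX_ATTR = "data-idx"


def annotate_with_indices(root: ET.Element) -> dict[int, ET.Element]:
    """Tag every element with a sequential index (document order) and return a lookup table."""
    by_index = {}
    for i, el in enumerate(root.iter()):
        el.set(INDEX_ATTR, str(i))
        by_index[i] = el
    return by_index


def fake_llm_pick_index(annotated_html: str, description: str) -> int:
    """Hypothetical stand-in for the LLM query. A real system would send the annotated
    markup plus `description` to a model and read back an index; here we simply look
    up the element with class="price" so the example runs offline."""
    root = ET.fromstring(annotated_html)
    el = root.find(".//*[@class='price']")
    return int(el.get(INDEX_ATTR))


def path_to(root: ET.Element, target: ET.Element) -> str:
    """Build an ElementTree path such as ./body[1]/div[2]/span[1] from root to target."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        siblings = [c for c in parent if c.tag == node.tag]
        # ElementTree positional predicates are 1-based, counted among same-tag siblings.
        steps.append(f"{node.tag}[{siblings.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps)) if steps else "."


# Two versions of the same (made-up) page whose layout is identical but whose data differs.
page_v1 = "<html><body><div><h1>Widget</h1></div><div><span class='price'>19.99</span></div></body></html>"
page_v2 = "<html><body><div><h1>Widget</h1></div><div><span class='price'>24.50</span></div></body></html>"

root1 = ET.fromstring(page_v1)
by_index = annotate_with_indices(root1)
annotated = ET.tostring(root1, encoding="unicode")

idx = fake_llm_pick_index(annotated, "the product price")
expr = path_to(root1, by_index[idx])

root2 = ET.fromstring(page_v2)
price = root2.find(expr).text
print(expr, price)  # ./body[1]/div[2]/span[1] 24.50
```

The point of returning an index rather than a path is that the model only has to name an element, while the path itself is derived deterministically from the parsed tree; the generated expression is then reusable on any later page that keeps the same layout.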