Patents by Inventor Andrius Kuksta
Andrius Kuksta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12339914
Abstract: ADVANCED RESPONSE PROCESSING IN WEB DATA COLLECTION discloses processor-implemented apparatuses, methods, and systems of processing unstructured raw HTML responses collected in the context of a data collection service, the method comprising, in one embodiment, receiving raw unstructured HTML documents and extracting text data with associated meta information that may comprise style and formatting information. In some embodiments data field tags and values may be assigned to the text blocks extracted, classifying the data based on the processing of Machine Learning algorithms. Additionally, blocks of extracted data may be grouped and re-grouped together and presented as a single data point. In another embodiment the system may aggregate and present the text data with the associated meta information in a structured format. In certain embodiments the Machine Learning model may be a model trained on a pre-created training data set labeled manually or in an automatic fashion.
Type: Grant
Filed: July 1, 2022
Date of Patent: June 24, 2025
Assignee: Oxylabs, UAB
Inventors: Martynas Juravicius, Andrius Kuksta
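The extraction step in this abstract (text blocks kept together with their style and formatting information) might look roughly like the sketch below. It is an illustration only, not the patented method: the class name `TextBlockExtractor`, the choice of `tag`, `class` and `style` as the recorded meta information, and the sample HTML are all assumptions, and the Machine Learning tagging step is left out.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be tracked as open.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextBlockExtractor(HTMLParser):
    """Hypothetical helper: collects text blocks plus basic style meta information."""

    def __init__(self):
        super().__init__()
        self._open = []    # stack of (tag, attributes) for currently open elements
        self.blocks = []   # extracted text blocks with their meta information

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._open.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed elements.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._open:
            return
        tag, attrs = self._open[-1]
        self.blocks.append({
            "text": text,
            "tag": tag,                    # enclosing element
            "class": attrs.get("class"),   # assumed styling hint
            "style": attrs.get("style"),   # inline formatting, if any
        })


# Made-up sample response, used only to exercise the sketch.
sample = (
    '<div><h1 class="title" style="font-weight:bold">Widget</h1>'
    "<p>In stock</p></div>"
)
extractor = TextBlockExtractor()
extractor.feed(sample)
blocks = extractor.blocks
```

Each entry in `blocks` is a plain dictionary, so a later classifier could read the text and the meta information side by side.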
-
Patent number: 12287837
Abstract: Disclosed herein are system, method, and computer program product embodiments for improving web scraping technology by using machine learning to generate parsing expressions. A system receives a request to identify an element in a first document at a target web page. The system downloads and modifies the first document by adding an index value as an attribute to a tag for the element. A query is submitted to a large language model (LLM), including the modified first document, a description of the element, and a request asking the LLM to identify the element based on the description. The system obtains, from the LLM, the index value assigned to the element. The system generates an expression defining a path to the element in the first document using the index returned by the large language model. The system downloads a second document, and parses data of a second element using the expression.
Type: Grant
Filed: September 10, 2024
Date of Patent: April 29, 2025
Assignee: Oxylabs, UAB
Inventors: Karolis Kluonaitis, Martynas Juravicius, Andrius Kuksta
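The flow described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the attribute name `data-idx`, the helper names, and the two sample documents are invented here, the documents are well-formed so the standard-library XML parser can read them, and the LLM call is replaced by a local stand-in (`fake_llm_pick_index`) that looks for a `price` class.

```python
import xml.etree.ElementTree as ET


def index_elements(root):
    """Add a unique data-idx attribute to every element; return {index: path}."""
    paths = {}
    counter = 0

    def walk(elem, path):
        nonlocal counter
        elem.set("data-idx", str(counter))
        paths[counter] = path
        counter += 1
        seen = {}  # per-tag position among siblings, used for [n] predicates
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, ".")
    return paths


def fake_llm_pick_index(modified_html, description):
    """Stand-in for the LLM query (assumption): returns the index of the
    first element whose class contains 'price', ignoring the description."""
    for elem in ET.fromstring(modified_html).iter():
        if "price" in elem.get("class", ""):
            return int(elem.get("data-idx"))
    raise LookupError("no matching element")


# Made-up first and second documents sharing one layout.
first = "<html><body><div><h1>Widget</h1><span class='price'>9.99</span></div></body></html>"
second = "<html><body><div><h1>Gadget</h1><span class='price'>19.50</span></div></body></html>"

root = ET.fromstring(first)
paths = index_elements(root)                       # modify the first document
modified = ET.tostring(root, encoding="unicode")   # document sent with the query
idx = fake_llm_pick_index(modified, "the product price")
expression = paths[idx]                            # path expression for the element
value = ET.fromstring(second).find(expression).text  # reuse it on the second document
```

The index gives the model a short, unambiguous token to return, and the path expression is then derived locally, so the second document can be parsed without another model query.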
-
Patent number: 12086209
Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The process of classification employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contains no data, mangled data, or a block, for assigning a classification correspondingly and for feeding the classification decision back to a data collection platform.
Type: Grant
Filed: April 24, 2023
Date of Patent: September 10, 2024
Assignee: OXYLABS, UAB
Inventors: Martynas Juravicius, Andrius Kuksta
-
Publication number: 20240241923
Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The process of classification employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contains no data, mangled data, or a block, for assigning a classification correspondingly and for feeding the classification decision back to a data collection platform.
Type: Application
Filed: March 28, 2024
Publication date: July 18, 2024
Applicant: OXYLABS, UAB
Inventors: Martynas JURAVICIUS, Andrius KUKSTA
-
Publication number: 20230259586
Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The process of classification employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contains no data, mangled data, or a block, for assigning a classification correspondingly and for feeding the classification decision back to a data collection platform.
Type: Application
Filed: April 24, 2023
Publication date: August 17, 2023
Applicant: OXYLABS, UAB
Inventors: Martynas JURAVICIUS, Andrius KUKSTA
-
Publication number: 20230214588
Abstract: Systems and methods to intelligently adapt parsing rules according to the layout changes occurring in multiple targets are disclosed. Specifically, the disclosure provides a solution to detect the layout changes in a target domain and to update parsing templates or parsing rules. The disclosed embodiments in one aspect describe methods and systems to receive and store parsing templates or parsing rules and monitoring tables or a list of related URLs within an internal storage facility. Methods and systems to scrape and parse data by following parsing rules or using parsing templates. The methods and systems describe the manner in which the parsed data and the actual data are analyzed to detect any changes in the layout of the target domain(s). The methods and systems give details on how to decide whether to update parsing rules or parsing templates depending on the layout changes in the target domains.
Type: Application
Filed: January 6, 2022
Publication date: July 6, 2023
Applicant: coretech lt, UAB
Inventors: Andrius KUKSTA, Martynas JURAVICIUS
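One way to picture the detection step in this abstract is the sketch below. It is only a rough illustration: the abstract does not say how parsed and actual data are compared, so the heuristic used here (treating a high share of empty fields as a sign of a layout change) is an assumption, as are the rule format, the function names, the threshold, and the sample pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored parsing rules: field name -> path expression.
RULES = {"title": "./body/h1", "price": "./body/span"}


def parse(html, rules):
    """Apply each parsing rule; a field is None when its path matches nothing."""
    root = ET.fromstring(html)
    parsed = {}
    for field, path in rules.items():
        node = root.find(path)
        parsed[field] = node.text if node is not None else None
    return parsed


def layout_changed(parsed, max_missing_ratio=0.5):
    """Assumed heuristic: flag a layout change when too many fields are missing."""
    missing = sum(1 for value in parsed.values() if value is None)
    return missing / len(parsed) >= max_missing_ratio


# Made-up pages: the second one moves the same data into different tags.
old_page = "<html><body><h1>Widget</h1><span>9.99</span></body></html>"
new_page = "<html><body><div><h2>Widget</h2><p>9.99</p></div></body></html>"

old_result = parse(old_page, RULES)
new_result = parse(new_page, RULES)
needs_update = layout_changed(new_result)  # signal that the rules should be revisited
```

A flagged page would then feed the decision on whether to update the stored parsing rules or templates.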
-
Patent number: 11669588
Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The process of classification employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contains no data, mangled data, or a block, for assigning a classification correspondingly and for feeding the classification decision back to a data collection platform.
Type: Grant
Filed: August 30, 2022
Date of Patent: June 6, 2023
Assignee: Oxylabs, UAB
Inventors: Martynas Juravicius, Andrius Kuksta
-
Publication number: 20230018387
Abstract: The current application discloses processor-implemented methods and systems of processing unclassified HTML responses collected in the context of a data collection service, the method comprising, in one embodiment, receiving unclassified HTML documents, isolating elements relevant for category identification, deriving classification attributes from the isolated elements, and applying a Machine Learning-based classification model resulting in HTML data items classified and labelled accordingly. In certain embodiments the Machine Learning model may be a model trained on a pre-created training data set labeled manually or in an automatic fashion.
Type: Application
Filed: July 6, 2021
Publication date: January 19, 2023
Applicant: Metacluster LT, UAB
Inventors: Andrius KUKSTA, Jurijus GORSKOVAS, Martynas JURAVICIUS
-
Publication number: 20220414397
Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The process of classification employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contains no data, mangled data, or a block, for assigning a classification correspondingly and for feeding the classification decision back to a data collection platform.
Type: Application
Filed: August 30, 2022
Publication date: December 29, 2022
Applicant: METACLUSTER LT, UAB
Inventors: Martynas JURAVICIUS, Andrius KUKSTA
-
Publication number: 20220414166
Abstract: ADVANCED RESPONSE PROCESSING IN WEB DATA COLLECTION discloses processor-implemented apparatuses, methods, and systems of processing unstructured raw HTML responses collected in the context of a data collection service, the method comprising, in one embodiment, receiving raw unstructured HTML documents and extracting text data with associated meta information that may comprise style and formatting information. In some embodiments data field tags and values may be assigned to the text blocks extracted, classifying the data based on the processing of Machine Learning algorithms. Additionally, blocks of extracted data may be grouped and re-grouped together and presented as a single data point. In another embodiment the system may aggregate and present the text data with the associated meta information in a structured format. In certain embodiments the Machine Learning model may be a model trained on a pre-created training data set labeled manually or in an automatic fashion.
Type: Application
Filed: July 1, 2022
Publication date: December 29, 2022
Applicant: Metacluster LT, UAB
Inventors: Martynas JURAVICIUS, Andrius KUKSTA
-
Publication number: 20220318564
Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The process of classification employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contains no data, mangled data, or a block, for assigning a classification correspondingly and for feeding the classification decision back to a data collection platform.
Type: Application
Filed: March 30, 2021
Publication date: October 6, 2022
Applicant: METACLUSTER LT, UAB
Inventors: Martynas JURAVICIUS, Andrius KUKSTA
-
Patent number: 11461588
Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The process of classification employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contains no data, mangled data, or a block, for assigning a classification correspondingly and for feeding the classification decision back to a data collection platform.
Type: Grant
Filed: March 30, 2021
Date of Patent: October 4, 2022
Assignee: METACLUSTER LT, UAB
Inventors: Martynas Juravicius, Andrius Kuksta
-
Patent number: 11379542
Abstract: ADVANCED RESPONSE PROCESSING IN WEB DATA COLLECTION discloses processor-implemented apparatuses, methods, and systems of processing unstructured raw HTML responses collected in the context of a data collection service, the method comprising, in one embodiment, receiving raw unstructured HTML documents and extracting text data with associated meta information that may comprise style and formatting information. In some embodiments data field tags and values may be assigned to the text blocks extracted, classifying the data based on the processing of Machine Learning algorithms. Additionally, blocks of extracted data may be grouped and re-grouped together and presented as a single data point. In another embodiment the system may aggregate and present the text data with the associated meta information in a structured format. In certain embodiments the Machine Learning model may be a model trained on a pre-created training data set labeled manually or in an automatic fashion.
Type: Grant
Filed: June 25, 2021
Date of Patent: July 5, 2022
Assignee: Metacluster LT, UAB
Inventors: Martynas Juravicius, Andrius Kuksta