Patents Assigned to Metacluster LT, UAB
  • Publication number: 20230099967
    Abstract: Systems and methods to manage and regulate the requests of multiple proxy clients are disclosed. In one aspect, the systems and methods disclosed herein aid in configuring proxy server(s) with a rate-limit functionality. Configuration of the rate-limit functionality may be realized by, but is not limited to, installing configuration file(s) and/or software application(s) on the proxy server(s). The configuration provides information about the list of restricted and unrestricted domains and their respective request limit specification in a given time frame. Therefore, each time before a proxy server forwards the clients' requests to a target domain, the proxy server checks and ensures that the request count to the particular target domain is well within the limit specified in the request limit specification. Thus, the embodiments described herein aid in preventing the IP addresses of proxy service providers from being blocked or denied by the target websites.
    Type: Application
    Filed: September 29, 2022
    Publication date: March 30, 2023
    Applicant: METACLUSTER LT, UAB
    Inventors: Giedrius STALIORAITIS, Ovidijus BALKAUSKAS
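The per-domain check described in the abstract above can be illustrated with a fixed-window counter. This is a minimal sketch and not the patented implementation: the class name `DomainRateLimiter`, the shape of the limit table, and the injectable `clock` parameter are all assumptions made for illustration, and a fixed window is only one of several ways to count requests in a time frame.

```python
import time


class DomainRateLimiter:
    """Fixed-window request counter keyed by target domain (illustrative only)."""

    def __init__(self, limits, clock=time.monotonic):
        # limits: hypothetical table mapping domain -> (max_requests, window_seconds).
        # Domains missing from the table are treated as unrestricted.
        self.limits = limits
        self.clock = clock
        self.windows = {}  # domain -> [window_start, request_count]

    def allow(self, domain):
        """Return True if a request to `domain` may be forwarded right now."""
        spec = self.limits.get(domain)
        if spec is None:
            return True  # unrestricted domain
        max_requests, window_seconds = spec
        now = self.clock()
        state = self.windows.get(domain)
        # Start a new window on first use or once the previous one has elapsed.
        if state is None or now - state[0] >= window_seconds:
            state = [now, 0]
            self.windows[domain] = state
        if state[1] >= max_requests:
            return False  # limit reached for this window
        state[1] += 1
        return True
```

A proxy would call `allow(domain)` before forwarding each client request and hold back or reject the request when it returns `False`. Passing a fake `clock` makes the window behaviour easy to exercise without waiting.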
  • Publication number: 20230066328
    Abstract: Systems and methods to intelligently optimize data collection requests are disclosed. In one embodiment, systems are configured to identify and select a complete set of suitable parameters to execute the data collection requests. In another embodiment, systems are configured to identify and select a partial set of suitable parameters to execute the data collection requests. The present embodiments can implement machine learning algorithms to identify and select the suitable parameters according to the nature of the data collection requests and the targets. Moreover, the embodiments provide systems and methods to generate feedback data based upon the effectiveness of the data collection parameters. Furthermore, the embodiments provide systems and methods to score the set of suitable parameters based on the feedback data and the overall cost, which are then stored in an internal database.
    Type: Application
    Filed: August 31, 2022
    Publication date: March 2, 2023
    Applicant: METACLUSTER LT, UAB
    Inventors: Martynas JURAVICIUS, Erikas BULBA, Mantas BRILIAUSKAS
  • Publication number: 20230033150
    Abstract: Embodiments disclose a system that allows for improved generation of web requests for scraping that, because of the nature of the requests and time and manner they are sent out, appear more organic, as in human generated, than conventional automated scraping systems. The system then manages how a client request to scrape a target website is made to the site, masking the request in a manner that makes it appear to the Web server as if the request is not generated by an automated system. In this way, by appearing more organic, Web servers may be less likely to block requests from the disclosed system or may take longer to block requests from the disclosed system. By avoiding Web servers blocking requests and extending the lifetime of IP proxies before they are blocked, embodiments can use a limited IP proxy address space more efficiently.
    Type: Application
    Filed: September 30, 2022
    Publication date: February 2, 2023
    Applicant: Metacluster LT, UAB
    Inventors: Eivydas VILCINSKAS, Arnas PETRUSKEVICIUS, Giedrius STALIORAITIS, Martynas JURAVICIUS, Rimantas STANKEVICIUS
  • Publication number: 20230017698
    Abstract: Embodiments disclose a system that allows for improved generation of web requests for scraping that, because of the nature of the requests and time and manner they are sent out, appear more organic, as in human generated, than conventional automated scraping systems. The system then manages how a client request to scrape a target website is made to the site, masking the request in a manner that makes it appear to the Web server as if the request is not generated by an automated system. In this way, by appearing more organic, Web servers may be less likely to block requests from the disclosed system or may take longer to block requests from the disclosed system. By avoiding Web servers blocking requests and extending the lifetime of IP proxies before they are blocked, embodiments can use a limited IP proxy address space more efficiently.
    Type: Application
    Filed: March 21, 2022
    Publication date: January 19, 2023
    Applicant: Metacluster LT, UAB
    Inventors: Eivydas Vilcinskas, Arnas Petruskevicius, Giedrius Stalioraitis, Martynas Juravicius, Rimantas Stankevicius
  • Publication number: 20230018506
    Abstract: Embodiments disclose a system that allows for improved generation of web requests for scraping that, because of the nature of the requests and time and manner they are sent out, appear more organic, as in human generated, than conventional automated scraping systems. The system then manages how a client request to scrape a target website is made to the site, masking the request in a manner that makes it appear to the Web server as if the request is not generated by an automated system. In this way, by appearing more organic, Web servers may be less likely to block requests from the disclosed system or may take longer to block requests from the disclosed system. By avoiding Web servers blocking requests and extending the lifetime of IP proxies before they are blocked, embodiments can use a limited IP proxy address space more efficiently.
    Type: Application
    Filed: April 21, 2022
    Publication date: January 19, 2023
    Applicant: Metacluster LT, UAB
    Inventors: Eivydas Vilcinskas, Arnas Petruskevicius, Giedrius Stalioraitis, Martynas Juravicius, Rimantas Stankevicius
  • Publication number: 20230018983
    Abstract: Embodiments disclose a system that allows for improved generation of web requests for scraping that, because of the nature of the requests and time and manner they are sent out, appear more organic, as in human generated, than conventional automated scraping systems. The system then manages how a client request to scrape a target website is made to the site, masking the request in a manner that makes it appear to the Web server as if the request is not generated by an automated system. In this way, by appearing more organic, Web servers may be less likely to block requests from the disclosed system or may take longer to block requests from the disclosed system. By avoiding Web servers blocking requests and extending the lifetime of IP proxies before they are blocked, embodiments can use a limited IP proxy address space more efficiently.
    Type: Application
    Filed: July 12, 2021
    Publication date: January 19, 2023
    Applicant: Metacluster LT, UAB
    Inventors: Eivydas Vilcinskas, Arnas Petruskevicius, Giedrius Stalioraitis, Martynas Juravicius, Rimantas Stankevicius
  • Publication number: 20230018387
    Abstract: The current application discloses processor-implemented methods and systems of processing unclassified HTML responses collected in the context of a data collection service, the method comprising, in one embodiment, receiving unclassified HTML documents, isolating elements relevant for category identification, deriving classification attributes from the isolated elements, and applying a Machine Learning-based classification model resulting in HTML data items classified and labelled accordingly. In certain embodiments the Machine Learning model may be a model trained on a pre-created training data set labeled manually or in an automatic fashion.
    Type: Application
    Filed: July 6, 2021
    Publication date: January 19, 2023
    Applicant: Metacluster LT, UAB
    Inventors: Andrius KUKSTA, Jurijus GORSKOVAS, Martynas JURAVICIUS
  • Publication number: 20220414164
    Abstract: In one aspect, methods and systems for producing an index of a target website are described. In another aspect, methods and systems for extracting specific information from one or more specific indexed URLs are described. The method and system for producing an index of a target website include receiving and analyzing a client's specifications for the index, accessing a target website, extracting the relevant information from the target website, parsing the extracted information in order to identify the URLs, producing the index containing the identified URLs, storing the index (which contains the list of indexed URLs) in a database, compiling the index into the different formats requested by the client, and providing the client with the access information for accessing the compiled index.
    Type: Application
    Filed: June 28, 2021
    Publication date: December 29, 2022
    Applicant: Metacluster LT, UAB
    Inventors: Eivydas VILCINSKAS, Rimantas STANKEVICIUS, Aleksandras SULZENKO
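The parsing and index-production steps named in the abstract above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the method claimed in the application: `LinkCollector` and `build_index` are hypothetical names, the input is an already-fetched HTML string, and restricting the index to same-host URLs is a choice made here for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag seen in the document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def build_index(base_url, html):
    """Return the de-duplicated, same-host absolute URLs found in `html`."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    seen = set()
    index = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        if urlparse(url).netloc == host and url not in seen:
            seen.add(url)
            index.append(url)
    return index
```

Storing the resulting list in a database and compiling it into client-requested formats would follow as separate steps and are left out of this sketch.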
  • Publication number: 20220414166
    Abstract: ADVANCED RESPONSE PROCESSING IN WEB DATA COLLECTION discloses processor-implemented apparatuses, methods, and systems of processing unstructured raw HTML responses collected in the context of a data collection service, the method comprising, in one embodiment, receiving raw unstructured HTML documents and extracting text data with associated meta information that may comprise style and formatting information. In some embodiments data field tags and values may be assigned to the text blocks extracted, classifying the data based on the processing of Machine Learning algorithms. Additionally, blocks of extracted data may be grouped and re-grouped together and presented as a single data point. In another embodiment the system may aggregate and present the text data with the associated meta information in a structured format. In certain embodiments the Machine Learning model may be a model trained on a pre-created training data set labeled manually or in an automatic fashion.
    Type: Application
    Filed: July 1, 2022
    Publication date: December 29, 2022
    Applicant: Metacluster LT, UAB
    Inventors: Martynas JURAVICIUS, Andrius KUKSTA
  • Publication number: 20220414397
    Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The process of classification employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contain no data, mangled data, or a block, for assigning a classification correspondingly and for feeding the classification decision back to a data collection platform.
    Type: Application
    Filed: August 30, 2022
    Publication date: December 29, 2022
    Applicant: METACLUSTER LT, UAB
    Inventors: Martynas JURAVICIUS, Andrius KUKSTA
  • Patent number: 11496594
    Abstract: Systems and methods to manage and regulate the requests of multiple proxy clients are disclosed. In one aspect, the systems and methods disclosed herein aid in configuring proxy server(s) with a rate-limit functionality. Configuration of the rate-limit functionality may be realized by, but is not limited to, installing configuration file(s) and/or software application(s) on the proxy server(s). The configuration provides information about the list of restricted and unrestricted domains and their respective request limit specification in a given time frame. Therefore, each time before a proxy server forwards the clients' requests to a target domain, the proxy server checks and ensures that the request count to the particular target domain is well within the limit specified in the request limit specification. Thus, the embodiments described herein aid in preventing the IP addresses of proxy service providers from being blocked or denied by the target websites.
    Type: Grant
    Filed: June 3, 2022
    Date of Patent: November 8, 2022
    Assignee: METACLUSTER LT, UAB
    Inventors: Giedrius Stalioraitis, Ovidijus Balkauskas
  • Patent number: 11468137
    Abstract: Systems and methods to intelligently optimize data collection requests are disclosed. In one embodiment, systems are configured to identify and select a complete set of suitable parameters to execute the data collection requests. In another embodiment, systems are configured to identify and select a partial set of suitable parameters to execute the data collection requests. The present embodiments can implement machine learning algorithms to identify and select the suitable parameters according to the nature of the data collection requests and the targets. Moreover, the embodiments provide systems and methods to generate feedback data based upon the effectiveness of the data collection parameters. Furthermore, the embodiments provide systems and methods to score the set of suitable parameters based on the feedback data and the overall cost, which are then stored in an internal database.
    Type: Grant
    Filed: March 22, 2022
    Date of Patent: October 11, 2022
    Assignee: METACLUSTER LT, UAB
    Inventors: Martynas Juravicius, Erikas Bulba, Mantas Briliauskas
  • Patent number: 11470174
    Abstract: Systems and methods of task implementation are extended as provided herein and target the web crawling process through a step of submitting a request by a customer to a web crawler. The systems and methods allow a more complex request for a web crawler to be defined in order to receive more specific data. In one aspect, a method for data extraction and gathering from a Network by a Service provider infrastructure includes the following steps: checking the parameters of a request received from a User's Device, adjusting the request parameters according to pre-established Scraping logic, selecting a Proxy according to the criteria of the pre-established Scraping logic, sending the adjusted request to the Target through the selected Proxy, checking metadata received from the Target, and forwarding the data to the User's device.
    Type: Grant
    Filed: April 22, 2022
    Date of Patent: October 11, 2022
    Assignee: METACLUSTER LT, UAB
    Inventors: Eivydas Vilcinskas, Martynas Juravicius, Giedrius Stalioraitis
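The sequence of steps listed in the abstract above (check parameters, adjust the request, select a proxy, send, check the response metadata, forward the data) can be laid out as a small pipeline. Everything concrete here is an assumption for illustration: the `Request` fields, the proxy table with `.invalid` host names, the default `User-Agent` value, and the rule that only status 200 counts as success. The network call is injected as a `send` callable so the sketch stays runnable without any real proxy.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    url: str
    headers: dict = field(default_factory=dict)
    geo: str = "any"  # hypothetical parameter: desired proxy location


# Hypothetical proxy pool; a real service would load this from its own inventory.
PROXIES = [
    {"host": "proxy-us.invalid", "geo": "us"},
    {"host": "proxy-de.invalid", "geo": "de"},
]


def check_parameters(req):
    """Reject requests whose URL is not an absolute http(s) URL."""
    if not req.url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")


def adjust(req):
    """Apply stand-in 'scraping logic': add a default header unless the client set one."""
    headers = {"User-Agent": "example-agent/1.0", **req.headers}
    return Request(req.url, headers, req.geo)


def select_proxy(req, proxies):
    """Pick the first proxy matching the requested location."""
    for proxy in proxies:
        if req.geo in ("any", proxy["geo"]):
            return proxy
    raise LookupError(f"no proxy available for geo={req.geo!r}")


def handle(req, send, proxies=PROXIES):
    """Run the full pipeline; `send(request, proxy)` must return (status, body)."""
    check_parameters(req)
    adjusted = adjust(req)
    proxy = select_proxy(adjusted, proxies)
    status, body = send(adjusted, proxy)
    if status != 200:  # metadata check on the target's response
        raise RuntimeError(f"target responded with status {status}")
    return body  # forwarded to the user's device
```

Injecting `send` keeps the transport separate from the decision logic, so each step can be tested in isolation with a stub.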
  • Publication number: 20220318564
    Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The process of classification employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contain no data, mangled data, or a block, for assigning a classification correspondingly and for feeding the classification decision back to a data collection platform.
    Type: Application
    Filed: March 30, 2021
    Publication date: October 6, 2022
    Applicant: METACLUSTER LT, UAB
    Inventors: Martynas JURAVICIUS, Andrius KUKSTA
  • Patent number: 11461588
    Abstract: Systems and methods that allow examination of response data collected from content providers and provide for classification and routing according to the classification. The process of classification employs an unsupervised, or partially unsupervised, Machine Learning classifier model for identifying data collection responses that contain no data, mangled data, or a block, for assigning a classification correspondingly and for feeding the classification decision back to a data collection platform.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: October 4, 2022
    Assignee: METACLUSTER LT, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta
  • Patent number: 11416291
    Abstract: Embodiments disclose a system that allows for improved generation of web requests for scraping that, because of the nature of the requests and time and manner they are sent out, appear more organic, as in human generated, than conventional automated scraping systems. The system then manages how a client request to scrape a target website is made to the site, masking the request in a manner that makes it appear to the Web server as if the request is not generated by an automated system. In this way, by appearing more organic, Web servers may be less likely to block requests from the disclosed system or may take longer to block requests from the disclosed system. By avoiding Web servers blocking requests and extending the lifetime of IP proxies before they are blocked, embodiments can use a limited IP proxy address space more efficiently.
    Type: Grant
    Filed: July 12, 2021
    Date of Patent: August 16, 2022
    Assignee: Metacluster LT, UAB
    Inventors: Eivydas Vilcinskas, Arnas Petruskevicius, Giedrius Stalioraitis, Martynas Juravicius, Rimantas Stankevicius
  • Patent number: 11416564
    Abstract: Embodiments disclose a system that allows for improved generation of web requests for scraping that, because of the nature of the requests and time and manner they are sent out, appear more organic, as in human generated, than conventional automated scraping systems. The system then manages how a client request to scrape a target website is made to the site, masking the request in a manner that makes it appear to the Web server as if the request is not generated by an automated system. In this way, by appearing more organic, Web servers may be less likely to block requests from the disclosed system or may take longer to block requests from the disclosed system. By avoiding Web servers blocking requests and extending the lifetime of IP proxies before they are blocked, embodiments can use a limited IP proxy address space more efficiently.
    Type: Grant
    Filed: July 12, 2021
    Date of Patent: August 16, 2022
    Assignee: Metacluster LT, UAB
    Inventors: Eivydas Vilcinskas, Arnas Petruskevicius, Giedrius Stalioraitis, Martynas Juravicius, Rimantas Stankevicius
  • Publication number: 20220247829
    Abstract: Systems and methods of task implementation are extended as provided herein and target the web crawling process through a step of submitting a request by a customer to a web crawler. The systems and methods allow a more complex request for a web crawler to be defined in order to receive more specific data. In one aspect, a method for data extraction and gathering from a Network by a Service provider infrastructure includes the following steps: checking the parameters of a request received from a User's Device, adjusting the request parameters according to pre-established Scraping logic, selecting a Proxy according to the criteria of the pre-established Scraping logic, sending the adjusted request to the Target through the selected Proxy, checking metadata received from the Target, and forwarding the data to the User's device.
    Type: Application
    Filed: April 22, 2022
    Publication date: August 4, 2022
    Applicant: METACLUSTER LT, UAB
    Inventors: Eivydas Vilcinskas, Martynas Juravicius, Giedrius Stalioraitis
  • Patent number: 11381666
    Abstract: Systems and methods to manage and regulate the requests of multiple proxy clients are disclosed. In one aspect, the systems and methods disclosed herein aid in configuring proxy server(s) with a rate-limit functionality. Configuration of the rate-limit functionality may be realized by, but is not limited to, installing configuration file(s) and/or software application(s) on the proxy server(s). The configuration provides information about the list of restricted and unrestricted domains and their respective request limit specification in a given time frame. Therefore, each time before a proxy server forwards the clients' requests to a target domain, the proxy server checks and ensures that the request count to the particular target domain is well within the limit specified in the request limit specification. Thus, the embodiments described herein aid in preventing the IP addresses of proxy service providers from being blocked or denied by the target websites.
    Type: Grant
    Filed: February 24, 2022
    Date of Patent: July 5, 2022
    Assignee: METACLUSTER LT, UAB
    Inventors: Giedrius Stalioraitis, Ovidijus Balkauskas
  • Patent number: 11379542
    Abstract: ADVANCED RESPONSE PROCESSING IN WEB DATA COLLECTION discloses processor-implemented apparatuses, methods, and systems of processing unstructured raw HTML responses collected in the context of a data collection service, the method comprising, in one embodiment, receiving raw unstructured HTML documents and extracting text data with associated meta information that may comprise style and formatting information. In some embodiments data field tags and values may be assigned to the text blocks extracted, classifying the data based on the processing of Machine Learning algorithms. Additionally, blocks of extracted data may be grouped and re-grouped together and presented as a single data point. In another embodiment the system may aggregate and present the text data with the associated meta information in a structured format. In certain embodiments the Machine Learning model may be a model trained on a pre-created training data set labeled manually or in an automatic fashion.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: July 5, 2022
    Assignee: Metacluster LT, UAB
    Inventors: Martynas Juravicius, Andrius Kuksta