Patents Assigned to SAS Institute
-
Publication number: 20250231992
Abstract: Techniques described herein provide for automated near-duplicate detection for new text documents given text documents that were previously processed using automated near-duplicate detection for text documents. In one example, a system can receive new documents and documents that were previously processed using a predefined processing technique for automated near-duplicate detection. The system can process the new documents and cluster the new documents into multiple predefined clusters previously identified using the predefined processing technique. For each predefined cluster including at least one new document, the system can generate document groups by determining similarity scores using the predefined processing technique as applied to the documents in the predefined clusters. The system can identify a representative document for each document group and generate an output data structure including the document groups and the representative document for each group.
Type: Application
Filed: February 7, 2025
Publication date: July 17, 2025
Applicant: SAS Institute Inc.
Inventors: Fan WANG, Teresa S. JADE, Xu YANG
-
Publication number: 20250231993
Abstract: Techniques described herein provide for text string comparison for documents identified using automated near-duplicate detection. In one example, a system can receive a pair of documents. The system can extract text strings from the documents. The system can normalize the extracted text strings using a predefined normalization scheme. The system can identify boilerplate text segments in the normalized text strings. The system can remove the boilerplate text segments from the normalized text strings to generate filtered text strings. The system can divide the filtered text strings by identifying section indicators. The system can, for each section, generate groupings of text strings and determine a similarity score between each pair of corresponding groupings to identify matching groupings of text strings. The system can generate an output for display showing the visual indications of the matched groupings of text strings.
Type: Application
Filed: February 7, 2025
Publication date: July 17, 2025
Applicant: SAS Institute Inc.
Inventors: Fan WANG, Teresa S. JADE, Xu YANG
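For illustration only, the normalize, filter, split, and score steps in this abstract might be sketched as follows. This is not the filed implementation: the boilerplate phrases, the "section N" indicator, the line-by-line grouping, the 0.8 threshold, and the use of Python's difflib.SequenceMatcher as the similarity score are all assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical boilerplate phrases and section indicator (not from the filing).
BOILERPLATE = {"confidential", "all rights reserved"}
SECTION_MARKER = re.compile(r"section \d+")


def normalize(text):
    """Lowercase each line, collapse runs of spaces or tabs, drop empty lines."""
    lines = [re.sub(r"[ \t]+", " ", line).strip().lower() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def strip_boilerplate(text):
    """Remove lines that consist only of a known boilerplate phrase."""
    return "\n".join(
        line for line in text.splitlines() if line.rstrip(".") not in BOILERPLATE
    )


def split_sections(text):
    """Start a new section whenever a line begins with the section indicator."""
    sections, current = [], []
    for line in text.splitlines():
        if SECTION_MARKER.match(line) and current:
            sections.append(current)
            current = []
        current.append(line)
    if current:
        sections.append(current)
    return sections


def compare(doc_a, doc_b, threshold=0.8):
    """Score corresponding lines of corresponding sections; flag likely matches."""
    sections_a = split_sections(strip_boilerplate(normalize(doc_a)))
    sections_b = split_sections(strip_boilerplate(normalize(doc_b)))
    results = []
    for index, (sec_a, sec_b) in enumerate(zip(sections_a, sections_b)):
        for line_a, line_b in zip(sec_a, sec_b):
            score = SequenceMatcher(None, line_a, line_b).ratio()
            results.append((index, line_a, line_b, score, score >= threshold))
    return results


doc_a = "CONFIDENTIAL\nSection 1\nThe pump   runs daily.\nSection 2\nReplace the filter monthly."
doc_b = "Section 1\nThe pump runs daily.\nSection 2\nReplace the valve yearly.\nAll rights reserved."
results = compare(doc_a, doc_b)
for index, line_a, line_b, score, matched in results:
    print(index, repr(line_a), repr(line_b), round(score, 2), matched)
```

Pairing lines by position is the simplest possible grouping; a display layer could use the per-pair flag to highlight matched text.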
-
Patent number: 12353501
Abstract: A system and method include dividing training data into training data blocks, determining a support vector subset, distributing the training data blocks and the support vector subset to worker machines, receiving a first set of sub-results from worker machines, combining the first set of sub-results, solving a linear system, distributing a first set of variables to worker machines, receiving a second set of sub-results from worker machines, selecting a step size value and sending the selected step size value to worker machines, receiving updated values of the first set of variables and second set of variables from worker machines, receiving a maximum residual error value from worker machines, selecting a maximum value of the maximum residual error values, responsive to determining that selected maximum value satisfies an optimality condition, outputting a weight value and a bias value, and predicting a label using the weight value and the bias value.
Type: Grant
Filed: October 17, 2024
Date of Patent: July 8, 2025
Assignee: SAS Institute Inc.
Inventors: Riadh Omheni, Joshua David Griffin
-
Patent number: 12341800
Abstract: Functions representing sequences of values of a time-series dataset measured within a particular time period are accessed. For a current time window of the time period, a first discretized covariance function is computed that represents a relationship between each value measured within the current time window. Eigenanalysis of the first covariance function is performed to estimate first eigenfunctions. The current time window is incremented to obtain a subsequent time window that overlaps a majority of the current time window at a shared window region. A second discretized covariance function is computed for the subsequent time window and eigenanalysis is performed to estimate second normalized eigenfunctions. An angle change is computed between a portion of the first normalized eigenfunctions and a corresponding portion of the second normalized eigenfunctions located within the shared window region. Based on the angle change, an anomaly detection output is generated.
Type: Grant
Filed: December 16, 2024
Date of Patent: June 24, 2025
Assignee: SAS Institute Inc.
Inventors: Chengpeng Zeng, Kai Shen, Zohreh Asgharzadeh Talebi
-
Publication number: 20250181650
Abstract: Techniques described herein provide for generation of structured output for documents identified using automated near-duplicate detection. In one example, a system can receive a set of documents including at least one pair of similar documents determined to be similar to one another based on similarity scores generated using a predefined similarity scoring technique. The system can generate document groups by merging together pairs of documents that share at least one document. The system can, for each of the document groups, identify a representative document for the document group. The system can generate an output for display including a section for each document group, in which each section includes the representative document for the document group and, for each document in the document group, the similarity score relative to the representative document for the document group.
Type: Application
Filed: February 7, 2025
Publication date: June 5, 2025
Applicant: SAS Institute Inc.
Inventors: Fan WANG, Teresa S. JADE, Xu YANG
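Merging pairs that share a document is a connected-components problem, so one plausible way to sketch the grouping step is with a union-find structure. The sketch below is an assumption-laden illustration, not the filed method: the function names, the (doc, doc, score) tuple layout, and the rule for choosing a representative (highest total similarity within the group, ties broken by document id) are all hypothetical.

```python
from collections import defaultdict


def group_documents(scored_pairs):
    """Merge similar pairs that share a document into groups (union-find)."""
    parent = {}

    def find(doc):
        parent.setdefault(doc, doc)
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]  # path halving
            doc = parent[doc]
        return doc

    for doc_a, doc_b, _score in scored_pairs:
        root_a, root_b = find(doc_a), find(doc_b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for doc in parent:
        groups[find(doc)].add(doc)
    return [sorted(members) for members in groups.values()]


def pick_representative(group, scored_pairs):
    """Hypothetical rule: the member with the highest total in-group similarity."""
    members = set(group)
    totals = dict.fromkeys(group, 0.0)
    for doc_a, doc_b, score in scored_pairs:
        if doc_a in members and doc_b in members:
            totals[doc_a] += score
            totals[doc_b] += score
    return max(sorted(group), key=lambda doc: totals[doc])


def build_report(scored_pairs):
    """One section per group: the representative plus each member's score against it."""
    lookup = {frozenset((a, b)): score for a, b, score in scored_pairs}
    report = []
    for group in group_documents(scored_pairs):
        rep = pick_representative(group, scored_pairs)
        report.append({
            "representative": rep,
            # None means the pair (rep, doc) was never scored directly.
            "members": {doc: lookup.get(frozenset((rep, doc))) for doc in group if doc != rep},
        })
    return report


pairs = [("A", "B", 0.95), ("B", "C", 0.90), ("D", "E", 0.88)]
report = build_report(pairs)
print(report)
```

Here A, B, and C end up in one group because the pairs (A, B) and (B, C) share document B, while D and E form a second group.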
-
Publication number: 20250156467
Abstract: A computer-implemented system, computer-implemented method, and computer-program product includes obtaining a text document that includes text describing an action; extracting one or more action tokens from the text document; executing a plurality of linguistic pattern searches that search the text document for one or more likelihood tokens associated with the one or more action tokens; classifying the action to a likelihood category associated with a respective linguistic pattern search of the plurality of linguistic pattern searches that identified the one or more likelihood tokens; classifying the text document to a respective domain; computing a priority value of the action described in the text document based on an input of the likelihood category and the respective domain; and generating a priority summary artifact that visually prioritizes the text document over one or more other text documents when the priority value of the action satisfies a predefined maximum priority threshold value.
Type: Application
Filed: August 22, 2024
Publication date: May 15, 2025
Applicant: SAS Institute Inc.
Inventors: Teresa S. Jade, Julia Moreno, Ashley Mary Beck
-
Patent number: 12299360
Abstract: An apparatus includes processor(s) to: receive a request to test goodness-of-fit of a spatial process model; generate a KD tree from observed spatial point dataset including locations within a region at which instances of an event occurred; derive, from the observed spatial point dataset, multiple quadrats into which the region is divided; receive, from multiple processors, current levels of availability of processing resources including quantities of currently available execution threads; select, based on the quantity of currently available execution threads, a subset of the multiple processors to perform multiple iterations of a portion of the test in parallel; provide, to each processor of the subset, the KD tree, the spatial process model, and the multiple quadrats; receive, from each processor of the subset, per-quadrat data portions indicative of results of an iteration; derive a goodness-of-fit statistic from the per-quadrat data portions; and transmit an indication of goodness-of-fit to another device.
Type: Grant
Filed: November 26, 2021
Date of Patent: May 13, 2025
Assignee: SAS Institute Inc.
Inventor: Pradeep Mohan
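The abstract does not say which goodness-of-fit statistic is derived from the per-quadrat data. As a rough illustration of the per-quadrat idea only, the sketch below counts observed points in a regular grid of quadrats and computes a Pearson chi-square statistic against the counts expected under a homogeneous model. The grid layout, the function names, and the choice of statistic are assumptions; the KD tree and the parallel distribution of iterations are omitted.

```python
def quadrat_counts(points, width, height, nx, ny):
    """Count points falling in each cell of an nx-by-ny grid over the region."""
    counts = [0] * (nx * ny)
    for x, y in points:
        col = min(int(x / width * nx), nx - 1)   # clamp points on the right edge
        row = min(int(y / height * ny), ny - 1)  # clamp points on the top edge
        counts[row * nx + col] += 1
    return counts


def chi_square_statistic(observed, expected):
    """Pearson chi-square: sum of (observed - expected)^2 / expected per quadrat."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Eight event locations in a unit square, divided into a 2 x 2 grid of quadrats.
points = [(0.1, 0.1), (0.2, 0.3), (0.3, 0.2), (0.4, 0.4),
          (0.7, 0.2), (0.8, 0.4), (0.4, 0.6), (0.6, 0.8)]
observed = quadrat_counts(points, width=1.0, height=1.0, nx=2, ny=2)

# Under a homogeneous model, each of the four equal quadrats expects 8 / 4 points.
expected = [len(points) / 4] * 4
statistic = chi_square_statistic(observed, expected)
print(observed, statistic)  # [4, 2, 1, 1] 3.0
```

In a Monte Carlo test, the same statistic would be recomputed on point patterns simulated from the fitted model, and those independent iterations are the part that lends itself to parallel execution.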
-
Patent number: 12298963
Abstract: A new value is written from a dataset to a data structure comprising a set of sorted values. The new value replaces an oldest value and is inserted in a sorted position. The data structure is modified by subtracting a median value from each value of the set of sorted values to obtain sorted signed deviation values. The sorted signed deviation values are segmented to obtain data substructures comprising subsets of sorted absolute deviation values. A binary search is performed on the data substructures to identify a median absolute deviation value. A difference is computed between a particular value and the median value, and based on whether the difference is less than a threshold value computed from the median absolute deviation value, an outlier decision output is generated indicative of whether the particular value comprises an outlier value.
Type: Grant
Filed: October 30, 2024
Date of Patent: May 13, 2025
Assignee: SAS Institute, Inc.
Inventors: Hongtao Hu, Mahesh V Joshi
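A simplified sketch of a sliding-window median absolute deviation (MAD) outlier check is shown below. It keeps the window sorted with bisect, as the abstract describes, but it finds the MAD by re-sorting the absolute deviations rather than by the segmented binary search in the abstract, so it illustrates the decision rule and not the patented data structure. The class name and the multiplier k = 3.0 are assumptions.

```python
from bisect import bisect_left, insort
from collections import deque


class RollingMadDetector:
    """Flags a value as an outlier when |value - median| > k * MAD of the window."""

    def __init__(self, window_size, k=3.0):
        self.window_size = window_size
        self.k = k                 # hypothetical threshold multiplier
        self.arrivals = deque()    # window values in arrival order
        self.sorted_values = []    # the same values, kept sorted

    @staticmethod
    def _median(ordered):
        n = len(ordered)
        mid = n // 2
        return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

    def update(self, value):
        # Evict the oldest value once the window is full.
        if len(self.arrivals) == self.window_size:
            oldest = self.arrivals.popleft()
            del self.sorted_values[bisect_left(self.sorted_values, oldest)]
        # Insert the new value at its sorted position.
        self.arrivals.append(value)
        insort(self.sorted_values, value)

        median = self._median(self.sorted_values)
        # Simplification: sort the absolute deviations directly to find the MAD.
        deviations = sorted(abs(v - median) for v in self.sorted_values)
        mad = self._median(deviations)
        return abs(value - median) > self.k * mad


detector = RollingMadDetector(window_size=5)
flags = [detector.update(v) for v in [10, 11, 9, 10, 12, 50]]
print(flags)  # [False, False, False, False, False, True]
```

Note that when the MAD is zero (for example, a window of identical values), any value that differs from the median is flagged; a production rule would likely need a floor on the threshold.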
-
Patent number: 12293213
Abstract: A system and method include creating a project package for an Event Stream Processing (ESP) project, generating a first manifest file from the project package, creating a first container pod on a cluster based on the first manifest file, executing a container file generator software and a build kit software on the first container pod, executing an ESP server on the container file generator software, executing the ESP project on the ESP server such that data is not streaming to the ESP server, identifying a list of required software components needed to execute the ESP project, creating a container file having a subset of software components based on the list of required software components, generating an ESP project container image for the ESP server based on the container file, and deploying the ESP project using the ESP project container image to analyze data streamed to the ESP project.
Type: Grant
Filed: December 20, 2024
Date of Patent: May 6, 2025
Assignee: SAS Institute Inc.
Inventors: Frédéric Combaneyre, Joydeep Bhattacharya
-
Publication number: 20250139088
Abstract: A computer-implemented system, computer-implemented method, and computer-program product includes receiving a natural language query from a user for executing an analytical task; generating an analytical large language model (LLM) prompt based on the natural language query and, in response to generating the analytical LLM prompt, orchestrating an LLM-directed workflow for handling the natural language query by: automatically prompting, using the analytical LLM prompt, an analytical task-oriented LLM to generate a structured query for querying a data catalog application; querying the data catalog application using the structured query generated by the analytical task-oriented LLM; obtaining query results from the data catalog application, where the query results include metadata associated with at least one element accessible to the data catalog application; prompting the analytical task-oriented LLM to identify a given analytical task associated with a given analytical agent; and automatically executing, by t...
Type: Application
Filed: October 2, 2024
Publication date: May 1, 2025
Applicant: SAS Institute Inc.
Inventor: David Hermann Peter Weik
-
Patent number: 12287783
Abstract: A system and method include breaking symmetry in a query graph by converting the query graph into a transformed query graph by generating a symmetry breaking expression that includes detecting one or more orbits in the transformed query graph, selecting an orbit from the one or more orbits having more than one node, generating an automorphism breaking sub-expression for the selected orbit, assigning a node of the selected orbit a unique node attribute, recalculating the one or more orbits in the transformed query graph, repeating the process until each node is in its own orbit, and combining each of the automorphism breaking sub-expressions to obtain the symmetry breaking expression. Using the symmetry breaking expression, the system and method include finding one or more subgraphs of a main graph that match the symmetry breaking expression of the query graph.
Type: Grant
Filed: August 19, 2024
Date of Patent: April 29, 2025
Assignee: SAS Institute Inc.
Inventors: Brandon Michael Reese, Steven Harenberg
-
Publication number: 20250117632
Abstract: A system, method, and computer-program product includes obtaining a decisioning dataset comprising a plurality of favorable decisioning records and at least one unfavorable decisioning record; detecting, via a machine learning algorithm, a favorable decisioning record of the plurality of favorable decisioning records that has a vector value closest to a vector value of the unfavorable decisioning record; executing a counterfactual assessment between the favorable decisioning record and the unfavorable decisioning record; generating an explainability artifact based on one or more bias intensity metrics to explain a bias in a machine learning-based decisioning model; and in response to generating the explainability artifact, displaying the explainability artifact in a user interface.
Type: Application
Filed: July 5, 2024
Publication date: April 10, 2025
Applicant: SAS Institute Inc.
Inventors: Luiz Henrique Outi Kauffmann, Aline Riquetti Campos Emídio
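The first two steps of this abstract, finding the closest favorable record and comparing it with the unfavorable one, can be pictured with the sketch below. It is an illustration under stated assumptions: a brute-force Euclidean nearest-neighbor search stands in for the unspecified machine learning algorithm, the record layout and feature names are invented, and the bias intensity metrics are not modeled at all.

```python
import math


def nearest_favorable(unfavorable, favorable_records):
    """Return the favorable record whose vector is closest (Euclidean distance)."""
    return min(
        favorable_records,
        key=lambda record: math.dist(record["vector"], unfavorable["vector"]),
    )


def counterfactual_diff(unfavorable, favorable, feature_names):
    """List the features that differ, as (unfavorable value, favorable value)."""
    return {
        name: (u_value, f_value)
        for name, u_value, f_value in zip(
            feature_names, unfavorable["vector"], favorable["vector"]
        )
        if u_value != f_value
    }


# Hypothetical features and records, for illustration only.
features = ["income", "age", "group"]
unfavorable = {"id": "u1", "vector": [40.0, 30.0, 1.0]}
favorable = [
    {"id": "f1", "vector": [80.0, 45.0, 0.0]},
    {"id": "f2", "vector": [40.0, 30.0, 0.0]},
    {"id": "f3", "vector": [42.0, 50.0, 1.0]},
]

match = nearest_favorable(unfavorable, favorable)
diff = counterfactual_diff(unfavorable, match, features)
print(match["id"], diff)  # f2 {'group': (1.0, 0.0)}
```

In this toy data the closest favorable record differs only in the "group" feature, which is the kind of contrast a counterfactual assessment would surface for further analysis.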
-
Publication number: 20250117664
Abstract: A system, method, and computer-program product includes obtaining a decisioning dataset comprising a plurality of favorable decisioning records and at least one unfavorable decisioning record; detecting, via a machine learning algorithm, a favorable decisioning record of the plurality of favorable decisioning records that has a vector value closest to a vector value of the unfavorable decisioning record; executing a counterfactual assessment between the favorable decisioning record and the unfavorable decisioning record; generating an explainability artifact based on one or more bias intensity metrics to explain a bias in a machine learning-based decisioning model; and in response to generating the explainability artifact, displaying the explainability artifact in a user interface.
Type: Application
Filed: July 5, 2024
Publication date: April 10, 2025
Applicant: SAS Institute Inc.
Inventors: Luiz Henrique Outi Kauffmann, Aline Riquetti Campos Emídio
-
Publication number: 20250117192
Abstract: In one example, a computer system can generate a graphical user interface (GUI) for forecasting software including a drag-and-drop canvas with a set of rearrangeable nodes defining a forecasting pipeline. The computer system can detect a user interaction for attaching an external-language execution node to the pipeline, which can be used to insert custom code defined using an external programming language. The computer system can receive the custom code. The computer system can receive a user input to initiate execution of the pipeline. The computer system can generate wrapped custom code by augmenting the custom code with additional program code including shared variables. The computer system can provide the wrapped custom code to a set of execution threads configured to execute the wrapped custom code as part of the pipeline to generate one or more forecasts. The computer system can output the forecasts in the GUI.
Type: Application
Filed: July 2, 2024
Publication date: April 10, 2025
Applicant: SAS Institute Inc.
Inventors: Iman Vasheghani Farahani, Mahesh V. Joshi, Phillip M. Helmkamp, Rajib Nath, Vilochan Suresh Muley, Javier Delgado, Michele Angelo Trovero
-
Publication number: 20250103579
Abstract: In one example, a system can receive, from application code including an analysis operation performed on a set of data, an indication to access the set of data included in a tabular data structure using an application programming interface (API), in which the tabular data structure is associated with a memory allocation and a type. The system can determine that the type of the tabular data structure is the native type, the native type characterizing data structures that are accessed using a first programming language and a second programming language. The system can identify a proxy data table that shares the memory allocation, the proxy data table accessed using the API based on the second programming language. The system can issue one or more read commands to the proxy data table to cause the set of data to be read from the tabular data structure.
Type: Application
Filed: October 10, 2024
Publication date: March 27, 2025
Applicant: SAS Institute Inc.
Inventors: Yongqiao Xiao, Mary Elizabeth Carter, Arash Dehghan Banadaki, Avery Winston Acierno, Patrick Nathan Koch
-
Publication number: 20250103578
Abstract: In one example, a system can receive information about a tabular data structure in a memory including a set of data and a first memory allocation. The system can determine a type of the tabular data structure, the type selected from among two types including a native type and a non-native type. The system can, in response to the type being the native type, identify a first proxy data table usable as a proxy for the tabular data structure that shares the first memory allocation. The system can receive a first indication to access the set of data from application code. The system can issue one or more first read commands to the first proxy data table to cause the set of data to be read from the tabular data structure.
Type: Application
Filed: October 10, 2024
Publication date: March 27, 2025
Applicant: SAS Institute Inc.
Inventors: Yongqiao Xiao, Mary Elizabeth Carter, Arash Dehghan Banadaki, Avery Winston Acierno, Patrick Nathan Koch
-
Publication number: 20250068490
Abstract: A system, method, and computer-program product includes implementing a cross-process queue within a single computer that is configured to transfer a data block between an operating system process executing a write operation and an operating system process executing a read operation, initializing in-memory cell indices within the cross-process queue that include a write operation index tracking index values of one or more cells within the cross-process queue that are available to write and a read operation index tracking index values of one or more cells within the cross-process queue that are available to read, and implementing a cell synchronization data structure tracking states of a plurality of cells of the index of cells of the cross-process queue.
Type: Application
Filed: June 7, 2024
Publication date: February 27, 2025
Applicant: SAS Institute Inc.
Inventors: Lawrence Edmund Lewis, Mohammadreza Nazari, Amirhassan Fallah Dizche
-
Publication number: 20250068658
Abstract: Embodiments described herein relate to the efficient generation of synthetic datasets that represent many-to-many relationships. In particular, certain embodiments implement a particular factorization for many-to-many generative models, which leads to a scalable generation framework by combining random graph theory and representation learning. Further embodiments extend the framework to establish the notion of differential privacy within the synthetically generated data. The embodiments described herein are therefore able to generate synthetic datasets efficiently while preserving information within and across many-to-many datasets with improved accuracy.
Type: Application
Filed: November 8, 2024
Publication date: February 27, 2025
Applicant: SAS Institute Inc.
Inventors: Kai Xu, Georgi Valentinov Ganev, Emile Isak Joubert, Rees Stephen Davison, Olivier Rene Maurice Van Acker, Luke Anthony William Robinson, Sofiane Mahiou
-
Publication number: 20250068357
Abstract: A system, method, and computer-program product includes implementing a cross-process queue within a single computer that is configured to transfer a data block between an operating system process executing a write operation and an operating system process executing a read operation, initializing in-memory cell indices within the cross-process queue that include a write operation index tracking index values of one or more cells within the cross-process queue that are available to write and a read operation index tracking index values of one or more cells within the cross-process queue that are available to read, and implementing a cell synchronization data structure tracking states of a plurality of cells of the index of cells of the cross-process queue.
Type: Application
Filed: June 7, 2024
Publication date: February 27, 2025
Applicant: SAS Institute Inc.
Inventors: Lawrence Edmund Lewis, Mohammadreza Nazari, Amirhassan Fallah Dizche
-
Publication number: 20250068927
Abstract: A system, method, and computer-program product includes receiving an input comprising a plurality of pre-defined factor matrices and an implicit feedback dataset partitioned into a plurality of implicit feedback data subsets; distributing the input across a controller node and a plurality of worker nodes implemented in a distributed computing environment; and training a model using the controller node and the plurality of worker nodes, wherein training the model includes: initializing, by the controller node, a controller-specific user parameters matrix and a controller-specific item parameters matrix, broadcasting, by the controller node, the controller-specific user parameters matrix and the controller-specific item parameters matrix to each worker node of the plurality of worker nodes, and concurrently executing an aggregation model training algorithm at the controller node and a plurality of localized model training algorithms across the plurality of worker nodes until a training termination condition is...
Type: Application
Filed: February 21, 2024
Publication date: February 27, 2025
Applicant: SAS Institute Inc.
Inventors: Xuejun Liao, Patrick Nathan Koch