Patents Assigned to SAS Institute
  • Publication number: 20210026805
    Abstract: An apparatus includes a processor to: instantiate data buffers of a queue, reading threads, and provision threads; within each reading thread, use an identifier provided in a data buffer of the queue to retrieve the corresponding data set part and part metadata from storage device(s), and store both within the data buffer; operate the queue as a first-in-first-out (FIFO) buffer; within each provision thread, retrieve a row group from among multiple row groups and corresponding metadata from within the data buffer, use information in the metadata to decompress at least one column, and provide the data values of the row group to the requesting device or an application routine; and in response to each instance of storage of a data set part within a data buffer of the queue, analyze the availability of storage space and/or of processing resources to determine whether to dynamically adjust the quantity of reading threads.
    Type: Application
    Filed: September 30, 2020
    Publication date: January 28, 2021
    Applicant: SAS Institute Inc.
    Inventors: Brian Payton Bowman, Gordon Lyle Keener
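The reader/provision arrangement in this abstract can be sketched in Python as a bounded FIFO queue shared by two kinds of threads. This is an illustration only, not the patented implementation: `STORAGE`, `reader`, `provider`, and `run_pipeline` are hypothetical names, the in-memory dictionary stands in for the storage device(s), and column decompression and the dynamic adjustment of the reading-thread count are omitted.

```python
import queue
import threading

# Hypothetical stand-in for a storage device: part identifier -> (rows, metadata).
STORAGE = {
    "part-0": ([1, 2, 3], {"rows": 3}),
    "part-1": ([4, 5], {"rows": 2}),
    "part-2": ([6], {"rows": 1}),
}


def reader(part_ids, fifo):
    """Reading thread: fetch each part and its metadata, then enqueue both."""
    for part_id in part_ids:
        rows, meta = STORAGE[part_id]
        fifo.put((part_id, rows, meta))  # blocks while the bounded queue is full


def provider(fifo, out, lock):
    """Provision thread: dequeue parts and hand their rows to the consumer."""
    while True:
        item = fifo.get()
        if item is None:  # sentinel: no more parts will arrive
            break
        part_id, rows, meta = item
        assert len(rows) == meta["rows"]  # use the metadata to sanity-check the part
        with lock:
            out[part_id] = rows


def run_pipeline(part_ids, n_providers=2, max_buffers=2):
    """Run one reading thread and several provision threads over a bounded FIFO."""
    fifo = queue.Queue(maxsize=max_buffers)  # the queue of data buffers
    out, lock = {}, threading.Lock()
    providers = [
        threading.Thread(target=provider, args=(fifo, out, lock))
        for _ in range(n_providers)
    ]
    for t in providers:
        t.start()
    r = threading.Thread(target=reader, args=(part_ids, fifo))
    r.start()
    r.join()
    for _ in providers:
        fifo.put(None)  # one sentinel per provision thread
    for t in providers:
        t.join()
    return out
```

Bounding the queue with `maxsize` is what makes the buffer count a tunable resource: a full queue stalls the reader, which is the kind of condition the abstract's availability analysis would react to.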
  • Patent number: 10902329
    Abstract: A computing device receives training data representing different observations where each observation is categorized into one of options for a target variable. The device obtains computer command(s) for categorizing into one of the options for the target variable. The device generates a sampling scheme for sampling terms of the training data. The device generates sampling models by, for N iterations of the sampling scheme: determining a subset of the training data based on a training data index; sampling, based on a term index, the subset of the training data for a subset of terms; and generating, based on the subset of terms, a sampling model for categorizing, according to the computer command(s). Each sampling model is generated from a different subset of terms such that the sampling models are randomized. The device computes an aggregated model for categorizing test data into one of the options for the target variable.
    Type: Grant
    Filed: February 26, 2020
    Date of Patent: January 26, 2021
    Assignee: SAS Institute Inc.
    Inventors: Bruce Monroe Mills, Vinicius Rabbi Vivaldi
  • Publication number: 20210019284
    Abstract: An apparatus includes a processor to: within each collection thread, assemble a row group from stored rows, generate row group metadata corresponding to the row group, and store the row group and row group metadata within a data buffer of a queue; operate the queue as a FIFO buffer; within each aggregation thread, retrieve multiple row groups and corresponding row group metadata from multiple data buffers of the queue, assemble a data set part from the multiple row groups, generate part metadata that includes the row group metadata, and transmit, to storage device(s) and/or a requesting device, the data set part and/or the part metadata; and in response to each retrieval of at least a row group from a data buffer of the queue for an aggregation thread, analyze availability of storage space within the node device to determine whether to dynamically adjust the quantity of data buffers.
    Type: Application
    Filed: September 30, 2020
    Publication date: January 21, 2021
    Applicant: SAS Institute Inc.
    Inventor: Brian Payton Bowman
  • Patent number: 10885020
    Abstract: A computing device obtains an indication of data records resolved to describe a single entity in an entity resolution. The data records comprise peripheral records resolved to describe the single entity based on matching data of a central record of the data records. The device generates an indication indicating that at least one of the first peripheral record and the second peripheral record does not describe the single entity by setting a first one of the data records as a source; and setting a second one of the data records as a sink. The device generates a data structure identifying record linkage information for records of the dataset. The record linkage information indicates one or more pathways between the source and the sink along the linked records. The device executes a minimum cut algorithm to identify one or more connections of the one or more pathways to unlink.
    Type: Grant
    Filed: July 22, 2020
    Date of Patent: January 5, 2021
    Assignee: SAS Institute Inc.
    Inventor: Nicholas Akbar Ablitt
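As a rough illustration of the minimum-cut step in this abstract, the sketch below treats linked records as an undirected graph and finds the cheapest set of links to remove so that the source record and the sink record are no longer connected. It uses the textbook Edmonds-Karp max-flow method, which the abstract does not name; the link weights and the function name `min_cut` are assumptions made for the example.

```python
from collections import defaultdict, deque


def min_cut(edges, source, sink):
    """Return the links to unlink so that source and sink are separated.

    edges: list of (u, v, weight) undirected record links (weights are assumed).
    """
    cap = defaultdict(lambda: defaultdict(int))
    for u, v, w in edges:
        cap[u][v] += w
        cap[v][u] += w  # undirected link: capacity in both directions

    def bfs_parents():
        # Breadth-first search over links that still have residual capacity.
        parents = {source: None}
        todo = deque([source])
        while todo:
            u = todo.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parents:
                    parents[v] = u
                    todo.append(v)
        return parents

    # Edmonds-Karp: augment along shortest residual paths until none remain.
    while True:
        parents = bfs_parents()
        if sink not in parents:
            break
        path, v = [], sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    reachable = set(parents)  # records still attached to the source side
    return sorted((u, v) for u, v, _ in edges if (u in reachable) != (v in reachable))
```

For a chain of links A-B (weight 3), B-C (weight 1), C-D (weight 3) with A as source and D as sink, the weakest link B-C is the one returned for unlinking.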
  • Patent number: 10878345
    Abstract: A computing device receives factor information indicating multiple factors (e.g., hyperparameters for designing a system comprising a machine learning algorithm). The computing device receives range information indicating initial ranges with a range for each of possible options for the multiple factors. The computing device obtains a space-filling design for the design space. The space-filling design indicates selected design points in the design space. Each of the selected design points represents assigned options assigned to the multiple factors. The assigned options are assigned from the initial ranges. The computing device generates, based on the space-filling design, an initial design suite that provides initial design cases corresponding to one or more of the selected design points. The computing device generates evaluations of the initial design cases. The computing device outputs, based on the evaluations of the initial design cases, an indication of a selected design case.
    Type: Grant
    Filed: October 25, 2019
    Date of Patent: December 29, 2020
    Assignee: SAS Institute Inc.
    Inventors: Ryan Adam Lekivetz, Joseph Albert Morgan, Bradley Allen Jones, Russell Dean Wolfinger
  • Patent number: 10872277
    Abstract: A computing system classifies distributed data. A first computation request is sent to worker computing devices. A first response is received from each worker computing device. Each first response includes a first matrix computed as a second order derivative of a logarithm of a predefined likelihood function on a subset of training data distributed to each respective worker computing device. A global first matrix is defined by concatenating the first matrix from each worker computing device. A kernel matrix is computed using the training data and a predefined kernel function. A second computation request is sent to the worker computing devices. The second computation request indicates that each worker computing device compute a classification probability for each observation vector distributed to a respective worker computing device using the defined global first matrix and the computed kernel matrix. The determined classification probability is output for each observation vector.
    Type: Grant
    Filed: July 22, 2020
    Date of Patent: December 22, 2020
    Assignee: SAS Institute Inc.
    Inventor: Yingjian Wang
  • Patent number: 10867253
    Abstract: A computing system trains a clustering model. A responsibility parameter vector is initialized for each observation vector and includes a probability value of a cluster membership. The observation vectors include a plurality of classified observation vectors and a plurality of unclassified observation vectors. (A) Beta distribution parameter values are computed for each cluster. (B) Parameter values are computed for a normal-Wishart distribution for each cluster. (C) Each responsibility parameter vector is updated using the beta distribution parameter values, the parameter values, and a respective observation vector. (D) A convergence parameter value is computed. (E) (A) to (D) are repeated until the computed convergence parameter value indicates the responsibility parameter vector defined for each observation vector of the plurality of unclassified observation vectors is converged.
    Type: Grant
    Filed: May 21, 2020
    Date of Patent: December 15, 2020
    Assignee: SAS Institute Inc.
    Inventors: Yingjian Wang, Xu Chen
  • Patent number: 10867380
    Abstract: A computing system obtains image data capturing first and second objects. The system determines, based on user-identified data points, boundaries of the objects and generates a component of a dataset by computing a first data value related to an attribute of a key point in the first image; and computing a second data value related to an attribute of a key point in the first image. The system generates a second component of the dataset, the second component representing updated relative information between the first and second object by generating predicted changes in the first data value and second data value for the second image. The system computes a third data value and a fourth data value related to respective data points in a first and second polygon in the second image. The generating the updated relative information is based on the predicted changes and computed values.
    Type: Grant
    Filed: April 30, 2020
    Date of Patent: December 15, 2020
    Assignee: SAS Institute Inc.
    Inventors: Sharmin Pathan, Hamza Mustafa Ghadyali, Xunlei Wu, Ivan Borges Oliveira
  • Publication number: 20200387747
    Abstract: Physical-device anomalies and degradation can be mitigated by implementing some aspects described herein. For example, a system can determine a first data window and a second data window by applying a window function to streaming data. The system can determine a first principal eigenvector of the first data window and a first principal eigenvector of the second data window. The system can determine an angle change between the first principal eigenvectors of the two data windows. The system can then detect an anomaly based on determining that the angle change exceeds a predefined angle-change threshold. Additionally or alternatively, the system may compare the first principal eigenvector for the second data window to a baseline value to determine an absolute angle associated with the second data window. The system can then detect a degradation based on determining that the absolute angle exceeds a predefined absolute-angle threshold.
    Type: Application
    Filed: May 5, 2020
    Publication date: December 10, 2020
    Applicant: SAS Institute Inc.
    Inventors: Kyungduck Cha, Carol Wagih Sadek, Zohreh Asgharzadeh Talebi
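The angle test in this abstract can be sketched in plain Python as follows. This is a simplified illustration under several assumptions: the two windows are passed in directly rather than produced by a window function over streaming data, the principal eigenvector is approximated by power iteration on the window's covariance matrix, and the thresholds (20 and 30 degrees) and the names `principal_eigenvector`, `angle_between`, and `detect` are invented for the example.

```python
import math


def principal_eigenvector(window, iters=200):
    """Approximate the principal eigenvector of a window's covariance matrix."""
    n, d = len(window), len(window[0])
    means = [sum(row[j] for row in window) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in window]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    v = [1.0] + [0.5] * (d - 1)  # arbitrary starting direction
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):  # power iteration
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate window with no variance
            break
        v = [x / norm for x in w]
    return v


def angle_between(u, v):
    """Angle in degrees between two unit vectors, ignoring their sign."""
    dot = abs(sum(a * b for a, b in zip(u, v)))  # an eigenvector's sign is arbitrary
    return math.degrees(math.acos(min(1.0, dot)))


def detect(first, second, baseline, change_limit=20.0, absolute_limit=30.0):
    """Flag an anomaly on angle change and a degradation on absolute angle."""
    e1 = principal_eigenvector(first)
    e2 = principal_eigenvector(second)
    return {
        "anomaly": angle_between(e1, e2) > change_limit,
        "degradation": angle_between(e2, baseline) > absolute_limit,
    }
```

Taking the absolute value of the dot product matters here: an eigenvector and its negation describe the same direction, so without it a sign flip between windows would register as a 180-degree change.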
  • Patent number: 10860809
    Abstract: A computing system receives a collection comprising multiple sets of ordered terms, including a first set. The system generates a dataset indicating an association between each pair of terms within a same set of the collection by generating co-occurrence score(s) for the first set. The system generates computed probabilities based on the co-occurrence score(s) for the first set. The computed probabilities indicate a likelihood that one term in a given pair of terms of the collection appears in a given set of the collection given that another term in the given pair of terms of the collection occurs. The system smoothes the computed probabilities by adding one or more random observations. The system generates one or more association indications for the first set based on the smoothed computed probabilities. The system outputs an indication of the dataset. Additionally, or alternatively, based on association measure(s), the system generates a virtual term.
    Type: Grant
    Filed: April 2, 2020
    Date of Patent: December 8, 2020
    Assignee: SAS Institute Inc.
    Inventors: James Allen Cox, Russell Albright, Saratendu Sethi
  • Patent number: 10841326
    Abstract: An authentication packet including a user identifier is received. The user identifier identifies a user of a second computing device being monitored by the first computing device. Authentication data is parsed from the authentication packet. A peer group identifier is determined that identifies a peer group to which the user is assigned. Members of the peer group are identified based on an expected network activity behavior. The authentication data and the peer group identifier are buffered into a first event block object and into a second event block object. The first event block object is sent to a first source window of an event stream processing engine (ESPE) that processes a netflow packet. The second event block object is sent to a second source window of the ESPE that processes the authentication packet. The first source window and the second source window are different source windows of the ESPE.
    Type: Grant
    Filed: October 8, 2019
    Date of Patent: November 17, 2020
    Assignee: SAS Institute Inc.
    Inventors: Bryan C. Harris, Glen R. Goodwin, Sean Riley Dyer, Alexius Kofi Ameyaw Boakye, Jr., Christopher Francis Smith, Pankaj Ramesh Telang, Damian Tane Herrick
  • Patent number: 10832174
    Abstract: Data is classified using automatically selected hyperparameter values. (A) A first loss value is determined based on a converged classification matrix. (B) Each observation vector is assigned to a cluster using a clustering algorithm based on the converged classification matrix. (C) A predefined number of observation vectors is selected from each cluster. (D) Classified observation vectors and unclassified observation vectors are updated based on the selections in (C) and (A) is repeated. (E) An entropy loss value is determined, wherein (A) to (E) are repeated for a plurality of different values of a kernel parameter value and a batch size value. (F) A second loss value is determined based on the converged classification matrix, a label matrix defined from the converged classification matrix, and a weight value. (G) (A) to (F) are repeated with a plurality of different values of the weight value until convergence is satisfied.
    Type: Grant
    Filed: March 12, 2020
    Date of Patent: November 10, 2020
    Assignee: SAS Institute Inc.
    Inventors: Xu Chen, Brett Alan Wujek
  • Patent number: 10824694
    Abstract: A computing system defines transformed variable values for training a machine learning model. A data description is determined for each variable of a plurality of variables from observation vectors. A number of rare-levels is determined for any variable of the plurality of variables that has a nominal variable type. Bins that describe a cumulative distribution function are defined for each variable based on the data description determined for each variable and based on the number of rare-levels determined for any variable of the plurality of variables identified as the nominal variable type. A transformed value is determined for each variable and for each observation vector of the observation vectors using the bins defined for a respective variable of the plurality of variables. Each determined transformed value is written to a transformed dataset with a respective observation vector of the observation vectors.
    Type: Grant
    Filed: April 7, 2020
    Date of Patent: November 3, 2020
    Assignee: SAS Institute Inc.
    Inventors: Biruk Gebremariam, Mark Traccarella
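One way to picture the bin-based transformation in this abstract is the Python sketch below, which builds roughly equal-count bins for a single numeric variable and maps each raw value to the cumulative fraction of its bin. It is a loose illustration under stated assumptions: the abstract does not say the bins are equal-count, the rare-level handling for nominal variables is left out, and `cdf_bin_edges` and `transform` are hypothetical names.

```python
import bisect


def cdf_bin_edges(values, n_bins):
    """Upper edges of roughly equal-count bins over the observed values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[max(0, i * n // n_bins - 1)] for i in range(1, n_bins)]


def transform(value, edges):
    """Map a raw value to the cumulative fraction of the bin it falls in."""
    return (bisect.bisect_left(edges, value) + 1) / (len(edges) + 1)
```

With the eight values 1 through 8 and four bins, the edges are 2, 4, and 6, so values 1 and 2 map to 0.25, values 3 and 4 to 0.5, and anything above 6 to 1.0.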
  • Patent number: 10803214
    Abstract: A computing device receives a request for a design of an experiment. The design comprises test cases with test conditions for testing factors for the experiment. The device receives a value for a parameter of multiple parameters for the design. The multiple parameters indicate a total number of test cases for the design, a total number of factors for the design, and a total number of groups for grouping factors. The device generates a value for each of one or more other parameters of the multiple parameters such that the design is a supersaturated design. The device generates, based on the first value for the first parameter and the value for each of the one or more other parameters, the supersaturated design for the experiment that is a design that distributes each of the factors into one of the groups. The device outputs an indication of the supersaturated design.
    Type: Grant
    Filed: December 18, 2019
    Date of Patent: October 13, 2020
    Assignee: SAS Institute Inc.
    Inventors: Bradley Allen Jones, Ryan Adam Lekivetz, Joseph Albert Morgan, Caleb Bridges King
  • Publication number: 20200293360
    Abstract: Techniques to manage virtual classes for statistical tests are described. An apparatus may comprise a simulated data component to generate simulated data for a statistical test, statistics of the statistical test based on parameter vectors to follow a probability distribution, a statistic simulator component to simulate statistics for the parameter vectors from the simulated data with a distributed computing system comprising multiple nodes each having one or more processors capable of executing multiple threads, the simulation to occur by distribution of portions of the simulated data across the multiple nodes of the distributed computing system, and a distributed control engine to control task execution on the distributed portions of the simulated data on each node of the distributed computing system with a virtual software class arranged to coordinate task and sub-task operations across the nodes of the distributed computing system. Other embodiments are described and claimed.
    Type: Application
    Filed: March 31, 2020
    Publication date: September 17, 2020
    Applicant: SAS Institute Inc.
    Inventors: Xilong Chen, Mark Roland Little
  • Patent number: 10769528
    Abstract: A computer trains a neural network model. (B) A neural network is executed to compute a post-iteration gradient vector and a current iteration weight vector. (C) A search direction vector is computed using a Hessian approximation matrix and the post-iteration gradient vector. (D) A step size value is initialized. (E) An objective function value is computed that indicates an error measure of the executed neural network. (F) When the computed objective function value is greater than an upper bound value, the step size value is updated using a predefined backtracking factor value. The upper bound value is computed as a sliding average of a predefined upper bound updating interval value number of previous upper bound values. (G) (E) and (F) are repeated until the computed objective function value is not greater than the upper bound value. (H) An updated weight vector is computed to describe a trained neural network model.
    Type: Grant
    Filed: October 2, 2019
    Date of Patent: September 8, 2020
    Assignee: SAS Institute Inc.
    Inventors: Ben-hao Wang, Joshua David Griffin, Seyedalireza Yektamaram, Yan Xu
  • Patent number: 10754764
    Abstract: A computing device receives data comprising inputs representing a respective option for each of factors in each of test cases. The data comprises a response of the system for each of the test cases. The computing device receives a request requesting an evaluation of the data for generating a model (e.g., a machine learning algorithm) to predict responses based on the factors. The computing device obtains different group identifiers for each of groups for distributing the test cases for the system (e.g., groups of a K-fold cross-validation). The computing device for each of validation(s): generates a data set comprising a respective data element for each of the test cases of the plurality of test cases; and controls assignment of a group identifier of the different group identifiers to each of the respective data elements. The computing device outputs an indication of one or more generated data sets for the validation(s).
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: August 25, 2020
    Assignee: SAS Institute Inc.
    Inventors: Ryan Adam Lekivetz, Joseph Albert Morgan, Bradley Allen Jones, Russell Dean Wolfinger
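The group-identifier assignment in this abstract resembles fold assignment in K-fold cross-validation, which can be sketched in a few lines of Python. The balancing rule (group sizes differ by at most one) and the seeded shuffle are assumptions for the example, and `assign_folds` is a hypothetical name; the abstract says only that assignment is controlled, not how.

```python
import random


def assign_folds(n_cases, k, seed=0):
    """Assign each test case one of k group identifiers with balanced group sizes."""
    ids = [i % k for i in range(n_cases)]  # sizes differ by at most one
    random.Random(seed).shuffle(ids)       # randomize which case lands in which group
    return ids


def validation_sets(n_cases, k, n_validations):
    """One assignment per validation, each with a different seed."""
    return [assign_folds(n_cases, k, seed=s) for s in range(n_validations)]
```

Calling `validation_sets(10, 3, 2)` yields two lists of ten group identifiers, one list per validation, each with groups of size 4, 3, and 3.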
  • Patent number: 10699207
    Abstract: A computing device computes a weight matrix to compute a predicted value. For each of a plurality of related tasks, an augmented observation matrix, a plug-in autocovariance matrix, and a plug-in covariance vector are computed. A weight matrix used to predict the characteristic for each of a plurality of variables and each of a plurality of related tasks is computed. (a) and (b) are repeated with the computed updated weight matrix as the computed weight matrix until a convergence criterion is satisfied: (a) a gradient descent matrix is computed using the computed plug-in autocovariance matrix, the computed plug-in covariance vector, the computed weight matrix, and a predefined relationship matrix, wherein the predefined relationship matrix defines a relationship between the plurality of related tasks, and (b) an updated weight matrix is computed using the computed gradient descent matrix.
    Type: Grant
    Filed: October 9, 2019
    Date of Patent: June 30, 2020
    Assignee: SAS Institute Inc.
    Inventors: Xin Jiang Hunt, Jorge Manuel Gomes da Silva, Ilknur Kaynar Kabul
  • Patent number: 10699081
    Abstract: A human language analyzer receives, at the human language analyzer, text data representing information in a human language. The human language analyzer receives a computer command for identifying a text data component of the text data. The computer command comprises at least two requirements for the text data component. The human language analyzer, responsive to identifying that the first requirement and the second requirement are met, locates the text data component from one of two clauses. A clause analyzer receives a clause request to locate clauses within text data representing information in a human language. The clause analyzer receives, responsive to a dependency request, token information in a token data set. The clause analyzer determines a location for each clause of the sentence portion in a hierarchy of clauses. The clause analyzer generates and outputs a new data set based on the token data set and the hierarchy of clauses.
    Type: Grant
    Filed: October 17, 2019
    Date of Patent: June 30, 2020
    Assignee: SAS Institute Inc.
    Inventors: Teresa S. Jade, Wei-shan Chiang, Aaron Douglas Arthur, Seng Lee, Qin Yang, Xu Yang
  • Patent number: 10664555
    Abstract: A computing device provides distributed estimation of an empirical distribution function. A boundary cumulative distribution function (CDF) value is defined at a start of each region of a plurality of regions. An accuracy value is defined for each region. (a) First equal proportion bins are computed for a first sample of a first marginal variable using the defined boundary CDF value for each region. (b) Second equal proportion bins are computed for the first sample of the first marginal variable within each region based on the defined accuracy value for each region. (c) The computed second equal proportion bins are added as an empirical distribution function (EDF) for the first marginal variable. (d) (a) to (c) are repeated for each remaining sample of the first marginal variable. (e) (a) to (d) are repeated with each remaining marginal variable of a plurality of marginal variables as the first marginal variable.
    Type: Grant
    Filed: June 6, 2019
    Date of Patent: May 26, 2020
    Assignee: SAS Institute Inc.
    Inventor: Mahesh V. Joshi