Patents Assigned to SAS Institute
  • Patent number: 11010691
    Abstract: Data is classified using semi-supervised data. A decomposition is performed to define a first decomposition matrix that includes first eigenvectors of a weight matrix, a second decomposition matrix that includes second eigenvectors of a transpose of the weight matrix, and a diagonal matrix that includes eigenvalues of the first eigenvectors. Eigenvectors are selected from the first eigenvectors to define a reduced decomposition matrix. A linear transformation matrix is computed as a function of the first decomposition matrix, the reduced decomposition matrix, the diagonal matrix, and a penalty matrix. When a rank of the linear transformation matrix is less than a number of rows of the penalty matrix, a classification matrix is computed by updating a gradient of a cost function. When the rank of the linear transformation matrix is equal to the number of rows of the penalty matrix, the classification matrix is computed using a dual formulation.
    Type: Grant
    Filed: November 10, 2020
    Date of Patent: May 18, 2021
    Assignee: SAS Institute Inc.
    Inventors: Xu Chen, Jorge Manuel Gomes da Silva, Brett Alan Wujek
  • Patent number: 11003733
    Abstract: A computing device computes a plurality of quantile regression solvers for a dataset at a plurality of quantile levels. Each observation vector of the dataset includes an explanatory vector of a plurality of explanatory variable values and a response variable value. The dataset is read and recursively divided into subsets of the plurality of observation vectors, a lower counterweight vector and an upper counterweight vector are computed for each of the subsets, and a quantile regression solver is fit to each of the subsets using the associated, computed lower counterweight vector and the associated, computed upper counterweight vector to describe a quantile function of the response variable values for a selected quantile level of the identified plurality of quantile level values. For each selected quantile level, a parameter estimate vector and a dual solution vector that describe the quantile function are output in association with the selected quantile level.
    Type: Grant
    Filed: December 21, 2017
    Date of Patent: May 11, 2021
    Assignee: SAS Institute Inc.
    Inventor: Yonggang Yao
  • Patent number: 10984075
    Abstract: A computer transforms high-dimensional data into low-dimensional data. A distance is computed between a selected observation vector and each observation vector of a plurality of observation vectors, nearest neighbors are selected using the computed distances, and a first sigmoid function is applied to compute a distance similarity value between the selected observation vector and each of the selected nearest neighbors, where each of the computed distance similarity values is added to a first matrix. The process is repeated with each observation vector of the plurality of observation vectors as the selected observation vector. An optimization method is executed with an initial matrix, the first matrix, and a gradient of a second sigmoid function that computes a second distance similarity value between the selected observation vector and each of the nearest neighbors to transform each observation vector of the plurality of observation vectors into the low-dimensional space.
    Type: Grant
    Filed: October 13, 2020
    Date of Patent: April 20, 2021
    Assignee: SAS Institute Inc.
    Inventors: Yu Liang, Arin Chaudhuri, Haoyu Wang
  • Publication number: 20210110527
    Abstract: Embodiments are generally directed to techniques for extracting contextually structured data from document images, such as by automatically identifying document layout, document data, and/or document metadata in a document image, for instance. Many embodiments are particularly directed to generating and utilizing a document template database for automatically extracting document image contents into a contextually structured format. For example, the document template database may include a plurality of templates for identifying/explaining key data elements in various document image formats that can be used to extract contextually structured data from incoming document images with a matching document image format. Several embodiments are particularly directed to automatically identifying and associating document metadata with corresponding document data in a document image, such as for generating a machine-facilitated annotation of the document image.
    Type: Application
    Filed: October 29, 2020
    Publication date: April 15, 2021
    Applicant: SAS Institute Inc.
    Inventors: David James Wheaton, William Robert Nadolski, Heather Michelle GoodyKoontz
  • Publication number: 20210110289
    Abstract: The computing device receives a first user input request to modify a structural equation model (SEM) in a graphical user interface. The modification of the SEM includes modifying one or more SEM path diagram elements. The computing device detects whether a first SEM path diagram element is modified responsive to the received first user input request. Based on the detection, the computing device determines, prior to initiating execution of the SEM, whether the modification violates a first set of SEM rules, a second set of SEM rules, or one or more launch conditions. Depending on whether a violation of the SEM rules or the launch conditions was found, the computing device displays a graphical indicator that identifies the SEM modification as a fatal error, a warning error, or a valid modification.
    Type: Application
    Filed: October 13, 2020
    Publication date: April 15, 2021
    Applicant: SAS Institute Inc.
    Inventors: Laura Castro-Schilo, James Robert Koepfler, Christopher Michael Gotwalt
  • Patent number: 10978053
    Abstract: A system determines user intent from a received conversation element. A plurality of distinct intent labels are generated for the received conversation element. The generated plurality of distinct intent labels are divided into a plurality of interpretation partitions with overlapping semantic content. For each interpretation partition of the plurality of interpretation partitions, a set of maximal coherent subgroups is defined that do not disagree on labels for terms in each subgroup, a score is computed for each maximal coherent subgroup of the defined set of maximal coherent subgroups, and a maximal coherent subgroup is selected from the set of maximal coherent subgroups based on the computed score. Intent labels are aggregated from the selected maximal coherent subgroup of each interpretation partition of the plurality of interpretation partitions to define a multiple intent interpretation of the received conversation element.
    Type: Grant
    Filed: October 13, 2020
    Date of Patent: April 13, 2021
    Assignee: SAS Institute Inc.
    Inventors: Jared Michael Dean Smythe, Richard Welland Crowell
  • Patent number: 10970651
    Abstract: Graphical interactive model selection is provided. A dataset includes observation vectors defined for each value of a plurality of values of a group variable. A nonlinear model is trained with each plurality of observation vectors to describe the response variable based on the explanatory variable for each value of the plurality of values of the group variable. Nonlinear model results are presented within a first sub-window of a first window. An indicator of a request to perform parameter analysis of the nonlinear model results is received. A linear model is trained. Trained linear model results from the trained linear model are presented within a second sub-window of the first window for each parameter variable of the nonlinear model. Predicted response variable values are presented as a function of the explanatory variable and the factor variable value using the trained nonlinear model within a third sub-window of the first window.
    Type: Grant
    Filed: June 18, 2020
    Date of Patent: April 6, 2021
    Assignee: SAS Institute Inc.
    Inventors: Clayton Adam Barker, Ryan Jeremy Parker, Christopher Michael Gotwalt
  • Patent number: 10965706
    Abstract: A computing device determines a peer group identifier and supplements netflow records with the peer group identifier. An authentication event block object is received that was sent to a first source window. The authentication event block object includes a user identifier, an IP address, and a peer group identifier. Members of the peer group are identified based on an expected network activity behavior. The user identifier and the peer group identifier are stored in association with the IP address in a cache. A netflow event block object sent to the first source window is received that includes a netflow packet IP address. Netflow data is parsed from the netflow event block object into a netflow record. When the stored IP address matches the netflow packet IP address, the netflow record is supplemented with the user identifier and the peer group identifier. The supplemented netflow record is output to summary data.
    Type: Grant
    Filed: September 30, 2020
    Date of Patent: March 30, 2021
    Assignee: SAS Institute Inc.
    Inventors: Bryan C. Harris, Alexius Kofi Ameyaw Boakye, Jr., Sean Riley Dyer, Christopher Francis Smith
  • Patent number: 10963788
    Abstract: Graphical interactive model selection is provided. A basis function is fit to each plurality of observation vectors defined for each value of a group variable. Basis results are presented within a first sub-window of a first window of a display. Functional principal component analysis (FPCA) is automatically performed on each basis function. FPCA results are presented within a second sub-window of the first window. An indicator of a request to perform functional analysis using the FPCA results based on a predefined factor variable is received in association with the first window. A model is trained using an eigenvalue and an eigenfunction computed as a result of the FPCA for each plurality of observation vectors using the factor variable value as a model effect. Trained model results are presented within a third sub-window of the first window of the display.
    Type: Grant
    Filed: July 2, 2020
    Date of Patent: March 30, 2021
    Assignee: SAS Institute Inc.
    Inventors: Ryan Jeremy Parker, Clayton Adam Barker, Christopher Michael Gotwalt
  • Patent number: 10963804
    Abstract: Graphical interactive prediction evaluation is provided. An extrapolation threshold value is computed using an extrapolation threshold function with an explanatory variable value of each of a plurality of explanatory variables read for each observation vector of a plurality of observation vectors. A model is fit to the observation vectors. Model results are presented in a display that include a first value for each explanatory variable. An indicator of a second value of at least one of the explanatory variables that is different from the first value is received. An extrapolation value is computed using an extrapolation function with the second value and the first value of others of the explanatory variables. The extrapolation value is compared to the extrapolation threshold value. An extrapolation indicator is presented in the display when the comparison indicates that the second value is an extrapolation.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: March 30, 2021
    Assignee: SAS Institute Inc.
    Inventors: Jeremy Ryan Ash, Christopher Michael Gotwalt, Laura Carmen Lancaster
  • Patent number: 10963802
    Abstract: A computing device selects decision variable values. A lower boundary value and an upper boundary value are defined for a decision variable. (A) A plurality of decision variable configurations is determined using a search method, where the value of the decision variable in each configuration is between the lower boundary value and the upper boundary value. (B) A decision variable configuration is selected. (C) A model of the model type is trained using the decision variable configuration. (D) The model is scored to compute an objective function value. (E) The computed objective function value and the selected decision variable configuration are stored. (F) Steps (B) through (E) are repeated for a plurality of decision variable configurations. (G) The lower boundary value and the upper boundary value are updated using the stored objective function value and decision variable configuration. Steps (A) through (F) are then repeated with the lower boundary value and the upper boundary value updated in (G).
    Type: Grant
    Filed: December 14, 2020
    Date of Patent: March 30, 2021
    Assignee: SAS Institute Inc.
    Inventors: Steven Joseph Gardner, Joshua David Griffin, Yan Xu, Yan Gao
  • Patent number: 10956825
    Abstract: Data is classified using semi-supervised data. A weight matrix is computed using a kernel function applied to observation vectors. A decomposition of the computed weight matrix is performed. A predefined number of eigenvectors is selected from the decomposed weight matrix to define a decomposition matrix. (A) A gradient value is computed as a function of the defined decomposition matrix, sparse coefficients, and a label vector. (B) A value of each coefficient of the sparse coefficients is updated based on the gradient value. (A) and (B) are repeated until a convergence parameter value indicates the sparse coefficients have converged. A classification matrix is defined using the converged sparse coefficients. Based on the defined classification matrix, a target variable value is determined for each unclassified observation vector to represent its label, is used to update the label vector, and is output.
    Type: Grant
    Filed: June 18, 2020
    Date of Patent: March 23, 2021
    Assignee: SAS Institute Inc.
    Inventors: Xu Chen, Jorge Manuel Gomes da Silva, Brett Alan Wujek
  • Patent number: 10956835
    Abstract: A computing device compresses a gradient boosting tree predictive model. A gradient boosting tree predictive model is trained using a plurality of observation vectors. Each observation vector includes an explanatory variable value of an explanatory variable and a response variable value for a response variable. The gradient boosting tree predictive model is trained to predict the response variable value of each observation vector based on a respective explanatory variable value of each observation vector. The trained gradient boosting tree predictive model is compressed using a compression model with a predefined penalty constant value and with a predefined array of coefficients to reduce a number of trees of the trained gradient boosting tree predictive model. The compression model minimizes a sparsity norm loss function. The compressed, trained gradient boosting tree predictive model is output for predicting a new response variable value from a new observation vector.
    Type: Grant
    Filed: March 11, 2019
    Date of Patent: March 23, 2021
    Assignee: SAS Institute Inc.
    Inventors: Rui Shi, Guixian Lin, Xiangqian Hu, Yan Xu
  • Publication number: 20210073023
    Abstract: Techniques to manage virtual classes for statistical tests are described. An apparatus may comprise a simulated data component to generate simulated data for a statistical test, statistics of the statistical test based on parameter vectors to follow a probability distribution, a statistic simulator component to simulate statistics for the parameter vectors from the simulated data with a distributed computing system comprising multiple nodes each having one or more processors capable of executing multiple threads, the simulation to occur by distribution of portions of the simulated data across the multiple nodes of the distributed computing system, and a distributed control engine to control task execution on the distributed portions of the simulated data on each node of the distributed computing system with a virtual software class arranged to coordinate task and sub-task operations across the nodes of the distributed computing system. Other embodiments are described and claimed.
    Type: Application
    Filed: November 19, 2020
    Publication date: March 11, 2021
    Applicant: SAS Institute Inc.
    Inventors: Xilong Chen, Mark Roland Little
  • Patent number: 10929762
    Abstract: Data is classified using corrected semi-supervised data. Cluster centers are defined for unclassified observations. A class is determined for each cluster. A distance value is computed between a classified observation and each cluster center. When the class of the classified observation is not the class determined for the cluster center having a minimum distance, a first distance value is selected as the minimum distance, a second distance value is selected as the distance value computed to the cluster center having the class of the classified observation, a ratio value is computed between the second distance value and the first distance value, and the class of the classified observation is changed to the class determined for the cluster center having the minimum distance value when the computed ratio value satisfies a label correction threshold. A classification matrix is defined using corrected observations to determine the class for the unclassified observations.
    Type: Grant
    Filed: July 28, 2020
    Date of Patent: February 23, 2021
    Assignee: SAS Institute Inc.
    Inventors: Xu Chen, Brett Alan Wujek
  • Publication number: 20210042659
    Abstract: Computer-based models can be developed, deployed, and managed in an automated manner. For example, a model-building tool can be selected based on the model-building tool being compatible with one or more parameters. A first machine-learning model can be generated using the model-building tool and trained using a training dataset. The first machine-learning model can then be used to perform a task. Thereafter, a new model-building tool can be selected based on the new model-building tool being compatible with the one or more parameters. A second machine-learning model can be generated using the new model-building tool and trained using the training dataset. The accuracy of the first machine-learning model can be compared to the accuracy of the second machine-learning model. If the second machine-learning model is more accurate, it can be used to perform the task instead of the first machine-learning model.
    Type: Application
    Filed: October 23, 2020
    Publication date: February 11, 2021
    Applicant: SAS Institute Inc.
    Inventors: Chengwen Robert Chu, Wenjie Bao, Glen Joseph Clingroth
  • Publication number: 20210042265
    Abstract: An apparatus includes a processor to: instantiate collection threads, data buffers of a queue, and aggregation threads; within each collection thread, assemble a row group from a subset of the multiple rows, reorganize the data values from row-wise to columnar organization, and store the row group within a data buffer of the queue; operate the queue as a first-in, first-out (FIFO) buffer; within each aggregation thread, retrieve multiple row groups from multiple data buffers of the queue, assemble a data set part from the multiple row groups, and transmit the data set part to storage device(s) via a network; and in response to each instance of retrieval of a row group from a data buffer of the queue for use within an aggregation thread, analyze the availability of at least storage space within the node device to determine whether to dynamically adjust the quantity of data buffers of the queue.
    Type: Application
    Filed: September 29, 2020
    Publication date: February 11, 2021
    Applicant: SAS Institute Inc.
    Inventor: Brian Payton Bowman
  • Publication number: 20210026805
    Abstract: An apparatus includes a processor to: instantiate data buffers of a queue, reading threads, and provision threads; within each reading thread, use an identifier provided in a data buffer of the queue to retrieve the corresponding data set part and part metadata from storage device(s), and store both within the data buffer; operate the queue as a first-in, first-out (FIFO) buffer; within each provision thread, retrieve a row group from among multiple row groups and corresponding metadata from within the data buffer, use information in the metadata to decompress at least one column, and provide the data values of the row group to the requesting device or an application routine; and in response to each instance of storage of a data set part within a data buffer of the queue, analyze the availability of storage space and/or of processing resources to determine whether to dynamically adjust the quantity of reading threads.
    Type: Application
    Filed: September 30, 2020
    Publication date: January 28, 2021
    Applicant: SAS Institute Inc.
    Inventors: Brian Payton Bowman, Gordon Lyle Keener
  • Patent number: D915458
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: April 6, 2021
    Assignee: SAS Institute Inc.
    Inventors: Michael Ryan Chipley, Steven Todd Barlow
  • Patent number: D915459
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: April 6, 2021
    Assignee: SAS Institute Inc.
    Inventors: Michael Ryan Chipley, Steven Todd Barlow
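Quantile regression solvers such as the ones in patent 11003733 are conventionally fit by minimizing the check (pinball) loss. The minimal sketch below shows only that standard objective for an intercept-only model at several quantile levels; it does not reproduce the patent's recursive subdivision or its lower and upper counterweight vectors, and the function names are illustrative.

```python
from typing import Sequence


def check_loss(residual: float, tau: float) -> float:
    """Check (pinball) loss rho_tau(r) = r * (tau - 1[r < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_constant_quantile(values: Sequence[float], tau: float) -> float:
    """Intercept-only quantile fit by brute force over the observed values.

    The total check loss is piecewise linear and convex in the constant,
    with kinks at the data points, so some observed value is a minimizer.
    """
    return min(values, key=lambda c: sum(check_loss(y - c, tau) for y in values))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    for tau in (0.1, 0.5, 0.9):
        print(tau, fit_constant_quantile(data, tau))
```

At `tau = 0.5` the fit is the median; lower and upper levels move toward the tails of the data.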
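The first phase described for patent 10984075 (select the nearest neighbors of each observation, then store a sigmoid of each distance in a sparse first matrix) can be sketched as follows. The sigmoid form and its `scale` and `shift` parameters are assumptions, since the abstract does not give the function, and the optimization step with the second sigmoid is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid_similarity(distance: float, scale: float = 1.0, shift: float = 0.0) -> float:
    """Hypothetical decreasing sigmoid 1 / (1 + exp(scale * (distance - shift))).

    Written in a numerically stable form so large distances do not overflow.
    """
    z = scale * (distance - shift)
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def knn_similarity_matrix(
    points: List[Sequence[float]], k: int
) -> Dict[Tuple[int, int], float]:
    """Build a sparse similarity matrix over each point's k nearest neighbors."""
    matrix: Dict[Tuple[int, int], float] = {}
    for i, p in enumerate(points):
        # Distance from the selected observation to every other observation.
        distances = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        # Keep only the k nearest neighbors and store their similarities.
        for d, j in sorted(distances)[:k]:
            matrix[(i, j)] = sigmoid_similarity(d)
    return matrix


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (3.0,), (10.0,)]
    for key, value in sorted(knn_similarity_matrix(pts, k=1).items()):
        print(key, round(value, 3))
```

A dictionary keyed by `(i, j)` keeps the matrix sparse, since only k entries per row are ever stored.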
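The subgroup selection in patent 10978053 can be illustrated with an exhaustive sketch, assuming each intent label is a hypothetical `IntentLabel` record that maps terms to labels and that a subgroup's score is the sum of its labels' confidences. The partitioning step is taken as already done, and enumerating every subset is exponential, so this only suits a handful of labels per partition.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List


@dataclass
class IntentLabel:
    name: str
    terms: Dict[str, str]  # term -> label this interpretation assigns to it
    confidence: float      # hypothetical per-label score


def is_coherent(labels: List[IntentLabel]) -> bool:
    """True when no two labels assign different labels to the same term."""
    assigned: Dict[str, str] = {}
    for label in labels:
        for term, value in label.terms.items():
            if assigned.setdefault(term, value) != value:
                return False
    return True


def maximal_coherent_subgroups(labels: List[IntentLabel]) -> List[FrozenSet[int]]:
    """Enumerate coherent index subsets and keep those not contained in another."""
    coherent = [
        frozenset(idx)
        for size in range(1, len(labels) + 1)
        for idx in combinations(range(len(labels)), size)
        if is_coherent([labels[i] for i in idx])
    ]
    return [s for s in coherent if not any(s < other for other in coherent)]


def interpret(partitions: List[List[IntentLabel]]) -> List[str]:
    """Pick the highest-scoring maximal subgroup per partition and aggregate."""
    intents: List[str] = []
    for labels in partitions:
        groups = maximal_coherent_subgroups(labels)
        best = max(groups, key=lambda g: sum(labels[i].confidence for i in g))
        intents.extend(labels[i].name for i in sorted(best))
    return intents


if __name__ == "__main__":
    travel = [
        IntentLabel("book_flight", {"flight": "book"}, 0.9),
        IntentLabel("cancel_flight", {"flight": "cancel"}, 0.4),
        IntentLabel("book_hotel", {"hotel": "book"}, 0.7),
    ]
    car = [IntentLabel("rent_car", {"car": "rent"}, 0.5)]
    print(interpret([travel, car]))
```

In the travel partition, `book_flight` and `cancel_flight` disagree on the term `flight`, so the two maximal subgroups are {book_flight, book_hotel} and {cancel_flight, book_hotel}, and the first scores higher.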
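The enrichment flow in patent 10965706 reduces to a cache lookup keyed by IP address. In this minimal sketch the event stream source window is replaced by plain method calls, the field names are hypothetical, and netflow records with no cached match are passed through unsupplemented, which is an assumption the abstract does not settle.

```python
from typing import Dict, List, Optional, Tuple


class NetflowEnricher:
    """Cache authentication events by IP and use them to supplement netflow records."""

    def __init__(self) -> None:
        self._cache: Dict[str, Tuple[str, str]] = {}  # ip -> (user_id, peer_group)
        self.summary: List[dict] = []

    def on_authentication(self, event: dict) -> None:
        # Store the user and peer group identifiers keyed by the IP address.
        self._cache[event["ip"]] = (event["user_id"], event["peer_group"])

    def on_netflow(self, event: dict) -> dict:
        # Parse the netflow event into a record (field names are hypothetical).
        record = {"ip": event["ip"], "bytes": event["bytes"]}
        cached: Optional[Tuple[str, str]] = self._cache.get(record["ip"])
        if cached is not None:
            # IP matched a cached authentication: supplement the record.
            record["user_id"], record["peer_group"] = cached
        self.summary.append(record)
        return record
```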
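The abstract of patent 10963804 does not specify the extrapolation functions, so the sketch below substitutes a simple hypothetical pair: the threshold is the largest leave-one-out nearest-neighbor distance among the training observations, and a candidate point counts as an extrapolation when its nearest-neighbor distance exceeds that threshold. The candidate is formed by changing one explanatory variable while the others keep their first values.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_distance(point: Sequence[float], others: Sequence[Sequence[float]]) -> float:
    return min(math.dist(point, q) for q in others)


def extrapolation_threshold(train: List[Point]) -> float:
    """Hypothetical threshold: largest leave-one-out nearest-neighbor distance."""
    return max(
        nearest_distance(p, train[:i] + train[i + 1:]) for i, p in enumerate(train)
    )


def with_changed_value(base: Point, index: int, value: float) -> Point:
    """Change one explanatory variable while holding the others at their first values."""
    return base[:index] + (value,) + base[index + 1:]


def is_extrapolation(candidate: Sequence[float], train: List[Point], threshold: float) -> bool:
    return nearest_distance(candidate, train) > threshold
```

The threshold needs at least two training observations, since each point's distance is measured to the remaining ones.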
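The loop (A) through (G) in patent 10963802 can be sketched as a bound-narrowing search. Here uniform random sampling stands in for the unspecified search method, a plain `objective` callable stands in for training and scoring a model, and the shrink factor of 0.5 is an illustrative assumption.

```python
import random
from typing import Callable, List, Tuple

Bounds = List[Tuple[float, float]]
Config = Tuple[float, ...]


def narrowing_search(
    objective: Callable[[Config], float],
    bounds: Bounds,
    rounds: int = 10,
    samples_per_round: int = 20,
    shrink: float = 0.5,
    seed: int = 0,
) -> Tuple[float, Config]:
    rng = random.Random(seed)
    bounds = list(bounds)
    history: List[Tuple[float, Config]] = []
    for _ in range(rounds):
        # (A) Generate configurations inside the current bounds (random search).
        configs = [
            tuple(rng.uniform(lo, hi) for lo, hi in bounds)
            for _ in range(samples_per_round)
        ]
        # (B)-(F) Evaluate and store each configuration; `objective` stands in
        # for training a model and scoring it.
        for config in configs:
            history.append((objective(config), config))
        # (G) Shrink the bounds around the best configuration seen so far,
        # clipped so they never grow beyond the current bounds.
        _, best = min(history)
        bounds = [
            (max(lo, x - (hi - lo) * shrink / 2), min(hi, x + (hi - lo) * shrink / 2))
            for (lo, hi), x in zip(bounds, best)
        ]
    return min(history)
```

With the seed fixed, `narrowing_search(lambda c: (c[0] - 3.0) ** 2, [(0.0, 10.0)])` should return a configuration close to 3.0, because each round samples a window half as wide as the last, centered on the best point so far.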
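Assuming the sparsity norm loss in patent 10956835 is an L1 (lasso) penalty on one weight per tree, compression can be sketched as refitting those weights by cyclic coordinate descent on the trees' predictions; trees whose weight reaches zero can then be dropped. The function names and the choice of lasso are assumptions, not the patent's stated method.

```python
from typing import List, Sequence


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of lam * |w|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def compress_tree_weights(
    tree_preds: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    iterations: int = 100,
) -> List[float]:
    """Lasso reweighting of trees by cyclic coordinate descent.

    tree_preds[i][j] is the prediction of tree j for observation i. Minimizes
    (1/2n) * sum_i (y_i - sum_j w_j * T_ij)^2 + lam * sum_j |w_j|.
    Trees whose weight ends at exactly 0.0 can be dropped from the ensemble.
    """
    n, m = len(y), len(tree_preds[0])
    w = [0.0] * m
    residual = list(y)  # y - T w, with w = 0 initially
    col_sq = [sum(row[j] ** 2 for row in tree_preds) / n for j in range(m)]
    for _ in range(iterations):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue  # a constant-zero tree contributes nothing
            # Correlation of tree j with the residual that excludes tree j.
            rho = sum(row[j] * (r + row[j] * w[j]) for row, r in zip(tree_preds, residual)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - row[j] * delta for row, r in zip(tree_preds, residual)]
                w[j] = new_wj
    return w
```

A larger `lam` plays the role of the penalty constant: it zeroes out more trees at the cost of a worse fit.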
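The label correction rule in patent 10929762 can be sketched once cluster centers and their classes are known. The code below assumes Euclidean distance, takes the distance to the nearest center of the observation's own class as the second distance, and treats "satisfies a label correction threshold" as the ratio being at least the threshold; all three are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Center = Tuple[Tuple[float, ...], str]  # (cluster center coordinates, cluster class)


def correct_label(
    point: Sequence[float], label: str, centers: List[Center], threshold: float
) -> str:
    """Return the (possibly corrected) class of a classified observation."""
    distances = [(math.dist(point, c), cls) for c, cls in centers]
    min_dist, nearest_class = min(distances)
    if nearest_class == label:
        return label  # already agrees with the closest cluster
    own = [d for d, cls in distances if cls == label]
    if not own:
        return label  # hypothetical choice: no cluster carries this class
    # Ratio of the distance to the observation's own-class center over the
    # minimum distance; a large ratio suggests a mislabeled observation.
    ratio = min(own) / min_dist if min_dist > 0 else math.inf
    return nearest_class if ratio >= threshold else label
```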
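Publication 20210042265 describes collection threads that reorganize row groups into columnar form, a FIFO queue of data buffers, and aggregation threads that assemble data set parts. The single-threaded sketch below shows those data movements with a hypothetical resizing policy based on the fraction of free storage; threading and network transmission are omitted, and all names are illustrative.

```python
from collections import deque
from typing import Any, Deque, List, Sequence

Columns = List[List[Any]]


def to_columnar(rows: Sequence[Sequence[Any]]) -> Columns:
    """Reorganize a row group from row-wise to columnar layout."""
    return [list(column) for column in zip(*rows)]


def assemble_part(row_groups: List[Columns]) -> Columns:
    """Concatenate several columnar row groups into one data set part."""
    part: Columns = [[] for _ in row_groups[0]]
    for group in row_groups:
        for column, values in zip(part, group):
            column.extend(values)
    return part


class RowGroupQueue:
    """FIFO queue of row groups whose buffer count can grow or shrink."""

    def __init__(self, buffers: int = 2, max_buffers: int = 8) -> None:
        self._queue: Deque[Columns] = deque()
        self.buffers = buffers
        self.max_buffers = max_buffers

    def put(self, group: Columns) -> bool:
        if len(self._queue) >= self.buffers:
            return False  # every buffer is occupied; the caller must wait
        self._queue.append(group)
        return True

    def get(self, free_storage_fraction: float) -> Columns:
        group = self._queue.popleft()  # first in, first out
        # Hypothetical policy: add a buffer when storage is plentiful,
        # remove one when it is scarce.
        if free_storage_fraction > 0.5 and self.buffers < self.max_buffers:
            self.buffers += 1
        elif free_storage_fraction < 0.1 and self.buffers > 1:
            self.buffers -= 1
        return group
```

Re-evaluating the buffer count on every retrieval mirrors the abstract's "in response to each instance of retrieval" trigger.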