Patents Assigned to SAS Institute
  • Publication number: 20200143253
    Abstract: An apparatus includes a processor to: provide a set of feature routines to a set of processor cores to detect features of a data set distributed thereamong; generate metadata indicative of the detected features; generate context data indicative of contextual aspects of the data set; provide the metadata and context data to each processor core, and distribute a set of suggestion models thereamong to enable derivation of a suggested subset of data preparation operations to be suggested to be performed on the data set; transmit indications of the suggested subset to a viewing device, and receive therefrom indications of a selected subset of data preparation operations selected to be performed; compare the selected and suggested subsets; and in response to differences therebetween, re-train at least one suggestion model of the set of suggestion models based at least on the combination of the metadata, context data and selected subset.
    Type: Application
    Filed: December 24, 2019
    Publication date: May 7, 2020
    Applicant: SAS Institute Inc.
    Inventors: Nancy Anne Rausch, Roger Jay Barney, John P. Trawinski
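The feedback step in this abstract compares the suggested and selected subsets of data preparation operations and re-trains on a mismatch. A minimal Python sketch of that comparison follows. The class names, the operation labels, and the idea of queueing a (metadata, context, selected subset) record for later re-training are illustrative assumptions, not the applicant's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackRecord:
    """One hypothetical re-training example: model inputs plus the user's choice."""
    metadata: dict
    context: dict
    selected: frozenset


@dataclass
class SuggestionFeedbackLoop:
    # Hypothetical buffer of examples queued for re-training a suggestion model.
    retrain_queue: list = field(default_factory=list)

    def review(self, metadata, context, suggested, selected):
        """Compare suggested vs. selected operations; queue a re-training example on mismatch."""
        suggested, selected = frozenset(suggested), frozenset(selected)
        rejected = suggested - selected  # suggested but not chosen by the user
        added = selected - suggested     # chosen by the user but never suggested
        if rejected or added:
            self.retrain_queue.append(FeedbackRecord(metadata, context, selected))
        return rejected, added
```

When the two subsets agree, nothing is queued, so the suggestion models are only re-trained on the cases where they disagreed with the user.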
  • Publication number: 20200143246
    Abstract: A pipeline system for time-series data forecasting using a distributed computing environment is disclosed herein. In one example, a pipeline for forecasting time series is generated. The pipeline represents a sequence of operations for processing the time series to produce modeling results such as forecasts of the time series. The pipeline includes a segmentation operation for categorizing the time series into multiple demand classes based on demand characteristics of the time series. The pipeline also includes multiple sub-pipelines corresponding to the multiple demand classes. Each of the sub-pipelines applies a model strategy to the time series in the corresponding demand class. The model strategy is selected from multiple candidate model strategies based on predetermined relationships between the demand classes and the candidate model strategies. The pipeline is executed to determine the modeling results for the time series.
    Type: Application
    Filed: December 24, 2019
    Publication date: May 7, 2020
    Applicant: SAS Institute Inc.
    Inventors: Yue Li, Michele Angelo Trovero, Phillip Mark Helmkamp, Jerzy Michal Brzezicki, Macklin Carter Frazier, Timothy Patrick Haley, Randy Thomas Solomonson, Sangmin Kim, Steven Christopher Mills, Yung-Hsin Chien, Ron Travis Hodgin, Jingrui Xie
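The abstract does not say which demand characteristics drive the segmentation. As an assumption, the Python sketch below uses the common ADI / CV² scheme (average inter-demand interval, squared coefficient of variation of non-zero demand, with the usual 1.32 and 0.49 cut-offs) and a hypothetical class-to-strategy table, then groups series into one sub-pipeline per demand class.

```python
import statistics

# Hypothetical mapping from demand class to a model strategy name.
STRATEGY_BY_CLASS = {
    "smooth": "exponential_smoothing",
    "erratic": "arima",
    "intermittent": "croston",
    "lumpy": "croston",
    "no_demand": "zero_forecast",
}


def demand_class(series, adi_cut=1.32, cv2_cut=0.49):
    """Classify a demand series with the ADI / CV^2 scheme (an assumed segmentation rule)."""
    nonzero = [x for x in series if x != 0]
    if not nonzero:
        return "no_demand"
    adi = len(series) / len(nonzero)  # average interval between non-zero demands
    mean = statistics.fmean(nonzero)
    cv2 = (statistics.pstdev(nonzero) / mean) ** 2 if mean else 0.0
    if adi < adi_cut:
        return "smooth" if cv2 < cv2_cut else "erratic"
    return "intermittent" if cv2 < cv2_cut else "lumpy"


def build_pipeline(series_by_id):
    """Segment each series, then collect series into one sub-pipeline per demand class."""
    sub_pipelines = {}
    for sid, series in series_by_id.items():
        cls = demand_class(series)
        sub = sub_pipelines.setdefault(cls, {"strategy": STRATEGY_BY_CLASS[cls], "series": []})
        sub["series"].append(sid)
    return sub_pipelines
```

Because the strategy is looked up per class rather than fitted per series, adding a class only requires a new table entry.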
  • Patent number: 10642610
    Abstract: In some examples, computing devices can partition timestamped data into groups. The computing devices can then distribute the timestamped data based on the groups. The computing devices can also obtain copies of a script configured to process the timestamped data, such that each computing device receives a copy of the script. The computing devices can determine one or more code segments associated with the groups based on content of the script. The one or more code segments can be in one or more programming languages that are different than a programming language of the script. The computing devices can then run the copies of the script to process the timestamped data within the groups. This may involve interacting with one or more job servers configured to run the one or more code segments associated with the groups.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: May 5, 2020
    Assignee: SAS Institute Inc.
    Inventors: Michael James Leonard, Thiago Santos Quirino, Edward Tilden Blair, Jennifer Leigh Sloan Beeman, David Bruce Elsheimer, Javier Delgado
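The partition-and-distribute step can be sketched in Python as grouping records by a key, keeping each group time-ordered, and handing whole groups to workers so that every worker runs its own copy of one script. The record layout (a `ts` timestamp field), the round-robin assignment, and the `script` callback are assumptions; the job-server interaction for code in other languages is not modeled.

```python
from collections import defaultdict


def partition_by_group(records, key):
    """Group timestamped records by a group key; each group is sorted by timestamp."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return {k: sorted(v, key=lambda r: r["ts"]) for k, v in groups.items()}


def distribute(groups, n_workers):
    """Assign whole groups to workers round-robin so no group is split across workers."""
    assignment = [dict() for _ in range(n_workers)]
    for i, gkey in enumerate(sorted(groups)):
        assignment[i % n_workers][gkey] = groups[gkey]
    return assignment


def run_script_on_worker(worker_groups, script):
    """Each worker applies its own copy of the same script to every group it holds."""
    return {gkey: script(rows) for gkey, rows in worker_groups.items()}
```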
  • Patent number: 10635947
    Abstract: A computer trains a classification model. (A) An estimation vector is computed for each observation vector using a weight value, a mean vector, and a covariance matrix. The estimation vector includes a probability value for each class of a plurality of classes for each observation vector that indicates a likelihood that each observation vector is associated with each class. A subset of the plurality of observation vectors has a predefined class assignment. (B) The weight value is updated using the computed estimation vector. (C) The mean vector for each class is updated using the computed estimation vector. (D) The covariance matrix for each class is updated using the computed estimation vector. (E) A convergence parameter value is computed. (F) A classification model is trained by repeating (A) to (E) until the computed convergence parameter value indicates the mean vector for each class of the plurality of classes is converged.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: April 28, 2020
    Assignee: SAS Institute Inc.
    Inventors: Xu Chen, Yingjian Wang, Saratendu Sethi
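Steps (A) to (F) read like expectation-maximization for a Gaussian mixture in which some observations keep their predefined class. The Python sketch below is a one-dimensional simplification (a variance per class instead of a covariance matrix) under that reading; initializing each class mean from its labeled points assumes every class has at least one labeled observation.

```python
import math
import statistics


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def semi_supervised_em(xs, labels, n_classes, tol=1e-6, max_iter=200):
    """1-D Gaussian-mixture EM; labels[i] is a class index, or None if unlabeled."""
    means = [statistics.fmean([x for x, y in zip(xs, labels) if y == k]) for k in range(n_classes)]
    variances = [statistics.pvariance(xs) or 1.0] * n_classes
    weights = [1.0 / n_classes] * n_classes
    for _ in range(max_iter):
        # (A) estimation vectors: class probabilities; labeled rows stay one-hot.
        resp = []
        for x, y in zip(xs, labels):
            if y is not None:
                resp.append([1.0 if k == y else 0.0 for k in range(n_classes)])
                continue
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(n_classes)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # (B)-(D) update weights, means, and variances from the estimation vectors.
        nk = [sum(r[k] for r in resp) for k in range(n_classes)]
        weights = [n / len(xs) for n in nk]
        new_means = [sum(r[k] * x for r, x in zip(resp, xs)) / nk[k] for k in range(n_classes)]
        variances = [max(sum(r[k] * (x - new_means[k]) ** 2 for r, x in zip(resp, xs)) / nk[k], 1e-6)
                     for k in range(n_classes)]
        # (E)-(F) stop once the class means stop moving.
        shift = max(abs(a - b) for a, b in zip(new_means, means))
        means = new_means
        if shift < tol:
            break
    return weights, means, variances
```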
  • Patent number: 10628755
    Abstract: A computing system trains a clustering model. A responsibility parameter vector includes a probability value of a cluster membership in each cluster for each respective observation vector. (A) Parameter values for a normal-Wishart distribution are computed for each cluster using a mean value, an inverse precision parameter value, each observation vector, and each respective responsibility parameter vector. (B) The responsibility parameter vector is updated using a multivariate Student's t-distribution function with the computed parameter values for the normal-Wishart distribution and a respective observation vector of the observation vectors as input values. (C) A convergence parameter value is computed. (D) (A) to (C) are repeated until the computed convergence parameter value indicates the responsibility parameter vector defined for each observation vector is converged. A cluster membership is determined for each observation vector using a respective, updated responsibility parameter vector.
    Type: Grant
    Filed: September 10, 2019
    Date of Patent: April 21, 2020
    Assignee: SAS Institute Inc.
    Inventor: Yingjian Wang
  • Patent number: 10600005
    Abstract: A computing device selects a feature set and hyperparameters for a machine learning model to predict a value for a characteristic in a scoring dataset. A number of training model iterations is determined. A unique evaluation pair is selected for each iteration that indicates a feature set selected from feature sets and a hyperparameter configuration selected from hyperparameter configurations. A machine learning model is trained using each unique evaluation pair. Each trained machine learning model is validated to compute a performance measure value. An estimation model is trained with the feature set, the hyperparameter configuration, and the performance measure value computed for each unique evaluation pair. The trained estimation model is executed to compute the performance measure value for each unique evaluation pair. A final feature set and a final hyperparameter configuration are selected based on the computed performance measure value.
    Type: Grant
    Filed: May 14, 2019
    Date of Patent: March 24, 2020
    Assignee: SAS Institute Inc.
    Inventors: Funda Gunes, Wendy Ann Czika, Susan Edwards Haller, Udo Sglavo
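A rough Python sketch of the selection loop: sample unique (feature set, hyperparameter configuration) pairs, score each with a hypothetical `evaluate` callback (lower is better), fit an estimation model on those scores, and let it predict every candidate pair. The additive surrogate (an overall mean plus a per-feature-set and a per-configuration effect, fitted by backfitting) is an assumed stand-in; the abstract does not name the estimation model.

```python
import itertools
import random
import statistics


def select_features_and_hyperparameters(feature_sets, configs, evaluate, n_iter, seed=0, sweeps=20):
    """Evaluate sampled (feature set, config) pairs, fit an additive surrogate, pick the best.

    evaluate(feature_set, config) is a hypothetical callback returning a validation error.
    """
    all_pairs = list(itertools.product(range(len(feature_sets)), range(len(configs))))
    sampled = random.Random(seed).sample(all_pairs, min(n_iter, len(all_pairs)))
    scores = {(f, h): evaluate(feature_sets[f], configs[h]) for f, h in sampled}

    # Fit score ~ mu + a[f] + b[h] by backfitting over the evaluated pairs.
    mu = statistics.fmean(scores.values())
    a = {f: 0.0 for f, _ in sampled}
    b = {h: 0.0 for _, h in sampled}
    for _ in range(sweeps):
        for f in a:
            a[f] = statistics.fmean(s - mu - b[h] for (ff, h), s in scores.items() if ff == f)
        for h in b:
            b[h] = statistics.fmean(s - mu - a[f] for (f, hh), s in scores.items() if hh == h)

    # Run the surrogate on every candidate pair (levels never evaluated get a zero effect).
    predicted = {(f, h): mu + a.get(f, 0.0) + b.get(h, 0.0) for f, h in all_pairs}
    best_f, best_h = min(predicted, key=predicted.get)
    return feature_sets[best_f], configs[best_h]
```

The surrogate lets the search rank pairs that were never trained, which is the point of training an estimation model instead of evaluating every combination.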
  • Patent number: 10586165
    Abstract: A computing system trains a clustering model. A responsibility parameter vector is initialized for each observation vector that includes a probability value of a cluster membership in each cluster. (A) Beta distribution parameter values are computed for each cluster. (B) Parameter values are computed for a normal-Wishart distribution for each cluster. (C) Each responsibility parameter vector defined for each observation vector is updated using the computed beta distribution parameter values, the computed parameter values for the normal-Wishart distribution, and a respective observation vector of the plurality of observation vectors. (D) A convergence parameter value is computed. (E) (A) to (D) are repeated until the computed convergence parameter value indicates the responsibility parameter vector defined for each observation vector is converged. A cluster membership is determined for each observation vector using a respective, updated responsibility parameter vector.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: March 10, 2020
    Assignee: SAS Institute Inc.
    Inventor: Yingjian Wang
  • Patent number: 10565085
    Abstract: Metadata received from each worker computing device describes EDF estimates for samples of marginal variables stored on each respective worker computing device. Combinations of the EDF estimates are enumerated and assigned to each worker computing device based on the metadata. A request to compute outcome expectation measure values for an outcome expectation measure is initiated to each worker computing device based on the assigned combinations. The outcome expectation measure values computed by each worker computing device are received from each respective worker computing device. The received outcome expectation measure values are accumulated for the outcome expectation measure. A mean value and a standard deviation value are computed for the outcome expectation measure from the accumulated, received outcome expectation measure values. The computed mean and standard deviation values for the outcome expectation measure are output to represent an expected outcome based on the marginal variables.
    Type: Grant
    Filed: June 6, 2019
    Date of Patent: February 18, 2020
    Assignee: SAS Institute Inc.
    Inventor: Mahesh V. Joshi
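The aggregation in this abstract can be illustrated with a small single-process Python simulation. Each marginal variable has several alternative EDF estimates (lists of equally likely sample values), every combination with one estimate per variable is enumerated and assigned to a simulated worker, each worker computes an outcome expectation, and the controller reports the mean and standard deviation. Independence of the marginals, exhaustive enumeration of joint draws, and round-robin assignment (instead of metadata-based assignment) are simplifying assumptions.

```python
import itertools
import statistics


def expected_outcome(edf_combo, outcome):
    """Outcome expectation for one combination of EDF estimates, assuming independent marginals.

    Each EDF is a list of equally likely values; all joint draws are enumerated,
    so this is only practical for small samples.
    """
    return statistics.fmean(outcome(draw) for draw in itertools.product(*edf_combo))


def distributed_outcome_stats(edfs_per_variable, outcome, n_workers):
    """edfs_per_variable[v] lists alternative EDF estimates (sample lists) for variable v."""
    combos = list(itertools.product(*edfs_per_variable))
    # Round-robin stand-in for assigning combinations to worker computing devices.
    assigned = [combos[w::n_workers] for w in range(n_workers)]
    # Each simulated worker computes its measure values; the controller accumulates them.
    accumulated = []
    for worker_combos in assigned:
        accumulated.extend(expected_outcome(c, outcome) for c in worker_combos)
    return statistics.fmean(accumulated), statistics.stdev(accumulated)
```

For example, with two EDF estimates per variable, `distributed_outcome_stats([[[1, 3], [2, 4]], [[10], [20]]], sum, 2)` averages four per-combination expectations (12, 22, 13, and 23).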
  • Patent number: 10565528
    Abstract: A computing device determines a sparse feature representation for a machine learning model. Landmark observation vectors are randomly selected. Neighbor observation vectors are randomly selected that are less than a predefined distance from a selected landmark observation vector. The observation vectors are projected into a neighborhood subspace defined by principal components computed for the neighbor observation vectors. A distance vector includes a distance value computed between each landmark observation vector and each observation vector of the projected observation vectors. Nearest landmark observation vectors are selected from the landmark observation vectors for each observation vector. A second distance vector that includes a second distance value computed between each observation vector and each landmark observation vector is added to a feature distance matrix, where the second distance value is zero for each landmark observation vector not included in the nearest landmark observation vectors.
    Type: Grant
    Filed: December 17, 2018
    Date of Patent: February 18, 2020
    Assignee: SAS Institute Inc.
    Inventors: Namita Dilip Lokare, Jorge Manuel Gomes da Silva, Ilknur Kaynar Kabul
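The core of the sparse representation is a distance-to-landmark vector in which only the nearest landmarks keep a non-zero entry. The Python sketch below shows that step only. It omits the neighbor sampling and the local PCA projection described in the abstract and measures Euclidean distance in the original space, which is a deliberate simplification.

```python
import math
import random


def sparse_landmark_features(points, n_landmarks, k_nearest, seed=0):
    """Map each point to a sparse vector of distances to its k nearest landmarks.

    Simplification: landmarks are a random subset of the points and distances are
    Euclidean in the original space (the local PCA projection step is omitted).
    """
    rng = random.Random(seed)
    landmarks = rng.sample(points, n_landmarks)
    features = []
    for p in points:
        dists = [math.dist(p, lm) for lm in landmarks]
        nearest = set(sorted(range(n_landmarks), key=dists.__getitem__)[:k_nearest])
        # Entries for landmarks outside the k nearest are set to zero.
        features.append([d if j in nearest else 0.0 for j, d in enumerate(dists)])
    return landmarks, features
```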
  • Patent number: 10559308
    Abstract: A system determines user intent from text. A conversation element is received. An intent is determined by matching a domain independent relationship and a domain dependent term determined from the received conversation element to an intent included in an intent database that stores a plurality of intents and by inputting the matched intent into a trained classifier that computes a likelihood that the matched intent is the intent of the received conversation element. An action is determined based on the determined intent. A response to the received conversation element is generated based on the determined action and output.
    Type: Grant
    Filed: June 7, 2019
    Date of Patent: February 11, 2020
    Assignee: SAS Institute Inc.
    Inventors: Jared Michael Dean Smythe, David Blake Styles, Richard Welland Crowell
  • Publication number: 20200042904
    Abstract: A system can obtain observations from a dataset. The system can generate a set of training partitions based on the observations and generate an ensemble of machine-learning models based on the set of training partitions. The system can then receive new data and detect whether the new data is indicative of the event using the ensemble. In some cases, the system can update the ensemble by providing the new data as input to an unsupervised machine-learning model that is separate from the ensemble of machine-learning models; receiving an output from the unsupervised machine-learning model indicating whether or not the new data is indicative of the event; incorporating a new observation into the dataset indicating whether or not the new data is indicative of the event based on the output from the unsupervised machine-learning model; and updating the ensemble based on the dataset with the new observation.
    Type: Application
    Filed: August 2, 2019
    Publication date: February 6, 2020
    Applicant: SAS Institute Inc.
    Inventors: Yue Qi, Jeffrey Todd Miller, JR., Thomas Francis Mutdosch, Rory David Ness MacKenzie, Iain Douglas Jackson, Peter Rowland Eastwood, Ryan Gillespie, Adam Michael Ames, Andrew John Knotts, Robert Wayne Thompson
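A toy Python sketch of the train-detect-update cycle. The base learner (a nearest-centroid rule on one numeric feature), the stratified round-robin partitioning, majority voting, and the z-score rule used as the separate unsupervised model are all hypothetical choices made for illustration. Each partition is assumed to be non-empty.

```python
import statistics


def train_centroid_model(rows):
    """Hypothetical base learner: label -> mean of a 1-D feature (nearest-centroid rule)."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    return {y: statistics.fmean(xs) for y, xs in by_label.items()}


def build_ensemble(dataset, n_partitions):
    """Split observations into stratified round-robin partitions; fit one model per partition."""
    events = [r for r in dataset if r[1]]
    normals = [r for r in dataset if not r[1]]
    return [train_centroid_model(normals[i::n_partitions] + events[i::n_partitions])
            for i in range(n_partitions)]


def ensemble_detects_event(ensemble, x):
    """Majority vote: each model predicts the label whose centroid is nearest to x."""
    votes = sum(1 for m in ensemble if min(m, key=lambda y: abs(x - m[y])))
    return votes > len(ensemble) / 2


def unsupervised_label(x, reference, z_cut=3.0):
    """Stand-in unsupervised model: flag x as an event if it is a z-score outlier."""
    mu, sd = statistics.fmean(reference), statistics.pstdev(reference) or 1.0
    return abs(x - mu) / sd > z_cut


def update_ensemble(dataset, new_x, n_partitions):
    """Label new data with the unsupervised model, append it (in place), and rebuild."""
    is_event = unsupervised_label(new_x, [x for x, y in dataset if not y])
    dataset.append((new_x, is_event))
    return build_ensemble(dataset, n_partitions)
```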
  • Patent number: 10535422
    Abstract: A computing device obtains a metric N indicating a quantity of a plurality of test cases for an output design of an experiment. Each element of a test case of the output design is a test condition for testing one of the factors for the experiment. The computing device obtains input indicating a quantity p of an indicated plurality of factors for the output design. The computing device determines whether there are stored instructions for generating an initial screening design for the experiment. Responsive to determining that there are stored instructions, the computing device selects, using the stored instructions, the initial screening design for the experiment. The computing device determines whether to modify the initial screening design based on modification criteria comprising a secondary criterion, the metric N, and/or the quantity p. The computing device outputs an indication of the updated screening design for the output design of the experiment.
    Type: Grant
    Filed: July 10, 2019
    Date of Patent: January 14, 2020
    Assignee: SAS Institute Inc.
    Inventors: Ryan Adam Lekivetz, Caleb Bridges King, Joseph Albert Morgan, Bradley Allen Jones
  • Patent number: 10521734
    Abstract: A computing device predicts an event or classifies an observation. A trained labeling model is executed with unlabeled observations to define a label distribution probability matrix used to select a label for each observation. Unique combinations of observations selected from the unlabeled observations are defined. A marginal distribution value is computed from the label distribution probability matrix. A joint distribution value is computed between observations included in each combination. A mutual information value is computed for each combination as a combination of the marginal distribution value and the joint distribution value computed for the respective combination. A predefined number of observation vector combinations is selected from the combinations that have the highest values for the computed mutual information value. Labeled observation vectors are updated to include each observation vector included in the selected observation vector combinations with a respective obtained label.
    Type: Grant
    Filed: May 7, 2019
    Date of Patent: December 31, 2019
    Assignee: SAS Institute Inc.
    Inventors: Xu Chen, Jorge Manuel Gomes da Silva
  • Publication number: 20190394083
    Abstract: A pipeline system for time-series data forecasting using a distributed computing environment is disclosed herein. In one example, a pipeline for forecasting time series is generated. The pipeline represents a sequence of operations for processing the time series to produce forecasts. The sequence of operations includes model strategy operations for applying various model strategies to the time series to determine error distributions corresponding to the model strategies. The sequence of operations further includes a model-strategy comparison operation for determining which of the model strategies is a champion model strategy for the plurality of time series based on the error distributions of the model strategies. The pipeline is executed to determine the champion model strategy for the time series.
    Type: Application
    Filed: June 26, 2019
    Publication date: December 26, 2019
    Applicant: SAS Institute Inc.
    Inventors: Udo Vincenzo Sglavo, Phillip Mark Helmkamp, Jerzy Michal Brzezicki, Timothy Patrick Haley, Sujatha Pothireddy
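The champion selection can be sketched in Python by holding out the last few points of every series, pooling each strategy's absolute errors into one error distribution, and comparing the distributions. The two toy strategies (last value, historical mean) and the lowest-median comparison rule are assumptions; the abstract does not specify how the error distributions are compared.

```python
import statistics


def naive_forecast(history, horizon):
    return [history[-1]] * horizon


def mean_forecast(history, horizon):
    return [statistics.fmean(history)] * horizon


def error_distribution(strategy, all_series, horizon):
    """Absolute errors of one strategy over a holdout window, pooled across all series."""
    errors = []
    for series in all_series:
        history, actual = series[:-horizon], series[-horizon:]
        errors.extend(abs(f - a) for f, a in zip(strategy(history, horizon), actual))
    return errors


def champion_strategy(strategies, all_series, horizon=2):
    """Pick the strategy whose error distribution has the lowest median (assumed criterion)."""
    dists = {name: error_distribution(fn, all_series, horizon) for name, fn in strategies.items()}
    return min(dists, key=lambda name: statistics.median(dists[name])), dists
```

On trending series such as `[1, 2, 3, 4, 5, 6]`, the last-value strategy is closer to the held-out points than the historical mean, so it becomes the champion.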
  • Patent number: 10509847
    Abstract: A computing device determines hyperparameter values for outlier detection. An LOF score is computed for observation vectors using a neighborhood size value. Outlier observation vectors are selected from the observation vectors. Outlier mean and outlier variance values are computed from the LOF scores of the outlier observation vectors. Inlier observation vectors are selected as the observation vectors, among those not included in the outlier observation vectors, that have the highest computed LOF scores. Inlier mean and inlier variance values are computed from the LOF scores of the inlier observation vectors. A difference value is computed using the outlier mean and variance values and the inlier mean and variance values. The process is repeated with each neighborhood size value of a plurality of neighborhood size values. A tuned neighborhood size value is selected as the neighborhood size value associated with an extremum value of the difference value.
    Type: Grant
    Filed: May 14, 2019
    Date of Patent: December 17, 2019
    Assignee: SAS Institute Inc.
    Inventors: Zekun Xu, Deovrat Vijay Kakde, Arin Chaudhuri
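The tuning loop can be sketched in Python with a plain local outlier factor (LOF) computation. For each candidate neighborhood size k, the top-scoring points are taken as outliers and the next-highest scores as inliers, and k is chosen to maximize a difference value. The abstract does not give the formula, so the standardized gap `(mean_out - mean_in) / sqrt(var_out + var_in)` used here, and the fixed outlier and inlier counts, are assumptions.

```python
import math
import statistics


def lof_scores(points, k):
    """Local outlier factor of each point using its k nearest neighbors (Euclidean)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
                 for i in range(n)]
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance to the neighbors.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[o], dist[i][o]) for o in neighbors[i]]
        lrd.append(1.0 / max(statistics.fmean(reach), 1e-12))
    return [statistics.fmean(lrd[o] for o in neighbors[i]) / lrd[i] for i in range(n)]


def tune_neighborhood_size(points, k_values, n_outliers, n_inliers):
    """Choose k maximizing the separation between top-LOF outliers and the next-highest inliers."""
    best_k, best_gap = None, -math.inf
    for k in k_values:
        ranked = sorted(lof_scores(points, k), reverse=True)
        out, inl = ranked[:n_outliers], ranked[n_outliers:n_outliers + n_inliers]
        # Assumed difference value: standardized gap between the two groups' mean LOF scores.
        gap = (statistics.fmean(out) - statistics.fmean(inl)) / math.sqrt(
            statistics.pvariance(out) + statistics.pvariance(inl) + 1e-12)
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k, best_gap
```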
  • Patent number: 10503846
    Abstract: A computing device generates representative points, each representing a potential design point for a design space. The computing device determines for the design space primary clusters, a categorical factor, and at least two levels for the categorical factor. The computing device, for each of the primary clusters, selects a design point from each sub-cluster of the respective primary cluster. The computing device, for each of the primary clusters, allocates the at least two levels of the categorical factor, such that a level of the at least two levels is allocated to each selected design point in the respective primary cluster. The computing device modifies an initial sub-design that represents the selected design points allocated a given level of the categorical factor by increasing separation between design points allocated a same level of the categorical factor. The computing device outputs to an output device a modified design for the design space.
    Type: Grant
    Filed: October 8, 2018
    Date of Patent: December 10, 2019
    Assignee: SAS Institute Inc.
    Inventors: Ryan Adam Lekivetz, Joseph Albert Morgan, Bradley Allen Jones
  • Publication number: 20190370836
    Abstract: Managing the amount of computing resources required to execute a process for determining values of a parameter associated with an object over a lifetime of the object is disclosed herein. In one example, a data structure is generated. The data structure includes candidate values for the parameter that comply with constraints assigned to multiple dates occurring during the lifetime of the object. The data structure is pruned by aggregating actionable periods. A first combination of candidate values associated with the aggregated actionable periods is determined that results in the minimum amount of the object being provided to the users during the lifetime. A second combination of candidate values associated with the aggregated actionable periods is determined that satisfies a return objective. The second combination of values is usable by a remote computing device to implement a value schedule for the object.
    Type: Application
    Filed: May 29, 2019
    Publication date: December 5, 2019
    Applicant: SAS Institute Inc.
    Inventors: Natalia Summerville, Ivan Borges Oliveira, Scott Shuler, Golbarg Tutunchi, Fang Liang
  • Publication number: 20190354410
    Abstract: Exemplary embodiments relate to systems for building a model of changes to data items when information about the data items is limited or not directly observed. Exemplary embodiments allow properties of the data items to be inferred using a single data structure and create a highly granular log of changes to the data item. Using this data structure, the time-varying nature of changes to the data item can be determined. The data structure may be used to identify characteristics associated with a regularly-performed action, to examine how adherence to the action affects a system, and to identify outcomes of non-adherence. Fungible data items may be mapped to a remediable condition or remedy class. This may be accomplished by automatically deriving conditions and remedial information from available information, matching the conditions to remedial classes or types via a customizable mapping, and then calculating adherence for the condition on the available information.
    Type: Application
    Filed: August 5, 2019
    Publication date: November 21, 2019
    Applicant: SAS Institute Inc.
    Inventors: Ruth Ellen Baldasaro, Jennifer Lee Hargrove, Edward Lew Rowe, Emily Louise Chapman-McQuiston
  • Patent number: 10482376
    Abstract: The computing device generates a classification model providing prediction data indicating predicted users in a target population who will respond to a target stimulus according to a predefined user response category. The computing device displays, in a GUI, a graphical representation of a generated classification model and a plurality of options each specifying one of different objectives for determining a proportion of users in the target population to expose to the target stimulus. The computing device predicts proportion data indicating the proportion of users in the target population to expose to the target stimulus based on the determined location of the cut-off. The computing device issues one or more indications as to whether to use the classification model as a basis for exposing the proportion of users in the target population to the target stimulus according to the proportion data.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: November 19, 2019
    Assignee: SAS Institute Inc.
    Inventors: Amrut Shantaram Vaze, Michael Ryan Chipley, Leigh Anne Ward, Ashish Mishra, Steven Todd Barlow, Suchitra Balaso Chikhalkar, Sameer Waman Tatke
  • Patent number: 10474959
    Abstract: A computing device computes a weight matrix to compute a predicted value. For each of a plurality of related tasks, an augmented observation matrix, a plug-in autocovariance matrix, and a plug-in covariance vector are computed. A weight matrix used to predict the characteristic for each of a plurality of variables and each of a plurality of related tasks is computed. (a) and (b) are repeated with the computed updated weight matrix as the computed weight matrix until a convergence criterion is satisfied: (a) a gradient descent matrix is computed using the computed plug-in autocovariance matrix, the computed plug-in covariance vector, the computed weight matrix, and a predefined relationship matrix, wherein the predefined relationship matrix defines a relationship between the plurality of related tasks, and (b) an updated weight matrix is computed using the computed gradient descent matrix.
    Type: Grant
    Filed: June 19, 2019
    Date of Patent: November 12, 2019
    Assignee: SAS Institute Inc.
    Inventors: Xin Jiang Hunt, Saba Emrani, Jorge Manuel Gomes da Silva, Ilknur Kaynar Kabul