Patents Assigned to SAS Institute
-
Publication number: 20200143253
Abstract: An apparatus includes a processor to: provide a set of feature routines to a set of processor cores to detect features of a data set distributed thereamong; generate metadata indicative of the detected features; generate context data indicative of contextual aspects of the data set; provide the metadata and context data to each processor core, and distribute a set of suggestion models thereamong to enable derivation of a suggested subset of data preparation operations to be suggested to be performed on the data set; transmit indications of the suggested subset to a viewing device, and receive therefrom indications of a selected subset of data preparation operations selected to be performed; compare the selected and suggested subsets; and in response to differences therebetween, re-train at least one suggestion model of the set of suggestion models based at least on the combination of the metadata, context data and selected subset.
Type: Application
Filed: December 24, 2019
Publication date: May 7, 2020
Applicant: SAS Institute Inc.
Inventors: Nancy Anne Rausch, Roger Jay Barney, John P. Trawinski
-
Publication number: 20200143246
Abstract: A pipeline system for time-series data forecasting using a distributed computing environment is disclosed herein. In one example, a pipeline for forecasting time series is generated. The pipeline represents a sequence of operations for processing the time series to produce modeling results such as forecasts of the time series. The pipeline includes a segmentation operation for categorizing the time series into multiple demand classes based on demand characteristics of the time series. The pipeline also includes multiple sub-pipelines corresponding to the multiple demand classes. Each of the sub-pipelines applies a model strategy to the time series in the corresponding demand class. The model strategy is selected from multiple candidate model strategies based on predetermined relationships between the demand classes and the candidate model strategies. The pipeline is executed to determine the modeling results for the time series.
Type: Application
Filed: December 24, 2019
Publication date: May 7, 2020
Applicant: SAS Institute Inc.
Inventors: YUE LI, MICHELE ANGELO TROVERO, PHILLIP MARK HELMKAMP, JERZY MICHAL BRZEZICKI, MACKLIN CARTER FRAZIER, TIMOTHY PATRICK HALEY, RANDY THOMAS SOLOMONSON, SANGMIN KIM, STEVEN CHRISTOPHER MILLS, YUNG-HSIN CHIEN, RON TRAVIS HODGIN, JINGRUI XIE
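
Illustrative sketch (not taken from the application): the segmentation-then-sub-pipeline idea in the abstract above might look roughly like the following Python. The two demand classes, the zero-share rule, and the two forecasting strategies are assumptions chosen for brevity; the abstract does not say which classes or models are used.

```python
from statistics import mean


def classify_demand(series, zero_share_threshold=0.5):
    """Hypothetical segmentation rule: a series is 'intermittent' when at
    least half of its values are zero, otherwise 'smooth'."""
    zero_share = sum(1 for v in series if v == 0) / len(series)
    return "intermittent" if zero_share >= zero_share_threshold else "smooth"


def mean_forecast(series, horizon):
    # Repeat the historical mean for every future step.
    return [mean(series)] * horizon


def naive_forecast(series, horizon):
    # Repeat the last observed value for every future step.
    return [series[-1]] * horizon


# Assumed "predetermined relationship" between demand classes and strategies.
STRATEGY_BY_CLASS = {"intermittent": mean_forecast, "smooth": naive_forecast}


def run_pipeline(time_series, horizon=2):
    # Segmentation operation: group series names by demand class.
    segments = {}
    for name, series in time_series.items():
        segments.setdefault(classify_demand(series), []).append(name)

    # One sub-pipeline per demand class applies that class's strategy.
    results = {}
    for demand_class, names in segments.items():
        strategy = STRATEGY_BY_CLASS[demand_class]
        for name in names:
            results[name] = {
                "class": demand_class,
                "forecast": strategy(time_series[name], horizon),
            }
    return results
```

For example, `run_pipeline({"a": [0, 0, 4, 0], "b": [5, 6, 7, 8]})` routes series `a` to the mean-based strategy and series `b` to the naive strategy.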
-
Patent number: 10642610
Abstract: In some examples, computing devices can partition timestamped data into groups. The computing devices can then distribute the timestamped data based on the groups. The computing devices can also obtain copies of a script configured to process the timestamped data, such that each computing device receives a copy of the script. The computing devices can determine one or more code segments associated with the groups based on content of the script. The one or more code segments can be in one or more programming languages that are different than a programming language of the script. The computing devices can then run the copies of the script to process the timestamped data within the groups. This may involve interacting with one or more job servers configured to run the one or more code segments associated with the groups.
Type: Grant
Filed: November 27, 2019
Date of Patent: May 5, 2020
Assignee: SAS Institute Inc.
Inventors: Michael James Leonard, Thiago Santos Quirino, Edward Tilden Blair, Jennifer Leigh Sloan Beeman, David Bruce Elsheimer, Javier Delgado
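
Illustrative sketch (not taken from the patent): the first two steps of the abstract above, partitioning timestamped data into groups and distributing it by group, could be approximated as below. The record layout (`timestamp` key plus a grouping key) and the round-robin placement are assumptions; the script copies and job servers described in the abstract are not modeled.

```python
from collections import defaultdict


def partition_by_group(records, key):
    """Group timestamped records by a key and order each group by time."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    for rows in groups.values():
        rows.sort(key=lambda r: r["timestamp"])
    return dict(groups)


def distribute(groups, num_workers):
    """Assign whole groups to workers round-robin, so every row of a group
    lands on the same (hypothetical) worker."""
    assignment = {worker: {} for worker in range(num_workers)}
    for i, group_key in enumerate(sorted(groups)):
        assignment[i % num_workers][group_key] = groups[group_key]
    return assignment
```

Keeping each group on a single worker means a per-group script can run without moving rows between workers.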
-
Patent number: 10635947
Abstract: A computer trains a classification model. (A) An estimation vector is computed for each observation vector using a weight value, a mean vector, and a covariance matrix. The estimation vector includes a probability value for each class of a plurality of classes for each observation vector that indicates a likelihood that each observation vector is associated with each class. A subset of the plurality of observation vectors has a predefined class assignment. (B) The weight value is updated using the computed estimation vector. (C) The mean vector for each class is updated using the computed estimation vector. (D) The covariance matrix for each class is updated using the computed estimation vector. (E) A convergence parameter value is computed. (F) A classification model is trained by repeating (A) to (E) until the computed convergence parameter value indicates the mean vector for each class of the plurality of classes is converged.
Type: Grant
Filed: September 30, 2019
Date of Patent: April 28, 2020
Assignee: SAS Institute Inc.
Inventors: Xu Chen, Yingjian Wang, Saratendu Sethi
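
Illustrative sketch (not taken from the patent): steps (A) to (F) above resemble a semi-supervised expectation-maximization loop for a Gaussian mixture. The version below is a one-dimensional simplification, so the covariance matrix reduces to a scalar variance. The initialization from labeled points, the variance floor, and the use of the largest mean shift as the convergence parameter are assumptions. It expects at least one labeled observation per class.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def train(xs, labels, num_classes, tol=1e-6, max_iter=200):
    """labels[i] is a class index, or None when observation i is unlabeled."""
    # Assumed initialization: class means come from the labeled observations.
    means = []
    for c in range(num_classes):
        pts = [x for x, y in zip(xs, labels) if y == c]
        means.append(sum(pts) / len(pts))
    variances = [1.0] * num_classes
    weights = [1.0 / num_classes] * num_classes

    for _ in range(max_iter):
        # (A) Estimation vector: class probabilities for each observation.
        # Labeled observations keep their predefined class assignment.
        resp = []
        for x, y in zip(xs, labels):
            if y is not None:
                resp.append([1.0 if c == y else 0.0 for c in range(num_classes)])
            else:
                dens = [weights[c] * gaussian_pdf(x, means[c], variances[c])
                        for c in range(num_classes)]
                total = sum(dens)
                resp.append([d / total for d in dens])

        new_means = []
        for c in range(num_classes):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / len(xs)                                   # (B)
            m = sum(r[c] * x for r, x in zip(resp, xs)) / n_c            # (C)
            v = sum(r[c] * (x - m) ** 2 for r, x in zip(resp, xs)) / n_c  # (D)
            new_means.append(m)
            variances[c] = max(v, 1e-6)  # floor keeps the density finite

        # (E) Convergence parameter: largest change in any class mean.
        change = max(abs(a - b) for a, b in zip(new_means, means))
        means = new_means
        if change < tol:  # (F) stop once the means have converged
            break
    return weights, means, variances, resp
```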
-
Patent number: 10628755
Abstract: A computing system trains a clustering model. A responsibility parameter vector includes a probability value of a cluster membership in each cluster for each respective observation vector. (A) Parameter values for a normal-Wishart distribution are computed for each cluster using a mean value, an inverse precision parameter value, each observation vector, and each respective responsibility parameter vector. (B) The responsibility parameter vector is updated using a multivariate student t-distribution function with the computed parameter values for the normal-Wishart distribution and a respective observation vector of the observation vectors as input values. (C) A convergence parameter value is computed. (D) (A) to (C) are repeated until the computed convergence parameter value indicates the responsibility parameter vector defined for each observation vector is converged. A cluster membership is determined for each observation vector using a respective, updated responsibility parameter vector.
Type: Grant
Filed: September 10, 2019
Date of Patent: April 21, 2020
Assignee: SAS Institute Inc.
Inventor: Yingjian Wang
-
Patent number: 10600005
Abstract: A computing device selects a feature set and hyperparameters for a machine learning model to predict a value for a characteristic in a scoring dataset. A number of training model iterations is determined. A unique evaluation pair is selected for each iteration that indicates a feature set selected from feature sets and a hyperparameter configuration selected from hyperparameter configurations. A machine learning model is trained using each unique evaluation pair. Each trained machine learning model is validated to compute a performance measure value. An estimation model is trained with the feature set, the hyperparameter configuration, and the performance measure value computed for each unique evaluation pair. The trained estimation model is executed to compute the performance measure value for each unique evaluation pair. A final feature set and a final hyperparameter configuration are selected based on the computed performance measure value.
Type: Grant
Filed: May 14, 2019
Date of Patent: March 24, 2020
Assignee: SAS Institute Inc.
Inventors: Funda Gunes, Wendy Ann Czika, Susan Edwards Haller, Udo Sglavo
-
Patent number: 10586165
Abstract: A computing system trains a clustering model. A responsibility parameter vector is initialized for each observation vector that includes a probability value of a cluster membership in each cluster. (A) Beta distribution parameter values are computed for each cluster. (B) Parameter values are computed for a normal-Wishart distribution for each cluster. (C) Each responsibility parameter vector defined for each observation vector is updated using the computed beta distribution parameter values, the computed parameter values for the normal-Wishart distribution, and a respective observation vector of the plurality of observation vectors. (D) A convergence parameter value is computed. (E) (A) to (D) are repeated until the computed convergence parameter value indicates the responsibility parameter vector defined for each observation vector is converged. A cluster membership is determined for each observation vector using a respective, updated responsibility parameter vector.
Type: Grant
Filed: September 6, 2019
Date of Patent: March 10, 2020
Assignee: SAS Institute Inc.
Inventor: Yingjian Wang
-
Patent number: 10565085
Abstract: Metadata received from each worker computing device describes EDF estimates for samples of marginal variables stored on each respective worker computing device. Combinations of the EDF estimates are enumerated and assigned to each worker computing device based on the metadata. A request to compute outcome expectation measure values for an outcome expectation measure is initiated to each worker computing device based on the assigned combinations. The outcome expectation measure values computed by each worker computing device are received from each respective worker computing device. The received outcome expectation measure values are accumulated for the outcome expectation measure. A mean value and a standard deviation value are computed for the outcome expectation measure from the accumulated, received outcome expectation measure values. The computed mean and standard deviation values for the outcome expectation measure are output to represent an expected outcome based on the marginal variables.
Type: Grant
Filed: June 6, 2019
Date of Patent: February 18, 2020
Assignee: SAS Institute, Inc.
Inventor: Mahesh V. Joshi
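
Illustrative sketch (not taken from the patent): the enumerate, assign, accumulate, and summarize flow in the abstract above can be mimicked on one machine as follows. Each marginal variable is represented by a list of samples, a combination picks one sample index per marginal, and the "outcome expectation measure" is stood in for by the sum of the chosen sample means. That measure and the round-robin assignment are assumptions; the patent assigns combinations based on worker metadata.

```python
from itertools import product
from statistics import mean, stdev


def enumerate_combinations(samples_per_marginal):
    """One sample index per marginal variable, for every possible choice."""
    return list(product(*(range(len(s)) for s in samples_per_marginal)))


def assign(combinations, num_workers):
    """Round-robin split of the combinations across hypothetical workers."""
    return [combinations[w::num_workers] for w in range(num_workers)]


def worker_compute(samples_per_marginal, combos):
    """Stand-in measure: sum of the selected samples' means."""
    return [
        sum(mean(samples_per_marginal[m][i]) for m, i in enumerate(combo))
        for combo in combos
    ]


def expected_outcome(samples_per_marginal, num_workers):
    combos = enumerate_combinations(samples_per_marginal)
    values = []
    for worker_combos in assign(combos, num_workers):
        # Accumulate the values returned by each worker.
        values.extend(worker_compute(samples_per_marginal, worker_combos))
    return mean(values), stdev(values)
```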
-
Patent number: 10565528
Abstract: A computing device determines a sparse feature representation for a machine learning model. Landmark observation vectors are randomly selected. Neighbor observation vectors are randomly selected that are less than a predefined distance from a selected landmark observation vector. The observation vectors are projected into a neighborhood subspace defined by principal components computed for the neighbor observation vectors. A distance vector includes a distance value computed between each landmark observation vector and each observation vector of the projected observation vectors. Nearest landmark observation vectors are selected from the landmark observation vectors for each observation vector. A second distance vector that includes a second distance value computed between each observation vector and each landmark observation vector is added to a feature distance matrix, where the second distance value is zero for each landmark observation vector not included in the nearest landmark observation vectors.
Type: Grant
Filed: December 17, 2018
Date of Patent: February 18, 2020
Assignee: SAS Institute Inc.
Inventors: Namita Dilip Lokare, Jorge Manuel Gomes da Silva, Ilknur Kaynar Kabul
-
Patent number: 10559308
Abstract: A system determines user intent from text. A conversation element is received. An intent is determined by matching a domain independent relationship and a domain dependent term determined from the received conversation element to an intent included in an intent database that stores a plurality of intents and by inputting the matched intent into a trained classifier that computes a likelihood that the matched intent is the intent of the received conversation element. An action is determined based on the determined intent. A response to the received conversation element is generated based on the determined action and output.
Type: Grant
Filed: June 7, 2019
Date of Patent: February 11, 2020
Assignee: SAS Institute Inc.
Inventors: Jared Michael Dean Smythe, David Blake Styles, Richard Welland Crowell
-
Publication number: 20200042904
Abstract: A system can obtain observations from a dataset. The system can generate a set of training partitions based on the observations and generate an ensemble of machine-learning models based on the set of training partitions. The system can then receive new data and detect whether the new data is indicative of the event using the ensemble. In some cases, the system can update the ensemble by providing the new data as input to an unsupervised machine-learning model that is separate from the ensemble of machine-learning models; receiving an output from the unsupervised machine-learning model indicating whether or not the new data is indicative of the event; incorporating a new observation into the dataset indicating whether or not the new data is indicative of the event based on the output from the unsupervised machine-learning model; and updating the ensemble based on the dataset with the new observation.
Type: Application
Filed: August 2, 2019
Publication date: February 6, 2020
Applicant: SAS Institute Inc.
Inventors: Yue Qi, Jeffrey Todd Miller, JR., Thomas Francis Mutdosch, Rory David Ness MacKenzie, Iain Douglas Jackson, Peter Rowland Eastwood, Ryan Gillespie, Adam Michael Ames, Andrew John Knotts, Robert Wayne Thompson
-
Patent number: 10535422
Abstract: A computing device obtains a metric N indicating a quantity of a plurality of test cases for an output design of an experiment. Each element of a test case of the output design is a test condition for testing one of factors for the experiment. The computing device obtains input indicating a quantity p of an indicated plurality of factors for the output design. The computing device determines whether there are stored instructions for generating an initial screening design for the experiment. The computing device, responsive to determining that there are stored instructions, selects, using the stored instructions, the initial screening design for the experiment. The computing device determines whether to modify the initial screening design based on modification criteria comprising a secondary criterion, the metric N, and/or the quantity p. The computing device outputs an indication of the updated screening design for the output design of the experiment.
Type: Grant
Filed: July 10, 2019
Date of Patent: January 14, 2020
Assignee: SAS Institute Inc.
Inventors: Ryan Adam Lekivetz, Caleb Bridges King, Joseph Albert Morgan, Bradley Allen Jones
-
Patent number: 10521734
Abstract: A computing device predicts an event or classifies an observation. A trained labeling model is executed with unlabeled observations to define a label distribution probability matrix used to select a label for each observation. Unique combinations of observations selected from the unlabeled observations are defined. A marginal distribution value is computed from the label distribution probability matrix. A joint distribution value is computed between observations included in each combination. A mutual information value is computed for each combination as a combination of the marginal distribution value and the joint distribution value computed for the respective combination. A predefined number of observation vector combinations is selected from the combinations that have highest values for the computed mutual information value. Labeled observation vectors are updated to include each observation vector included in the selected observation vector combinations with a respective obtained label.
Type: Grant
Filed: May 7, 2019
Date of Patent: December 31, 2019
Assignee: SAS Institute Inc.
Inventors: Xu Chen, Jorge Manuel Gomes da Silva
-
Publication number: 20190394083
Abstract: A pipeline system for time-series data forecasting using a distributed computing environment is disclosed herein. In one example, a pipeline for forecasting time series is generated. The pipeline represents a sequence of operations for processing the time series to produce forecasts. The sequence of operations includes model strategy operations for applying various model strategies to the time series to determine error distributions corresponding to the model strategies. The sequence of operations further includes a model-strategy comparison operation for determining which of the model strategies is a champion model strategy for the plurality of time series based on the error distributions of the model strategies. The pipeline is executed to determine the champion model strategy for the time series.
Type: Application
Filed: June 26, 2019
Publication date: December 26, 2019
Applicant: SAS Institute Inc.
Inventors: Udo Vincenzo Sglavo, Phillip Mark Helmkamp, Jerzy Michal Brzezicki, Timothy Patrick Haley, Sujatha Pothireddy
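
Illustrative sketch (not taken from the application): one simple reading of the comparison step above is to collect each strategy's forecast errors across all series and pick the strategy with the smallest average error. The two strategies, the rolling one-step-ahead evaluation, and the mean-absolute-error criterion are assumptions; the abstract does not specify how error distributions are compared.

```python
from statistics import mean


def naive(history):
    # Forecast the next value as the last observed value.
    return history[-1]


def average(history):
    # Forecast the next value as the mean of the history.
    return mean(history)


# Hypothetical candidate model strategies.
STRATEGIES = {"naive": naive, "average": average}


def error_distribution(strategy, series, min_history=2):
    """Absolute one-step-ahead errors from a rolling-origin evaluation."""
    return [
        abs(series[t] - strategy(series[:t]))
        for t in range(min_history, len(series))
    ]


def champion(time_series):
    """Pool each strategy's errors over all series and return the name of
    the strategy with the lowest mean absolute error, plus the errors."""
    errors = {name: [] for name in STRATEGIES}
    for series in time_series:
        for name, strategy in STRATEGIES.items():
            errors[name].extend(error_distribution(strategy, series))
    best = min(errors, key=lambda name: mean(errors[name]))
    return best, errors
```

On steadily trending series such as `[1, 2, 3, 4, 5]`, the naive strategy wins because the historical average lags further behind the trend.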
-
Patent number: 10509847
Abstract: A computing device determines hyperparameter values for outlier detection. An LOF score is computed for observation vectors using a neighborhood size value. Outlier observation vectors are selected from the observation vectors. Outlier mean and outlier variance values are computed of the LOF scores of the outlier observation vectors. Inlier observation vectors are selected from the observation vectors that have highest computed LOF scores of the observation vectors that are not included in the outlier observation vectors. Inlier mean and inlier variance values are computed of the LOF scores of the inlier observation vectors. A difference value is computed using the outlier mean and variance values and the inlier mean and variance values. The process is repeated with each neighborhood size value of a plurality of neighborhood size values. A tuned neighborhood size value is selected as the neighborhood size value associated with an extremum value of the difference value.
Type: Grant
Filed: May 14, 2019
Date of Patent: December 17, 2019
Assignee: SAS Institute Inc.
Inventors: Zekun Xu, Deovrat Vijay Kakde, Arin Chaudhuri
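
Illustrative sketch (not taken from the patent): the tuning loop above can be approximated with a textbook local outlier factor (LOF) computation and a standardized difference between the top-scoring "outlier" group and the next-highest "inlier" group. Selecting outliers as the highest LOF scores, the exact form of the difference value, and the choice of the maximum as the extremum are assumptions. The sketch requires distinct points and Python 3.8+ for `math.dist`.

```python
import math
from statistics import mean, pvariance


def lof_scores(points, k):
    """Standard LOF with neighborhood size k (ties broken by index)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # Distance from each point to its k-th nearest neighbor.
    k_dist = [dist[i][neighbors[i][-1]] for i in range(n)]
    # Local reachability density of each point.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(1.0 / mean(reach))
    return [mean(lrd[j] for j in neighbors[i]) / lrd[i] for i in range(n)]


def tune_neighborhood_size(points, k_values, num_outliers, num_inliers):
    best_k, best_diff = None, -math.inf
    for k in k_values:
        scores = sorted(lof_scores(points, k), reverse=True)
        outliers = scores[:num_outliers]
        inliers = scores[num_outliers:num_outliers + num_inliers]
        # Assumed difference value: standardized gap between group means.
        spread = (pvariance(outliers) / len(outliers)
                  + pvariance(inliers) / len(inliers) + 1e-12)
        diff = (mean(outliers) - mean(inliers)) / math.sqrt(spread)
        if diff > best_diff:
            best_k, best_diff = k, diff
    return best_k
```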
-
Patent number: 10503846
Abstract: A computing device generates representative points, each representing a potential design point for a design space. The computing device determines for the design space primary clusters, a categorical factor, and at least two levels for the categorical factor. The computing device, for each of the primary clusters, selects a design point from each sub-cluster of the respective primary cluster. The computing device, for each of the primary clusters, allocates the at least two levels of the categorical factor, such that a level of the at least two levels is allocated to each selected design point in the respective primary cluster. The computing device modifies an initial sub-design that represents the selected design points allocated a given level of the categorical factor by increasing separation between design points allocated a same level of the categorical factor. The computing device outputs to an output device a modified design for the design space.
Type: Grant
Filed: October 8, 2018
Date of Patent: December 10, 2019
Assignee: SAS Institute Inc.
Inventors: Ryan Adam Lekivetz, Joseph Albert Morgan, Bradley Allen Jones
-
Publication number: 20190370836
Abstract: Managing the amount of computing resources required to execute a process for determining values of a parameter associated with an object over a lifetime of the object is disclosed here. In one example, a data structure is generated. The data structure includes candidate values for the parameter that comply with constraints assigned to multiple dates occurring during the lifetime of the object. The data structure is pruned by aggregating actionable periods. A first combination of candidate values associated with the aggregated actionable periods is determined that results in the minimum amount of the object being provided to the users during the lifetime. A second combination of candidate values associated with the aggregated actionable periods is determined that satisfies a return objective. The second combination of values is usable by a remote computing device to implement a value schedule for the object.
Type: Application
Filed: May 29, 2019
Publication date: December 5, 2019
Applicant: SAS Institute Inc.
Inventors: Natalia Summerville, Ivan Borges Oliveira, Scott Shuler, Golbarg Tutunchi, Fang Liang
-
Publication number: 20190354410
Abstract: Exemplary embodiments relate to systems for building a model of changes to data items when information about the data items is limited or not directly observed. Exemplary embodiments allow properties of the data items to be inferred using a single data structure and create a highly granular log of changes to the data item. Using this data structure, the time-varying nature of changes to the data item can be determined. The data structure may be used to identify characteristics associated with a regularly-performed action, to examine how adherence to the action affects a system, and to identify outcomes of non-adherence. Fungible data items may be mapped to a remediable condition or remedy class. This may be accomplished by automatically deriving conditions and remedial information from available information, matching the conditions to remedial classes or types via a customizable mapping, and then calculating adherence for the condition on the available information.
Type: Application
Filed: August 5, 2019
Publication date: November 21, 2019
Applicant: SAS Institute Inc.
Inventors: Ruth Ellen Baldasaro, Jennifer Lee Hargrove, Edward Lew Rowe, Emily Louise Chapman-McQuiston
-
Patent number: 10482376
Abstract: The computing device generates a classification model providing prediction data indicating predicted users in a target population who will respond to a target stimulus according to a predefined user response category. The computing device displays in a GUI a graphical representation of a generated classification model and a plurality of options each specifying one of different objectives for determining a proportion of users in the target population to expose to the target stimulus. The computing device predicts proportion data indicating the proportion of users in the target population to expose to the target stimulus based on the determined location of the cut-off. The computing device issues one or more indications as to whether to use the classification model as a basis for exposing the proportion of users in the target population to the target stimulus according to the proportion data.
Type: Grant
Filed: December 19, 2018
Date of Patent: November 19, 2019
Assignee: SAS Institute Inc.
Inventors: Amrut Shantaram Vaze, Michael Ryan Chipley, Leigh Anne Ward, Ashish Mishra, Steven Todd Barlow, Suchitra Balaso Chikhalkar, Sameer Waman Tatke
-
Patent number: 10474959
Abstract: A computing device computes a weight matrix to compute a predicted value. For each of a plurality of related tasks, an augmented observation matrix, a plug-in autocovariance matrix, and a plug-in covariance vector are computed. A weight matrix used to predict the characteristic for each of a plurality of variables and each of a plurality of related tasks is computed. (a) and (b) are repeated with the computed updated weight matrix as the computed weight matrix until a convergence criterion is satisfied: (a) a gradient descent matrix is computed using the computed plug-in autocovariance matrix, the computed plug-in covariance vector, the computed weight matrix, and a predefined relationship matrix, wherein the predefined relationship matrix defines a relationship between the plurality of related tasks, and (b) an updated weight matrix is computed using the computed gradient descent matrix.
Type: Grant
Filed: June 19, 2019
Date of Patent: November 12, 2019
Assignee: SAS Institute Inc.
Inventors: Xin Jiang Hunt, Saba Emrani, Jorge Manuel Gomes da Silva, Ilknur Kaynar Kabul