Patents Assigned to SAS Institute
-
Patent number: 9519679
Abstract: An apparatus includes a renaming component to homogenize query instructions for retrieving data items from a data set organized using index labels by identifying a declaration instruction associating an object thereof with an index label, replacing the name provided to the object with an archetypal name based on the index label, and generating change data associating the name with the archetypal name; a hashing component to take an instruction hash of the homogenized instructions; a cache control routine to find a matching instruction hash corresponding to results of earlier database queries in a results cache; and a reversal routine to, in response to finding a matching instruction hash, retrieve a cached result from the results cache associated with the matching instruction hash, and replace a name of a different object therein based on the change data and the query instructions to generate a new result of the new database query.
Type: Grant
Filed: September 21, 2015
Date of Patent: December 13, 2016
Assignee: SAS Institute Inc.
Inventors: Kenneth Tolman, Kimberly Buckler Botha, Paul Anthony Smiley, David R. Henderson, Andrew Anderson
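The renaming, hashing, cache lookup and reversal steps in this abstract can be illustrated with a rough sketch. It is not the patented implementation: the `__label__` archetypal-name format, the use of SHA-256, the string-valued results, and the names `homogenize`, `instruction_hash` and `run_query` are all assumptions made for illustration.

```python
import hashlib
import re


def homogenize(query, declarations):
    """Replace each declared object name with an archetypal name.

    `declarations` maps object name -> index label (hypothetical input shape).
    Returns the homogenized query and change data (archetypal name -> name).
    """
    change = {}
    for name, label in declarations.items():
        archetypal = f"__{label}__"  # assumed archetypal-name format
        change[archetypal] = name
        query = re.sub(rf"\b{re.escape(name)}\b", archetypal, query)
    return query, change


def instruction_hash(text):
    """Hash the homogenized instructions (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_query(query, declarations, cache, execute):
    """Answer from the results cache when the homogenized hash matches."""
    homogenized, change = homogenize(query, declarations)
    key = instruction_hash(homogenized)
    if key not in cache:
        cache[key] = execute(homogenized)  # cache miss: run the real query
    result = cache[key]
    # Reversal: put this query's own names back into the cached result.
    for archetypal, name in change.items():
        result = result.replace(archetypal, name)
    return result
```

Under these assumptions, two queries that differ only in the names given to their objects produce the same hash, so the second one is answered from the cache.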
-
Publication number: 20160350396
Abstract: In accordance with the teachings described herein, systems and methods are provided for estimating or determining quantiles for data stored in a distributed system. In one embodiment, an instruction is received to estimate or determine a specified quantile for a variate in a set of data stored at a plurality of nodes in the distributed system. A plurality of data bins for the variate are defined that are each associated with a different range of data values in the set of data. Lower and upper quantile bounds for each of the plurality of data bins are determined based on the total number of data values that fall within each of the plurality of data bins. The specified quantile is estimated or determined based on an identified one of the plurality of data bins that includes the specified quantile based on the lower and upper quantile bounds.
Type: Application
Filed: July 15, 2016
Publication date: December 1, 2016
Applicant: SAS Institute Inc.
Inventors: Guy Blanc, Georges H. Guirguis, Xiangqian Hu, Guixian Lin, Scott Pope
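As a hedged illustration of the binning idea in this abstract, the sketch below has each node count its values per bin, sums the counts, derives lower and upper quantile bounds for every bin from the cumulative totals, and estimates the quantile from the bin whose bounds contain it. Equal-width bins, linear interpolation inside the identified bin, and the names `bin_counts` and `estimate_quantile` are assumptions, not details taken from the application.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Per-node step: count the values falling in each bin.

    Bins are [edges[i], edges[i+1]); the last bin also includes edges[-1].
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def estimate_quantile(nodes, q, num_bins=10):
    """Estimate quantile q (0..1) over data split across `nodes` (lists)."""
    lo = min(min(n) for n in nodes)
    hi = max(max(n) for n in nodes)
    # Equal-width bin edges (an assumption); force the last edge to equal hi.
    edges = [lo + (hi - lo) * i / num_bins for i in range(num_bins)] + [hi]

    # Aggregate the per-node counts; only counts travel between nodes.
    totals = [0] * num_bins
    for n in nodes:
        for i, c in enumerate(bin_counts(n, edges)):
            totals[i] += c
    n_total = sum(totals)

    cum = 0
    for i, c in enumerate(totals):
        lower_q = cum / n_total          # lower quantile bound of bin i
        upper_q = (cum + c) / n_total    # upper quantile bound of bin i
        if c and lower_q <= q <= upper_q:
            # Linear interpolation within the identified bin (an assumption).
            frac = (q - lower_q) / (upper_q - lower_q)
            return edges[i] + frac * (edges[i + 1] - edges[i])
        cum += c
    raise ValueError("q must lie in [0, 1]")
```

A finer answer could be obtained by re-binning inside the identified bin and repeating; the sketch stops after one pass to stay short.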
-
Publication number: 20160350646
Abstract: Electronic communications can be normalized using a neural network. For example, a noncanonical communication that includes multiple terms can be received. The noncanonical communication can be preprocessed by (I) generating a vector including multiple characters from a term of the multiple terms; and (II) repeating a substring of the term in the vector such that a last character of the substring is positioned in a last position in the vector. The vector can be transmitted to a neural network configured to receive the vector and generate multiple probabilities based on the vector. A normalized version of the noncanonical communication can be determined using one or more of the multiple probabilities generated by the neural network. Whether the normalized version of the noncanonical communication should be outputted can also be determined using at least one of the multiple probabilities generated by the neural network.
Type: Application
Filed: June 7, 2016
Publication date: December 1, 2016
Applicant: SAS Institute Inc.
Inventors: Samuel Paul Leeman-Munk, James Allen Cox
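One possible reading of preprocessing steps (I) and (II) is sketched below: the term's characters fill a fixed-length vector from the left, and a suffix of the term is repeated right-aligned so that its last character sits in the final slot. The vector length, the padding character, the choice of half the vector for the repeated suffix, and the name `term_to_vector` are all assumptions; the abstract does not specify them.

```python
def term_to_vector(term, length=8, pad="_"):
    """Build a fixed-length character vector for one term.

    Step (I): copy the term's leading characters into the vector.
    Step (II): repeat a suffix of the term at the end of the vector so the
    suffix's last character occupies the last position.
    """
    head = list(term[:length])
    vec = head + [pad] * (length - len(head))
    k = min(len(term), length // 2)  # assumed suffix size: up to half the vector
    if k:
        vec[length - k:] = list(term[-k:])
    return vec
```

With this layout both the beginning and the end of a term land at fixed positions regardless of the term's length, which is presumably what makes the vector convenient as neural network input.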
-
Patent number: 9507833
Abstract: In accordance with the teachings described herein, systems and methods are provided for estimating quantiles for data stored in a distributed system. In one embodiment, an instruction is received to estimate a specified quantile for a variate in a set of data stored at a plurality of nodes in the distributed system. A plurality of data bins for the variate are defined that are each associated with a different range of data values in the set of data. Lower and upper quantile bounds for each of the plurality of data bins are determined based on the total number of data values that fall within each of the plurality of data bins. The specified quantile is estimated based on an identified one of the plurality of data bins that includes the specified quantile based on the lower and upper quantile bounds.
Type: Grant
Filed: April 29, 2016
Date of Patent: November 29, 2016
Assignee: SAS Institute Inc.
Inventors: Georges H. Guirguis, Scott Pope, Oliver Schabenberger
-
Publication number: 20160342742
Abstract: An apparatus includes a processor and storage to store instructions that cause the processor to identify at least one correlation between a diagnosis group and a medication class for each patient of a first set of patients to derive a set of models for each diagnosis group that correlates the diagnosis group to at least one medication class based on the at least one identified correlation; and for each patient of a second set of patients, employ each model of each set of models to make at least one prediction of at least one diagnosis group as indicated in the corresponding diagnosis group record based on at least one medication class indicated in the corresponding medication class record, and compare the at least one prediction to the corresponding diagnosis group record to derive a tally of at least one of true positives or false positives for each prediction.
Type: Application
Filed: May 3, 2016
Publication date: November 24, 2016
Applicant: SAS Institute Inc.
Inventors: Emily Chapman-McQuiston, Diane Emerton, Ruth Baldasaro, Daniel Kelly
-
Patent number: 9501522
Abstract: This disclosure describes a method, system and computer-program product for parallelized feature selection. The method, system and computer-program product may be used to access a first set of features, wherein the first set of features includes multiple features, wherein the features are characterized by a variance measure, and wherein accessing the first set of features includes using a computing system to access the features, determine components of a covariance matrix, the components of the covariance matrix indicating a covariance with respect to pairs of features in the first set, and select multiple features from the first set, wherein selecting is based on the determined components of the covariance matrix and an amount of the variance measure attributable to the selected multiple features, and wherein selecting the multiple features includes executing a greedy search performed using parallelized computation.
Type: Grant
Filed: August 19, 2013
Date of Patent: November 22, 2016
Assignee: SAS Institute Inc.
Inventors: Zheng Zhao, James Cox, David Duling, Warren Sarle
-
Patent number: 9495647
Abstract: A system for machine training can comprise one or more data processors and a non-transitory computer-readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform operations including: accessing a dataset comprising data tracking a plurality of features; determining a series of values for a regularization parameter of a sparse support vector machine model, the series including an initial regularization value and a next regularization value; computing an initial solution to the sparse support vector machine model for the initial regularization value; identifying, using the initial solution, inactive features of the sparse support vector machine model for the next regularization value; and computing a next solution to the sparse support vector machine model for the next regularization value, wherein computing the next solution includes excluding the inactive features.
Type: Grant
Filed: August 24, 2015
Date of Patent: November 15, 2016
Assignee: SAS Institute Inc.
Inventors: Zheng Zhao, Jun Liu, James Allen Cox
-
Patent number: 9495260
Abstract: Apparatuses, systems and methods are disclosed for tolerating fault in a communications grid. Specifically, various techniques and systems are provided for detecting a fault or failure by a node in a network of computer nodes in a communications grid, adjusting the grid to avoid grid failure, and taking action based on the failure. In an example, a system may include receiving grid status information at a backup control node, the grid status information including a project status, storing the grid status information within the backup control node, receiving a failure communication including an indication that a primary control node has failed, designating the backup control node as a new primary control node, receiving updated grid status information based on the indication that the primary control node has failed, and transmitting a set of instructions based on the updated grid status information.
Type: Grant
Filed: June 23, 2015
Date of Patent: November 15, 2016
Assignee: SAS Institute Inc.
Inventor: Richard Knight
-
Patent number: 9495414
Abstract: A computing device to compute clusters using random subsets of variables is provided. Each data point of a plurality of data points is associated with a variable to define a plurality of variables. A subset of the plurality of variables is randomly selected. The subset does not include all of the plurality of variables. A number of clusters into which to segment the received data is determined. Cluster data that defines each cluster of the determined number of clusters is determined by executing a clustering algorithm with the received data using only the plurality of data points defined for each observation that are associated with the randomly selected subset of the plurality of variables. The determined cluster data is stored to cluster second data into the determined number of clusters. The second data is different from the received data.
Type: Grant
Filed: October 28, 2015
Date of Patent: November 15, 2016
Assignee: SAS Institute Inc.
Inventors: Patrick Hall, Ilknur Kaynar Kabul, Jared Langford Dean, Ralph Abbey, Susan Haller, Jorge Silva
-
Patent number: 9495426
Abstract: Techniques for providing interactive decision trees are included. For example, a system is provided that stores data related to a decision tree, wherein the data includes one or more data structures and one or more portions of code. The system receives input corresponding to an interaction request associated with a modification to the decision tree. The system determines whether the modification requires multiple-processing iterations of the distributed data set. The system generates an application layer modified decision tree when the generating requires no multiple-processing iterations of the distributed data set. The system facilitates server layer modification of the decision tree when the modification requires multiple-processing iterations of the distributed data set. The system generates a representation of the application layer modified decision tree or the server layer modified decision tree.
Type: Grant
Filed: July 2, 2015
Date of Patent: November 15, 2016
Assignee: SAS Institute Inc.
Inventors: Xiangxiang Meng, Rajendra Singh, Xiangqian Hu, Duane Hamilton, Robert Wayne Thompson
-
Patent number: 9489621
Abstract: A computing device to select decorrelated variables using a graph based method is provided. A correlation value is computed between each pair of a plurality of variables to define a correlation matrix. A binary threshold value is compared to each correlation value to define a binary similarity matrix from the correlation matrix. An undirected graph comprising a subgraph that includes one or more connected nodes is defined based on the binary similarity matrix to store connectivity information for the plurality of variables. Each node of the subgraph is pairwise associated with a unique variable of the variables. (a) A least connected node is selected from the undirected graph based on the connectivity information. (b) The selected least connected node is removed from the undirected graph. (c) The connectivity information for the undirected graph is updated based on the removed node. (d) (a)-(c) are repeated until a stop criterion is satisfied.
Type: Grant
Filed: October 30, 2015
Date of Patent: November 8, 2016
Assignee: SAS Institute Inc.
Inventors: Patrick Hall, Ilknur Kaynar Kabul, Jared Langford Dean, Susan Haller, Jorge Silva
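Steps (a) through (d) of this abstract can be sketched roughly as follows. The sketch takes a precomputed correlation matrix as input and fills in details the abstract leaves open: it assumes that each selected node is kept as a chosen variable, that its remaining neighbors are discarded as correlated with it, and that the stop criterion is an empty graph. The name `select_decorrelated` and the default threshold are hypothetical.

```python
def select_decorrelated(corr, threshold=0.8):
    """Pick variable indices that are mutually below the correlation threshold.

    `corr` is a square correlation matrix given as a list of lists.
    """
    n = len(corr)
    # Binary similarity matrix, stored as adjacency sets of an undirected graph.
    adj = {
        i: {j for j in range(n) if j != i and abs(corr[i][j]) >= threshold}
        for i in range(n)
    }
    selected = []
    while adj:  # assumed stop criterion: no nodes left in the graph
        # (a) select the least connected node (ties broken by lowest index)
        node = min(adj, key=lambda i: (len(adj[i]), i))
        selected.append(node)
        # (b) remove it; dropping its neighbors too is an assumption
        dropped = adj[node] | {node}
        for d in dropped:
            adj.pop(d)
        # (c) update connectivity information for the remaining nodes
        for neighbors in adj.values():
            neighbors -= dropped
    return selected
```

For a three-variable matrix in which only variables 0 and 1 are strongly correlated, this returns variable 2 first (it has no connections) and then one of the correlated pair.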
-
Patent number: 9483477
Abstract: In a system automatically processing data from a first computing device for use on a second computing device, a registry file including a plurality of filename parameters is read. Each filename parameter identifies a matching filename pattern, an extract script indicator, and a read file indicator. The extract script indicator indicates an extract script for a file having a filename that matches the matching filename pattern. The read file indicator indicates how to read the file having the filename that matches the matching filename pattern. One parameter of the plurality of filename parameters is selected by matching a filename of a source file to the matching filename pattern of the one parameter. The associated extract script is selected and used to read data from the source file using the associated read file indicator and the read data is output to a different file and in a different format.
Type: Grant
Filed: September 29, 2015
Date of Patent: November 1, 2016
Assignee: SAS Institute Inc.
Inventors: Leslie Madonna Francis, Brian Oneal Miles, Shrividya Sastry, David Lee Kuhn
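The registry lookup described in this abstract might look something like the sketch below. The registry layout, the field names `pattern`, `extract_script` and `read_as`, the sample entries, and the use of shell-style wildcards are illustrative assumptions; the patent's actual registry file format is not given in the abstract.

```python
from fnmatch import fnmatchcase

# Hypothetical registry: each entry pairs a filename pattern with an
# extract script indicator and a read file indicator.
REGISTRY = [
    {"pattern": "sales_*.csv", "extract_script": "extract_sales", "read_as": "delimited"},
    {"pattern": "*.log", "extract_script": "extract_log", "read_as": "fixed_width"},
]


def select_parameter(filename, registry=REGISTRY):
    """Return the first registry entry whose pattern matches the filename."""
    for entry in registry:
        if fnmatchcase(filename, entry["pattern"]):
            return entry
    return None  # no matching pattern: the file is not handled
```

The selected entry would then drive which extract script runs and how the source file is read before the data is written out in a different format.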
-
Publication number: 20160314226
Abstract: Techniques for estimated compound probability distribution are described. An apparatus comprising a configuration component, perturbation component, sample generation controller, an aggregation component, a distribution fitting component, and statistics generation component. The configuration component operative to receive a compound model specification and candidate distribution definition. The perturbation component operative to generate a plurality of models from the compound model specification. The sample generation controller operative to initiate the generation of a plurality of compound model samples from each of the plurality of models. The distribution fitting component to generate parameter values for the candidate distribution definition based on the compound model samples. The statistics generation component to generate approximated aggregate statistics.
Type: Application
Filed: June 29, 2016
Publication date: October 27, 2016
Applicant: SAS Institute Inc.
Inventors: Mahesh V. Joshi, Richard Potter, Jan Chvosta, Mark Roland Little
-
Patent number: 9471869
Abstract: A computing device to compute composite clusters is provided. A first and a second plurality of centroid locations are computed by executing a clustering algorithm with a first portion of data and a first input parameter and a second portion of the data and a second input parameter, respectively. The first portion is different from the second portion or the first input parameter is different from the second input parameter. A plurality of composite centroid locations is computed using the computed first and second plurality of centroid locations to define a composite set of clusters. An observation is selected. A cluster of the composite set of clusters to which to assign the observation is determined using the plurality of composite centroid locations. The selecting and the determining is repeated with each observation of the plurality of observations as the observation to define cluster assignments for the plurality of observations.
Type: Grant
Filed: October 28, 2015
Date of Patent: October 18, 2016
Assignee: SAS Institute Inc.
Inventors: Patrick Hall, Ilknur Kaynar Kabul, Jared Langford Dean, Ralph Abbey, Susan Haller, Jorge Silva
-
Publication number: 20160292324
Abstract: Systems and methods are provided for predicting new product performance, such as by way of an interface that allows for structured judgment analysis. The disclosed systems and methods allow for the optional intervention of an expert, for assessing which other products are most similar to the new product, for excluding certain data from a performance prediction analysis, and thus may allow use of the most similar product and useful data as the basis for forming a product prediction for the new product.
Type: Application
Filed: February 26, 2016
Publication date: October 6, 2016
Applicant: SAS Institute Inc.
Inventors: Michael J. Leonard, Thomas H. Dickey, Samuel Lawrence Guseman, Michele Angelo Trovero
-
Patent number: 9460071
Abstract: In a computing device that defines a rule for natural language processing of text, annotated text is selected from a first document of a plurality of annotated documents. An entity rule type is selected from a plurality of entity rule types. An argument of the selected entity rule type is identified. A value for the identified argument is randomly selected based on the selected annotated text to generate a rule instance. The generated rule instance is applied to remaining documents of the plurality of annotated documents. A rule performance measure is computed based on application of the generated rule instance. The generated rule instance and the computed rule performance measure are stored for application to other documents.
Type: Grant
Filed: April 21, 2015
Date of Patent: October 4, 2016
Assignee: SAS Institute Inc.
Inventors: Viswanath Avasarala, David Styles, James Tetterton, Richard Crowell, Saratendu Sethi
-
Publication number: 20160283621
Abstract: Possible outcomes can be determined by combining simulation methods on a pool of input variables. Certain members of the pool are identified as members of a first set of variables (e.g., priority set), and certain other members of the pool of input variables are identified as members of a second set of variables (e.g., non-priority set). A first set of possible values for the first set of variables can be generated by applying a first simulation method. A second set of possible values for the second set of variables can be generated by applying a second simulation method that differs from the first simulation method in various ways, such as accuracy, completion time, and computational expense. A copula data structure can be used to maintain correlations between the variables of the pool of input variables when generating a hybrid set of simulated values based on the first and second simulation.
Type: Application
Filed: February 26, 2016
Publication date: September 29, 2016
Applicant: SAS Institute Inc.
Inventors: Zhiping Yang, Donald James Erdman, Stacey Michelle Christian, Wei Chen
-
Publication number: 20160283715
Abstract: Systems and methods are provided for identifying and detecting unauthorized user activity and for decreasing the rate of false-positives. The disclosed systems and techniques may involve analysis of users' past activity data so that individual classifications and authorization decisions with respect to requested user activity are based on activity data associated with a user's use of multiple services.
Type: Application
Filed: February 12, 2016
Publication date: September 29, 2016
Applicant: SAS Institute Inc.
Inventors: Brian Lee Duke, Paul C. Dulany, Kannan Shashank Shah
-
Publication number: 20160275399
Abstract: Systems and methods are included for adjusting a set of predicted future data points for a time series data set including a receiver for receiving a time series data set. One or more processors and one or more non-transitory computer readable storage mediums containing instructions may be utilized. A count series forecasting engine, utilizing the one or more processors, generates a set of counts corresponding to discrete values of the time series data set. An optimal discrete probability distribution for the set of counts is selected. A set of parameters are generated for the optimal discrete probability distribution. A statistical model is selected to generate a set of predicted future data points. The set of predicted future data points are adjusted using the generated set of parameters for the optimal discrete probability distribution in order to provide greater accuracy with respect to predictions of future data points.
Type: Application
Filed: May 27, 2016
Publication date: September 22, 2016
Applicant: SAS Institute Inc.
Inventors: Michael James Leonard, David Bruce Elsheimer
-
Patent number: 9449408
Abstract: A method of visualizing high-cardinality data is provided. A graph is presented on a display. The graph includes a first axis, a second axis, and a plurality of value markers. The first axis includes a minimum value and a maximum value and the second axis includes a plurality of category values. A selection indicator identifying selection of a first value marker of the plurality of value markers is received. The first value marker indicates a value for a category value of the plurality of category values. A second plurality of category values is determined based on the category value. The graph and a second graph are presented on the display. The second graph includes a third axis, a fourth axis, and a second plurality of value markers. The third axis includes a second minimum value and a second maximum value.
Type: Grant
Filed: March 12, 2014
Date of Patent: September 20, 2016
Assignee: SAS Institute Inc.
Inventors: Jordan Riley Benson, David J. Caira, Douglas R. Dotson, Lisa Hope Everdyke, Nascif A. Abousalh-Neto