Patents Assigned to SAS Institute

Techniques for query homogenization in cache operations

Patent number: 9519679

Abstract: An apparatus includes a renaming component to homogenize query instructions for retrieving data items from a data set organized using index labels by identifying a declaration instruction associating an object thereof with an index label, replacing the name provided to the object with an archetypal name based on the index label, and generating change data associating the name with the archetypal name; a hashing component to take an instruction hash of the homogenized instructions; a cache control routine to find a matching instruction hash corresponding to results of earlier database queries in a results cache; and a reversal routine to, in response to finding a matching instruction hash, retrieve a cached result from the results cache associated with the matching instruction hash, and replace a name of a different object therein based on the change data and the query instructions to generate a new result of the new database query.

Type: Grant

Filed: September 21, 2015

Date of Patent: December 13, 2016

Assignee: SAS Institute Inc.

Inventors: Kenneth Tolman, Kimberly Buckler Botha, Paul Anthony Smiley, David R. Henderson, Andrew Anderson
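
As a rough illustration of the caching idea in the abstract above, the following Python sketch renames aliases to archetypal names, hashes the homogenized query, and maps names back on a cache hit. It is a minimal sketch under stated assumptions, not the patented implementation: the SQL-like `<label> AS <name>` syntax, the `_label_` archetype format, and the function names `homogenize` and `run_cached` are all hypothetical.

```python
import hashlib
import re


def homogenize(query):
    """Replace each alias declared as `<label> AS <name>` with an archetypal name.

    Returns the homogenized query plus change data mapping archetype -> original name.
    The declaration syntax and archetype format are assumptions for illustration.
    """
    change = {}
    for label, name in re.findall(r"\b(\w+)\s+AS\s+(\w+)\b", query):
        archetype = f"_{label}_"
        change[archetype] = name
        # Rename every standalone occurrence of the alias.
        query = re.sub(rf"\b{re.escape(name)}\b", archetype, query)
    return query, change


def run_cached(query, cache, execute):
    """Serve a query from the results cache when its homogenized hash matches."""
    homogenized, change = homogenize(query)
    key = hashlib.sha256(homogenized.encode()).hexdigest()
    if key not in cache:
        # Cache miss: run the homogenized query and keep its result.
        cache[key] = execute(homogenized)
    restored = []
    for item in cache[key]:
        # Reversal step: put this query's own names back into the cached result.
        for archetype, name in change.items():
            item = item.replace(archetype, name)
        restored.append(item)
    return restored
```

Under these assumptions, two queries that differ only in the alias they choose (for example `orders AS o` and `orders AS t`) produce the same hash, so the second is answered from the cache and only the names in the result differ.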
SYSTEMS AND METHODS FOR QUANTILE DETERMINATION IN A DISTRIBUTED DATA SYSTEM USING SAMPLING

Publication number: 20160350396

Abstract: In accordance with the teachings described herein, systems and methods are provided for estimating or determining quantiles for data stored in a distributed system. In one embodiment, an instruction is received to estimate or determine a specified quantile for a variate in a set of data stored at a plurality of nodes in the distributed system. A plurality of data bins for the variate are defined that are each associated with a different range of data values in the set of data. Lower and upper quantile bounds for each of the plurality of data bins are determined based on the total number of data values that fall within each of the plurality of data bins. The specified quantile is estimated or determined based on an identified one of the plurality of data bins that includes the specified quantile based on the lower and upper quantile bounds.

Type: Application

Filed: July 15, 2016

Publication date: December 1, 2016

Applicant: SAS Institute Inc.

Inventors: Guy Blanc, Georges H. Guirguis, Xiangqian Hu, Guixian Lin, Scott Pope
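
The binning approach in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the fixed bin edges, and the assumption that every value lies between the first and last edge are hypothetical, and the published application may refine the identified bin further.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Count how many values on one node fall into each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Values equal to the last edge are placed in the final bin.
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def locate_quantile(nodes, edges, q):
    """Return the bin range and its (lower, upper) quantile bounds containing q.

    `nodes` is a list of per-node value lists; only the per-bin counts are combined.
    """
    per_node = [bin_counts(values, edges) for values in nodes]
    totals = [sum(column) for column in zip(*per_node)]
    n = sum(totals)
    cumulative = 0
    for i, count in enumerate(totals):
        lower = cumulative / n
        upper = (cumulative + count) / n
        if count and lower < q <= upper:
            return (edges[i], edges[i + 1]), (lower, upper)
        cumulative += count
    raise ValueError("q must lie in (0, 1]")
```

Only the bin counts travel between nodes, which is why the bounds can be computed without gathering the raw data in one place.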
NORMALIZING ELECTRONIC COMMUNICATIONS USING A NEURAL NETWORK

Publication number: 20160350646

Abstract: Electronic communications can be normalized using a neural network. For example, a noncanonical communication that includes multiple terms can be received. The noncanonical communication can be preprocessed by (I) generating a vector including multiple characters from a term of the multiple terms; and (II) repeating a substring of the term in the vector such that a last character of the substring is positioned in a last position in the vector. The vector can be transmitted to a neural network configured to receive the vector and generate multiple probabilities based on the vector. A normalized version of the noncanonical communication can be determined using one or more of the multiple probabilities generated by the neural network. Whether the normalized version of the noncanonical communication should be outputted can also be determined using at least one of the multiple probabilities generated by the neural network.

Type: Application

Filed: June 7, 2016

Publication date: December 1, 2016

Applicant: SAS Institute Inc.

Inventors: Samuel Paul Leeman-Munk, James Allen Cox
Systems and methods for quantile determination in a distributed data system

Patent number: 9507833

Abstract: In accordance with the teachings described herein, systems and methods are provided for estimating quantiles for data stored in a distributed system. In one embodiment, an instruction is received to estimate a specified quantile for a variate in a set of data stored at a plurality of nodes in the distributed system. A plurality of data bins for the variate are defined that are each associated with a different range of data values in the set of data. Lower and upper quantile bounds for each of the plurality of data bins are determined based on the total number of data values that fall within each of the plurality of data bins. The specified quantile is estimated based on an identified one of the plurality of data bins that includes the specified quantile based on the lower and upper quantile bounds.

Type: Grant

Filed: April 29, 2016

Date of Patent: November 29, 2016

Assignee: SAS Institute Inc.

Inventors: Georges H. Guirguis, Scott Pope, Oliver Schabenberger
DISTRIBUTED CORRELATION AND ANALYSIS OF PATIENT THERAPY DATA

Publication number: 20160342742

Abstract: An apparatus includes a processor and storage to store instructions that cause the processor to identify at least one correlation between a diagnosis group and a medication class for each patient of a first set of patients to derive a set of models for each diagnosis group that correlates the diagnosis group to at least one medication class based on the at least one identified correlation; and for each patient of a second set of patients, employ each model of each set of models to make at least one prediction of at least one diagnosis group as indicated in the corresponding diagnosis group record based on at least one medication class indicated in the corresponding medication class record, and compare the at least one prediction to the corresponding diagnosis group record to derive a tally of at least one of true positives or false positives for each prediction.

Type: Application

Filed: May 3, 2016

Publication date: November 24, 2016

Applicant: SAS Institute Inc.

Inventors: Emily Chapman-McQuiston, Diane Emerton, Ruth Baldasaro, Daniel Kelly
Systems and methods for providing a unified variable selection approach based on variance preservation

Patent number: 9501522

Abstract: This disclosure describes a method, system and computer-program product for parallelized feature selection. The method, system and computer-program product may be used to access a first set of features, wherein the first set of features includes multiple features, wherein the features are characterized by a variance measure, and wherein accessing the first set of features includes using a computing system to access the features, determine components of a covariance matrix, the components of the covariance matrix indicating a covariance with respect to pairs of features in the first set, and select multiple features from the first set, wherein selecting is based on the determined components of the covariance matrix and an amount of the variance measure attributable to the selected multiple features, and wherein selecting the multiple features includes executing a greedy search performed using parallelized computation.

Type: Grant

Filed: August 19, 2013

Date of Patent: November 22, 2016

Assignee: SAS Institute Inc.

Inventors: Zheng Zhao, James Cox, David Duling, Warren Sarle
Acceleration of sparse support vector machine training through safe feature screening

Patent number: 9495647

Abstract: A system for machine training can comprise one or more data processors and a non-transitory computer-readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform operations including: accessing a dataset comprising data tracking a plurality of features; determining a series of values for a regularization parameter of a sparse support vector machine model, the series including an initial regularization value and a next regularization value; computing an initial solution to the sparse support vector machine model for the initial regularization value; identifying, using the initial solution, inactive features of the sparse support vector machine model for the next regularization value; and computing a next solution to the sparse support vector machine model for the next regularization value, wherein computing the next solution includes excluding the inactive features.

Type: Grant

Filed: August 24, 2015

Date of Patent: November 15, 2016

Assignee: SAS Institute Inc.

Inventors: Zheng Zhao, Jun Liu, James Allen Cox
Fault tolerant communications

Patent number: 9495260

Abstract: Apparatuses, systems and methods are disclosed for tolerating fault in a communications grid. Specifically, various techniques and systems are provided for detecting a fault or failure by a node in a network of computer nodes in a communications grid, adjusting the grid to avoid grid failure, and taking action based on the failure. In an example, a system may include receiving grid status information at a backup control node, the grid status information including a project status, storing the grid status information within the backup control node, receiving a failure communication including an indication that a primary control node has failed, designating the backup control node as a new primary control node, receiving updated grid status information based on the indication that the primary control node has failed, and transmitting a set of instructions based on the updated grid status information.

Type: Grant

Filed: June 23, 2015

Date of Patent: November 15, 2016

Assignee: SAS Institute Inc.

Inventor: Richard Knight
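
The failover sequence in the abstract above might look roughly like the following Python sketch. The `ControlNode` class, its method names, and the instruction strings are hypothetical and chosen only to mirror the listed steps (store status, receive failure notice, promote the backup, apply updated status, emit instructions).

```python
class ControlNode:
    """Toy control node that can be promoted from backup to primary."""

    def __init__(self, name, primary=False):
        self.name = name
        self.primary = primary
        self.status = {}  # project name -> last known project status

    def receive_status(self, status):
        # The backup stores grid status information sent while the primary is healthy.
        self.status = dict(status)

    def on_primary_failure(self, updated_status):
        # Designate this backup as the new primary control node.
        self.primary = True
        # Merge the updated grid status received after the failure.
        self.status.update(updated_status)
        # Produce instructions based on the updated status.
        return [
            f"resume {project} from {state}"
            for project, state in sorted(self.status.items())
        ]
```

For example, a backup that last stored `{"project-a": "step 3"}` and then receives `{"project-a": "step 4"}` after the failure notice would become primary and issue one instruction to resume `project-a` from step 4.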
Cluster computation using random subsets of variables

Patent number: 9495414

Abstract: A computing device to compute clusters using random subsets of variables is provided. Each data point of a plurality of data points is associated with a variable to define a plurality of variables. A subset of the plurality of variables is randomly selected. The subset does not include all of the plurality of variables. A number of clusters into which to segment the received data is determined. Cluster data that defines each cluster of the determined number of clusters is determined by executing a clustering algorithm with the received data using only the plurality of data points defined for each observation that are associated with the randomly selected subset of the plurality of variables. The determined cluster data is stored to cluster second data into the determined number of clusters. The second data is different from the received data.

Type: Grant

Filed: October 28, 2015

Date of Patent: November 15, 2016

Assignee: SAS Institute Inc.

Inventors: Patrick Hall, Ilknur Kaynar Kabul, Jared Langford Dean, Ralph Abbey, Susan Haller, Jorge Silva
Techniques for interactive decision trees

Patent number: 9495426

Abstract: Techniques for providing interactive decision trees are included. For example, a system is provided that stores data related to a decision tree, wherein the data includes one or more data structures and one or more portions of code. The system receives input corresponding to an interaction request associated with a modification to the decision tree. The system determines whether the modification requires multiple-processing iterations of the distributed data set. The system generates an application layer modified decision tree when the generating requires no multiple-processing iterations of the distributed data set. The system facilitates server layer modification of the decision tree when the modification requires multiple-processing iterations of the distributed data set. The system generates a representation of the application layer modified decision tree or the server layer modified decision tree.

Type: Grant

Filed: July 2, 2015

Date of Patent: November 15, 2016

Assignee: SAS Institute Inc.

Inventors: Xiangxiang Meng, Rajendra Singh, Xiangqian Hu, Duane Hamilton, Robert Wayne Thompson
Graph based selection of decorrelated variables

Patent number: 9489621

Abstract: A computing device to select decorrelated variables using a graph based method is provided. A correlation value is computed between each pair of a plurality of variables to define a correlation matrix. A binary threshold value is compared to each correlation value to define a binary similarity matrix from the correlation matrix. An undirected graph comprising a subgraph that includes one or more connected nodes is defined based on the binary similarity matrix to store connectivity information for the plurality of variables. Each node of the subgraph is pairwise associated with a unique variable of the variables. (a) A least connected node is selected from the undirected graph based on the connectivity information. (b) The selected least connected node is removed from the undirected graph. (c) The connectivity information for the undirected graph is updated based on the removed node. (d) (a)-(c) are repeated until a stop criterion is satisfied.

Type: Grant

Filed: October 30, 2015

Date of Patent: November 8, 2016

Assignee: SAS Institute Inc.

Inventors: Patrick Hall, Ilknur Kaynar Kabul, Jared Langford Dean, Susan Haller, Jorge Silva
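
Steps (a) through (d) in the abstract above can be sketched as a short Python loop. This is a sketch under assumptions, not the patented method: the correlation matrix is taken as given, the stop criterion is assumed to be an empty graph, and dropping the neighbours of each selected node is an added assumption that the abstract does not state.

```python
def select_decorrelated(corr, threshold):
    """Pick variable indices by repeatedly taking the least connected graph node.

    `corr` is a square correlation matrix (list of lists). Two variables are
    connected when the absolute correlation meets the binary threshold.
    """
    n = len(corr)
    # Binary similarity matrix stored as adjacency sets of an undirected graph.
    adjacency = {
        i: {j for j in range(n) if j != i and abs(corr[i][j]) >= threshold}
        for i in range(n)
    }
    selected = []
    while adjacency:  # assumed stop criterion: no nodes remain
        # (a) select the least connected node (ties broken by lowest index)
        node = min(adjacency, key=lambda i: (len(adjacency[i]), i))
        selected.append(node)
        # (b) remove it; assumption: its correlated neighbours are dropped too
        removed = {node} | adjacency[node]
        for r in removed:
            adjacency.pop(r)
        # (c) update connectivity information for the remaining nodes
        for neighbours in adjacency.values():
            neighbours -= removed
    return selected
```

With four variables where 0 and 1 are strongly correlated, 1 and 2 are strongly correlated, and 3 is unrelated to the rest, this sketch keeps variables 3, 0, and 2 and discards 1.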
Automated data intake system

Patent number: 9483477

Abstract: In a system automatically processing data from a first computing device for use on a second computing device, a registry file including a plurality of filename parameters is read. Each filename parameter identifies a matching filename pattern, an extract script indicator, and a read file indicator. The extract script indicator indicates an extract script for a file having a filename that matches the matching filename pattern. The read file indicator indicates how to read the file having the filename that matches the matching filename pattern. One parameter of the plurality of filename parameters is selected by matching a filename of a source file to the matching filename pattern of the one parameter. The associated extract script is selected and used to read data from the source file using the associated read file indicator and the read data is output to a different file and in a different format.

Type: Grant

Filed: September 29, 2015

Date of Patent: November 1, 2016

Assignee: SAS Institute Inc.

Inventors: Leslie Madonna Francis, Brian Oneal Miles, Shrividya Sastry, David Lee Kuhn
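
The registry lookup described in the abstract above can be illustrated with a small Python sketch. The registry layout, the key names `pattern`, `extract_script`, and `read_as`, and the sample entries are hypothetical; the sketch covers only the step that matches a source filename to one registry parameter.

```python
from fnmatch import fnmatchcase

# Hypothetical registry: each filename parameter names a matching pattern,
# an extract script indicator, and a read file indicator.
REGISTRY = [
    {"pattern": "sales_*.csv", "extract_script": "extract_sales", "read_as": "delimited"},
    {"pattern": "*.json", "extract_script": "extract_generic", "read_as": "json"},
]


def select_parameter(filename, registry=REGISTRY):
    """Return the first registry entry whose pattern matches the source filename."""
    for entry in registry:
        if fnmatchcase(filename, entry["pattern"]):
            return entry
    return None  # no registered pattern matches this file
```

A caller would then run the script named by `extract_script`, reading the file as indicated by `read_as`, and write the output in the target format.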
TECHNIQUES FOR ESTIMATING COMPOUND PROBABILITY DISTRIBUTION BY SIMULATING LARGE EMPIRICAL SAMPLES WITH SCALABLE PARALLEL AND DISTRIBUTED PROCESSING

Publication number: 20160314226

Abstract: Techniques for estimating a compound probability distribution are described. An apparatus comprises a configuration component, a perturbation component, a sample generation controller, an aggregation component, a distribution fitting component, and a statistics generation component. The configuration component is operative to receive a compound model specification and a candidate distribution definition. The perturbation component is operative to generate a plurality of models from the compound model specification. The sample generation controller is operative to initiate the generation of a plurality of compound model samples from each of the plurality of models. The distribution fitting component generates parameter values for the candidate distribution definition based on the compound model samples. The statistics generation component generates approximated aggregate statistics.

Type: Application

Filed: June 29, 2016

Publication date: October 27, 2016

Applicant: SAS Institute Inc.

Inventors: Mahesh V. Joshi, Richard Potter, Jan Chvosta, Mark Roland Little
Determination of composite clusters

Patent number: 9471869

Abstract: A computing device to compute composite clusters is provided. A first and a second plurality of centroid locations are computed by executing a clustering algorithm with a first portion of data and a first input parameter and a second portion of the data and a second input parameter, respectively. The first portion is different from the second portion or the first input parameter is different from the second input parameter. A plurality of composite centroid locations is computed using the computed first and second plurality of centroid locations to define a composite set of clusters. An observation is selected. A cluster of the composite set of clusters to which to assign the observation is determined using the plurality of composite centroid locations. The selecting and the determining are repeated with each observation of the plurality of observations as the observation to define cluster assignments for the plurality of observations.

Type: Grant

Filed: October 28, 2015

Date of Patent: October 18, 2016

Assignee: SAS Institute Inc.

Inventors: Patrick Hall, Ilknur Kaynar Kabul, Jared Langford Dean, Ralph Abbey, Susan Haller, Jorge Silva
SYSTEMS AND METHODS FOR PREDICTING PERFORMANCE

Publication number: 20160292324

Abstract: Systems and methods are provided for predicting new product performance, such as by way of an interface that allows for structured judgment analysis. The disclosed systems and methods allow for the optional intervention of an expert, for assessing which other products are most similar to the new product, and for excluding certain data from a performance prediction analysis, and thus may allow use of the most similar product and useful data as the basis for forming a product prediction for the new product.

Type: Application

Filed: February 26, 2016

Publication date: October 6, 2016

Applicant: SAS Institute Inc.

Inventors: Michael J. Leonard, Thomas H. Dickey, Samuel Lawrence Guseman, Michele Angelo Trovero
Rule development for natural language processing of text

Patent number: 9460071

Abstract: In a computing device that defines a rule for natural language processing of text, annotated text is selected from a first document of a plurality of annotated documents. An entity rule type is selected from a plurality of entity rule types. An argument of the selected entity rule type is identified. A value for the identified argument is randomly selected based on the selected annotated text to generate a rule instance. The generated rule instance is applied to remaining documents of the plurality of annotated documents. A rule performance measure is computed based on application of the generated rule instance. The generated rule instance and the computed rule performance measure are stored for application to other documents.

Type: Grant

Filed: April 21, 2015

Date of Patent: October 4, 2016

Assignee: SAS Institute Inc.

Inventors: Viswanath Avasarala, David Styles, James Tetterton, Richard Crowell, Saratendu Sethi
Hybrid Simulation Methodologies

Publication number: 20160283621

Abstract: Possible outcomes can be determined by combining simulation methods on a pool of input variables. Certain members of the pool are identified as members of a first set of variables (e.g., priority set), and certain other members of the pool of input variables are identified as members of a second set of variables (e.g., non-priority set). A first set of possible values for the first set of variables can be generated by applying a first simulation method. A second set of possible values for the second set of variables can be generated by applying a second simulation method that differs from the first simulation method in various ways, such as accuracy, completion time, and computational expense. A copula data structure can be used to maintain correlations between the variables of the pool of input variables when generating a hybrid set of simulated values based on the first and second simulation.

Type: Application

Filed: February 26, 2016

Publication date: September 29, 2016

Applicant: SAS Institute Inc.

Inventors: Zhiping Yang, Donald James Erdman, Stacey Michelle Christian, Wei Chen
UNAUTHORIZED ACTIVITY DETECTION AND CLASSIFICATION

Publication number: 20160283715

Abstract: Systems and methods are provided for identifying and detecting unauthorized user activity and for decreasing the rate of false-positives. The disclosed systems and techniques may involve analysis of users' past activity data so that individual classifications and authorization decisions with respect to requested user activity are based on activity data associated with a user's use of multiple services.

Type: Application

Filed: February 12, 2016

Publication date: September 29, 2016

Applicant: SAS Institute Inc.

Inventors: Brian Lee Duke, Paul C. Dulany, Kannan Shashank Shah
SYSTEMS AND METHODS FOR TIME SERIES ANALYSIS TECHNIQUES UTILIZING COUNT DATA SETS

Publication number: 20160275399

Abstract: Systems and methods are included for adjusting a set of predicted future data points for a time series data set including a receiver for receiving a time series data set. One or more processors and one or more non-transitory computer readable storage mediums containing instructions may be utilized. A count series forecasting engine, utilizing the one or more processors, generates a set of counts corresponding to discrete values of the time series data set. An optimal discrete probability distribution for the set of counts is selected. A set of parameters are generated for the optimal discrete probability distribution. A statistical model is selected to generate a set of predicted future data points. The set of predicted future data points are adjusted using the generated set of parameters for the optimal discrete probability distribution in order to provide greater accuracy with respect to predictions of future data points.

Type: Application

Filed: May 27, 2016

Publication date: September 22, 2016

Applicant: SAS Institute Inc.

Inventors: Michael James Leonard, David Bruce Elsheimer
Visualizing high-cardinality data

Patent number: 9449408

Abstract: A method of visualizing high-cardinality data is provided. A graph is presented on a display. The graph includes a first axis, a second axis, and a plurality of value markers. The first axis includes a minimum value and a maximum value and the second axis includes a plurality of category values. A selection indicator identifying selection of a first value marker of the plurality of value markers is received. The first value marker indicates a value for a category value of the plurality of category values. A second plurality of category values is determined based on the category value. The graph and a second graph are presented on the display. The second graph includes a third axis, a fourth axis, and a second plurality of value markers. The third axis includes a second minimum value and a second maximum value.

Type: Grant

Filed: March 12, 2014

Date of Patent: September 20, 2016

Assignee: SAS Institute Inc.

Inventors: Jordan Riley Benson, David J. Caira, Douglas R. Dotson, Lisa Hope Everdyke, Nascif A. Abousalh-Neto
