Patents Assigned to SAS Institute
-
Publication number: 20150029213
Abstract: A method of visualizing high-cardinality data is provided. A graph is presented on a display. The graph includes a first axis, a second axis, and a plurality of value markers. The first axis includes a minimum value and a maximum value and the second axis includes a plurality of category values. A selection indicator identifying selection of a first value marker of the plurality of value markers is received. The first value marker indicates a value for a category value of the plurality of category values. A second plurality of category values is determined based on the category value. The graph and a second graph are presented on the display. The second graph includes a third axis, a fourth axis, and a second plurality of value markers. The third axis includes a second minimum value and a second maximum value.
Type: Application
Filed: March 12, 2014
Publication date: January 29, 2015
Applicant: SAS Institute Inc.
Inventors: Jordan Riley Benson, David J. Caira, Douglas R. Dotson, Lisa Hope Everdyke, Nascif A. Abousalh-Neto
-
Publication number: 20150019554
Abstract: A method of determining a number of clusters for a dataset is provided. Centroid locations for a defined number of clusters are determined using a clustering algorithm. Boundaries for each of the defined clusters are defined. A reference distribution that includes a plurality of data points is created. The plurality of data points are within the defined boundary of at least one cluster of the defined clusters. Second centroid locations for the defined number of clusters are determined using the clustering algorithm and the reference distribution. A gap statistic for the defined number of clusters based on a comparison between a first residual sum of squares and a second residual sum of squares is computed. The processing is repeated for a next number of clusters to create. An estimated best number of clusters for the received data is determined by comparing the gap statistic computed for each iteration of the number of clusters.
Type: Application
Filed: March 4, 2014
Publication date: January 15, 2015
Applicant: SAS Institute Inc.
Inventors: Patrick Hall, Ilknur Kaynar Kabul, Warren Sarle, Jorge Silva
-
Patent number: 8918410
Abstract: Systems and methods are provided for identifying data variable roles during initial data exploration. In one example, a computer-implemented method of determining a role for a data variable is disclosed. The method comprises identifying to a plurality of data nodes a set of data records containing data values assigned to each data node, a maximum number of levels to record in a sorted data structure at the data nodes, and the data node responsible for each of a plurality of variables. The method further comprises receiving for each variable from the data node responsible for the variable a plurality of unique data values for the variable, a count for each of the unique data values and an overflow count for the variable, wherein the number of unique data values does not exceed the maximum number of levels. A role for a variable can be determined based upon the unique data values, counts and overflow count for the variable.
Type: Grant
Filed: February 21, 2013
Date of Patent: December 23, 2014
Assignee: SAS Institute Inc.
Inventors: Georges H. Guirguis, Scott Pope
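As an illustration of the bounded level counting described in the abstract of patent 8918410, the following is a minimal single-node sketch, not the patented method. The function names `count_levels` and `guess_role`, the role labels, and the decision rule are all hypothetical; the abstract only states that a role can be determined from the unique values, their counts, and the overflow count.

```python
def count_levels(values, max_levels):
    """Count unique values, tracking at most max_levels distinct values.

    Values that arrive after the table is full and are not already
    tracked are tallied in a single overflow count.
    """
    counts = {}
    overflow = 0
    for v in values:
        if v in counts:
            counts[v] += 1
        elif len(counts) < max_levels:
            counts[v] = 1
        else:
            overflow += 1
    # Return the tracked levels in sorted order, plus the overflow count.
    return dict(sorted(counts.items())), overflow


def guess_role(counts, overflow):
    """Hypothetical heuristic: map level information to a variable role."""
    if overflow > 0:
        return "high-cardinality"  # more levels than we were willing to track
    if len(counts) <= 2:
        return "binary"
    return "nominal"


counts, overflow = count_levels(["a", "b", "a", "c", "d"], max_levels=3)
print(counts, overflow, guess_role(counts, overflow))
```

Capping the number of tracked levels keeps memory bounded per variable, and a nonzero overflow count signals that the variable has more distinct values than the cap.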
-
Publication number: 20140372090
Abstract: A method of selecting a one-class support vector machine (SVM) model for incremental response modeling is provided. Exposure group data generated from first responses by an exposure group receiving a request to respond is received. Control group data generated from second responses by a control group not receiving the request to respond is received. A response is either positive or negative. A one-class SVM model is defined using the positive responses in the control group data and an upper bound parameter value. The defined one-class SVM model is executed with the identified positive responses from the exposure group data. An error value is determined based on execution of the defined one-class SVM model. A final one-class SVM model is selected by validating the defined one-class SVM model using the determined error value.
Type: Application
Filed: March 6, 2014
Publication date: December 18, 2014
Applicant: SAS Institute Inc.
Inventors: Taiyeong Lee, Ruiwen Zhang, Yongqiao Xiao, Jared Langford Dean
-
Publication number: 20140351196
Abstract: Systems and methods for determining an optimal splitting scheme for a node in a classification decision tree. A computing system may receive input data related to a decision tree to be generated from a data set. The input data identifies a target attribute of the data set and a set of candidate attributes of the data set to be used as nodes in the decision tree. The computing system may determine, using a clustering algorithm and the set of candidate attributes, a number of potential splitting schemes to be used to split a node in the decision tree. The computing system may calculate a splitting measurement for each of the plurality of potential splitting schemes. The computing system may select an optimal splitting scheme from the plurality of potential splitting schemes for each node in the decision tree based on the splitting measurement.
Type: Application
Filed: May 21, 2014
Publication date: November 27, 2014
Applicant: SAS Institute Inc.
Inventors: Xiangqian Hu, Xunlei Wu, Xiangxiang Meng, Oliver Schabenberger
-
Patent number: 8887128
Abstract: Systems and methods are provided for automated generation of a customized software product. A system includes a computer-readable medium encoded with a project parameters data structure, where the project parameters data structure includes a plurality of project requirement records, and a project prototype. One or more data processors are configured to process a plurality of initial characteristics for the customized software product, populate the project parameters data structure at least based on the initial characteristics, and generate the project prototype based on the project parameters data structure. The one or more data processors are further configured to output a requirements matrix data structure at least based on the project parameters data structure and the project prototype and to generate the customized software product at least based on the requirements matrix data structure and the project prototype.
Type: Grant
Filed: March 15, 2013
Date of Patent: November 11, 2014
Assignee: SAS Institute Inc.
Inventors: Brian Oneal Miles, Julius Alton King, Adheesha Sanjaya Arangala, Jin-Whan Jung
-
Publication number: 20140330536
Abstract: Techniques to simulate statistical tests are described. An apparatus may comprise a simulated data component to generate simulated data for a statistical test, where statistics of the statistical test are based on parameter vectors to follow a probability distribution, a statistic simulator component to generate statistics for the parameter vectors from the simulated data, each parameter vector represented with a single point in a grid of points, the statistic simulation component to distribute portions of the simulated data or simulated statistics across multiple nodes of a distributed computing system in accordance with a column-wise or column-wise-by-group distribution algorithm, and a code generator component to create a computational representation arranged to generate an approximate probability distribution for each point in the grid of points from the simulated statistics, the approximate probability distribution to comprise an empirical cumulative distribution function (CDF).
Type: Application
Filed: May 6, 2014
Publication date: November 6, 2014
Applicant: SAS Institute Inc.
Inventors: Xilong Chen, Mark Roland Little
-
Publication number: 20140330826
Abstract: Systems and methods for data reduction of a data set are included. A computing system may group data points in a data set into a number of data point bubbles represented by a number of representative points. A data point bubble may include one or more data points from the data set and a representative point from the data set. The computing system may calculate a cluster assignment for the representative point by executing a clustering algorithm using the number of representative points.
Type: Application
Filed: May 5, 2014
Publication date: November 6, 2014
Applicant: SAS Institute Inc.
Inventors: Xiangqian Hu, Xunlei Wu, Xiangxiang Meng, Oliver Schabenberger
-
Publication number: 20140330884
Abstract: This disclosure describes methods, systems, computer-readable media, and apparatuses for calculating a summary statistic. Calculating the summary statistic can be performed by identifying multiple subsets of a set of variable observations and assigning the subsets to grid-computing devices such that no two of the subsets are assigned to a same one of the grid-computing devices. A parallel processing operation that involves multiple processing phases at each of the grid-computing devices is then coordinated. The parallel processing operation includes each of the grid-computing devices inventorying the respectively assigned subset and generating inventory information representative of the respectively assigned subset. Subsequently, the inventory information generated by the grid-computing devices is received, and a summary statistic is determined by synthesizing the received inventory information.
Type: Application
Filed: May 5, 2014
Publication date: November 6, 2014
Applicant: SAS Institute Inc.
Inventor: Gang Meng
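The general pattern in the abstract of publication 20140330884 (each device summarizes its own subset, then a coordinator combines the summaries) can be sketched in a single process as follows. This is only an assumption-laden illustration: the abstract does not say what the inventory information contains or which statistic is computed, so the sketch uses count, sum, and sum of squares to obtain a mean and population variance. The names `inventory` and `synthesize` are hypothetical.

```python
def inventory(subset):
    """Per-device step: summarize one subset as (count, sum, sum of squares)."""
    n = len(subset)
    s = sum(subset)
    ss = sum(x * x for x in subset)
    return n, s, ss


def synthesize(inventories):
    """Coordinator step: combine per-device summaries into mean and variance."""
    n = sum(inv[0] for inv in inventories)
    s = sum(inv[1] for inv in inventories)
    ss = sum(inv[2] for inv in inventories)
    mean = s / n
    variance = ss / n - mean * mean  # population variance
    return mean, variance


# Each inner list stands in for the subset assigned to one grid device.
subsets = [[1, 2], [3, 4, 5]]
mean, variance = synthesize([inventory(s) for s in subsets])
print(mean, variance)
```

Only the small summaries travel to the coordinator, never the raw observations. For data with a large mean relative to its spread, a numerically safer merge (for example, combining per-subset means and centered sums of squares) would be preferable to the raw sum-of-squares formula used here.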
-
Publication number: 20140324738
Abstract: A computer-program causing a computing device to perform an association measurement between a target variable and each non-target variable of a data set; select non-target variables for inclusion in a visualization based on the degree of association; perform correspondence analysis between target values of the target variable and non-target values of each selected non-target variable; order target value markers within a target row based on the degrees of closeness; order non-target value markers within each non-target row based on the degrees of closeness; determine a width of each target value marker based on a frequency of occurrence of its target value in the data set; determine a width of each non-target value marker based on a frequency of occurrence of its non-target value in the data set; and cause generation of the visualization with connection markers emanating from the target value markers and extending among the non-target value markers.
Type: Application
Filed: April 24, 2014
Publication date: October 30, 2014
Applicant: SAS Institute Inc.
Inventors: Krishnan PR, Prasad Pawar
-
Publication number: 20140324762
Abstract: A method of determining a false and/or a true positive rate is provided. A true count value and a false count value are initialized for probability bins. For a plurality of records, a truth of event occurrence and a probability of occurrence are read; a probability bin that includes the probability of occurrence is determined; the true count value of the determined probability bin is incremented when the truth of event occurrence indicates true; and the false count value of the determined probability bin is incremented when the truth of event occurrence indicates false. A true positive rate and a false positive rate are computed for each probability bin based on the true count value, the false count value, a determined total number of true event occurrences, and a determined total number of false event occurrences.
Type: Application
Filed: March 13, 2014
Publication date: October 30, 2014
Applicant: SAS Institute Inc.
Inventor: Lawrence E. Lewis
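The binning procedure in the abstract of publication 20140324762 can be sketched as below. Several details are assumptions rather than anything stated in the abstract: equal-width bins over [0, 1], treating each bin's lower edge as a classification threshold, and accumulating counts from the highest bin downward. The function name `binned_rates` is hypothetical.

```python
def binned_rates(records, n_bins):
    """Compute (TPR, FPR) per probability bin from (truth, probability) records.

    Bin b covers [b / n_bins, (b + 1) / n_bins); a probability of exactly 1.0
    is placed in the last bin. The rates for bin b treat every record in
    bin b or higher as a predicted positive.
    """
    true_counts = [0] * n_bins
    false_counts = [0] * n_bins
    for truth, prob in records:
        b = min(int(prob * n_bins), n_bins - 1)
        if truth:
            true_counts[b] += 1
        else:
            false_counts[b] += 1

    total_true = sum(true_counts)
    total_false = sum(false_counts)

    rates = [None] * n_bins
    cum_true = cum_false = 0
    for b in range(n_bins - 1, -1, -1):  # accumulate from the top bin down
        cum_true += true_counts[b]
        cum_false += false_counts[b]
        tpr = cum_true / total_true if total_true else 0.0
        fpr = cum_false / total_false if total_false else 0.0
        rates[b] = (tpr, fpr)
    return rates


records = [(True, 0.9), (True, 0.6), (False, 0.7), (False, 0.1)]
print(binned_rates(records, n_bins=2))
```

Because only per-bin counters are kept, the records can be read in a single pass without sorting, and the counters from several passes or machines could be added together before the rates are computed.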
-
Publication number: 20140297997
Abstract: Various embodiments are generally directed to techniques for reducing syntax requirements in application code to cause concurrent execution of multiple iterations of at least a portion of a loop thereof to reduce overall execution time in solving a large scale problem. At least one non-transitory machine-readable storage medium includes instructions that when executed by a computing device, cause the computing device to parse an application code to identify a loop instruction indicative of an instruction block that includes instructions that define a loop of which multiple iterations are capable of concurrent execution, the instructions including at least one call instruction to an executable routine capable of concurrent execution; and insert at least one coordinating instruction into an instruction sub-block of the instruction block to cause sequential execution of instructions of the instruction sub-block across the multiple iterations based on identification of the loop instruction.
Type: Application
Filed: December 30, 2013
Publication date: October 2, 2014
Applicant: SAS Institute Inc.
Inventors: Jack Joseph Rouse, Leonardo Bezerra Lopes, Robert William Pratt
-
Publication number: 20140280220
Abstract: A method of determining a storage device on which to store received data is provided. Data is received. A score indicating a value associated with the received data is computed. A storage device is determined from a plurality of types of storage devices on which to store the received data based on the computed score. The received data is sent to the determined storage device.
Type: Application
Filed: March 12, 2014
Publication date: September 18, 2014
Applicant: SAS Institute Inc.
Inventors: Gary Spakes, Scott Meredith Chastain, Bryan Christopher Harris
-
Publication number: 20140280343
Abstract: A method of determining a similarity between records in a data set is provided. Data organized into a plurality of records is received. First characters associated with a field and a first record of the plurality of records are selected. The selected first characters are encoded and subdivided into a first sliding series of a defined number of characters. Second characters associated with the field and a second record of the plurality of records are selected. The selected second characters are encoded and subdivided into a second sliding series of the defined number of characters. Whether or not the first sliding series and the second sliding series are similar is determined by comparing the encoded and subdivided first characters to the encoded and subdivided second characters using a fuzzy matching algorithm.
Type: Application
Filed: September 3, 2013
Publication date: September 18, 2014
Applicant: SAS Institute Inc.
Inventors: James Edward Georges, David Lee Kuhn, Edward Lew Rowe, John Michael Kichak, Karcsi Fritz Lehr
-
Publication number: 20140280331
Abstract: A method of performing a query on a cube of data is provided. An access key associated with a user is created at a computing device. The access key defines the user's access to a cube of data distributed onto a plurality of computing devices with each computing device of the plurality of computing devices storing a different portion of the cube of data. A plurality of access masks is stored in association with the portion of the cube of data stored on the computing device. A process space associated with the user is created. A query on the cube of data is received by the computing device. The query is associated with the user. The query is processed while masking the created access key with the stored plurality of access masks, wherein the masking controls access to the stored portion of the cube of data. A result of the processed query is sent to a requesting computing device.
Type: Application
Filed: August 6, 2013
Publication date: September 18, 2014
Applicant: SAS Institute Inc.
Inventors: Stacey M. Christian, Donald James Erdman
-
Publication number: 20140280239
Abstract: A method of determining a similarity between records in a data set is provided. Data organized into a plurality of records is received. First characters associated with a field and a first record of the plurality of records are selected. The selected first characters are subdivided into a first sliding series of a defined number of characters. Second characters associated with the field and a second record of the plurality of records are selected. The selected second characters are subdivided into a second sliding series of the defined number of characters. A similarity score between the first sliding series and the second sliding series is calculated. Whether or not the first sliding series and the second sliding series are similar is determined based on the calculated similarity score.
Type: Application
Filed: August 8, 2013
Publication date: September 18, 2014
Applicant: SAS Institute Inc.
Inventors: James Edward Georges, David Lee Kuhn, Edward Lew Rowe, John Michael Kichak, Karcsi Fritz Lehr
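The sliding-series comparison in the abstract of publication 20140280239 resembles character n-gram matching, which can be sketched as follows. The abstract does not name a similarity measure or a decision rule, so the Jaccard index and the 0.3 threshold used here are assumptions chosen for illustration, and the function names are hypothetical.

```python
def sliding_series(text, n):
    """Split text into overlapping windows of n characters (a sliding series)."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def similarity_score(a, b, n=2):
    """Jaccard index between the sets of n-character windows of a and b."""
    set_a, set_b = set(sliding_series(a, n)), set(sliding_series(b, n))
    if not set_a and not set_b:
        return 1.0  # two empty fields are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def is_similar(a, b, n=2, threshold=0.3):
    """Decide similarity by comparing the score with an assumed threshold."""
    return similarity_score(a, b, n) >= threshold


print(sliding_series("smith", 2))
print(similarity_score("smith", "smyth"))
print(is_similar("smith", "smyth"))
```

Comparing overlapping windows rather than whole strings lets two field values score as close even when they differ by a single character, as with the two spellings in the usage example.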
-
Publication number: 20140280986
Abstract: A method of acknowledging receipt of an event block object is provided. First connection information for connecting to an event stream processing (ESP) engine executing at a first computing device is received. A first connection to the ESP engine is established using the received first connection information. Second connection information for connecting to a publishing client executing at a second computing device is received. A second connection to the publishing client is established using the received second connection information, wherein the first connection differs from the second connection. An event block object is received from the ESP engine using the established first connection, wherein the event block object includes a unique identifier for the event block object. Successful processing of the event block object is determined.
Type: Application
Filed: August 2, 2013
Publication date: September 18, 2014
Applicant: SAS Institute Inc.
Inventors: Gerald Donald Baulier, Scott J. Kolodzieski, Vincent L. Deters
-
Publication number: 20140280330
Abstract: A method of performing a query on a cube of data is provided. A cube of data is distributed onto a plurality of computing devices with each computing device of the plurality of computing devices storing a different portion of the cube of data. A perturbation rule configured for application to the cube of data and associated with a user is received. A process space associated with the user is created. The received perturbation rule is compiled in association with the created process space. A query on the portion of the cube of data stored at the computing device is received. The received query is associated with the created process space. The query is processed while applying the compiled perturbation rule to data extracted from the portion of the cube of data stored at the computing device. A result of the processed query is sent to a requesting computing device.
Type: Application
Filed: July 24, 2013
Publication date: September 18, 2014
Applicant: SAS Institute Inc.
Inventors: Stacey M. Christian, Donald James Erdman, Scott T. Gray
-
Publication number: 20140282246
Abstract: Various embodiments are generally directed to techniques for increasing the accuracy with which list items may be selected on a touch screen. A machine-readable storage medium includes instructions that when executed cause a computing device to present a list of multiple list items on a touch screen, each associated with a touch area and including a wide area marking a location of the touch area and a narrow area narrower than the wide area, the wide and narrow areas defining a presentation area wherein the wide areas of adjacent first and second list items are positioned at different first and second widthwise positions, respectively, and wherein the touch areas of the first and second list items coincide with the wide areas of the first and second list items, respectively. Other embodiments are described and claimed.
Type: Application
Filed: February 11, 2014
Publication date: September 18, 2014
Applicant: SAS Institute Inc.
Inventors: Qing Gong, Huifang Wang
-
Publication number: 20140279527
Abstract: Methods, systems, computer-readable media, and apparatuses for detecting unauthorized activity are disclosed. Detecting unauthorized activity is done by accessing first data that represents activity involving a first service provided to a customer and accessing second data that represents activity involving a second service provided to a customer. The activity involving the second service and the activity involving the first service both include authorized customer activity, and the activity associated with the second service further includes unauthorized activity. The first data is filtered using a filtering criteria and a portion of the first data is selected to be retained. The second data and the retained portion of the first data are analyzed, and the analysis includes classifying the activity associated with the second service in a way that distinguishes the unauthorized activity from the authorized activity associated with the second service.
Type: Application
Filed: October 24, 2013
Publication date: September 18, 2014
Applicant: SAS Institute Inc.
Inventors: Brian Lee Duke, Paul C. Dulany, Vijay Desai, Kannan Shashank Shah