SYSTEMS, METHODS, AND MEDIA FOR DETERMINING FRAUD RISK FROM AUDIO SIGNALS

Systems, methods, and media for determining fraud risk from audio signals and non-audio data are provided herein. Some exemplary methods include receiving an audio signal and an associated audio signal identifier, receiving a fraud event identifier associated with a fraud event, determining a speaker model based on the received audio signal, determining a channel model based on a path of the received audio signal, using a server system, updating a fraudster channel database to include the determined channel model based on a comparison of the audio signal identifier and the fraud event identifier, and updating a fraudster voice database to include the determined speaker model based on a comparison of the audio signal identifier and the fraud event identifier.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional of U.S. patent application Ser. No. 13/415,809 filed on Mar. 8, 2012, and entitled “SYSTEMS, METHODS, AND MEDIA FOR DETERMINING FRAUD RISK FROM AUDIO SIGNALS.”

U.S. patent application Ser. No. 13/415,809 is a continuation-in-part and claims benefit of and priority to U.S. patent application Ser. No. 13/290,011, filed on Nov. 4, 2011, entitled “SYSTEMS, METHODS, AND MEDIA FOR DETERMINING FRAUD PATTERNS AND CREATING FRAUD BEHAVIORAL MODELS,” which is a continuation-in-part of U.S. patent application Ser. No. 11/754,974, (now U.S. Pat. No. 8,073,691) filed on May 29, 2007, entitled “METHOD AND SYSTEM FOR SCREENING USING VOICE DATA AND METADATA,” which in turn claims the benefit of and priority to U.S. Provisional Applications 60/923,195, filed on Apr. 13, 2007, entitled “Seeding Techniques and Geographical Optimization Details for a Fraud Detection System that uses Voiceprints,” and 60/808,892, filed on May 30, 2006, entitled “Optimizations for a Fraud Detection System that uses Voiceprints.”

U.S. patent application Ser. No. 13/415,809 is also a continuation-in-part and claims benefit of and priority to U.S. patent application Ser. No. 11/754,975, filed on May 29, 2007, entitled “Method and System to Seed a Voice Database,” which in turn claims the benefit of and priority to U.S. Provisional Applications 60/923,195, filed on Apr. 13, 2007, entitled “Seeding Techniques and Geographical Optimization Details for a Fraud Detection System that uses Voiceprints,” and 60/808,892, filed on May 30, 2006, entitled “Optimizations for a Fraud Detection System that uses Voiceprints.”

U.S. patent application Ser. No. 13/415,809 is also a continuation-in-part and claims benefit of and priority to U.S. patent application Ser. No. 12/352,530, filed on Jan. 12, 2009, entitled “BUILDING WHITELISTS COMPRISING VOICEPRINTS NOT ASSOCIATED WITH FRAUD AND SCREENING CALLS USING A COMBINATION OF A WHITELIST AND BLACKLIST,” which in turn claims the benefit of and priority to U.S. Provisional Applications 61/197,848, filed Oct. 31, 2008, entitled “Voice biometrics based fraud management system,” and 61/010,701, filed Jan. 11, 2008, entitled “Optimizations & extensions of a system to detect fraud using voiceprints.”

U.S. patent application Ser. No. 13/415,809 is also a continuation-in-part and claims benefit of and priority to U.S. patent application Ser. No. 12/856,200, filed on Aug. 13, 2010, entitled “SPEAKER VERIFICATION-BASED FRAUD SYSTEM FOR COMBINED AUTOMATED RISK SCORE WITH AGENT REVIEW AND ASSOCIATED USER INTERFACE,” which in turn claims the benefit of and priority to U.S. Provisional Application 61/335,677, filed on Jan. 11, 2010, entitled “Method for correlating fraud audio to textual fraud reports using word spotting.”

U.S. patent application Ser. No. 13/415,809 is also a continuation-in-part and claims benefit of and priority to U.S. patent application Ser. No. 12/856,118, filed on Aug. 13, 2010, entitled “METHOD AND SYSTEM FOR GENERATING A FRAUD RISK SCORE USING TELEPHONY CHANNEL BASED AUDIO AND NON-AUDIO DATA,” which in turn claims the benefit of and priority to U.S. Provisional Application 61/335,677, filed on Jan. 11, 2010, entitled “Method for correlating fraud audio to textual fraud reports using word spotting.”

U.S. patent application Ser. No. 13/415,809 is also a continuation-in-part and claims benefit of and priority to U.S. patent application Ser. No. 12/856,037, filed on Aug. 13, 2010, entitled “METHOD AND SYSTEM FOR ENROLLING A VOICEPRINT IN A FRAUDSTER DATABASE,” which in turn claims the benefit of and priority to U.S. Provisional Application 61/335,677, filed on Jan. 11, 2010, entitled “Method for correlating fraud audio to textual fraud reports using word spotting.”

U.S. patent application Ser. No. 13/415,809 and each of the aforementioned Non-Provisional U.S. Patent Applications is a continuation-in-part and claims benefit of and priority to U.S. patent application Ser. No. 11/404,342, filed on Apr. 14, 2006, entitled “Method and system to detect fraud using voice data,” which in turn claims the benefit of U.S. Provisional Application 60/673,472, filed on Apr. 21, 2005, entitled “Detecting Fraudulent Use of Financial Account Numbers Using Voiceprints.”

U.S. patent application Ser. No. 13/415,809 is also a continuation-in-part and claims the benefit of and priority to U.S. patent application Ser. No. 13/278,067, filed on Oct. 20, 2011, entitled “Method and System for Screening Using Voice Data and Metadata,” which in turn is a continuation of and claims the benefit of and priority to U.S. patent application Ser. No. 11/754,974, filed on May 29, 2007, entitled “METHOD AND SYSTEM FOR SCREENING USING VOICE DATA AND METADATA,” which in turn claims the benefit of and priority to U.S. Provisional Applications 60/923,195, filed on Apr. 13, 2007, entitled “Seeding Techniques and Geographical Optimization Details for a Fraud Detection System that uses Voiceprints,” and 60/808,892, filed on May 30, 2006, entitled “Optimizations for a Fraud Detection System that uses Voiceprints.”

U.S. patent application Ser. No. 13/415,809 is also a continuation-in-part and claims benefit of and priority to U.S. patent application Ser. No. 13/415,816, filed on Mar. 8, 2012, entitled “SYSTEMS, METHODS, AND MEDIA FOR GENERATING HIERARCHICAL FUSED RISK SCORES.”

All of the above applications and patents are hereby incorporated by reference herein in their entirety.

FIELD OF THE TECHNOLOGY

Embodiments of the disclosure relate to determining fraud risk from audio signals, and more specifically, but not by way of limitation, to systems, methods, and media for extracting and using characteristics of audio signals to determine fraud risk. Signal processing may include comparing components such as speaker models, channel models, and/or operational models of a candidate audio signal to a plurality of types of data stored in various fraudster databases. Matches between the audio signal and data stored in the various fraudster databases may indicate that an audio signal is associated with fraud.

BACKGROUND

Fraud, such as credit card fraud and identity fraud, is common. To deal with fraud, enterprises such as merchants and banks use a variety of fraud detection systems. However, these fraud detection systems are susceptible to becoming obsolete within a short time because fraudsters change their methods of perpetrating fraud in order to maneuver past such fraud detection systems.

SUMMARY

According to some embodiments, the present technology may be directed to methods that comprise: (a) receiving an audio signal and an associated audio signal identifier without regard to fraud activities; (b) receiving a fraud event identifier associated with a fraud event; (c) determining a speaker model based on the received audio signal, using a server system; (d) determining a channel model based on a path of the received audio signal, using the server system; (e) updating a fraudster channel database to include the determined channel model based on a comparison of the audio signal identifier and the fraud event identifier; and (f) updating a fraudster voice database to include the determined speaker model based on a comparison of the audio signal identifier and the fraud event identifier.

According to other embodiments, the present technology may be directed to methods for screening an audio sample that include: (a) maintaining a list of channel models in a server system, each channel model belonging to a disqualified candidate and representing a path of an audio signal associated with an identifier that has been matched to information associated with an instance of fraud; (b) receiving a screening request, the screening request comprising an audio sample for a candidate; (c) comparing the audio sample with the channel models in the list of channel models in the server system; and (d) sending a channel score to a third party, the channel score indicating at least a partial match between the audio sample and a channel model in the list of channel models.

According to some embodiments, the present technology may be directed to systems for analyzing audio. The systems may comprise: (a) a memory for storing executable instructions for analyzing audio; (b) a processor for executing the instructions; (c) a communications module stored in memory and executable by the processor to receive an audio signal and an associated audio signal identifier, and to receive a fraud event identifier associated with a fraud event; (d) an audio analysis module stored in memory and executable by the processor to extract signatures from the received audio signal; and (e) an enrollment module stored in memory and executable by the processor to compare the audio signal identifier and the fraud event identifier and based on the comparison to store in a fraudster database any of: (i) a channel model extracted from the audio signal using the analysis module; (ii) a speaker model extracted from the audio signal using the analysis module; and (iii) combinations thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, where like reference numerals refer to identical or functionally similar elements throughout the separate views, together with the detailed description below, are incorporated in and form part of the specification, and serve to further illustrate embodiments of concepts that include the claimed disclosure, and explain various principles and advantages of those embodiments.

The methods and systems disclosed herein have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.

FIG. 1 illustrates a pictorial representation of an exemplary implementation of a system for fraud detection;

FIG. 2 illustrates an exemplary audio analysis system for processing call data;

FIG. 3 shows a flowchart of an exemplary method for processing audio signals; and

FIG. 4 illustrates an exemplary computing system that may be used to implement embodiments according to the present technology.

DETAILED DESCRIPTION

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosure. It will be apparent, however, to one skilled in the art, that the disclosure may be practiced without these specific details. In other instances, structures and devices are shown in block diagram form only in order to avoid obscuring the disclosure.

Generally speaking, the present technology is directed to systems, methods, and media for analyzing call data against speaker models and channel models derived from audio signals and/or non-audio data to determine a fraud risk for a call event. That is, when a caller engages an enterprise telephonically (e.g., a call event), the call data of the call event may be recorded and analyzed to determine if the call event is likely to be associated with fraud.

The term “speaker model” may be understood to comprise a voice model and/or a language model. A voice model may be understood to include structural features associated with a speaker such as tenor, timbre, frequency, overtones, and so forth. A language model may be understood to comprise features such as word choice, accent, language, word order, and so forth. The combination of a voice model and a language model may provide a robust model to uniquely identify a speaker given an audio signal.

In some embodiments, call data including features and/or characteristics of the call data received from candidates may be compared to fraudster models that are stored in a fraudster database. The fraudster model may comprise any combination of a speaker model, a channel model, and an operational model for a given fraudster. The operational model for a fraudster may comprise data such as aliases/names utilized, ANIs used, geographical area of operation (e.g., shipping address, zip code, etc.), fraudulent activities, and so forth. Each fraudster may be associated with a fraud identifier that uniquely identifies a fraudster and allows the fraudster to be tracked.

Facets of the analysis of the call event may include determining audio signal characteristics of the audio signal of the call event such as speaker characteristics and channel characteristics. These audio signal characteristics may be analyzed by comparing them to fraud indicators to determine a fraud risk for the call event. It will be understood that audio signal data and/or characteristics (voice and non-voice portions) of a call event may be analyzed by determining whether one or more audio signal data and/or characteristics match a characteristic of a landline, VoIP (Voice over Internet Protocol), a cellular phone, or any other telecommunications path that may be traversed by an audio signal. In some instances the present technology may decipher the specific type of communications method (e.g., CDMA, GSM, VoIP, etc.) employed in the transmission of the audio signal. In additional embodiments the present technology may determine a call traversal path by analyzing the delay, jitter, or other artifacts inherent in the call path. Further details regarding determining a call traversal path feature are disclosed in PinDrOp: Using Single-Ended Audio Features To Determine Call Provenance, Converging Infrastructure Security (CISEC) Laboratory, Georgia Tech. Information Security Center (GTISC); Authors: Vijay A. Balasubramaniyan, Aamir Poonawalla, Mustaque Ahamad, Michael T. Hunter, and Patrick Traynor, which is hereby incorporated herein by reference in its entirety.

Advantageously, once the present technology has characterized the audio signal, the information can be used for improving risk scoring in one or more of the following non-limiting examples. In some instances audio signal characteristics and/or non-audio data may be utilized to detect changes in behavior of an account, for example, by comparing the audio characteristic of past calls (for a given account) with the current calls, with risk being higher depending on the amount and type of a change. Alternatively, the present technology may compare the audio signal characteristics with what should be expected based on an automatic number identification (ANI) associated with the account. Databases may be compiled in which the ANI can be looked up to determine the type of phone a given ANI is associated with, and thus, it may be possible to determine what expected audio signal characteristics should be present in the audio signal. This information can be compared to the observed characteristics, and calls that have a mismatch with expected values may be rated higher for risk.
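By way of non-limiting illustration, the ANI-based comparison described above may be sketched as follows. The lookup table, the channel type labels, the function name channel_mismatch_risk, and the numeric risk values are all hypothetical and are chosen only for the example; the disclosure does not prescribe a particular scoring rule.

```python
# Hypothetical lookup table mapping an ANI to the type of phone expected for it.
# In practice such a table would be compiled from external sources.
EXPECTED_LINE_TYPE = {
    "4045550101": "landline",
    "4045550102": "cellular",
}


def channel_mismatch_risk(ani, observed_type, lookup=EXPECTED_LINE_TYPE,
                          base_risk=0.1, mismatch_penalty=0.5):
    """Return a risk value that is raised when the observed channel type
    differs from the type expected for the given ANI.

    base_risk and mismatch_penalty are illustrative assumptions.
    """
    expected_type = lookup.get(ani)
    if expected_type is None:
        # No expectation is available for this ANI, so no mismatch can be asserted.
        return base_risk
    if expected_type == observed_type:
        return base_risk
    # Observed characteristics disagree with the expectation: rate the call higher.
    return min(1.0, base_risk + mismatch_penalty)
```

In this sketch a call placed from an ANI registered as a landline, but whose audio exhibits VoIP characteristics, would be rated higher for risk than a call whose observed characteristics match the expected values.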

Additionally, the present technology may be utilized to associate noise characteristics with fraudster channel models to reduce false positives and aid in selecting fraudster models that should be scanned based on this association. These methods may be employed to “partition” or “segment” the fraudster voiceprint database to reduce the total number of potential fraudster voiceprints that may be compared against the voiceprint of the candidate audio sample.
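One possible way to partition a fraudster voiceprint database, offered only as a sketch, is to group fraudster models by an associated channel label so that a candidate audio sample is compared only against the models in the matching partition. The record layout (dictionaries with "fraud_id" and "channel_type" keys) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def partition_by_channel(fraudster_models):
    """Group fraudster model records by their associated channel type."""
    partitions = defaultdict(list)
    for model in fraudster_models:
        partitions[model["channel_type"]].append(model)
    return partitions


def candidates_for(partitions, channel_type):
    """Return only the fraudster models that share the candidate's channel type."""
    return partitions.get(channel_type, [])
```

Scanning only the returned partition reduces the total number of voiceprint comparisons, at the cost of missing a fraudster who has switched channel types; whether that trade-off is acceptable depends on the deployment.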

General applications of the present technology allow for the generation and utilization of a historical set of speaker models and/or channel models for a particular customer account. The history may include a speaker model and/or channel model for each call event for a given customer account. When a fraud report or fraud event identifier is received, a time stamp associated with the fraud event may be utilized to determine speaker models and/or channel models that are proximate the time stamp of the fraud event. These and other advantages of the present technology are described infra with reference to the collective drawings, FIGS. 1-4.
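The time stamp based selection described above may be sketched as follows, assuming that each entry in the account history is a dictionary carrying a "timestamp" value and that "proximate" means falling within a configurable window around the fraud event. The 24 hour default window and the function name are illustrative assumptions only.

```python
from datetime import datetime, timedelta


def models_near_fraud_event(history, fraud_time, window=timedelta(hours=24)):
    """Return the history entries whose time stamp lies within `window`
    of the time stamp of the fraud event (before or after)."""
    return [
        entry for entry in history
        if abs(entry["timestamp"] - fraud_time) <= window
    ]
```

The speaker models and/or channel models of the returned entries are then candidates for review or enrollment.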

Referring now to FIG. 1, a pictorial representation of an exemplary implementation of a system for fraud detection is shown, in accordance with various embodiments of the present disclosure. As can be seen from FIG. 1, an enterprise call center 100, a fraud detection system 102 (hereinafter “FDS 102”), and a plurality of callers 104 are shown. The call center 100 may receive and process calls on behalf of an enterprise. The enterprise may include a merchant, an insurance company, an affiliate of a company, a bank, a telecommunication company, a cellular service provider, a credit card company, a credit card service company, and the like. The call center may be located at the enterprise and/or at a separate entity. In some embodiments, the enterprise is cloud-based.

According to some embodiments, the call center 100 may receive calls from the plurality of callers 104 (hereinafter “the callers 104”) for goods and/or services provided by the enterprise. The callers 104 may call the call center 100 using a VoIP/Public Switched Telephone Network (PSTN)/mobile network 106A. The calls from the callers 104 may enter an automatic call distributor 108, which distributes calls across individual agents 110a-n. Call events may be recorded by a recording device 112 of the call center 100 for processing in real time and/or later for fraud detection. It will be understood that the callers 104 may include legitimate customers and fraudsters.

The callers 104 may request call center agents (who receive phone calls) to process transactions related to goods/services. In some embodiments, the call center 100 may apply one or more business rules to a call to determine whether to process a transaction directly or to have a fraud check performed on the caller.

The term “call data” for a call event or a segment of the call event may be understood to include not only audio data (e.g., audio signals, or call audio data) for the call event, but non-audio data for the call event. The term “call audio data” for the call event or segment of the call event may be understood to include the audio portion of the call data (e.g., audio signals). “Call audio data,” “audio sample,” “audio signal,” and “audio data” may be used interchangeably. The above-described examples of audio signal data are to be understood to be non-limiting, and one of ordinary skill in the art will appreciate that many other types of audio signal may likewise be utilized in accordance with the present technology. Additionally, audio information or data may be extracted from call audio data including both speaker models that represent the voice of a speaker and channel models that represent a communication profile of an audio path for a channel used by the speaker. The communications profile may include noise models, background noise, transfer path functions (as will be described in greater detail infra), as well as other representative characteristics that may be determined for a communications channel that would be known to one of ordinary skill in the art.

Examples of non-audio data include a caller identification (e.g., the phone number the caller called from), dialed number identification service information (e.g., the phone number the caller dialed), an agent identification (e.g., the agent that handled the call), a timestamp (e.g., the date and time of the call), a type of call (e.g., the subject matter of the call), an account or order identification (e.g., some unique transaction or account identifier that the call was in reference to), a shipping zip code (e.g., if a product was to be delivered to a particular location), and any other available data that may be relevant to the call.

Additional examples of non-audio data include in various combinations a call identification that includes a unique identifier that identifies the call, an automatic number identification that represents the number that initiated a call event, a dialed number identification service that comprises a dialed number (e.g., telephone number, short code, etc.), an agent identification that specifies the call agent associated with the call event, a queue identifier that identifies the telephony queue into which a call event has been directed by the call center 100 (e.g., sales, technical support, fraud review, etc.), a timestamp that indicates a date and time when the call event was initiated, a call center identifier that indicates the call center which initially received the call event, and/or the like.

For a call in reference to an account and/or transaction, examples of non-audio data include an account number that specifies the account that the call event was in reference to, a transaction number that specifies a transaction that the call was in reference to, names associated with the account (first, last, etc.), a social security number or other government issued identification number, an address (current and/or previous), a telephone number (current and/or previous), an email address, an account type (business, consumer, reseller, etc.), an account opening date, a credit limit, and a list of transactions associated with the account.

Examples of transaction non-audio data include a transaction identifier that uniquely identifies the transaction, a timestamp specifying a date and time for the transaction, a transaction disposition (e.g., change of address, account balance check, account payment details, account plan change, and so forth), a shipping address, and combinations thereof.

For a call in reference to an order, examples of non-audio data include an order number such as a unique order identification, a list of items ordered, an order value, a timestamp, a name, a shipping address, an email address, a phone number, a shipping method, billing details, and combinations thereof. Any of the above non-audio data may be used as an audio signal identifier.

All of the aforementioned types of data including audio and/or non-audio data may be employed to generate risk scores for a call event, as will be described in greater detail infra.

Many types of customer metadata may be determined from an evaluation of the above mentioned call data. Exemplary types of metadata include account, transaction, and/or order metadata, along with call metadata. Additional data may also be extracted from non-audio data, such as patterns or relationships.

It will be understood that the channel characteristics for a segment of call audio data may be sufficiently unique to determine that separate segments of call audio data belong to two separate speakers. For example, a customer calling into an enterprise may have channel characteristics that are inherently distinctive relative to the channel model associated with call agents of the enterprise. Therefore, differences in channel characteristics may alone suffice as a basis for diarizing and separating segments of call audio data.
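As a simplified illustration of diarizing by channel characteristics alone, the following sketch assigns each segment of call audio data to one of two groups based on a single scalar channel feature (for example, an estimated noise floor in dB). The use of a single feature and of one-dimensional two-means clustering are assumptions made for brevity; a practical system would likely use richer channel models.

```python
def diarize_by_channel(features, iterations=10):
    """Label each segment 0 or 1 using 1-D two-means clustering on a
    scalar channel feature. Label 0 is the cluster seeded at the minimum."""
    low, high = min(features), max(features)
    if low == high:
        # All segments look alike on this feature; treat them as one speaker.
        return [0] * len(features)
    centers = [low, high]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assign each segment to the nearer cluster center.
        labels = [
            0 if abs(f - centers[0]) <= abs(f - centers[1]) else 1
            for f in features
        ]
        # Move each center to the mean of its assigned segments.
        for k in (0, 1):
            members = [f for f, label in zip(features, labels) if label == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels
```

For instance, segments originating from a quiet agent headset and segments originating from a noisy cellular path would tend to fall into different groups, which is the basis for separating the agent from the caller in this sketch.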

The term “speaker model” may be understood to include a voice model representing the unique characteristics of an individual's voice, and/or a language model representing linguistic characteristics of the speaker. The voice model may include a collection of features that are extracted from an audio signal of the individual's voice and encoded within a specific statistical framework. In various embodiments, these features include cadence, tone, rate of speech, spectral characteristics, and/or other descriptive information about the voice and vocal tract of the speaker that describes the speaker (separately from the words spoken). Other synonyms for a voice model may include, but are not limited to, a voice signature, a voice print, a voice portion of a speaker model, and also in some instances, simply a speaker voice. In various embodiments, the language model comprises features or characteristics (such as the words spoken and speech choices made by the speaker) and a statistical framework for encoding those features. Examples of a statistical framework include the probability of an occurrence of a string of words, and how that probability is calculated. In various embodiments, the language model includes language(s) spoken, word choice, word order, accent, grammar, diction, slang, jargon, rate of speech, and/or the like. It is noteworthy that in some instances information in addition to a speaker model (voice model and language model) can be extracted from call audio data. For example, a channel model may be extracted from call audio data, as described elsewhere herein. Further, word spotting or word recognition may be used to extract data, for example, name, account number, social security number, address, and/or the like from call audio data.
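The notion of a statistical framework that gives the probability of an occurrence of a string of words may be illustrated with a bigram model using add-alpha smoothing. This particular framework is an assumption chosen for the example and is not asserted to be the framework used by the present technology; the function names and the "<s>" start token are likewise hypothetical. The sketch assumes at least one non-empty training sentence.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count word pairs observed in transcribed speech for one speaker."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])            # words that precede another word
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return {"contexts": contexts, "bigrams": bigrams, "vocab_size": len(vocab)}


def sequence_probability(model, sentence, alpha=1.0):
    """Probability of a string of words as a product of smoothed
    conditional probabilities P(word | previous word)."""
    tokens = ["<s>"] + sentence.lower().split()
    probability = 1.0
    for previous, current in zip(tokens, tokens[1:]):
        numerator = model["bigrams"][(previous, current)] + alpha
        denominator = model["contexts"][previous] + alpha * model["vocab_size"]
        probability *= numerator / denominator
    return probability
```

Under such a model, word orders that a speaker habitually uses receive a higher probability than unfamiliar orders of the same words, which is one way word choice and word order could contribute to a language model.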

In some embodiments, all callers are recorded automatically, and an audio signal and/or non-audio data is stored for all calls. In other embodiments, a portion of the calls are recorded and/or stored. Additionally, the audio signal may be time stamped. Call audio data may be streamed for processing in real time and/or recorded and stored for processing.

The present technology may also enroll the stored voice signals determined to correspond to a fraudster into a blacklist that includes speaker/channel models determined to be associated with fraudsters. For additional details regarding the enrollment of speaker models into a blacklist see, e.g., U.S. patent application Ser. Nos. 11/404,342, 11/754,974, 11/754,975, 12/352,530, 12/856,037, 12/856,118, 12/856,200, which are all hereby incorporated by reference herein in their entireties. Similarly, the present technology may enroll the stored channel signals determined to correspond to a fraudster into a blacklist that includes channel models determined to be associated with fraudsters.

A call database 114 may store call data such as audio and non-audio data. Customer accounts for each customer may be stored in or linked to the call database 114. In some embodiments, various elements of the call data (including audio data and/or non-audio data) are stored in multiple separate databases and linked across those databases. For example, non-audio data in one database may be linked to the customer account in another database using a call identifier that associates a particular call event with a customer account. In another example, a call event regarding an inquiry about a particular order may be linked to the order and/or an account associated with the order. Both legitimate and fraudulent call data may be linked to the customer account. In some embodiments, the call database 114 is a collection of multiple databases, for example, a customer account database, order database, customer support database, RMA (returned merchandise authorization) database, warranty database, white list, black list, customer history database, and/or the like.

In some embodiments, the call center 100 may include a fraud management system 116 that receives data indicative of potential or actual fraudulent activities from the FDS 102. The fraud management system 116 may utilize the fraud data provided by the FDS 102, along with other enterprise-specific information, to process and remedy fraudulent account activity.

A file transfer server 118 of the call center 100 may communicate recorded, live, or stored audio signals to the FDS 102 using Internet/LAN 106B. In some instances the audio signals and/or non-audio data may be streamed to the FDS 102 via the file transfer server 118. The Internet/LAN 106B may utilize a secure communications protocol. File transfer server 118 may communicate audio signal and/or non-audio data to an audio processing system, hereinafter “system 200” via an application programming interface (“API”) or any other suitable data transmission protocols, which may be secure or insecure. The audio signal and/or non-audio data may be communicated via Internet or LAN. Additional operational details of the system 200 are described in greater detail with regard to FIG. 2.

It will be understood that the FDS 102 may detect any type of fraud. However, for the purposes of brevity, the present disclosure focuses on fraud perpetrated by fraudsters utilizing telephonic devices. While not shown, the FDS 102 may include additional modules or engines that determine fraud and generate fraud reports. Additional details regarding the FDS 102 have been omitted so as not to obscure the description of the present technology. See, e.g., U.S. Patent Application Attorney Docket Number PA5872US, filed concurrently herewith on Mar. 8, 2012, entitled “SYSTEMS, METHODS, AND MEDIA FOR GENERATING HIERARCHICAL FUSED RISK SCORES.”

The enrolled speaker models and/or channel models in one or more fraudster databases/blacklists may be used as a corpus that may be queried against for comparing voice and/or channel data of a candidate audio sample.
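Querying a blacklist with a candidate audio sample may be sketched as below, under the assumption that each enrolled channel model and the candidate's channel characteristics have already been reduced to fixed-length numeric feature vectors. Cosine similarity is used here only as a stand-in scoring function; the function names and the vector representation are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_channel_score(candidate_vector, blacklist):
    """Compare the candidate against every enrolled channel model and
    return (fraud identifier, score) for the closest match, or
    (None, 0.0) if the blacklist is empty."""
    best_id, best_score = None, 0.0
    for fraud_id, model_vector in blacklist.items():
        score = cosine_similarity(candidate_vector, model_vector)
        if best_id is None or score > best_score:
            best_id, best_score = fraud_id, score
    return best_id, best_score
```

The returned score can serve as a channel score indicating at least a partial match; the threshold at which a match is treated as significant is left to the implementation.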

The enrollment of speaker models into a fraudster database uses one or more precursor fraud databases. A precursor fraud database may be seeded with audio samples and associated audio sample identifiers collected without regard to fraudulent activity associated with the audio samples. The audio sample identifiers may be matched with identifiers in a fraud report. Speaker models extracted from audio in the precursor fraud database that is associated with the matched audio sample identifiers may be enrolled into the fraudster database. In various embodiments, the audio sample identifiers include any type of information that links the audio signal with the fraud identifiers. The audio sample identifiers include one or a combination of a call identifier, a customer account, a timestamp, identity information (name, social security number, etc.), agent information, and/or a communications device, such as a cellular telephone, a landline, or computing system that communicates via VoIP protocols. Information for a communications device may include data such as ANI, IMEI, caller identification, and so forth. As will be discussed below, channel models extracted from audio in the precursor fraud database that is associated with the matched audio sample identifiers may be enrolled into the fraudster database in a manner similar to speaker models.
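The enrollment flow described above may be sketched as follows. The record layout (dictionaries with "identifier", "speaker_model", and "channel_model" keys), the use of plain lists for the fraudster databases, and the function name are assumptions made for the example; matching on exact identifier equality is likewise a simplification.

```python
def enroll_from_precursor(precursor_db, fraud_report_ids,
                          fraudster_voice_db, fraudster_channel_db):
    """Enroll the speaker and channel models of every precursor record whose
    audio sample identifier matches an identifier in the fraud report.

    Returns the list of identifiers that were enrolled.
    """
    fraud_ids = set(fraud_report_ids)
    enrolled = []
    for record in precursor_db:
        if record["identifier"] in fraud_ids:
            fraudster_voice_db.append(record["speaker_model"])
            fraudster_channel_db.append(record["channel_model"])
            enrolled.append(record["identifier"])
    return enrolled
```

Because the precursor database is seeded without regard to fraudulent activity, records whose identifiers never appear in a fraud report are simply left in place and are not enrolled.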

Further details regarding precursor fraud databases, as well as the enrollment of fraudster voice signature/speaker models into a fraudster database/blacklist using precursor fraud databases, are described in U.S. patent application Ser. Nos. 11/404,342, 11/754,974, 11/754,975, 12/352,530, 12/856,037, 12/856,118, 12/856,200, all of which are hereby incorporated by reference in their entirety herein. Channel model enrollment may be performed in a similar manner to speaker model enrollment, as described in these U.S. Patent Applications.

A channel model may be understood to include information that corresponds to the traversal path traveled by an audio sample. This information may be referred to as a “distortion” of a source of the received audio signal. Other terms for referring to this information include “noise” and “artifact.” An example of noise includes random and/or systematic features or artifacts in the audio sample that are present due to background or ambient noise generated from one or more sources. For example, noise features for an agent may include background voices from other call agents that are proximate the agent. Examples of artifacts include filtering, error recovery, packet handling, segmentation, beat frequencies, and so forth. One of ordinary skill in the art will appreciate that terms noise, artifact, distortion, and similar terms may be utilized interchangeably in some contexts.

In some embodiments, an audio signal and/or non-audio data for call events is stored in a precursor database for enrollment into a fraudster database, see e.g., U.S. patent application Ser. Nos. 11/404,342, 11/754,975 and 12/856,037, which are all hereby incorporated by reference herein in their entirety.

FIG. 2 illustrates the system 200, which may be utilized to process candidate audio samples to determine potential fraud. Additionally, the system 200 may enroll channel and/or speaker models that have been determined as being associated with fraud into one or more fraudster databases, such as the fraudster database, fraudster voice database, and/or fraudster channel database. In various embodiments, one or more fraudster databases, fraudster voice databases, and/or fraudster channel databases may be employed. Generally speaking, the system 200 may include a diarization module 202 and an analysis module 204.

It is noteworthy that the system 200 may include additional modules, engines, or components, and still fall within the scope of the present technology. As used herein, the term “module” may also refer to any of an application-specific integrated circuit (“ASIC”), an electronic circuit, a processor (shared, dedicated, or group) that executes one or more software or firmware programs, a combinational module circuit, and/or other suitable components that provide the described functionality. In other embodiments, individual modules of the system 200 may include separately configured web servers.

In some embodiments, the system 200 may be implemented in a cloud computing environment. Generally speaking, a cloud computing environment or “cloud” is a resource that typically combines the computational power of a large grouping of processors and/or that combines the storage capacity of a large grouping of computer memories or storage devices. For example, systems that provide a cloud resource may be utilized exclusively by their owners, such as Google™ or Yahoo™; or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.

The cloud may be formed, for example, by a network of servers with each server providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource customers or other users). Typically, each user may place workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depends on the type of business associated with the user.

The present technology may leverage the computational resources of distributed computing (e.g., cloud computing systems) to facilitate efficient processing of call data.

It is envisioned that the system 200 may cooperate with the FDS 102 or may, in some embodiments, function as a stand-alone audio processing system that may be utilized by an enterprise, separately from the FDS 102.

In other embodiments, a portion (or potentially all portions) of system 200 may be integrated into FDS 102, while in other embodiments, the constituent sub-modules/components of the system 200 may be remotely distributed from one another in a remote computing arrangement, wherein each of the modules may communicate with one another via the Internet 106B utilizing any one (or combination) of a number of communications protocols or communications mechanisms (e.g., API, HTTP, FTP, etc.).

An audio signal is received by the diarization module 202 from any of one or more originating sources, such as file transfer server 118, or may be received directly from callers 104 (see FIG. 1). Again, the audio signal may include not only voice data, but also channel data, and metadata associated with the voice data and the channel data.

Upon receiving call data, the diarization module 202 may diarize the audio signal into one or more segments. It will be understood that a segment of the audio signal may comprise voice data, channel data, and metadata for a unique speaker.

It is noteworthy that in some instances, the system 200 may receive diarized audio signals from the enterprise, such as when the diarization module 202 is associated with the call center 100. The system 200 may receive the diarized audio signals as recorded data or streamed data. Moreover, the call center 100 may record incoming call data (also referred to as the calling leg) and the outgoing call data (also referred to as the called leg) separately from one another. These two separate legs of call data may be stored and transmitted or optionally streamed to the FDS 102. Either of these two separate legs may include more than one voice. For example, agent 1 in the called leg may ask agent 2 in the called leg to speak with the caller on the calling leg.

According to some embodiments, the analysis module 204 comprises a communications module 206, an audio analysis module 208, an enrollment module 210, and a scoring module 212. It is noteworthy that the analysis module 204 may include additional or fewer modules, engines, or components, and still fall within the scope of the present technology. Additionally, the functionalities of two or more modules may be combined into a single module.

Generally speaking, the analysis module 204 may be executed upon receiving a fraud event identifier that indicates that a fraud event has occurred. Non-limiting examples of fraud event identifiers may include a fraud report that comprises one or more fraud events. That is, a fraud report may include multiple instances of fraud events. A fraud event identifier may specify details regarding an instance of fraud, such as the customer account which was defrauded, or other identifying information that allows the system to link a fraud event to enterprise related information, such as customer accounts, call queues, phone numbers, and so forth. The fraud event may also include a time stamp that identifies an approximate day and/or time that the fraud event occurred.

Once an audio signal has been received, the audio signal may be processed by execution of the audio analysis module 208. Generally speaking, the audio analysis module 208 may parse through the data included in the audio signal and determine both speaker models and channel models for the audio signal. The term “determine” may also include extract, calculate, analyze, evaluate, and so forth.

Again, the speaker model may be generated by initially generating a voice model and a language model for the call data. If the call data has already been diarized it may be inferred that the voice model, language model and also the speaker model are associated with a unique speaker. The speaker model may provide a robust and multifaceted profile of the speech of a particular speaker.

With regard to analyzing the audio signal, the audio analysis module 208 may employ speaker recognition and/or speech recognition. In general, speaker recognition may attempt to recognize the identity of the speaker (e.g., recognition of the voice of the speaker), whereas speech recognition refers to the process of recognizing what words have been spoken by the speaker.

In addition to determining a speaker model and a channel model for the audio signal, the audio analysis module 208 may also evaluate non-audio data, such as an audio signal identifier associated with the audio signal. It will be understood that each call event may be assigned an audio signal identifier that uniquely identifies a call event. The identifier may be used for tracking the call event and data associated with the call event during evaluation and/or enrollment into a fraudster database.

As mentioned above, a channel model may include information regarding the path that was traversed by an audio sample (e.g., the path between the caller and the call agent or enterprise system). The audio analysis module 208 may evaluate and model the delay present in the audio signal to characterize the path taken by the audio signal. In addition to modeling delay, the audio analysis module 208 may model jitter, echo, artifacts (such as artifacts introduced by audio compression/encoding techniques), error recovery, packet loss, changes to the signal bandwidth, spectral characteristics, and/or other audio artifacts that occur at switching boundaries. With particular regard to VoIP paths, discrete devices (e.g., routers, gateways, servers, computing devices, etc.) involved in the transmission of VoIP data may also imprint artifacts in an audio sample. The channel model also can model handset characteristics such as microphone type.
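As a purely illustrative sketch of the kind of record a channel model might hold, the following Python fragment collects a few of the path characteristics named above into a feature vector and compares two such vectors. The class name, the field names, the units, and the use of cosine similarity are all assumptions made for illustration; the present disclosure does not prescribe a particular representation or distance measure.

```python
import math
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ChannelModel:
    """Hypothetical channel-model record; fields and units are assumptions."""
    delay_ms: float          # modeled end-to-end delay
    jitter_ms: float         # variation in packet arrival time
    packet_loss_pct: float   # share of lost packets, 0-100
    bandwidth_khz: float     # effective signal bandwidth

    def as_vector(self):
        return [float(v) for v in astuple(self)]


def channel_similarity(a: ChannelModel, b: ChannelModel) -> float:
    """Cosine similarity of two feature vectors (an illustrative choice)."""
    va, vb = a.as_vector(), b.as_vector()
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    # A zero vector carries no path information, so report no similarity.
    return dot / norm if norm else 0.0
```

In practice the features would likely need normalization before comparison, since raw values on different scales would let one feature dominate the similarity.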

The audio analysis module 208 may also be adapted to utilize voice changer detection. That is, if a voice changer has been utilized in the generation of a candidate audio sample, audio artifacts may persist within the audio signal as it propagates to the enterprise. Audio signal characteristics may be correlated with signal tampering to detect the use of a voice changer.

In sum, the channel model may include a representation of the many types of artifacts and/or distortions (e.g., degradations, modifications, etc.) of the audio signal as it traverses along a given path. These distortions may be utilized to determine whether the call originated from a particular source (e.g., a geographic region such as a country), passed through a cellular telephone network, or was subjected to any of many other types of distorting processes/features that impose unique noise signatures on the audio signal.

According to some embodiments, the enrollment module 210 is executed to perform a comparison of the audio signal identifier and/or other associated non-audio data in a precursor database, with the fraud event identifier and/or other associated non-audio data in the fraud report. If there is a match then the enrollment module 210 may update a fraudster voice database to include a speaker model associated with the audio signal identifier. Similarly, the enrollment module 210 may update a fraudster channel database to include a channel model associated with the audio signal identifier. The speaker model and/or channel model may be determined or extracted by the audio analysis module 208 before or after the enrollment module 210 is executed to perform a comparison of the audio signal identifier and the fraud event identifier. In some embodiments, the enrollment module 210 may find a match between a fraud event identifier and an audio signal identifier for multiple speaker models in the precursor database.

If a match is found between an audio signal identifier (and/or other non-audio data) and a fraud event identifier, the speaker models, voice models, language models and/or the channel models associated with the audio signal identifier may be automatically enrolled into a blacklist.
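The identifier comparison and enrollment described above can be sketched as follows. This is a minimal illustration only: the record layout, the function name, and the choice to match on the customer account alone are assumptions, and any identifier that links an audio signal to a fraud event could be substituted.

```python
from dataclasses import dataclass


@dataclass
class PrecursorRecord:
    """Hypothetical precursor-database entry."""
    audio_signal_id: str    # e.g., a call identifier
    account: str            # customer account linked to the call
    speaker_model: object   # opaque model produced by audio analysis
    channel_model: object


def enroll_matches(precursor, fraud_accounts, voice_db, channel_db):
    """Enroll models whose account appears in the fraud report.

    Matching on the account alone is an assumption made for brevity.
    """
    enrolled = []
    for rec in precursor:
        if rec.account in fraud_accounts:
            voice_db[rec.audio_signal_id] = rec.speaker_model
            channel_db[rec.audio_signal_id] = rec.channel_model
            enrolled.append(rec.audio_signal_id)
    return enrolled
```

Records that do not match are left in the precursor database untouched, so they remain available if a later fraud report implicates them.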

In some instances, a speaker model and/or a channel model for a call event may be automatically enrolled into a whitelist. For example, a customer may be explicitly prompted to enroll by stating a particular phrase multiple times. In another example, no match may be found between an audio signal identifier (and/or other non-audio data) and a set of fraud events. The speaker model and channel model extracted from the call data for the audio signal identifier may then be automatically enrolled into the whitelist. Enrollment that is conducted without the involvement of the speaker may be referred to as “passive enrollment.”

The enrollment module 210 may also store a channel model for a candidate audio sample in a whitelist database when an at least partial match between an audio signal identifier (and/or other non-audio data) for the candidate audio sample and a fraud event identifier cannot be determined. Stated otherwise, because the audio signal identifier for the candidate audio sample from which the channel model was extracted does not match any fraud event identifier, it can be inferred that the candidate audio sample is not associated with a fraudster. For further details regarding the use of a whitelist see, e.g., U.S. patent application Ser. No. 12/352,530, which is hereby incorporated herein by reference in its entirety.

According to some embodiments, enrollment of a speaker model or a channel model may be affected by comparing a time stamp associated with a fraud event to a time stamp associated with the audio sample. That is, audio samples with time stamps that are temporally adjacent to time stamps of fraud events may more likely be associated with fraudsters than audio samples with time stamps that are temporally remote from fraud events. Audio samples with time stamps that are temporally remote from fraud events may not be automatically enrolled into the fraudster database, but may be flagged as subject to further review before being enrolled.
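A minimal sketch of such a time stamp comparison is given below. The function name and the seven-day window are hypothetical; the present disclosure does not specify how close in time an audio sample must be to a fraud event.

```python
from datetime import datetime, timedelta


def enrollment_decision(call_time, fraud_time, window=timedelta(days=7)):
    """Return "enroll" for temporally adjacent samples, else "review".

    The default window is an assumed threshold, not a disclosed value.
    """
    gap = abs(call_time - fraud_time)
    return "enroll" if gap <= window else "review"
```

For example, a call placed two days before a reported fraud event would be enrolled automatically under these assumptions, while a call placed a month earlier would be flagged for further review.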

In some instances, the present technology may utilize active/dynamic scoring to further process call data to reduce the likelihood that a speaker model or channel model of an audio signal is mistakenly enrolled into a fraudster database.

According to other embodiments, the enrollment module 210 may maintain a list of channel models, where each channel model belongs to a disqualified candidate (e.g., fraudster). As mentioned previously, the channel model may represent a path traversed by an audio signal. The audio signal may be provided with an identifier (and/or non-audio data) that links the audio signal to a customer account, a communications device, and/or a specific fraudster. Stated otherwise, when a disqualified candidate is determined, information indicative of that disqualified candidate may be stored in a fraudster database.

In other examples, the scoring module 212 may compare the candidate audio sample to one or more speaker models stored in a fraudster database to generate a voice match score. These voice match scores represent an at least partial match between the candidate audio sample and one or more speaker models stored in the fraudster voice database. These match scores may represent the degree of similarity between the candidate audio sample and the one or more speaker models.

According to additional embodiments, the scoring module 212 may be executed to generate various types of risk scores for constituent parts of the call data. With reference to generating risk scores for a candidate audio sample (e.g., signal), the scoring module 212 may compare the candidate audio sample to a variety of call data components that have been stored in fraudster databases. For example, the scoring module 212 may compare the candidate audio sample to one or more channel models stored in a fraudster database to generate a match score and/or a channel risk score. Generally speaking, the risk score for a call event may represent the likelihood that a call event is associated with an instance of fraud, or even a particular fraudster. For example, a number between 0 and 1000 or between 0 and 10 may be generated by the scoring module 212; the higher the number, the higher the risk. The scoring module 212 may employ any of a variety of functions and/or mathematical relations for computing match scores and risk scores. Examples include averaging, weighted averaging, min/max selection, weighted min/max selection, correlation, likelihood function, and/or the like. In some embodiments a likelihood function is defined as the likelihood that a particular candidate call is fraud, or the likelihood that a particular candidate call is a specific fraudster, etc. The risk score may be used to evaluate both the call audio data (e.g., audio signals, extracted speaker characteristics, extracted channel characteristics) and call non-audio data such as account, transactional, order, or other call related records and metadata.
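By way of illustration, two of the relations listed above (weighted averaging and max selection) and a mapping onto a 0 to 1000 risk scale might be sketched as follows. The function names are hypothetical, and the assumption that match scores lie between 0 and 1 is made only for this example.

```python
def weighted_average(scores, weights):
    """Weighted mean of match scores; weights need not sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def max_selection(scores):
    """Take the single strongest match as the combined score."""
    return max(scores)


def to_risk_score(match_score, scale=1000):
    """Map a match score assumed to lie in [0, 1] onto an integer scale."""
    clipped = min(max(match_score, 0.0), 1.0)
    return round(clipped * scale)
```

Max selection treats any single strong match as decisive, whereas weighted averaging lets a strong match on one component be tempered by weak matches on others.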

It will be understood that in some instances, the particular fraudster voice database that is selected may be based upon a comparison between the candidate audio sample and a channel model stored in the fraudster channel database. This type of analysis helps to “partition” the fraudster database into subsets or segments, where the candidate audio sample may be compared to more relevant samples, rather than generally comparing the audio sample to the entire fraudster database. Such partitioning may reduce the time required to determine a risk score for a speaker model, thus, enhancing real-time detection of fraudsters.

Stated otherwise, many speaker models may be maintained in a collection of various fraudster databases and/or fraudster voice databases. The scoring module 212 may be prevented from scanning each and every database until at least a partial match is determined. Thus, multiple fraudster databases may be scanned for a match, while allowing the scoring module 212 to select a subset of the databases to utilize based on the scan, rather than using all data included in the entire collection of fraudster databases for scoring. Similarly, a fraudster database may include multiple subsets, partitions, or segments and may be scanned globally. Based on the results of the global scan, the scoring module 212 may select a segment or some of the multiple segments of the database for scoring rather than all segments included in the fraudster database.

These optimization techniques may be employed utilizing other risk scores. For example, a fraudster channel database may be selected based upon a voice match score. In other embodiments, the fraudster voice database may be selected based upon the channel match score. In another embodiment, fraudster models may be maintained in a single fraudster database, and an analysis of the call data may aid in determining a subset of fraudster models to be used for comparison to the candidate audio sample. A subsequent generation of channel and voice match scores may be based on the comparison of the call data against the subset of the fraudster models, rather than the entire fraudster database. One of ordinary skill in the art will appreciate that other permutations and/or variations of the optimization concepts described herein may likewise be utilized in accordance with the present disclosure.
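The partitioning idea may be sketched as follows, under the assumption that each partition of the fraudster database carries one representative channel model and a list of speaker models. The dictionary layout, the function names, and the caller-supplied `channel_match` and `voice_match` functions are hypothetical.

```python
def select_partition(candidate_channel, partitions, channel_match):
    """Return the key of the partition whose channel model matches best."""
    return max(
        partitions,
        key=lambda k: channel_match(candidate_channel, partitions[k]["channel"]),
    )


def best_voice_match(candidate_voice, candidate_channel, partitions,
                     channel_match, voice_match):
    """Score the candidate only against the selected partition."""
    key = select_partition(candidate_channel, partitions, channel_match)
    scores = [voice_match(candidate_voice, m) for m in partitions[key]["speakers"]]
    return key, max(scores, default=0.0)
```

Only the speaker models in the chosen partition are compared, so the number of voice comparisons depends on the size of that partition rather than on the size of the whole database.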

In other embodiments, the scoring module 212 may combine a channel match score and a voice match score to create an audio sample risk score. These match scores may be fused together to generate a fused risk score that represents a more comprehensive risk score for the candidate audio sample than would be available by considering the channel match score and the voice match score alone. Specific details for generating fused risk scores for call data are described in co-pending U.S. Patent Application Attorney Docket Number PA5872US, filed concurrently herewith on Mar. ______ 2012, entitled “SYSTEMS, METHODS, AND MEDIA FOR GENERATING HIERARCHICAL FUSED RISK SCORES,” which is hereby incorporated by reference herein in its entirety.

The scoring module 212 may also be configured to select a whitelist database, based on a channel match score. It is noteworthy that an entry in a whitelist includes one or more qualified candidates that are associated with a customer account. More specifically, in some instances a speaker model for a qualified candidate may be stored in the whitelist database. A candidate audio sample may be compared to speaker models included in the whitelist database to determine if the candidate is a qualified candidate. If the scoring module 212 is unable to match the candidate audio sample with a speaker model included in the database, this does not automatically indicate that the candidate is a disqualified candidate.

By way of non-limiting example, a husband and wife may be customers associated with the same credit card account. The whitelist database may only include a speaker model for the wife. Therefore, when the husband calls the credit card entity, an audio sample collected for the husband and compared against the whitelist database may not match the speaker model associated with the account. An alert may be provided to the call agent that the speaker model for the candidate does not match, but upon gathering additional information, the call agent may ultimately verify the candidate as a legitimate account holder. The collected audio sample for the husband may be stored in the whitelist database and associated with the customer account.
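The example above may be sketched as an account-scoped whitelist lookup followed by enrollment of a newly verified account holder. The function names, the dictionary-based whitelist, the caller-supplied `match` function, and the 0.8 threshold are all assumptions made for illustration.

```python
def check_whitelist(whitelist, account, candidate_model, match, threshold=0.8):
    """Return True if the candidate matches any model on the account."""
    models = whitelist.get(account, [])
    best = max((match(candidate_model, m) for m in models), default=0.0)
    return best >= threshold


def add_verified_caller(whitelist, account, candidate_model):
    """Store a model after the agent has verified the caller by other means."""
    whitelist.setdefault(account, []).append(candidate_model)
```

A failed check here produces only an alert; the decision to treat the caller as legitimate, and hence to call `add_verified_caller`, remains with the agent.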

Additionally, the scoring module 212 may be executed to compare a candidate audio sample to the whitelist database to generate a whitelist match score. It may then incorporate this match score into the comprehensive audio sample risk score.
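One hedged way to picture how a whitelist match score could enter a comprehensive risk score is a weighted sum in which fraudster matches raise the risk and a whitelist match lowers it. The function name, the weights, and the linear form are assumptions for illustration only; the fusion techniques actually contemplated are described in the application incorporated by reference above.

```python
def fused_risk(channel_match, voice_match, whitelist_match,
               w_channel=0.4, w_voice=0.6, w_whitelist=0.5):
    """Combine match scores in [0, 1] into a risk value in [0, 1].

    All weights are hypothetical defaults, not disclosed values.
    """
    raw = (w_channel * channel_match
           + w_voice * voice_match
           - w_whitelist * whitelist_match)
    # Clip so the result stays on the same scale as the inputs.
    return min(max(raw, 0.0), 1.0)
```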

When a screening request is received by the communications module 206, the audio analysis module 208 may be executed to process an audio sample included in the request. More specifically, the audio analysis module 208 may compare the audio sample with channel models included in the list maintained by the enrollment module 210 in the fraudster channel database.

One or more channel match scores may be generated by the scoring module 212 that indicate an at least partial match between the audio sample and one or more channel models in the fraudster channel database.

As with channel models, the audio sample may be compared to a list of speaker models that are associated with disqualified candidates in the fraudster voice database and/or to qualified candidates in the whitelist database. Voice match scores may be generated based on the comparisons. Audio sample match scores may be generated based on the voice match score and the channel match score. Voice match scores, channel match scores, and/or audio sample match scores may be used to generate risk scores for the audio sample, or may be provided to a third party for review. In some instances, a voice match score, channel match score, and/or audio sample match score may be provided to a call agent that is currently speaking with the speaker from which the audio sample was captured. That is, the system 200 may operate in near-real-time such that risk scores based on audio samples may be obtained during a transaction between a call agent and a candidate. Risk scores generated from various match scores (voice match scores, channel match scores, and/or audio sample match scores) may be generated and provided to the call agent to assist the call agent in conducting the transaction. Upon receiving a risk score that indicates a high degree of risk, the call agent may prompt the candidate for further information, may flag the call event for further review, and so forth. Conversely, upon receiving a risk score that indicates a low degree of risk, the call agent may approve the current transaction. It is noteworthy that the risk score may be utilized along with other scores to determine a risk level for a call event. For example, a risk score may indicate a level of risk rather than indicate that a particular caller is either a fraudster or a legitimate caller. Thus, if the risk score is high, that may indicate that the call event should be evaluated more carefully. The risk score may be combined with other risk scores using various relations or functions.
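The way a risk score might be surfaced to an agent during a call can be sketched as a simple banding function. The function name, the 0 to 1000 scale, the thresholds of 700 and 300, and the wording of the guidance are hypothetical; they merely illustrate that the score indicates a level of risk rather than a verdict.

```python
def agent_guidance(risk_score, high=700, low=300):
    """Translate a 0-1000 risk score into a suggested agent action."""
    if not 0 <= risk_score <= 1000:
        raise ValueError("risk score must lie between 0 and 1000")
    if risk_score >= high:
        return "prompt for further information or flag for review"
    if risk_score <= low:
        return "low risk: transaction may proceed"
    # Mid-range scores are left to be weighed with other scores.
    return "evaluate alongside other scores"
```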

FIG. 3 illustrates a flowchart of an exemplary method for processing audio signals. The method may include a step 305 of receiving an audio signal and an associated audio signal identifier (and/or non-audio data). The audio signal may be received without regard to fraud activities. The audio signal may be received from a call center, or may be included in a diarized segment extracted from call data. The method may also include a step 310 of receiving a fraud event identifier associated with a fraud event. The fraud event identifier may include a timestamp that indicates an approximate time that a fraud event occurred.

Next, the method may include a step 315 of determining a speaker model and a channel model based on the received audio signal. The channel model may represent distortion included in the audio signal that uniquely identifies details of the audio signal such as country of origin, communications protocols and paths, and the like. The combination of the received audio signal, speaker characteristics and the channel characteristics provide a robust set of data that may be compared against various fraudster databases to determine if the caller associated with the audio file is a fraudster.

The method may also include a step 320 of updating a fraudster channel database to include the determined channel model based on a comparison of the audio signal identifier and the fraud event identifier, along with a step 325 of updating a fraudster voice database to include the determined speaker model based on a comparison of the audio signal identifier and the fraud event identifier.

The method may also include various steps for generating different types of match scores and risk scores for the speaker model, channel model, and the audio signal. For example, the method may include an optional step 330 of receiving a candidate audio sample and a step 335 of determining a channel match score based on a match between the candidate audio sample and a channel model in the fraudster channel database.

Additionally, the method may include a step 340 of determining a voice match score based on a match between the candidate audio sample and a speaker model in the fraudster voice database, along with a step 345 of determining an audio sample risk score based on the channel match score and the voice match score.
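Steps 330 through 345 may be pictured with the following sketch, in which the candidate's channel and speaker models are assumed to have been extracted already and the comparison functions are supplied by the caller. The function name, the dictionary layout, and the use of a plain average in step 345 are assumptions made for illustration.

```python
def screen_candidate(candidate, channel_db, voice_db, channel_match, voice_match):
    """Return match scores and a combined risk score for one candidate.

    `candidate` is assumed to be a dict holding already-extracted
    "channel" and "speaker" models (step 330).
    """
    # Step 335: best channel match against the fraudster channel database.
    channel_score = max(
        (channel_match(candidate["channel"], m) for m in channel_db), default=0.0)
    # Step 340: best voice match against the fraudster voice database.
    voice_score = max(
        (voice_match(candidate["speaker"], m) for m in voice_db), default=0.0)
    # Step 345: combine the two scores; a plain average is an assumption.
    risk = (channel_score + voice_score) / 2
    return {"channel": channel_score, "voice": voice_score, "risk": risk}
```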

It will be understood that the method may include additional or fewer steps than those listed above. Additionally, optional steps have been shown as dotted lined objects in the Figures.

FIG. 4 illustrates an exemplary computing system 400 that may be used to implement an embodiment of the present technology. The computing system 400 of FIG. 4 may be implemented in the contexts of the likes of computing systems, clouds, modules, engines, networks, servers, and so forth. The computing system 400 of FIG. 4 includes one or more processor units 410 and main memory 420. Main memory 420 stores, in part, instructions and data for execution by processor unit 410. Main memory 420 may store the executable code when in operation. The system 400 of FIG. 4 further includes a mass storage device 430, portable storage device(s) 440, output devices 450, input devices 460, a graphics display 470, and peripherals 480.

The components shown in FIG. 4 are depicted as being connected via a single bus 490. The components may be connected through one or more data transport means. Processor unit 410 and main memory 420 may be connected via a local microprocessor bus, and the mass storage device 430, peripheral(s) 480, portable storage device 440, and display system 470 may be connected via one or more input/output (I/O) buses.

Mass storage device 430, which may be implemented with a magnetic disk drive and/or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 410. Mass storage device 430 may store the system software for implementing embodiments of the present technology for purposes of loading that software into main memory 420. The mass storage device 430 may also be used for storing databases, such as the fraudster voice database, the fraudster channel database, and the precursor database.

Portable storage device 440 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk, digital video disc, or USB storage device, to input and output data and code to and from the computing system 400 of FIG. 4. The system software for implementing embodiments of the present technology may be stored on such a portable medium and input to the computing system 400 via the portable storage device 440.

Input devices 460 provide a portion of a user interface. Input devices 460 may include an alphanumeric keypad, such as a keyboard, for inputting alpha-numeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys. Additionally, the system 400 as shown in FIG. 4 includes output devices 450. Suitable output devices include speakers, printers, network interfaces, and monitors.

Display system 470 may include a liquid crystal display (LCD) or other suitable display device. Display system 470 receives textual and graphical information, and processes the information for output to the display device.

Peripherals 480 may include any type of computer support device to add additional functionality to the computing system. Peripheral device(s) 480 may include a modem or a router.

The components provided in the computing system 400 of FIG. 4 are those typically found in computing systems that may be suitable for use with embodiments of the present technology and are intended to represent a broad category of such computer components that are well known in the art. Thus, the computing system 400 of FIG. 4 may be a personal computer, hand held computing system, telephone, mobile computing system, workstation, server, minicomputer, mainframe computer, or any other computing system. The computer may also include different bus configurations, networked platforms, multi-processor platforms, etc. Various operating systems may be used including Unix, Linux, Windows, Macintosh OS, Palm OS, Android, iOS (iPhone OS), VMWare OS, and other suitable operating systems.

It is noteworthy that any hardware platform suitable for performing the processing described herein is suitable for use with the technology. Computer-readable storage media refer to any medium or media that participate in providing instructions to a central processing unit (CPU), a processor, a microcontroller, or the like. Such media may take forms including, but not limited to, non-volatile and volatile media such as optical or magnetic disks and dynamic memory, respectively. Common forms of computer-readable storage media include a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic storage medium, a CD-ROM disk, digital video disk (DVD), any other optical storage medium, RAM, PROM, EPROM, a FLASH EPROM, and any other memory chip or cartridge.

While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative and not restrictive of the broad disclosure and that this disclosure is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art upon studying this disclosure. In an area of technology such as this, where growth is fast and further advancements are not easily foreseen, the disclosed embodiments may be readily modifiable in arrangement and detail as facilitated by enabling technological advancements without departing from the principles of the present disclosure.

In the foregoing specification, specific embodiments of the present disclosure have been described. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure. The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of any or all the claims. The disclosure is defined solely by the appended claims including any amendments made during the pendency of this application and all equivalents of those claims as issued.

In the foregoing specification, the invention is described with reference to specific embodiments thereof, but those skilled in the art will recognize that the invention is not limited thereto. Various features and aspects of the above-described invention can be used individually or jointly. Further, the invention can be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive. It will be recognized that the terms “comprising,” “including,” and “having,” as used herein, are specifically intended to be read as open-ended terms of art. Moreover, the phrase “at least one of . . . and” and “at least one of . . . or” will both be understood to allow for individual selection of any of the listed features, or any combination of the individual features.

Claims

1. A method for screening callers in a call center, the method comprising:

maintaining a set of channel models, wherein each channel model represents noise, artifacts, distortions, degradations, or modifications of call audio data associated with an instance of fraud;
receiving a screening request, the screening request comprising an audio sample from a caller;
extracting a caller's channel model from the audio sample;
comparing the caller's channel model with channel models in the set of channel models;
generating a channel match score based on the comparison, the channel match score indicating the level of risk that the caller is a fraudster; and
providing the channel match score to a call agent that is currently speaking with the caller.

2. The method according to claim 1, wherein the comparing the caller's channel model with channel models in the set of channel models, comprises:

selecting channel models from the set of channel models based on past calls associated with the caller's account; and
comparing the caller's channel model with the selected channel models.

3. The method according to claim 1, further comprising:

maintaining a set of speaker models, wherein each speaker model represents voice characteristics and linguistic characteristics of a fraudster;
extracting a caller's speaker model from the audio sample;
comparing the caller's speaker model with speaker models in the set of speaker models;
generating a voice match score based on the comparison, the voice match score indicating the level of risk that the caller is a fraudster; and
providing the voice match score to the call agent that is currently speaking with the caller.

4. The method according to claim 3, wherein the comparing the caller's speaker model with speaker models in the set of speaker models, comprises:

selecting, based on the channel match score, speaker models from the set of speaker models; and
comparing the caller's speaker model with the selected speaker models.

5. The method according to claim 3, further comprising:

generating a risk score based on the voice match score and the channel match score; and
providing the risk score to the call agent that is currently speaking with the caller, the risk score indicating the level of risk that the caller is a fraudster.

6. The method according to claim 1, wherein the caller's channel model comprises noise, artifacts, distortions, degradations, or modifications resulting from a telecommunications path between the caller and the call center.

7. The method according to claim 6, wherein the noise, artifacts, distortions, degradations, or modifications indicate a landline, VoIP phone, or cellular phone.

8. The method according to claim 6, wherein the noise, artifacts, distortions, degradations, or modifications indicate a CDMA, GSM, or VoIP communication method.

9. The method according to claim 6, wherein the noise, artifacts, distortions, degradations, or modifications indicate a geographic region of the caller.

10. The method according to claim 1, wherein the caller's channel model comprises noise, artifacts, distortions, degradations, or modifications resulting from one or more devices used by the caller.

11. The method according to claim 10, wherein the one or more devices comprises a voice changer.

12. The method according to claim 10, wherein the one or more devices comprises a microphone in a handset used by the caller.

13. A system for screening callers in a call center, the system comprising:

a call database storing a set of channel models, wherein each channel model represents noise, artifacts, distortions, degradations, or modifications of call audio data associated with an instance of fraud; and
a computing system in communication with the call database comprising a processor, a display system, and a memory, wherein the memory stores computer-readable instructions causing the processor to perform operations comprising: receiving a screening request, the screening request comprising an audio sample from a caller, extracting a caller's channel model from the audio sample, comparing the caller's channel model with channel models in the set of channel models, generating a channel match score based on the comparison, the channel match score indicating the level of risk that the caller is a fraudster, and transmitting the channel match score to the display system used by a call agent that is currently speaking with the caller.

14. The system according to claim 13, wherein the caller's channel model comprises noise, artifacts, distortions, degradations, or modifications resulting from a telecommunications path between the caller and the call center.

15. The system according to claim 14, wherein the noise, artifacts, distortions, degradations, or modifications indicate a landline, VoIP phone, or cellular phone.

16. The system according to claim 14, wherein the noise, artifacts, distortions, degradations, or modifications indicate a CDMA, GSM, or VoIP communication type.

17. The system according to claim 14, wherein the noise, artifacts, distortions, degradations, or modifications indicate a geographic region of the caller.

18. The system according to claim 13, wherein the caller's channel model comprises noise, artifacts, distortions, degradations, or modifications resulting from one or more devices used by the caller.

19. The system according to claim 18, wherein the one or more devices comprises a voice changer.

20. A non-transitory tangible computer readable storage medium containing computer readable program code that when executed by a processor of a computing device causes the computing device to perform operations comprising:

maintaining a set of channel models, wherein each channel model represents noise, artifacts, distortions, degradations, or modifications of call audio data associated with an instance of fraud;
receiving a screening request, the screening request comprising an audio sample from a caller;
extracting a caller's channel model from the audio sample;
comparing the caller's channel model with channel models in the set of channel models;
generating a channel match score based on the comparison, the channel match score indicating the level of risk that the caller is a fraudster; and
providing the channel match score to a call agent that is currently speaking with the caller.
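
By way of non-limiting illustration only, and forming no part of any claim, the screening flow recited in claims 1, 3, and 5 might be sketched as follows. The sketch rests on several assumptions that do not come from this disclosure: channel models and speaker models are represented as plain numeric feature vectors, the comparison is a cosine similarity, the match score is the highest similarity found against the stored fraud-associated models, and the risk score is a weighted average of the two match scores. The names `cosine_similarity`, `match_score`, `risk_score`, and `channel_weight` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (assumed comparison metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no information to compare
    return dot / (norm_a * norm_b)


def match_score(caller_model: Vector, fraud_models: Sequence[Vector]) -> float:
    """Best similarity between the caller's model and any stored fraud model.

    The result is clamped to [0, 1]; an empty model set yields 0.0.
    """
    if not fraud_models:
        return 0.0
    best = max(cosine_similarity(caller_model, m) for m in fraud_models)
    return max(0.0, min(1.0, best))


def risk_score(channel_match: float, voice_match: float,
               channel_weight: float = 0.5) -> float:
    """Combine both match scores with an assumed weighted average."""
    return channel_weight * channel_match + (1.0 - channel_weight) * voice_match


if __name__ == "__main__":
    # Hypothetical stored models associated with past instances of fraud.
    fraud_channel_models = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]]
    fraud_speaker_models = [[0.3, 0.3, 0.4]]

    # Hypothetical models extracted from the caller's audio sample.
    caller_channel = [0.8, 0.2, 0.0]
    caller_speaker = [0.1, 0.9, 0.0]

    channel_match = match_score(caller_channel, fraud_channel_models)
    voice_match = match_score(caller_speaker, fraud_speaker_models)
    print("channel match score:", round(channel_match, 3))
    print("voice match score:", round(voice_match, 3))
    print("risk score:", round(risk_score(channel_match, voice_match), 3))
```

In such a sketch the three printed values stand in for the scores that would be provided to the call agent. How the models are actually extracted from audio, and how the scores are computed and combined in any given embodiment, is governed by the specification rather than by this example.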
Patent History
Publication number: 20170133017
Type: Application
Filed: Oct 13, 2016
Publication Date: May 11, 2017
Inventors: Anthony Rajakumar (Fremont, CA), Torsten Zeppenfeld (Emerald Falls, CA), Lisa Guerra (Los Altos, CA), Vipul Vyas (Palo Alto, CA)
Application Number: 15/292,659
Classifications
International Classification: G10L 17/06 (20060101); G06Q 20/40 (20060101); G10L 17/26 (20060101); H04M 15/00 (20060101); G10L 17/02 (20060101);