Wednesday, October 14, 2015

On searching strings by pronunciation: phonetic algorithms, Soundex, fuzzy search algorithms, and autocomplete


Approximate string matching
In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately rather than exactly. The problem is typically divided into two sub-problems: finding approximate substring matches inside a given string, and finding dictionary strings that match the pattern approximately.
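To make the second sub-problem (dictionary matching) concrete, here is a minimal Python sketch. It assumes Levenshtein edit distance as the measure of "approximately", which is only one common choice and is not prescribed by the quoted text. The function names `levenshtein` and `closest_words` and the `max_distance` parameter are made up for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def closest_words(pattern: str, dictionary: list[str], max_distance: int = 2) -> list[str]:
    """Return dictionary words within max_distance edits of pattern,
    nearest first (hypothetical helper for this example)."""
    scored = sorted((levenshtein(pattern, word), word) for word in dictionary)
    return [word for distance, word in scored if distance <= max_distance]


if __name__ == "__main__":
    print(closest_words("Knuht", ["Knuth", "Kant", "Kent"]))
```

This scans the whole dictionary for every query, which is fine for small word lists. Larger dictionaries usually call for an index, and a phonetic code such as Soundex is one way to build one.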







Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.[1] The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. Soundex is the most widely known of all phonetic algorithms (in part because it is a standard feature of popular database software such as DB2, PostgreSQL,[2] MySQL,[3] Ingres, MS SQL Server[4] and Oracle[5]) and is often used (incorrectly) as a synonym for "phonetic algorithm". Improvements to Soundex are the basis for many modern phonetic algorithms.[6]
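Soundex is a phonetic algorithm that computes an approximate code from the pronunciation of an English word. The code consists of four characters: the first is a letter and the remaining three are digits. In alphabetic writing systems you sometimes know how a word sounds but cannot spell it correctly, and Soundex can give an effect similar to fuzzy matching in that case. For example, the two strings Knuth and Kant both have the Soundex value "K530". The algorithm is described in detail in Donald Knuth's classic work The Art of Computer Programming.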
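The encoding can be sketched in a few lines of Python. This is a minimal illustration that assumes the common American Soundex rules (the variant described by the U.S. National Archives), not the exact implementation of any particular database. The names `soundex` and `CODES` are chosen for this example only.

```python
# Letter groups that share a Soundex digit (American Soundex assumed).
CODES = {}
for digit, group in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                     "4": "L", "5": "MN", "6": "R"}.items():
    for letter in group:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return a four-character code: one letter followed by three digits."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]                # the first letter is kept as-is
    prev = CODES.get(letters[0], "")     # its code still blocks an adjacent duplicate
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not separate equal codes
        code = CODES.get(ch, "")         # vowels and Y have no code
        if code and code != prev:
            result.append(code)
        prev = code                      # a vowel resets prev, so a repeat is coded again
    return "".join(result)[:4].ljust(4, "0")  # truncate or pad with zeros


if __name__ == "__main__":
    for name in ("Knuth", "Kant", "Robert", "Rupert"):
        print(name, soundex(name))
```

Under these rules Knuth and Kant both map to K530, matching the example above. Database built-ins may differ in edge cases, for instance in how they treat H and W.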

The Soundex Indexing System (U.S. National Archives and Records Administration)
