Levenshtein Distance

Given a misspelled string s, a spell checker should return words in the dictionary which are close to s.

In 1965, Vladimir Levenshtein defined the distance between 2 words as the minimum number of “edits” it would take to transform the misspelled word into a correct word, where a single edit is the insertion, deletion, or substitution of a single character.

Example: Singapore
1. Insertion ‘a’ : Singapoare

2. Deletion (or missing) ‘r’: Singapo_e

3. Substitution ‘e’ to ‘a’: Singapora

Given 2 strings (represented as arrays of characters) A and B.
Length of A = a
Length of B = b
Distance =D(A, B) = The minimum number of edits to transform A to B.
\boxed{  D(A, B) = 1 + \min  \begin{cases} D(A[0: a-2] , B[0: b-2]) \\  D(A[0: a-2], B)\\  D(A , B[0: b-2])\\  \end{cases}  }

image

Note 1: Below is the C++ code for computing D(A,B):

20131027-023754.jpg

Note 2:
A = Orchestra
B = Carthorse
D(A,B) = 8
Longest: ‘r, h, s’

20131027-024724.jpg

Remark: a common misspell error ‘Transposition’ could also be considered as an edit:

Signapore -> Singapore

This is the “better” Damerau-Levenshtein Distance:

http://en.m.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance

[References]:

http://en.m.wikipedia.org/wiki/Levenshtein_distance

Elements of Programming Interviews: 300 Questions and Solutions

elementsofprogramminginterviews.com

NLB Call No: 005.1 AZI

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s