Welcome toVigges Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

string - Word-level edit distance of a sentence

Is there an algorithm that lets you find the word-level edit distance between 2 sentences? For eg., "A Big Fat Dog" and "The Big House with the Fat Dog" have 1 substitute, 3 insertions

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

In general, this is called the sequence alignment problem. Actually it does not matter what entities you align - bits, characters, words, or DNA bases - as long as the algorithm works for one type of items it will work for everything else. What matters is whether you want global or local alignment.

Global alignment, which attempt to align every residue in every sequence, is most useful when the sequences are similar and of roughly equal size. A general global alignment technique is the Needleman-Wunsch algorithm algorithm, which is based on dynamic programming. When people talk about Levinstain distance they usually mean global alignment. The algorithm is so straightforward, that several people discovered it independently, and sometimes you may come across Wagner-Fischer algorithm which is essentially the same thing, but is mentioned more often in the context of edit distance between two strings of characters.

Local alignment is more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. The Smith-Waterman algorithm is a general local alignment method also based on dynamic programming. It is quite rarely used in natural language processing, and more often - in bioinformatics.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to Vigges Developer Community for programmer and developer-Open, Learning and Share
...