/ HappyDoc3-r3_1 / happydoclib / docset / docset_TAL / TAL / ndiff.py / SequenceMatcher
Methods
ratio
ratio ( self )
Return a measure of the sequences' similarity (float in [0,1]).
Where T is the total number of elements in both sequences, and
M is the number of matches, this is 2*M / T.
Note that this is 1 if the sequences are identical, and 0 if
they have nothing in common.
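The 2*M / T formula can be checked by hand. The sketch below assumes the standard library's `difflib.SequenceMatcher`, which exposes the same interface as this class; the sample strings are illustrative only.

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
matcher = SequenceMatcher(None, a, b)

# M: total size of all matching blocks (the final sentinel has size 0).
matches = sum(size for _, _, size in matcher.get_matching_blocks())
# T: total number of elements in both sequences.
total = len(a) + len(b)

similarity = 2.0 * matches / total
print(similarity, matcher.ratio())
```

Here the only match is "bcd", so M is 3 and T is 8.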
get_matching_blocks
get_matching_blocks ( self )
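This method has no docstring here; the matching_blocks member listed under __init__ describes the result as (i, j, k) triples ending in a (len(a), len(b), 0) sentinel. A minimal sketch, assuming the standard library's `difflib.SequenceMatcher` behaves like this class:

```python
from difflib import SequenceMatcher

a, b = "abxcd", "abcd"
matcher = SequenceMatcher(None, a, b)

# Normalize to plain tuples; newer Python versions return named tuples.
blocks = [tuple(block) for block in matcher.get_matching_blocks()]

for i, j, k in blocks:
    # Each triple says that a[i:i+k] equals b[j:j+k].
    print(i, j, k, repr(a[i:i + k]), repr(b[j:j + k]))
```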
set_seq1
set_seq1 ( self, a )
set_seq2
set_seq2 ( self, b )
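Because the per-element index (b2j) is built for the second sequence, a common pattern is to set b once and swap a repeatedly. The sketch below assumes the standard library's `difflib.SequenceMatcher`; the word list and variable names are hypothetical.

```python
from difflib import SequenceMatcher

target = "apple"
candidates = ["ape", "apple", "maple", "zebra"]

matcher = SequenceMatcher()
matcher.set_seq2(target)          # index the second sequence once

scores = {}
for word in candidates:
    matcher.set_seq1(word)        # only the first sequence changes
    scores[word] = matcher.ratio()

best = max(scores, key=scores.get)
print(best, scores[best])
```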
__chain_b
__chain_b ( self )
quick_ratio
quick_ratio ( self )
Return an upper bound on ratio() relatively quickly.
get_opcodes
get_opcodes ( self )
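The opcodes member listed under __init__ describes the result as (tag, i1, i2, j1, j2) tuples. One way to see what they mean is to replay them and rebuild b from a. This is a sketch assuming the standard library's `difflib.SequenceMatcher`; `apply_opcodes` is a hypothetical helper, not part of this class.

```python
from difflib import SequenceMatcher

def apply_opcodes(a, b, opcodes):
    """Rebuild b from a by following (tag, i1, i2, j1, j2) tuples."""
    pieces = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            pieces.append(a[i1:i2])      # keep this slice of a
        elif tag in ("replace", "insert"):
            pieces.append(b[j1:j2])      # take the new text from b
        elif tag == "delete":
            pass                         # drop a[i1:i2]
        else:
            raise ValueError("unknown tag: %r" % (tag,))
    return "".join(pieces)

a, b = "qabxcd", "abycdf"
opcodes = SequenceMatcher(None, a, b).get_opcodes()
rebuilt = apply_opcodes(a, b, opcodes)
print(rebuilt)
```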
__helper
__helper ( self, alo, ahi, blo, bhi, answer )
real_quick_ratio
real_quick_ratio ( self )
Return an upper bound on ratio() very quickly.
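Since both quick variants are upper bounds on ratio(), they can serve as cheap rejection tests before the expensive call. A minimal sketch, assuming the standard library's `difflib.SequenceMatcher`; `is_similar` and its `cutoff` parameter are hypothetical names.

```python
from difflib import SequenceMatcher

matcher = SequenceMatcher(None, "abcd", "bcde")
exact = matcher.ratio()
quick = matcher.quick_ratio()
real_quick = matcher.real_quick_ratio()
print(exact, quick, real_quick)

def is_similar(a, b, cutoff=0.6):
    """Test the cheapest bound first; compute ratio() only if needed."""
    m = SequenceMatcher(None, a, b)
    return (m.real_quick_ratio() >= cutoff and
            m.quick_ratio() >= cutoff and
            m.ratio() >= cutoff)
```

If either bound falls below the cutoff, ratio() cannot reach it, so the short-circuit never changes the answer.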
find_longest_match
find_longest_match ( self, alo, ahi, blo, bhi )
Find longest matching block in a[alo:ahi] and b[blo:bhi].

If isjunk is not defined:

Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where
    alo <= i <= i+k <= ahi
    blo <= j <= j+k <= bhi
and for all (i',j',k') meeting those conditions,
    k >= k'
    i <= i'
    and if i == i', j <= j'

In other words, of all maximal matching blocks, return one
that starts earliest in a, and of all those maximal matching
blocks that start earliest in a, return the one that starts
earliest in b.

If isjunk is defined, first the longest matching block is
determined as above, but with the additional restriction that
no junk element appears in the block. Then that block is
extended as far as possible by matching (only) junk elements on
both sides. So the resulting block never matches on junk except
as identical junk happens to be adjacent to an "interesting"
match.

If no blocks match, return (alo, blo, 0).
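The three cases above (no junk function, a junk function, and no match at all) can be compared side by side. This sketch assumes the standard library's `difflib.SequenceMatcher`; the strings and the blank-is-junk lambda are illustrative.

```python
from difflib import SequenceMatcher

a, b = " abcd", "abcd abcd"

# No junk function: the longest block includes the leading blank.
plain = SequenceMatcher(None, a, b)
with_blank = tuple(plain.find_longest_match(0, len(a), 0, len(b)))

# Blanks are junk: the block is chosen without them, then extended
# only by junk that happens to be adjacent on both sides.
skip_blanks = SequenceMatcher(lambda x: x == " ", a, b)
without_blank = tuple(skip_blanks.find_longest_match(0, len(a), 0, len(b)))

# Nothing in common: the result is (alo, blo, 0).
disjoint = SequenceMatcher(None, "ab", "c")
no_match = tuple(disjoint.find_longest_match(0, 2, 0, 1))

print(with_blank, without_blank, no_match)
```

With blanks treated as junk, " abcd" can no longer match the " abcd" at the tail of b directly, so the "abcd" that starts earliest in b is returned instead.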
set_seqs
set_seqs ( self, a, b )
__init__
__init__ ( self, isjunk=None, a='', b='' )
Members:
    a
        first sequence
    b
        second sequence; differences are computed as "what do
        we need to do to a to change it into b?"
    b2j
        for x in b, b2j[x] is a list of the indices (into b)
        at which x appears; junk elements do not appear
    b2jhas
        b2j.has_key
    fullbcount
        for x in b, fullbcount[x] == the number of times x
        appears in b; only materialized if really needed (used
        only for computing quick_ratio())
    matching_blocks
        a list of (i, j, k) triples, where a[i:i+k] == b[j:j+k];
        ascending & non-overlapping in i and in j; terminated by
        a dummy (len(a), len(b), 0) sentinel
    opcodes
        a list of (tag, i1, i2, j1, j2) tuples, where tag is
        one of
            replace   a[i1:i2] should be replaced by b[j1:j2]
            delete    a[i1:i2] should be deleted
            insert    b[j1:j2] should be inserted
            equal     a[i1:i2] == b[j1:j2]
    isjunk
        a user-supplied function taking a sequence element and
        returning true iff the element is "junk" -- this has
        subtle but helpful effects on the algorithm, which I'll
        get around to writing up someday <0.9 wink>.
        DON'T USE! Only __chain_b uses this. Use isbjunk.
    isbjunk
        for x in b, isbjunk(x) == isjunk(x) but much faster;
        it's really the has_key method of a hidden dict.
        DOES NOT WORK for x in a!
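The b2j and fullbcount descriptions above amount to two small lookup tables. The sketch below only reproduces the invariants as stated; it is not the class's actual __chain_b code, and `build_b2j` is a hypothetical name.

```python
from collections import Counter

def build_b2j(b, isjunk=None):
    """Map each non-junk element of b to the indices where it appears."""
    b2j = {}
    for index, element in enumerate(b):
        b2j.setdefault(element, []).append(index)
    if isjunk is not None:
        # Junk elements do not appear in the finished mapping.
        for element in list(b2j):
            if isjunk(element):
                del b2j[element]
    return b2j

b = "ab ab"
b2j = build_b2j(b, isjunk=lambda x: x == " ")

# fullbcount[x] is the number of times x appears in b.
fullbcount = Counter(b)
print(b2j, dict(fullbcount))
```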