HappyDoc Generated Documentation Class: SequenceMatcher

HappyDoc3-r3_1 / happydoclib / docset / docset_TAL / TAL / ndiff.py / SequenceMatcher 

Methods   
  ratio 
ratio ( self )

Return a measure of the sequences' similarity (float in [0,1]).

Where T is the total number of elements in both sequences, and M is the number of matches, this is 2*M / T. Note that this is 1 if the sequences are identical, and 0 if they have nothing in common.
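The 2*M / T formula can be checked by hand. The sketch below assumes the standard library's difflib.SequenceMatcher, which exposes methods of the same names; the ndiff.py class documented here may differ in detail. The names by_hand, M and T are illustrative only.

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
s = SequenceMatcher(None, a, b)

# M: number of matched elements, summed over all matching blocks
# (the trailing sentinel block has size 0 and adds nothing).
M = sum(size for _, _, size in s.get_matching_blocks())
# T: total number of elements in both sequences.
T = len(a) + len(b)

by_hand = 2.0 * M / T
print(by_hand, s.ratio())  # both print 0.75: "bcd" matches, so M = 3 and T = 8
```

Identical sequences give 1.0 and sequences with no element in common give 0.0, as the description states.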

  get_matching_blocks 
get_matching_blocks ( self )
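
This page gives no description for get_matching_blocks, but the Members list describes matching_blocks as ascending, non-overlapping (i, j, k) triples ending in a (len(a), len(b), 0) sentinel. A minimal sketch of that shape, assuming the standard library's difflib.SequenceMatcher behaves like the class documented here:

```python
from difflib import SequenceMatcher

a, b = "abxcd", "abcd"
blocks = SequenceMatcher(None, a, b).get_matching_blocks()

for i, j, k in blocks:
    # Every triple satisfies a[i:i+k] == b[j:j+k]; the last one is the
    # zero-length sentinel at (len(a), len(b), 0).
    print(i, j, k, repr(a[i:i + k]))
```

Here "ab" and "cd" are the two real blocks; the "x" in a has no counterpart in b.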
  set_seq1 
set_seq1 ( self,  a )
  set_seq2 
set_seq2 ( self,  b )
  __chain_b 
__chain_b ( self )
  quick_ratio 
quick_ratio ( self )

Return an upper bound on ratio() relatively quickly.

  get_opcodes 
get_opcodes ( self )
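
This page gives no description for get_opcodes, but the Members list describes opcodes as (tag, i1, i2, j1, j2) tuples that say how to turn a into b. The sketch below replays them, assuming the standard library's difflib.SequenceMatcher; apply_opcodes is a hypothetical helper, not part of the class.

```python
from difflib import SequenceMatcher

def apply_opcodes(a, b, opcodes):
    """Rebuild b from a by following the opcode list (strings only)."""
    out = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            out.append(a[i1:i2])      # unchanged run, copied from a
        elif tag in ("replace", "insert"):
            out.append(b[j1:j2])      # new material comes from b
        # "delete": a[i1:i2] is dropped, so nothing is appended
    return "".join(out)

a, b = "qabxcd", "abycdf"
ops = SequenceMatcher(None, a, b).get_opcodes()
for op in ops:
    print(op)
rebuilt = apply_opcodes(a, b, ops)
```

For these inputs the list starts with a delete of "q" and ends with an insert of "f", and rebuilt equals b.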
  __helper 
__helper (
        self,
        alo,
        ahi,
        blo,
        bhi,
        answer,
        )
  real_quick_ratio 
real_quick_ratio ( self )

Return an upper bound on ratio() very quickly.
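
Because both quick methods return upper bounds on ratio(), they can serve as cheap filters before the full computation. A minimal sketch, assuming the standard library's difflib.SequenceMatcher; is_similar and its cutoff parameter are hypothetical names chosen for illustration.

```python
from difflib import SequenceMatcher

def is_similar(a, b, cutoff=0.6):
    """Return True if ratio() >= cutoff, trying the cheap bounds first."""
    s = SequenceMatcher(None, a, b)
    # If an upper bound is already below the cutoff, ratio() must be too,
    # so the expensive call is skipped by short-circuit evaluation.
    return (s.real_quick_ratio() >= cutoff
            and s.quick_ratio() >= cutoff
            and s.ratio() >= cutoff)

s = SequenceMatcher(None, "abcd", "bcde")
print(s.ratio(), s.quick_ratio(), s.real_quick_ratio())  # 0.75 0.75 1.0
print(is_similar("abcd", "bcde"), is_similar("abcd", "wxyz"))  # True False
```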

  find_longest_match 
find_longest_match (
        self,
        alo,
        ahi,
        blo,
        bhi,
        )

Find longest matching block in a[alo:ahi] and b[blo:bhi].

If isjunk is not defined:

Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where

        alo <= i <= i+k <= ahi
        blo <= j <= j+k <= bhi

and for all (i',j',k') meeting those conditions,

        k >= k'
        i <= i'
        and if i == i', j <= j'

In other words, of all maximal matching blocks, return one that starts earliest in a, and of all those maximal matching blocks that start earliest in a, return the one that starts earliest in b.

If isjunk is defined, first the longest matching block is determined as above, but with the additional restriction that no junk element appears in the block. Then that block is extended as far as possible by matching (only) junk elements on both sides. So the resulting block never matches on junk except as identical junk happens to be adjacent to an "interesting" match.

If no blocks match, return (alo, blo, 0).
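
The three cases above (no isjunk, isjunk defined, no match) can be tried out as follows. This is a sketch assuming the standard library's difflib.SequenceMatcher, whose find_longest_match returns a named tuple that unpacks like the (i,j,k) triple described here; the variable names are illustrative.

```python
from difflib import SequenceMatcher

a, b = " abcd", "abcd abcd"

# No isjunk: the longest block is " abcd", found at a[0:5] and b[4:9].
plain = SequenceMatcher(None, a, b)
i, j, k = plain.find_longest_match(0, len(a), 0, len(b))
print(i, j, k)  # 0 4 5

# Blanks are junk: the block may not contain a blank, so "abcd" wins,
# and the earliest occurrence in b (index 0) is chosen.
junk_aware = SequenceMatcher(lambda x: x == " ", a, b)
i2, j2, k2 = junk_aware.find_longest_match(0, len(a), 0, len(b))
print(i2, j2, k2)  # 1 0 4

# Nothing matches inside the given ranges: the result is (alo, blo, 0).
none_found = SequenceMatcher(None, "xab", "yc").find_longest_match(1, 3, 1, 2)
print(tuple(none_found))  # (1, 1, 0)
```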

  set_seqs 
set_seqs (
        self,
        a,
        b,
        )
  __init__ 
__init__ (
        self,
        isjunk=None,
        a='',
        b='',
        )

Members:

  a
        first sequence
  b
        second sequence; differences are computed as "what do we need to do to a to change it into b?"
  b2j
        for x in b, b2j[x] is a list of the indices (into b) at which x appears; junk elements do not appear
  b2jhas
        b2j.has_key
  fullbcount
        for x in b, fullbcount[x] == the number of times x appears in b; only materialized if really needed (used only for computing quick_ratio())
  matching_blocks
        a list of (i, j, k) triples, where a[i:i+k] == b[j:j+k]; ascending & non-overlapping in i and in j; terminated by a dummy (len(a), len(b), 0) sentinel
  opcodes
        a list of (tag, i1, i2, j1, j2) tuples, where tag is one of
                replace    a[i1:i2] should be replaced by b[j1:j2]
                delete     a[i1:i2] should be deleted
                insert     b[j1:j2] should be inserted
                equal      a[i1:i2] == b[j1:j2]
  isjunk
        a user-supplied function taking a sequence element and returning true iff the element is "junk" -- this has subtle but helpful effects on the algorithm, which I'll get around to writing up someday <0.9 wink>. DON'T USE! Only __chain_b uses this. Use isbjunk.
  isbjunk
        for x in b, isbjunk(x) == isjunk(x) but much faster; it's really the has_key method of a hidden dict. DOES NOT WORK for x in a!
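
The b2j and fullbcount members are index structures built from b. The sketch below is not the class's own code: build_b2j and build_fullbcount are hypothetical standalone helpers that produce dictionaries matching the descriptions above, for illustration only.

```python
from collections import Counter

def build_b2j(b, isjunk=None):
    """Map each non-junk element x of b to the list of indices where it occurs."""
    b2j = {}
    for index, x in enumerate(b):
        if isjunk is not None and isjunk(x):
            continue  # junk elements do not appear in b2j
        b2j.setdefault(x, []).append(index)
    return b2j

def build_fullbcount(b):
    """Map each element x of b to the number of times it occurs (junk included)."""
    return dict(Counter(b))

b = "abcab"
print(build_b2j(b))                       # {'a': [0, 3], 'b': [1, 4], 'c': [2]}
print(build_b2j(b, lambda x: x == "b"))   # {'a': [0, 3], 'c': [2]}
print(build_fullbcount(b))                # {'a': 2, 'b': 2, 'c': 1}
```

A lookup table of this kind lets the matcher find every position in b where a given element of a could start a match without rescanning b.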


This document was automatically generated Tue Dec 5 08:30:41 2006 by HappyDoc version 3.1