JMP

XMPPTwitterReddit
Featured Image

Computing International Call Rates with a Trie

singpolyma@singpolyma.net

A few months ago we launched International calling with JMP.  One of the big tasks leading up to this launch was computing the rate card: that is, how much calls to different destinations would cost per minute.  While there are many countries in the world, there are even more calling destinations.  Our main carrier partner for this feature lists no fewer than 59881 unique phone number prefixes in the rates they charge us.  This list is, quite frankly, incomprehensible.  One can use it to compute the cost of a call to a particular number, but it gives no confidence about the cost of calls in general.  Many items on this list are similar, and so I set out to create a better list.

My first attempt was a simple one-pass algorithm.  This would record each prefix with its price and then if a longer prefix with a different price were discovered it would add that as well.  This removes the most obvious effectively-duplicate data, but still left a very large list.  I added our markup and various rounding rules (since increments of whole cents are easier to understand in most cases anyway, for example) which did cut down a bit further, but it became clear that the one-pass was not going to be sufficient.  Consider:

  1. +00 at $0.01
  2. +0010 at $0.02
  3. +0011 at $0.02
  4. +0012 at $0.02
  5. +0013 at $0.02
  6. +0014 at $0.02
  7. +0015 at $0.02
  8. +0016 at $0.02
  9. +0017 at $0.02
  10. +0018 at $0.02
  11. +0019 at $0.02

There are many sets of prefixes that look like this in the data.  Of course the right answer here is that +001 is $0.02, which is much easier to understand than this list, but the algorithm cannot know that until it has seen all 10 overlapping prefixes.  Even worse:

  1. +00 at $0.01
  2. +0010 at $0.02
  3. +0011 at $0.02
  4. +0012 at $0.02
  5. +0013 at $0.02
  6. +0014 at $0.02
  7. +0015 at $0.03
  8. +0016 at $0.02
  9. +0017 at $0.02
  10. +0018 at $0.02
  11. +0019 at $0.02

From this input we would like:

  1. +00 at $0.01
  2. +001 at $0.02
  3. +0015 at $0.03

So just checking if the prefixes we have so far are a fully-overlapped set is not enough.  Well, no problem, it’s not that much data, perhaps I can implement a brute-force approach and be done with it.

Brute force is very slow.  On this data it completed, but as I found I kept wanting to tweak rounding rules and other parts of the overlap detection the speed became really problematic.  So I was searching for a non-bruteforce way that would be optimal across all prefixes and fast enough to re-run often in order to play with the effects of rounding rules.

Trie

As I was discussing the problem with a co-worker, trying to speed up lookups we were thinking about trees.  Maybe a tree where traversal to the next level was determined by the next digit in the prefix?  As we explored what this would look like, it became obvious that we were inventing a Trie.  So I grabbed a gem and started monkeypatching things.

Most Trie implementations are about answering yes/no questions and don’t store anything but the prefix in the tree.  I wanted to be able to “look down” from any node in the tree to see if the data was overlapping, and so storing rates right in the nodes seemed useful:

def add_with(chars, rate)
    if chars.empty? # leaf node for this prefix
        @rate = rate
        terminal!
    else
        add_to_children_tree_with(chars, rate)
    end
end

But sometimes we have a level that doesn’t have a rate, so we need to compute its rate from the majority-same rate of its children:

def rate
    # This level has a known rate already
    return @rate if @rate

    groups =
        children_tree.each_value.to_a         # Immediate children
        .select { |x| x.rate }                # That have a rate
        .combination(2)                       # Pairwise combinations
        .select { |(x, y)| x.rate == y.rate } # That are the same
        .group_by { |x| x.first.rate }        # Group by rate
    unless groups.empty?
        # Whichever rate has the most entries in the children is our rate
        @rate = groups.max_by { |(_, v)| v.length }.first
        return @rate
    end

    # No rate here or below
    nil
end

This algorithm is naturally recursive on the tree, so even if the immediate children don’t have a rate they will compute from their children, etc.  And finally a traversal to turn this all back into the flat list we want to store:

def each
    if rate
        # Find the rate of our parent in the tree,
        # possibly computed in part by asking us
        up = parent
        while up
            break if up.rate
            up = up.parent
        end

        # Add our prefix and rate to the list unless parent has it covered
        yield [to_s, rate] unless up&.rate == rate
    end

    # Add rates from children also
    children_tree.each_value do |child|
        child.each { |x| yield x }
    end
end

This (with rounding rules, etc) cut the list from our original of 59881 down to 4818.  You can browse the result.  It’s not as short as I was hoping for, but many destinations are manageable now, and thanks to a little bit of Computer Science we can tweak it in the future and just rerun this quick script.

Creative Commons Attribution ShareAlike