Sunday, April 22, 2012

Depth First Search and Breadth First Search (graph search series)

I shall be writing a series of posts on graph search algorithms, starting from this one.

Depth first search and breadth first search are among the simplest of algorithms and are generally used as a basis for implementing more sophisticated searches rather than just to be used as is. These algorithms are used as a way to traverse the nodes of trees or general graphs in a systematic way according to how the nodes are connected.

Depth first search starts from the root node, or simply the start node in a general graph, process the node, take a new neighboring node and repeat. When there are no more new neighboring nodes to take, go back to the node that got you to the current node, take another new node and repeat. If you go back to the root then you have traversed all nodes and the process ends.

If "O" is an unprocessed node, "X" is a processed node and "(X)" is the current node, here's what would happen to this tree:
_____+-------O-------+
 +---O---+       +---O---+
 O       O       O       O

     +------(X)------+
 +---O---+       +---O---+
 O       O       O       O

     +-------X-------+
 +--(X)--+       +---O---+
 O       O       O       O

     +-------X-------+
 +---X---+       +---O---+
(X)      O       O       O

     +-------X-------+
 +--(X)--+       +---O---+
 X       O       O       O

     +-------X-------+
 +---X---+       +---O---+
 X      (X)      O       O

     +-------X-------+
 +--(X)--+       +---O---+
 X       X       O       O

     +------(X)------+
 +---X---+       +---O---+
 X       X       O       O

     +-------X-------+
 +---X---+       +--(X)--+
 X       X       O       O

     +-------X-------+
 +---X---+       +---X---+
 X       X      (X)      O

     +-------X-------+
 +---X---+       +--(X)--+
 X       X       X       O

     +-------X-------+
 +---X---+       +---X---+
 X       X       X      (X)

     +-------X-------+
 +---X---+       +--(X)---+
 X       X       X       X

     +------(X)------+
 +---X---+       +---X---+
 X       X       X       X

     +-------X-------+
 +---X---+       +---X---+
 X       X       X       X

For this to work you need enough memory for remembering the path you have took to get to the current node, assuming that there is a natural order for accessing the next nodes from the current node (that is, you will not revisit the same next node from the current node twice).

Breadth first search starts from the root node, or simply the start node in a general graph, and enqueues it into a queue. Then the next node in the queue is dequeued, the node is processed, all of its neighboring nodes are enqueued into the queue and repeat. If the queue is empty because no processed node has any neighboring nodes to add anymore, all the nodes were traversed and the process ends.

If "O" is an unprocessed node, "X" is a processed node and "(X)" is the current node, here's what would happen to this tree:
_____+-------O-------+
 +---O---+       +---O---+
 O       O       O       O

     +------(X)------+
 +---O---+       +---O---+
 O       O       O       O

     +-------X-------+
 +--(X)--+       +---O---+
 O       O       O       O

     +-------X-------+
 +---X---+       +--(X)--+
 O       O       O       O

     +-------X-------+
 +---X---+       +---X---+
(X)      O       O       O

     +-------X-------+
 +---X---+       +---X---+
 X      (X)      O       O

     +-------X-------+
 +---X---+       +---X---+
 X       X      (X)      O

     +-------X-------+
 +---X---+       +---X---+
 X       X       X      (X)

     +-------X-------+
 +---X---+       +---X---+
 X       X       X       X

For this to work you need enough memory for remembering the queue of nodes that are next to process. The breadth first search is great for working on graphs with infinitely long paths as it does not get stuck traversing the same path but checks all the paths simultaneously as they are gradually traversed. The nodes on the queue are called the "fringe" as they are the edges of this ever widening circle of exploration.

Here is the pseudo code. We shall be using a successor function which gives a list of the next nodes from a given node in order to keep the algorithms as generic as possible.

Iterative version of depth first search:
sub depthFirstSearch(startNode, successorFunction, nodeProcessor):
  stack = [[startNode]]
  while stack is not empty:
    currSiblings = stack.pop()
    currNode = currSiblings.pop()
    if currSiblings is not empty:
      stack.push(currSiblings)

    nodeProcessor(currNode)
    
    nextSiblings = successorFunction(currNode)
    if nextSiblings is not empty:
      stack.push(nextSiblings)

Of course depth first search is naturally recursive so we can do it like this:
sub depthFirstSearch(startNode, successorFunction, nodeProcessor):
  sub _depthFirstSearch(currNode):
    nodeProcessor(currNode)
    for each nextNode in successorFunction(currNode):
      _depthFirstSearch(nextNode)

  _depthFirstSearch(startNode)

Breadth first search is more naturally iterative than recursive:
sub breadthFirstSearch(startNode, successorFunction, nodeProcessor):
  queue = [startNode]
  while queue is not empty:
    currNode = queue.dequeue()

    nodeProcessor(currNode)

    nextNodes = successorFunction(currNode)
    if nextNodes is not empty:
      queue.enqueueAll(nextNodes)

However, this is what it would look like recursively:
sub breadthFirstSearch(startNode, successorFunction, nodeProcessor):
  sub _breadthFirstSearch(queue):
    if queue is not empty:
      currNode = queue.dequeue()

      nodeProcessor(currNode)

      nextNodes = successorFunction(currNode)
      if nextNodes is not empty:
        queue.enqueueAll(nextNodes)

      _breadthFirstSearch(queue)

  _breadthFirstSearch([startNode])

Friday, April 20, 2012

Finding the median value using MySQL sql

The median of a finite list of numbers is defined as the number in the middle of the list when the list is sorted, or the average (arithmetic mean) of the two numbers in the middle if the size of the list is even.

Since, as I described in my previous post, averages are meaningless when it comes to something like marks, I've decided to start using medians instead since it would be more useful to know in which half of the class you stand with your mark, and medians are much more intuitive than means.

Here is the SQL to find the median in MySQL (adapted from http://stackoverflow.com/a/7263925)

SET @rownum := -1;

SELECT
   AVG(t.mark)
FROM
(
   SELECT
      @rownum := @rownum + 1 AS rownum,
      results.mark AS mark
   FROM
      results
   ORDER BY results.mark
) AS t
WHERE
   t.rownum IN (
      CEIL(@rownum/2),
      FLOOR(@rownum/2)
   )
;

Here's a brief explanation:

In MySQL we cannot use expressions in LIMIT, so instead we have to simulate a LIMIT by using a counter with user defined variables. Each row in the table will have a row number associated with it starting from 0 in the variable @rownum.

We then choose from these numbered rows the ceiling and floor of "(number of rows - 1)/2" which will be the same middle value if the number of rows is odd or the middle two values if the number of rows is even. The variable will still retain its value after all the rows are numbered so we can use it to know the number of rows.

Since the average of one number is the number itself, we can simply calculate the average of whatever is returned without checking. If the number of rows is odd then we shall find the average of the middle value which will remain the same and if the number of rows is even then we shall find the average of the middle two values will will be the correct median.

Don't forget that if you're using this in PHP you have to use two "mysql_query" calls, one for the first SET and another for the SELECT as you cannot use ";" in queries.