Monday, November 7, 2011

A more complete proof that the square root of 2 is irrational.

I don't know about you but I never quite liked the usual proof by contradiction that the square root of 2 is irrational. It seems incomplete in some way. I never felt convinced by it. Here's what I feel is the missing piece of the puzzle.

Assume that the square root of 2 is rational. So,
√2 = a/b
=>
2 = (a/b)^2
=>
2 = a^2 / b^2
=>
2 b^2 = a^2

So far so good. The usual proof continues with the following statement:

Since a^2 is equal to a natural number multiplied by 2, a^2 is an even number. But for a^2 to be even, a must be even too (see proof in the appendix at the end). So that means that there is a natural number k where a = 2k.

Since a = 2k and 2 b^2 = a^2,
2 b^2 = a^2
=>
2 b^2 = (2k)^2
=>
2 b^2 = 4 k^2
=>
b^2 = 2 k^2

Just like for a, b must also be an even number.

The proof usually ends right there, claiming that since a and b are both even numbers, then the fraction a/b is not simplified and irreducible, contradicting that a/b exists. But let's see where the proof takes us if we just keep on going.

If both a and b are even, then the fraction a/b can be simplified by dividing both a and b by 2, that is, if a = 2k and b = 2l, then we can say that √2 = k/l. But after doing this we can reapply the same reasoning on k and l and we'll discover that k and l are also both even numbers, and we can do it again and again ad infinitum.

So, which natural numbers can be divided by 2 infinitely? Only 1 number can do that, zero. But replacing zero for both a and b will not make their quotient a real number, or if you want to define 0/0, it will not result in a number whose square equals 2. So there is no fraction a/b which gives √2.

So there you have it, a proof that goes on till the end.

===========================
APPENDIX
Now on to the proof that an even square can only come from an even number squared:

Let a^2 be an even number.

a can either be even or odd, that is there must exist an n where
a = 2n or a = 2n + 1
If a = 2n, a^2 = (2n)^2 = 4 n^2 = 2(2 n^2), which is an even number
If a = 2n + 1, a^2 = (2n + 1)^2 = 4 n^2 + 4n + 1 = 2(2 n^2 + 2n) + 1, which is an odd number

So an even number squared will give an even number and an odd number squares will give an odd number. Hence, a square even number can only come from an even number squared.

Wednesday, November 2, 2011

Wisdom hierarchy vs Bloom's taxonomy

So lately I've been reading about two subjects that I noticed are very related, the Wisdom hierarchy ( http://www.systems-thinking.org/dikw/dikw.htm) and Bloom's taxonomy (http://www.odu.edu/educ/roverbau/Bloom/blooms_taxonomy.htm). First I need to explain each.

Wisdom hierarchy
This is a hierarchy of how wisdom is obtained and describes the relationship between data, information, knowledge, understanding and finally wisdom.

Data
Data is symbols and signals which can be observed and analysed, but perhaps not be processed and organized.
An example of this is seeing the symbols "3", "×", "4", "=" and "12". Those symbols may not mean anything to you if you don't know arithmetic.

Information
Information is data which is given meaning and use. It answers "what", "where", "who" and "when" questions, that is, simple shallow questions. It is when relationships are formed between the different data and context is given to the data. The data has meaning but perhaps it cannot be used.
So now "3×4=12" has a meaning. It means that if you multiply the numbers 3 and 4, the result is equal to 12. You may know what the symbols mean but you may not be in a form that is useful.

Knowledge
Knowledge is a mass of information which is organized in a way to be useful. It answers "how" questions, that is how can I use the information. The information may be useful but perhaps you don't understand why it is related and how to generate new information from it.
So now we have organized every multiplication of two numbers we learned into a multiplication table. If we want to know what a particular multiplication equals, we know how to do that, we simply look it up our multiplication table. You may know how to multiply numbers together but you may not know why when numbers are multiplied they give a particular number as a result.

Understanding
Understanding is when you understand the knowledge, when you find a pattern to the organization and can use the pattern to generate new information. It answers "why" questions, that is, why is the information organized as it is in the knowledge. The knowledge may be understood but perhaps it cannot be judged and compared with other knowledge.
So now we understand that multiplication is repeated addition. Now we can add to our knowledge new information which is generated from our understanding rather than from the external world (such as having to ask someone). You may understand how to do multiplication but you may not be able to compare different methods to doing multiplication.

Wisdom
Wisdom is when you can pass judgement and make decisions to determine what is the best method to use. The question it could answer is "which" questions, that is, which is best.
We now can decide which method we should use to multiply two numbers, be it by looking up the multiplication table, by repeated addition or by long multiplication.

Bloom's taxonomy
This is a way of categorizing exam questions in a hierarchy such that as you go up the pyramid, the higher the level of thought required to answer the question.

Remembering
Remembering type questions are those that only require the student to remember things, without expecting any understanding.
An example question would be "What does the symbol × represent?".

Understanding
Understanding type questions are those that require the student to know what the things they know actually mean.
An example question would be "Explain what the expression 2×3=6 means in your own words.".

Applying
Applying type questions are those that require the student to be able to use what they know in a situation.
An example question would be "How many apples would you have if you had 2 baskets with 3 apples in each?".

Analyzing
Analyzing type questions are those that require the student to break down a problem into parts and see how they are related to each other.
An example question would be "What is the next number in the sequence 21, 42, 63, __".

Evaluating
Evaluating type questions are those that require the student to justify a decision.
An example question would be "Which multiplication method would you use to multiply 128 by 64 and why?".

Creating
Creating type questions are those that require the student to create something new to the student.
An example question would be "If all you have is the product of the sum and difference of two numbers and one of the numbers, how can you find the other number?".

Together
It is clear that there is a relationship between the two hierarchies. We could say that:
Remembering type questions test the student having memorized data.
Understanding type questions test if the student has derived information from data.
Applying type questions test if the student has developed a useful knowledge from the information and if the knowledge can be readily used.
Analysis type questions test if the student has understood the basis of their knowledge and can derive new information from it.
Evaluating type questions test if the student has obtained any wisdom on the subject and hence can make sound judgement about it.

The last question type, creating, is not covered by the Wisdom hierarchy and perhaps it predicts yet another higher level form of cognition, perhaps called "creativity", which is when you use knowledge, understanding and wisdom together to derive new knowledge, understanding and wisdom, where knowledge provides the raw material to act on, understanding provides the ways to rearrange the knowledge and wisdom guides you into choosing a solution path which is most likely to give good results. Once this is done you will have learned from experience and would have added new knowledge, a deeper understanding of that knowledge together with new ways of using it and you would be able to make better judgement in the future.

Monday, September 19, 2011

A lousy tutorial to C# drag and drop

Drag and drop is a neat way to allow the user the "transfer" data into a winform control. Here's how to enable drag and drop in a windows form in C#:

1. Create your source and target controls. In this case we're using 2 ListViews where the one on the right (called listView1) is the source of the drag and the one on the left (called listView2) is target of the drop. However note that the source of the drag can even be from the windows explorer by drag dropping a file into the control.

2. Set the AllowDrop property of the control which will receive the drop to true.

3. Set up an event in the control which will be dragged from, that senses that a drag has been initiated, such as the ItemDrag event or the DragLeave event and call the DoDragDrop method. The DoDragDrop method is to be passed the data to be transferred (the object you want the receiving control to get).

4. Set up the DragOver event in the control which will be dropped on, to check if the data being dragged over can be accepted and change the cursor to an invalid cursor picture if not. You can check the type of the data being dragged by using the GetDataPresent method.

5. Set up the DragDrop event in the control which will be dropped on, to actually do something with the data once it has been dropped. This event will only fire if the DragOver event did not set Effect to None.

And there you have it. A lousy tutorial. However, here's something worth mentioning:

If you are transferring an object which could be one of several types and you want to view the object as its base type, then this will not work, as the GetData method will just return null if you pass it a base class for a type. The only was I found to get around this was by created a proxy object which will contain the object of interest casted as its base class, and then what you receive the proxy object in the receiving control you just get the object of interest from it. Like so:

Sunday, July 17, 2011

Quicksort partitioning

During my years in school and at university I was always exposed to only one algorithm of quicksort partitioning. Let's take a look at some ways to partition a list for sorting.

Quicksort itself works by taking a list, picking an element from the list which is referred to as the "pivot" and splitting the list into two sub lists, the first containing all the elements smaller than the pivot and the second containing all the elements greater than the pivot. Once you have these two sub lists you can sort each one independently of the other since elements in one list will not be moved into the other after the sort is complete.

This splitting into two lists is called "partitioning". Partitioning once will not sort the list but it will allow you to either use a different sorting algorithm on each sub list (partition) or to recursively partition the two partitions until you end up with a partition of 1 or 0 elements, which is necessarily sorted.

For example, partitioning the list [7,3,6,4,1,7,3] using 4 as a pivot will give us a first partition of [3,1,3] and a second partition of [7,6,7]. The pivot itself, along with other duplicates of it, may or may not go in one of the partitions, depending on how the partitioning is done. If it does not go into one of the partitions, then the sort will place the pivot between the 2 partitions after they have been sorted.

Partitioning by filtering

The most intuitive way to partition is by creating 2 new lists, going through the unsorted list and copying elements from the unsorted list into one of the 2 lists. This is memory expensive however as you end up needing twice as much space as the unsorted list takes. The following partitioning algorithms are "in-place" and hence do not need any new lists.

Partitioning by moving the pivot

This is the partitioning algorithm I was familiar with at school. It's quite intuitive but slow when compared to the next algorithm. The way this works is by putting the pivot into its sorted place, that is, the place where it will be after the whole list has been sorted. All the elements smaller than the pivot will be on its left and all the elements larger than the pivot will be on its right. Therefore you would have created 2 partitions, the left side of the pivot and the right side.

The algorithm uses a pivot pointer which keeps track of where the pivot is and an index pointer which is used to compare the pivot to other elements. The pivot pointer starts by being at the right end of the list (you can choose a pivot and swap it with the last element if you don't want to stick to the element which happens to be there) and the index pointer starts by being at the left end of the list. The index moves towards the pivot pointer until it encounters an element which is not on the correct side of the pivot, upon which the index and the pivot and swapped and the the index pointer and pivot pointer swap locations. Once the index pointer and pivot pointer meet, the pivot is in its sorted location and the left and right side of the pivot are partitions.

Pseudo code:
function partition(arr, left, right)
  pivotPtr = right
  indexPtr = left
  while pivotPtr != indexPtr
    if indexPtr < pivotPtr //if index pointer is to the left of the pivot
      while arr[indexPtr] <= arr[pivotPtr] and indexPtr < pivotPtr
        indexPtr++ //move index pointer towards the pivot
      if indexPtr < pivotPtr
        swap(arr[indexPtr], arr[pivotPtr])
        swap(indexPtr, pivotPtr)
    else //if index pointer is to the right of the pivot
      while arr[indexPtr] >= arr[pivotPtr] and indexPtr > pivotPtr
        indexPtr-- //move index pointer towards the pivot
      if indexPtr > pivotPtr
        swap(arr[pivotPtr], arr[indexPtr])
        swap(pivotPtr, indexPtr)
  return pivotPtr

Partitioning by dividing

In the previous partitioning algorithm, we had to constantly swap the pivot in order to eventually put it in its place. This is however unnecessary as partitioning does not require the pivot to be in its sorted place, only that we have 2 partitions, even if the pivot itself is in one of the partitions (it doesn't matter in which one as it could be eventually placed in its sorted place in either partition).

This time we will not care where the pivot is, as long as we know its value. We will need 2 pointers, a high and a low pointer, which will be moving towards each other. The low pointer will expect to encounter only elements which are smaller than the pivot and the high point will expect to encounter only elements which are larger than the pivot. When both pointers encounter a wrong element, they swap the elements and continue moving towards each other. When they eventually meet, all the elements to the left of the meeting point will be smaller than or equal to the pivot and all the elements to the right of the meeting point will be greater than or equal to the pivot.

Since both pointers will be moving toward each other before swapping, this algorithm will do less swaps than the previous one and hence will be much faster. In fact a simple experiment will show that it does half the number of swaps.


Pseudo code:
function partition(arr, left, right, pivot)
  lo = left
  hi = right
  while lo < hi
    while arr[lo] <= pivot and lo < hi
      lo++ //move low pointer towards the high pointer
    while arr[hi] >= pivot and hi > lo
      hi-- //move high pointer towards the low pointer
    if lo < hi
      swap(arr[lo], arr[hi])
  /*
    Since the high pointer moves last, the meeting point should be on an element that is greater than the pivot, that is, the meeting point marks the start of the second partition.
    However if the pivot happens to be the maximum element, the meeting point will simply be the last element and hence will not have any significant meaning.
    Therefore we need to make sure that the returned meeting point is where the starting point of the second partition is, including if the second partition is empty.
  */
  if arr[lo] < pivot
    return lo+1
  else
    return lo

Thursday, June 2, 2011

Negation Introduction

When I was in university, one of my favorite units was Mathematics of Discrete Structures lectured by Dr Gordon Pace. In propositional logic, we learned about the negation introduction rule of inference which was this:

P => Q, P => ~Q
---------------
     ~P

and which was used like this:

1.  | P   Assumed
... | ...
10. | Q

11. | P   Assumed
... | ...
20. | ~Q
21. ~P   Negation introduction from lines 1-10 and 11-20

In other words, if P implies that Q is true and also P implies that Q is not true, then P must itself not be true.

I never actually understood the logic behind it however. We were told that this how a proof by contradiction works. An example of a proof by contradiction, in words, is this:
If it were raining, the streets would be wet. But the streets are not wet. Therefore it is not raining.

The fact that if a particular proposition is true, it would lead to another proposition which we know is false, implies that the first proposition must itself be false.

The problem with the rule of inference we used for negation introduction is that the way I interpret it linguistically is not at all how a proof by contradiction is usually told. If we had to turn the rule of inference into words, it would be this:
If P implies that something is both true and false, then P is itself false.

A rule of inference which is closer to a proof by contradiction would be this:

Q, P => ~Q
----------
    ~P

which would be used like this:

...
10. Q
11. | P    Assumed
... | ...
20. | ~Q
21. ~P     Negation introduction from lines 10 and 11-20

This is closer to a proof by contradiction.
If P implies something which is not true, then P itself is not true.

The question is, can we use this new rule of inference instead of the usual one?

When can the usual rule of inference be more powerful than the proposed one? Only when both Q and ~Q can ONLY be derived using P. Hence P must be assumed in both cases which cannot be done with the proposed rule of inference.

But this can easily be solved by using the lemma Q /\ ~Q => X and tautologies. If P can derive both Q and ~Q, then we can also derive Q /\ ~Q from which further derive anything we want, including something false. Worst case, if we don't know what to derive which is false, we can derive the negation of a tautology, such as ~(A => A). A => A is a tautology and we can always just insert it in a proof.

So using this trick, we'd do the following:

1.  | P          Assumed
... | ...
10. | Q
... | ...
20. | ~Q
21. | Q /\ ~Q    Conjunction introduction from lines 10 and 20
22. | ~(A => A)  Q /\ ~Q => X lemma from line 21
23. A => A       A => A lemma
24. ~P           Negation introduction from lines 1-22 and 23

And therefore, where ever we can use the usual rule of inference, we can also use the proposed rule of inference.

QED :)

Friday, April 22, 2011

Artificial Intelligence

This is an essay I wrote back when I was in university about artificial intelligence. I decided to put it up in my geek blog. Enjoy! By the way, I later learned about the halting problem which makes my last paragraph impossible to be realized if the program is expected to be right every time.

Artificial Intelligence

Intelligence is the ability to do something without previously knowing how and use the new knowledge on other problems, or rather, to learn without being taught. Someone who is stupid and therefore not intelligent is someone who requires instructions in order to do (apparently) everything, much like a computer. So what is artificial intelligence? How can a computer simulate this process?

Well if this is possible then it would imply the end of programming, since essentially there would only be one program which will learn how to do anything the user wants. So basically an intelligent program has 3 phases: You tell it what you want it to do, it figures out how to do it and finally does it. The program can decide that it cannot do what it was told. However the most intelligent program is that which accepts the most specifications. It can also decide that it does not have the necessary resources to do what it was told. Again, the most intelligent program is the one which does the most things with the resources it has.

Learning requires external information. Therefore the program must be allowed to gather information about the problem and perhaps completed research by other sources. This is best accomplished via the Internet, although elicitation with the user is a must. The most intelligent program is that which makes best use of past knowledge and requiring least new knowledge (the ability to reuse knowledge). Of course if the program does not gather new information and relies too much on past knowledge it will end up being “closed minded” and this is not desirable as it will result in a closed region of knowledge with no new ideas.

Just like a computer is built without knowing what it’s going to be used for, so must be an intelligent program. A truly intelligent program would not be made for specific tasks such as recognising images or playing a game. It should be the most reusable application ever programmed. It should not be simply a part of another program, but be the entire program (except perhaps the interface).

So it seems that artificial intelligence means a program which translates specifications to solutions without the programmer knowing the solution. The key point here is that the programmer doesn’t know the solution (OK, no one in the development process knows it). This facilitates programming since the programmer need not know the solution before writing the program but rather leaves it to the program to find the solution. It’s also great when we as humans still have not found a solution (or an efficient one) to a particular problem.

What may the future hold for AI? As already mentioned, one day there will be a published algorithm for true AI which is able to learn how to solve any solvable problem, given a specification which has a high level of expressiveness. The first to benefit would be the humanoid robots which in turn would benefit humans in a variety of ways, which need not be mentioned. However will this mean the end of all human jobs, skilled and unskilled alike? Probably what will happen is that the developed countries will slow down the development of AI in the market in order to preserve jobs. But it might be possible for undeveloped countries to acquire some intelligent robots (through missionaries for example) which will help in some way or another. Perhaps in the developed world some hard to find professions will be filled in by robots, but I doubt they will be popular. However it can be possible that one day no one will work anymore and communism will take over as financial classes will be eradicated. All work will be done by robots and people will live a leisurely life, receiving provisions and resources equally. I doubt this will be allowed.

One application of AI I would pursue would be that of fabricating assignments. It would accept the specs of the assignment in raw format as given to the student, possibly with the addition of some course notes for reference, and the ID number of the student. The routine would understand the spec and using the ID number will generate a unique assignment with compiled code (if any) and documentation and comments. If I were to manage to realise such a routine, I would charge my fellow class mates to write their assignments, except that I wouldn’t do anything except feed the program the spec and ID number. Since every student will receive a unique assignment there will be no fear of plagiarism and detecting that the assignment was generated would be practically impossible unless the lecturers would obtain a copy of the routine and compare the given work with the generated one. Come to think of it, a uniquely generated id number would be better. The reason why AI makes sense to be used is because of the shear difficulty to find an algorithm which solves the problem.

Another application would be the semantic interpretation of a given code. The program would accept a given piece of code (given that it is accepted by the compiler) and produce a description in simple English what happens when it executes, at a given level of abstraction. Of course this can be extended to the interpretation of a given executable file. If this is possible then viruses and all malware would be easily detected with no need for updates. The other way round would also be nice where given a description in simple English, the program generates annotated code, ready to be compiled.

Thursday, April 21, 2011

JQuery event / manipulation not working

This is just a silly mistake I made which I thought I should share with the internetz. When assigning JQuery events to html elements, be sure to do it only after they have been loaded. If you just put them straight into a script file the elements wouldn't have loaded when it executes and hence the event won't work.

Make sure that all html DOM manipulation is done after the html loads by putting the javascript code inside a

$(document).ready(function() {
   event and manipulation stuff here :D
});

Photoshop slices using divs creates gaps

If you've ever used Photoshop to create websites by drawing them and then slicing out the buttons and if you've ever used divs and css positioning to layout the slicing (done automatically by Photoshop, just follow this) then you probably found out that something like this happens:


This happened to me whilst doing my personal website (www.marctanti.com). The solution was found here. Apparently this happens because when the doctype of the html page is set to strict, images are by default set to display:inline which makes them vertically aligned so that the bottom of the image is in line with where the base line of the text is. In the case of sliced web pages, the slices are placed inside divs so the images are aligned with where the base line of text inside the div would be if there was any. But the divs make room for the "descenders" of text, that is, the extra room needed to display the bottom of the letters 'y' and 'g' for example. Therefore the images will not be aligned with the very bottom of the divs.

So to fix this problem we need to make the images aligned to the very bottom of the containing divs by setting the display css of each slice to block. I gave each slice image a class attribute called slice and then added the following css:

.slice
{
 display:block;
}

That fixed it.

Wednesday, March 30, 2011

Database Normalization (1-3 NF)

This is a tutorial for those who are confused about the normal forms due to the extreme confusion you find on the web about the subject. If you want to know what normalization is and why to do it, wikipedia has a great article detailing this information:
http://en.wikipedia.org/wiki/Database_normalization

1NF
1NF is arguably the most ambiguous and confusing normal form on the web. The first normal form is just about making multi-valued fields organized into multiple rows. There are two types of multi-valued fields in unnormalized tables:

This-

IdStudentSubject 1Subject 2
1HarryCharmsPotions
2RonCharmsPotions

And this-

IdStudentSubjects
1HarryCharms, Potions
2RonCharms, Potions

Or as is shown in most examples, this-

IdStudentSubject
1HarryCharms
Potions
2RonCharms
Potions

In both cases we are tying to shove in a list of values, that is a one-to-many or many-to-many relationship with the row, into the row itself. For reasons detailed in the wikipedia article, this should be fixed by creating a separate row for each value and repeating the values in the others fields, like so:

IdStudentSubject
1HarryCharms
1HarryPotions
2RonCharms
2RonPotions

The table is now in 1NF were it not for the primary key losing its uniqueness. The id field must be unique in order to identify each distinct student. To solve this we can do one of two things:

Either we create an additional key field for the subjects and make the primary key of the whole table be a composite key of both student and subject ids, thus making the composite key unique-

StudIdStudentSubjIdSubject
1Harry1Charms
1Harry2Potions
2Ron1Charms
2Ron2Potions

Or we could just prepare for the other normal forms and start splitting the tables now as is usually done in examples-

StudIdStudentSubjId
1Harry1
1Harry2
2Ron1
2Ron2

SubjIdSubject
1Charms
2Potions

Notice that we still need to make the foreign key in the students table part of the primary key in order to preserve uniqueness. If you're wondering why we didn't use a weak entity (bridge table / junction table), that's because it's done in 2NF.

If we had more than one multi-valued field, we'd just create a Cartesian product, like so:

StudIdStudentSubjIdSubjectTeacIdTeacher
1Harry1Charms1Filius
1Harry2Potions2Slughorn
1Harry2Potions3Snape
2Ron1Charms1Filius
2Ron2Potions2Slughorn
2Ron2Potions3Snape

Resulting in 3 tables:

StudIdStudentSubjIdTeacId
1Harry11
1Harry22
1Harry23
2Ron11
2Ron22
2Ron23

SubjIdSubject
1Charms
2Potions

TeacIdTeacher
1Filius
2Slughorn
3Snape

Complete example:

IdStudentSubjIdSubjectRoomRoomHallTeacIdTeacher
1Harry1Charms101A1Filius
2Potions202B2Slughorn
3Snape
2Ron1Charms101A1Filius
2Potions202B2Slughorn
3Snape

Turns into:

StudIdStudentSubjIdSubjectRoomRoomHallTeacIdTeacher
1Harry1Charms101A1Filius
1Harry2Potions202B2Slughorn
1Harry2Potions202B3Snape
2Ron1Charms101A1Filius
2Ron2Potions202B2Slughorn
2Ron2Potions202B3Snape

Notice that the room is assumed to be dependant on the subject such that each subject is taught in its own room.

We may opt to leave the table as it is as it quite complex to break down into smaller tables without any guidance. However if we are to break the table down, the room information would be included in with the subject table since we said that the room is dependant on it, yielding the following:

StudIdStudentSubjIdTeacId
1Harry11
1Harry22
1Harry23
2Ron11
2Ron22
2Ron23

SubjIdSubjectRoomRoomHall
1Charms101A
2Potions202B

TeacIdTeacher
1Filius
2Slughorn
3Snape

2NF
2NF is only applicable on tables with composite keys. If a table does not have a composite key, then it is already in 2NF. To make a table in 2NF, first you make sure it is in 1NF and then you split it into separate tables depending on which part of the composite key the fields depend on. For example in the students' table above, the student name does not depend on the subject id, it depends only on part of the composite key, that is, the student id. So the student name field should go into a separate table which describes the students (together with the primary key of course).

StudIdSubjId
11
12
21
22

StudIdStudent
1Harry
2Ron

SubjIdSubject
1Charms
2Potions

And there is it, the weak entity we are so used to in many to many relationships.

Complete example (following from previous):

StudIdStudentSubjIdTeacId
1Harry11
1Harry22
1Harry23
2Ron11
2Ron22
2Ron23

Turns into:

StudIdSubjIdTeacId
111
122
123
211
222
223

StudIdStudent
1Harry
2Ron

Hence yielding:

StudIdSubjIdTeacId
111
122
123
211
222
223

StudIdStudent
1Harry
2Ron

SubjIdSubjectRoomRoomHall
1Charms101A
2Potions202B

TeacIdTeacher
1Filius
2Slughorn
3Snape

Notice that if in 1NF we did not break down the table, we'd result with the same set of tables by now.

3NF
3NF is the normal form we are used to. All we do is check that every field in a 2NF table depends directly on the primary key. If it doesn't or if it depends on a non-primary key field, you place it in its own table. For example if we had the following table:

SubjIdSubjectRoomRoomHall
1Charms101A
2Potions202B

The RoomHall field is directly dependent on the Room field and not on the SubjId primary key field, so the RoomHall field should go into a table on its own together with the Room field. In fact the room where the subject is thought is not a direct property of the subjects entity but is an entity on its own and hence should be separated into a rooms entity and only referenced by a foreign key.

SubjIdSubjectRoom
1Charms101
2Potions202

RoomRoomHall
101A
202B

Complete example (following from previous):

SubjId Subject Room RoomHall
1 Charms 101 A
2 Potions 202 B

Turns into:

SubjId Subject Room
1 Charms 101
2 Potions 202

Room RoomHall
101 A
202 B

Hence yielding:

StudId SubjId TeacId
1 1 1
1 2 2
1 2 3
2 1 1
2 2 2
2 2 3

StudId Student
1 Harry
2 Ron

SubjId Subject Room
1 Charms 101
2 Potions 202

Room RoomHall
101 A
202 B

TeacId Teacher
1 Filius
2 Slughorn
3 Snape

Links
http://portal.dfpug.de/dFPUG/Dokumente/Partner/Hentzenwerke/Visual%20FoxPro%20Certification%20Exam%20Study%20Guide%20Chapter%2002.pdf

Monday, March 14, 2011

Reading Related Tables As Nested Rows

Problem
A problem I used to face with relational databases is how to handle 1-to-many or many-to-many rows in SQL select statements. Let's say you have 2 tables with a many-to-many relationship between them, such as a Movies table and an Actors table. How do I present a list of movies together with their associated actors?

The naive way to do this is by using nested loops with a query for movies in the top loop and a query for actors in the second loop. In PHP it would look something like...

$moviesResult = mysql_query("SELECT id, title FROM movies"); 
while($moviesRow = mysql_fetch_assoc($result)) 
{ 
    echo("<h1>" . $moviesRow['title'] . "</h1>"); 

    echo("<ul>"); 
    $actorsResult = mysql_query("SELECT actors.name FROM actors INNER JOIN movies_actors ON actors.id = movies_actors.actorid WHERE movies_actors.movieid = " . $moviesRow['id'] . ";"); 
    while($actorsRow = mysql_fetch_assoc($actorsResult)) 
    { 
        echo("<li>" . $actorsRow['name'] . "</li>"); 
    } 
    echo("</ul>"); 
}

This approach however will kill your database. Ideally you should minimize the number of queries sent.

Another approach would be to join the movies table with the actors table and then read it with nested loops.

$result = mysql_query("SELECT movies.id AS id movies.title AS title, actors.name AS actor FROM movies INNER JOIN movies_actors ON movies.id = movies_actors.movie INNER JOIN actors ON actors.id = movies_actors.actor"); 
$row = mysql_fetch_assoc($result); 
while($row) 
{ 
    $currMovieId = $row['id']; 
    echo("<h1>" . $row['title'] . "</h1>"); 
    echo("<ul>"); 
    do 
    { 
        echo("<li>" . $row['name'] . "</li>"); 
    } while ($row = mysql_fetch_assoc($result) && $row['id'] == $currMovieId);
    echo("</ul>"); 
}

This works but as soon as you add another table to the join, determining which rows contain new information can be a nightmare, not to mention all the rows which just contain repeated information due to the cartesian product. For example:

IdTitleActorGenre
1The MatrixKeanu ReevesAction
1The MatrixKeanu ReevesAdventure
1The MatrixKeanu ReevesSci-Fi
1The MatrixLaurence FishburneAction
1The MatrixLaurence FishburneAdventure
1The MatrixLaurence FishburneSci-Fi
1The MatrixCarrie-Anne MossAction
1The MatrixCarrie-Anne MossAdventure
1The MatrixCarrie-Anne MossSci-Fi
2The Matrix ReloadedKeanu ReevesAction
2The Matrix ReloadedKeanu ReevesAdventure
2The Matrix ReloadedKeanu ReevesSci-Fi
2The Matrix ReloadedLaurence FishburneAction
2The Matrix ReloadedLaurence FishburneAdventure
2The Matrix ReloadedLaurence FishburneSci-Fi
2The Matrix ReloadedCarrie-Anne MossAction
2The Matrix ReloadedCarrie-Anne MossAdventure
2The Matrix ReloadedCarrie-Anne MossSci-Fi
3The Matrix RevolutionsKeanu ReevesAction
3The Matrix RevolutionsKeanu ReevesAdventure
3The Matrix RevolutionsKeanu ReevesSci-Fi
3The Matrix RevolutionsLaurence FishburneAction
3The Matrix RevolutionsLaurence FishburneAdventure
3The Matrix RevolutionsLaurence FishburneSci-Fi
3The Matrix RevolutionsCarrie-Anne MossAction
3The Matrix RevolutionsCarrie-Anne MossAdventure
3The Matrix RevolutionsCarrie-Anne MossSci-Fi

Are we supposed to receive all those rows just to display the below?

1The MatrixKeanu Reeves, Laurence Fishburne, Carrie-Anne MossAction, Adventure, Sci-Fi
2The Matrix ReloadedKeanu Reeves, Laurence Fishburne, Carrie-Anne MossAction, Adventure, Sci-Fi
3The Matrix RevolutionsKeanu Reeves, Laurence Fishburne, Carrie-Anne MossAction, Adventure, Sci-Fi

Solution
The best solution I found is a hybrid of the above 2 solutions. You make a different query for each table and then read all the rows returned by each query, placing the information in the list it belongs to.

$moviesResult = mysql_query("SELECT id, title FROM movies;"); 
$actorsResult = mysql_query("SELECT movies_actors.movieid AS movieid, actors.name AS name FROM actors INNER JOIN movies_actors ON actors.id = movies_actors.actorid;");
$genresResult = mysql_query("SELECT movies_genres.movieid AS movieid, genres.name AS name FROM genres INNER JOIN movies_genres ON genres.id = movies_genres.genreid;"); 

$actorRow = mysql_fetch_assoc($actorsResult); 
$genreRow = mysql_fetch_assoc($genresResult); 
while($movieRow = mysql_fetch_assoc($moviesResult)) 
{ 
    $currMovieId = $movieRow['id']; 
    echo("<h1>" . $movieRow['title'] . "</h1>"); 

    echo("<ul>"); 
    while ($actorRow && $actorRow['movieid'] == $currMovieId) 
    { 
        echo("<li>" . $row['name'] . "</li>"); 
        $actorRow = mysql_fetch_assoc($actorsResult); 
    } 
    echo("</ul>"); 

    echo("<ul>"); 
    while ($genreRow && $genreRow['movieid'] == $currMovieId) 
    { 
        echo("<li>" . $row['name'] . "</li>"); 
        $genreRow = mysql_fetch_assoc($genresResult); 
    } 
    echo("</ul>"); 
}

In this way we only make 3 queries, equal to the number of tables, and we only read as many rows as needed.

Sunday, March 13, 2011

SQL Joins Tutorial

This is a simple tutorial aimed towards those who, like me, completely misunderstood how to, and why to, use joins in SQL. It is not meant to be an advanced reference but a tutorial for beginners who know some SQL but can't understand joins.

Huh?
Let's say that you have a database with tables which are related by foreign keys. For example, a Customers table, a Products table and an Orders table. Each row in the Orders table is linked to a row in the Customers table (the customer who made the order) and to a row in the Products table (the product which was ordered).

Now let's say that you want to query the database to view all the rows in the Orders table but with the customer's name and product's name instead of their primary key. If you learned SQL the same way I did, you'd probably do something like this:

SELECT orders.date, customers.name, products.name
FROM orders, customers, products
WHERE orders.customer = customers.id AND orders.product = products.id;

This is called an implicit join. In fact you are doing an unconditioned join between all 3 tables and then just filtering out the joined rows which are not made of related table rows. What does this mean?

If our 3 tables contain the following data:

Orders:
IdDateCustomerProduct
101/01/0112
22/02/0232
303/03/0333

Customers:
IdName
1John
2Michael
3Terry

Products:
IdName
1Cheese
2Parrot
3Spam

Then the query will cause this to happen:

Orders.DateCustomers.NameProducts.Name
01/01/01JohnCheese
01/01/01JohnParrot
01/01/01JohnSpam
01/01/01MichaelCheese
01/01/01MichaelParrot
01/01/01MichaelSpam
01/01/01TerryCheese
01/01/01TerryParrot
01/01/01TerrySpam
02/02/02JohnCheese
02/02/02JohnParrot
02/02/02JohnSpam
02/02/02MichaelCheese
02/02/02MichaelParrot
02/02/02MichaelSpam
02/02/02TerryCheese
02/02/02TerryParrot
02/02/02TerrySpam
03/03/03JohnCheese
03/03/03JohnParrot
03/03/03JohnSpam
03/03/03MichaelCheese
03/03/03MichaelParrot
03/03/03MichaelSpam
03/03/03TerryCheese
03/03/03TerryParrot
03/03/03TerrySpam

And after generating all that it will then select the rows which make sense relation wise, according to the WHERE statement.

Orders.DateCustomers.NameProducts.Name
01/01/01JohnParrot
02/02/02TerryParrot
03/03/03TerrySpam

The big table is called a Cartesian product, which means that all possible combinations of rows between the tables are created, yielding a number of rows equal to the multiplication of the number of rows in each table (3 * 3 * 3 = 27 in this case). As you can tell, this will get really huge really quickly as the table get bigger.

Enter explicit joins.

Joins
In SQL, the FROM statement doesn't mean "a list of tables used in the SELECT statement". In the FROM statement you mention just one table. This table could either be one of the tables you created in the database or a "custom" table, such as one which is a joined version of several tables. The JOIN statement in SQL is not a keyword on par with the FROM statement, as I used to think due to the way it is indented in most tutorials I've seen. It is in fact a binary operator between 2 tables, just like "+" and "=" as binary operators.

The JOIN operator takes 2 tables and returns a new table, which is a joined version of the 2 tables. You can then join another table to that new table and keep on adding more tables to the "composite" table. There are several joins available in standard SQL which will be described later, but if we should use the INNER JOIN type in an example, this is how joins are used in general:

SELECT orders.date, customers.name, products.name
FROM (orders INNER JOIN customers ON orders.customer = customers.id) INNER JOIN products ON orders.product = products.id;

As you can see, the join is used between tables and can even be bracketed in order to state which tables should be joined up first. The ON statement is used to immediately join up the rows which are related, avoiding the cartesian product table in the first place. This will therefore be processed much more efficiently as the third table will only be joined up after the first two tables have been joined correctly, rather than joining everything together without a condition.

In the implicit join example, the tables where joined up with an inner join without the ON condition, hence losing all the advantage of joins. Now that you know how joins work, let's see what are the differences between the 4 join types described in w3schools.com.

Join types
The main difference between the join types is how they handle rows which don't match the ON condition.

INNER JOIN:
Strictly follows the ON condition between tables such that any rows in one table which cannot be matched up with another row in the other table will be left out. This is the kind of join we are used to.

SELECT orders.date, customers.name, products.name
FROM (orders INNER JOIN customers ON orders.customer = customers.id) INNER JOIN products ON orders.product = products.id;

results in

Orders.DateCustomers.NameProducts.Name
01/01/01JohnParrot
02/02/02TerryParrot
03/03/03TerrySpam

As you can see, not all customers or all products where mentioned. Only the ones which were mentioned in the ORDERS table and hence were related in some way were returned.

OUTER JOIN:
This is where things get a bit hairy and hence require that you make an effort to understand. The outer join makes sure that all rows in both tables are returned, even if there is no matching row in the other table. What happens to those tables which have no matching rows? They are joined to NULL values. As long as they are returned, it doesn't matter what they're joined to right?

You can also use LEFT JOIN and RIGHT JOIN to say whether you want to return all the rows of the table on the left of the join operator only or all the rows of the table on the right of the join operator only.

Now in order to use these joins to create a complex joined table, you have to be sure that a join will not return columns with NULL values which will be used in an ON condition of another join.

SELECT orders.date, customers.name
FROM orders RIGHT JOIN customers ON orders.customer = customers.id;

results in

Orders.DateCustomers.Name
01/01/01John
NULLMichael
02/02/02Terry
03/03/03Terry

SELECT orders.date, products.name
FROM orders RIGHT JOIN products ON orders.product = products.id;

results in

Orders.DateCustomers.Name
NULLcheese
02/02/02parrot
01/01/01parrot
03/03/03spam

SELECT orders.date, products.name, customers.name
FROM (orders LEFT JOIN customers ON orders.customer = customers.id) RIGHT JOIN products ON orders.product = products.id;

results in

Orders.DateCustomers.NameProducts.Name
NULLNULLcheese
02/02/02terryparrot
01/01/01johnparrot
03/03/03terryspam

As you can see, depending on how the outer joins are used, we can make a particular table return all its rows, even if it isn't related to any other table.

Final note
I'd like to leave a few notes and links before concluding.

First of all, what I said about the implicit join not being efficient next to an explicit one is not necessarily true since the SQL optimizer may fix it before executing it.

Secondly, if you can avoid using brackets in the the joins it would be better as that will allow the SQL optimizer to make adjustments to the query without it having to make sure to preserve the precedence which you unneccessarily imposed.

In a future post, I will describe a trick which I used to efficiently view multiple one-to-many related tables which I also had trouble with in SQL until recently. But for now I will leave you with these links:

http://www.w3schools.com/sql/sql_join.asp
http://onlamp.com/pub/a/onlamp/2004/09/30/from_clauses.html
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html

Sunday, February 13, 2011

Generic Genetic Algorithms

I have finally finished my C# generic genetic algorithm library which is based on my university thesis "A Generic Genetic Algorithm using Phenotype Building Functions" (http://www.scribd.com/doc/48697452/A-Generic-Genetic-Algorithm-using-Phenotype-Building-Functions). You can find it at http://code.google.com/p/genericga/. The wiki page explains how to use it.