This week we learned about sorting. There are many different ways to sort a list, some more efficient than others, so it's no coincidence that our knowledge of complexity applies here. Among the ones we covered: selection sort, insertion sort, quicksort, and mergesort.
Selection sort works by finding the smallest item in the unsorted sublist of the list and swapping it with the item at the current position.
For example, given the list [4, 5, 2, 1], it would start by looking at the whole list, finding the smallest number, and swapping it with the first number:
[1, 5, 2, 4]
Then it would repeat the process on the sublist from index 1 to 3:
[1, 2, 5, 4]
And finally on the sublist from index 2 to 3:
[1, 2, 4, 5]
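Just to make sure I've got it, here is a minimal sketch of selection sort in Python (my own attempt with my own names, not the lecture code):

def selection_sort(lst):
    """Sort lst in place by repeatedly selecting the smallest remaining item."""
    for i in range(len(lst)):
        # Find the index of the smallest item in the unsorted part lst[i:].
        smallest = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[smallest]:
                smallest = j
        # Swap it into the current position.
        lst[i], lst[smallest] = lst[smallest], lst[i]

So selection_sort on [4, 5, 2, 1] walks through exactly the swaps shown above.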
Insertion sort works by taking one element at a time from the unsorted part of the list and placing it in the correct position in the sorted sublist.
For example, with the same list [4, 5, 2, 1], it would compare the first two numbers and swap them if necessary; here they are already in order:
[4, 5, 2, 1]
Then it would compare the 2 to the 5 and to the 4, and insert it into the correct position in the sorted part of the list:
[2, 4, 5, 1]
Then insert the 1 into the right place:
[1, 2, 4, 5]
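And a similar sketch of insertion sort, shifting the larger sorted items over to make room for each new one (again my own version):

def insertion_sort(lst):
    """Sort lst in place by inserting each item into the sorted part before it."""
    for i in range(1, len(lst)):
        item = lst[i]
        j = i
        # Shift larger items in the sorted part lst[:i] one spot to the right.
        while j > 0 and lst[j - 1] > item:
            lst[j] = lst[j - 1]
            j -= 1
        lst[j] = item  # drop the item into its correct position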
Both insertion sort and selection sort are O(n^2), which isn't all that efficient.
Quicksort is much faster than insertion and selection sort on average, but in the worst case it is also O(n^2).
I originally had some trouble understanding quicksort. Quicksort relies on recursion: it picks a *pivot*, uses it to divide the list into two sublists (items less than the pivot and items greater than it), and then recurses on those two lists. How do we decide on a good pivot? Does it matter which pivot we choose? It does: if we always choose the leftmost element as the pivot, an already sorted list produces the worst case scenario. For shorter lists, it would be okay to choose a random pivot, because the list is small and the choice of pivot won't really affect the efficiency. On the other hand, with a really large list a random index might occasionally divide the list quite unevenly, so it would be better to choose something close to the median of the list.
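Here is a short quicksort sketch with a random pivot (my own version; it builds new lists instead of sorting in place, which is simpler to read but uses more memory):

import random

def quicksort(lst):
    """Return a sorted copy of lst."""
    if len(lst) <= 1:
        return lst
    pivot = random.choice(lst)  # random pivot; always picking lst[0] risks the worst case
    smaller = [x for x in lst if x < pivot]
    equal = [x for x in lst if x == pivot]
    larger = [x for x in lst if x > pivot]
    # Recurse on the two sublists and stitch the results back together.
    return quicksort(smaller) + equal + quicksort(larger)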
Sunday, March 30, 2014
Saturday, March 22, 2014
How Complex is Complexity?
This week we learned about complexity. Before learning about it, the name itself would perhaps suggest that it is a complex subject (because who wouldn't want to explicitly state the difficulty of a topic in its name?). It turns out its purpose is not to scare computer science students into thinking it is an extremely hard part of the course; it is actually about determining how complex certain functions are, i.e., their efficiency.
In lecture, we wrote a function that was to return all the anagrams of a word, or rather a sequence of letters, since it didn't necessarily have to be a word itself. There were many approaches, some more efficient than others. One approach was to generate all the permutations of the sequence of letters, and then see which permutations were actual words in the dictionary. This obviously is not very efficient: a sequence of just 7 letters already has 7! = 5040 permutations. Imagine how long it would take for anything longer than that. The other approach was the signature approach, which I thought was incredibly clever! The signature approach takes advantage of the fact that all anagrams have the same letters, so if each sequence's letters are arranged in a canonical way, it becomes clear which words are anagrams of each other. As far as efficiency is concerned, this is much better.
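Here is roughly what I understood the signature approach to look like, sorting each word's letters into a canonical form (a sketch with my own names, assuming the dictionary is just a list of words; not the exact code from lecture):

def signature(word):
    """Return the letters of word sorted into a canonical form."""
    return ''.join(sorted(word))

def anagrams(letters, dictionary):
    """Return all words in dictionary that are anagrams of letters."""
    target = signature(letters)
    return [word for word in dictionary if signature(word) == target]

For example, anagrams('tca', ['cat', 'act', 'dog']) returns ['cat', 'act'] without ever generating a permutation.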
We know one function is more efficient than another when its runtime is relatively shorter. But there are other factors, such as the computer itself, and the compiler... although I would have to ask what exactly a compiler is? It turns out a compiler is the reason our programming code actually works, the reason it produces any results: it is basically a program that takes code written in a certain programming language as input and turns it into code the computer's processor can actually run (Python, strictly speaking, is interpreted rather than compiled straight to processor instructions). Because of all these outside factors, if we want to judge whether a function is efficient, we should look at how it behaves as its input grows rather than at raw runtime. We learned in lecture that a step is defined as a computation done in a fixed amount of time. So to determine a function's efficiency, we only need to figure out how many steps it takes as a function of its input size.
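To make step counting concrete, here is a small example of my own (assuming each comparison counts as one step):

def contains(lst, value):
    """Return True if value is in lst, using a linear scan."""
    for item in lst:       # the loop body runs at most n times
        if item == value:  # one comparison per iteration
            return True
    return False

In the worst case (value isn't in lst), this takes about n steps, so it is O(n): double the input size and the number of steps doubles too, no matter what computer runs it.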
Of all the subjects we've learned so far in the course, complexity really challenges what we know. What I mean by that is, it is not enough just to write code that works; you should write code that is efficient. This can oftentimes prove quite challenging. So I guess complexity is complex after all.
Saturday, March 8, 2014
Hunting High and Low (in Trees, That Is)
This week we learned about trees, yet again. But not just trees: binary search trees. We already know what a binary tree is; it's a tree where each node can have a maximum of two children. So then what is a binary search tree? A binary search tree helps us find values efficiently in a set of data, without going through the whole tree! Well, how would that work? It works because of the way binary search trees are organized: starting at the root, everything less than a node is in its left child/subtree, and everything greater than it is in its right child/subtree. Binary search trees are most efficient when they are balanced, i.e., when most nodes have two children. This makes sense because each comparison then eliminates half of the remaining tree. But even though a binary search tree is supposed to be efficient, it doesn't always end up that way. For example, if every node has only a left child, or every node has only a right child, searching ends up being the same as going through the entire set of data, which is exactly what we don't want!
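Here is a sketch of searching with the nodes-and-references representation (my own names, not necessarily the lecture's):

class BSTNode:
    """A node in a binary search tree."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def search(node, value):
    """Return True if value is in the BST rooted at node."""
    if node is None:
        return False  # fell off the tree: value isn't there
    if value == node.value:
        return True
    if value < node.value:
        return search(node.left, value)   # everything smaller is on the left
    return search(node.right, value)      # everything greater is on the right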
It's clear why a binary search tree is an efficient way of searching through a set of data, but wouldn't the set first need to be set up as a binary search tree in order to search through it efficiently? If that is the case, how efficient would the process of converting a set of data to a BST be? Given a list of data, one would insert the first element into the tree as the root; then for each following item, compare it with the current node, go left if it's smaller or right if it's greater, and insert it once an empty spot is reached, and of course this would be done recursively. So it doesn't seem like it would be too hard!
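And a sketch of that insertion, reusing the BSTNode class from the search sketch above:

def insert(node, value):
    """Insert value into the BST rooted at node; return the (possibly new) root."""
    if node is None:
        return BSTNode(value)  # found an empty spot
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    return node

# Building a tree from a list, one insertion at a time:
root = None
for item in [4, 5, 2, 1]:
    root = insert(root, item)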
Sunday, March 2, 2014
Linked Lists... again... and again!
So far everything we've learned after recursion has close ties with recursion. Coincidence? Probably not. As the title suggests, linked lists are quite closely related to recursion, similar to trees. Initially, I did not understand the purpose of linked lists. But thinking back to an earlier lab we had done, it all became very clear: we needed to implement the Queue class so that the running time of its operations would not depend on the size of the queue, and the plain list we were using couldn't give us that. The answer is linked lists. A linked list contains a head node, which holds a value, and a reference to the rest, which is itself a linked list. This is similar to the nodes-and-references representation of trees that we learned not so long ago. My only real question about linked lists was how they actually help keep the running time constant. Since linked lists are built out of references, it is easy to insert and delete: you just redirect a reference instead of shifting every element over. Looking into the matter, I found that with linked lists you start with space for just one element allocated (a block of unused memory of sufficient size), and then you can simply add on more elements without any copying or relocating, hence the constant time.
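Here is a sketch of a linked-list queue with constant-time operations, assuming we keep references to both the front and the back of the list (names are my own, not the lab's):

class Node:
    """One element of a linked list: a value and a reference to the rest."""
    def __init__(self, value):
        self.value = value
        self.next = None

class Queue:
    """A queue with O(1) enqueue and dequeue, built on a linked list."""
    def __init__(self):
        self.front = None  # next node to dequeue
        self.back = None   # most recently enqueued node

    def enqueue(self, value):
        node = Node(value)
        if self.back is None:
            self.front = node  # queue was empty
        else:
            self.back.next = node
        self.back = node  # just one reference update, no shifting

    def dequeue(self):
        node = self.front
        self.front = node.next
        if self.front is None:
            self.back = None  # queue is now empty
        return node.value

No matter how long the queue gets, enqueue and dequeue each touch only a couple of references, which is exactly why the time stays constant.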
This week's lab proved to be quite challenging, most likely because of a lack of preparation, and in part due to the stress of the midterm. Since the lab proved challenging, it's clear I should perhaps redo it and reread the notes on linked lists.
The midterm was quite fair: it wasn't too hard or too easy. And although I studied quite a bit for it, I found that I didn't do as well as I had hoped. This is a clear indication that next time I should prepare a study schedule in advance, and instead of just going over terminology, do harder practice problems that really require me to think.