Background
For the past week, I have been working on the String Calculator Kata in Python. I watched Gary Bernhardt's version of it and then set out to tackle it myself. Not surprisingly, my solution was much like Gary's.Of course, watching a master was a bit of a cheat. I hadn't really figured it out. I just saw what he did and mimicked it. I certainly understood what he was doing and I understood the solution I subsequently developed, but I skipped a very important part of the process; the discovery and real learning.
My wife has taken an interest in software development and is considering a possible career change. Along with reading books like "Apprenticeship Patterns", "The Inmates are Running The Asylum", and "Hackers and Painters", she is learning Python.
I decided that the String Calculator Kata would be a good exercise for us to pair on. I could show her some tricks and help her along. This would be a great learning exercise for the both of us.
Kata
Take One
Our first attempt was over the phone with a screen share. I did all the talking and typing while she watched, listened, and provided me with the occasional, "uh-huh" to let me know she was still there. All in all, I'd say it was a fair experience at best. In my opinion, I had shown her quite a bit, but had actually taught her nothing. She wasn't going to benefit from doing a Kata by rote. She needed to understand how the moves came to be and why they were being done.This was going to take more time.
Take Two
Our second attempt was in person, side by side, sharing a single computer. I still did most of the talking and all of the typing, but the dynamic was very different. Jennifer was able to ask questions, point at the screen, and interact unlike with the screen share. And I took a much more deliberate approach to the Kata. Making no assumptions about the language; acting as if I were learning it all for the first time as well.This was a good learning experience for both of us. I learned new things about Python and I felt connected to the developer I once was; the guy who would explore a language for the best way to do something; the guy who would spend hours on the same few lines of code in an effort to get them perfect. I've been doing Kata on and off for about a year with a recent deliberateness. But it wasn't until I used it to teach someone else that I achieved the next level of enlightenment.
After some more thought, I decided to recount the entire experience on my blog. I've already discussed Kata in a couple of posts. The string calculator can be found in Scheme and Python on my github. But none of these capture my thought process. I am not saying my thought process is right. Hell, I have no idea. But I think sharing it and soliciting feedback can't hurt.
The full experience
We start, of course, with a single test.calc_test.py
import nose.tools from calc import add def test_empty_string_is_0(): assert 0 == add('')
This fails as expected. And we write our add method.
calc.py
def add(input_string): return 0
We add the next test
calc_test.py
import nose.tools from calc import add def test_empty_string_is_0(): assert 0 == add('') def test_single_number_returns_value(): assert 1 == add('1')
Which forces the change:
calc.py
def add(string): if string: return 1 else: return 0
But this doesn't cover any single number, it covers a particular single number. So let's update the test.
calc_test.py
import nose.tools from calc import add def test_empty_string_is_0(): assert 0 == add('') def test_single_number_returns_value(): assert 1 == add('1') assert 10 == add('10')Our code fails again and we need to figure out a way to return the value of a string. Naturally, I asked the googles. They don't know anything themselves, but they always know somebody who knows something.
A tiny bit of poking around, and I found there is a string.atoi(s[, base]) function, but it was deprecated in verison 2.0 in favor of the built-in int() function. This is what we are looking for:
calc.py
def add(string): if string: return int(string) else: return 0Now the test passes, but I want to get cute. I want to eliminate the if/else. I happen to know Python has lambda capability. So I do some research and refactor to:
calc.py
def add(string): return string and int(string) or 0Cute. Yep.
Back to business. We've covered the first couple of criteria, but we now need to handle more than one item. So we write the test (of course).
calc_test.py
import nose.tools from calc import add def test_empty_string_is_0(): assert 0 == add('') def test_single_number_returns_value(): assert 1 == add('1') assert 10 == add('10') def test_multiple_numbers_returns_sum(): assert 3 == add('1,2')
The familiar taste of failure leads us in search of an answer. How do we get the values out of a string? As it turns out, str is a sequence type in Python. So we can refer to its elements fairly easily.
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): return int(string[0]) + int(string[2])This gives a string index out of range error on our earlier passing tests. This won't do. We need to refer to the individual values only if there is a comma in the string. Looking back at our reference for sequence types, we find the in operator, which is perfect for this case.
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): if ',' in string: return int(string[0]) + int(string[2]) else: return int(string)This works. Let's see how far we can get using lambda expressions (for fun). Refactor to lambda:
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): return ',' in string and int(string[0]) + int(string[2]) or int(string)But what happens when the integers are more than a single digit? This calls for a new test.
calc_test.py
import nose.tools from calc import add def test_empty_string_is_0(): assert 0 == add('') def test_single_number_returns_value(): assert 1 == add('1') assert 10 == add('10') def test_multiple_numbers_returns_sum(): assert 3 == add('1,2') assert 15 == add('10,5')This fails with a ValueError. It appears to be trying to get the int() value of a comma. So our hard-coded string offsets won't do. Then what else can we do here?
Perusal of the standard string methods, leads us to the built-in partition method that was added in version 2.5. If it was recently added, it must be what we need....
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): return ',' in string and _sum_delimited_string(string) or int(string) def _sum_delimited_string(string): string_parts = string.partition(',') return int(string_parts[0]) + int(string_parts[2])And the tests pass. But my wife points out this won't work for strings with more than two digits.
Hmmmm....
Let's confirm the assumption.
calc_test.py
import nose.tools from calc import add def test_empty_string_is_0(): assert 0 == add('') def test_single_number_returns_value(): assert 1 == add('1') assert 10 == add('10') def test_multiple_numbers_returns_sum(): assert 3 == add('1,2') assert 15 == add('10,5') assert 25 == add('10,3,12')She was right. We get another ValueError. This time we are trying to get the int() of '3,12'. I see an immediate solution that should work. A little recursion and we are good to go.
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): return ',' in string and _sum_delimited_string(string) or int(string) def _sum_delimited_string(string): string_parts = string.partition(',') return int(string_parts[0]) + _sum_string(string_parts[2])And once again, our tests pass.
At this point, I am actively holding myself back. I don't like this form of recursion. If I'm going to do it, I want it all contained in a single function. I really want to use split(), map(), and sum(), but I must remember that we've not discovered these yet. We are solving the problem with what we currently "know".
Roy's first two requirements down, now we move on to custom delimiters; starting with a new-line as a delimiter.
calc_test.py
import nose.tools from calc import add def test_empty_string_is_0(): assert 0 == add('') def test_single_number_returns_value(): assert 1 == add('1') assert 10 == add('10') def test_multiple_numbers_returns_sum(): assert 3 == add('1,2') assert 15 == add('10,5') assert 25 == add('10,3,12') def test_newline_is_a_delimiter(): assert 6 == add('1\n2,3')Now we get a ValueError on an attempt to get int() for '1\n2'. Let's update the code to get this to pass.
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): return ',' in string and _sum_delimited_string(string) or int(string) def _sum_delimited_string(string): string = _normalize_delimiters(string) string_parts = string.partition(',') return int(string_parts[0]) + _sum_string(string_parts[2]) def _normalize_delimiters(string): return string.replace('\n', ',')We are back to green. We are moving on to the fourth requirement; custom delimiters. This requirement reads as follows:
- Allow the Add method to handle a different delimiter:
- to change a delimiter, the beginning of the string will contain a separate line that looks like this: “//[delimiter]\n[numbers…]” for example “//;\n1;2” should return three where the default delimiter is ‘;’ .
- the first line is optional. all existing scenarios should still be supported
calc_test.py
import nose.tools from calc import add def test_empty_string_is_0(): assert 0 == add('') def test_single_number_returns_value(): assert 1 == add('1') assert 10 == add('10') def test_multiple_numbers_returns_sum(): assert 3 == add('1,2') assert 15 == add('10,5') assert 25 == add('10,3,12') def test_newline_is_a_delimiter(): assert 6 == add('1\n2,3') def test_allow_custom_delimiter(): assert 3 == add('//;\n1;2')And now the code.
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): return ',' in string and _sum_delimited_string(string) or int(string) def _sum_delimited_string(string): string = _normalize_delimiters(string) string_parts = string.partition(',') return int(string_parts[0]) + _sum_string(string_parts[2]) def _normalize_delimiters(string): if string.startswith('//'): string_parts = string.partition('\n') delimiter = string_parts[0] delimiter = delimiter[2:] string = string_parts[2].replace(delimiter, ',') return string.replace('\n', ',')This fails, a bit unexpectedly. What went wrong here? The custom delimiter is not being parsed.
A little inspection and we realize that _sum_string() is to blame. Or more accurately, we see that we need to normalize the delimiters earlier in the process. So we move that call from _sum_delimited_string() into _sum_string(), which seems to make more sense anyway.
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): string = _normalize_delimiters(string) return ',' in string and _sum_delimited_string(string) or int(string) def _sum_delimited_string(string): string_parts = string.partition(',') return int(string_parts[0]) + _sum_string(string_parts[2]) def _normalize_delimiters(string): if string.startswith('//'): string_parts = string.partition('\n') delimiter = string_parts[0] delimiter = delimiter[2:] string = string_parts[2].replace(delimiter, ',') return string.replace('\n', ',')We are green again, but I don't think I can take it anymore. Look at all those [n] references in the code. And while I am not terribly concerned about it, the recursion could give us trouble on huge strings. There has to be a better way to do this.
So let's go back to the documentation and see if there is anything else we can use to assist us along. One of our problems is the partition breaks the string into three chunks. Maybe something that gives us just the values and strips out the delimiter. Or maybe something that breaks it into multiple chunks.
We discover split(). This appears to have some promise. It can chunk up the entire string into a list. That's nice. Maybe we can ditch the recursive call.
First round, let's get rid of the partition() call and replace it with split().
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): string = _normalize_delimiters(string) return ',' in string and _sum_delimited_string(string) or int(string) def _sum_delimited_string(string): string_parts = string.split(',', 1) return int(string_parts[0]) + _sum_string(string_parts[1]) def _normalize_delimiters(string): if string.startswith('//'): delimiter, string = string.split('\n', 1) delimiter = delimiter[2:] string = string.replace(delimiter, ',') return string.replace('\n', ',')But this didn't seem to do anything for us. The code doesn't really look any better. And it still "feels" messy. We've two different functions that both claim to sum a string, but only one of them actually does. And these functions call one another. Let's push the responsibility back into a single function and see if we can clean this up.
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): string = _normalize_delimiters(string) if ',' in string: string_parts = string.split(',', 1) return int(string_parts[0]) + _sum_string(string_parts[1]) else: return int(string) def _normalize_delimiters(string): if string.startswith('//'): delimiter, string = string.split('\n', 1) delimiter = delimiter[2:] string = string.replace(delimiter, ',') return string.replace('\n', ',')That is better. But isn't that recursive call really just iterating over the list and converting everything to an int() and then adding them up? Looks a bit like a Scheme function with string_parts[0] and string_parts[1] in place of car and cdr. I know this is crying out for a map, but how do I explain this to Jennifer? How do we naturally stumble upon the solution?
So we focus in on the split() function. What does this do? It returns a list of the words in a string. So we dig into lists a bit to learn what they are all about. And we find there are a few functions that are very useful with lists: filter(), map(), and reduce(). Of these, map() and reduce() look like they might do us some good.
We can use map() to convert our list of strings to a list of integers. Then, we can use reduce() to add them all up. Sound good? Yes. Yes it does. But when we check the syntax on reduce(), we see a note that says to use the sum() function for adding items in a list.
And we've arrived at our intended destination.
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): string = _normalize_delimiters(string) if ',' in string: return sum(map(int, string.split(','))) else: return int(string) def _normalize_delimiters(string): if string.startswith('//'): delimiter, string = string.split('\n', 1) delimiter = delimiter[2:] string = string.replace(delimiter, ',') return string.replace('\n', ',')That looks a little better. And all tests still pass. Now, let's refactor some more to make it a bit more terse.
calc.py
def add(string): return string and _sum_string(string) or 0 def _sum_string(string): string = _normalize_delimiters(string) return ',' in string and sum(map(int, string.split(','))) or int(string) def _normalize_delimiters(string): if string.startswith('//'): delimiter, string = string.split('\n', 1) string = string.replace(delimiter[2:], ',') return string.replace('\n', ',')
We decided this was a good breaking point for the session. We'd covered a lot of new material in a relatively short period of time.
Tonight, we do it again, starting over from scratch. And I suspect we will skip some of the steps as we already know map() is our desired route.
solution set calculator
No comments:
Post a Comment