Adventures of a Protoss in Seattle: September 2011

Saturday, September 17, 2011

Why node.js? And Why not?

I've been thinking about this recently as I've been coming down from my node.js high (which fluctuates depending on my mood).

The node.js high is the thrill of two dreams colliding in an orgy of programming goodness: (1) sharing code between client and server in a meaningful way and (2) an opinionated stance on IO without threads.

Sharing is caring
For those among us who have dealt with the web, it is very frustrating to have to replicate business logic and templates between the front-end and the back-end. We have to do this to deliver the best possible customer experience. I've spent a lot of time trying to figure out how to make good "write once" Ajax solutions. If you go 100% Ajax, then you fuck up SEO. If you go 100% back-end, then you end up with a less than ideal user experience. If the programming language is the same down to how data is validated, transformed, and rendered, then the back-end and front-end can share a great deal of business-critical code. I believe it is possible to deliver a 100% SEO-friendly and awesome user experience.

I feel this current division between SEO (what most businesses need to thrive) and UX (what makes a product easier and more fluid to use) is difficult to execute on. This is where the general idea of "Server Side JavaScript" delivers some very interesting ideas and potential. While I haven't seen the best platform yet, there is an opportunity to wow us.

Threads are Hard
I like multi-threaded programming for the intellectual challenges it poses, but I feel like threads are mostly a waste on a single core. A single CPU can be much better managed the way node.js does it, with an event loop (or you can use libev or libevent like a real programmer, yes, yes). Once you get to multiple cores though, you are really talking about a distributed algorithm. The intellectual challenges found in multi-threaded programming are isomorphic to those found in distributed algorithms (just with higher latency).

Hard is bad. I like hard because I'm that nerd in math class that does math problems for fun. Most people want to go home to their kids, play a video, and then get laid after eating a sandwich. I go home to do math (which baffles my wife).

So, anything that keeps threads away and those hard problems at bay can be seen as nice. I'll be honest, node.js delivers well on this front. I've done some very impressive things with node.js that would make me shudder with threads, and then I load test it, and it makes me happy. Out of the box, I get a nice way to express data flow with continuations. It's awesome. No race conditions (unless I'm doing something distributed), no deadlocks, no locks. It just works.

node.js feels like an Apple product.

Coming Down from the High
JavaScript was not designed for enterprise servers. I'll address this on two fronts, and I'll skip the 1.9GB limit because that's just a detail that real programmers can fix and then submit a patch.

The first front is that continuation passing offers no safety when it comes to releasing resources that may leak. If I intend to perform some IO and expect the function to hand off control to the continuation I pass, then I would like a guarantee that the continuation is going to be called.

This is a huge hole. Code written in node.js has to trust node.js 100% since that code surrenders its own control flow. Instead of crashing, node.js can silently hang customers, which is not good at all. Having experienced this issue more than once, building on node.js now feels like building on sand. Sure, I can go and fix the platform, but it feels like adding duct tape. I'm not sure if trusting node.js 100% is harder than threads or not.
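To make the complaint concrete, here is a minimal sketch of the kind of duct tape I mean. The helper name withDeadline is hypothetical (it is not part of node.js); it wraps a node-style callback so that the continuation fires exactly once, either with the real result or with a timeout error. It only guards control flow. It does not release whatever resource the stalled operation was holding.

```javascript
// Hypothetical helper, not a node.js API: guarantees that `callback`
// is invoked exactly once, even if the wrapped operation never calls back.
function withDeadline(callback, ms) {
  let done = false;

  // If nothing has called back after `ms` milliseconds, fail loudly
  // instead of hanging silently.
  const timer = setTimeout(function () {
    if (done) return;
    done = true;
    callback(new Error('continuation not called within ' + ms + 'ms'));
  }, ms);

  // This is the function you hand to the IO operation.
  return function guarded(err, value) {
    if (done) return; // ignore late or duplicate calls
    done = true;
    clearTimeout(timer);
    callback(err, value);
  };
}
```

Usage would look something like fs.readFile(path, withDeadline(handleResult, 5000)), where handleResult is whatever continuation you already had. The caller at least gets an error instead of a hang, at the cost of wrapping every call site by hand.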

The second front is type safety. I'll give some background about myself. I started in C which I'd call "moderately typed" because you do stupid things all the time. Then I moved up to C++ which was a little bit better. Then I did a bit of Java, and hated it. Then tried C# 1.0, it was OK. Then I did OCaml, and I loved it. I dabbled in Erlang, skipped it (it didn't make me happy). Did a bunch of C# 3.0+, and I loved it. Did tons of PHP and JavaScript, it was OK. Wrote some Ruby junk, it didn't make me happy. I fell in love with Lisp and building things with ASTs. I found myself content with JavaScript, and I was riding the node.js high. Now I work in Java, and my side-projects are a combination of Java and JavaScript. node.js is my new Perl, and I love it for that. I'd prefer C#, but I don't want to spend $ for Visual Studio nor do I want to do C# without an IDE.

Having given the litany of my past, I have no investment in either static typing or dynamic typing. Either is fine. However, type safety is very important to me these days. I want to use type safety as a way of expressing both the ways things should work and the ways things misbehave. Things misbehave all the time for valid reasons.

I hate the way Java's try/catch forces you to do a bunch of silly things, but I love the idea that I can use it to force myself and others to make sure they considered every case. I hate null because it lets you fuck so much up. This is one reason I love OCaml's union types. I don't have to program with null. I can give every output of my function a meaningful value that forces the consumer to deal with it. I've considered resurrecting my node.ocaml project to work on this, but I'd need to write a highly durable IO story for some of my work, which I don't really want to do yet.

Future
There are a bunch of reasons I want to create yet another programming language: mix the things I want in OCaml, make it asynchronous so it compiles to finite state machines, remove null, make it easy to deal with errors, make it pretty, add control structures for handling failure, build a better type system, better control over the heap, hook it up to the cloud, make deployment painless.

The problem with the directions I want to go is that the advances I want are marginally useful and expensive in the grand scheme of things. The things I want are better, but we live in a world which values "worse is better". When I put on the business man hat, then I agree with this statement. When I put on my academic beautiful code hat, then this statement makes me want to go into my cave and be a hermit.

node.js is worse, but that makes it better. The problems I have now will be patched. After enough investment, people will consider it reliable. People will build reliable libraries. Yet, I want to build stuff and not the platform anymore. While node.js is my new Perl replacement (and it's fucking awesome at that), doing hard-core server side stuff is going on the back-burner for me.

But I wonder, are the gains there real or marginal? I don't know. I do know that node.js (like ruby was) is a risk that could either pay off big (as ruby has) or fizzle out. It depends on you.

Saturday, September 10, 2011

Why No OCaml?

I just saw Kevin Murphy's "Why OCaml?" appear on Hacker News, so I thought I'd provide some counterpoints to the nine-year-old article.

Popularity: I can't hire people to build stuff with OCaml, nor are there many non-finance jobs that pay decently enough to work with OCaml. Besides, JavaScript is kind of sexy and Java is getting closures.

Type Checking: Java has type checking too, and at the same level of rigor. Java doesn't have the fancy polymorphic type inference though. I'll be honest, I'm a type inference monkey; I love it. But, after a year, the code is very difficult to read without a bunch of documentation to explain what the fuck each thing is. Ok, great, so I don't mind having type annotations. I love C#'s var syntax (didn't Java 7 get this?).

Compilers: OCaml's compiler is way out of date. Multicore support is lacking. There were some memory limit issues (if I recall).

Speed: According to the shootout, OCaml is losing to Java. This makes me sad as I was a Java hater. It looks like we live in a C/Java world. I do find the shootout results interesting for Java. This makes me sad because I was invested in OCaml for about a week as I worked night and day on node.ocaml.

Syntax: I'm older, and more mature now. Now, I like my code to be a bit more verbose and less conflated. I like the idea of having a syntax that makes things a bit overly verbose. This is both because I'm older and need to get things done rather than reverse engineer the syntax, and because I need new engineers to get up to speed faster.

Technology - Keeping poor people poor

Technology makes us better (mostly).

It's a simple promise. Someone is going to innovate and solve a problem or optimize an existing solution. This process creates wealth, but it also creates poverty.

Let's consider the case of the baker. A baker would mix, knead, and bake bread. Let's say a loaf of bread costs one bitcoin and it takes one hour to bake a loaf of bread. The baker has 10 families to feed with his oven. Every day, the baker gets up and prepares bread in an hour and then loafs around for ten hours to exchange loaves and interact with customers. Each family gets their bread and pays one bitcoin. Now the baker has traded eleven hours of his time for ten bitcoins. This is good revenue that enables his family to live well in the cottage down the street.

This lasts about ten years until those families have doubled. Now, there are twenty families. He doesn't want to spend twenty hours loafing about to bake bread. He has a decision to make: innovate or expand. He is rather primitive and hires an assistant who then takes over the night shift. Eventually, the assistant gets pissed off about his less than fair wages and then builds his own bakery. Now, the society has two bakers. Years pass again, and now there are forty families. Each baker is stressed and tries the same thing, and the cycle repeats until there are four bakers.

Now, someone invents a high capacity oven that can bake four loaves per hour. It's an innovation that changes the world. These bakers have a choice since there is more bread than what is needed. The families know of the same innovation and now expect a lower price from the bakers because they are not working as hard. So, they lower the price to half a bitcoin. In lowering the price, they have to increase their volume to maintain their lifestyle. Now, there is competition. Now, location, quality, selection, branding, and other ways to differentiate become part of the equation.

So, location is the key differentiator. Now to get bread, most families go to the original baker since he is at the heart of the town where most of the work is. Those other bakers? Well, they are fucked. Now they are unemployed in a useless profession until the families spawn more. Some of them try to work for the original baker, who is now dominating the entire market. Each of them knows that he can be replaced, so the market losers try to win in the new job market. The baker can't hire everybody, and why should he? Two people working half as hard as they used to can satisfy the same demand. The employee baker is now making more money than he was making before (although not as much as the boss) and he is doing less work; he is happy. The other bakers are out in the cold.

Now, these bakers out in the cold have to survive the "baker" recession, which lasts until the next generation of families pops up. By then, the market mover can hire them as employees. But it doesn't end there, as an eight-loaf capacity oven comes out just as the families are appearing.

This is the story we have.

Lessons.

Technology has the prime directive of making our life better in some way, shape, or form. This has the unfortunate consequence of requiring people to be agile to change. We invent stuff, and sometimes, it changes how we work enough to the point where we are more productive with less of our most precious resource: time.

If you want to be successful, then be prepared for and anticipate change. I live every day like a super computer is going to do my job tomorrow, and I love it. I'd much rather be camping.

We should want to be lazy, sitting on a beach, camping in the forest, climbing a mountain, playing chess. We should want these things. Some of us spend so much damn time trying to innovate and put ourselves out of a job that we forget to do the things we are working hard to achieve.

Course Correction

I'm a huge fan of education, and I love to learn. I'm a fan of any government program to try to take unemployed people and make them useful in other contexts.

I'd vote for something that reduced the work week from 40 hours to 32 hours yet required the same pay. I know some 40 hour jobs can be done in 4.

Poor people, by definition, can't change into the roles needed by magic. College is expensive. Computers cost money. Government training programs suck and only prepare people for 1-2 years until the next innovation makes them useless.

I'd be in favor of sacking a lot of wealthy finance people since I believe they did steal a bunch of fucking money with stupid rules. I have yet to see how an Excel spreadsheet could turn into a robot to do my dishes.

Friday, September 2, 2011

Intersecting Lines (The Art of Representing Data)

So, Atwood's Law. Yes, I'm thinking about writing a very lightweight computational geometry library in JavaScript. I'm purposely going to target 2D computational geometry because (a) it is 'easier' and (b) it can be more readily understood. So, I'm about to start work on the "kernel". I'm about to define a "line".

Well, how do I do that?

If I go back to what I taught college algebra students, then a line is

y = m x + b

Ah, yes, slope-intercept form. Beyond the fact that this is a very easy representation of a line, it is very fucking useless. Why? You can't express vertical lines. Let's look at another version, point-slope form:

(y - y₀) = (x - x₀) m

Oh, yes, still useless. Can't express vertical lines. Ok, Ok. Back up, college algebra was apparently a waste of time since it didn't give a meaningful definition of a line. Somewhere, probably graduate school, I realized a line is an equivalence relationship of some sort. Aha, yes, a general form of a line.

a x + b y + c = 0

This is way better as I have a complete representation of a line. Yay! Ok, so, let's use it. I can take two of these bastards and solve the system of two unknowns. Ok, what happens when there is no solution?... uh, they are parallel and don't intersect. Ok, so, yes, I find the point at which they intersect. Is this useful? Well, it could be, but I've lost information. Let's look at this equation in vector form.
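As a rough sketch of "take two of these bastards and solve the system", here is Cramer's rule applied to two lines in general form. The object layout {a, b, c} and the function name intersectGeneral are my own illustrative choices, and returning null for the parallel case is just the simplest thing that works in JavaScript.

```javascript
// A line in general form a*x + b*y + c = 0, stored as {a, b, c}.
// Layout and names are assumptions made for this sketch.
function intersectGeneral(l1, l2) {
  // Determinant of the 2x2 coefficient matrix [[a1, b1], [a2, b2]].
  const det = l1.a * l2.b - l2.a * l1.b;

  // Zero determinant: the normals are parallel, so the lines are
  // parallel or identical and there is no single intersection point.
  // Real code would likely compare against a small tolerance instead.
  if (det === 0) {
    return null;
  }

  // Cramer's rule for a1*x + b1*y = -c1, a2*x + b2*y = -c2.
  return {
    x: (l1.b * l2.c - l2.b * l1.c) / det,
    y: (l2.a * l1.c - l1.a * l2.c) / det
  };
}
```

Note that vertical lines are no longer special: x - 3 = 0 is just {a: 1, b: 0, c: -3}. What comes back is a bare point, which is exactly the lost information complained about above.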

N · (x - x₀) = 0

This would be the normal-i-can't-remember-name-but-similar-to-plane-equation form. This is a nice and cute representation, but it gives me a really poor way to intersect lines naturally. It has a beautiful explanation though: basically, the line is the collection of all vectors that are orthogonal to the normal (when translated to the origin). While this is a useless computational representation, it does tell us something useful about the general form. Namely, N = (a,b); isn't that nice.

So far, the general form has given us the most bang for our buck. However, it still sucks since this general form is a compressed representation of a line that enables me to (a) check to see if a point is on the line, (b) find more points on the line, (c) find a line that is perpendicular to it. In the platonic world, this set is awesome. However, the real world tends to suck and lines start somewhere. So, let's go back to that vector stuff and consider (briefly) an affine linear combination.
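For what the general form is good at, here is a small sketch of (a) and (c), using the fact that N = (a,b) is the normal. The {a, b, c} layout, the function names, and the default tolerance are assumptions for illustration only.

```javascript
// A line in general form a*x + b*y + c = 0, stored as {a, b, c}.

// (a) Is the point (x, y) on the line? Plug it in and see how far the
// result is from zero. The residual is only a true distance when
// (a, b) has unit length, so `eps` is a loose tolerance, not a distance.
function containsPoint(line, x, y, eps) {
  const tolerance = eps === undefined ? 1e-9 : eps;
  return Math.abs(line.a * x + line.b * y + line.c) <= tolerance;
}

// (c) The perpendicular line through (x, y). Its normal is the old
// normal rotated by 90 degrees, (-b, a), and c is chosen so that
// (x, y) satisfies the new equation.
function perpendicularThrough(line, x, y) {
  return {
    a: -line.b,
    b: line.a,
    c: line.b * x - line.a * y
  };
}
```

Both are one-liners, which is the appeal. Neither tells you where the line starts or ends, which is the problem.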

L(k) = k * P₁ + (1 - k) * P₂

Is this a line? Well, it only has one variable and it has at least two points. Well, as cute and as insightful as this is; it is still useless. Too many operations.

L(t) = P₀ + D t

This is what I am going to go with. I know where the line starts! I know which direction it is going! It is very easy to intersect two lines! It's easy to compute the normal of the line (use complex numbers to rotate D)! It is trivial to compute new points in the line! As an added bonus, I get rays and segments for free by trapping t in a one dimensional box. Yay!
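A minimal sketch of what this representation buys, assuming a line is stored as a point p, a direction d, and an optional box [tMin, tMax] for t. All of the names here (makeLine, pointAt, normal, intersect) are hypothetical and are not the actual kernel. The intersection solves P₁ + D₁ t = P₂ + D₂ s using 2D cross products.

```javascript
// L(t) = P0 + D t, with t trapped in [tMin, tMax].
// Defaults give an infinite line; [0, Infinity] is a ray; [0, 1] is a segment.
function makeLine(p, d, tMin, tMax) {
  return {
    p: p,
    d: d,
    tMin: tMin === undefined ? -Infinity : tMin,
    tMax: tMax === undefined ? Infinity : tMax
  };
}

// 2D cross product (the z component of the 3D cross product).
function cross(u, v) {
  return u[0] * v[1] - u[1] * v[0];
}

// New points on the line are a multiply and an add per coordinate.
function pointAt(line, t) {
  return [line.p[0] + line.d[0] * t, line.p[1] + line.d[1] * t];
}

// Rotate D by 90 degrees, i.e. multiply (dx + i*dy) by i.
function normal(line) {
  return [-line.d[1], line.d[0]];
}

function inBox(line, t) {
  return t >= line.tMin && t <= line.tMax;
}

// Returns the intersection point, or null if the lines are parallel
// or the hit falls outside either line's t box.
function intersect(l1, l2) {
  const denom = cross(l1.d, l2.d);
  if (denom === 0) return null; // parallel directions
  const w = [l2.p[0] - l1.p[0], l2.p[1] - l1.p[1]];
  const t = cross(w, l2.d) / denom; // parameter along l1
  const s = cross(w, l1.d) / denom; // parameter along l2
  if (!inBox(l1, t) || !inBox(l2, s)) return null;
  return pointAt(l1, t);
}
```

The same intersect handles lines, rays, and segments; the only difference is the box passed to makeLine. As with the general form, a real kernel would probably compare denom against a tolerance rather than exactly zero.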

Moral of the Story

There are many ways to represent the same or very similar data. The one you learned first was probably not the best.

Having been alive for a while, I've learned to try to think ahead of time to find the right representation for the data. That is, after all, 90% of computing. If you spend a bunch of your time writing algorithms, then you are probably doing something wrong.

Am I making the right decision? Not sure. See, now that I've decided which numbers to actually store, I need to pick between an object, an array, or a composition of arrays. I'll benchmark that since it is part of the core, and it will affect the performance, readability, and beauty of the code.
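To show what the choice looks like, here is one guess at the three layouts for the same line L(t) = P₀ + D t. The field names and array orderings are assumptions made for this sketch, not the layouts that were actually benchmarked.

```javascript
// 1) An object: every number gets a name.
const asObject = { px: 0, py: 0, dx: 1, dy: 2 };
function pointAtObject(l, t) {
  return { x: l.px + l.dx * t, y: l.py + l.dy * t };
}

// 2) A flat array: [px, py, dx, dy], meaning carried by position.
const asArray = [0, 0, 1, 2];
function pointAtArray(l, t) {
  return [l[0] + l[2] * t, l[1] + l[3] * t];
}

// 3) A composition of arrays: [P0, D], each a 2-element vector.
const asPair = [[0, 0], [1, 2]];
function pointAtPair(l, t) {
  const p = l[0];
  const d = l[1];
  return [p[0] + d[0] * t, p[1] + d[1] * t];
}
```

The third form keeps points and directions as vectors in their own right, so vector helpers can be reused on them directly, at the cost of an extra level of indexing.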

Final Decision: going with readability and the 3rd representation even though it costs double the CPU cycles. Although, I am impressed with how fast JavaScript is.