Adventures of a Protoss in Seattle: 2010

Sunday, December 26, 2010

3 Types of Programmers: Zerg, Terran, Protoss

There are three types of programmers in this world.

Terran Programmer

Rugged, the terran programmer gets shit done and is smart enough to make it work at every level. The code is neither sexy nor elegant, but it gets the job done and works well enough. Their tools are whatever they can afford.

A terran programmer usually works best in a start-up or as a technical leader. A canonical example of a terran-based company is 37signals.

Zerg Programmer

The company matters most to the zerg programmer. They need their IDE (i.e. creep). Management needs to hire lots of them to ship even the most basic of products, but they can hire hordes to solve problems of scale. They depend on their queen vendor.

A zerg programmer works best as a cog in some corporate machinery, and they tend to use Microsoft or Oracle products. Most offshore outsourcing companies are examples of zerg companies.

Protoss Programmer

Shiny and advanced mathematics is the primary tool for the protoss; this greatly limits their numbers. They use languages like Lisp or ML to develop spectacular results, and they are free to use anything.

The protoss tend to cluster in academia until they have matured to the point where they have the insight that can power a company. For instance, Google's PageRank is a protoss insight that powers Google thus making Google a protoss company.

Moral

If you build a company, then you need to ultimately use people to get things done. You need to find the right people for the right job to get the company as a whole executing.

Each type of programmer has their pros and cons in a business, and the goal is to utilize and structure the company such that everyone works together effectively.

If we ignore differences (or worse, argue about them), then we miss out on the potential to work together and build truly great things.

Testing is not a waste of time, I don't know that your code works

I've been in the land of formal mathematics where all the equations are good and correct.

My venture into computing, however, is not so pure as I would have liked. I tried to be virtuous in this land by bringing the gospel of proof and correctness, but I have failed. I have given in to the temptation of the wild outlaws. I have abandoned static types in favor of dynamic types. I have abandoned super-planning waterfall methodologies for agile methodologies. I have abandoned my safe IDE for the raw ninja powers of nano/gedit. I have abandoned formal techniques and am working on developing testing based techniques (relevant to my domains and projects).

Here are my thoughts on testing.

Interaction between Proof and Testing

If you can prove it, then you can test it. You know where it should work and where it shouldn't work. Thinking about testing first will give you hypotheses on what to prove. Once you get it working, do you need to prove it? Usually no; but you should have unit tests that enable you or your team to reasonably do the engineering needed to take it to production.

For instance, if you are writing PHP, then the academic portion of the brain can bang out the parser and interpreter in a couple of hours. You can bang out the proof in a day. The problem is that real engineering has to come in and optimize it for production. This means going down into the bowels of code and optimizing loops, changing data structures, finding code equivalences. Unit tests give engineering a regression suite to run as the entire code base is optimized. As the code grows, the feasibility of a proof diminishes. Ideally though, the proofs enabled the tests to be constructed such that "If all these tests pass, then this proof reasonably asserts the code is good."
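A minimal sketch of that idea, assuming a hypothetical pair of functions: `sumSquaresReference` stands in for the slow, obviously correct version you could prove things about, and `sumSquaresOptimized` stands in for what engineering ships. The names and the problem are illustrative only.

```javascript
// Reference implementation: slow but obviously correct (the thing you would prove).
function sumSquaresReference(n) {
  var total = 0;
  for (var i = 1; i <= n; i++) {
    total += i * i;
  }
  return total;
}

// Optimized implementation: closed form n(n+1)(2n+1)/6.
function sumSquaresOptimized(n) {
  return (n * (n + 1) * (2 * n + 1)) / 6;
}

// Regression check: returns the inputs where the two versions disagree.
function findRegressions(inputs) {
  return inputs.filter(function (n) {
    return sumSquaresReference(n) !== sumSquaresOptimized(n);
  });
}

var failures = findRegressions([0, 1, 2, 3, 10, 100, 1000]);
console.log(failures.length === 0 ? "no regressions" : "regressions at: " + failures.join(", "));
```

Every later optimization pass reruns the same check, so the reference version keeps acting as the specification even after nobody is willing to redo the proof.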

Stupid Shit

When I dropped static typing, we hired a QA guy whose job was to use the product every day. So far, we don't have typing issues. We have buttons that don't work due to typos or overlapping divs. Sometimes, your code can be proved correct if it just works. Usually humans need to test this.

I think I can solve some of the issues with Selenium IDE, but the time for me to solve it using that methodology combined with the opportunity cost of doing something else means I should just hire some scrubs off the street to check buttons and check for other faults as well.

You and I are not in control

Let me introduce you to Google Maps and how they did versioning. I picked a URL for Google Maps that represents their bleeding edge version. One day, half of our clients didn't work any more. WTF? They updated to a new version and broke our shit. Fortunately, they also provided an archive of old versions, and after a little hacking on the production server it was working again. I didn't know about the archive at the time, but fortunately Google knew enough to start archiving.

We live in a time of interdependence. I trust you to develop service XYZ, and you trust me to develop ABC. Either you or I can fuck up, and those bridges need to be tested and measured. For instance, I have a server to proxy geocoding requests. I do this so I can enable workarounds when Google fucks up (and they do about 0.01% of the time, which is why I have a table of 18 addresses that Google couldn't geocode correctly).
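A minimal sketch of such a proxy with a workaround table. Everything here is an assumption for illustration: the `overrides` table, its made-up coordinates, the `normalize` helper, and the callback-style `upstream` function that stands in for the real geocoding service.

```javascript
// Hypothetical override table: addresses the upstream geocoder gets wrong,
// mapped to hand-verified coordinates (the values here are made up).
var overrides = {
  "123 example st, seattle, wa": { lat: 47.6, lng: -122.3 }
};

// Normalize addresses so trivial differences in case or spacing still hit the table.
function normalize(address) {
  return address.trim().toLowerCase().replace(/\s+/g, " ");
}

// upstream is any function (address, callback) that calls back with (err, result).
function makeGeocoder(upstream) {
  return function geocode(address, callback) {
    var key = normalize(address);
    if (Object.prototype.hasOwnProperty.call(overrides, key)) {
      return callback(null, overrides[key]);
    }
    upstream(address, callback);
  };
}

// Stand-in upstream for demonstration; a real one would make an HTTP request.
function fakeUpstream(address, callback) {
  callback(null, { lat: 0, lng: 0 });
}

var geocode = makeGeocoder(fakeUpstream);
var results = [];
geocode("123  Example St, Seattle, WA", function (err, r) { results.push(r); });
geocode("somewhere else", function (err, r) { results.push(r); });
```

Because callers only ever talk to `geocode`, a new bad address is fixed by adding a row to the table rather than by touching application code.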

When google started promoting version 3 of their api, I found out that a quick URL change didn't work for me. My tests however told me where it didn't work and I could treat my tests as a simple todo list and grind through the version change. As long as the tests measured every use case that I needed, then I was fine.

You need to develop tests for things that you don't control. You will have many black boxes and you need to test how you use them and your assumptions.
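One way to write down assumptions about a black box, sketched under stated assumptions: the response shape (`status`, `results[0].lat`, `results[0].lng`) is hypothetical and not taken from any real API, and `brokenAssumptions` is an illustrative helper.

```javascript
// Each assumption is a named check against whatever the black box returned.
var assumptions = [
  { name: "status is OK", check: function (r) { return r.status === "OK"; } },
  { name: "has at least one result", check: function (r) {
      return Array.isArray(r.results) && r.results.length > 0;
  } },
  { name: "first result has numeric lat/lng", check: function (r) {
      var first = r.results && r.results[0];
      return !!first && typeof first.lat === "number" && typeof first.lng === "number";
  } }
];

// Returns the names of the assumptions that no longer hold: a ready-made todo list.
function brokenAssumptions(response) {
  return assumptions.filter(function (a) {
    try { return !a.check(response); } catch (e) { return true; }
  }).map(function (a) { return a.name; });
}

// The shape the code was written against, and a changed shape after an upgrade.
var oldShape = { status: "OK", results: [{ lat: 47.6, lng: -122.3 }] };
var newShape = { status: "OK", results: [{ location: { lat: 47.6, lng: -122.3 } }] };
console.log(brokenAssumptions(oldShape));
console.log(brokenAssumptions(newShape));
```

Run against a recorded or live response after each upstream change, the list of broken names tells you where to start grinding.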

In summary

No matter how smart you are, I guarantee that you will eventually need testing because testing will find you and the business will depend on it. Refusing to test is just ego because a test will fail and that will compromise the integrity of your ego's self image of you as a perfect and heroic being. Once you accept this, allow your ego to rest and stay hidden during real engineering.

Sunday, December 19, 2010

17 Thoughts on Programming

Your constants are your client’s variables.
All software is layered like cake because no one can commit. Those that can commit, fail.
Programs don't learn. Programmers just learn new tools.
Eventually, your program becomes someone else’s function.
Be one with the machine, and you will be annoyed by your code.
The code you are working now that is special fits within someone else's general framework. In a month, you will have wished you knew about that framework.
If you don't have any loops, then you haven't done anything except play with Legos. Why is it bad to play with legos?
If you could communicate complexity, then it wouldn't be complex.
Velocity induces complexity (either technical or managerial).
Your software will be abused by criminal minds.
One half of a business always builds top down, the other builds bottom up; the people doing it top down will get the credit.
If your crappy code makes it necessary to hire ten people, then at least feel good about the economy. Also, be the owner of the vending machine.
It is fun to optimize, but it is hard to evolve; if you evolve, then you grow and find new things to optimize.
Every language emits beauty, and every language emits horror. Choose wisely and cluster people appropriately.
Sometimes, you will solve a real problem; most times, you will solve a problem at someone else's expense.
Software as a Service is an infinite recursive chain of passing the buck. If you accept the buck, then you can keep it.
The person that follows your steps probably has different designs; enable them to rebuild and learn from your work rather than force them into the same idioms. After all, they have to maintain it.

Saturday, December 18, 2010

Defer Deletion, Garbage Collection, and Bulk Undelete (using WIN + CouchDB)

I really, really, really hate providing the delete function. So, for WIN, I provide a delete that doesn't delete until 31 days have passed. It allows me to sleep at night and dream of unicorns.

I have an updator that changes the namespace of the document and adds metadata to the document related to the deletion.

https://github.com/mathgladiator/win/blob/master/lib/win.config.js#L110

Then I provide a function to the environment to easily call the updator that looks and tastes like a delete:

https://github.com/mathgladiator/win/blob/master/lib/win.environment.js#L139

Every day, I have a cron job that looks at this indexer

https://github.com/mathgladiator/win/blob/master/lib/win.config.js#L118

and it kills them one by one. Now, I know that actual deletes will be done in 31 (or so) days.

What if I need to undo? Well, it is very easy to undo one element. If I would like to undo a whole bunch, I have to provide a common key to un-delete from. That's what the action parameter does. If I write a loop that deletes a bunch of stuff, then I need to build a fairly unique key that enables me to undo that batch delete.
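A minimal in-memory sketch of the scheme described in this post, not the code behind the links above. The field names (`deleted_from_ns`, `deleted_action`, `purge_after`) and the `"trash"` namespace are assumptions made for illustration.

```javascript
var DAY_MS = 24 * 60 * 60 * 1000;

// Mark a document as deleted: move it to a trash namespace and record metadata.
function deferDelete(doc, action, now) {
  doc.deleted_from_ns = doc.ns;
  doc.ns = "trash";
  doc.deleted_action = action;          // common key shared by a batch of deletes
  doc.purge_after = now + 31 * DAY_MS;  // real deletion happens after this time
  return doc;
}

// Undo every deferred delete that shares the same action key.
function undeleteByAction(docs, action) {
  docs.forEach(function (doc) {
    if (doc.ns === "trash" && doc.deleted_action === action) {
      doc.ns = doc.deleted_from_ns;
      delete doc.deleted_from_ns;
      delete doc.deleted_action;
      delete doc.purge_after;
    }
  });
}

// What the daily job would do: keep only documents not yet due for purging.
function purgeExpired(docs, now) {
  return docs.filter(function (doc) {
    return !(doc.ns === "trash" && doc.purge_after <= now);
  });
}

var docs = [
  { _id: "a", ns: "invoice" },
  { _id: "b", ns: "invoice" },
  { _id: "c", ns: "customer" }
];
var now = Date.now();
var action = "cleanup-batch-2010-12-18"; // fairly unique key for this batch
deferDelete(docs[0], action, now);
deferDelete(docs[1], action, now);
undeleteByAction(docs, action);          // one call restores the whole batch
```

The action key is the only thing a loop of deletes has to get right: if every document in the batch carries the same key, the undo is a single filter.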

Thursday, December 16, 2010

Understanding a sea of JSON with Map Reduce

CouchDB stores a lot of data in a sea of JSON, and it isn't exactly easy to get a good grasp on what there is.

For WIN, I force each object to have a namespace field called 'ns'; this enables me to partition the data and enables developers to partition the data. Ideally, this helps in keeping things separate.

A fundamental problem is that I want to have an idea of what is in the data set and be able (and enable developers) to write appropriate documentation so everyone stays on the same page. I would also like data to adhere to some kind of structural quality. However, it would be nice to be able to look for oddities that could become future support issues (it would also be nice if everyone used the same language and kept things consistent; I would rather nip inconsistencies in the bud earlier rather than later).

So, I flatten the structural qualities of each object and count them using this code (for CouchDB's incremental MapReduce).

http://pygments.org/demo/12753/ (alternative http://pastie.org/1384759 )

This enables me to grep the code base and then use blame to work with the developer to resolve oddities. Or, I can turn a blind eye because it isn't in a table that matters that much (i.e. meta data or user controlled data).

I can monitor this for changes daily to determine what is happening on development (where oddities first get introduced).
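A minimal sketch of flattening and counting structure in the style of a CouchDB view; it is not the code behind the links above. In CouchDB the map function receives only the document and `emit` is provided by the view server; here `emit` is passed explicitly and `runView` is a hypothetical local harness so the sketch runs on its own. The path format (`field.sub:type`) is also an assumption.

```javascript
// Flatten an object's structure into paths like "address.city:string".
function flatten(value, prefix, out) {
  var type = Array.isArray(value) ? "array" : value === null ? "null" : typeof value;
  if (type === "object") {
    Object.keys(value).forEach(function (k) {
      flatten(value[k], prefix ? prefix + "." + k : k, out);
    });
  } else {
    out.push(prefix + ":" + type);
  }
  return out;
}

// Map: emit one row per (namespace, structural path).
function map(doc, emit) {
  flatten(doc, "", []).forEach(function (path) {
    emit([doc.ns, path], 1);
  });
}

// Reduce: sum the counts. Summing is the same for reduce and rereduce.
function reduce(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Local harness: group emitted rows by key, then reduce each group.
function runView(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      var k = JSON.stringify(key);
      (groups[k] = groups[k] || []).push(value);
    });
  });
  var result = {};
  Object.keys(groups).forEach(function (k) {
    result[k] = reduce(null, groups[k], false);
  });
  return result;
}

var counts = runView([
  { ns: "user", name: "ann", age: 30 },
  { ns: "user", name: "bob", age: "31" } // oddity: age stored as a string
]);
console.log(counts);
```

A path with a low count next to a near-identical path with a high count, like the two `age` rows here, is the kind of oddity worth chasing down.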

This mode of thinking enables me to think about unicorns when it comes to the database (oh, and never allowing anyone to delete; everything goes to trash with a trash_goes_out_on field that is set for 60 days in the future when it will be actually deleted).

Tuesday, December 14, 2010

Database Development Mistakes as NoSQL propaganda

Context

http://stackoverflow.com/questions/621884/database-development-mistakes-made-by-application-developers

Summary

Not using appropriate indexes
Not enforcing referential integrity
Using natural rather than surrogate (technical) primary keys
Writing queries that require DISTINCT to work
Favouring aggregation over joins
Not simplifying complex queries through views
Not sanitizing input
Not using prepared statements
Not normalizing enough
Normalizing too much
Using exclusive arcs
Not doing performance analysis on queries at all
Over-reliance on UNION ALL and particularly UNION constructs
Using OR conditions in queries
Not designing their data model to lend itself to high-performing solutions
Selfish database design and usage.
Abusing denormalised data
Scared of writing SQL
Dogmatic 'No Stored Procedures' policies.
Not understanding database design
Not using version control on the database schema
Working directly against a live database
Not reading up and understanding more advanced database concepts (indexes, clustered indexes, constraints, materialized views, etc)
Failing to test for scalability ... test data of only 3 or 4 rows will never give you the real picture of real live performance
They only test on toy databases.
Not using indexes.
Not communicating with experienced DBAs.
Poor Performance Caused by Correlated Subqueries
Forgetting to set up relationships between the tables.
Not using parameterized queries.
Favoring "Elegant" code over highly performing code.
Not doing the correct level of normalization.
You want to make sure that data is not duplicated
Using Excel for storing (huge amounts of) data.
Unnecessarily using a function on a value in a where clause with the result of that index not being used.
Not adding check constraints to ensure the validity of the data.
Adding unnormalized columns to tables out of pure laziness or time pressure.
not so much about the database per se but indeed annoying.
Not taking advantage of CLUSTERED INDEXES
Not using a SERIAL (autonumber) datatype as a PRIMARY KEY
Not UPDATING STATISTICS on a table when many records have been INSERTED or DELETED.

My Thoughts

All of these are consequences of using a one-size-fits-all solution for storing your data. Fact is, application developers shouldn't worry about how they use data. They should be able to get their job done without worrying about the long-beard in the back room. I've been in this role, and I can sympathize with it.

Then, I realized something has to change. I took away SQL and built a very simple RESTful layer to the data layer, and then I watched how application developers solved their problems. I was amazed at their cleverness. Instead of saying "oh, these silly application developers are so dumb and don't know shit about databases", I said "I wonder how clever they could be if I just gave them memcached and simple get/put/by_index".

They taught me a thing or two about how awesome memcached can be (especially with cron jobs).

Ideally, if you are building the data layer, then all you need to do to enable application developers is get the right complexity class out of the data. If you have ten billion things, then you need to provide the functions that get to a thousand things relevant to what the application developer needs to do. For bigger tasks, computations are best represented with MapReduce, and I feel that MapReduce is way easier to learn for fresh application developers. CouchDB's incremental MapReduce is by far the easiest to learn.
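A minimal in-memory sketch of a get/put/by_index layer, to show how small the surface can be. `makeStore` and its internals are hypothetical; a real layer would sit behind REST and in front of an actual database.

```javascript
// Build a store that maintains an index for each field named in indexFields.
function makeStore(indexFields) {
  var docs = {};     // id -> document
  var indexes = {};  // field -> value -> set of ids
  indexFields.forEach(function (field) { indexes[field] = {}; });

  // Fetch (or lazily create) the set of ids for one indexed value.
  function bucket(field, value) {
    var key = String(value);
    return indexes[field][key] || (indexes[field][key] = {});
  }

  return {
    get: function (id) {
      return Object.prototype.hasOwnProperty.call(docs, id) ? docs[id] : null;
    },
    put: function (id, doc) {
      var old = docs[id];
      indexFields.forEach(function (field) {
        // Drop the old index entry before adding the new one.
        if (old && field in old) { delete bucket(field, old[field])[id]; }
        if (field in doc) { bucket(field, doc[field])[id] = true; }
      });
      docs[id] = doc;
    },
    by_index: function (field, value) {
      if (!indexes[field]) { throw new Error("no index on " + field); }
      return Object.keys(bucket(field, value)).map(function (id) { return docs[id]; });
    }
  };
}

var store = makeStore(["ns"]);
store.put("u1", { ns: "user", name: "ann" });
store.put("u2", { ns: "user", name: "bob" });
store.put("o1", { ns: "order", total: 12 });
store.put("u2", { ns: "archived", name: "bob" }); // re-put moves it between index buckets
```

The index is what turns "ten billion things" into "the thousand things with this key"; everything past that is the application developer's problem to be clever about.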

That being said, performance is always going to be an issue. If you enable developers this way, then you need to provide a realistic environment.

Have a development server with more data than production and with a slower CPU (if you can't do this, then you need the ability to connect to production in a read-only mode).
Force them to profile their code (ab works very well for most situations)
Work with business people to define how consistency should work
Train them to do cache invalidation

In my mind, core to the NoSQL movement is the ability to empower more people with persistence rather than knowing edge cases of how a database works. The sooner you empower people, the sooner the business can iterate and the more time you can dedicate to sitting on a beach thinking about unicorns.

Related entry: Big Data enables Agile Data.

Sunday, December 12, 2010

Why I gave up on static types

I like programming language theory and how to use typing to do some pretty impressive things, but I'm getting older now and I just don't give a shit about types for day to day stuff. I also gave up on object-orientated code. I also said F-U to relational database theory. Why?

Because people using your product don't give a shit about how it gets done. That's the reality. They don't care if you use assembler or JavaScript. They just don't. The question is: can you make people happy? The more important question is: can you sell? Can your team sell? Can your sales team make compromises to make the sale?

This last question is the one that I ponder, since it affects my profits. Do I want to put up some academic/aesthetic wall in front of a sale? Or, do I want to enable them to make a sale?

This is where all that rigidity breaks down and I ask a new question. Is this methodology or technology better for sales?

Static typing? No.
Object Orientation? No.
Relational Databases? No.

There is a lot of bull-shit technology out there (especially built on .NET or Java) that is simply a wall to sales. Now, it does depend on what you are doing, but ultimately it comes down to sales.

My issue with static types is that I can't add new members at run-time; nor does it propagate. Everything I do now is basically a giant JavaScript object that I pass around with JSON. I don't care what is in it. From a business point of view, I know that if everything in the system doesn't try to map the JSON into a static class, then I keep all the data; it just propagates. This enables me to change elements at the data store like adding a boolean named "my_sales_team_is_awesome_and_sold_a_feature_that_can_be_added_by_a_bool", then I can sleep knowing that the entire system will just deal with it and pass it along. I don't need to deploy a binary nor compile across an entire system to add a little bool.
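A minimal sketch of the difference, assuming a hypothetical customer document. `toCustomerClass` stands in for any layer that maps JSON onto a fixed set of fields; `renameCustomer` stands in for a layer that touches only what it needs.

```javascript
// "Static class" style: only the fields the class knows about survive the mapping.
function toCustomerClass(json) {
  return { id: json.id, name: json.name };
}

// Pass-through style: change what you need, leave everything else alone.
function renameCustomer(json, newName) {
  var copy = JSON.parse(JSON.stringify(json)); // deep copy; fine for plain JSON data
  copy.name = newName;
  return copy;
}

// A document with a field that sales added at the data store yesterday.
var fromStore = { id: 7, name: "Acme", my_sales_team_is_awesome: true };

var viaClass = toCustomerClass(fromStore);
var viaPassThrough = renameCustomer(fromStore, "Acme Inc.");

console.log("my_sales_team_is_awesome" in viaClass);       // the new field was dropped
console.log("my_sales_team_is_awesome" in viaPassThrough); // the new field propagated
```

The first style needs a code change and a deploy in every layer before the new bool arrives anywhere; the second needs nothing.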

My issue with object orientated code is that most of my stuff is non-inherited. I have things that can not be objects. While I do use the JavaScript object a bit, I don't use prototypes. I just treat it like a map and move on with my day. I don't give a shit about binding code to data; this is the worst possible thing you can do. I need all my data in a format that it is (a) obvious what it is and (b) easy to transform by looking from the outside. This is my data model guide line; if any idiot can look at the data and know what it means, then it is a good data model.

My issue with databases is the same as static types. I don't want to plan out how my data is going to look. I don't want to think. I want to be agile and just capture data and throw it into the database. I want to capture as much data as possible then organize it later. I don't want to think about normalizing which I can always break (show me your schema, and I will find a feature that will break it). I just want to put my data somewhere safe and have it replicate. This is why I use CouchDB. It's very relaxing.

Looking back at my life, I realize that I was wasting a bunch of time and energy trying to reach a goal with stupid means. My goal was to enable crazy fast development, and I achieved this goal by simply changing my outlook and aesthetics.

Having said that, I realize that there are reasons these things exist. If you need them, then you should use them. I love static types, but only for raw performance. There are performance patterns that can be implemented as a server that are very flexible, and those are important things to learn as they enable you to deploy safe services. The problem though is always with specifics.

Oh, it also helps to have mastered grep and to write code that enables grep to be useful; this is an amazing productivity boost in the situations where static types would actually be very useful.

I haven't completely given up on types; I just now realize that their place is not where I would have liked it. If you look at my github, then you can probably tell where I've been spending my time in terms of type systems.

That's right, I'm a node.js junkie. I just spent a weekend cutting a new version of my platform, and I have to say that I get amazing velocity with it. So much so that I can focus on leveling up my design rather than painting yet another bike shed.

Saturday, December 11, 2010

WIN is looking good; good enough to start documenting and testing more hard-core

Well, I put WIN into production. I learned that if you rely on unsupported couchdb code, then strange things happen since there is no debug code. I found a bug in node.js that I need to mock up and send to the node.js team. I learned that I don't like looking at more than 1K code.

I just spent half a day re-factoring and cleaning up win so it makes more sense, and I added crap comments. I also linted to look for stupid issues, so it looks a lot cleaner now.

So, now, I'm going to write the guide ultra-hard-core fashion. I am confident in the patterns that I am going to present, and I'm confident that the system can be hacked to get anything anyone would want.

Thursday, December 9, 2010

Why Mustache is for WIN

Mustache is a logic-less templating language. By lacking logic, it easily enables cross-language template interpretation. This is important for two reasons.

It protects work in constructing good DOM. This is true for many template languages, but it makes sure the assets are protected from language change.
By enabling templates to work in multiple languages, you enable them to work in multiple contexts. For instance, if you have a search feature that you would like ajaxified, then you can work towards producing a JSON object. For SEO, you use the template to send off the HTML. For Ajax, you can just get the JSON object and do the JSON to HTML in the browser. Generally, JSON is more efficient to send over the wire when compared to HTML; ergo, you get a snappier response in addition to faster development time (by only writing one template and not worrying about DOM manipulation).
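A minimal sketch of the one-template, two-contexts idea. The `render` function below is a toy stand-in and NOT the real Mustache library: it supports only `{{name}}` (HTML-escaped) and `{{#items}}...{{/items}}` over arrays of objects. The template and view are made up for illustration.

```javascript
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Toy renderer: one pass over the template, handling sections and variables.
function render(template, view) {
  var pattern = /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}|\{\{(\w+)\}\}/g;
  return template.replace(pattern, function (match, section, inner, name) {
    if (section) {
      // Section: render the inner template once per array item.
      var list = Array.isArray(view[section]) ? view[section] : [];
      return list.map(function (item) { return render(inner, item); }).join("");
    }
    // Variable: missing values render as the empty string.
    return view[name] == null ? "" : escapeHtml(view[name]);
  });
}

var template = "<h1>{{query}}</h1><ul>{{#hits}}<li>{{title}}</li>{{/hits}}</ul>";
var view = { query: "cats & dogs", hits: [{ title: "Cats" }, { title: "<Dogs>" }] };

// Server side (SEO): send HTML.
var html = render(template, view);

// Ajax: send the same view as JSON, then render with the same template in the browser.
var wire = JSON.stringify(view);
var htmlInBrowser = render(template, JSON.parse(wire));
```

Both paths share the template and the view, so the only thing that changes between SEO and Ajax is where the rendering happens.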

When I approached Mustache, I was hesitant at first since I do not do much front-end engineering. I wondered what it can't do, and I've looked for counter-examples. Simply put, I can do amazing things with mustache. So can you.

The key is to think of Mustache as just a simple HTML encoder over a giant JSON representation of the module, page, layout, etc. You will put some silly things in the JSON, but in the end it will enable something very powerful if you architect around getting a giant JSON object back.

Namely, it is very easy to automate testing on giant JSON files. That is, it is easier to script against JSON than junk HTML. For me and WIN, this is a fairly important point as I would like to be able to crawl my entire project to look for errors.

Wednesday, December 8, 2010

Say Yes to Internet Censorship

Why?

Because it will make things worse.

When things are bad, talk begins of revolution.

Viva La Revolución

By the way, this was troll-bait. Just an experiment. Of course, there should be no censorship, but that is obvious to me. Is this not obvious to others???

Tuesday, December 7, 2010

3 reasons why I don't key off of email anymore.

For some of my clients, I built their stuff such that a user only needed an email and a password. Registration was easy and it was awesome. Now, I have introduced a login name back in. Here is why.

Emails Change

I had clients that lost jobs and they needed to change their email; well, that required writing a change email function. That's not pleasant because emails may already exist due to a prior sign up or a different use case.

People get fired. Take two employees at a company: one used their personal email to sign up for the product, whereas the other used their business email. The one who used their personal email gets fired and needs to transfer access to the other employee, but that employee's email is already taken. So, either I build account merging or they manage multiple credentials. Nevertheless, they have to call in for support if we present an obstacle. Additionally, companies get bought and emails change.

By adding the level of indirection, I'm enabling them to handle these issues themselves rather than supporting it on our end.

Multiple Accounts per Email

If you enable a single email to manage multiple accounts, then you help out companies that have different billable uses of your product. Otherwise, you require them to set up multiple emails, which just sucks.

Multiple Managers/Owners

If you focus on providing a single account, then you can enable your product to be managed by multiple people (or enable collaborative features). It is easy to key off of the account's login name to enable multiple users to access the account.
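A minimal sketch of the indirection, covering the first two reasons above. The in-memory `accounts` table, the function names, and the example logins and emails are all hypothetical.

```javascript
// Accounts are keyed by login name; email is just an attribute that can
// change freely and can be shared by several accounts.
var accounts = {}; // loginName -> { email: string }

function createAccount(loginName, email) {
  if (Object.prototype.hasOwnProperty.call(accounts, loginName)) {
    throw new Error("login name already taken: " + loginName);
  }
  accounts[loginName] = { email: email };
}

// Changing an email never collides with another account, so users can do it themselves.
function changeEmail(loginName, newEmail) {
  accounts[loginName].email = newEmail;
}

// One email may manage several accounts (e.g. separate billable uses).
function accountsForEmail(email) {
  return Object.keys(accounts).filter(function (loginName) {
    return accounts[loginName].email === email;
  });
}

createAccount("acme-billing", "pat@example.com");
createAccount("acme-support", "pat@example.com");     // same email, second account
createAccount("pat-personal", "pat@mail.example.org");
changeEmail("pat-personal", "pat@example.com");        // no merge needed, no support call
```

Only the login name has to be unique, which is why the email-change and transfer cases stop being support tickets.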

Sunday, December 5, 2010

Entrepreneurial Enlightenment and Insight

“Ideas are a dime a dozen” is a very stupid saying just like “work smarter, not harder”.

Many of my math professors said that you know more or less all the math you are going to need to know, but you need to be able to communicate it (that is, you need to be able to communicate math). That’s the true goal of the master’s degree program.

Understanding “Ideas are a dime a dozen” is the same thing that many entrepreneurs know but can’t express in an effective manner. Communicating is harder than knowing.

Here is how you get to knowing that “Ideas are a dime a dozen” (or at least, how I’m trying to sell it to you):

Sit in a business.

Watch the people.

Be creative on ways to make them awesome. What would make the business better? What could you sell them to make them better?

Do this every day, and you will have lots of ideas.

Once you have an idea, you need to be able to execute on it. Execution is the art of getting things done and progressing the state of a business from conception to cash flow.

I have a torrent of ideas, so I’m set for life in the idea category. Now, how can I execute? Here is a generic three step business plan.

Build it (Engineering)
Find Customers (Marketing)
Sell it (Sales)

Once you see the pipeline, everything in the world starts to make a lot of sense. Everyone has a happiness function, H, and a life-is-shit function, S, that they use to make decisions.

Your goal is to maximize H-S.

If the market is small, then you can be one super awesome consultant.

If the market is big and mundane, then you can build a company.

If the market is big and complex, then you can build a firm.

Once you have ideas, how do you execute in #1, #2, #3? How do you build it? Do you know someone that just builds things? How do you find customers? Do you know someone that can network or go door to door? How do you sell it? Do you know someone that can sell ice to an eskimo?

Once you can answer these questions, you can build a business. However, you must keep in mind that the people in the business are the only true asset it has. Do you have the right people doing the right jobs that they want to do? That last bit is basically my digested form of Execution: The Discipline of Getting Things Done (which is a good book, but you have to put yourself in their shoes to understand what they are saying. It’s not an easy book to read.)

Once you start to have 10+ ideas, you can either just blindly follow them (I'm very guilty of this in my life as a coder), or you measure them and pick the best one. What is the best one? After all, you have your own H-S function. It could be by impact on the world, revenue, profit, job creation, or just plain fun.

I hope that I have communicated how to gain entrepreneurial insight into how to manufacture ideas. This is why I write my blog, so I can level up my communication capabilities. If you find yourself like me, knowing things but feeling an inability to express them, then you need to start writing now.

Thursday, December 2, 2010

The problem only the best programmers can solve: trust

I just watched DHH's keynote at the Ruby X Conf, and I must admit that DHH's concept of freedom was inspiring.

After thinking about it more, I know why. Programmers tend to be control freaks. I know I'm a control freak, and I'm slowly giving up control so I can get away from the computer and get outside more.

The central problem of team programming is trust.

Static typing tells me that I'm not trustful enough to keep to my own convention and keep my shit straight. Having worked in JavaScript for so long now, I don't even think about types. I like the benefits that static typing can provide (performance), but for day to day stuff, I don't care nor do I really think about it. If it does what I want, then I'm done.

Monkey patching is very interesting, and the same capability is present in JavaScript. I find it useful. For instance, in WIN, I needed a trim function for strings. Why doesn't JavaScript have a .trim function? I don't know, but I can extend its prototype. I find this very convenient. I can also plant bombs in the string prototype or the object prototype, but I don't want to do this. I need to write tests to check the basic assumptions about the code that I'm using.
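A minimal sketch of a guarded prototype extension plus the checks that pin down its assumptions. This is not WIN's actual code; the guard means that on a runtime which already ships `String.prototype.trim`, the native version is left alone.

```javascript
// Only add trim if the runtime doesn't already provide one, so a native
// implementation is never clobbered.
if (typeof String.prototype.trim !== "function") {
  String.prototype.trim = function () {
    return this.replace(/^\s+|\s+$/g, "");
  };
}

// Basic assumptions about the function, written down as checks.
var trimChecks = [
  "  hello  ".trim() === "hello",
  "hello".trim() === "hello",
  "\t a b \n".trim() === "a b",
  "   ".trim() === ""
];
var trimOk = trimChecks.every(function (ok) { return ok; });
console.log(trimOk ? "trim behaves as assumed" : "trim assumptions broken");
```

If someone later plants a bomb in the string prototype, these checks are what goes off first.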

When I defended lock based SCM, I was basically saying "I don't trust developers to work together". Now, we use mercurial and I don't worry about it. If developers have a conflict, then it is their responsibility to fix it, and it is management's goal to manage assignments such that conflict is rare.

That's the mentality that I've had to develop when switching from academic programming to industry programming. When someone else breaks something, they have to fix it. Fucked up a type? You fix it. Broke the string prototype? You fix it. Got a conflict? You fix it. You did something stupid? Ok, we are human, now go fix it.

When people have the responsibility to fix things, the quality gets better organically and, best of all, I'm usually left out of the picture for most of it.

Oh? You would like the root password? Sure, that's fine. We have a root password ceremony (it has hooded capes and everything draconian with candles), but I trust them enough. If a developer fucks up prod, then they fix it.

The key to enabling DHH's freedom is empowering trust. The key to empowering trust is knowing how to protect liabilities. If you are able to take backups every day, then you should. We do. You should also test the restore every week. We do. If you are unable to take backups, then you need some form of revision control and never ever delete anything.

When you are programming, the biggest liability you have is how you persist the state of the business. The next liability is how much you annoy your customers (i.e. infinite loop of sending emails = very bad). Once you figure out how to protect the company's ass and enable developer freedom, then you are golden.

I think that DHH's concept of freedom is an ultimate goal for the next ten years for both programmers and service providers in many industries. Some industries, however, are always going to be control festivals simply because that is how that market works. I would not want an airplane control system written in Ruby. I would rather it be done in OCaml with the most insane type system ever. Fortunately for the majority of programmers, these examples are in the minority. If you are in that minority, then you know enough about programming languages to build your own prison to protect the business.

I think DHH's sentiment on programming languages extends to databases, and this is why I work with and promote CouchDB. It just makes me happy.

The influence of advanced mathematics on programming.

I'm selling many of my books on Amazon, and as I was going through the books I realized that most of it was useful, but only useful in an indirect way. I would like to share some thoughts on how my studies in graduate level mathematics influence my day to day operations of building products, managing databases, and doing everything a free electron can do in a given day at a start-up.

Topology

I think the best introductory book to topology is "Introduction to Topology by Crump W. Baker". Topology is basically the study of connectedness and surfaces. When studying topology, you think about how things are different. Are a donut and a coffee cup the same? Well, yes they are, once you define what "same" means. There are practical programming challenges in topology in how one can process and do feature selection in computer vision. But, there are more mundane ways of applying topology.

For instance, a relational database is a topology in a discrete graph sense. How does this help me? Well, I'm about to do some stupid DELETEs and UPDATEs on a very large data set. Is the data set before and after the same in regards to current business value? Did I botch up? Topology comes in with the idea of topological invariants. A topological invariant is a quality that can be measured and is invariant under any homeomorphism (a continuous transformation with a continuous inverse).

If I were to write a query that measures the business value of the database (say, by the sum of the transactions, sum of paying accounts, and so forth), then I can use these to get a good sense of whether or not I botched up my changes by measuring before and after.
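Here is a minimal sketch of that before-and-after check. It assumes an in-memory array standing in for the real table, and the names `measure` and `brokenInvariants` are made up for illustration; in practice each measurement would be a query against the database.

```javascript
// Hypothetical in-memory stand-in for a table of transactions.
const before = [
  { id: 1, account: "a", amount: 120, paying: true },
  { id: 2, account: "b", amount: 80, paying: true },
  { id: 3, account: "c", amount: 0, paying: false },
];

// Each invariant is a measurement that a safe cleanup should not change.
function measure(rows) {
  return {
    totalAmount: rows.reduce((sum, r) => sum + r.amount, 0),
    payingAccounts: new Set(rows.filter((r) => r.paying).map((r) => r.account)).size,
  };
}

// Returns the names of the invariants that differ between two snapshots.
function brokenInvariants(a, b) {
  return Object.keys(a).filter((key) => a[key] !== b[key]);
}

// A "cleanup" that is supposed to remove only dead, non-paying rows.
const after = before.filter((r) => r.paying || r.amount !== 0);

const broken = brokenInvariants(measure(before), measure(after));
console.log(broken.length === 0 ? "invariants hold" : `botched: ${broken.join(", ")}`);
```

The check says nothing about whether the change was a good idea, only that the numbers the business cares about survived it.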

Algebra

If you take a bunch of things and make those things operate on each other, then you have an algebra. There are a lot of properties involved in what the operation implies, and the first year is basically dedicated to defining all those properties and understanding their significance. The ultimate results you typically end up looking at in a first or second year course are the impossibility results (e.g. doubling the cube).

My most immediate thought on how any of this has any practical bearing on programming is MapReduce. Algebra, in my mind, plays a huge role in how to think about designing algorithms in a MapReduce environment, namely in the reduce phase, where you think about merging. Given two or more documents, how do you reduce them to one? The algebraic properties of that merge (is it associative? is it commutative?) are things that one must consider (and you may get them for free).
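As a small illustration, here is a sketch of a merge that gets those properties for free. It assumes the documents are simple word counts, and `merge` and `canonical` are names invented for this example rather than part of any MapReduce framework.

```javascript
// Merge two word-count documents into one. Because it only adds counts,
// the operation is associative and commutative, and {} acts as an identity.
function merge(a, b) {
  const out = { ...a };
  for (const [word, count] of Object.entries(b)) {
    out[word] = (out[word] || 0) + count;
  }
  return out;
}

// Key order may differ between groupings, so compare a sorted form.
const canonical = (doc) => JSON.stringify(Object.entries(doc).sort());

const docs = [{ zerg: 2 }, { terran: 1, zerg: 1 }, { protoss: 3 }];

// Two different groupings a cluster might pick for the same reduce.
const leftToRight = merge(merge(docs[0], docs[1]), docs[2]);
const rightToLeft = merge(docs[0], merge(docs[1], docs[2]));

console.log(canonical(leftToRight) === canonical(rightToLeft));
```

If the merge were something like "keep whichever document arrived last", the two groupings could disagree, and the reduce would depend on how the work happened to be scheduled.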

Analysis

This is my favorite branch of mathematics because it is the puzzle of inserting zeros and bounding values. I recommend that anyone check out The Cauchy-Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities. It is an amazing little book that I go through every year to make sure I'm still smart enough to call myself a mathematician. The most obvious application of this art is numerical analysis. However, most of the time, I don't need to do any numerical analysis since I work primarily on search problems these days.

Unfortunately, most people get the short end of the stick when they study calculus and get a very boilerplate version of it. I recommend Differential and Integral Calculus.

I take analysis concepts outside of code and into management. For instance, how can I measure the code and enforce a code quality metric to prevent SQL injection hacks? How can I enable developers to converge to a right answer under QA? What does QA need to do?

Proofs

I must admit that I was a stickler when it came to proofs since writing a proof is just as much fun as programming is to me. When anyone writes a program, they are writing a constructive proof that something exists. This raises the question of whether or not that something is what you want.

Does your program need a proof? Two years ago, I would have said "absolutely". Now, I don't think so because proofs are kind of useless. The problem is that I have to understand enough about the formalism for the proof to make any sense. Well, the source code is a formalism of its own; in fact, a very precise formalism. The proof is already written.

Are proofs useless? I think going through the years of writing proofs has helped me write very good tests. I can look at the code, and know where the problem spots are going to be. Those trouble spots are going to need tests to ensure they work as expected. For instance, reliance on third party services always requires some kind of tests to ensure that updates are working as expected. Things I don't control are things I don't have a chance at proving, so I need tests that are automated and run daily.

Problem Solving

I think the study of mathematics is probably the fastest way to build problem solving skills since you constantly fail and each failure costs nothing, but the caveat is you may not be solving practical problems. However, building it up as a skill enables you to be more effective at being a programmer.

Do you need advanced math?

Not really. Most math is basically a form of mental masturbation and building the mental discipline/stamina to sit down and think very hard. I think it makes you better in some ways, but there is an opportunity cost. It all depends on what you want to do. If you want to ship products, then you are probably fine to avoid it. If you want to make awesome libraries and sell them to product people, then you probably need some advanced math.

Friday, November 26, 2010

Programmer Legs (And a potential patch/cure for restless leg syndrome)

When I went to college, there was this room in Nichols that was primarily dedicated to computer science courses. During many (if not all) of the classes, the room would shake a bit. It was kind of annoying, but it was due to the fact that almost every person was shaking their leg.

This is what we called "Programmer Legs". We attributed it to the fact that we didn't get out much, and were mostly pale shadowy figures who were deprived of Vitamin D.

Fast forward 6 years, and I have restless legs. For those that don't know, restless legs is very unpleasant. Imagine you are lying down, and your leg is compelled to move. If you don't move your leg, then you will start to feel impending doom. You will wonder many things like "omg, do I have a blood clot?" or "holy crap, I'm going to die". So, you get up and wander around and you get a drink from the fridge. No problem until you lie down again and the legs get angry.

Well, two days without sleep and I was off to the doctor. They gave me some Naproxen Sodium (pain killer) and Carisoprodol (muscle relaxant). This fixes the problem by putting you to sleep, but it doesn't really help. After you use the prescriptions up, you may have a week until your legs get restless again.

Unfortunately, my muffler broke. This unfortunate event forced me to drive my car to Goodyear to get it fixed, and I walked home.

That night, I slept without problem even though the prior night I had a recurrence of restless leg. The next day, I walked again to pick up my car. Again, that night, no problem. The next day, I basically worked all day and that final evening I had minor issues.

Eventually, I noticed a correlation and built a hypothesis: walking is good for you.

If I walk a mile, then I can go to sleep with minor annoyances.
If I walk two miles, then I can go to sleep with no problems.
If I walk five miles, then I get a whole week of no problems.
When I walked ten miles, I got two weeks of no problems.

Now, I try to walk every day. If I can't go to sleep, then I go for a quick walk (about 1.5 miles) and get to sleep just fine.

While I'm annoyed that my body has decided to jack me and force me to exercise, I'm finding the liberation of being able to walk for hours very... enjoyable. So much so that I'm planning to walk 50+ miles sometime next year over the course of three days.

Big Data Enables Agile Data

The funny thing about NoSQL is that it is being built by the Big Data and scalability communities, where there are legitimate and very difficult problems of scale, yet what it also enables is Agile Data.

Here, I define Agile Data as:

the ability to record all available data at the point of a transaction/form/user interaction (including a context)
the ability to organize data after the data is available

It should be clear from this description that an RDBMS is not Agile in this sense as it requires me to organize data before I collect it. Sure, there is a way to achieve the above with an RDBMS and you could develop a methodology or an engine to accomplish it, but that violates the spirit of an RDBMS since I would just be packing JSON objects into a row.

The perfect example of a Big/Agile data problem is that of analytics. I would like to record as much information as is available (the HTTP headers, the client data, maybe some page content, etc). Sure, I could build a structure/schema to try to solve the problem that I think will be valid, but then I'm potentially reducing the amount of information I'm gathering. Instead, if I take the mantra of "gather everything", I finish the collection faster and can start studying the data to look for interesting patterns.
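The two halves of that definition can be sketched as follows. This is only an illustration under stated assumptions: the log is an in-memory array, and `record`, `countBy`, and the sample fields are hypothetical names, not taken from any particular NoSQL store.

```javascript
// Append-only log: nothing is validated against a schema on the way in.
const log = [];

// Record step: keep everything we were handed, untouched, with its context.
function record(data, context) {
  log.push({ data, context });
}

record({ path: "/pricing", headers: { "user-agent": "Firefox" } }, { source: "web" });
record({ path: "/pricing", headers: { "user-agent": "Chrome" }, referrer: "hn" }, { source: "web" });
record({ path: "/signup", plan: "pro" }, { source: "api" });

// Organize step, written later: group by any field we now care about.
function countBy(entries, pick) {
  const counts = {};
  for (const entry of entries) {
    const key = pick(entry);
    if (key === undefined) continue; // field was not captured for this entry
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const byPath = countBy(log, (e) => e.data.path);
const byReferrer = countBy(log, (e) => e.data.referrer);
console.log(byPath, byReferrer);
```

The `referrer` question was not planned for when the first entry was recorded, yet it can still be asked afterwards, which is the point of organizing after collection.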

The really neat thing about having Big Data tools at my disposal is that they handle the problems Agile Data introduces, since its storage requirements grow a lot faster than with a typical solution.

Wednesday, November 24, 2010

The Secret of Innovation

There is a phrase that has always bugged me.

"Work Smarter, not Harder"

The reason it bugs me is that every time I say it, I feel like a douche bag. The reason I feel this way is that I'm telling people they are doing something dumb, and many people take doing things in a dumb way as an indicator of their intelligence. This is why I don't say this phrase anymore, as it isn't helpful. It sounds nice to be able to say because it implicitly says "look, I'm smart, and you could work less if you were as smart as me". That's a douche-baggy thing to say.

Now, having said that. If you can achieve the goal of working smarter with less effort, then you have innovated. This is the secret to innovation.

Before I leave for the day, I think the following thoughts.

How can I do the work I did today in half the time?
What did I learn?
How could I have done better?
How would I explain what I did?

That is, I focus on how I can improve my methodology, my education, my quality, or my communication.

Why you don't need JOINs (and the RDBMS to do them)

Before I sit down and design a back-end for a project, I write the ideal API specification that would enable developers to be happy and enable them to provide all the polish and sizzle to sell the product. Then I turn the spec into a RESTful service where I don't worry about complexity nor scale. I let the developers work with it and I collect data on how it is used, so I know where the crap is; there is always crap to deal with.

This process has worked out very well for me and my clients, and it is working out very well for the development community in general. We are in the SaaS phase where we produce and consume each other's services. This is nice.

Sometimes we consume two such services and then make something new. This is called a "mashup". Well, guess what? A "mashup" is equivalent to an application level JOIN. This used to be a service provided by the relational database, but now people are doing it by hand.
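To make the comparison concrete, here is a sketch of a mashup written as a hash join. The two arrays stand in for responses from two independent services, and the field names are invented for the example.

```javascript
// Hypothetical responses from two independent services.
const users = [
  { id: 1, name: "ada" },
  { id: 2, name: "grace" },
];
const orders = [
  { userId: 1, total: 30 },
  { userId: 1, total: 12 },
  { userId: 3, total: 99 }, // no matching user, dropped by the inner join
];

// Hash join: index one side by key, then stream the other side past it.
// Assumes leftKey is unique on the left side.
function join(left, right, leftKey, rightKey) {
  const index = new Map();
  for (const row of left) {
    index.set(row[leftKey], row);
  }
  const out = [];
  for (const row of right) {
    const match = index.get(row[rightKey]);
    if (match !== undefined) out.push({ ...match, ...row });
  }
  return out;
}

const joined = join(users, orders, "id", "userId");
console.log(joined);
```

Building the index once keeps the work roughly linear in the size of both responses, which is the same trick a database would use for an equality join.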

Better yet, people are used to it and they are kinda fine with it. This is a good thing as it enables service developers to focus on their services and lets product developers focus on their product. Developers are learning to:

Optimize, cache, pre-compute back-end requests
Write loops to efficiently cross reference code
Avoid the angry looking DBA

There are systems, like memcached, that exist in such a way that they enable product developers to solve product problems without the need to make the DBA change anything. Once developers are empowered, they can use their own creativity and intelligence to polish the product.
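A minimal sketch of the caching pattern this usually means (often called cache-aside) is below. A plain `Map` stands in for memcached, a real client would be asynchronous, and `loadProfile` and `cachedProfile` are hypothetical names.

```javascript
// A Map stands in for memcached here; a real client would be asynchronous.
const profileCache = new Map();
let backendCalls = 0;

// Hypothetical slow back-end request.
function loadProfile(userId) {
  backendCalls += 1;
  return { userId, name: `user-${userId}` };
}

// Cache-aside: look in the cache first, fall back to the back-end on a miss,
// then remember the answer for a limited time.
function cachedProfile(userId, now = Date.now(), ttlMs = 60000) {
  const key = `profile:${userId}`;
  const hit = profileCache.get(key);
  if (hit !== undefined && hit.expires > now) return hit.value;
  const value = loadProfile(userId);
  profileCache.set(key, { value, expires: now + ttlMs });
  return value;
}

cachedProfile(7, 0);     // miss, goes to the back-end
cachedProfile(7, 1000);  // served from the cache
cachedProfile(7, 61000); // entry expired, goes to the back-end again
console.log(backendCalls);
```

Nothing in the database had to change for this to work, which is why the product developer can do it without asking anyone.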

We may be looking at the last decade in which DBAs are ever bottlenecks. Does this mean that DBAs are obsolete? No, it is a lateral move for them as they become service programmers with the role of producing optimized services. Can they use an RDBMS for it? Sure, they can eat whatever dog food they want.

Having been on both sides of the fence as a DBA and a product developer, I am comfortable saying that the NoSQL movement is definitely going to take hold in ways that people don't expect. I think it is going to restructure the entire way corporations view IT.

Sunday, November 21, 2010

Avoid DRY for Product Development

This is part of my comment on HN.

At the right level of abstraction, DRY is the best advice possible. If you are building a data layer or a system, then you will be best off if you keep things DRY. However, once you get into product land, I advise people to just get the job done quickly rather than worry about engineering principles.

The reason is very clear: in product land, engineering principles are third to usability and marketing. Avoiding DRY enables two things.

Polish
If you have five sections of code that are the same now, then there is a good chance that they will diverge as the product matures. This is polish, and it is a good thing. While it is true that you have more work to do, it isn't rocket science. Trying to use DRY for polish is going to create even more cumbersome code with a lot of branches for all the special cases, and it tries to create an artificial problem of intelligence. Please avoid.

Hiring Scrubs
If you realize that the polish needed to make a product isn't exactly rocket science, then you can hire many scrubs. I define a scrub as someone new to computing, but capable enough to work within a developer environment to find and polish simple code. I like to hire scrubs as it provides a great first job in programming for many people. I was a scrub once, and it wasn't that bad as a 16-year-old.

Once you enable these two things, you have enabled marketing and the usability folks to iterate.

Update: Don't just hire scrubs
It takes a balancing act to get products out using a combination of awesome engineers, "good enough" engineers, and scrubs. A company ultimately needs all tiers to be able to push and iterate products quickly, and it is the company's job to ensure that the people process can nurture engineers from scrub to awesomeness.

Wednesday, November 17, 2010

Escaping Mr. 20% and losing the ego

This is a fairly long response to a comment on HN.

I've written many lines of code and thought many ideas in my life, but I have not shipped nearly as many. Why? I love coding for coding's sake more than shipping. Shipping requires product development, non-longbeard design, marketing, sales, and most important: customers.

Except for the game engine, everything I did was the special 20%. All the 20% projects were just a neat little idea implemented as if it were the core tech behind a master's thesis or a senior thesis. However, they didn't have the polish to become a product, and I get easily distracted by more difficult and more sexy ideas. At best, most of them make me look like Captain Hindsight.

I'm perfect for academics, but I found myself being too impractical for my students. This was one of the reasons I dropped out of graduate school, as I had become an irrelevant preacher. I did not practice what they were going to do, so how could I teach them? It felt dirty teaching them fairly useless things. I was working with tools (such as OCaml and LISP) that enabled me to program better, but programming better doesn't mean shipping better. Most programmers just need to learn how to solve basic problems and have the discipline needed to ship products; telling them that they can avoid memory leaks by using OCaml is like telling them to learn French to avoid pissing off Mexican drug lords. Instead, they just need to get in the discipline of being careful with alloc and free. Any aspect of a technical decision could be looked at as either a technological or a management+discipline issue.

So, looking back on the pile of code I've written: it is all shit because it didn't ship. Shitty code that ships is better than perfect code that doesn't. This is one of the lessons in life that make people like me depressed, but I've gotten over that. I've helped ship three products and am working on two more. One of them is my November start-up sprint (which I'm doing OK on, but I could be doing better).

Problematically, building many 20% things builds an ego. Ego is fascinating. On one hand, you need ego (or to utilize a customer's ego) to sell a product. On the other hand, you need to lose your ego to build, ship, and support a product. This is why you need to have someone else do your sales for you. When you get someone to sell for you, then you don't need ego anymore. You just have a bunch of work to do.

Monday, November 15, 2010

How to use CouchDB? like this

CouchDB is a very interesting persistence package, and it solves 90% of the problems you find when you build a back-end for a web application. The 90% that CouchDB covers includes get/put/b-tree indexing/reliability; all this is good standard stuff in the database world. I want to talk about the other 10% since CRUD is boring.

The last 10% is usually something like search; it is the novel algorithm that takes all your data and provides it in a meaningful way that makes your product awesome. CouchDB rarely solves this (neither do other packages). The more special the algorithm is, the more painful it will be to try to solve with CouchDB's MapReduce framework alone.

Fortunately, CouchDB has replication built in. I use the replication to push data from CouchDB to a custom server where I aggregate it into a meaningful service. The library is called Otto (short for ottoman).

The biggest problem you are going to have is this: what happens when your custom server crashes?

This can be solved by

providing your own persistence and dealing with reliability yourself
not worrying about it and launching 3 servers with a custom HTTP server that replicates three ways; spend more money
not caring and re-replicating the entire data set, with potentially non-trivial down-time

All three of these options suck at some level, but 2 is where you will want to go. In the beginning though, 3 is the best choice.

From a complexity standpoint, you can make your life easier by enabling your custom software to merge in bulk sets. This enables you to lazily run your algorithm as you collect a lot of data at once. I have found that this style of bulk insertions makes the third option feasible in many domains.
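One way to picture merging in bulk sets is a small batching layer in front of the aggregation. This is a sketch under assumptions, not how Otto works: `createBulkMerger`, the batch size, and the per-type count are all made up for illustration.

```javascript
// Collects replicated documents and folds them into the aggregate in batches.
function createBulkMerger(batchSize, mergeBatch) {
  let pending = [];
  return {
    add(doc) {
      pending.push(doc);
      if (pending.length >= batchSize) this.flush();
    },
    flush() {
      if (pending.length === 0) return;
      const batch = pending;
      pending = [];
      mergeBatch(batch);
    },
  };
}

// Hypothetical aggregate: a count of documents per type.
const totals = {};
let mergeRuns = 0;
const merger = createBulkMerger(3, (batch) => {
  mergeRuns += 1;
  for (const doc of batch) totals[doc.type] = (totals[doc.type] || 0) + 1;
});

for (const type of ["user", "order", "order", "user", "order"]) {
  merger.add({ type });
}
merger.flush(); // pick up the partial batch at the end
console.log(totals, mergeRuns);
```

With a large batch size, re-replicating the whole data set after a crash turns into a handful of big merges instead of one algorithm run per document.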

The nice thing about using CouchDB is that you don't need a schema. You don't need to "plan". If you need to store data, then you just store it. Just give it a namespace and insert.

Please comment if you think I should write a book on CouchDB. I've done relational databases for years, and I've lurked in the CouchDB community for a while. I'm currently building a web framework around node.js and CouchDB called WIN.

Cloud Coding

I just want to say that Twilio is an awesome company. About a year ago, I was seriously considering dealing with Asterisk. Asterisk is perfectly good software, but it didn't make sense for the problem I had. I took a risk on Twilio and quickly developed the VOIP feature for a product.

Developing and supporting a product with Twilio over the past year has made me think about the need for developing in the cloud. I'm at the base level of cloud development: terminal. I can log into a VM and get things done. I don't need scp, sshfs, or anything else to support my development capability. My brain, fingers, and ssh are all that I need.

Why do I need a VM to develop with Twilio? Well, the monitor and the way to debug a Twilio application is the phone. The phone isn't easy to work with, so I need a public IP. I also need a way to watch the HTTP traffic, so I need the ability to grep logs.

Basically, fundamental Unix skills let you work with new technologies faster.

Saturday, November 13, 2010

A programming language is not just a tool

A programming language is a tool to let you transform representations and solve problems.

A programming language is a contract that enables groups of people to solve problems together.

A programming language is a contract that enables frameworks to be built.

A programming language is a way to express thoughts.

Wednesday, November 10, 2010

Now, having said that

Having given praise to my polyglot nature, I must condemn it.

One of the reasons I study programming languages is so that I can write my own. When I find a neat feature in a programming language, I think about how I would solve the computational problem to translate it into C.

Having aged, I've decided against writing yet another language. The world needs fewer languages, not more. We ultimately only need 3.

We need one language to experiment and think about complexity.

We need one language to ship products.

We need one language to train beginners.

Will we ever agree which language fits these roles best? No. Now, I do my best not to amplify the problem.

Programming language research will be over when all three needs can be met by one language. In my opinion, the closest languages to doing that now are either C# or JavaScript.

I'm betting on JavaScript.

The 3 Programming Languages you need to Know

Every good programmer needs to know at least 3 languages. Of course, I'm probably wrong.

I can quickly understand a programmer by knowing their favorite programming languages, using the biases and stereotypes that I've built up over the years. When I read a resume, I try to classify why the programmer used each programming language against the archetypes below, and I use those stereotypes and biases to find what I want from a stack of resumes.

Happiness Language
This language is what you think in. This is the language that you wish you could use all the time. This is the language that you write your projects in. For me, this is OCaml (and now JavaScript although I'm integrating CoffeeScript into my universe). For many, this is LISP or Haskell. When I find out someone's happiness language, it tells me a lot about them.

If the language is esoteric or new, then they are passionate about computing.

If the language is mainstream, then they may be more sensible or practical about computing.

Hack-it-out / GTD Language
This is the language that contains everything including the kitchen sink. It is very mature and has a massive library base. With this language, you enable yourself to build quick services and command line utilities to help you out in a pinch. Anything that has already been done is at your fingertips.

If the programmer lists many languages, then they may be able to utilize all of them by building RESTful services.

If I don't detect a hack-it-out language or too few languages, then I suspect they are either inexperienced or too specialized.

Bread and Butter
This is the language that you can use to keep yourself alive when life hands you lemons. This is the language that you know just in case you need to hustle yourself to provide for yourself and your family.

If they don't have a bread and butter language, then they probably need some education on how to work in a team effectively.

Friday, November 5, 2010

Betting the Farm on JavaScript

So, I can't sleep and what am I thinking about? Type Systems...

Two years ago, I would have cursed the dynamic typing landscape as fools and miscreants because I'm a performance junkie. I like my code to run insanely fast. This is my typical salvo in any type system argument.

Now, I realize that I was just being closed minded and rather stupid (and even more so ignorant). Consider this JavaScript code.

function (a,b) { return a + b; }

What is its type? Well, if I may bastardize OCaml type notation for a second, then I would say something like:

(a',b') -> c'

What the fuck is c' ? The reason I don't know what c' is, is that JavaScript's + operator is overloaded in a poorly defined way that is plain stupid from a type inference perspective. This question combined with my lack of creativity ultimately led me to the conclusion that JavaScript is doomed to lose the performance argument.
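To see why the result type cannot be pinned down from the source alone, here is the same function (given the name `add` for the example) called with a few different argument types.

```javascript
function add(a, b) { return a + b; }

// The same source text produces different result types depending on the inputs.
const samples = [
  [1, 2],          // number + number: numeric addition
  ["1", 2],        // string + number: the number is converted and concatenated
  [1, "2"],        // number + string: same story in the other direction
  [[1], [2]],      // arrays are converted to strings first, then concatenated
  [true, 1],       // the boolean is converted to a number
  [undefined, 1],  // undefined converts to NaN, which is still a number
];

for (const [a, b] of samples) {
  const result = add(a, b);
  console.log(typeof a, "+", typeof b, "->", typeof result, String(result));
}
```

So c' is "number or string, depending on what showed up at run time", which is exactly the information a static type inferencer does not have.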

Well, JavaScript is doomed until you run the program for significant lengths of time and realize that we have multiple cores that are going to be wasted on consumer machines. Let me explain this black magic that is bouncing around in my head.

Type Inference in JavaScript

Given a function like the above, I can't type it in a meaningful way unless I know the type of the inputs. However, when the function is evaluated I have the types and the return type can be established. If I utilize just in time compiling, then I can use type inference to produce types and compile a native version of the code given the correct type signature.

Type inference is fairly expensive (at least, the ways that I've solved it) but is fortunately 100% CPU bound. It seems perfect to fork off into a queue for some poor core to solve.
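The bookkeeping side of that idea can be sketched in plain JavaScript. This is not how any real engine is implemented, and it does not emit native code; `specialize` and `compileAdd` are hypothetical names, and the "compiler" just returns a closure per observed type signature.

```javascript
// Hypothetical "compiler": picks a specialized implementation for a signature.
// A real JIT would emit native code here instead of returning a closure.
function compileAdd(signature) {
  if (signature === "number,number") return (a, b) => a + b;       // numeric add
  if (signature === "string,string") return (a, b) => a.concat(b); // concatenation
  return (a, b) => a + b; // generic fallback with the full + semantics
}

// Wraps a compiler so each type signature is compiled once, on first use.
function specialize(compile) {
  const compiled = new Map(); // signature -> specialized function
  const dispatch = (...args) => {
    const signature = args.map((arg) => typeof arg).join(",");
    let fn = compiled.get(signature);
    if (fn === undefined) {
      fn = compile(signature); // the expensive step that could go to another core
      compiled.set(signature, fn);
    }
    return fn(...args);
  };
  dispatch.signatures = () => [...compiled.keys()];
  return dispatch;
}

const fastAdd = specialize(compileAdd);
fastAdd(1, 2);
fastAdd(3, 4);     // reuses the number,number version
fastAdd("a", "b"); // triggers a second specialization
console.log(fastAdd.signatures());
```

In this sketch the compile step runs inline; the suggestion in the text is that it could instead be queued for an idle core while the generic version keeps running.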

My suspicion is that this is what JaegerMonkey is doing, but I'm rather ignorant of its code. At least if they are not doing it, then I can add it in as a patch if I suddenly find myself with a wealth of time on my hands.

Future of JavaScript

I wager that programmers will be able to get C-level optimizations out of JavaScript by learning how to game the next generation of JavaScript environments. I look forward to optimizing my JavaScript libraries to have tight inner loops.

I believe in this future because there is commercial interest in enabling it.

This is why I'm very optimistic about node.js and the future of server side JavaScript. Sure my node.ocaml is three times faster than node.js now, but it is just a matter of time until node.js catches up.

It would also be a good time to take v8 or a build of JaegerMonkey and start building a game engine with it and provide a canvas/WebGL compliant C back-end.

Also, if I was back in academics, then this would be the perfect topic for a PhD dissertation.

Tuesday, November 2, 2010

API as your Queen (or how I fell in love with NoSQL)

I like to think and play around with software architectures. About 1.5 years ago, I was inspired by Twitter. Or, more precisely, I was inspired by the Twitter developer community and the developing mashup scene. I have also watched the growing number of NoSQL solutions. The two are connected.

Developers are performing joins in the application space (or they are not doing joins at all).

Joins are the most important aspect of the relational database model, and if we don’t need them at the data tier any more, then do we need a relational database?

The answer is "it depends", but you don't even want to start there. You want to start by building an API that enables the product to work. This is where I start, as I can get a good feel for the ebb and flow of data.

With node.js, node.ocaml, or KayakHTTP, it is very easy to create a REST-ish server for enabling developers to build products against a back-end. You can also see how much of a back-end you really are going to need and which products you will want to utilize in the future.

Always start by building your product as an API.

When you do this, your API is now a NoSQL back-end. You would be surprised what people can do when they don't have to deal with SQL.
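As a rough sketch of what starting with the API can look like, here is a tiny REST-ish handler over an in-memory store. The `/notes` resource, the payloads, and the `handle` function are all hypothetical; wiring it to node.js's `http.createServer` is left out so the example stays self-contained.

```javascript
// In-memory store standing in for whatever back-end ends up behind the API.
const store = new Map();
let nextId = 1;

// Routes a request to the store. The paths and payloads are hypothetical.
function handle(method, path, body) {
  const match = /^\/notes(?:\/(\d+))?$/.exec(path);
  if (!match) return { status: 404, body: { error: "not found" } };
  const id = match[1] === undefined ? undefined : Number(match[1]);

  if (method === "POST" && id === undefined) {
    const note = { ...body, id: nextId++ }; // the server owns the id
    store.set(note.id, note);
    return { status: 201, body: note };
  }
  if (method === "GET" && id === undefined) {
    return { status: 200, body: [...store.values()] };
  }
  if (method === "GET" && store.has(id)) {
    return { status: 200, body: store.get(id) };
  }
  if (method === "GET") return { status: 404, body: { error: "not found" } };
  return { status: 405, body: { error: "method not allowed" } };
}

const created = handle("POST", "/notes", { text: "ship it" });
const fetched = handle("GET", `/notes/${created.body.id}`);
console.log(created.status, fetched.body);
```

Product developers only ever see the paths and the JSON; whether a `Map`, CouchDB, or a relational database sits behind `handle` is a decision that can wait until the usage data comes in.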

Green Computing; do more with less

It seems that anyone can do "real-time" internet these days due to awesome technologies like node.js or node.ocaml (shameless self promotion). Realistically though, no one cares if your software is optimized in C using pointer arithmetic to squeeze an extra hundred requests per second out of a box. Very few care that one has taken the time to read the Black Book. Why? Computers are really fast and many people have figured out how to make software scale out horizontally. Computing resources are way cheaper than developer mind-share, so where is the motivation for business?

Well, that's a good question.

This can be addressed with a "green computing" marketing campaign. Unfortunately, it would be ultimately doomed out of the gate.

Thursday, October 28, 2010

If I were to write another IDE (or, being too productive and unwise)

So, two years ago, I wrote an IDE for a collection of tools that some friends and I used to build an awesome Ajax site. It was madness!

This is what it had:

Text editing (standard stuff like syntax coloring, color based find and replace)
GUI editor (box manipulation and property manipulation, pixel perfect editing)
Locking Source Code via Live Server to edit against
Cross Machine Synchronization (that is, drag a file from your local computer and it gets uploaded to the remote computer and changes are synchronized as the local file is changed = very cool for designers)
A big compiler that did a lot of grunt work (and a lot of automated pre-optimizations, so it automated evil)
100K lines of code.

And all the IDE was used for was to manage the input to "the compiler". The compiler took all the CSS and validated it; turned all the parametric CSS (think Less, except Turing complete and with an image library to splice images and make buttons really easy to stamp out) into images and CSS; optimized all the image file names; optimized all the images (using optipng and some custom algorithms that removed superfluous colors that don't add much value); took the state machines and combined them with the JavaScript code; compiled the GUI into JavaScript; brought in the JavaScript kernel; and built an RPC library for JavaScript to communicate with the PHP back-end. Per "application", it would spit out one JavaScript file, one (or two) CSS files, and all the images (some in sprite sheets).

It was a revolutionary Ajax platform, but it also sucked for reasons I'm not going to discuss.

If I were to do it all again, here is what I would do differently (and am kind of doing now).

Write the GUI editor as a stand alone editor that could be launched from command line.
Not use XML at all for the GUI file format; XML was a terrible choice. Now, I would use JavaScript to construct the object and transfer it to the host program as JSON, which it could then trivially serialize out as JavaScript code that merges nicely. Plus, it would be hackable by a text editor.
Open source the custom template engine / (Or use Mustache)
Write the state machine compiler (the thing that made Ajax real easy for us) as a library rather than a DSL; open source it. (The closest thing available now is zef's mobl which I would use now rather than invent my own).
Not worry about image file names, rather just map out all images via CSS and then use a CDN.
Skip GUI editing entirely, it isn't needed and pixel perfect is over-rated.
In locking SCM versus distributed SCM, the key to distributed is designing your code and architecture in a way where merging is obviously the correct choice. If you design your code and how your code works together with other developers in a way where merging will work, then merging will work just fine.
Not build a custom file system/custom source control. I kid you not. I have a tool called fire that lets me very quickly build a document object model that serializes to giant XML files. I then used some old networking code to turn the DOM into a server where developers would checkout bits and pieces of the DOM to edit it. Every change coming from a developer would then backup the entire XML file. Once the XML file grew to 1MB, I was going through gigabytes of data per hour.
Open source the image tools (they are coming)
Open source the JavaScript tools (or use Google's Closure)
Open source the kernel (or use jQuery more effectively)
Open source parametric CSS (Or, add image manipulation to Less)
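The JSON idea from the list above could look something like this. It is only a sketch: the `box` helper, the property names, and the `gui_login` variable are invented for the example and are not the format the original IDE used.

```javascript
// Hypothetical GUI description built with plain JavaScript instead of XML.
function box(id, props, children = []) {
  return { id, ...props, children };
}

const gui = box("login", { x: 0, y: 0, width: 320, height: 200 }, [
  box("user", { x: 16, y: 16, width: 288, height: 24 }),
  box("submit", { x: 16, y: 56, width: 96, height: 24 }),
]);

// What the host program receives, and what a text editor can hack on.
const json = JSON.stringify(gui, null, 2);

// Serializing back out as JavaScript source is then a one-liner.
const source = `var gui_login = ${json};`;
console.log(source);
```

Because the pretty-printed JSON puts one property per line, two developers editing different boxes produce changes that a line-based merge tool has a fair chance of combining.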

What I had built was a web operating system, and I plan to dump the code out there some day. If you want to dig through the code some time, then send me a shout out and I'll expedite the dump (even though it sucks).

I would love to revisit the problems we were facing, but the fundamental problem with building a web operating system is that the user/developer education debt is currently infeasible to pay down.

Moreover, I don't have enough energy to maintain that kind of code base without at least five long beards. As I age, I'm looking more to polishing things up rather than hacking things out. Building an IDE was a great way to learn and make mistakes, but it was also misguided as it prevented me from adding value to the project and the business. Still, I enabled my team to do marvelous things that are still not being done in the marketplace of products.

Learn about the project so awesome that I created an IDE for it: on KillerStartups, on Mashable, and in our YouTube video (holy crap, we built this!).

Watching that, I realized how awesome the tech and idea was. Maybe we should revive it? I don't know.

Wednesday, October 27, 2010

On IDEs (A refinement of my 30 lessons)

In my last post on 30 lessons, the most contentious lesson was on IDEs. I would like to clarify a few things.

IDEs, in general, are a good thing. For instance, Visual Studio 2010 is by far the best IDE on the planet IF you use C#. Eclipse is arguably the best Java IDE out there (and I'm not going to hold myself to this, so feel free to disagree since I don't use Java anymore). For languages like C# and Java, you really do need an IDE, namely because they go beyond just editing and introduce language level extensions like refactoring and code completion.

Now, that I've given praise to IDE land; I'll tear it down.

root

I deploy and primarily work with Linux stuff. For me, if you want sudo/root access at my company, then you need to be able to use either vim or nano in a highly proficient manner (you also need to be able to use scp). I believe that if you can't do this, then you are not a computing professional worthy of the power of root. Why? There is no IDE for /etc, where all configuration lives. Sometimes, something is configured wrong for a new use-case and we have to do something non-trivial in real-time to a live production server. After all, real men use root.

New Stuff

What IDE existed for Ruby when it came out? or JavaScript? or Haskell? If you find yourself limited by the IDE, then you are going to miss out on new technologies and the ideas they offer. At the end of the day, comfort with "primitive tools" enables you to work with new innovative technologies.

Productivity?

I'll concede every productivity claim made by pro-IDE developers. Here is the thing: what are you being productive at?

Nomadic Lifestyle

A long time ago, my computer crashed and I lost about 2 months of work. I had to reinstall Windows and my IDE (Visual Studio 6). I had made a lot of modifications to it to suit me. They were lost, and I was lost. I didn't like the feeling that I was dependent on customizing the world to suit me, so I decided then that I would never customize software again. If I needed to tweak anything, then I was doing something wrong. Having lived this way for a while, I can say that life is pretty good this way. I spend more time on my work than messing around.

Although, recently, I must admit that I've changed the background in my Ubuntu install; that's about it though.

Cluster Editing

I have a cluster of 32 computers running, and I use pico to code on it. I simply use clusterssh to edit them all at once. What IDE can do this?

fin

Ok, I'm done bashing IDEs.

Saturday, October 23, 2010

30 lessons learned in computing over the last ten years

In looking at the last ten years of my life, I realized that I've learned many things. Mostly about how wrong I've been, and how stupid I've been. So, having looked at the 80+ projects I've worked on in the past ten years (excluding coursework, current start-ups, and graduate studies), I have reduced what I learned to a blog post. (In bullet format, no less.)

If you plan to write a programming language, then commit to every aspect. It is one thing to translate between languages; it is an entirely different effort to provide good error/warning messages, good developer tools, and to document an entirely different way of thinking. In writing Kira, I invented a whole new way to think about how to code, and while much of it was neat to me; some of it was very wrong and kinda stupid.
Geometric computing is annoying, always use doubles. Never be clever with floats; floats will always let you down. Actually, never use floats.
Lisp is the ultimate way to think, but don’t expect everyone to agree with you. Actually, most people will look at you as if you are crazy. The few that listen will revere you as a god that has opened their eyes to computing.
If you plan on writing yet another Object Relational Mapper, then only handle row writing/transactions. Anything else will be wrong in the long term.
If you want to provide students with a computer algebra system, then make sure they can input math equations into a computer first.
Don’t build an IDE. Learn to use the terminal and some text editor. If you need an IDE, then you are doing something wrong. When you master the terminal, the window environment will be cluttered with terminals and very few “applications”.
Learn UNIX, they had 99% of computing right. Your better way is most likely wrong at some level.
Avoid XML, use JSON. Text formats are a boon to expressiveness, and computing has gotten cheap enough to afford them. Only use binary based serialization for games.
If you plan to build an ORM to manage and upgrade your database, then never ever delete columns; please rename them.
Never delete anything, mv it to /tmp
Never wait for money to do anything; there is always a place to start.
Optimize complexity after people use a feature and complain. Once they complain, you have a real complexity problem. I’ve had O(n^3) algorithms in products for years, and it didn’t matter because the features they powered were not used.
Text games can be fun too; if you want to write an MMO, then make a MUD. You can get users, and then you can use that to get traction to build something bigger. Develop rules and a culture.
Don’t worry about concurrency in your database until you have real liability issues.
Back up every day at the minimum, and test restores every week. If your restore takes more than 5 minutes of your time (as in time using the keyboard), then you did something wrong. If you can’t back up, then you have real issues and enough money to solve them with massive amounts of replication.
Never write an IDE; it will always be a mistake. However, if you do make it, then realize most people don’t know that silver bullets don’t exist. You can easily sell it if you find the right sucker; this will of course become a part of your shame that you must own up when you die.
JavaScript is now the required programming language for the web; get used to it. JavaScript is also going to get crazy fast once people figure out how to do need based type inference. Once JavaScript is uber, learn to appreciate the way it works rather than map your way of thinking to it.
Master state machines, and you will master custom controls. Learn enough about finite state machines to be able to draw pictures and reason about how events coming into the machine affect the state.
There is more value in learning to work in and around piles of crappy code than learning to make beautiful code; all code turns into shit given enough time and hands.
If you want to build a spreadsheet program, then figure out how to extend Excel because Excel is god of the spreadsheet market.
Write five games before writing a game engine.
Debugging statistical applications is surprisingly difficult, but you can debug it by using R and checking the results with statistics.
Don’t design the uber algorithm to power a product; instead figure out how to make a simple algorithm and then hire ten people to make the product uber.
Learn to love Source Control. Backups are not enough. As you age, you will appreciate it more.
Communicate to people more often, don’t stay in the cave expecting people will know your genius. At some point in your life, you will need to start selling your genius.
Realize that every product that exists solves some kind of problem. Rather than dismissing the product, find out more about the problem the product is trying to solve. Life is easier when you can look at new technology and find out what it does solve.
Learn to be sold. Keep the business card of a good salesman. Sometimes, they actually have good products, but they are always useful.
You can make developers do anything you want. Normal users on the other hand are not so masochistic.
If you are debating between Build or Buy, then you should Build. You are debating which means you don’t know enough about it to make a sound decision. When you build, at least you will get something working before you find what to Buy and how to design with it.
You will pay dearly for being prickly; learn to be goo and flexible to the changing world. Be water, my friend.
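
To make the state machine lesson above concrete, here is a minimal sketch of a custom button control driven by a transition table. Everything in it is made up for illustration (the state names, the event names like mouse_enter, the Button class); it is not from any toolkit or from my old code, just the picture I would draw written down as a dictionary.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    HOVER = auto()
    PRESSED = auto()


# Transition table: (current state, event) -> next state.
# The event names are hypothetical, chosen only for this sketch.
TRANSITIONS = {
    (State.IDLE, "mouse_enter"): State.HOVER,
    (State.HOVER, "mouse_leave"): State.IDLE,
    (State.HOVER, "mouse_down"): State.PRESSED,
    (State.PRESSED, "mouse_up"): State.HOVER,
    (State.PRESSED, "mouse_leave"): State.IDLE,
}


class Button:
    """A hypothetical button control whose behavior is the table above."""

    def __init__(self):
        self.state = State.IDLE
        self.clicks = 0

    def handle(self, event):
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            # The event means nothing in the current state, so ignore it.
            return self.state
        # A click only counts when the mouse is released while pressed.
        if self.state is State.PRESSED and event == "mouse_up":
            self.clicks += 1
        self.state = next_state
        return self.state
```

Feeding it mouse_enter, mouse_down, mouse_up registers one click and leaves the button in HOVER. Dragging off the button before releasing (mouse_leave, then mouse_up) registers nothing, and you can see why just by reading the table.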

If you got to this point, then good job. The biggest thing I have learned (and probably the most painful) in the past ten years is how to deal with my ego. Ego is supposedly your best friend, but it is also your worst enemy. Ego is a powerful force, but it isn't the right force to use. While I admit that I've used ego to push myself in a very positive direction, I think I would have been better off if I hadn't, as the side effects trump the pros.

Wednesday, October 20, 2010

last ten years (if I uploaded it to github)

I've gone looking through about 10 years of backups, and I thought about putting it all up on github. However, most of it would just suck. So, instead, I'll put up gravestones for each of them and not look back anymore. Then, I'll delete the files and be done with it. Keep in mind, most of these are side projects. Only two are failed start-ups (Hurox and FileSharingAccelerator). This list excludes current strategic technologies I've developed at my current business.

kitchen	a game engine
grill	statically typed clone of JavaScript
juknow	a JavaScript to PHP converter
kapowie	an OpenOffice Spreadsheets to PHP converter
blaze	An animation language for doing periodic animations (very similar to these devices that do music looping)
iknife	a vector editor in C#
diffistory	an algorithm that would reconstruct a new version of a file using aspects from n-versions of a file
cosmic pipeline	a re-invention of unix pipes on Windows (decided just to switch to unix for that problem)
simpledepend	a php dependency analysis toolkit; using a standard procedural style, you could produce libraries and end-points that would have no includes and consume minimal memory.
scripture	a documentation language
istate	a network synchronization language; describe data structures and then use them, and it would sync them over the wire

butcherblock	3d object modeler and game editor
cauldron	3d rendering engine (pre-kitchen)
fork	2d gui editor/texture atlas packer
fryingpan	3d physics wrapper library (physx, bullet, ode)
zmlc/fire	heap modeler for structure serialization
inferno	better version of fire
bnj	Bayesian network tools in java 3.x
gem	graph editor and modeler
rpgengine	a player system with buffs
arcane works	a functional programming language version of excel
melchior	a new IDE for web editing with symbolic editing
alienautopsy	world of warcraft bot
convexbodyplayground	given any closed model, this broke it down into convex bodies
facebookwalk	crawl facebook and convert image emails into text/stalk students
researchfu	cms with mathml support
phpmathml	a mathml to png renderer
tutorcas	computer algebra system
mathgraphics	a point set visualization language
sliceem	builds a 3d model by using contour levels
gg+	gui graph + (open scene graph gui system)
mathmyway	a php-like language that enabled tutorcas to come alive on the web
monte carlo localization	construct a map and locate yourself (robotics)
jovian katana 1/2	ajax ide, gui editor (used to build hurox)
hurox	social marketplace
frame compiler	given a single image, split it up into 9 images for building frames (generates php code)
image2php	convert an image to php (useful for combining with given colors to change hue on the fly)
javascriptmaster	a javascript parser that enabled me to search code to find inefficiencies and problems.
evolution4k	a 3d space game
progressive mesh demo	an implementation of Hoppe's progressive mesh stuff
spherex	a physics engine where gravity was opposite (everything repelled)
zengine	marching cubes optimization where space is a grid represented by linked lists of open and closed space
zterrain	yet another terrain engine
jove3	next generation ajax platform
spherebands	yet another collision detection engine
primgen	marching cube implementation for building models from math equations
lotr-risk	scanned all the game assets and made it work in a network environment
rt-canvas	a real-time (comet based) canvas toolkit for charting streaming data
tilemake	generates 16 images from 3 images using reflections and rotations
nova	javascript based router/topology for web page linking
gravity	a shared file system for keeping replicas in sync (with distributed locking)
mercurial	a simple deployment system (was completely ignorant of the scm)
kira	programming language targeting php with a built in orm
bench	a simple c# benchmarking library/language for testing how webservers react with different types of requests (large posts, small get, small posts) in a haphazard way.
j.encoder	a simple javascript template language (rolled up into jove)
simpledepend	a php comment dependency factor tool; wrap functions in comments and then wrap destinations in comments; the destinations get split into multiple files and the dependents are traced out and included when needed.
particalgen	an image generator for particle system images (using point rendering)
scripture	an immature documentation system
diffhistory	an interface to diff that enabled a fs written in dokan to track a head and a revision log of how to go back in time
peoplemachine	a small scale crm
hi, my name is jeff	a book (did my own version of NaNoWriMo in march)
massshard	a simple orm built around sharding
bspcompiler	compile a mesh into a bsp tree
chef	an ide for my game engine (turned into jove)
factorg	small library for factoring gaussian integers
font2texture	convert a font to a texture map
quickneighborhood	quicksort using planes (designed for broad-phase of a large collection of particles moving quickly)
pointcloudalgorithms	3d algorithms on point clouds
spoon	agent modeling
poonix	turned microsoft virtual pc into a managed cloud
devsync	a suckier version of rsync in windows
glasseson	add glasses using opencv to a picture (client work that didn't pan out)
gold	transaction manager for s3
horde	simplequeueservice based video transcoding (for hurox)
magistrate	interface to kira's backend
filesharingaccelerator	a centralized file sharing service
wealth	my text based version of acquire (board game)
sudoku	yet another sudoku solver
csdc	first orm
mapgen4galciv	made a map generator for galactic civ
mathmod	a web language for math (second version of mathmyway)
auth	oauth before oauth
rage	rapid game engine (precursor to cauldron)
businesslogichelper	regular expression search for enforcing rules on how to produce and consume sql
defensemaped	a map editor for a tower game
notjavascript	a working javascript interpreter with very strict rules (and no newline detection)
sword	a text editor control (from scratch)
boxplane	a gui control
lamegeo	a c# geometry library (primarily for a future reverse geo-coder)
libquest	asynchronous quest manager (how wow quests work)