Archive for the ‘programming’ Category

Friday, November 21st, 2008

With the Thanksgiving break finally upon me, I now realize that I need to be a better-behaved blogger. The nice thing about having the blog on my website is that I’m pretty committed to keeping it going despite my occasional/frequent droughts.

It’s been busy for me at school and elsewhere. I’ve started regaining my social life in the midst of some more intense studying, in the hopes of finally becoming a well-rounded individual. Also, I’ve finally returned to the developer stage with some interesting projects.

I’ve pretty much abandoned my trivial web projects in favor of a few more serious research and other projects in Python and C++. This is the third time I’ve programmed in C++ and my first real exposure to Python. It’s been fun dealing with interface/facade ideas in these languages instead of using the familiar Java concepts. It’s also interesting to try plugging these tools into some fun new libraries.

I must say that I don’t despise C++ as I used to. It’s really not much different from Java except for the exaggerated power you feel from managing your own memory. I can still remember how uncomfortable I felt, the first time I switched over, with the lack of the amazing Java library and documentation, but you can overcome that. The code can obviously be much more elegant at times, as the magical pointer garbage can be pretty interesting. The biggest concern is that your code doesn’t start looking like something drawn by Escher.

Python is fun and different, but I’m not sure that I’m used to it yet. When working with C++ and Python at the same time you start to realize how uncomfortable that void of high level management is. I really have no idea how long it’s going to take me to get really comfortable with the language. You know what I’m talking about; there’s that point where you feel like you’re working on a code assembly line and the code colors and connecting shapes are all in arm’s reach.

The current situation with Python is a lot like that episode of Futurama where Bender is floating through space. A small species of people forms on his body and evolves into a fully civilized species, and he plays god for a little while. Then he screws up and causes a nuclear war, killing off his little world. He then meets up with a god-like galaxy and they start talking and figuring stuff out. I think I just encountered god after playing around and blowing up my own mini-universe. Soon I should have a good approach for creating life, but right now the options are just blowing my mind.

I’ll keep you up to date and I hope to have some descriptions of my tools and projects soon. Until then, I guess I hope that you’re staying healthy and looking forward to a happy Thanksgiving.

Friday, August 1st, 2008

One of the biggest, although non-normalizing, issues that I’ve struggled with is that of natural versus surrogate keys. Now, I didn’t know what these were a few years ago, so on the off chance that you are me in that time period, I’ll briefly explain. A natural key is a unique/primary key that is made up strictly of information that is logically connected with the entity you are storing in the row. The tried and true example is that of a social security number or a detailed title, basically anything that is part of the entity and makes it unique. What makes things hard is choosing the right data to make that storage easier. Obviously the same FIRST_NAME occurs in more than one PERSON, so you’ll have to use a different column or a combination of columns (ex. FIRST_NAME, LAST_NAME, BIRTHDATE, BIRTH_PLACE). But it’s a lot easier to use a surrogate key, most often an autoincrementing integer, to just represent each row, because you know that will be distinct.

When instructed in my undergrad database lectures I was told, flat out, to never use surrogate keys. Fortunately, the man I learned this from can now be properly labeled as a “hack” when it came to practical computational theory. He would literally grab his shirt collar and adjust it uncomfortably while making strange, wounded-animal-like noises every time a programmer stuck an extra method call in a constructor or used a surrogate key. His goal was to make us flawless when it came to design practice, but he really just made us terrified that if we did something outside of his strict guidelines he would leap on us with his red pen. Or that the noises he was making were part of the summoning act to bring some giant bird to come and tear our typing fingers off.

Well, we learned to do a pretty good impression of him, and we also learned that what he was saying wasn’t exactly flawless, or even reasonable. The reality of the matter is that it’s OKAY TO USE INTEGER PRIMARY KEYS. First of all, it’s not the end of the world to de-naturalize a piece of data. Plus, not only is it okay, but there are significant benefits in speed and tool interoperability.

The only argument in existence against using a surrogate key is that you are essentially identifying each row by a simple number that has nothing to do with the actual data. Sure, this is a loss, though it’s going to make comparisons between multiple tables a whole lot easier. But what about complex tables that aren’t easy to represent in one or two fields? I remember that same “E” professor made us, for a practical software course, store foreign key relationships to a table using 3 varchar fields. Since we had 2 mappings to said table, this meant 6 varchar fields defining the relationships instead of 2 integer fields that would have been much faster for our thousands of rows and much more readable.

Beyond that, do you know how to fix the entity integrity issue? Really, you can just add a unique key constraint on the same fields you would have used for the natural primary key. By using this method you essentially recreate the exact same restrictions on a faster indexed table for which it is much easier to define relationships.
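To make that concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table name, column names, the `add_person` helper, and the sample rows are all made up for illustration; the point is only that the integer surrogate key identifies the row while a UNIQUE constraint over the would-be natural key still rejects duplicates.

```python
import sqlite3

# Hypothetical PERSON table: an integer surrogate primary key, plus a UNIQUE
# constraint over the columns that would have formed the natural key.
SCHEMA = """
CREATE TABLE person (
    person_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    birthdate   TEXT NOT NULL,
    birth_place TEXT NOT NULL,
    UNIQUE (first_name, last_name, birthdate, birth_place)
);
"""


def add_person(conn, first_name, last_name, birthdate, birth_place):
    """Insert a row and return its surrogate key, or None if it is a duplicate."""
    try:
        cur = conn.execute(
            "INSERT INTO person (first_name, last_name, birthdate, birth_place) "
            "VALUES (?, ?, ?, ?)",
            (first_name, last_name, birthdate, birth_place),
        )
    except sqlite3.IntegrityError:
        # The UNIQUE constraint enforces the natural-key rule for us.
        return None
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample data.
first = add_person(conn, "Jane", "Doe", "1980-01-01", "Springfield")
duplicate = add_person(conn, "Jane", "Doe", "1980-01-01", "Springfield")
namesake = add_person(conn, "Jane", "Doe", "1992-06-15", "Shelbyville")
```

Here `duplicate` comes back as None because the second insert violates the UNIQUE constraint, while `namesake` gets its own integer key. Other tables can then point at `person_id` with a single integer column instead of repeating four text columns.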

CAVEAT: But it’s not right to say that a database designer, especially a green one, gets to use surrogate keys off the start. Why not? Because it’s not the best case. To be honest, I think this whole world would (www) be a little bit better off if we could make the initial model work. I will continue to use natural keys whenever it is convenient for the model, but the difference is that I will make a conscious decision about which one to pick.

Call me a glassy-eyed, idealist youth, but I honestly believe that you can test a good programmer not by what he knows but by how he evaluates his options. It’s a fine balance between diving off the board before checking for pool water and hourly, broad-field pH tests, but it’s the one that will make you a solid programmer/architect/designer.

Friday, August 1st, 2008

My developer friend, Scott Sloan, has been working on his DB class for some time now and it’s quite a useful tool for doing queries simply. Part of the ongoing movement, between him and myself, is designing a rock-solid set of framework classes that will aid in rapid PHP development. Of course, his project has some stuff to show for it while mine are still awaiting a beta release. But I really do love this class and others from Scott, and I use them in Droplet and contribute as I find need to break them. :D

The most recent change to this DB class was the addition of exception based error handling, making database connections an entirely simpler creature to deal with. This class does a lot of abstraction, and up until now it’s been virtually impossible to debug it, or any db interactions, without stack traces. Unfortunately, every silver lining comes with a dark cloud.

This cloud, not dealing with Scott, happens to be PHP’s development traditions. Just like most functionality, exceptions are good, maybe even necessary, but the implementation of them was very poor from a security perspective. The fact that you can’t disable the printing of the stack trace from an uncaught exception is inexcusable at best. But I can guess how that conversation went:

-Should we have an option to disable stack printing (specifically of method parameter values) for select Exceptions?
-Why?
-Well, maybe they wouldn’t want the end user to see what was passed in a particular method?
-But you just catch the exception!
-But what if you don’t catch the exception? They see everything!
-Are you suggesting that we write code to protect programmers who are breaking rules? Plus, all production servers have warnings/errors disabled for output, unless their people are idiots.
-Oh….I suppose you’re right.

Here’s my beef: you need to plan on some mistakes. No offense, but haven’t you ever forgotten to catch an exception? Since this is a scripting language, you don’t have the parental compile time warnings or blocks, like Java, to say “Yo, you didn’t tell me to do anything when this code freaks, and believe me it can. Fix yo’ code, homes.” (I assume that a PHP compiler would use a similar compile error vernacular). The reality is that there are many production systems that don’t hide warnings/errors, and even if they did you wouldn’t want password information getting written to a log file whenever you fail to connect to a database.

The key here is a “who needs to know” system, just like I talked about in my blog entry about keys. There should never, ever, ever, ever, ever be a way for the language to “accidentally” print a system password to a user. Even if the developer is a complete idiot! If he passes a password or hash into a function, he’s not going to think about what would happen if that function were to throw an error. He’ll fix that when it happens. It probably sounds like I’m defending the Cro-Magnon programmers of the world, but I’m not…..really.
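The same “who needs to know” idea can be sketched in code. This is a Python analogy rather than PHP, and the `Secret` class and `connect` function are hypothetical names invented for the example. The wrapper only masks the value when it is printed or formatted into a message; it is not a real security boundary, just one less way to leak a password by accident.

```python
class Secret:
    """Hypothetical wrapper that keeps a sensitive value out of printed output."""

    def __init__(self, value):
        self._value = value

    def reveal(self):
        """Return the real value; call this only where it is actually needed."""
        return self._value

    def __repr__(self):
        # Anything that prints or formats this object sees a mask instead.
        return "Secret('***')"

    __str__ = __repr__


def connect(host, user, password):
    """Stand-in for a database connect call that always fails."""
    raise ConnectionError(
        f"could not connect to {host} as {user} with {password!r}"
    )


try:
    connect("db.example.test", "app", Secret("hunter2"))
except ConnectionError as exc:
    # Even a careless error message only ever contains the mask.
    message = str(exc)
```

The caller that genuinely needs the password asks for it with `reveal()`; everything else, including error messages and log lines built from the arguments, sees only the mask.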

An even worse PHP prospect is the ability to dump a class with private class values onto a page with one motion (i.e. var_dump). I know that these are all helpful tools in debugging, and that private variables were never meant to be a security constraint in this fashion, but the way they did it DOESN’T MAKE SENSE!! That function should not, I repeat ‘NOT’, be able to print private access variables unless there are appropriate accessor functions. That’s what object oriented design is all about.

I wouldn’t be so hard on PHP if it weren’t for the fact that these examples are the ones that give PHP a bad name. When someone’s data gets stolen on a PHP site it isn’t that PHP is a bad language, it’s that the programmer wasn’t thinking about that specific hole. But there are a lot of spots where developers can not know the rules, or forget a step and accidentally release loads of information to an eager hacker. As part of the group defining how the tool gets made, we need to be careful that the tool doesn’t have a cigar cutter that’s big enough for our “baby developers” to fit their arms in.

Anyways, code smarter not harder!

Tuesday, July 22nd, 2008

In a recent article on the blog I link to most, Jeff Atwood took the time to discuss database normalization and its rather blurry scale of reason. As usual, I found myself agreeing with Jeff and questioning a few of my own practices in database design. This is something I encourage all my peers to do, especially those who don’t spend much time contemplating the rules by which you program.

Now, when I say that I agree with the fact that normalization logic is blurred, that doesn’t mean that I abandon it. Quite the opposite, I’m actually kind of a fanatic about getting information normalized, at least up to about 4NF….because beyond that I think it gets a little masonic, if you know what I mean. But I can also see the reasons that one has for denormalization. Tools are the biggest one that I deal with, because many applications from Oracle and reporting tools cannot deal with complex joins, intricate keys, or other common tricks. There are also speed issues that would get you to commit flattening and other acts of database treason.

The big deal here isn’t the fact that these practices are right or wrong (which they often are…clearly), but rather that there is a lack of openness on the topic in the CS curriculum and among the ‘elite programming circles’. When I got to my first serious development job I saw that people were designing tables with an apparent ignorance of the “Laws of the RDBMS”. It didn’t take me any time to throw my nose up and strut by like an upstart schoolgirl passing a pack of teenage smokers.

The reality is that young developers can’t come out of college thinking that they hold the keys to the ‘Normal Universe’ just because they took a database class or two. No, you cut your teeth on your first 5 gut-wrenching projects, and at that point you earn the right to sidestep a database design rule or two, but not before. It’s not black and white, and it certainly has little to do with the mathematics that you may have been taught. I’m not saying the math and theory aren’t important or even necessary to get to that point. I’m just saying that your decision on whom to marry comes less from the math knowledge that adding 1 and 1 returns 2, and more from the experience you gain in what works and what doesn’t after that first step.

Now that I’ve compared database design to marriage, theoretical mathematicians to freemasons, and myself to a snobby schoolgirl, it’s time to say that this isn’t the end of this. In fact it’s just the beginning. Hopefully I’ll have a few posts for you on breaking down the illusions that I’ve struggled with in the real world of database applications.

*Two things: 1) Normalcy, especially in the physical universe, doesn’t exist. 2) Just because I compared DB laws to smoking and then said some laws are breakable doesn’t mean you should go smoke. If you’re of age and you want to, fine, but it’s a gross habit, folks. Just like ignorance of design patterns. And you don’t want to be like those freaks, do you?

Wednesday, July 9th, 2008

A new section of my personal website has just been opened and will hopefully see a lot of growth in the coming years: the Open Source Section. While I’ve been contributing for a while to a colleague’s open-source projects, this is my first time creating my own projects from the ground up.

I’ve described my reasons for opening this section on the home page and a few other pages, but they’re important enough to repeat here. Actually, I’ve been really hesitant to make my code open source for a while now, but I realize that the hesitation isn’t acceptable. After about a year of thinking about it and finding out what my problems were, I’ve decided that it’s in my best interest to break out and try it. That’s not to say that I didn’t have valid reasons holding me back.

My first concern was credit. While I’m not terribly attached to my code, like other programmers I do feel the need to be at least partially recognized for the work I’ve put into it. To be honest, it’s from a long line of experiences I’ve had where people have been unsupportive of hours of work or ideas that I’ve created. I’ve grown to hate the idea of someone devaluing or stealing my work. Unfortunately, this has led me down the path to the Dark Side (see also) of proprietary ideas and away from opening myself to other opinions.

The fact of the matter is that I’ve benefited from other tools in my work but I’ve never credited the developers then. But there’s a reason why that’s okay. It’s not about the programmer(s)…it’s about the code. While every programmer wants to make a life out of this, it’s not appropriate to take every little action and demand credit for it. If I write something pretty nice, which isn’t quite likely, maybe someone will use it and tell a friend about it: “this is pretty nice”. If I need credit, I can tell my potential employer “I did this” or “I contributed to this”. But notice that the pride is in the finished product and the work you enjoyed putting into it…not in the casual afternoon of programming that you have the urge to brag about. I’ve been guilty of this a thousand times over.

The other issue is code quality. While I like to think of myself as a pretty solid programmer, I’m definitely not comfortable with showing my code off to the world. Actually, my first project, Droplet, saw a complete redesign the day before I put it online because I was worried about what my peers might think of my half-decent code. After reading Jeff Atwood’s post on programmers who fear outside eyes, I realized that I was part of my own problem and that this was a good way to break out of that.

In any case, I hope that you check out the open source website and try out Droplet. I really have enjoyed using the software and I hope you do the same. It’s not pretty, and it’s not a money-maker…but it’s a nice tool that you can share with everyone. Please share your comments and suggestions, and write a few lines of code yourself…I’m still waiting on my first user-submitted patch!

If Droplet isn’t your cup of tea, I have a few more on the way including a couple PHP classes and a fun few items in Java.