An excellent article on Big O Notation:
https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/
I have realized that, with the lowered cost of computing and with computers on almost every desk in the world, the focus may have to shift to what I would like to call Big P notation, P(N).
Back in the old days, when computers, memory, and disk were super expensive, it was necessary to determine the maximum complexity of an algorithm and measure it so that we could optimize the algorithm. Definitely, there are cases where we still need to do this, like when scaling up to a million users, etc.
However, today the focus might need to shift to the P(N) notation, which has become more and more important over the years.
If O(N) refers to time complexity, based loosely on how much CPU time an algorithm will consume, then P(N) refers to people-time complexity, which is based on how much human time is spent creating, maintaining, and optimizing an algorithm.
If 3 people work on the algorithm, it becomes P(3). If the algorithm passes through three stages of the development process involving five people in total, you get P(5) [Dev - 1 person, Code Review - 2 people, Unit Tests - 2 people].
People are the most costly resource in computing today; I am referring to the hundreds of thousands of well-paid programmers. Any algorithm written today needs to consider the cost-benefit trade-off as well.
If I can write an algorithm which is easy to understand, digest, and maintain even by the cheapest programmer I can hire, that might give more bang for the buck than worrying about how much CPU time the algorithm will consume.
Sometimes there are cases where a more computationally intensive algorithm - a worse O(N) - is actually preferable because of its low P(N) value.
The best case I can think of is an engine I wrote around 2004 which used reflection (very time-consuming on the CPU) to validate DAL code before it hit the database. It took time to run, but it would surface many problems before the code ever hit the Oracle database, thereby reducing development time significantly.
I would say it had a very low P(N) value: P(5). I wrote it once, got it code reviewed, got it unit tested, and it "just worked", although it was relatively slow.
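To make this concrete, here is a minimal, hypothetical sketch of that kind of reflection-based check in C#. The attribute, entity, and rule below are invented purely for illustration; the actual 2004 engine is not shown in this post.

using System;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute; the real engine's rules are not described in the post.
[AttributeUsage(AttributeTargets.Property)]
class MaxTextLengthAttribute : Attribute
{
    public int Length { get; }
    public MaxTextLengthAttribute(int length) { Length = length; }
}

class Customer
{
    [MaxTextLength(10)]
    public string Name { get; set; }
}

static class DalValidator
{
    // Walk the entity's public properties via reflection (slow, but thorough)
    // and report violations before any SQL is ever sent to the database.
    public static string[] Validate(object entity)
    {
        return entity.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.PropertyType == typeof(string))
            .Select(p => new { p.Name, Rule = p.GetCustomAttribute<MaxTextLengthAttribute>(), Value = (string)p.GetValue(entity) })
            .Where(x => x.Rule != null && x.Value != null && x.Value.Length > x.Rule.Length)
            .Select(x => $"{x.Name} exceeds {x.Rule.Length} characters")
            .ToArray();
    }
}

class Demo
{
    static void Main()
    {
        var problems = DalValidator.Validate(new Customer { Name = "A name that is far too long for this column" });
        foreach (var problem in problems)
            Console.WriteLine(problem);   // flagged before the code ever hits the database
    }
}

The point is not this particular rule, but that one reflective pass can catch whole classes of mistakes that would otherwise only surface at the database.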
Because of some misguided notion of performance, we removed this engine and lost all of the validation logic, and each person had to write the validations manually (sometimes missing many of them). Hence the P(N) value became P(5N): five counts of human effort for every person on my team.
So we reduced the O(N) value significantly but massively increased the P(N) value, leading to a very unstable, though fast, application.
Summary: programs are written for people, not the other way around. Maybe, now that computing costs have gone down significantly and people costs rise with inflation, we need to consider P(N) more than O(N) in many cases.
Software Concept Design & Modelling
Have you ever sat in a design review meeting and felt that something is wrong with the design? Often you cannot articulate what is wrong, and then, a few hours later, you end up writing an email which reads more like philosophy than software engineering. You are a great software engineer and an asset to your company, but in an interview you struggle to explain why. They take you on skeptically, and after a few days, they are blown away. Welcome to the world of SCDM!
Monday, March 7, 2016
Lesson 8: No solution can be generalized to be true in all situations. It depends on the system design.
We recently went through a case where, when the code reads a value stored within a ConcurrentDictionary, that value could be stale.
We had an interesting discussion about why this is OK, and a more interesting one about why the update to this value still has to be thread safe.
(1) Why Stale Value is OK in some cases
This is OK because, in some cases, the design of the system is like a traffic light: some vehicles may go even when the light is orange, some may flout the "rules", and still the system can function.
If writes are consistent, reads can be stale in some systems. The critical reason why reads are allowed to be stale is that the stale value has no permanent effect on the system, and the cost of keeping it fresh is too high.
If the stale value caused permanent changes, then it would not be OK (persistence, database transactions, etc.).
If the stale value is a signal which is read often and the effects of it being stale are very temporary in nature, then for performance reasons it is OK to read the stale value even while it is being updated in the ConcurrentDictionary.
I see this as a case where computers and programmers do not do well with grey-area scenarios. Not all answers can be 0 or 1. Sometimes grey areas exist and provide optimal solutions to real-world problems.
(2) Why the update itself should be thread safe
If this ConcurrentDictionary stores a number and we want to update that number, we want to keep the operation thread safe, because if it is not, then depending on how many threads write at the same time, the value of the number could careen across wrong values, leading to a permanently bad state.
This is similar to the traffic signal entering a permanent state of malfunction, where only some lanes turn green all the time for no particular reason except software error, irrespective of the traffic load on each side of the signal.
We want the number to be correct on a permanent basis, but we allow the reads to be stale sometimes for performance reasons.
A great way to do this looks to be the AddOrUpdate method, which I have not verified myself. But the call below may work fine:
concurrentDictionary.AddOrUpdate(key, value, (k, oldValue) => oldValue + 1);
So locking down the update in this case is not really inconsistent with this philosophy.
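As a minimal sketch of the overall pattern (the key name and counts below are made up, not from the real system): reads may lag slightly behind concurrent writers, but AddOrUpdate guarantees that no increment is ever lost.

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class StaleReadCounterDemo
{
    // Shared counter: writes must be atomic, reads are allowed to be slightly stale.
    static readonly ConcurrentDictionary<string, int> Counters =
        new ConcurrentDictionary<string, int>();

    static void Main()
    {
        // Many threads increment the same key; AddOrUpdate keeps every write correct.
        Parallel.For(0, 1000, _ =>
            Counters.AddOrUpdate("requests", 1, (k, oldValue) => oldValue + 1));

        // A reader running alongside the writers may observe a slightly stale number,
        // which is acceptable here; once the writers finish, the value is exactly 1000
        // because no increment was lost.
        Console.WriteLine(Counters["requests"]);
    }
}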
Wednesday, November 11, 2015
Software Engineering is dual natured: both a particle and a wave
The title of this post may be pretty obtuse and hard to make sense of; I'll try to explain. This post is meant to disparage everyone who evaluates software engineers only by checking whether they know how to sort an array fast, or by asking algorithm questions.
The fundamental reason why this is a bad idea is that, just as with most natural phenomena and with physics in particular, we are making the very wrong assumption that because someone knows everything about the building blocks of something, that person can be a great engineer building applications.
This is completely wrong. And I'll explain why...
In the beginning there were Newton's laws, which could explain most natural phenomena at the physical level (laws of motion, etc.). This is what I call the "wave".
Later, it was found that these laws break down at subatomic levels (the basic building blocks). This is what I call the "particle".
Essentially, physicists realized that two completely different sets of rules govern what happens at the subatomic level and what happens at the larger level of physical phenomena.
The essence of this is that just because you have a deep understanding of how something works internally, you cannot use those equations in your daily life. For that, you rely on a much simpler set of laws, like Newton's laws.
Of course, you could try to do everything at the subatomic level all the time, but that would mean spending all your time figuring out how those very complex equations map to the physical level.
How easy would it be then to do anything practical fast? Try modelling the path of a ball thrown at 45 degrees using quantum mechanical equations...
This is also why the people who know the internals very well cannot produce any tangible results when solving big problems. They are applying particle physics in situations where they are supposed to use Newton's laws.
I may have mentioned these very thoughts in some previous posts. The reason I am writing this post now is that I had a concrete example of it in my recent work.
The particle part of software is the underlying data structures and algorithms internal to a high-level framework like .NET and C#.
The wave part of software is when we build complex software to perform some workflow using those building blocks. This is closer to a process in Applied Electronics & Instrumentation than to linked lists and sorting algorithms.
A very distinguished engineer tried to solve an out-of-memory issue using the particle methodology: use a more efficient data structure to reduce the memory usage. He tried for months and failed, and the software could never handle more than a certain amount of data.
When I saw the problem, I approached it from the wave perspective. Irrespective of the data structures or the sorting algorithms used, the problem was in the way the entire process worked. It was similar to a factory where pipes feed chemicals into the plant to manufacture something else: the input flow needs to be regulated to match the consumption. If this is not done, the process will overload - that is, run out of memory.
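The actual fix is not shown here, but as a minimal sketch of the "regulate the input flow to match the consumption" idea (the capacity and item counts below are arbitrary), a bounded buffer in .NET gives you exactly this kind of throttling:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class RegulatedPipelineDemo
{
    static void Main()
    {
        // A bounded buffer: the producer blocks whenever the consumer falls behind,
        // so memory usage stays flat instead of growing without limit.
        using (var buffer = new BlockingCollection<int>(boundedCapacity: 100))
        {
            var producer = Task.Run(() =>
            {
                for (int i = 0; i < 100000; i++)
                    buffer.Add(i);            // blocks while the buffer is full
                buffer.CompleteAdding();
            });

            var consumer = Task.Run(() =>
            {
                foreach (var item in buffer.GetConsumingEnumerable())
                {
                    // Consume the item (the real work would happen here).
                }
            });

            Task.WaitAll(producer, consumer);
            Console.WriteLine("Finished without unbounded memory growth.");
        }
    }
}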
I fixed the problem which had remained open for 8 years. I used no sorting algorithms or specialized data structures. Now the problem is completely resolved.
It is true that understanding the building blocks of software can help in rare situations where the higher-level code does not work the way you think it should - because of how it is coded internally. However, this is an extremely rare situation. Most software projects fail a lot before they reach that stage. Just think about how buggy Android is - and it is built by people from the topmost universities, with the most degrees. The reason for this is, firstly, their arrogance that they will only hire people from some top colleges to build software which will be used by everyone. It is also because they think too much at the particle level when they should actually be thinking at the wave level.
The best examples of this kind of folly are Google Wave, Android, and even the horrible-to-use Gmail.
More Data
It is always a struggle to provide a clearer explanation of what I am trying to say here. Let me give you a few more examples which clearly show how a system is not the sum of its parts.
- A factory consists of many parts - mechanical, non-mechanical, and electrical - all of which work together, for the most part, to produce the output. The behavior of the factory cannot be calculated as the sum of its parts, because when parts come together they interact in new ways which may not always have been foreseen. Even if they were, as we add more and more parts, we eventually reach a point where we have to consider the entire system in a holistic manner rather than try to optimize it atomically.
- For software engineers who love to go back to the building blocks because it allows them to understand "better" how it all works - why not go back to the hardware level? No software runs in isolation. It always runs on top of some hardware, which sometimes changes the behavior of the software. So, if we want to "really", "comprehensively" understand how the software works or how best to optimize it, perhaps we should first stop at machine language and optimize at that level with compilers, and then go deeper into how the hardware handles the instructions at the chip level and fix that as well.
If this seems nonsensical to you, then the whole talk of data structures should too. All data structures are static, and there is only so much which can be done at that level. Software running in a system consists of rich interactions between live instances of objects with complex behavior. This is best optimized as a whole process, rather than as the sum of its parts.
Friday, July 24, 2015
Using a Design Pattern does not mean do not design the solution
This is an idea which I have explained to others many times before. In one earlier example, I designed a workflow system and compared it to an existing design pattern. I found that my design did not have the bug the design pattern had - a bug which was documented as part of the design pattern itself. Still, we find scenarios where people think that not knowing a design pattern is some indicator that you are not a good developer.
It is not how much information you know - it is how you use the information you already have which makes a great or a so-so developer. As Einstein said: "The true sign of intelligence is not knowledge but imagination." - I saw this on my kid's kindergarten wall.
Let us go really deep into the implementation of what they seem to call a Producer-Consumer design pattern - an implementation which was full of holes. This time I can write about it at length because I just finished fixing a problem which nobody else had been able to fix so far.
The problem is that we have a set of threads which are spawned from a set of timers. These threads share the load of performing different work items. One such work item comes from a queue which contains two types of items:
1) Items which generate an item that can be consumed directly.
2) Items which generate more items - of both types (items which produce and/or items which can be consumed).
It looks straightforward, but it will not work properly. This is because the items that get into the queue can come in any sequence, which depends on the dataset from which they are generated. So we have a very complex system whose behavior is very dynamic and depends on the millions of different combinations of the input dataset.
This system would eventually end up either sleeping indefinitely or running out of memory because of the following reasons:
1) There is no control to prevent one producer from adding so many items to the queue that we run out of memory.
2) Because the queue can contain items in any sequence, there is no control to prevent a scenario where there are only producers and no consumers, in which case we again run out of memory.
3) If we do put the producer to sleep temporarily to control memory usage, we have to do it in such a way that there are adequate consumers to drain what is in the queue, and some consumers are always available to keep the number of enqueued items below a limit. A minimal sketch of this kind of regulation follows this list.
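Here is a minimal sketch of the kind of regulation the fix needs. Everything below - the item types, the high-water mark, the worker count - is invented for illustration; the real system runs off timers and a shared set of threads, and it also needs the stronger guarantees from points 2 and 3, which this sketch only hints at in comments.

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// A work item is either a leaf (consumed directly) or a generator (produces more items).
abstract class WorkItem { }
class LeafItem : WorkItem { public int Payload; }
class GeneratorItem : WorkItem { public int Count; }

class RegulatedProducerConsumer
{
    const int HighWaterMark = 1000;   // illustrative threshold, not a real value
    static readonly ConcurrentQueue<WorkItem> Queue = new ConcurrentQueue<WorkItem>();
    static int _pending;              // items enqueued but not yet fully processed

    static void Enqueue(WorkItem item)
    {
        Interlocked.Increment(ref _pending);
        Queue.Enqueue(item);
    }

    static void Worker()
    {
        while (Volatile.Read(ref _pending) > 0)
        {
            if (!Queue.TryDequeue(out WorkItem item)) { Thread.Yield(); continue; }

            if (item is GeneratorItem g && Queue.Count > HighWaterMark)
            {
                // Regulation: do not expand a generator while the queue is above the
                // high-water mark; put it back and let the consumers catch up.
                // (A fuller version must also guarantee that some workers always
                // consume, so we never end up with only producers - point 2 above.)
                Queue.Enqueue(g);
                Thread.Yield();
                continue;
            }

            if (item is GeneratorItem generator)
            {
                for (int i = 0; i < generator.Count; i++)
                    Enqueue(new LeafItem { Payload = i });
            }
            // else: consume the leaf item (the real work would happen here)

            Interlocked.Decrement(ref _pending);
        }
    }

    static void Main()
    {
        Enqueue(new GeneratorItem { Count = 10000 });
        var workers = new Task[4];
        for (int i = 0; i < workers.Length; i++)
            workers[i] = Task.Run(() => Worker());
        Task.WaitAll(workers);
        Console.WriteLine("All items processed with the queue kept near its limit.");
    }
}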
The more I encounter real-world applications of software design patterns, the more I see badly designed software which does not work properly because the developer deferred completely to what is written in some book somewhere.
I have seen bugs of the worst kind in both design-pattern implementations I have encountered in the field. This makes me really wary the next time I see a design pattern in code and someone tells me... oh... we are just using the XYZ design pattern here, it takes care of everything, you see...
Tuesday, March 11, 2014
Design is not code behavior and may not even be how the code works in reality
People sometimes use the term software design in a promiscuous manner. Pretty much anything and everything fits into "design". That is not really true. In reality, design is a lot of things, and none of them may relate directly to the code - design influences the code rather than having a direct relationship with it. I say this because the design may say X and the code may wrongly do Y. Then it is a flaw in the code and not in the design.
What really comprises software design?
Design is really a set of ideas which are noted down when thinking about the implementation.
- The set of assumptions which you consider when thinking about the implementation. This is the world view of the code. If a new assumption comes along, that can make or break the existing design.
- The goals of the project - "ultimately when all is said and done, this is what the implementation will accomplish".
- A good design may mention things which go tangential to what a developer may think. A good example: in a recent project, "by design" the assumption was that the database rows cannot be fully trusted because they could be out of date w.r.t. the file system - so the database rows can only be used as an indicator that something may need to be done. The final word needs to come from the file system.
- A design artifact may not always be re-discoverable by reverse engineering the code. In most cases, it may take months of looking at the code's behavior to guess that "one of the original intents of the coder may have been to do X and Y", and even then we cannot be fully sure about it. Hence, it is always important to have a design document, because without it we cannot add to or modify an existing design without risk. The code could be wrong w.r.t. the design document, but it is still important to know whether your thoughts are in alignment with the original designer of the code, as long as the major project goals and assumptions remain the same over time.
- A design always considers the high-level interactions between the various entities in the system. These need not always be software modules written by the developer implementing the project; they could be third-party things like the OS, IIS, the file system, the database, etc.
- The two minimally required UML artifacts for design are Use Case Diagrams and Activity Diagrams. UML makes it easier to go about designing a complex project where we cannot think of everything "at once". It helps us split the work in a structured fashion into small "bite-sized" pieces which are easily understandable and digestible.
Friday, January 24, 2014
Using the concepts of nature in software
This post came about when my wife told me that she was playing a certain game on iOS and that nobody she knew could get past a certain level. She tried herself, and her nephew also tried and failed. My son, who is 4 years old and knew nothing about software or games, was easily able to get past that level. He was able to do it because his approach was random; when he tried a burst of random approaches with no particular logic, he broke the level.
There is a software analogy to this as well. When any software gets mature, we tend to frown upon major changes to it, or upon someone "breaking the rules" when writing code in it. Every architect has been on this side at some point in their career. I have done this myself in cases where I knew for a fact that a change would break something. I have also vehemently objected to a requirement which would have caused me to break a lot of code while trying to implement it.
In software as in every other field, we tend to frown upon failure of any sort. If we worked on a project and it failed after 6 months, or was unable to attain what we wanted it to attain, this is considered a failure as well.
Is it really?
Today I feel the opposite. Everybody is different. When we tighten the rules and put a straitjacket around a framework or a piece of software, we are basically preventing any kind of out-of-the-box activity from happening. We know this is difficult to do, or that it breaks something - but how do we know that for sure? Maybe you failed to do something in a certain way; someone else may be able to do what you were unable to do.
How would you know what was possible, if we prevent someone from trying out a different approach?
Sometimes, even when a different approach "fails", a lot of different things come out of the effort, improving the code base in ways we never thought possible - because nobody ever looked at the code from that particular perspective before.
Perspective is everything when it comes to large code bases, because beyond a certain size, the code becomes too much for one person to internalize. At that point, perspective is the only thing through which you can gain understanding of the system. When it changes, the understanding changes, and even evolves. You find over time that what you thought impossible yesterday is very much possible today.
And none of this is possible, if you never tried out a new approach, or went down the risky path.
A concrete example of this is when I had to add a new mode to an existing piece of software which made it more reliable, but also quite a bit slower than before. I tried a very different approach to make it as fast as possible - basically indexing files beforehand, instead of doing it while the process was running. This was not enough, so I had to go around and make other parts of the code run 10X faster than before and utilize multiple cores. The end result was that the new mode was about as fast as the earlier mode had been (or maybe a bit slower), and if you ran in the old mode, it was now 10X faster.
If I had never tried the new mode saying it was too difficult to do, I would never have improved the old mode, and found hundreds of small but important things which made the software better in the long run.
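As an illustration of the "index the files beforehand" idea only - the folder path and what gets indexed are made up, and the actual product code is of course not shown here - the pre-indexing step can be as simple as:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class FileIndexDemo
{
    static void Main()
    {
        // Build the index up front, in parallel, instead of probing the file system
        // over and over again while the main process is running.
        var index = new ConcurrentDictionary<string, long>();
        var root = @"C:\data";   // hypothetical folder

        Parallel.ForEach(
            Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories),
            path => index[path] = new FileInfo(path).Length);

        // Later, lookups during processing are in-memory dictionary hits, not disk hits.
        Console.WriteLine($"Indexed {index.Count} files.");
    }
}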
How is this related to Nature?
We write this simple, cave-man software and prevent others from trying out new things, or worry about risk. But look at nature, which has created plants, birds, mammals, and even the human brain. We have reached this stage in evolution because Nature tried every random permutation and combination based on the given parameters and optimized along the path that was just right to reach the current stage.
It tried all approaches, all risks. Entire species died, new ones arose, but it still went forward and continues to move forward.
Random approaches and failures are necessary, as they provide us with as much information as the successes do. Only by learning from both successes and failures can we really move forward. Infinite patience is required to find the right solutions to difficult problems. Solutions are not found just through brute-force human intelligence or infinite CPU power.
I think a major failure of current software is that we build something and we tighten the rules as we go along, and then new ideas have to create something new from scratch. We never continue to build upon the same thing, evolving it over time. So, we are forever going in a circle, reaching nowhere because of this. All logic is useful, whether it was written in FORTRAN or PASCAL or C++.
Discussing these ideas with others, they mentioned that this might be the reason that some companies on the west coast tend to keep only younger employees and not experienced ones: even though experienced employees are technically competent, they are highly resistant to change and unwilling to consider new approaches and risky techniques. So, companies keep younger employees who don't know enough not to take risky approaches.
Whatever the case, I can't generalize that older employees are always resistant to change. They have strong opinions for sure, but I have worked with very constructive senior developers who embrace the fact that we are building something new, and every time this has resulted in massive success. On the other hand, I have also worked with very capable senior developers who are very resistant to change and very unconstructive, showing only negativity when faced with having to build anything new. There have been teams which are highly "aligned" to resist change, even building frameworks which actively straitjacket any new approach in code. A particular instance was a service architecture where the code would error out if you returned an object with two properties instead of one, or where you had to deliberately duplicate a class in two assemblies just to be able to write the service-layer code.
Friday, October 11, 2013
The Code Paradox
It is kind of interesting for me to note that when the code is really bad and very buggy, making small changes to it does not affect its overall quality a lot. In many cases, defects combine to get something working together in a certain way.
What is more interesting is that when the code is quite good - well written, modular, reusable - making small changes to it can actually break everything. So small changes affect quality a lot more, and small things cause more stuff to break, especially because of reuse.
This could be entropy. The natural state is disorder and it stabilizes there.
When we try to bring order, that is an unnatural state. So, I have seen many, many cases where the code is really good, and someone unfamiliar with it makes a slight change and it keeps breaking all over the place.
The real challenge here is - how do we write software which has entropy by design but still works well? - because I believe biological systems which work well with the most diversity and population are like that.
Is it not weird that even in software, which is totally man-made, this basic law of entropy remains true? It just shows that in a system where entropy is the law, even for seemingly autonomous entities (like us) who create new things, the things we create, however abstract they are, also stabilize at entropy.
Embrace the chaos?
Embrace the diversity of thought which makes a single code-base have both good and bad code?
Definitely something to think about.