Just Enough Architecture - Risk-Driven Model

The concept of failure is central to the design process, and it is by thinking in terms of obviating failure that successful designs are achieved. ...
 
Although often an implicit and tacit part of the methodology of design, failure considerations and proactive failure analysis are essential for achieving success. And it is precisely when such considerations and analyses are incorrect or incomplete that design errors are introduced and actual failures occur. (Petroski, 1994)

How much design and architecture should developers do?

There is much active debate about this question and several kinds of answers have been suggested:

  • No up-front design. Developers should just write code. Design happens, but is coincident with coding, and happens at the keyboard rather than in advance.
  • Use a yardstick. For example, developers should spend 10% of their time on architecture and design, 40% on coding, 20% on integrating, and 30% on testing.
  • Build a documentation package. Developers should employ a comprehensive set of design and documentation techniques sufficient to produce a complete written design document.
  • Ad hoc. Developers should react to the project needs and decide on the spot how much design to do.

The ad hoc approach is perhaps the most common, but it is also subjective and provides no enduring lessons. Avoiding design altogether is impractical when failure risks are high, but so is building a complete documentation package when risks are low. Using a yardstick can help you plan how much effort designing the architecture will take, but it does not help you choose techniques.

The risk-driven model is a reaction to a world where developers are under pressure to build high quality software quickly and at reasonable cost, yet those developers have more architecture techniques than they can afford to apply. The risk-driven model helps them answer the two questions above: how much software architecture work should they do, and which techniques should they use? It is an approach that helps developers follow a middle path, one that avoids wasting time on techniques that help their projects only a little but ensures that project-threatening risks are addressed by appropriate techniques.

What is the risk-driven model?

The risk-driven model can be summarized in three steps:

  1. identify and prioritize risks
  2. select and apply a set of techniques
  3. evaluate risk reduction
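
As a rough illustration, the three steps can be sketched as a single pass over a set of risks. Everything in this sketch is an assumption made for the example: the function name `risk_driven_pass`, the numeric risk levels, the threshold, and the idea that a technique can be modeled as a function that lowers a risk level. In practice the evaluation step is a judgment call rather than a numeric comparison.

```python
from typing import Callable, Dict


def risk_driven_pass(
    risks: Dict[str, float],
    techniques: Dict[str, Callable[[float], float]],
    threshold: float,
) -> Dict[str, float]:
    """Run the three steps once and return the risks still above the threshold."""
    # Step 1: identify and prioritize, highest perceived risk first.
    ordered = sorted(risks.items(), key=lambda item: item[1], reverse=True)
    remaining: Dict[str, float] = {}
    for name, level in ordered:
        # Step 2: select and apply a technique, if one is known for this risk.
        technique = techniques.get(name)
        if technique is not None and level > threshold:
            level = technique(level)
        # Step 3: evaluate risk reduction against the acceptable level.
        if level > threshold:
            remaining[name] = level
    return remaining


# Hypothetical risk levels and a hypothetical technique effect.
risks = {"scaling": 8.0, "parsing": 5.0, "layout": 1.0}
techniques = {"scaling": lambda level: level * 0.25}  # e.g. throughput modeling
remaining = risk_driven_pass(risks, techniques, threshold=3.0)
print(remaining)  # {'parsing': 5.0}
```

Here the scaling risk drops below the threshold after its technique is applied, the layout risk was never large enough to act on, and the parsing risk remains because no technique was chosen for it.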

You do not want to waste time on low-impact techniques, nor do you want to ignore project-threatening risks. You want to build successful systems by taking a path that spends your time most effectively.

Risk or feature focus

The key element of the risk-driven model is the promotion of risk to prominence. What you choose to promote has an impact.

Teams that focus on features will pay less attention to other areas, including risks.

Logical rationale

The risk-driven model lets you explain your design work with a rationale of this form:

We identified A, B, and C as risks, with B being primary. We spent time applying techniques X and Y because we believed they would help us reduce the risk of B. We evaluated the resulting design and decided that we had sufficiently mitigated the risk of B, so we proceeded on to coding.

This allows you to answer the broad question, “How much software architecture should you do?” by providing a plan, i.e., the techniques to apply, based on the relevant context, i.e., the perceived risks.

Any developer can answer the question, “Which features are you working on?” but many have trouble with the question, “What are your primary failure risks and corresponding engineering techniques?” If risks were indeed primary then they would find it an easy question to answer.

Technique choices should vary

Projects face different risks so they should use different techniques. Some projects will have tricky quality attribute requirements that need up-front planned design, while other projects are tweaks to existing systems and entail little risk of failure. Some development teams are distributed and so they document their designs for others to read, while other teams are co-located and can reduce this formality.

When developers fail to align their architecture activities with their risks, they will over-use or under-use architectural techniques, or both. Most organizations guide developers to follow a process that includes some kind of documentation template or a list of design activities. These can be beneficial and effective, but they can also inadvertently steer developers astray.

Here are some examples of well-intentioned rules that guide developers to activities that may be mismatched with their project’s risks.

  • The team must always (or never) build full documentation for each system.
  • The team must always (or never) build a class diagram, a layer diagram, etc.
  • The team must spend 10% (or 0%) of the project time on architecture.

Such guidelines can be better than no guidance, but each project will face a different set of risks. It would be a great coincidence if the same set of diagrams or techniques were always the best way to mitigate a changing set of risks.

Risks

In the context of engineering, risk is commonly defined as the chance of failure times the impact of that failure. Both the probability of failure and the impact are uncertain because they are difficult to measure precisely. You can sidestep the distinction between perceived risks and actual risks by bundling the concept of uncertainty into the definition of risk. The definition of risk then becomes:

risk = perceived probability of failure × perceived impact

Examples:

Project management risks      | Software engineering risks
------------------------------|------------------------------------------------------------------
Lead developer hit by bus     | The server may not scale to 1000 users
Customer needs not understood | Parsing of the response messages may be buggy
Senior VP hates our manager   | System is working now but if we touch anything it may fall apart
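
To make the definition of risk concrete, here is a small worked example. The probabilities and the impact figures (expressed in days of rework) are invented for illustration; only the formula comes from the definition above, and the function name `perceived_risk` is an assumption.

```python
def perceived_risk(probability: float, impact: float) -> float:
    """risk = perceived probability of failure x perceived impact."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * impact


# Hypothetical estimates: impact is measured in days of rework.
scaling = perceived_risk(probability=0.5, impact=40)   # server may not scale
parsing = perceived_risk(probability=0.25, impact=8)   # parsing may be buggy

print(scaling)  # 20.0
print(parsing)  # 2.0
```

Under these assumed numbers the scaling risk is ten times larger than the parsing risk, which suggests where design effort should go first. Because both inputs are perceptions, the result is only as good as the team's estimates.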

Describing risks

You can state a risk categorically, often as the lack of a needed quality attribute like modifiability or reliability. But often this is too vague to be actionable: if you do something, are you sure that it actually reduces the categorical risk?

It is better to describe risks such that you can later test to see if they have been mitigated. Instead of just listing a quality attribute like reliability, describe each risk of failure as a testable failure scenario, such as “During peak loads, customers experience user interface latencies greater than five seconds.”
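
A testable failure scenario can be written down as a check that either passes or fails. The sketch below is one possible encoding, not a prescribed format: the class name `FailureScenario`, its fields, and the sample latency measurements are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FailureScenario:
    description: str
    max_latency_seconds: float

    def occurs(self, observed_latencies: Sequence[float]) -> bool:
        """Return True if any observed latency exceeds the limit."""
        return any(
            latency > self.max_latency_seconds for latency in observed_latencies
        )


peak_load_latency = FailureScenario(
    description=(
        "During peak loads, customers experience user interface "
        "latencies greater than five seconds."
    ),
    max_latency_seconds=5.0,
)

# Hypothetical measurements from two load tests, in seconds.
print(peak_load_latency.occurs([1.2, 3.8, 4.9]))  # False
print(peak_load_latency.occurs([1.2, 6.3]))       # True
```

A categorical risk such as "reliability" offers nothing to check, whereas this scenario can be re-run after each design change to see whether the risk has been mitigated.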

Identifying risks

The easiest place to start is with the requirements, in whatever form they take, and looking for things that seem difficult to achieve.

Misunderstood or incomplete quality attribute requirements are a common risk. You can use a Quality Attribute Workshop, a Taxonomy-Based Questionnaire, or something similar to elicit risks and produce a prioritized list of failure scenarios.

Prototypical risks

Each domain has its own set of prototypical risks. For example, systems projects usually worry more about performance than IT projects do, and web projects almost always worry about security.

It’s important to realize when your project differs from the norm so that you avoid blind spots.

For example, software that runs a hospital might most closely resemble an IT project, with its integration concerns and complex domain types. However, a system that takes 10 minutes to reboot after a power failure is usually a minor risk for an IT project, but a major risk at a hospital.

Prioritizing risks

Not all risks are equally large, so they can be prioritized.

The team’s perception of risks may not be the same as the stakeholders’ perception. It is best to validate that time and money are being spent in accordance with stakeholder priorities.

Risks can be categorized on two dimensions:

  • their priority to stakeholders
  • their perceived difficulty by developers

Be aware that some technical risks, such as platform choices, cannot be easily assessed by stakeholders.
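
The two dimensions can be recorded per risk and used to order the list. This is a minimal sketch under stated assumptions: the three-level scale, the `Risk` fields, and the rule that stakeholder priority is compared first with developer difficulty as the tie-breaker are choices made for the example, not something the model prescribes.

```python
from dataclasses import dataclass
from typing import List

# Assumed three-level scale for both dimensions.
LEVELS = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Risk:
    scenario: str
    stakeholder_priority: str   # "low", "medium", or "high"
    developer_difficulty: str   # "low", "medium", or "high"


def prioritize(risks: List[Risk]) -> List[Risk]:
    """Order risks by stakeholder priority, then by perceived difficulty."""
    return sorted(
        risks,
        key=lambda r: (
            LEVELS[r.stakeholder_priority],
            LEVELS[r.developer_difficulty],
        ),
        reverse=True,
    )


risks = [
    Risk("Reports page renders slowly", "low", "high"),
    Risk("Server may not scale to 1000 users", "high", "high"),
    Risk("Response parsing may be buggy", "high", "medium"),
]

for risk in prioritize(risks):
    print(risk.scenario)
```

With these sample entries the scaling risk comes first, the parsing risk second, and the reports page last, even though developers consider it difficult, because stakeholders rank it low.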

Software Engineering risk reduction techniques

Spectrum from analyses to solutions

A few examples:

  • applying design or architecture pattern
  • domain modeling
  • throughput modeling
  • security analysis
  • prototyping

The risk-driven model focuses on techniques that are on the analysis end of the spectrum, ones that are procedural and independent of the problem domain.

Techniques mitigate risks

If you have <a risk>, consider <a technique> to reduce it.

In practice, some risks can be mitigated by multiple techniques, while other risks require you to invent techniques on the fly.

This frame of mind, where you choose techniques based on risks, helps you to work efficiently. You do not want to waste time (or other resources) on low-impact techniques, nor do you want to ignore project-threatening risks.

You want to build successful systems by taking a path that spends your time most effectively. That means only applying techniques when they are motivated by risks.

Optimal basket of techniques

If your biggest risk is that your chosen framework is inappropriate, you should spend your time analyzing or prototyping your framework choice instead of on usability.

Your time is scarce, so you should choose techniques that are maximally effective at reducing your failure risks, not just somewhat effective.
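
One way to picture choosing a basket of techniques is as a selection under a time budget. The sketch below uses a simple greedy heuristic (best estimated risk reduction per day first); that heuristic, the function name `choose_techniques`, and all the cost and reduction figures are assumptions for illustration, and a greedy choice is not guaranteed to be the best possible basket.

```python
from typing import List, Tuple

# (technique name, cost in days, estimated risk reduction): all hypothetical.
Candidate = Tuple[str, float, float]


def choose_techniques(candidates: List[Candidate], budget_days: float) -> List[str]:
    """Greedily pick techniques by estimated risk reduction per day of effort."""
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    chosen: List[str] = []
    remaining = budget_days
    for name, cost_days, _reduction in ranked:
        if cost_days <= remaining:
            chosen.append(name)
            remaining -= cost_days
    return chosen


candidates = [
    ("prototype framework choice", 3, 12.0),
    ("usability study", 4, 4.0),
    ("threat modeling", 2, 6.0),
]

basket = choose_techniques(candidates, budget_days=5)
print(basket)  # ['prototype framework choice', 'threat modeling']
```

With five days available, the usability study is left out: under these assumed estimates it reduces the least risk per day, which mirrors the framework example above.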

Cannot eliminate engineering risk

The downside of trying to eliminate engineering risk is time.

The reason you cannot afford to eliminate engineering risk is that you must balance it with non-engineering risk, which is predominantly project management risk. Consequently, a software developer does not have the option to apply every useful technique because risk reductions must be balanced against time and cost.

Techniques with affinities

In software architecture, some techniques only go with particular risks because they were designed that way and it is difficult to use them for another purpose. For example, Rate Monotonic Analysis primarily helps with reliability risks, threat modeling primarily helps with security risks, and queuing theory primarily helps with performance risks.
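
The "if you have a risk, consider a technique" pattern and the affinities just listed can be captured as a small lookup table. The table below contains only the three pairings named in the text; the names `TECHNIQUE_AFFINITIES` and `suggest_technique` are hypothetical, and a missing entry stands for the case where you must pick or invent a technique yourself.

```python
from typing import Optional

# Risk category -> technique with a strong affinity for it (from the text above).
TECHNIQUE_AFFINITIES = {
    "reliability": "Rate Monotonic Analysis",
    "security": "threat modeling",
    "performance": "queuing theory",
}


def suggest_technique(risk_category: str) -> Optional[str]:
    """Return a technique known to suit the risk category, or None if unknown."""
    return TECHNIQUE_AFFINITIES.get(risk_category.lower())


print(suggest_technique("security"))   # threat modeling
print(suggest_technique("usability"))  # None
```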

When to stop

Which design and architecture techniques should you use? The answer is to identify risks and choose techniques to combat them. The techniques best suited to one project will not be the ones best suited to another project. But the mindset of aligning your architecture techniques, your experience, and the guidance you have learned will steer you to appropriate techniques.

How much design and architecture should you do? Time spent designing or analyzing is time that could have been spent building, testing, etc., so you want to get the balance right, neither doing too much design, nor ignoring risks that could swamp your project.

Effort should be commensurate with risk

The risk-driven model strives to efficiently apply techniques to reduce risks, which means not over- or under-applying techniques. To achieve efficiency, the risk-driven model uses this guiding principle:

Architecture efforts should be commensurate with the risk of failure.

Incomplete architecture designs

When you apply the risk-driven model, you only design the areas where you perceive failure risks. Most of the time, applying a design technique means building a model of some kind, either on paper or a whiteboard. Consequently, your architecture model will likely be detailed in some areas and sketchy, or even non-existent, in others.

Subjective evaluation

The risk-driven model is a framework to facilitate your decision making, but it cannot make judgment calls for you. It identifies salient ideas (prioritized risks and corresponding techniques) and guides you to ask the right questions about your design work.

By using the risk-driven model, you are ahead because you have identified risks, enacted corresponding techniques, and kept your effort commensurate with your risks. But eventually you must make a subjective evaluation: will the architecture you designed enable you to overcome your failure risks?

Planned and evolutionary design

Evolutionary design

Evolutionary design means that the design of the system grows as the system is implemented.

Though some projects use evolutionary design recklessly, its advocates say that evolutionary design must be paired with supporting practices like refactoring, test-driven design, and continuous integration.

Planned design

At the opposite end of the spectrum from evolutionary design is planned design. The general idea behind planned design is that plans are worked out in great detail before construction begins. Analogies with bridge design and construction are often brought up, since bridge construction rarely begins before its design is complete.

Planned architecture design is also practical when an architecture is shared by many teams working in parallel, and therefore useful to know before the sub-teams start working.

In this case, a planned architecture that defines the top-level components and connectors can be paired with local designs, where sub-teams design the internal models of the components and connectors.

The architecture usually insists on some overall invariants and design decisions, such as setting up a concurrency policy, a standard set of connectors, allocating high-level responsibilities, or defining some localized quality attribute scenarios.

Minimal planned design

In between evolutionary design and planned design is minimal planned design, or Little Design Up Front.

Advocates of minimal planned design worry that they might design themselves into a corner if they did all evolutionary design, but they also worry that all planned design is difficult and likely to get things wrong. Martin Fowler puts estimated numbers on this, saying he does roughly 20% planned design and 80% evolutionary design.

Balancing planned and evolutionary design is possible. One way is to do some initial planned design to ensure that the architecture will handle the biggest risks. After this initial planned design, future changes to requirements can often be handled through local design, or with evolutionary design if the project also has refactoring, test-driven-design, and continuous integration practices working smoothly.

Software development process

A good software development process does more than just minimize engineering risk, since it must also factor in other business needs and risks, such as time-to-market pressures.

A software development process orchestrates a team’s activities with the goal of balancing both engineering and project management risks. It is tempting, but impossible, to cleanly separate engineering process from project management process.

A software development process helps you prioritize risks across both engineering and project management, and perhaps decide that even though engineering risks still exist, other risks outweigh them.

Risk as shared vocabulary

Risks are the shared vocabulary between engineers and project managers. A manager’s job is to understand tradeoffs and make decisions across the risks on a project.

The concept of a risk is positioned in the common ground between the world of engineering and the world of project management. Engineers may choose to ignore office politics and marketing meetings, and managers may choose to ignore the database schema and performance estimates, but in the idea of risks they find common ground to make decisions about the system.

Baked-in risks

In practice, some risk mitigation steps are deliberately baked into the software development process.

At a large company worried about team coordination, the process might insist on various forms of documentation at project milestones. Agile processes bake in worries about time-to-market and about the customer rejecting the project, and consequently insist that the software be built and delivered in short iterations.

Baking risk mitigation techniques into the software development process can be a blessing when the process bakes in risks that you would prioritize anyway. It saves you from deciding anew every day that, for example, you should stick to two-week iterations rather than slipping the schedule.

Baking risks into the software development process can be a curse when you get it wrong.

If you decide to tailor your software development process to bake in risks, consider:

  • project complexity (big, small)
  • team size (big, small)
  • location (distributed, co-located)
  • domain (IT, finance, systems, embedded, safety-critical, etc.)
  • kind of customer (internal, external, shrink-wrapped)

Understanding process variations

Processes differ in how they treat design work. Useful questions to ask about a process:

  1. Is there up-front design?
  2. What is the nature of the design (planned/evolutionary; redesign allowed)?
  3. How is work prioritized across iterations?
  4. How long is an iteration?
  5. How detailed should your design models be?
  6. How long should you hold on to your design models?

The table compares several processes on the first four questions:

Process   | Up-front design                          | Nature of design                          | Prioritization of work                  | Iteration length
----------|------------------------------------------|-------------------------------------------|-----------------------------------------|------------------------
Waterfall | In analysis & design phases              | Planned design; no redesign               | Open                                    | Open
Iterative | Optional                                 | Planned or evolutionary; redesign allowed | Open, often feature-centric             | Open, usually 1-8 weeks
Spiral    | None                                     | Planned or evolutionary                   | Riskiest work first                     | Open
UP / RUP  | Optional; design activities front-loaded | Planned or evolutionary                   | Riskiest work first, then highest value | Usually 2-6 weeks
XP        | None, but some do in iteration zero      | Evolutionary design                       | Highest customer value first            | Usually 2-6 weeks

Application to agile processes

The big challenges are:

  • how to address initial engineering risks
  • how to incorporate engineering risks that you later discover into the stack of work to do

Risks

You will have identified some risks at the beginning of the project, such as:

  • initial choices for architectural style
  • choice of frameworks
  • choice of other COTS (Commercial Off-The-Shelf) components

Some agile projects use an iteration zero to get their development environment set up, including source code control and automated build tools. You can piggyback here to start mitigating the identified risks.

Risks backlog

At the end of an iteration, you need to evaluate how well your activities mitigated your risks. Most of the time you will have reduced a risk sufficiently that it drops off your radar, but sometimes not.

Whenever possible, risks should be written up as testable items.

One way to incorporate risk into an agile process is to convert the feature backlog into a feature & risk backlog. The product owner adds features and the software team adds technical risks. The software team must help the product owner to understand the technical risks and suitably prioritize the backlog.

How can we handle both backlogged features and risks?

Prioritizing risks and features

It is tempting to put both features and risks on the same backlog, but managing the backlog becomes more complex once you introduce risks, because both features and risks must be prioritized together. Who is qualified to prioritize both?

It is the job of the software engineers to educate the product owners that if they ever want to have a secure application, they need to address that risk early, since it will be difficult or impossible to add later. As part of the reflection at the end of each iteration, you should evaluate architectural risks and feed them into the backlog.

An agile process can handle architectural risks by doing three things. Architectural risks that you know in advance can be (at least partially) handled in a time-boxed iteration zero, where no features are planned to be delivered. Small architectural risks can be handled as they arise during iterations. And large architectural risks should be promoted to be on par with features, and inserted into a combined feature & risk backlog.
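
A combined feature & risk backlog can be sketched as a single prioritized list whose items are tagged by kind. This is only an illustration: the class names, the integer priority scheme (lower means sooner), and the sample items are assumptions, and the priorities themselves would come from the conversation between the product owner and the software team.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class BacklogItem:
    title: str
    kind: str      # "feature" (from the product owner) or "risk" (from the team)
    priority: int  # lower number means sooner; agreed by both parties


@dataclass
class Backlog:
    items: List[BacklogItem] = field(default_factory=list)

    def add_feature(self, title: str, priority: int) -> None:
        self.items.append(BacklogItem(title, "feature", priority))

    def add_risk(self, title: str, priority: int) -> None:
        self.items.append(BacklogItem(title, "risk", priority))

    def next_iteration(self, capacity: int) -> List[BacklogItem]:
        """Return the highest-priority items, features and risks together."""
        return sorted(self.items, key=lambda item: item.priority)[:capacity]


backlog = Backlog()
backlog.add_feature("Export report as PDF", priority=2)
backlog.add_feature("Dark mode", priority=3)
backlog.add_risk("Application may not be secure against injection", priority=1)

for item in backlog.next_iteration(capacity=2):
    print(item.kind, "-", item.title)
```

In this sample the security risk is scheduled ahead of both features, reflecting the point that some risks are difficult or impossible to address later.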

Risk and architecture refactoring

At the beginning of a project, developers know and understand less. After some work (design, prototyping, iterations, etc.) they have better-grounded opinions on suitable designs.

Once they recognize that their code does not represent the best design (e.g., by detecting code smells), they have two choices:

  • One is to ignore the divergence, which yields technical debt. If allowed to accumulate, the system will become a big ball of mud.
  • The other is to refactor the code, which keeps it maintainable.

Refactoring, by definition, means re-design and the scale of that redesign can vary. Sometimes a refactoring involves just a handful of objects or some localized code. But other times it involves more sweeping architectural changes and is called architecture refactoring. Since little published guidance exists for refactoring at large scale, architecture refactoring is generally performed ad hoc.

Alternatives to the risk-driven model

The risk-driven model does two things:

  • it helps you decide when you can stop doing architecture
  • it guides you to appropriate architecture activities

It is not good at predicting how long you will spend designing, but it helps you recognize when you have done enough.

There are several alternatives to the risk-driven model:

  • no design: borrow heavily from presumptive architecture
  • documentation package: write a full documentation package that describes your architecture
  • yardsticks: spend a fixed share of project time on architecture
    • most projects should spend 33-37% of their total time doing architecture
    • small projects can spend as little as 5%
    • very large projects can spend 40%
    • a yardstick does not help you decide whether one more (or one less) day of architecture work is appropriate
  • ad hoc: developers make a decision in the moment, based on their experience and their best understanding of the project’s needs
    • dependent upon the skill and experience of the developer
    • not teachable
    • a kind of informal risk-driven model, where developers tacitly weigh the risks and choose appropriate techniques