<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Enterprise Technology</title>
	<atom:link href="http://www.christopher-hart.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.christopher-hart.com/blog</link>
	<description>Articles on enterprise architecture with a focus on scalable, resilient solutions.</description>
	<pubDate>Sat, 27 Dec 2008 18:17:45 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
	<language>en</language>
			<item>
		<title>The dangers of narrow subject matter expertise and the case for solution architecture</title>
		<link>http://www.christopher-hart.com/blog/2008/12/27/the-dangers-of-narrow-subject-matter-expertise-and-the-case-for-solution-architecture/</link>
		<comments>http://www.christopher-hart.com/blog/2008/12/27/the-dangers-of-narrow-subject-matter-expertise-and-the-case-for-solution-architecture/#comments</comments>
		<pubDate>Sat, 27 Dec 2008 18:17:45 +0000</pubDate>
		<dc:creator>Christopher</dc:creator>
		
		<category><![CDATA[Enterprise Architecture]]></category>

		<category><![CDATA[architecture]]></category>

		<category><![CDATA[complexity]]></category>

		<category><![CDATA[solution architecture]]></category>

		<guid isPermaLink="false">http://www.christopher-hart.com/blog/?p=11</guid>
		<description><![CDATA[As technology solutions continue to increase in complexity, organizations often respond by creating teams with deep technical expertise to design, build and maintain their technology assets.  One side effect of deep technical expertise is narrowing breadth of knowledge.  While most IT professionals start their careers with broad technical knowledge (though perhaps not experience), as one&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>As technology solutions continue to increase in complexity, organizations often respond by creating teams with deep technical expertise to design, build and maintain their technology assets.  One side effect of deep technical expertise is narrowing breadth of knowledge.  While most IT professionals start their careers with broad technical knowledge (though perhaps not experience), as one&#8217;s experience and interest deepen in one particular domain, the breadth of knowledge - by necessity - shrinks.  This side effect is rarely seen as negative; in fact, deep technical expertise is often - and rightly - held in high regard.  </p>
<p>Unfortunately, lack of breadth presents serious risks to the quality of our solutions.  The complexity within technology domains (driving us to create deep technical expertise) also generates complexity in the interfaces between these domains.  For the purposes of this discussion, I&#8217;m referring to domains in a high-level, coarse-grained sense like application development, application servers, network infrastructure, supporting application components (databases, middleware, and the like), storage solutions, and so on. These domains are individually complex, often to the point where there are sub-specialties within them.  (Case in point: most large organizations have some network engineers who specialize in load balancing while others are experts in network design/engineering.)</p>
<p>Some will argue that while these individual domains are complex, the internal complexity is abstracted from the interfaces to other components, thus hiding the &#8220;inner workings&#8221;.  While this is often a design goal, it is rarely fully realized.  It&#8217;s easy to believe this well-meaning but dangerous fallacy, especially since so many IT professionals are &#8220;classically educated&#8221; as software developers.  Any college-educated CompSci or CIS professional learned the importance of object-orientation, interfaces, abstraction, and so on.  Our trust in abstraction is so conditioned that it just feels like it should work in other domains.  It seems logical, but it just doesn&#8217;t scale to the breadth of systems and degree of complexity outside of a pure software engineering paradigm.</p>
<p>An exploration of the reasons why abstraction doesn&#8217;t scale could be an entire series of articles in its own right, but a cursory treatment may help convince skeptics.  First, abstraction in an object-oriented software engineering context is completely implemented within a single system (like a programming language).  The layer of abstraction and the complexity on either side of that abstraction share common tools, semantics and structure.  This commonality reduces the overall complexity of the solution and the difficulty associated with making the abstraction work.  Purists will argue that commonality doesn&#8217;t matter that much.  Evidence to the contrary can be found by comparing the difficulty of integrating two software components, both written in Java to reasonable software standards, with the difficulty of integrating a software component written in Java with another component written in .NET using web services.  The myriad &#8220;standards&#8221; for web services highlight the difference between these scenarios.</p>
<p>Second, abstraction within software engineering is rooted in programming languages that exhibit a high degree of precision with respect to their semantics and syntax and, in comparison to broad IT &#8220;solutions&#8221;, are relatively simple.  Java, for example, has only 50 keywords and a handful of syntax rules that can be used to implement abstraction.  Compare this to the average load-balancing solution, network switching infrastructure, or application server configuration, all of which can be configured in highly variable (and novel) ways.  This difference in complexity makes abstraction a much more difficult task.  In a software engineering world, it&#8217;s the difference between abstraction for a simple framework (something like implementing MVC) and abstraction for an operating system&#8217;s threading and memory management libraries.</p>
<p>Third, standards and patterns for abstraction in software engineering are well established and commonly understood.  Creating standards and identifying patterns is easier in software because of the previous two points.  Standards and patterns for integrating components from different domains (e.g. making load-balancers, web servers, application servers, and database servers work together) do exist and may be commonly understood, but are not so detailed or so precise that they reduce complexity or completely hide the inner workings of each individual component.</p>
<p>If we can agree that abstraction doesn&#8217;t really work between domains and that individual domains are so complex that they require deep technical expertise, we must then acknowledge that the integration of these components is a significant concern in its own right.  This is what architecture - especially solution architecture - is really about.  So-called &#8220;PowerPoint architecture&#8221; or domain-specific architecture (provided by software architects, storage solution architects, etc.) is not a substitute for holistic solution architecture that defines how disparate components from different domains will interact.  Make no mistake: PowerPoint architecture and domain-specific architecture have their place. Domain-specific architecture must be part of the solution delivery process.  Unfortunately, it is too often the focus, usually at the expense of good solution architecture.</p>
<p>Solution architects need to balance breadth and depth of technical knowledge to be effective.  This means that not every solution architect need come from a heavy software architecture or software engineering background.  Instead, a good solution architect understands a wide range of technology domains and has experience putting them together in a variety of settings.  </p>
<p>A classic example of a problem where this kind of solution architecture really makes a difference is in geographically distributed web applications that are transactional and stateful in nature.  The developer or software architect will rely on the application server&#8217;s services for managing session state and persistence.  The infrastructure/hosting teams will rely on load-balancing solutions for &#8220;session stickiness&#8221; to keep a customer &#8220;stuck&#8221; to a particular web and application server for the duration of the session.  Easy enough, except as soon as you launch the application in production, you start getting reports of customers complaining that they&#8217;re losing their sessions, having to &#8220;start over&#8221; in multi-step transaction flows, or other intermittent, unpredictable behavior.  What happened?</p>
<p>Large ISPs like Comcast or AOL have multiple proxy servers from which a customer&#8217;s HTTP session may originate.  During the customer&#8217;s session, the ISP may internally load-balance the customer to a different proxy server, causing the source IP address to change.  Your load-balancer session stickiness didn&#8217;t account for this, the user got load-balanced to a different web or application server, and the session state couldn&#8217;t be rebuilt.  </p>
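<p>The failure mode above can be sketched in a few lines.  This is only an illustration under stated assumptions: a hypothetical load balancer that chooses a backend by hashing a &#8220;stickiness key&#8221;, with made-up server names, a made-up pool of ISP proxy addresses and a made-up session cookie value.  Real load balancers implement persistence in their own ways.</p>

```python
import hashlib

BACKENDS = ["app-1", "app-2", "app-3", "app-4"]  # hypothetical server names


def pick_backend(key):
    """Map a stickiness key to a backend using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]


def route_by_source_ip(source_ip, session_cookie):
    # Stickiness keyed on the address the load balancer sees,
    # which is the ISP proxy address, not the customer.
    return pick_backend(source_ip)


def route_by_cookie(source_ip, session_cookie):
    # Stickiness keyed on a value that travels with the session itself.
    return pick_backend(session_cookie)


# One customer session whose requests arrive through a pool of ISP proxies.
proxy_ips = ["198.51.100.%d" % n for n in range(1, 51)]  # made-up addresses
by_ip = {route_by_source_ip(ip, "session-abc") for ip in proxy_ips}
by_cookie = {route_by_cookie(ip, "session-abc") for ip in proxy_ips}

print("backends hit with source-IP stickiness:", sorted(by_ip))
print("backends hit with cookie stickiness:", sorted(by_cookie))
```

<p>With source-IP stickiness, the same session is spread over several backends as the proxy address changes; keyed on the session cookie, it stays on one.  Keying on the cookie is not a complete answer (it does nothing for a user who lands in another data center), which is the point: the behavior depends on how the pieces interact.</p>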
<p>There are many variations on this theme&#8230;  Maybe the load-balancer uses SSL ID, but the ISP&#8217;s proxy had a different A record cached for your site, so the user ended up in another data center.  Perhaps your web servers can&#8217;t route traffic to the application server that &#8220;knows&#8221; the customer&#8217;s state.  Or you&#8217;ve really done your homework and built a global cache to manage state, but the cache didn&#8217;t replicate fast enough.  The bottom line is that there are a variety of scenarios in which the interaction between the load-balancing infrastructure, application server configuration, and application code determines the actual customer experience.</p>
<p>This is where a good solution architect will save the day.  Your deep technical SMEs are still invaluable, but detecting the possibility of scenarios like the one above requires breadth of knowledge and thorough understanding of the characteristics the <em>solution</em> must have rather than any individual component.  The need for solution architecture is very real.  Doing it well has a tangible effect on the quality of technology solutions and mitigates the risks from deep technical expertise creating silos of domains.  We cannot get away from the need for our really sharp SMEs, nor should we want to.  However, we must acknowledge that our solutions demand attention to integrating disparate components in increasingly complex ways.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.christopher-hart.com/blog/2008/12/27/the-dangers-of-narrow-subject-matter-expertise-and-the-case-for-solution-architecture/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Resiliency, Architecture and the Importance of Testing</title>
		<link>http://www.christopher-hart.com/blog/2008/12/11/resiliency-architecture-and-the-importance-of-testing/</link>
		<comments>http://www.christopher-hart.com/blog/2008/12/11/resiliency-architecture-and-the-importance-of-testing/#comments</comments>
		<pubDate>Fri, 12 Dec 2008 04:10:19 +0000</pubDate>
		<dc:creator>Christopher</dc:creator>
		
		<category><![CDATA[Resiliency]]></category>

		<guid isPermaLink="false">http://www.christopher-hart.com/blog/?p=5</guid>
		<description><![CDATA[Everyone in the IT business - and particularly developers - is familiar with testing.  Testing is the mechanism by which organizations perform quality assurance.  The good news is that testing is so ingrained in software development organizations that some level of testing is performed in almost every organization.  The bad news is that software testing is really just one aspect of testing the entire solution; the things you're not testing when you do software QA can just as easily sink the ship.  There are two other important aspects which are critical to testing to ensure a solution is resilient.  ]]></description>
			<content:encoded><![CDATA[<p>Everyone in the IT business - particularly developers - is familiar with testing.  Testing is the mechanism by which organizations perform quality assurance.  The good news is that testing is so ingrained in software development organizations that some level of testing is almost always performed.  The bad news is that software testing is really just one aspect of testing the entire solution; the things you&#8217;re <em>not</em> testing when you do software QA can just as easily sink the ship.  There are two other important aspects which are critical to testing to ensure a solution is resilient.  </p>
<p>First, the infrastructure must be tested.  &#8220;But,&#8221; you say, &#8220;I test my infrastructure in the process of testing the software!&#8221;  This may be true to varying degrees depending on the types of software testing that are performed.  However, many details of the infrastructure are difficult to test or unique to a particular environment.  Server configurations may match between your test environment and production, but firewall rules are almost certainly different.  It&#8217;s very difficult to know that a test server is configured exactly the same as a production server - do you know with absolute certainty that your test server and your production server have exactly the same startup configuration?  Kernel tuning parameters?  Fibre Channel storage configuration?  I promise that the vast majority of organizations that audit their environments will find differences.</p>
<p>These factors may not seem that important, and in many &#8220;sunny day&#8221; scenarios, they&#8217;re probably not.  It&#8217;s when conditions inevitably vary from normal that these variations rear their ugly heads.  It&#8217;s precisely these times when you don&#8217;t want to be left wondering why your application is suddenly failing, only to discover after hours of your sysadmin pulling her hair out that one production server&#8217;s NIC has a default gateway configured incorrectly.</p>
<p>Dealing with this situation requires a multi-pronged approach.  First, periodic audits of configurable items on all servers need to be standard operating procedure.  Second, new production environments need to be tested in the same way you would test in a performance testing environment.  Third, existing production environments undergoing change should have predefined methods for periodic verification.  For example, if a production environment has a change (e.g. new code, new server configuration, patches), there should be a way to &#8220;test&#8221; these changes on a small subset of all the production servers.  This requires planning in advance, which is why architecture and planning for resiliency is so important.  When two or more identical production environments exist (hopefully always!), take each one offline periodically and test it.</p>
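<p>As a rough illustration of the first prong, a periodic audit can be as simple as comparing each server against an agreed baseline.  The sketch below assumes the configurable items have already been collected into dictionaries by some other means; the server names, item names and values are hypothetical.</p>

```python
def find_drift(baseline, servers):
    """Return {server: {item: (expected, actual)}} for every mismatch."""
    drift = {}
    for name, config in servers.items():
        diffs = {}
        # Check items present on either side, so missing or extra
        # settings are reported as well as changed ones.
        for item in sorted(set(baseline) | set(config)):
            expected = baseline.get(item)
            actual = config.get(item)
            if expected != actual:
                diffs[item] = (expected, actual)
        if diffs:
            drift[name] = diffs
    return drift


# Hypothetical audit data: what every production server should look like,
# and what was actually found on each one.
baseline = {"default_gateway": "10.0.0.1", "kernel.shmmax": "68719476736"}
servers = {
    "prod-1": {"default_gateway": "10.0.0.1", "kernel.shmmax": "68719476736"},
    "prod-2": {"default_gateway": "10.0.1.1", "kernel.shmmax": "68719476736"},
}

report = find_drift(baseline, servers)
for server, diffs in report.items():
    for item, (expected, actual) in diffs.items():
        print("%s: %s expected %s, found %s" % (server, item, expected, actual))
```

<p>Run on a schedule, a comparison like this surfaces the misconfigured default gateway before the bad day rather than hours into it.</p>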
<p>Similar to infrastructure, architectural items also need to be verified.  It would be unthinkable to not test functional requirements of your application, so why wouldn&#8217;t you also test architectural requirements?  In particular, architectural requirements that affect the quality of your application are absolutely critical.  For example, if you have redundancy, failover, or the ability for a component to run in a degraded mode built into the structure of your solution, they must be tested.  Similar to functional requirements, these architectural requirements need to have test plans and have traceability through design artifacts.</p>
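<p>To make this concrete, here is a minimal sketch of what test cases for failover and degraded mode might look like.  Everything in it is an assumption for illustration: the <em>get_balance</em> function, the backend names and the cached fallback are hypothetical stand-ins for whatever redundancy a real solution has.</p>

```python
class Unavailable(Exception):
    """Raised by a backend that cannot serve the request."""


def make_backend(name, healthy=True):
    """Build a fake backend that either answers or fails on every call."""
    def backend(account_id):
        if not healthy:
            raise Unavailable(name)
        return {"account": account_id, "served_by": name, "degraded": False}
    return backend


def get_balance(account_id, backends, cached=None):
    """Try each redundant backend in order, then fall back to degraded mode."""
    for backend in backends:
        try:
            return backend(account_id)
        except Unavailable:
            continue
    if cached is not None and account_id in cached:
        # Degraded mode: a possibly stale answer is better than none.
        return {"account": account_id, "served_by": "cache", "degraded": True}
    raise Unavailable("all backends down and no cached value")


def test_failover_to_secondary():
    backends = [make_backend("primary", healthy=False),
                make_backend("secondary")]
    assert get_balance("42", backends)["served_by"] == "secondary"


def test_degraded_mode_when_all_backends_are_down():
    backends = [make_backend("primary", healthy=False),
                make_backend("secondary", healthy=False)]
    result = get_balance("42", backends, cached={"42": 100})
    assert result["degraded"] is True


test_failover_to_secondary()
test_degraded_mode_when_all_backends_are_down()
```

<p>Each test maps to one architectural requirement, which is what gives these requirements the same traceability as functional ones.</p>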
<p>With so much focus on &#8220;functional&#8221; requirements, many organizations lose sight of the &#8220;non-functional&#8221; requirements.  Calling the latter non-functional does a great disservice to these important details; they&#8217;re really quality requirements.  The overall quality of the environment is a function of many inputs: software, infrastructure, architecture, and testing of all three.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.christopher-hart.com/blog/2008/12/11/resiliency-architecture-and-the-importance-of-testing/feed/</wfw:commentRss>
		</item>
		<item>
		<title>What is resiliency?</title>
		<link>http://www.christopher-hart.com/blog/2008/04/15/what-is-resiliency/</link>
		<comments>http://www.christopher-hart.com/blog/2008/04/15/what-is-resiliency/#comments</comments>
		<pubDate>Wed, 16 Apr 2008 03:15:55 +0000</pubDate>
		<dc:creator>Christopher</dc:creator>
		
		<category><![CDATA[Resiliency]]></category>

		<guid isPermaLink="false">http://www.christopher-hart.com/blog/?p=3</guid>
		<description><![CDATA[One of the subjects that I deal with frequently is resiliency; specifically, the resiliency of technology solutions.  But what does it mean to be resilient?  Fundamentally, it means that a system or solution needs to be engineered with these goals in mind:

The entire solution is designed to continue to function as normally as possible in [...]]]></description>
			<content:encoded><![CDATA[<p>One of the subjects that I deal with frequently is resiliency; specifically, the resiliency of technology solutions.  But what does it mean to be resilient?  Fundamentally, it means that a system or solution needs to be engineered with these goals in mind:</p>
<ol>
<li>The <em>entire solution </em>is designed to continue to function as normally as possible in the face of failure.</li>
<li>When failures occur, they are invisible to the customer.</li>
<li>If a failure <em>must</em> be visible to the customer, the solution provides the highest level of service possible (in other words, compartmentalize failures).</li>
</ol>
<div>This sounds straightforward in theory but is rarely so in practice.  Why?  There are many contributing factors, and I&#8217;ll be dealing with these in detail in subsequent posts.  Some are obvious: resiliency adds cost, implementation costs must be balanced with business value and time-to-market pressures, and future failures are much more abstract than current business needs.  Despite these challenges, many organizations try to do the right thing by investing in the construction of resilient solutions that ultimately fail.  </div>
<div> </div>
<div>These scenarios are the ones that are particularly frustrating, leaving very knowledgeable technologists wondering why such a robust system failed.  In such cases, the answers are usually much more subtle: the complexity of systems leads to difficulty identifying failure modes, the quantification of specific resiliency needs is rarely systematic, and control plans are inadequate or absent, leading to the development of new and unpredictable types of failures.  In all these cases, if you&#8217;ve ended up in such a scenario, it&#8217;s difficult or impossible to even quantify the operational, reputational and financial risks posed to the business - you just don&#8217;t know what you don&#8217;t know.</div>
<div> </div>
<div>On this site, I&#8217;ll discuss these and other quandaries that threaten the stability of critical enterprise infrastructure.  Businesses no longer have the luxury of tolerating unreliable technology.  Five to ten years ago, the internet and related technologies were seen as new and unique - the virtual &#8220;wild west&#8221;.  Because these &#8220;enabling technologies&#8221; were viewed as somehow separate from the services and products that businesses provided, failure of the technology was not a direct reflection on the quality of the product or the capability of the provider.  Now, those enabling technologies have faded into the background - they are no longer new and exotic.  Customers expect mobile banking solutions on their cell phones to &#8220;just work&#8221;, just as land line telephone customers expect dial-tone or homeowners expect power from their electrical outlets.  Failure of technology now equates to failure of the business.</div>
<div> </div>
<div>Resiliency is the mechanism to ensure that our solutions meet these demands.  Resiliency may not be easy, but it is necessary.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.christopher-hart.com/blog/2008/04/15/what-is-resiliency/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
