Improving Processes in Large Service Organizations

Process improvement in large-scale service enterprises is considerably different from, and more difficult than, improvement in manufacturing. Yet the services sector represents a majority of the economic output of many developed economies, including the US, so there are good economic reasons to improve service-based processes. To improve these processes, it is critical to first measure people’s work, but that is difficult to do. In large enterprises, the process runs in parallel at hundreds to tens of thousands of workstations; much of the work is intangible, and there are few tools that can measure the work and map the processes accurately.

In the past, consultants watched workers and sketched the process by hand. That approach has several weaknesses in a large-scale environment, including the Hawthorne effect; it is also nearly impossible to observe enough workers, for long enough and in sufficient detail, to obtain statistically valid workflow samples. That takes computers and software-based collection products. Lean advocates have attempted to use the Value Stream Maps and spaghetti diagrams that work well in manufacturing. They often encounter problems with that approach, for several reasons: knowledge/service work is not as repetitive or as predictable as manufacturing work, and there are often several ways to successfully complete a transaction, so there is no easily defined standard work. A Value Stream Map would therefore need to be built for each of many processes (the various transaction types), along with several spaghetti diagrams for each process instance and successful variation. Furthermore, the wastes found in services differ in many ways from manufacturing wastes, though the elimination of non-value-added activity (muda), the importance of even loading (mura) and the avoidance of overburden (muri) apply in both environments. Perhaps mura and muri apply to an even greater degree for knowledge workers, since complex processes require a clear-thinking human brain to do the job right the first time. That means workers who are neither sleep deprived nor exhausted from 16-hour days.

This article discusses a current approach to the problem of improving services in knowledge worker environments, specifically where computers are involved in a large portion of the process. Typical environments include health care claims processing, call centers, financial services, IT service delivery and many others. Service delivery in each of these environments typically involves large amounts of money or services that affect large numbers of customers.


World-class methodologies such as Lean and Six Sigma are well-known, proven approaches to process improvement in the manufacturing sector of the US economy. As a result, according to the US government’s Bureau of Labor Statistics, manufacturing productivity doubled between 1987 and 2007. See Figure 1. Adopted later in the services sector, these methods coincided with a productivity increase of only 22% over the same period. There are, however, several reasons for the slower gains. It took years for senior management and quality professionals to embrace the idea that process improvement could work in the services sector at all, followed by the realization that both Lean and Six Sigma must be applied differently there. For example, the focus in manufacturing is on the widget as it makes its way through the process, while the focus in services is on the interaction with a human customer – an intangible, often complex, difficult-to-measure, but critical aspect of the service process. Additionally, a manufacturing process is linear and largely runs on one or a handful of lines at a time, while large-scale services, such as insurance claims processing, call centers, hospitals, financial services and IT services, run in parallel at hundreds to tens of thousands of workstations. Instances of service work also involve varying numbers of steps and different paths through the systems, and so are generally more complex than manufacturing. They are also less repetitive, making them much more difficult to measure, and hence to improve, than even complex manufacturing processes.

One area that has had difficulty succeeding with process improvement is knowledge work – used here to mean work in any industry that makes heavy use of networked computers and other digital systems. The approach mentioned earlier, consultants observing people with manual tools, is generally ineffective here.

In recent years, several vendors in the Business Process Management (BPM) space have used software to collect data from applications and networks as more robust input to process mapping. Inputs have typically included database log extracts or other enterprise application logs. This approach is more accurate and more scalable than human observation, but it depends on the quality of the logs, which are often not very granular and lack visibility into the user’s interactions with the application. Those interactions are necessary for detailed analysis of user behavior, including exactly how much time the user spends on each atomic unit of work. Additionally, each application has its own log format and content, making data collection and analysis difficult. Granted, systems built with BPM tools help with standardization. Even those systems, however, cannot provide in-process measurement of the user’s activity as he looks at email, accesses corporate web sites, or runs legacy mainframe transactions on dumb-terminal emulators. In the author’s experience, those activities can consume a large portion of the user’s time, if not actually constitute the main production task.

Achieving breakthrough levels of process improvement requires both very granular and scalable data collection and storage as well as world-class process improvement methodologies. While collecting process data, one must also avoid materially changing the performance characteristics of the process under study or the enterprise infrastructure. It is not generally feasible to require an enterprise IT department or a third-party application vendor to retrofit applications with performance metrics and reporting. Ideally, then, one would use passive network taps and/or switch- or router-based constructs like Cisco VACL Capture or mirror ports, along with high-capacity collection devices. These collection approaches are affordable in relation to the potential savings. They can be non-invasive and, for a centralized process under study even in the largest enterprises, can generally be co-located near the server farm or mainframe in a few data centers at most. Given government regulations such as the HIPAA Security Rule, and the need to protect financial transactions, the system must encrypt all collected data at rest and in motion, in real time. In contact centers in large enterprises, such as telecom service providers processing millions of transactions per year, there can be gigabytes per second of data to capture and manage, requiring powerful servers and fast storage. There is also one broad class of computing model where traffic across the network, unlike web-oriented applications or the still-common dumb-terminal emulation, does not accurately reflect the user’s behavior. That is the fat-client model, where an application on a PC does significant work. For those situations, a PC-resident agent with a small footprint and a low reporting traffic rate can capture screen shots and field changes, forwarding the information to a central server for storage and later analysis.
This approach has proven economical even for VPN-based home-office workers and virtualized desktop environments like Citrix, thanks to the widespread availability of broadband data services in most residences. By capturing exactly where, and for how long, users spend their time, process improvement teams can obtain the granular, empirical data they need to make fact-based decisions.
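As a purely illustrative sketch, a capture record produced by either the network collectors or the PC-resident agent might carry a timestamp, the user or workstation identity, the application and screen observed, and any field changes. The record shape and field names below are assumptions for illustration, not the format of any particular product:

```python
from dataclasses import dataclass, field
import time

@dataclass
class CaptureEvent:
    """One observed user interaction; all names are illustrative assumptions."""
    timestamp: float                  # epoch seconds when the interaction occurred
    user_id: str                      # workstation or login under observation
    application: str                  # e.g. "claims-mainframe" or "crm-web"
    screen: str                       # screen/page identifier rendered to the user
    field_changes: dict = field(default_factory=dict)  # field name -> new value

# Example: the agent reports that a claims examiner changed two fields.
event = CaptureEvent(
    timestamp=time.time(),
    user_id="examiner-0142",
    application="claims-mainframe",
    screen="CLAIM-ENTRY",
    field_changes={"claim_status": "APPROVED", "paid_amount": "1250.00"},
)
```

A real system would, per the requirements above, encrypt such records in motion and at rest before they reach the central store.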

As we know from Lean and Six Sigma, to improve a process one must measure its Key Performance Indicators (KPIs), including items like user think time, system time, key data inputs and granular transformation actions. The collected data is the key input to business process discovery and analysis. Once the improvement teams and subject matter experts have discovered and analyzed the process in sufficient depth, they can turn captured data into useful business objects. That often takes some programming, database access, metadata and the like to augment the captured data, though teams of two to five people can complete a project in two to six months. Analysis can then provide the data necessary to do root-cause analysis of problems, hypothesize the changes needed to improve the process, run experiments to better understand the main factors and their interactions, and measure changes to the process as they are piloted and put into production. Using open source or proprietary tools, it is possible to turn the collected data into a business process map and to enable ad hoc analysis in near real time. See Figure 2. Business analysts and quality professionals can perform the analysis themselves more quickly than they could write up an analysis request for the data warehouse team under previous approaches.
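The think-time and system-time KPIs mentioned above fall directly out of time-ordered capture events. A minimal sketch, assuming each event is simply a (timestamp-in-seconds, kind) pair where kind is "request" or "response":

```python
def think_and_system_time(events):
    """Split a user's timeline into system time (request -> response)
    and think time (response -> next request).

    `events` is a time-ordered list of (timestamp_seconds, kind) pairs,
    where kind is "request" or "response".  Returns (think, system) in seconds.
    """
    think = system = 0.0
    last_ts = last_kind = None
    for ts, kind in events:
        if last_ts is not None:
            delta = ts - last_ts
            if last_kind == "request" and kind == "response":
                system += delta      # the host/network was working
            elif last_kind == "response" and kind == "request":
                think += delta       # the user was reading or typing
        last_ts, last_kind = ts, kind
    return think, system
```

For example, a 2-second response followed 10 seconds later by another request and a 3-second response yields 5 seconds of system time and 10 seconds of think time; summed per worker or per transaction type, these become the baseline KPIs against which improvements are measured.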

Having large volumes of empirical data available, rather than the usual anecdotal process data and estimates, gives improvement teams the ability to analyze the process scientifically. They can produce in-process metrics such as all end-user paths through the process, including the fastest and those with the fewest steps; measure variation and defects; identify workers who cherry-pick transactions; and analyze key indicators of choice at various steps in the process – whether straight-through, as we’d like, or down exception paths. This information provides the data necessary to successfully coach workers, provide training, and prioritize improvements such as partial or full automation of certain transactions. The power of inexpensive, industry-standard servers allows Operational Excellence implementation teams to analyze the long tails of exceptions, relate worker and customer process behavior back to business rules, and change processes with higher levels of confidence than previous approaches allowed.
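The path metrics described above reduce to grouping events by transaction (case), counting each distinct path through the process, and comparing the variants. A simplified sketch, assuming an event log of (case_id, timestamp, activity) tuples:

```python
from collections import Counter, defaultdict

def variant_analysis(event_log):
    """Group an event log of (case_id, timestamp, activity) tuples into
    cases, count each distinct path (variant), and report the variant
    with the fewest steps."""
    cases = defaultdict(list)
    for case_id, ts, activity in event_log:
        cases[case_id].append((ts, activity))

    variants = Counter()
    for steps in cases.values():
        steps.sort()                                 # order each case by time
        path = tuple(activity for _, activity in steps)
        variants[path] += 1

    shortest = min(variants, key=len)                # fewest-steps variant
    return variants, shortest
```

Ranking `variants` by frequency shows which paths dominate and which are long-tail exceptions; comparing case durations per variant identifies the fastest paths and candidates for standardization.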

Process metrics don’t just materialize, and processes have no pre-existing context unless they were originally imbued with BPMN or similar metadata. In 2012, the majority of processes in use were built before BPM standards gained significant traction, so business subject matter experts must label the discovered activities (applications, transaction steps, etc.) with the names business people use in everyday conversation. The discovery engine can then generate actionable intelligence from transitions occurring in the monitored business process.
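In practice, the labeling step can be as simple as an SME-maintained lookup table from raw discovered identifiers to business names; transitions between labeled activities then become the events the discovery engine reports on. A sketch, with made-up transaction codes:

```python
# SME-supplied label map from raw discovered identifiers (screen names,
# URLs, transaction codes) to everyday business names.  The codes below
# are invented for illustration.
LABELS = {
    "TRN042": "Open claim",
    "TRN117": "Verify eligibility",
    "/claims/pay": "Issue payment",
}

def label_activity(raw_id):
    """Fall back to the raw identifier so unlabeled activities stay visible."""
    return LABELS.get(raw_id, raw_id)

def labeled_transitions(raw_path):
    """Turn a raw activity sequence into labeled from->to transitions."""
    named = [label_activity(a) for a in raw_path]
    return list(zip(named, named[1:]))
```

Falling back to the raw identifier, rather than hiding unmapped activities, lets SMEs spot gaps in the label map as new screens and transactions are discovered.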

One approach to generating business events is commonly called screen scraping, though in reality it is a fairly sophisticated form of data collection and analysis in its own right. One must be able to analyze the data moving between client and server, whether it comes from dumb terminals, web servers or anything else that traverses the network, and then render that data the same way the target machines do in order to recreate what the user saw and how he responded. See Figure 3 for an example of analysis that recreated the user’s experience for the business analyst, providing the information needed for the second use of the original data as well as an analysis of user and system behavior.

An additional requirement for significant improvement is the ability to analyze and report on process data without the restrictions of traditional Business Intelligence products – pre-defined schemas, event summarization, and the resulting limits on analysis and reporting. State-of-the-art process mapping and analytic systems use so-called big data technology to store and analyze heterogeneous data. They are now embodied in commercial, off-the-shelf software and put to use in healthcare insurance companies to improve processes like claims operations, and in banks and telecom service providers to improve call center operations. By generating process intelligence, workforce intelligence and customer intelligence, enterprises can save millions of dollars annually through continuous productivity improvements and the identification and reduction of transaction defects, waste and work in progress. Certain enterprises build software robots to automate some or all of the tedious, highly repetitive, rule-following transaction work previously done by humans, where human judgment is not required. In this way, these enterprises reduce costs and lead times, free people to perform the work that requires human judgment and reasoning, and improve customer satisfaction.
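The field-recovery part of the screen-scraping approach can be sketched as follows, assuming a classic fixed-layout 80-column terminal screen in which each field occupies a known (row, column, length) position; in a real system the layout would be derived by rendering the captured data stream the way the terminal does:

```python
def extract_fields(screen_text, layout, width=80):
    """Recover named fields from a fixed-layout terminal screen buffer.

    `screen_text` is the rendered screen as one string of `width`-column
    rows; `layout` maps field name -> (row, col, length), all zero-based.
    """
    # Cut the flat buffer back into rows, as the terminal would display it.
    rows = [screen_text[i:i + width] for i in range(0, len(screen_text), width)]
    out = {}
    for name, (row, col, length) in layout.items():
        out[name] = rows[row][col:col + length].strip()
    return out

# Illustrative 2-row screen: a claim number on row 0, a status on row 1.
screen = (" " * 10 + "CLM12345").ljust(80) + (" " * 5 + "APPROVED").ljust(80)
layout = {"claim_no": (0, 10, 8), "status": (1, 5, 8)}
fields = extract_fields(screen, layout)
```

The extracted fields, combined with timestamps, are what feed the business-event stream and the user-behavior analysis described above.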


Mike Darrish is an Atlanta-based, certified Lean Six Sigma Black Belt and Process Improvement Specialist at OpenConnect Systems, where he delivers improvement projects in large enterprises using Comprehend, a product suite implementing the concepts described in this article. Prior to OpenConnect, Mike worked for over 25 years in large enterprises and startups, providing network management, IT performance improvement and IT infrastructure products in technical and a variety of sales roles.

