Table of Contents
List of Figures
List of Tables
This thesis is the result of my Candidatus Scientiarum degree at the Department of Informatics at the the Faculty of Mathematics and Natural Sciences at the University of Oslo, in the period January 2002 to May 2003.
I have been a FreeBSD user since 1996, and while I am not currently developing free software, I consider myself a part of the BSD community. In my work I have been contributing to a number of other open source projects since 1997.
I would like to thank the librarians at the Department of Informatics for their great help in finding supporting litterature.
I would like to thank my mentors Ingvil Hovig and Tone Bratteteig for their help during my thesis.
I would like to thank Dag-Erling Smørgrav for his kind support, both in terms of providing information and helping with the development of tools for statistics generation.
I would like to thank my familly for their support, proof-reading and comments.
I would like to thank the FreeBSD Organization at the rpt.FreeBSD.org cluster for providing disk space and computing resources.
I would like to thank all my interviewees for their time and helpful answers.
I would like to thank all those who have read and commented on my work throughout the writing process for their kind words and their useful feedback.
I would like to thank Tatsumi Hosokawa for letting me use his artwork for the cover.
Finally, I would like to thank all my friends who have joined me for coffies and heard me rant about my thesis. Without your help for venting, I could never have finished this thesis.
Abstract
This thesis provides a baseline on which a methodology for the FreeBSD Project can be built. The three results of this thesis are:
A descriptive "project model" for the FreeBSD Project
A set of "quality goals" for the project model
A "comparison" between the quality goals and the project model, giving us the quality of the project model.
The project model is based on project documents, interviews, mail archives and the experience of the author in working with the project. The quality goals are based on this as well as supporting litterature.
The discussion of project issues is backed up by a strong repertoire of theory in the field of software engineering.
The main findings of this thesis are:
The FreeBSD Project scores well on most of the defined quality goals
There are issues regarding the project organisation that needs to be adressed
When adding people to a project, the amount they need to talk to one-another follows the curve (2^n)-1 where n is the number of people in the project [Brooks, 1995]. Tools can combat this communication overhead, and one tool that is considered effective is a methodology. A methodology is a set of normative documents saying how things should be done in the project, and the processes used are chosen because they are thought to be more effective and efficient for this project than other processes.
Every software development effort uses a methodology in some way or another, but it may not be very well articulated. Some have only a set of guidelines. Having a well articulated methodology, however, may have advantages such as making the project easier to "sell" to uninformed third parties that the project needs to approach for funding, support or to sell its services towards. It becomes easier in the sense that the third party can understand what makes the project tick, and this gives it credibility to the third party. This is in accordance with [Giddens, 1997: 32] who sais that trust is gained through confidence in the person or oranisation's predictability.
This research will study the FreeBSD Project, a project where the project members work on a voluntary basis to create a free, BSD-based UNIX-like operating system. This project has experienced a rapid growth of project members, and does not have a well documented methodology. The first aim of this research is to describe the FreeBSD Project, and I do this through creating a project model. A project model describes how the project is run, without making judgement of how it should be run. This model will serve as a basis for discussion about the current processes, and can over time be transformed into or lay the basis for a methodology. In terms of process improvement it will provide a baseline to which the project can be compared as it evolves and improves.
Although the project model is not normative like a methodology, it will still help reduce the communication overhead, reducing time needed to find out how processes are done and who should be contacted for which issues. In order to reduce the time effectively, it should be fast and easy to look up information in.
In order to discuss the current processes, I have chosen to measure its qualities. We will do this through stating a set of quality goals for the project model and measure how well the goals are achieved. Through comparing these findings to the quality goals, we will be able to say how well this project model works in fulfilling the goals.
It is outside the scope of the thesis to discuss alternative project models or recommend improvements to the current project model. It is also outside the scope of this thesis to provide a full methodology for the FreeBSD Project. As the FreeBSD project is based on voluntary work, cost will not be discussed in this thesis.
The scope of my study has been following the FreeBSD Project from April 1st, 2002 to April 1st, 2003. I will select a few subprojects that I, on the basis of interviews and reading from mailinglists, find suitable to describe, and thus not describe every subproject to the FreeBSD Project.
In summary, this thesis aims to:
Create a project model for the FreeBSD Project
Create a set of metrics to measure the quality of the project model
Compare the project model with the metrics to say if the project model holds the desired qualities.
While the main object of this thesis is motivated by reducing the communications overhead and increasing the marketing opportunities in this particular project, there are a number of other motivations as well.
[Cockburn, 1998] provides the following figure for how effective communication is through different mediums:
Since most of the communication in the FreeBSD Project is mailing lists, its effectiveness is that of "interactive writing only". Less effective communication means more communication to communicate what is needed. Reducing the communications overhead through a project model is therefore a real time saver for the people in the project.
[Cockburn, 1998] also provides a model for when what kind of methodology is needed.
In Section 4.2 we will see that there has been an increase in active developers in the FreeBSD project during the past few years. With them comes more ideas of what the project should be, and thus an increase in problem size. From my interpretation of [Cockburn, 1998] this means that while the project could use a lightweight, tacitly understood methodology a few years back, it is time to develop a project model now.
Open source development is a little understood phenomenon. What has been researched has mainly been focussed around the most famous project, Linux. Many researchers talk about "the open source model" as if it were one model, and usually refer to the Bazaar model [Raymond, 2000] which Linux has been explained by. To gain a better understanding of open source, we need to examine more projects closely. FreeBSD, a popular open source project known as a rock-solid and feature rich operating system, is a good candidate. With only one article written on the FreeBSD projects approach to software development, this study is important as there is much to learn from how it runs its software development.
Finding more software process models that work well with the open source community allows us to choose between more software process models when working in an open source environment. This allows companies that want to explore the possibilities of open source to choose a model that is easier to integrate into their company. When projects want to go open source or open source projects are created, it is good to have a set of software process models to choose from and see what model would work well for what project and thus make an informed opinion on what model to choose.
FreeBSD has a long history, and is an offspring of BSD, an operating system that received much attention and use in academia until its end in 1994. Studying FreeBSD gives us the opportunity to contrast todays practises with how they used to be.
The BSD projects come from a tradition of using hierarchical organisational structures. This contrasts the more flat and unstructured Bazaar model, and many people in the open source community perceive this as an elitist project. Even so, the project attracts new developers. The FreeBSD Project is therefore interesting to study to broaden the view on what organisational models fit into the open source culture.
BSD has since its beginning been an operating system in which research results have been quickly incorporated. Rather than being focused on backwards compatibility, it has worked as a test-bed for researchers in computer science. A kind of evolution has taken place where the best technologies have been kept and where new technologies all the time can be integrated and evaluated. The FreeBSD Project has maintained this spirit, while maintaining its reputation of delivering a rock solid product. Other projects can learn from the processes that allow this.
The development of BSD pioneered both open source and internet based development. The FreeBSD Project is a continuation of this development. What has changed, and how does it organise its development today?
The open source BSD derivatives NetBSD and FreeBSD pioneered net based distribution and large-scale distribution on cheap CD-ROM media. The ease and low cost together with high availability has been seen as one of the success factors for open source projects. How does the FreeBSD Project handle its distribution today?
FreeBSD has a need to be understood by the corporate world. A quote from one of the project founders, Jordan K. Hubbard, illustrates this: "My biggest frustration is that more of [...] the corporate world [...] hasn't been quick to jump aboard with personnel and material resources. They don't see this, as I do, as a powerful collaborative model rather than just a bunch of guys doing a free OS. I look at the track record of more traditional collaborative efforts like the OSF [(Open Systems Foundation)]or the ACE [(Advance Computing Environment)] Consortium [...] and I already see far greater success with groups like the FreeBSD Project or the Linux International folks". [Laird et.al, 2001]
One of the main reasons for the resignation of both Jordan K. Hubbard and Mike Smith was that from being a fun project, FreeBSD became to them associated with enormous amounts of conflict and bureaucracy [Daemon News, 2002]. As the project grew larger, the communication needs increased. This is where [Cockburn, 1998] finds it necessary to use a methodology as opposed to a loose collection of techniques. A methodology needs a starting point describing what is today, and this project model will provide such a starting point.
If measured by the number of users, much of the code in BSD has had a large success. The influence on commercial Unix systems and Linux is tremendous: half of Linux' utilities come from BSD and most of the advantages pioneered by BSD have been incorporated into commercial Unix systems. Apple's new operating system, Mac OS/X, relies on the BSD derivative Darwin. Even Microsoft has copied many of its utilities and features, such as the FTP client distributed with Windows [Daemon News, 2001]. The network stack TCP/IP has become the network standard of the internet. This success suggests that we should examine its development closely and see what we can learn.
Studying the evolution of the methodologies as the projects evolve can help us understand what factors make the methodologies change. If we have a plan for how our project will evolve, we can examine these factors in choosing what methodology to start out with. Since the costs for changing methodologies midway can be large , taking measures to ensure that few changes of methodologies are needed will save the project time and money.
The case study we will examine is the FreeBSD Project. The FreeBSD project consists at the time of writing of 275 members with special privileges and many associated developers and users through mailing lists and community web-pages.
This thesis consists of three logical parts:
"Framework" - this introduction, the context the context the thesis is framed within and the methods used for making this thesis
"Aspects" - three aspects of the FreeBSD Project: the organisational structure, administration of the project and the development model used in the project. This will be followed by a discussion chapter, putting the aspects together and discussing the findings.
"Project Model" - the resulting model for the FreeBSD Project and its evaluation
The "Background" chapter gives the context of both the FreeBSD Project and this thesis. The context of the FreeBSD Project is set through discussing the history of the original BSD and the historical events that led to the FreeBSD Project that is today. Then comes an introduction to the Open Source community that the FreeBSD Project is a part of. Having summarised FreeBSD's context, the theory that is the context for this thesis is presented.
In the "Research Methods" chapter, the different research methods used are detailed and discussed. After this, metrics are set for how to measure the quality of the project model, and finally the project model development process is discussed.
The next part consists of three aspects of the Project Model and a discussion tying these parts together. Detailing the entire project is outside the scope of this thesis. The three aspects "organisational structure", "administration" and "development model" are detailed more closely, being a more in-depth examination than the project model is. This is because the Project Model is intended to be easy to use, and not provide a detailed explanation of why things are the way they are.
Based on the three aspects and the following discussion, the research done in the way listed in the "research methods" chapter and developed in the way discussed in the research methods chapter, the project model is presented. This model should be accurate as of April 1st, 2003.
Finally we will discuss how the project model compares to the quality metrics set up, and discuss the project model. Lastly we will gather the conclusions found in each chapter and make the final conclusion.
This chapter serves an introduction into the literature that supports my topic. It will also describe the context of the FreeBSD Project. It will, after having clarified key terms, describe the history of the FreeBSD Project and the open source community it is a part of, then discuss aspects of project organisation, and finally come to terms with how we will understand quality.
Many of the words in the research aim are used daily where some words have more than one meaning depending on the context in which they are used. To remove ambiguity I will define how these words will be understood in this thesis.
A "project" is according to [PMI, 2000] a planned, " temporary endeavour undertaken to create a unique product, service or result". In this research, a project will refer to a logical unit of software development effort. Projects can contain sub-projects that are a project in their own right, and which outcome is planned to form a part of the main project.
A "model" simplifies something by adressing only the features important for the context of the model. It can be either descriptive, saying how that something is right now, or normative, saying how that something should be. Models are made to give an overview. The descriptive model does so on the basis of experience, while the normative model gives an overview of how that something should be and implies an evaluation that this is better than the alternatives. The descriptive model implies no such thing.
A "methodology" is a normative model, guidelining what processes and deliverables should be used for a project and in what sequence they should come. It includes what tools should be used to build the deliverables and templates for the deliverables. [Smevold, 2001: 21-24]
A "project model" is a descriptive model of how a project is organised. It can be based on monitoring the project statistically, interviewing project members, observation, document analysis, or a combination.
According to [Sommerville, 2001], a "software process" is "a set of activities and associated results which produce a software product". The four common activities to all software processes are:
Specification - The definition of required functionality and constraints on the software
Development - The software must be created to meet the specifications
Validation - Confirm that the software developed is what the customer wants
Evolution/Maintenance - The software must be updated to meet changing needs from the customer
It is worth noting that although the software processes have these features in common, the results are not the same for these activities. For instance, a software process that relies heavily upon iterations will have a different result for the specification activity than one relying on a set sequence of actions.
A "software process model" is a simplified description of the software development process, an abstraction that can describe one or more software processes. This is used both to communicate the software process without communicating it in its entirety, to classify the software process and as the basis for creating a software process for a new development effort. Examples are the waterfall model, the spiral model, incremental development and extreme programming. [Sommerville, 2001: 7-14]
Figure 2.1 helps make the difference between how the four concepts are used in this thesis clearer:
The software process model is an abstraction of how a methodology can be made. It can be compared to Plato's idea "horse" when the methodology is compared to the actual horse. The project model is a generalisation of how the project is right now, while the methodology is a plan of how the project should be. In terms of process improvement, the project model is the current status and the methodology is how the project should be run. Doing process improvement would thus be aligning the two.
By "Berkeley", the University of California at Berkeley is meant unless otherwise explicitly stated.
By "BSD", the Berkeley Software Distribution and its project at the Computer Science Research Group ("CSRG") is meant.
The FreeBSD Project is an open source project. In order to understand what this means, we need to understand where FreeBSD is coming from and what the open source culture brings. This section will first present a history of BSD development, and will then continue to investigate the open source community.
UNIX, originally developed at Bell Labs in 1969, was rewritten in C, rather than the original assembler, in 1973 by Dennis Richie and Kenneth Thompson. Since its presentation, professors at Berkeley had taken a strong interest in UNIX. With their first running UNIX machine early 1974, strong links were built to the UNIX communities at Bell Labs. Old ties to batch processing systems made the UNIX running machine have to run a batch job system for 16 hours and UNIX for 8 every day. With the success of Berkeley's Ingres database written for UNIX and increased demand by students, the computer science department ordered a new machine to run UNIX only.
Together with the new UNIX machine, Bill Joy and Chuck Haley started as graduate students. They wrote a Pascal system and a number of utilities and editors such as 'em' and 'vi'. The Pascal system became popular among the computer science students because of its good error handling system. Joy and Haley took an interest in the kernel when installing a tape with 50 changes to the kernel supplied by Bell Labs. These changes together with the Pascal system and their utilities was put together to form the "Berkeley Software Distribution" by Joy early 1977. This feedback and changes together with new features such as extensive support for terminals led to the second version, known as 2BSD in 1978.
As the need for larger address spaces for user programs came, the computer science department ordered the newly announced VAX-11 in 1978. Since the department was used to UNIX, and UNIX with virtual memory support was not available for the VAX, Ozalp Babaoglu and Bill Joy ported the virtual memory support to V/32, the VAX UNIX version. Seeing that this architecture would obsolete the older computers in the department, Joy began porting the 2BSD software to the VAX. The new kernel enhancements and utilities led to 3BSD, the first VAX distribution from Berkeley.
With 3BSD, Berkeley had proved its ability to create and maintain a working system and was able to land an 18-month contract with DARPA to create the operating system that should run on multiple hardware platforms so that DARPA would have a unified platform at the operating system level. This provided funds to employ Laura Tong to do the project administration. She set up a distribution system that could ship a much larger number of copies: 150 for 4BSD and 400 for 4.1BSD. Being satisfied with the results, DARPA continued its funding and a steering committee was set up to guide the design work and ensure that the needs of the DARPA research community were met.
Apart from being a credible actor, the research community needed the CSRG at Berkeley and BSD. Bell Labs and its UNIX had long had the role as a clearing house for computer science research. Many researchers were interested in UNIX which made their research results often either UNIX specific or easily integratable into UNIX. But with Bell Labs' commercialisation of UNIX in 1979, Bell Labs could no longer play this role, and the CSRG soon stepped into this role. [McKusick, 1992]
The work to implement the requests required much redesign and implementation of TCP/IP based network services. This led to an internal intermediate release, called 4.1a. However, many people grew impatient waiting for 4.2BSD, and this led to many copies of the system. Based on the feedback from 4.1a, a proposal for the new system, called "4.2BSD System Manual", was made. It contained a concise description of the proposed user- and application programming interfaces.
When Joy left for Sun Microsystems in the late spring of 1982, Sam Leffler was given the responsibility to finish the project that had been promised to the DARPA community by Spring 1983. To conform to this deadline, the remaining projects were evaluated and strict priorities were set. April 1983, an intermediate release, 4.1c was released, and after reworking the I/O system and making the install process easier, 4.2BSD was released in August 1983. [McKusick, 1985]
As with 4BSD, many people were critical to 4.2BSD because of performance issues. As with 4.1BSD to 4BSD, 4.3BSD, released in June 1986, would resolve the issues causing the stir.
A requirement for using BSD was that a System V license was held. AT&T had increased their fees much and vendors who wanted to use the TCP/IP-based products that Berkeley had pioneered, lobbied CSRG to provide them under terms that did not require such a license. In response to this, Networking Release 1 (Net/1), consisting of the original networking code and supporting utilities, was released in June 1989 under a liberal license. While Berkeley charged a $1000 fee to provide a tape, the code could be redistributed by anyone as long as the copyrights in the source code was kept and the use of Berkeley's source was acknowledged in the documentation. This led many sites to provide public access to the source through the network, and this made the release very popular. Also, institutions regarded this as a way of supporting CSRGs research, and many hundred tapes were bought.
Inspired by the popularity by Net/1, Keith Bostic suggested that making a release that included more BSD code. However, Marshall Kirk McKusick and Mike Karels noted that this would require hundreds of utilities and libraries to be rewritten and the kernel to be closely examined and partly rewritten. Rather than giving up by the size of the effort, Bostic pioneered large-scale net-based development by approaching people to rewrite Unix utilities based only on their published descriptions. With his continued effort to solicit people, within 18 months almost all the utilities and libraries has been rewritten. Keeping McKusick and Karels to their promise, CSRG went through the kernel code file by file and removed code originating from 32/V. This effort culminated in the release of Network Release 2 that included a fully working system except for 6 files (out of 18.000) that had been deemed to difficult to rewrite.
As with Net/1, Net/2 became widely spread. Within 6 months, a version bootable on the Intel 386 Architecture had been made by Bill Jolitz who had rewritten the 6 files. This sparked communities that would lead to the major BSD open-source projects at the time of writing: NetBSD, OpenBSD, FreeBSD and Darwin.
Based on Jolitz six files, the company BSD Incorporated started shipping a commercially supported version. They marketed it as being a system as costing only 1% of what System V cost with source and binaries. Unix System Laboratories (USL), which was mostly owned by AT&T and owned the Unix trademark, quickly demanded they stop marketing their product as Unix. BSDI complied, but still unhappy about the competition, USL filed suit to stop BSDI from selling their product, claiming that it contained USL code and trade secrets. BSDI defended that they only used code publicly available from Berkeley and 6 other files, which led USL to include Berkeley and its Net/2 in the suit.
After six weeks of advisement, the judge ruled in January 1993 that only two of the complaints should be brought to court. These complaints should be heared in a state court rather than a federal court. Berkeley followed up with a counter-suit against USL for using code from Berkeley without stating so in their documentation. Soon after the filing in state court, USL was bought by Novell whose CEO, Ray Noorda, stated publicly that he would rather compete in the marketplace than in court. A settlement was reached in January 1994 that resulted in three files being removed and numerous minor changes and added copyright notices. With the settlement, USL promised to not sue any organisation that used the new release called 4.4BSD-Lite. [McKusick, 1999]
4.4BSD was the final release from CSRG. The three reasons were
The time required to attain funding had increased dramatic, limiting the time the scientists could spend working on BSD. In 1994, computer corporations were not prioritising funding such research. Rather, they preferred in-house research of which results they had the sole ownership of. Since BSD was freely redistributeable, the revenue from distributions sold was small.
BSD became a victim of its own success in that the features pioneered by BSD were included in most commercial operating systems.
Space constraints at Berkeley as well as the lack of funding limited the group from recruiting more people than the four involved. With 4.4BSD, the project had grown so large that four people could no longer architect and maintain it.
March 29th, 1992, Karl Lehenbauer published the FAQ for the 386BSD, which is by many recognised as the start to make the Berkeley Net/2 release available on the Intel architecture. Version 0.1 was made available in June and a community grew quickly. However, the author, Bill Jolitz, did not maintain the software much. This led Jordan K. Hubbard, Nate Williams and Rod Grimes to create the "Unofficial 386BSD Patchkit". The maintenance of this kit, however, came to a halt when Jolitz withdrew his sanction from the project. In response to this, the FreeBSD Project, name coined by David Greenman, was started.
The first FreeBSD distribution was released in December 1993, based on 4.3BSD-Lite but also included work from 386BSD and by the Free Software Foundation [Kolstad, 1994]. After being updated to version 1.1 in May, 1994, the the project was a success. However, at this time the lawsuit between AT&T and Berkeley had been settled, and the project had to switch to the 4.4BSD-Lite code base from Berkeley, leaving the encumbered 4.3BSD-Lite code behind. Because the Intel version of 4.4BSD was highly incomplete, it was not until November 1994 that the project could release version 2.0 [Hubbard, 1994].
Since version 2.1, FreeBSD has enjoyed a reputation as a stable, feature-full UNIX system for the Intel platform. Kirk McKusick describes the evolution of FreeBSD as a random path walk. This is because new features have been developed on the initiative from the developers rather than through desicions by the core team. The project has since its early days been split it two main branches. FreeBSD-STABLE is the branch that maintains the current working release that is offered to the public for production use. FreeBSD-CURRENT is the branch where the main development happens.
The FreeBSD Project was hosted and to a large degree sponsored by Walnut Creek CDROM during its initial years. In March 2000, BSDI Merget with Walnut Creek CDROM to form a united front for the BSD opertaing systems. In April 2001, Wind River bought Walnut Creek CDROM to aquire the rights to BSD/OS, and desided in October 2001 to end its relationship with FreeBSD, a relationship it had got when aquiring Walnut Creek CDROM [Reed, 2001]. The FreeBSD trademark and related rights were transferred to FreeBSD Mall Inc in January 2002.
Open source projects distinguish themselves from normal, commercial projects (also called closed source projects), in that the source code is distributed either alone or upon request, either for free or for a nominal fee to cover the distribution costs. With the internet, the distribution costs are often so amortised that most open source projects distribute both its binaries and source for free.
An important philosophy of the open source movement that comes from the Free Software Foundation and university projects such as BSD is that open source software should be self-hosting. This means that the operating system, the compilers and the utilities needed to make, compile and run the product should be open source. FreeBSD fits into this by being an open source operating system and providing most essential compilers and utilities, either their own or made by other open source projects.
[Hars et al, 2001] talks about "the open source development model", a term that in magazines and by many open source supporters is used in contrast to software process models. However, comparing these are meaningless. The open source development model refers to a significant detail in the project, that the source code will be available. The equivalent detail in commercial projects is that the project in some fashion contributes to the financial gain of the company that creates it. It makes as much sense to contrast software process models to the open source model as it makes to contrast them to the commercial model.
Other features common to open source projects is that they have internet based communities and the authors give the public the right to redistribute and modify the source code free of charge. [Jørgensen, 2001] discovered in his research that 43% of the developers had been paid to some extent for their involvement in their last project. In [Hars et al, 2001], 50% of the surveyed open source developers were paid to some extent. Although both surveys have a relatively small group of respondents, the fact that companies such as Sun, Ximian, IBM and Trolltech have contributed many projects to the open source community suggests that many developers are being paid for their time. The myth that participants do not get direct compensation for their work is therefore not necessarily correct.
However, a large part of the open source community does not get paid for their contribution. [Hars et al, 2001] has studies what motivates people to contribute to open source projects in general. They find that students and hobby programmers, the group that does not get paid for their involvement at all, the fun of having it as a hobby and expanding their skills, capabilities and knowledge are the most motivating factors. The gratification of such a hobby is explained by referring to Maslow's pyramid and the need for a stable, usually high evaluation of oneself. The increased human capital promises a future reward of being able to get a good job in a business which the participant is interested in.
According to [Raymond, 2000], all good programs come from scratching someone's itch. However, only a little more than a third of the respondents to [Hars et al, 2001] claimed that a personal need for the product was a main reason for participation.
The open source community prides itself on writing secure, well-reviewed software. [Raymond, 2000] quotes Linus Torvalds saying that "given enough eyes, all bugs are shallow". This is a paradigm in the community and a common belief that has many examples where problems have been discovered and fixed within hours of having been discovered. We will revisit this review in the verification techniques used by the FreeBSD Project to see if this holds true for this project.
As defined in our introduction, a software process model is a simplified description of the software development process. In being an abstraction, it can fit many software development processes whereas methodology usually only fits one project. Software process models help us both classify development efforts and serve as a basis for developing a methodology for new projects or projects whose methodologies are being re-engineered.
As outlined in our definition of a software process, it can consist of multiple sub-processes. A software process model is often presented as a structured set of sub-processes, so before we outline the most common software process models, a description of common sub-processes is in place.
During requirements analysis, the requirements are interpreted by the developer and an alignment of the understanding of the problem by the developers compared to the formal definition of the problem laid out in the requirements. This stage often needs the transformation of requirements written in a human language (for instance English) to a common technical and unambiguous notation, often one that can be interpreted by computers. Such models will not only align the understanding of the problem at hand between developers, but also determine whether the requirements are complete, contradictory or too ambiguous.
Design consists of forming an overall architecture for the project and reducing parts of the model to sub-problems that are tractable and simple enough in their complexity to have the full scope of the sub-problem in a developers mind at a time. For this, technical figures, design document, interface descriptions, user interfaces and such are created. A very much used tool for the design stage that includes all of these are use cases.
The only phase that is certain to be within a successful development effort is the implementation. Even with the most sophisticated Computer Aided Software Engineering (CASE) tools used for design, there are always parts that have to be implemented in code.
Verification is the verifying that the developed system does as designed, that the design completely covers the requirement and that the system developed thus completely covers the requirements. Also, testing that the program does not misbehave and behaves predictably is a part of this phase.
Integration is the deployment of the system at the customer, seeing that it integrates in the environment of the customer.
The software engineering skill that is probably least taught at university to computer science student is software maintenance. The absence of this process in many software process models shows the way it is regarded in the software industry. [Boehm, 1973] states in his article that almost 40% of the software effort in Great Britain in 1973 was maintenance, and he predicted the figure to rise.
According to [Canning, 1972], causes for initiating software maintenance are "program won't run," "program runs but produces wrong output," "business environment changes", and "enhancements and optimisation". [Swanson, 1976] classifies maintenance into corrective maintenance, adaptive maintenance and perfective maintenance.
Corrective maintenance is due to processing failures (i.e. abnormal termination, erroneous output in reports or files), performance failures (not meeting the performance criteria specified) and implementation failures (implementation not following coding standards, not adhering to design etc).
Adaptive maintenance happens due to environmental changes. Especially, change in data (i.e. a logical restructuring of the database used) or processing environments (new hardware or software architectures implemented) lead to adaptive maintenance.
Perfective maintenance is done to improve the maintainability of the code, processing inefficiency (i.e due to poor use of the operators time) and performance enhancement (increasing readability of reports, adhering to new data presentation standards).
In his article, [Swanson, 1976] writes about the importance of keeping a maintenance database. A maintenance database is a database that keeps track of all problem reports, preferably both submitted by users and developers. This database should cover all issues that are interesting for the project to track. Such items are usually the problem description and a detailed description of how it occurred, preferably with debugging information. But it can also include who is responsible for the bug, where in the code it was found, how long time it took to be resolved and so forth.
[Canning, 1972] has written on the "maintenance iceberg" that is due to bad coding practises. Since then, people have been taught what is considered better coding practises and more high level programming languages have become available, thus ideally making software more maintainable. However, the maintenance issue has not been reduced. But just as [Canning, 1972] dared not measure the iceberg in 1972, I have not found articles saying how large the iceberg is today and how it has evolved since 1972.
What is true is that the software engineering field has become more maintenance aware and authors like [Sommerville, 2001] have included it in the description of modern software process models. Indeed, many agile software process models are said to be maintenance centric, seeing every change as a maintenance change.
The world of open source is also aware of the maintenance issue. [Tatham, 1999] writes on reporting bproblems effectively, enabling the author of a program to find and resolve problems. The open source community has been good at using maintenance databases like GNATS and Bugzilla. Indeed, open source hosting sites, such as Sourceforge, often come with bug reporting, version tracking and user discussion systems allowing for ongoing maintenance in all active projects.
There exist many software process models. The most used models, that again are parents to many more refined models, are the stepwise, waterfall, evolutionary, spiral, transform, incremental and agile models. This section will briefly summarise these
With the introduction of assemblers and compilers for higher level programming languages, it became possible to write large systems. Problems of testing and documenting the systems led to the introduction of the "stepwise model". This model has eight stages that software development needs to go through in a linear order. The model is heavily document-driven: Royce gives the example of 15.000 pages for a system consisting of 100.000 instructions. [Benington, 1987]
The "waterfall model" is a refinement of the stepwise model. The two primary enhancements are feedback loops between the stages and the building of a prototype. The feedback loops is the realization that the current stage is an evaluation of the work done at the previous stage, and findings in this stage can force us to go back to a previous stage to do changes. The "build it twice" philosophy of the waterfall model builds a prototype to explore difficulties such as technical uncertainties or to gain understanding of how the user wants the interface. The main problem with the waterfall model is the criteria of the completion of fully elaborated documents. While this criterion is good for products that can easily be formalised, such as compilers and spacecraft controllers, this does not work well for end-user applications. Since the uses and interfaces of end-user applications are often poorly understood by the development staff in the beginning of a project, there is no purpose to writing the specifications in great detail. [Royce, 1970]
The "evolutionary model" assumes that there is software out there that with modifications can become the software the user wants. This approach matches well with situations where the user knows he or she needs a product but is not certain about how this product should be. This kind of development has been made less costly with the introduction of fourth-generation languages. There are mainly three problems with evolutionary programming:
Like the code-and-fix model, the code can easily turn into an unmanageable mess. This can lead to temporary fixes and features that the programmer assumes are nice but which are not used in the product.
The model assumes that the operational environment of the customer is flexible enough to follow the evolutionary path the process takes.
Like code-and-fix, the system can become poorly documented and structured, which will lead to problems when it has to be integrated with other applications or is going to be phased out and the data need to be transferred to another system.
The transform model approaches the problem of spaghetti code that easily results from the many of the before mentioned models. It assumes that formal specifications can be transformed into code. It is an iterative process where the first goal of the first iteration is to make a formal specification with the best initial understanding of the system. The following development iterations are spent changing the specifications based on the customers input and optimising the code by giving more detailed guidelines to the tool converting the specifications into code. When the program is deployed, further iterations are used to adjust the specifications based on operational experience. In this way the transform model bypasses the problem of poorly structured code that is hard to modify. While this sounds very promising, it has proved hard to build tools that will convert a specification into complete code for other than just small projects. [Saers, 2002]
The "incremental model" [Mills et al, 1980] gives the customer the ability to delay decisions until he has some experience with the system on which he can make the decision. The customer identifies which parts of the project are the most important to them, and these parts get developed in early increments. Each increment is a delivery that the use customer can put into production. Each increment is a small waterfall model, and this way specification, development, verification and validation can be processes that come over and over. The customer will get the necessary documents to keep an overview of the project status. The use of increments, the system being put into production and the customer overview minimise the risk of the project failing due to miscommunication. One major problem with the incremental model is that it is difficult to estimate how large an increment should be. Also, since the requirements do not come all at once, it is difficult to identify and build general structures that will make it easier to code the system.
The "spiral model", known from [Boehm, 1988], is an incremental model focused on risk. It is illustrated as a cycle going outwards where each cycle involves risk analysis, prototyping, simulation, design, validation, planning the next cycle and evaluating alternatives. The risk analysis, prototyping, simulation, validation, and alternative evaluation all work towards minimising risk of project failure, and the cycles are short enough that if the result of the analysis is that the project should be halted, then there is an opportunity to halt it at the end of every cycle. As with the evolutionary model, prototyping is used heavily, ending in an operational prototype given to the customer. [Saers, 2002]
The most resent addition to our repertoire of models are "agile models". These models focus mainly on handling change and the most well known of these, extreme programming, defines change as the norm and stability as an abnormality. Extreme programming is a set of 13 practises that together reduce project risk through improving the response to change and improves productivity through making building software in teams more fun. It can be summarised as a series of many stepwise models where the time to go through the model is at most a couple of days, usually about a day. In practise, this means that design, implementation and verification are ongoing processes. [Saers, 2002]
Just like two people living on an island can live together fine, a society of many individuals needs rules and regulations in order to function well. While two people can use a technique or method to do a job, [Cockburn, 1998] argues that for 20 people to do the job together, a methodology is needed. With an increased number of people involved, the need for communication becomes greater. Two people talking together use little time to synchronise their work, but 20 people who have to talk to everyone to be synchronised will spend all their time talking. A methodology greatly reduces the amount of communication needed to be done in order for the project to work satisfactory.
A methodology is the realization of a software process model within a project. While the model is just that, a model of how a project can be structured, the methodology is how this project is organised with all the details involved.
A methodology is to a software process model what an object is to a class in object-oriented programming. That is, it is an implementation of the model followed. [Cockburn, 1998] identifies nine elements that are usually described in methodologies. The elements [Cockburn, 1998] lists are:
Roles - A description of duties and responsibilities within a project. A role can typically be something asked for in a job description, but one person may have many roles within a project. I.e. requirements gatherer
Skills - Applied knowledge that enables a person to fulfil the duties of a role. I.e. object-oriented design
Techniques - Methods for doing an a piece of work. This is what [Hohmann, 1997] calls process. I.e. use case modelling. When working to reach a particular deliverable as a group, a more fitting word is process. I.e. requirements analysis
Teams - How you group people and how you assign them to roles.
Tools - What artifacts are used in your technique to do a piece of work. I.e. a compiler
Deliverables - The output from the technique or process used. [Hohmann, 1997] calls this outcome.
Standards - What is permitted and not permitted and what conventions should be followed. I.e. coding standards
Activities - what meetings, reviews, milestones and other general activities the person must attend, generate or do. I.e. create delivery
Quality - what rules, issues or concerns are to be tracked for each deliverable.
Methodology weight refers to how much is specified in the methodology. For instance, if coding standards are very thorough, this adds to the methodology weight and allows for less tolerance for variations than if more coding standards issues were unstated. This would make it more open and allow for higher variation tolerance. More thorough standards are more precise, and precision thus adds to methodology weight. [Cockburn, 1998]
Projects can have different criticality level. The levels of criticality are classified by the result of product failure. The four levels that [Cockburn, 1998] and others refer to are (1) nuisance, (2) loss of a recoverable sum of money, (3) irreversible loss of a large sum of money and (4) loss of life. Cockburn's 4th principle is that the more critical the product made by the project, the heavier methodology is called for. The additional weight is added as a result of greater precision and less tolerance for variations.
The methodology scope is the range of roles, activities and project deliverables the methodology attempts to cover. The deliverables follow the project's life cycle. [Cockburn, 1998]
It is important not to confuse project activities with the core organisation's activities. While handling requests for vacation is a part of the core organisation's activities, planning how to handle people being unavailable is part of the project manager's activities.
[Osterweil, 1987] writes in his article that software processes are software. In a follow-up paper, [Turk et al, 2000] shows that software process models are software too. Software processes are based on a process model and are key parts of a methodology and a project model. Thus, methodologies and project models can be considered to be software.
Osterweil notes that software processes are dynamic and therefore much harder to comprehend and reason about than static process descriptions. This thesis will look at the processes in FreeBSD as a snapshot in time, thus looking at them as static so that they can be analysed. In reality, discussing the processes contributes to their change or reenforcement.
Because the project model is considered software, we can use a software process model to create a project model. We will discuss the software process model used in Section 3.4.
Since software processes and software process models are considered software, they are victims of the same quality criteria as software. One of these to be particularly careful about is "rotting". Rotting is referred to when code becomes obsolete because identified problems are not fixed or the context in which the software is used evolves while the software is not updated. The same is true for methodologies and project models. If they are out of date, the information becomes unreliable. When parts of the information is unreliable, the users of the documents do not know which parts they can rely upon or not, making them not use it at all. This is actually worse than not having a methodology or project model as the trust in future, updated and maintained models is at stake. Examples of changes that happen are: new roles are created, roles merged, responsibilities broadened or split, techniques changed and communication lines altered. Changes in organisational goals can influence project methodologies even more radically.
[Osterweil, 1987] argues for formalising processes through programming them in a language where human tasks are represented as functions or procedures. In my experience, many people who have programming jobs do open source development on their spare time because it allows them to do the work they find interesting without the bureaucracy and demands they find at work. The processes described in the project model will therefore be descriptive, accompanied by a flow model, rather than normative as a programmed function implies.
[Osterweil, 1997] talks about process evaluation, saying that because process representation is often unavailable, process testing has to review the artifacts produced by the process. Although there are some guidelines available, this is the first written down project model for the FreeBSD Project. To evaluate it properly would require a study comparing what it says happens to what actually happens. This is outside the scope of the project. We have decided, like [Osterweil, 1997] suggests, to investigate the artifacts produced.
Brooks' Law, "adding manpower to a late software project makes it later" has since it was first published in 1975 been treated in many studies, but with the republishing in 1995 and discussing these studies, the author finds it as true today as it was in 1975. [Brooks, 1995]
Brooks' Law is of particular interest to the open source community as it depends on new actors coming in and studying the source code. Netscape did a bold attempt to draw on the benefits of the open source community when they released the source code to their browser in January 1998. The company decided to use an open source model for the continued development of their browser. But from releasing a new browser many times a year, it would take more than two years until they launched their first Mozilla-based browser in November 2000. [Wilson, 2002] Projects such as OpenOffice, a continuation of Sun's StarOffice, have had a similar fate of becoming late.
[Brooks, 1995] also discusses the "Second System Effect" and argues that the second system built is likely to include all the frills and ideas that the architect did not have the time for during the implementation of the first system.
Many open source implementations stress design less as they are re-implementations of systems found in the closed source world. For example, Unix systems were well understood when the FreeBSD Project started and browsers well understood when the Mozilla project was initiated. While it is often not the second time the developers build such a system, they are very familiar with the products they are copying, and inclined to make "corrections" where they see them fit. This has led many open source products to look more complex and feature rich than the products they copy. Yet they use not their time for true innovation, and this allows the closed source world to gain a technical advantage, forcing the open source community to play catch-up.
[Boehm et al, 1976] assert that attention to quality can lead to significant saving in software life-cycle cost. However, there is no good, quantifiable definition and metric to say what quality is. This is often illustrated by the classic quote from [Pirsig, 1989]:
"Quality .. you know what it is, yet you don't know what it is. But that's self-contradictory. But some things are better than others, that is, they have more quality. But when you try to say what the quality is, apart from the things that have it, it all goes poof! There's nothing to talk about. But if you can't say what Quality is, how do you know ... Obviously some things are better than others ... but what's the betterness? ... So round and round you go, spinning mental wheels and nowhere finding anyplace to get traction. What the hell is Quality? What is it?"
In order to avoid going "poof", while not solving the broader issue of what quality is, we need to have a good definition of what we mean by quality. [Boehm et al, 1976] defines and classifies 23 metrics of quality. The definitions given in his paper such as maintainability, completeness, conciseness, structuredness and understandability are extremely hard to quantify. Therefore we have investigated two ways of measuring quality and chosen one for our studies.
The open source community often talks about quality in terms of execution speed, fault tolerance and features available. The articles that immediately seemed in line both with this and the history of operating system development were the "Fuzz" articles [Miller et al, 2000].
In the Fuzz articles, the authors made tools that went systematically through common tools in multiple operating systems and exposed them to random input data and checked if the programs either went into an endless loop or crashed. They did this first in 1990, then repeated the test in 1995 and 2000. In 2000 they found that they had crashed all tested Win32 programs, 15-43% of all tested programs on available, commercial UNIX systems and 9% of the tested programs on the open source UNIX system Linux.
They found that testing the developed product repeatedly also said something about the methodologies used. During the tests they supplied error reports and even bug fixes to the vendors of the operating systems. They were therefore surprised to see that many of the bugs reported in 1990 were still in place for the modern systems in 2000.
Although robustness is a good way of measuring the quality of a product, I found that investigating only this dimension and trying to use this dimension to say something about the quality of the project model was not persuading. It may be that FreeBSD is very robust, but that ignores important issues as implementing new features, having regular releases etc.
GQM was first used for a set of projects in the NASA Goddard Space Flight Center environment and a widely used technique today, for instance used in evaluating projects with the Capability Maturity Model (CMM) for finding conformance with its key process areas [Basili et.al, 1994].
GQM assumes that measurement is only useful if it is backed by a goal, and is thus a top-down approach to measuring quality. It is divided into a conceptual level, operational level and quantitative level.
The "conceptual level" states the goal. The goal is defined for an object, that can be a product, process or resource. An example of a goal for a lamp is that we should be able to behave indoors as if we were outdoors in daylight.
At the "operational level", questions are used to characterise the way the achievement of the goal is going to be assessed. A question characterises the quality of an object from a viewpoint, and it can therefore be favourable to have multiple questions to a goal. An example of a question to our example goal is whether people can read a book as fast indoors with the lamp turned on as they can outdoors.
At the "quantitative level" we find metrics for our questions. Metrics can be measured either objectively or subjectively, and multiple metrics can be associated with one question. An example of an objective metric to our question is measuring how many words per minute were read by the test subject indoors and outdoors. An example of a subjective metric is asking the test subjects to fill out a form about how they found reading indoors with the lamp turned on as opposed to reading outdoors.
This is the approach chosen to measure quality in this thesis as it allows the investigation of many dimensions of the FreeBSD Project. In order to use it, we must be careful to have reasons for the goals selected as they must not be simply something selected out of the interest of the author, but be goals of practical value to the project.
The following chapter will describe the research process that has led to this thesis. In doing so, the different methods will be discussed in light of my experience through using them and literature concerning them.
The following figure describes best how I have been working:
The research process was an iterative process of researching and taking notes. The notes consisted of factual notes and theories that needed to be explored, and argumentation to support my findings through literature. When this research process was complete, the notes were structured into a project model and supporting chapters, and the conclusions in each chapter were brought together in an overall conclusion chapter.
My research coincided with the release of FreeBSD 5.0. Many examples will be used from this release and the work concerning it. As it is the latest release by April 1st, 2003, its continued development should be using the proposed project model.
During the research, three methods of data-gathering have been used: literature studies, quantitative and qualitative studies.
The literature studies will especially concern the development of BSD at Berkeley, but also draw upon articles written on the issue of the other projects. Also, the main study of process models and their critique and classification will be through literature studies. Much of this literature will be presented in the theory chapter and used in the discussions in chapters 4-7 and 9.
My two main methods of gathering qualitative data are interviews and reading mailing lists.
For this research I have used two kinds of interviews: semi-structured interviews and non-structured interviews. Since I live in Norway and most the people I have interviewed live in other countries in Europe and in the US, my interview media have been telephone, email and internet chat (IRC). Because of the cost of phone-calls, these have been the most well planned interviews and thus been the most structured. During these interviews the interviewees have mainly been elaborating on the questions without going outside the topic. Email interviews have been quite structured so as to encourage a quick response. My hypothesis has been that long emails containing vague phrasing takes longer time to answer and the replies are therefore more likely to be postponed by the interviewee. The use of IRC has been for ad-hoc interviews with people interested in the questions I had to discuss.
My second qualitative method was examining mailing lists. Because very little has been written about the history and methodologies of FreeBSD, it has been necessary to browse through mailing lists for topics that have been discussed that are of particular interest for this research. The mailing lists inspected have been selected on recommendation from senior members of the projects that have been reading the lists on a day-to-day basis for years and know where interesting kinds of discussions have tended to take place. More recent mailings on the lists have been the basis for short, email-based interviews.
The challenge of qualitative analysis lies, according to [Patton, 2002] "in making sense of massive amounts of data[,] ... sifting trivia from significance ... and constructing a framework for communicating the essence of what the data reveal". [Patton, 2002: 432] While it is easy to mentally distinguish qualitative information gathering and analysis, the information retrieved through analysis affects the questions asked further in the interview and in following interviews. The analysis process starts in the brain as soon as some information has been received, and is thus an ongoing process even at the time of information gathering.
The problem with analysis begining so soon is that first impressions have a stronger impact than later impressions and questions can be altered as to reenforce the first impressions rather than to question them.
There has not been written many articles on the development of the FreeBSD Project when it comes to software engineering. As the largest source of how things have been done in the FreeBSD project are its mailing lists, these have been studied to understand how processes in the project have worked. As noted in section 8.6.4, FreeBSD has 68 mailing lists of which 62 are both publicly available and archived. Mailing lists has been the primary form of communication since the projects' inception.
One particular feature of the FreeBSD mailing lists is that they are archived so that they serve well as research material. But this trait can deter people from writing. In order to allow discussions that are not wanted to be archived, a closed mailing list for committers has been created in which they can discuss anything openly without the outside world gaining access to it. It is considered bad manners to share opinions voiced in this mailing list in public forums without public consent [Lambert, 2002].
However, the amount of correspondence on these lists is too much to possibly study within the time constraint of this research. It has therefore been necessary to narrow down the scope of the search to periods of time on a low number of lists where it is most likely that there will be found evidence of processes being in use. I have selected the lists through recommendations by my interviewees.
The cutdown has been made on the basis of interviews with senior developers. When searching for a topic, the lists they recommended have been queried for. Then a period of time of at least five days before and after the hit have been read as well as the thread the corresponding mail was a part of. When researching things with a known date, the same approach has been taken. The result is that many snapshots in time on selected lists have been chosen
The mailing lists that have been studied are:
freebsd-stable
freebsd-current
freebsd-chat
freebsd-hackers
freebsd-announce
freebsd-audit
freebsd-qa
In the appendix is a set of mail threads that I have found of particular interest. As threads of mail can change topic as the thread evolves, the list has been ordered such that the first mail on the topic I have found of interest has been listed.
I have not been able to find any mailing list archives on the development of BSD at Berkeley, but I have deemed this unnecessary because of the coverage of the development process by Kirk McKusick's many articles on the topic.
The quantitative research has been done through writing scripts that will gather statistical data from sets of semi-structured data such as commit logs, source code and problem reports.
Data analysis was done on the data materials that can be found in the FreeBSD project. This was done primarily to find answers to the questions posed in Section 3.5. Due to the active use of CVS, mailing lists and GNATS in the project, there was enough semi-structured data to be able to find at least answer estimates.
This section will discuss the requirements for making a project model for the FreeBSD project and develop a project model that fits the FreeBSD project reasonably.
Even though I used an evolutionary model, my work can be grouped in traditional groups of requirements gathering and analysis, design, implementation and verification. This makes a seemingly chaotic process easier to grasp.
As discussed in Section 2.3.4, project models can be considered software, and we can thus use the same tools we would use to develop software to develop a project model. In order to build a project model for the FreeBSD project, a software process model is needed.
I have chosen an evolutionary model as shown in the figure in the beginning of this chapter. The scope and purpose has been defined in the first chapter of this thesis. The top of the figure is research that leads to either notes or argumentation, and this leads back to further research. After my research has been completed, I have restructured my notes into the project model and supporting chapters.
In the development of the methodology, the current activities will be undertaken: analysis, design, implementation, verification and maintenance. Analysis will draw upon the research done to say one activity is done in one particular setting. Design will based on the analysis generalise how similar problems are being solved. Implementation will integrate the design into the methodology saying how similar problems should be solved. The methodology will be verified by comparing it to the evidence found in the analysis and discussing the methodology with active committers of the FreeBSD project. Maintenance will be the activity of keeping the project and methodology in sync once the project has taken the methodology into use. As an evolutionary model, the maintenance part will include analysis, design, implementation and verification. All the four other stages will produce output that will affect the other stages so that each process is a continuous, ongoing process.
From Section 1.1, we find here are three main requirements for the project model:
The project model should be of use to the FreeBSD Project
The project model should be of a high academic quality
The project model should give people not familiar with the FreeBSD Project a better understanding of the project
For the project model to be of use to the project, it should be founded upon details of what the current state of the project is so that it can be used for setting goals. It should also provide easy access to the people and groups listed. It should be easy to read and should not require significant knowledge in order to understand. It should be a document new members of the project can read in order to familiarise themselves with the project, and a document project members can refer to when working on other domains than they usually work with.
With high academic quality, we will mean that it should contain clear, precise definitions of words that are not necessarily precise and clear. It should be backed up with references and be correct about the facts. It should follow guidelines for project models to ensure it contains what project workers expect to find in a project model.
In order to set goals, it is important to know the current project state. This is requires gathering data to generate descriptive statistics.
In order to make access easy, email addresses and names of the groups and people mentioned will be listed.
While the project model should be correct factually and have references, it should not be cluttered with references. This thesis will support the project model by providing more in-depth observations and references to literature.
As discussed by [Hohmann, 1997], every process has an outcome. In identifying the processes in the project, it is important to see the process' purpose, the activities, the associated roles and what the outcome of the process is.
The implementation of the project model is available in Chapter 8.
In order to verify the project model, it is sent to the interviewees and senior project managers for proof-reading and commenting with the sanction of the FreeBSD Documentation Project. After being revised, it is sent to the project members for further comments. Finally it is incorporated into FreeBSDs documentation.
To be able to say something about how well this project is running, we need to be able to measure set criteria. Without measuring, we can only say something about how we think things are. By measuring, we say something about how things are.
We will use the Goal/Question/Metric (GQM) framework described in the theory chapter to do our measurements as described in Section 2.4.2.
To identify our goals we need to ask who are the ones defining goals. FreeBSD is a UNIX based system, and the original UNIX was developed by developers, for the developers, to make their jobs of creating applications easier. Most developers will claim a personal interest and that they benefit from developing FreeBSD. The developers are stakeholders in the way that they invest time and equipment in the development.
Another group of stakeholders is users. Users can be both organisations and individuals. Like the developers, they invest their time and equipment in FreeBSD, hoping to gain the benefits of its promised features.
Having identified the users and developers as stakeholders, we can now set up goals that, based on experience, at least one of the groups have. Defining goals based on experience, or domain knowledge as Gilb calls it, is in accordance with [Gilb, 2003: 12]. Together with the goals, we will set up what questions we need answered to determine if the goals have been met, and what metric we will use to answer the questions.
The product specific goals come as a response to the users wishes. The users in the project can communicate directly to the developers through mailing lists. Product specific goals have been selected by going through the mailing list archives and noting requests that have often been voiced.
Third party software that is intended to run on the system should not be unnecessarily encumbered, as such encumbrance can lead to it dropping support for the system. Such an act would lead users that require new releases of this software to prefer using systems that are supported.
Does much third party software run on the system?
To measure whether third party software runs on the system, we will look at how much software is officially said to be supported by the system, and how much of this that actually runs at the time of writing.
As the technology evolves, so will the needs of the users and developers as their technological fantasy will continue to suggest new uses that they think should be possible. Some of these new uses require new features to be added. These should be developed, or users will move to systems that provide these features and the momentum of the project will decrease.
Are new features implemented? Do new features deprecate old features?
To measure if new features are implemented, we will investigate release announcements and interview some of the developers in the sub-projects that created the new features.
In order to stay visible at conventions, to give an impression of progress for people worried about version numbers and in order to supply new users with the most up-to-date version, releases should be made regularly.
How often are releases made?
The release history will be examined and how many days there are between the different releases will be examined.
The project specific goals are administrative goals that emphasis on developer productivity rather than having to have developers perform administrative tasks. Most goals refer to managing commit privileges and distributing written code. When these goals fail, developers have to spend time making workarounds for having code added and find alternative means of distributing the updated code. These goals have been selected on the basis of interviews with developers.
In order to stimulate to further recruitment to the project, commit privileges should be given to active contributors.
Is there a steady flow of new committers coming into the project?
This will be measured by monitoring how many committers have been added per month during the past year.
An inactive committer, that is a committer that has not performed a commit during the past 18 months, should have his commit privileges revoked. Commit privileges are also a security risk as they can be used to cause harm to the code. Dormant privileges can be picked up by intruders, and thus be a security problem for the project.
Are commit privileges regularly reviewed and revoked for inactive committers?
This will be measured by monitoring how many committers have their commit privilege removed and how regularly this is done.
For the project to be able to maintain its momentum, it is imperative that it is easy for developers to add their code.
How many commits happen? How easy is it for new committers to make commits?
To determine how many commits happen, we will watch commit logs for a year and split them into fit categories. To determine how easy it is for new committers to make commits, we will watch commit statistics for new committers for a three month period after they receive their commit privileges.
One of the main features of the FreeBSD project is its frequent releases. These releases are distributed by CVS, and users synchronise with these updates through a program called CVSup. This requires CVSup servers they can connect to, which in turn requires an infrastructure to handle the requests.
The CVSup system is by far the FreeBSDs project's largest logistical system. In addition, the project provides CD-ROM images that are used to produce installation CD-ROMs, either by companies that sell them for a profit or by individual users.
How many CVSup servers are there available? How long do users have to wait in order to use a CVSup server? Are CD-ROMs and DVDs easily available to users?
CVSup servers will be counted over time. An access simulation will be performed in order to estimate the availability of the servers. Sales statistics will be gathered and aggregated to measure the availability of discs.
Up until now, FreeBSD has been very little process oriented. It has good communications with its users through mailing lists, but mailing lists are not well suited for bug management. This is confirmed by [Swanson, 1976] who stresses the importance of a maintenance database. To handle this, they have had to create problem reporting routines for problems identified by users. The following goal is built upon the documents describing these, [FreeBSD, 2002C] [FreeBSD, 2002D]
There will inevitably be problems in any software. Therefore there should be ways that those who have not written a particular piece of software but experience trouble with it can alert the author about the problem, and possibly submit a patch.
With most of FreeBSDs users not being developers, it is very likely that they will identify problems that the developers do not know about. It is therefore important to have a standard way of error reporting that simplifies the task of duplicating the error so that the author can find and fix it.
In addition, there is little use in problem reporting without the problems being reviewed and solved. Ideally they should be resolved quickly so that the report originator sees that it is useful to report errors and is thus be stimulated to send further reports about other errors.
Are problem reports sent to the author? If so, how are they sent to the author? Are problems reported given attention? Are the problems resolved? If so, in what time frame are they resolved? How many problem are being worked on? How many problems are being resolved? How long does it take from a problem report is received until it is assigned to a committer? How long does it take before a problem is given attention? How long does it take before the problem is resolved?
To measure whether problem reports are sent, we will look at the frequency of problem reporting. Through interviewing a sample set of committers we will determine if it is common to receive problem reports in other ways than the standard format. To measure whether problem reports are given attention, we will see how quick the status of the report is changed from its initial state. We will also check how many have reached the state "closed". To measure how long it takes before the problem is assigned to the correct committer, the submit date and the first time the responsibility for the report is changed will be measured. To measure how long it takes before the problem is given attention, we will check the first time the state was changed from "open". To measure how long it takes before the problem is resolved, the time the report is submitted will be compared to the time the report is closed.
During the research process, I've used both quantitative and qualitative research methods. These are my main impressions of this information gathering, and a critique of this process
The "quantitative" research has consisted of gathering data from CVS logs, Problem Reports and classifying news announcements. This information has been easily available through a net browser and the CVS tool, and have been very consistent. The only major inconsistencies have been when some of the machines responsible for handling the logs and reports have had an incorrect time. This has not been discovered for many entries and has not caused significant errors.
The "qualitative" research has consisted of email interviews, reading mailing list correspondence, internet chat and regular interviews.
I have found it difficult to get in contact with people in projects I did not know well. [Mann et. al, 2000: 28] have found that many regard contact with outsiders as threatening and time consuming. I have also tried enlisting altruistic support that is perceived common in the open source and academic community [Hars et al, 2001]. The success of a good response has been perceived as dependant on how much the person's interests stand to gain from the output of my research.
I have had problems with people being gone from discussions for long periods of time because of other issues in life such as vacations hindering them in replying.
It has been difficult to get people to use their time to answer questions. Also, many of the interviewees that did respond would stop responding to follow-up questions. Because of the distance between people and the low resources available for transportation and phone, I have only been able to interview two committers by phone and in person. The interviews have been interesting and the interviewees answered most of my questions.
Monitoring mailing lists has proved to be a very time-consuming task, but has helped provide a more coloured view of the project and its processes than I got from interviews. The two have thus been complementary. Because of time constraints and the vast amount of both daily incoming and archived messages, I have only been able to read snapshots in time from selected lists.
In my research, I have only been in contact with people that have access to the net and the time to work on the project and engage in discussions. I have not been in contact with people that have been marginalised from the project because of any of these issues. I have thus not found how the project model includes or excludes people that cannot have regular access to the net.
I have drawn upon [Mann et. al, 2000] for handling consent. When interviewing people, I always started the interview by presenting what I was researching and clearly stating that I was asking them questions for my research. After my research was completed, I emailed all my interviewees an email with the notes I'd taken and asked if they minded anything in my notes so that I could include them with my research data. I have thus been explicit about gathering concent from my interviewees. However, I do not have signatures or such, except for the occational PGP-signed message. I have not explicitly sought consent when siting public mailing lists as people posting on such lists are aware that they are saying this in a public fora.
For the three last months of the thesis, the project model, most of the data and my thesis has been available to a larger and larger part of the community. The project model was presented to the community at large 1,5 months before the thesis was submitted and the thesis was presented two weeks before it was submitted. I have thus made sure to give people access to the data I have gathered about them.
During my research I have obeyed general nettiquette as described in [Mann et. al, 2000: 59-62] and the rules regarding netiquette of the Department of Informatics at the University of Oslo.
The following chapter will discuss the organisational structure of the FreeBSD Project. It will not include the community surrounding the project.
A contributor is a person who has done at least one of the following:
Developed software that FreeBSD is derived from or integrates
Made a change to FreeBSD that has been committed into the repository
Donated funds, hardware, entire servers and network connections
Sent a Problem Report
[FreeBSD, 2002F] has a list of the entire list except for contributors sending problem reports. These make up 1467 contributors that are not committers.
By going through the problem reports database, noting down only unique names by manually removing all duplicates on the letter 'A', I got a factor of 0.8870 for which I had to multiply the number of report originators with. Removing active committers from the count, we get an estimate of 6538 contributors. [1]
Acknowledging tha