TODO Group (Talk Openly Develop Openly), has been working on assembling a tool-filled “Open Source Program Office in A Box” starter package, which would give companies the ability to launch their open source efforts with a cohesive, pre-assembled kit of tools. The starter package isn’t yet ready, but the hope is that eventually it could make it easier for a company to deploy and configure the tools they need with less initial effort. Some of the members of the TODO Group working on this project include Adobe, Capital One, Comcast, Facebook, Google, eBay, IBM, Microsoft, Samsung, and Twitter.
Along with the proper tools, companies should also incorporate central dashboards which allow them to monitor and track their open source projects and development in real time. Many companies likely have such dashboards for existing development work and applications and may be able to integrate the existing dashboards with their open source work. If not, they should create or adopt new dashboards to improve the management of their open source deployments.
“On dashboards, there are many ways to create them, and it’s really an art in terms of how companies want to display development-related information. Some people build these fancy screens with rotating dashboards, but the key thing is to have a central location, preferably co-located with your existing dev dashboards, where people can go to learn more about open source project health, metrics, and so on.” – Chris Aniszczyk, COO of the Cloud Native Computing Foundation.
The abundance of tools available for managing and reporting on open source projects can quickly become overwhelming. If your open source program is just getting started, it helps to focus your research on just a few of the basic tools that you’ll need to get up and running.
Then as your program grows and you’ve gained more experience using these tools, you can start to adopt new tools to help you automate and streamline your processes as the need arises. Remember that you want the tools you choose to complement and support your internal culture and processes – not lead them.
The sections below give the basic categories of tools that pretty much all open source programs use on a daily basis. This is a good way to organize your research.
Tools which automate processes are among the most important you will select and use for your company’s open source program. The tasks for such tools are broad, including automating procedures for contributor license agreements (CLAs), which are legal documents stating that a developer created the code and didn’t copy it from anywhere else illegally. Traditionally these kinds of agreements were done manually by printing out the agreements and then signing and faxing them in to comply. But in a world of email and instant communications, that’s crazy today. Instead, the process can be automated using bots that request electronic signatures and then track and handle the submissions.
Other automation tools can tell you who exactly is contributing to your projects and can help
remove procedural friction which slows down progress in projects as they get larger and scale to meet the needs of companies.
In Microsoft’s open source program office, where some 8,000 repositories are managed on GitHub involving some 11,000 contributors, about 40,000 internal requests came in to use open source in projects in 2016, according to the company. To manage those requests as well as the code that’s created and the code versions which are being updated,the company turns to tools which can automate the chaos. And because the code is likely being used in potentially hundreds of other projects, it must be tracked carefully so that if a security bug arises all software impacts can quickly be mapped out and fixed. At such a large scale, automation is critical and manual updates would be almost impossible.
Other important tools to be considered and acquired are those which help manage critical tasks, such as project management, tracking project health and ensuring clear and quick communications between developers, open source communities, and others inside a company.
Most corporate software projects being developed through open source program offices use GitHub as their centralized hosting and development platform.
GitHub is an online source code management site that allows open source developers to manage and house their code in a central “repository” or storage space where participants can collaborate and build their code together. Some 64 million open source coding projects are hosted within GitHub today, involving some 23 million developers.
GitHub users can add code, review submitted code, propose changes, get and offer feedback and provide project management using the service. GitHub uses the Git Version Control System, the open source project developed by Linux creator Linus Torvalds which provides organization for the code and people who are collaborating on open source. Each “contributor” has their own copy of the project repository they are working on, where they can make changes on their own computer and then submit it back to the project for future inclusion. That “pull request,” (example here) or code contribution, is then reviewed, discussed, modified and approved or rejected by the project organizers.
Also important are code scanning and compliance tools, which help track code provenance and license requirements. It’s important for companies to watch over the open source code being brought into its own infrastructure, products, and services to ensure license requirements are met.
Your applications, for example, could include several thousand open source components. To protect your company from legal issues it’s critical to know these details. In scenarios that are high risk, users must dive into the code to deeply validate and verify that the licenses are what they say they are, depending on where your business is on a risk spectrum. (See our guide on using and distributing open source code.)
“You must understand your risk profile, because in the end scanning is all about risk management. You can stick your head in the sand at one end then just trust and hope that you are OK. Or you could say ‘If I get sued, it’s going to devastate my business.’ You need to really be sure. So, you crack open the package and you look through all the lines of code and you find everything that could possibly be in there.” – Jeff McAffer, director of the Open Source Programs Office at Microsoft.
As we discussed earlier GitHub is the go-to source code management system for most open source program offices these days. But GitHub alone won’t meet all your program’s code management needs – especially as you scale up your efforts.
Some of the tools used in the world of open source are aimed at improving GitHub itself by adding features it lacks, such as support for checking Developer Certificate of Origin (DCO) statements to be sure that code can be legally licensed and used in an open source project.
GitHub also has some deficiencies when it comes to code reviews, so there are available tools that can automatically send questionable code back to the contributors who created it and ask them to review and make needed changes. GitHub doesn’t have a way to force someone to review their code, so these clever tools make that happen to improve workflows.
Other GitHub-specific tools expand on GitHub’s performance metrics capabilities, which tend to be very project specific rather than providing detailed information across whole organizations. For companies that maintain many open source code repositories across multiple GitHub projects, better tools are needed to organize and aggregate them to make sense of it all. A wide range of such tools are available from Amazon, Netflix, and Microsoft to help with those tasks.
Here are some of the most popular and useful source code management tools which can streamline and help your GitHub presence:
Source code scanning and license compliance
Antepedia Reporter – A commercial, fee-based application from Antepedia, Reporter is a report-generation product which lets developers, project managers, legal advisors and others create license compliance audits and IP rights management reports about the open source, public and private components in your code base.
Black Duck Hub – The commercial Hub service scans code to identify all embedded open source components, and then automatically searches for known vulnerabilities for remediation. It can send alerts when new vulnerabilities are found in your code.
Black Duck Protex – Protex is a commercial, fee-based license compliance management tool from Black Duck which integrates with existing tools to automatically scan, identify and inventory open source software, while also enforcing license compliance and corporate policy requirements.
Copyright review tools – This collection of command line tools help make initial copyright file construction and subsequent review and update easier.
dep-checker – A dependency checker tool from The Linux Foundation, dep-checker performs a complete analysis of linkages between code packages.
FlexNet Code Insight – Flexera, which acquired licensing compliance vendor Palamida in 2016, offers FlexNet Code Insight to help automate corporate open source use among developers, legal teams and security staffers.
FOSSA – This is a commercial tool that automatically performs code dependency tracking, license compliance scanning in the background.
FOSSology – A Linux Foundation project, FOSSology is an open source license compliance software toolkit which can run license, copyright and export control scans from the command line. A database and web UI are also available to create compliance workflows.
janitor.git – Code Janitor is an open source tool that helps evaluate source code for compliance with open source licenses. From The Linux Foundation, Code Janitor can be used with other products to check code.
LicenseFinder – Detects the licenses of the code being used in your projects, compares those licenses against a user-defined whitelist and then provides an actionable report.
Protecode Enterprise Analyzer – This commercial application is used to analyze and identify all code in any directory to determine code ownership and ensure open source license compliance based on predetermined internal policies.
scancode-toolkit – From nexB, the ScanCode suite of utilities scans code for licenses, copyright and dependencies to find, discover and inventory open source and third-party components used in your code.
SPDX – The Software Package Data Exchange (SPDX) specification is a standard format used to describe the components, licenses and copyrights associated with software packages. The SPDX standard aids compliance with free and open source software licenses by standardizing the way license information is shared between developers and companies. The SPDX specification is developed by the SPDX workgroup, which is hosted by The Linux Foundation. The group offers open source tools to help users of SPDX documents.
WhiteSource – Provides licensing, security, code quality and reporting analysis for managing open source components in real-time by automatically and continuously scanning dozens of open source repositories.
Bugzilla – Server-based software featuring an advanced query tool that can remember searches, integrated email capabilities and a comprehensive permissions system. Bugzilla is used by Mozilla as its bug tracking system.
GitHub Issues – GitHub’s own integrated feedback and bug tracker, GitHub Issues is available as part of GitHub’s project hosting.
GitLab – This bug tracking tool unifies issue tracking, code review, Git repository management,
activity streams, wikis and more in a single UI to assist your open source projects.
JIRA – From Atlassian, JIRA contains custom filters, developer tool integrations, customizable workflows and rich APIs to integrate JIRA with other applications.
Artifactory – Also from JFrog, Artifactory is a repository manager which supports software packages created in any code language. It integrates with all major DevOps and continuous integration and continuous deployment tools.
Bintray – An archiving tool from JFrog that allows companies to publish their code release archives to maintain storage for older and larger files.
Docker Hub – A cloud-based registry service which allows users to link to code repositories and build and test their images. It also stores manually-pushed images and links to Docker Cloud so users can deploy images to project hosts. Docker Hub is a centralized resource for container image discovery, distribution and change management, collaboration and workflow automation throughout the development pipeline.
github-release – The built in functionality part of GitHub which lets users package and edit releases of projects on GitHub so they are available for use by other community members.
Monitoring and tracking the overall health of open source projects as they grow and mature is a core task for an enterprise open source program. To accomplish it, you must gather tools which report on how individual open source projects are performing and being received by their communities – often across dozens, hundreds or even thousands of projects at once. The tools also must be able to roll the data into meaningful, useful, and actionable information about overall project performance across your entire open source portfolio.
The bottom line here is it’s all about the critical and useful insights you can glean from the data – not about vanity metrics such as detailing how many “watcher” stars a project has logged, how many contributors have been part of the project since its start, or other metrics that lack important context.
The best project health tools must also help the project teams be responsive to the communities which support their efforts and encourage engagement and diversity with contributing developers. That means the tools help maintainers quickly respond to questions or feedback posted by community members so they remain enthusiastically engaged and don’t get bored and move on to other projects.
Some open source communities will have large groups of contributors, while others will have small niche groups of community members. The project health tools need to be able to work with projects of all sizes.
“Regarding existing tools and systems, my hope is that we’re quickly getting to a point where a company’s open source program office should not need to create any tools or technologies on their own. They should be able to find and use existing open source tools which can be used to manage their open source programs.” – Jeff McAffer, Director of the Open Source Programs Office at Microsoft
Here are some of the most popular and useful project statistics and project health tracking tools:
“The goal is to have the tools, along with transparent data and metrics-related information, which can be used to guide the organization.” – Chris Aniszczyk, COO of the Cloud Native Computing Foundation
The TODO Group also offers a helpful list that adds other tools as well:
CLA Assistant – Contributed by SAP, the CLA Assistant streamlines workflows by handling the legal side of contributions for users. The Assistant asks code contributors to sign CLAs as they make their code contributions and authenticates each contributor with his or her GitHub account. It also updates the status of a pull request when the contributor agrees to the CLA and automatically asks users to re-sign the CLA for each new pull request if changes are made to the CLA.
CLA Portal – From VMware, CLA Portal adds a workflow to enable contributors to digitally sign a CLA for pull requests to your GitHub repositories. When a developer opens a pull request, they are prompted to sign the agreement if needed. Also included is an administrator interface for CLA authoring, CLA-to-project mapping, and agreement reviews.
DCOB – A Developer Certificate of Origin Bot which helps to enforce developer certificate of origin sign-offs for each code change in a pull request. The DCOB sets the status for each accepted code change, as required by the Developer Certificate of Origin.
Of course, open source development isn’t just about the code. It also requires healthy communications and collaborations between a diverse group of people who are working on the projects inside and outside of enterprises,as well as by staff members in a company’s Open Source Program Office.
For that developers can lean on tools they may already be using for other projects, including Internet Relay Chat (IRC), where developers can post inquiries and get quick responses to development-related topics. Another example is TWiki, which is an open source enterprise Wiki and web collaboration platform where developers can discuss code and projects and related topics.
Communications can also be fostered through social media platforms, web portals, open source project repositories and other places where input, questions and discussions can be found and fostered.
Other useful tools include mention-bot from Facebook, which can help get fast input turnaround on pull requests by automatically mentioning potential reviewers for reviews. This is especially appreciated when GitHub projects get too big for community members to subscribe to all of a project’s notifications.
Then there’s Slack, which is an online team project management and communications platform where users can access and share messages and files, organize workflows, perform searches for information and more. Slack can be configured to receive notifications for support requests, code check-ins, error logs and other tasks as well.
And don’t forget your company’s public relations and marketing staff when it comes to shouting out your company’s participation and support of open source. Social media accounts with sites including Twitter, Reddit, Facebook, LinkedIn, Google+ and others are important, as well as the use of internal and external blogs and websites. Customer Relationship Management (CRM) software, as well as email blasts and newsletters, can help companies keep customers and clients informed about their open source progress.
When it comes to the tools your company provides and uses for its corporate open source projects, the most important ones are arguably those which help companies manage their corporate-scale GitHub operations. GitHub is a perfect platform for many operations, but for large, complex companies such as Google, Microsoft, Facebook, Twitter, LinkedIn and others, there can be many limitations to using the standard GitHub offerings.
Large enterprises need many more capabilities, including things like identity management, settings and permissions management, security and two-factor authentication enforcement, as well as deeper means to understand and track code repositories.
That’s where specialized, automated tools often need to be built to handle tasks such as onboarding, offboarding, enforcing security policies and giving developers request access to repositories.
Microsoft responded to its own unique requirements by building its own tools to handle many such tasks to streamline and improve its open source program. Microsoft has a healthy presence on GitHub, with some 1,345 repositories and involving about 3,580 developers to date.
“That management of your GitHub presence is something that as you scale, it becomes important. You get a GitHub organization, which is a collection of repositories, and then you get members and you have teams. Managing all of that stuff becomes a little bit complicated, especially if it starts to scale out to hundreds of repositories, hundreds of people and multiple organizations on GitHub.” – Jeff McAffer, Director of the Open Source Programs Office at Microsoft
One of the things Microsoft created was a custom-built self-service GitHub management and onboarding portal for organizing its projects, repositories, and teams. On its simplest level, the web-based portal allows developers to map their Microsoft company ID to their GitHub ID, which bolsters system security and helps simplify the organization of large numbers of developers who are involved in large numbers of important projects.
The portal also lets employees authenticate with GitHub and Microsoft, which creates a “virtual link” of their identities so they can do their work while giving them needed permissions for tasks depending on their work roles. If employees leave the company, the system can be adjusted to remove or reclassify their access rights as needed.
The portal runs on one or more cloud servers and relies on a cache to help with sessions and reduce pressure on the GitHub API. The Microsoft portal, which averages about 1,000 unique users daily as a tool for its engineers, is part of the company’s growing open source efforts, which now includes more than 10,000 engineers who are using, contributing to and releasing open source code.
Hey, nobody said it was going to be simple to move your company into the world of open source. But plenty of other companies, including giants like Microsoft and Google have done this before you and have provided detailed road maps, code, suggestions, and more to make your own journey easier.
The creation of an open source program office and the selection of a package of critical tools to get your efforts started are within your grasp. And they are likely already inspiring great anticipation among your developers, many of whom are probably already contributing to open source projects on their own (or at work, under cover of darkness).
By collaborating on open source projects and inviting others to collaborate with you, your company can gain immeasurable benefits and drive its progress forward with energy and innovation.
Having the right tools is critical to empowering your company’s open innovation.
Contributors: * Chris Aniszczyk, COO of the Cloud Native Computing Foundation. * Jeff McAffer, Director of the Open Source Programs Office at Microsoft.These resources were created in partnership with the TODO Group: the professional open source program networking group at The Linux Foundation. A special thank you to Pam Baker for writing assistance and the open source program managers who contributed their time and knowledge to making these comprehensive guides. Participating companies include Autodesk, Comcast, Dropbox, Facebook, Google, Intel, Microsoft, Netflix, Oath (Yahoo + AOL), Red Hat, Salesforce, Samsung and VMware. To learn more, visit: todogroup.org.