SoftWare Hash IDentifier
| Full name | SoftWare Hash IDentifier | 
|---|---|
| Acronym | SWHID | 
| Example | swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd | 
| Website | swhid.org  | 
The SoftWare Hash IDentifier (SWHID) is a persistent identifier used to uniquely identify a particular piece of software source code and its version. It is similar in purpose to the DOI but is tailored specifically to software source code,[1] and it is compatible with version control systems such as Git.
An SWHID can point to different components or versions of a software package's source code.[1] SWHIDs are intrinsic identifiers: they are derived only from the software's own intrinsic properties, with no reliance on an external registry.[2]
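Because the identifier is intrinsic, anyone holding the same bytes can recompute it. The Python sketch below assumes the Git-compatible scheme used for "content" objects in version 1 of the specification: the object id is the SHA-1 of the ASCII word `blob`, a space, the content length in decimal, a NUL byte and then the file's bytes. The function name `content_swhid` is hypothetical.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return a core SWHID ("cnt" object type) for a file's raw bytes.

    Assumption of this sketch: the object id is the Git-style blob hash,
    i.e. SHA-1 over "blob <length>", a NUL byte, then the data itself.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    object_id = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{object_id}"


if __name__ == "__main__":
    # Only the bytes are hashed; no registry or network lookup is involved.
    print(content_swhid(b"test content\n"))
```

Two parties who hash the same file independently obtain the same identifier, and the hexadecimal part should match the blob id Git computes for those bytes.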
Format
The SWHID specification can identify several kinds of objects in software source code. Objects relating to versions of the software are labelled "snapshot" (snp), "release" (rel) or "revision" (rev); a "directory" (dir) of files and possibly subdirectories can be identified; and the contents of an individual file are labelled "content" (cnt).[1] These objects are related to one another in a Merkle directed acyclic graph, in which each object's identifier is derived from its own data and from the identifiers of the objects it references.[3]
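The Merkle structure can be illustrated with directories. The sketch below assumes that a "directory" identifier is computed the way Git computes a tree object: each entry is serialized as its mode, a space, its name, a NUL byte and the child's raw 20-byte identifier; entries are sorted by name, with subdirectories compared as if their name ended in `/`; and the concatenation is hashed under a `tree` header. It models only regular files (mode `100644`) and subdirectories (mode `40000`), ignoring executables and symbolic links. The helper names `git_hash`, `directory_id` and `directory_swhid` are assumptions of this illustration.

```python
import hashlib


def git_hash(obj_type: bytes, payload: bytes) -> bytes:
    """Hash a Git-style object: SHA-1 of "<type> <length>", a NUL byte, then the payload."""
    header = obj_type + b" " + str(len(payload)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).digest()


def directory_id(entries: dict) -> bytes:
    """Return the raw 20-byte id of a directory node.

    `entries` maps each name to bytes (a file's contents) or to a nested
    dict (a subdirectory). Children are hashed first, so any change below
    a directory changes that directory's id too.
    """
    rows = []
    for name, child in entries.items():
        raw_name = name.encode("utf-8")
        if isinstance(child, dict):
            mode, child_id = b"40000", directory_id(child)
            sort_key = raw_name + b"/"  # directories sort as "name/"
        else:
            mode, child_id = b"100644", git_hash(b"blob", child)
            sort_key = raw_name
        rows.append((sort_key, mode + b" " + raw_name + b"\x00" + child_id))
    rows.sort(key=lambda row: row[0])
    payload = b"".join(entry for _, entry in rows)
    return git_hash(b"tree", payload)


def directory_swhid(entries: dict) -> str:
    """Format a directory node's id as a core SWHID."""
    return "swh:1:dir:" + directory_id(entries).hex()
```

Because a parent's identifier covers its children's identifiers, editing one file deep in the tree changes the identifier of every directory above it, while untouched sibling subtrees keep theirs.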
The identifier has the following syntax:[4]
swh:<scheme_version>:<object_type>:<object_id>[;qualifiers]
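A minimal Python parser for this syntax is sketched below. It assumes, as in the examples in this article, that the object id is 40 lowercase hexadecimal characters and that the object type is one of the five three-letter codes; qualifiers are read as `key=value` pairs separated by semicolons, and any percent-encoded values are left undecoded. The `SWHID` class and `parse_swhid` function are illustrative names, not part of any official library.

```python
import re
from dataclasses import dataclass, field

# Three-letter object-type codes and the object kinds they denote.
OBJECT_TYPES = {"snp": "snapshot", "rel": "release", "rev": "revision",
                "dir": "directory", "cnt": "content"}

CORE = re.compile(r"swh:(?P<version>[0-9]+):(?P<type>[a-z]{3}):(?P<id>[0-9a-f]{40})")


@dataclass
class SWHID:
    scheme_version: int
    object_type: str
    object_id: str
    qualifiers: dict = field(default_factory=dict)


def parse_swhid(text: str) -> SWHID:
    """Split an SWHID into its core fields and its optional qualifiers."""
    core, *rest = text.split(";")
    match = CORE.fullmatch(core)
    if match is None or match["type"] not in OBJECT_TYPES:
        raise ValueError(f"not a core SWHID: {core!r}")
    qualifiers = {}
    for item in rest:
        # Split on the first "=" only, so values such as URLs stay intact.
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed qualifier: {item!r}")
        qualifiers[key] = value
    return SWHID(int(match["version"]), match["type"], match["id"], qualifiers)
```

For instance, parsing the infobox example `swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd` yields scheme version 1, object type `dir` and no qualifiers.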
Examples
According to the French National Centre for Scientific Research (CNRS), software source code archived with SWHIDs includes the source code of the Apollo 11 navigation software and of the NCSA Mosaic web browser.[5]
The source tree of version 3.0 of the Linux kernel, released in July 2011, is identified by the following directory SWHID:[6]
swh:1:dir:df32c75242bf8d797ccd43af8ce8e294f35cd8fd
The following example, drawn from the specification documentation,[7] illustrates the use of multiple qualifiers in an SWHID:
swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml;lines=9-15
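In this example the qualifiers record the context in which the file was found: the repository it came from (origin), the snapshot taken when that repository was archived (visit), the revision from which the file is reached (anchor), its path from that revision's root directory (path) and a range of lines within it (lines). The Python sketch below rebuilds the identifier from those parts and decodes the `lines` value. It assumes the range is inclusive, and it simply rejects values containing `;` rather than percent-encoding them; `qualify` and `line_range` are hypothetical helpers.

```python
def qualify(core: str, **qualifiers: str) -> str:
    """Append key=value qualifiers to a core SWHID, in the order given."""
    parts = [core]
    for key, value in qualifiers.items():
        if ";" in value:
            # A fuller implementation would percent-encode instead.
            raise ValueError(f"qualifier {key!r} contains ';'")
        parts.append(f"{key}={value}")
    return ";".join(parts)


def line_range(lines_value: str) -> range:
    """Turn a lines value such as "9-15" (or a single "9") into a range."""
    start, _, end = lines_value.partition("-")
    first = int(start)
    last = int(end) if end else first
    return range(first, last + 1)  # assumes the end line is inclusive


# The qualified identifier from the specification example, rebuilt from parts.
EXAMPLE = qualify(
    "swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b",
    origin="https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git",
    visit="swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9",
    anchor="swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0",
    path="/Examples/SimpleFarm/simplefarm.ml",
    lines="9-15",
)
```

Keyword arguments keep their insertion order in Python 3.7 and later, so the qualifiers come out in the order they are passed.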
Standards
SWHID is an open standard licensed under the Community Specification License.[8]
SWHID was formalized as the international standard ISO/IEC 18670 in April 2025.[9]
Creation and history
The SoftWare Hash IDentifier was developed by Software Heritage, which began making its archive, with objects identified by SWHIDs, publicly available in 2018.[5]
As of 2020, about nine billion versions of pieces of software, termed "artefacts",[4] had been assigned SWHIDs.[5] SWHIDs are integrated with research repositories including HAL, Zenodo and the French Catalog of Academic Research Free Software.[10] Package managers can also use the identifier: GNU Guix, for example, uses SWHIDs to retrieve source code from the Software Heritage archive when it is no longer available at its original URL.[11]
The acronym SWHID originally referred to the "Software Heritage Identifiers" used to catalog software artifacts in the early days of the Software Heritage archive.[12] It later evolved into an open standard through a dedicated working group[13] and was published as ISO/IEC 18670 in April 2025 under the more general name "Software Hash Identifier".[14]
Télécom Paris welcomed the ISO standardization, describing it as a significant advance for global digital infrastructure that provides traceability for software affected by vulnerabilities.[15] UNESCO stated that SWHID supports the reproducibility and long-term accessibility of software.[16]
References
- ^ a b c Sabrina Granger; Baptiste Mélès; Frédéric Santos (15 November 2024), Préserver et rendre identifiables les logiciels de recherche avec Software Heritage [Preserving and identifying research software with Software Heritage] (in French), doi:10.46430/PHFR0034, Wikidata Q134581061, archived from the original on 26 May 2025
- ^ "Intrinsic and Extrinsic identifiers". Software Heritage. Retrieved 2025-05-24.
- ^ Roberto Di Cosmo; Morane Gruenpeter; Stefano Zacchiroli (1 September 2018), Identifiers for Digital Objects: the Case of Software Source Code Preservation (PDF), doi:10.17605/OSF.IO/KDE56, Wikidata Q105094730, archived (PDF) from the original on 26 May 2025
- ^ a b Axel Thévenet (26 September 2023), SWHID: Tracking past software for future humans, Wikidata Q134580517, archived from the original on 26 May 2025
- ^ a b c Le CNRS apporte son soutien à Software Heritage [The CNRS supports Software Heritage] (in French), French National Centre for Scientific Research, 25 November 2020, Wikidata Q134581205, archived from the original on 26 May 2025
- ^ "Release v3.0 of torvalds/linux repository". Software Heritage. Retrieved 2025-05-24.
- ^ "Qualified identifiers". swhid.org. Retrieved 2025-05-27.
- ^ "Copyright Section of SWHID Specification v1.2". Retrieved 2025-05-24.
- ^ "ISO/IEC 18670:2025". ISO. Retrieved 2025-05-24.
- ^ "About the site". French Catalog of Academic Research Free Software. Retrieved 2025-05-24.
- ^ "Identifying software". GNU Guix Blog. Retrieved 2025-05-27.
- ^ "SoftWare Hash IDentifier (SWHID)". Software Heritage. Retrieved 2025-05-24.
- ^ "SWHID working group". Retrieved 2025-05-24.
- ^ "ISO/IEC 18670:2025". ISO. Retrieved 2025-05-24.
- ^ Une avancée significative pour l'infrastructure numérique mondiale : La norme ISO/IEC 18670 est désormais officielle [A significant advance for global digital infrastructure: the ISO/IEC 18670 standard is now official] (in French), Télécom Paris, 20 May 2025, Wikidata Q134580605, archived from the original on 26 May 2025
- ^ Archiving open software as human heritage, UNESCO, 2023, Wikidata Q134581397, archived from the original on 26 May 2025
