The Antlion Site - Setting up Repositories

Setting Up Repositories

This document contains some notes and pointers on setting up a third-party repository. Don't expect a "how-to" guide here: these repositories vary widely depending on their content, the available resources, network access restrictions, and the culture of the shop owning the build.

If you're going to have a local file-based, control-managed, or intranet-based repository of library files, this document can give you some ideas on how to go about it. If you're going to use the Maven internet-based IBiblio repository, this document won't be of much help.

Finding a Home For Those Files

The first thing you'll need to do for your repository is find a home for its files. Here's a quick list of some ways they can be stored and accessed, along with some pros and cons:

Local file-based: the simplest approach is to put all the libraries on the local user's hard drive. This allows the fastest access time. However, as the number of libraries grows, so does the disk space used on each user's machine. In really large projects, or on systems with multiple projects, the user may end up storing many library files that they will never care about. The user, if so inclined, could also go and muck with the libraries without the build ever being the wiser, which leads to version control issues.

Shared network drive: all the libraries are stored on one central machine in the intranet, and all the developers point their repository root to this shared network drive. This avoids many pitfalls of the local file-based approach. The drive can be restricted to read-only access, which makes version control issues negligible, and the users don't need to worry about managing disk space. However, users will find this slower than the local file-based repository, and an administrator now needs to maintain the machine and keep the repository files up to date.

Network-based: similar to the shared network drive, but this uses a common internet protocol (such as FTP or HTTP) to retrieve the libraries. It has all the pluses and minuses of a shared network drive, but it can be used for internet as well as intranet access. Because the network lag in these cases is normally very large, the user will want to cache copies of the library files on an as-needed basis. Caching presents its own share of headaches: if a library is updated but its location isn't, the cache needs to discover that. The hard drive growth issues of the local file-based repository are also present, though they are less severe and easily fixed by cleaning the cache directory.

CM-based: control-managed ("CM" for short) repositories look just like local file-based repositories to the user, except that the list of libraries is pulled from the CM tool (e.g. CVS, Perforce, ClearCase, Subversion, MKS). The advantage is that libraries can be tracked with the code, are backed up with the code, and can be downloaded by users when they download the code. With the right set of scripts, you can also make a CM-based repository look like the shared drive and network-based repositories.
Future note: there's a feature being considered that will make a CM-based repository act like a network-based repository: only the required libraries will be pulled from the CM tool.
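
The caching concern raised for the network-based option can be sketched in Python. This is a minimal sketch under stated assumptions, not Antlion's actual cache: fetch and remote_stamp are hypothetical callables standing in for whatever the chosen protocol provides (for example a download and a checksum or last-modified lookup), and the ".stamp" sidecar file is an invented convention.

```python
import os
import tempfile


def cached_library(name, cache_dir, fetch, remote_stamp):
    """Return a local path for `name`, downloading only when needed.

    fetch(name) -> bytes and remote_stamp(name) -> str are hypothetical
    callables supplied by the caller. The stamp is stored next to the
    cached file so that a library updated in place is detected.
    """
    os.makedirs(cache_dir, exist_ok=True)
    lib_path = os.path.join(cache_dir, name)
    stamp_path = lib_path + ".stamp"

    stamp = remote_stamp(name)
    cached = None
    if os.path.exists(lib_path) and os.path.exists(stamp_path):
        with open(stamp_path) as f:
            cached = f.read()

    if cached != stamp:  # missing or stale: fetch again
        with open(lib_path, "wb") as f:
            f.write(fetch(name))
        with open(stamp_path, "w") as f:
            f.write(stamp)
    return lib_path


# Simulated remote repository: name -> (stamp, content).
remote = {"log4j-1.2.8.jar": ("v1", b"first")}
downloads = []


def fetch(name):
    downloads.append(name)
    return remote[name][1]


def remote_stamp(name):
    return remote[name][0]


cache = tempfile.mkdtemp()
cached_library("log4j-1.2.8.jar", cache, fetch, remote_stamp)
cached_library("log4j-1.2.8.jar", cache, fetch, remote_stamp)  # cache hit
remote["log4j-1.2.8.jar"] = ("v2", b"second")  # updated, same location
path = cached_library("log4j-1.2.8.jar", cache, fetch, remote_stamp)
print(len(downloads))  # 2
```

Only the first and third calls download anything. Cleaning the cache amounts to deleting the cache directory.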

For any approach that you take, any serious project will require backups of the repository. CM-based repositories help in this respect.

Structuring Your Repository

So, now you have a place for your libraries to live. Next, you need to decide how to put them in the repository. This section describes some options.
The Maven Hierarchy
Probably the easiest way to start is to use a well-known repository setup. Maven has its primary internet repository hosted at ibiblio. The Maven documentation describes its layout, which in Antlion format-string form is [groupid]/[type]s/[artifactid]-[version].[type]. This means that all versions of a group's jars sit inside one directory, while the individual files of a single release are split between directories. For example, if log4j releases an XML DTD file as well as the jar file, then you'd have a directory tree like this (excuse the ASCII art):
repository-root
  `- log4j
       +- jars
       |    +- log4j-1.2.7.jar
       |    `- log4j-1.2.8.jar
       `- dtds
            +- log4j-1.2.7.dtd
            `- log4j-1.2.8.dtd
This separates the files of each release from one another, and makes it hard to find everything that belongs to a release. It also generally requires the repository administrator to rename each artifact to conform to the repository layout, and to sort the artifacts into the correct buckets.
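
The layout pattern quoted above can be expanded mechanically. As a minimal Python sketch (the expand helper and its keyword names are illustrative assumptions, not Antlion's actual format-string implementation), each bracketed token is replaced by the matching attribute of the library:

```python
import re

# The layout pattern quoted above.
MAVEN_LAYOUT = "[groupid]/[type]s/[artifactid]-[version].[type]"


def expand(pattern, **fields):
    """Replace each [token] in pattern with the matching keyword argument.

    Hypothetical helper for illustration only; raises KeyError when a
    token has no value.
    """
    return re.sub(r"\[(\w+)\]", lambda m: fields[m.group(1)], pattern)


jar = expand(MAVEN_LAYOUT, groupid="log4j", artifactid="log4j",
             version="1.2.8", type="jar")
dtd = expand(MAVEN_LAYOUT, groupid="log4j", artifactid="log4j",
             version="1.2.8", type="dtd")
print(jar)  # log4j/jars/log4j-1.2.8.jar
print(dtd)  # log4j/dtds/log4j-1.2.8.dtd
```

The two files of release 1.2.8 land in different directories, and the DTD has to be renamed to carry the version number.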

For sites that require the software used to build the project, such as the application servers and the JDKs, to be kept in CM, this model breaks down, and an alternate method must be used for those artifacts. As another example, some sites include the expected version of Ant to build the project with, so that users do not need to tweak their Ant install. Since the Ant build scripts expect a very specific directory layout, these files cannot fit into this hierarchy.
By Version Hierarchy
An alternative to the Maven repository hierarchy is to organize the artifacts into per-release buckets.
repository-root
  `- log4j
       +- 1.2.7
       |    +- log4j-1.2.7.jar
       |    `- log4j.dtd
       `- 1.2.8
            +- log4j-1.2.8.jar
            `- log4j.dtd
This allows the repository maintainer to keep all the files related to a release in one tree. Even things like an entire app server can be stored correctly in this format. It also means that the original distribution's filenames can be kept, though this is optional, as it can lead to confusing policies.
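
As a minimal Python sketch of this layout (the helper names are illustrative assumptions, not part of Antlion), a path is built as group, then version, then the original filename, and everything belonging to one release can be recovered by grouping on the first two path components:

```python
import posixpath
from collections import defaultdict


def release_path(group, version, filename):
    """Build a by-version path: <group>/<version>/<filename>."""
    return posixpath.join(group, version, filename)


def files_by_release(paths):
    """Group by-version paths into {(group, version): [filenames]}."""
    releases = defaultdict(list)
    for path in paths:
        # maxsplit=2 keeps nested files (e.g. bin/ant) under their release.
        group, version, filename = path.split("/", 2)
        releases[(group, version)].append(filename)
    return dict(releases)


paths = [
    release_path("log4j", "1.2.7", "log4j-1.2.7.jar"),
    release_path("log4j", "1.2.7", "log4j.dtd"),
    release_path("log4j", "1.2.8", "log4j-1.2.8.jar"),
    release_path("log4j", "1.2.8", "log4j.dtd"),
]
print(files_by_release(paths)[("log4j", "1.2.8")])
# ['log4j-1.2.8.jar', 'log4j.dtd']
```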
Release And Version Hierarchy
To accommodate the need for both direct jar access and distribution access, the two layouts can be combined, giving a more easily maintained hierarchy at the cost of disk space. Keep in mind that, even though "disk is cheap", it's only cheap until you have to pay for it, especially in SCSI RAID configurations.
repository-root
  `- ant
       +- 1.5.1
       |    +- ant-1.5.1.jar
       |    +- ant-optional-1.5.1.jar
       |    `- distribution
       |         +- lib
       |         |    +- ant.jar
       |         |    `- optional.jar
       |         `- bin
       |              +- ant.bat
       |              `- ant
       `- 1.6.5
            +- ant-1.6.5.jar
            +- ant-nodeps-1.6.5.jar
            `- distribution
                 +- lib
                 |    +- ant.jar
                 |    `- ant-nodeps.jar
                 `- bin
                      +- ant.bat
                      `- ant
As with the by-version hierarchy, this allows the repository maintainer to keep all the files related to a release in one tree. Even things like an entire app server can be stored correctly in this format. It also means that the original distribution's filenames can be kept, though this is optional, as it can lead to confusing policies.

Locally Built Libraries

Antlion supports the <artifact> task so that module dependencies can better talk to each other. All these artifacts can be referenced by a single repository set that points to the module tree structure.

Inter Project Dependencies

Situations may arise where there are two separate projects, and one depends upon the other. Since these projects are designed to be built completely independently of one another, a scheme must be devised to discover the built artifacts.

Relying on the project being depended upon to keep the right built version around after its build can lead to incorrect builds. If that project's build failed, it might have either no generated artifacts or incorrect ones.

Putting all built artifacts inside a common location and using the "SNAPSHOT" terminology of Maven can lead to other errors as well. If the project being depended upon can be built from multiple branches, then there is no reliable way for the other project to get the right version.
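
The branch problem can be seen in a small Python sketch. The path pattern, the project name, and the branch names here are illustrative assumptions, not a layout prescribed by Maven or Antlion:

```python
def snapshot_path(project, artifact):
    """Shared-location path using a SNAPSHOT label (assumed pattern)."""
    return f"{project}/jars/{artifact}-SNAPSHOT.jar"


# Builds of two hypothetical branches of the same project publish to a
# common location, modelled here as a dict of path -> producing branch.
published = {}
for branch in ("release-1.x", "trunk"):
    published[snapshot_path("core", "core")] = branch

print(len(published))                            # 1
print(published["core/jars/core-SNAPSHOT.jar"])  # trunk
```

Both builds map to the same path, so the later one silently replaces the earlier one, and the consuming project cannot tell which branch it received.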

Also, there may be a need to publish these project artifacts, so that developers can merely access the most recent build for projects they aren't touching, rather than being forced to get the source and build the project.


Document version $Revision: 1.6 $ $Date: 2005/09/23 07:35:45 $