some notes on distribution

Work in progress here...


Presumably the starting point is to work with what we have. But I'm listing alternatives here just so I have them all to hand.

We're a Windows-only shop, right? Shame, because a lot of the free stuff out there is Unix-oriented.
Question: is this definite?

Question: are we going to be a big old compute cluster, or is all this stuff going to be shared across the org later on in a more grid-like fashion?

application-level tools

The following is a rough taxonomy, and lines are blurred - for instance, the GrADS project is attempting to develop a compiler to generate Grid-aware code that will select appropriate execution models at runtime.

programming models

application execution environments

  • Parameter Sweep Applications (independent subtasks)
  • Workflow Applications
  • Portals (Condor-GPUNCH)
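To make the parameter sweep case concrete, here is a minimal sketch of independent subtasks fanned out over a worker pool. The names `run_subtask` and `parameter_sweep` are made up for illustration, and the local thread pool merely stands in for whatever remote execution layer we end up with; a real subtask would launch a job binary rather than multiply two numbers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_subtask(params):
    """Stand-in for one independent job (hypothetical); no subtask talks to another."""
    rate, steps = params
    return rate * steps


def parameter_sweep(rates, step_counts, max_workers=4):
    """Run one subtask per parameter combination and key the results by parameters."""
    grid = list(product(rates, step_counts))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with the grid.
        results = list(pool.map(run_subtask, grid))
    return dict(zip(grid, results))
```

Because the subtasks share nothing, the only coordination needed is handing out parameters and collecting results, which is why this class of application is the easiest one to distribute.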

middleware of potential interest

timeline

1988: Condor starts.

1991: PVM version 2.0 released.

1993: PVM version 3.0 released.

1996: Globus starts. It eventually emerges as a de facto standard for Grid middleware.

1999: PVM version 3.4 released.

2000: Globus Toolkit 1 released

2001: The Global Grid Forum (GGF) is formed, and eventually becomes a focal point for standardisation via the Open Grid Services Architecture (OGSA). Globus is co-operating with the OGSA initiative.

2002: Globus Toolkit 2 released

2003: Globus Toolkit 3 released
2003: MPICH-NT released

2004: PVM version 3.4.5 released.

2005: Globus Toolkit 4 released
2005: Condor 6.7 released
2005: MPICH2 released

As far as major vendors go:

too many standards!

  • GGF (OGSA, OGSA-DAI, DRMAA, GridFTP, GridRPC, ...)
  • OASIS (WS-Resource Framework, WS-Security, WS-Transactions, ...)
  • W3C (SOAP, WSDL, ...)
  • EGA (Reference Model, Security Requirements)

architecture stuff

worktools to sort out sooner rather than later

Software:

  • Download ACE.
  • Install MediaWiki or something else with some structure!
  • Download CppUnit if we don't already have unit testing stuff

Bring:

  • stuff on ACE
  • papers on Condor etc

basic requirements

The first thing for me to understand is the problem(s) we are trying to solve, together with relevant context in the organization.

Stuff includes, but is not limited to:

  • who is paying for all this
  • history (how did we get here)
  • revisiting requirements in existing design docs
  • the universe of jobs across every relevant business area
  • the value and urgency of the various job types
  • other people in the organization who are looking at this kind of problem
  • other similar systems already in the organization (even if no longer used)

operating systems, getting updates out

Remote Desktop is not enough. We need to be able to upgrade everything across thousands of machines, no matter what kind of state they're in. This has got to be doable no matter what's running (or not running). The less of a custom solution here, the better - this surely must be something that's been solved elsewhere, and solved well. Can we reboot over the net off a stored image, and just switch the images - or something equivalently easy? See Leveraging HTC For UK eScience with Very Large Condor Pools, though we'll need something (a) larger and (b) more dedicated.

topology, scalability, etc

Do we want stuff on subnets in pools ("farmlets")? Probably. Event logging and job packages would then best be routed via the pool topology. How big should a pool be? Depends on your I/O.

Here's the Winsock FAQ Page on I/O Strategies. They recommend Overlapped I/O for maximum throughput (Winsock 2 only, not that that should be a problem). There's a code sample here that aims for 2000+ clients with only 4 threads on a dual processor box. That's more like it! As far as ACE goes, there is an article on artima that contrasts some possible approaches and recommends the ACE Proactor, since that's the one that can use Windows overlapped I/O (the article itself suggests an approach for writing a portable solution that can bridge the reactor and proactor approaches).
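For a feel of the "many clients, very few threads" model, here is a small sketch using Python's asyncio. It is neither the Winsock sample nor the ACE Proactor, just the same completion-driven idea in miniature: one thread services every connection. On Windows with Python 3.8 or later, asyncio's default event loop is a proactor loop built on I/O completion ports, so the same code would exercise overlapped I/O there. The function names and the `job-N` messages are invented for the example.

```python
import asyncio


async def handle_client(reader, writer):
    """Echo each line back until the client disconnects."""
    while True:
        line = await reader.readline()
        if not line:  # empty bytes means the peer closed the connection
            break
        writer.write(line)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def echo_once(host, port, message):
    """Connect, send one line, and return the echoed reply without the newline."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(message + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.rstrip(b"\n")


async def demo(n_clients=50):
    """Serve n_clients concurrent connections from a single thread."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]  # port 0 lets the OS pick a free one
    replies = await asyncio.gather(
        *(echo_once("127.0.0.1", port, b"job-%d" % i) for i in range(n_clients))
    )
    server.close()
    await server.wait_closed()
    return replies
```

Running `asyncio.run(demo(50))` returns the fifty echoed messages in order. Scaling that towards the 2000+ clients mentioned above is mostly a matter of socket limits and listen backlog, which is exactly the sort of thing we would want to measure rather than assume.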

Further reading:

job allocation and pool boundaries

Are we going to allocate jobs across pool boundaries, with some newly spawned process in charge of checking for completion? We'll get fragmentation problems a la heaps if so.
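A toy illustration of the fragmentation worry, with made-up pool names and slot counts: a job can fail to fit inside any single pool even though there is plenty of free capacity overall, and the obvious fix (splitting across pools) is what brings in the cross-pool completion tracking. Both functions are hypothetical helpers, not part of any existing scheduler.

```python
def place_within_pool(free_slots, job_size):
    """First fit: name of the first pool that can hold the whole job, else None."""
    for pool, free in free_slots.items():
        if free >= job_size:
            return pool
    return None


def place_across_pools(free_slots, job_size):
    """Greedy split, largest pools first: {pool: slots_taken}, or None if capacity is short."""
    if sum(free_slots.values()) < job_size:
        return None
    allocation, remaining = {}, job_size
    for pool, free in sorted(free_slots.items(), key=lambda kv: -kv[1]):
        take = min(free, remaining)
        if take:
            allocation[pool] = take
            remaining -= take
        if remaining == 0:
            break
    return allocation


free = {"A": 3, "B": 2, "C": 3}          # 8 free slots in total
print(place_within_pool(free, 6))        # None
print(place_across_pools(free, 6))       # {'A': 3, 'C': 3}
```

The second call succeeds only by straddling two pools, so something now has to watch both halves for completion, and pool B is left with slots that are too few for most jobs.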

distributed event logging

Good place to start is the ACE Logging Service, which is documented on the ACE Network Services page. Do we need anything more than that?

Can our topology handle whatever we want, or do we need to find another way to treat event messages - TCP? UDP? Tibco?
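To help weigh the UDP option, here is a minimal sketch of fire-and-forget event messages as JSON datagrams. This is not the ACE Logging Service protocol; the field names (`source`, `level`, `message`) and function names are assumptions for illustration only.

```python
import json
import socket


def make_receiver(host="127.0.0.1", port=0, timeout=2.0):
    """Bind a UDP socket for incoming events; port 0 lets the OS choose a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.settimeout(timeout)
    return sock


def send_event(address, source, level, message):
    """Send one event as a single datagram: no connection, no acknowledgement."""
    payload = json.dumps(
        {"source": source, "level": level, "message": message}
    ).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)


def receive_event(sock):
    """Block until one datagram arrives (or the timeout fires) and decode it."""
    data, _sender = sock.recvfrom(65535)
    return json.loads(data.decode("utf-8"))
```

The trade-off is the usual one: UDP keeps senders cheap and never blocks a worker on a slow collector, but datagrams can be dropped or reordered, so anything we must not lose (job completion, failures) probably wants TCP or a messaging product instead.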

deploying test job binaries/support files

It needs to be fast and simple to deploy test job binaries across many machines so we can check things out. This is as distinct from the real low-level stuff we have running all the time. Otherwise it'll be too hard for us to test different approaches.
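A rough sketch of what "fast and simple" could look like: copy one file to many target directories in parallel and verify each copy by checksum. The assumption here is that each machine exposes a writable directory (for example a network share); the function names are hypothetical and nothing below is tied to a particular deployment tool.

```python
import hashlib
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_of(path):
    """Hex digest of a file's contents, used to confirm each copy arrived intact."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def deploy_one(source, target_dir):
    """Copy source into target_dir and report whether the copy matches the original."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = Path(shutil.copy2(source, target_dir))
    return str(target_dir), sha256_of(copied) == sha256_of(source)


def deploy(source, target_dirs, max_workers=16):
    """Push source to every target in parallel; returns {target_dir: verified_ok}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda d: deploy_one(source, d), target_dirs))
```

Any target that maps to `False` (or raises) is a machine to look at by hand. A single-source push like this will not scale to thousands of machines, so a tree-shaped fan-out along the pool topology would be the obvious next step.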

high-speed networking

Getting technical. How much effort to put into this one up front? Or will it be a subject for experimentation? See the Supercomputing 2005 Bandwidth Challenge Results for a snapshot of current work in this area.

commercial price points

  • Data Synapse: "$50-100k for < 50 procs, 6-7 figures for larger" (here)
  • Data Synapse: apparently goes down to $750/node when over 500 nodes, and enterprise is $2m (mate of JD's)
  • Sun N1 Grid Engine: "$10k for < 50 procs, $150k for 10k procs, $600k for enterprise" (here)
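For a rough per-unit comparison, the quoted figures can be turned into cost per processor or node. The quotes mix "procs" and "nodes" and the small tiers are stated as "< 50", so evaluating them at exactly 50 is an assumption that gives the cheapest per-unit reading of those tiers. `cost_per_unit` is just a helper for this note.

```python
def cost_per_unit(total_usd, units):
    """Price divided by processor (or node) count."""
    return total_usd / units


# Sun N1 Grid Engine, from the quoted tiers
print(cost_per_unit(10_000, 50))        # 200.0 USD per proc at the 50-proc boundary
print(cost_per_unit(150_000, 10_000))   # 15.0 USD per proc at 10k procs

# Data Synapse, from the quoted tiers
print(cost_per_unit(50_000, 50))        # 1000.0 USD per proc, low end of the small tier
print(cost_per_unit(100_000, 50))       # 2000.0 USD per proc, high end of the small tier
print(750 * 500)                        # 375000 USD for 500 nodes at $750/node
```

On these numbers the gap between the two vendors is large at every size, though the second Data Synapse figure is second-hand and should be confirmed before relying on it.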

notes on various tools

Worth spending some time on the Windows Cluster Resource Centre at Southampton Uni.

Sadly, Globus appears to be Java or Unix C only (Globus Toolkit 4.0.1), though apparently there was a WinXP port attempt in 2004. The Java "CoG" (Commodity Grid) kit is probably the best option (download here).
Question: how do people feel about Java middleware?
Question: is Globus appropriate for us? It's aimed at the larger, more heterogeneous end of the grid family...

Condor has been around for ages, has thousands of users, and the originators have been running it on 1500+ workstations for years. Supported on WinXP/2000. However, Globus and PVM support isn't yet implemented on Windows (see here), and it's not open source (although it does have its own public license). Also, the job submission is coarse-grained (executable and all data for every job).

BSP is not a hive of industry; it seems pretty dead compared to the PVM-MPI coalition.

PVM is worth checking out. However, it needs extra stuff to kick off processes on remote machines: winrshd, ataman rshd or similar. Not expensive, e.g. winrshd is only 3,500 USD for a sitewide license.
Question: do we already have a reasonable remote process starter? Quite possibly so!

MPICH2 is also worth a look. Need to try it on a .net-enabled site. Don't know for sure how it creates the daemons, but one assumes an rsh-alike is required for this too. Note that the MPI site also contains mpich-g2, a globus2-enabled implementation of MPI.

Then there are commercial routes, most obviously data synapse.
Question: have license fee structures (eg per-processor? other?) been checked out?

Also, Microsoft's Windows Compute Cluster Server 2003: Beta 2, for the brave/foolhardy. 64-bit only, don't know if that's likely to be a problem.
