WORK IN PROGRESS | Services are in active development and are subject to change.

Charity Engine is "the crowdsourced cloud" – a high-throughput compute service running on a global network of volunteered devices.

Our distributed platform provides an ecosystem of compute, storage, and web crawling capabilities that can be used independently or as integrated services to support goals such as data discovery, processing, analysis, and other purposes.

Our integrated scheduler means there's no need to install and manage batch computing software or server clusters, allowing you to focus on data science rather than data infrastructure.

#high-throughput, #batch, #serverless, #distroless, #edge, #fog

Getting Started with Computing

From one-off compute jobs to huge batch workloads, get up and running fast by following our Quick-Start Guide for Computing, or continue reading below for further details.

Getting Started with Web Crawling

For high-throughput, globally distributed web crawling, get up and running fast via our Quick-Start Guide for Proxy; or read more in "Web Crawling + Proxy" below.

Computing

Overview. Compute jobs can range from a single small task to a massively parallel batch of jobs spanning hundreds of thousands of nodes. The integrated batch scheduler will match jobs to available resources and manage their execution: your input files and application container (*where relevant) are all that's needed.

Compute jobs are run by executing standard Docker images from Docker Hub, custom Docker images, or participating proprietary applications (for an additional fee).

Interfaces. A variety of interfaces are available; all are sufficient for running jobs or batches on the Charity Engine network, but each is tailored for specific use cases.

The graphical web interface makes manual submission of a few jobs easy and is especially useful for testing.
The Remote API allows programmatic integration with existing systems.
The Remote CLI is best suited for scripted efforts and massively-parallel execution with tools such as GNU Parallel.

Once jobs are submitted through any of these interfaces, the integrated batch scheduler will match jobs to available resources and manage their execution: your container and input files are all that's needed.

Applications. You will need your Docker image name/URL or a proprietary application name, your input files, and the command line to run within the execution environment. The command line can be as simple as running an application within a Docker image or running the first input file, or it can be a more involved script that handles the logic of your job.

Execution Environments. It's possible to run code in a range of languages directly on Charity Engine, by specifying the relevant execution environment.

Input+Output Files and Networking. All nodes have limited network access for retrieving input data via files submitted during job/batch creation and/or from internet resources available via HTTP(S).

Instance Types. Resources are provisioned in a familiar “instance type” system, where various compute nodes are made available in uniform sets of instances, differing in CPU capacity, memory and disk availability.

Getting Started. To start computing, obtain credentials to the system and submit jobs through one of the supported interfaces. Take a look at the Quick Start Guide for Computing, or read more below.

Interfaces

Remote API

This API is provided for integration of Charity Engine as a backend service, and summarizes functions for managing the entire lifecycle of jobs submitted to the network. See: Remote API

Remote CLI

The Remote CLI provides a means to run jobs on Charity Engine compute resources using simple command-line tools (e.g. custom scripts or tools such as GNU Parallel). This interface is geared toward running large batches of jobs, but can be used any time a command-line approach is more appropriate than a web API.

Standalone command-line interface: See: Remote CLI
GNU Parallel (using Parallel allows you to manage batches of work using just a single command line tool)

Wolfram Language / Wolfram Engine / Mathematica

Charity Engine resources can be accessed directly from within Wolfram Language. For details, see the Wolfram Research documentation.

Ethereum

Smart Contract
Dapp UI (*contact us for demo access)

Web

Charity Engine Dashboard

Custom Arrangements

Please contact us with any special needs that are not covered above.

Applications

Charity Engine supports several types of applications:

Docker images available on Docker Hub (e.g. docker:python:3.7)
Custom Docker images available anywhere online (e.g. docker:image-name https://example.com/file for 64-bit Docker images, or docker-x86:image-name https://example.com/file for 32-bit Docker images)
Applications deployed directly on the Charity Engine network, both open-source and proprietary (e.g. charityengine:wolframengine). For a list, see our App Library.

Each job should be a fully independent, self-contained unit of work. Structure your parameter space so that each job covers a distinct range — for example, a starting offset or task index passed as a command-line argument — and runs a predefined number of iterations calibrated to approximately one hour on an average CPU (e.g. a modern i5 or Ryzen 5).

Example: suppose your application searches a parameter space defined by a single integer offset N. Calibrate how many iterations complete in one hour, then assign each job a distinct starting point:

Job 1: process range starting at 0
myapp --start 0 --count 10000
Job 2: process range starting at 10000
myapp --start 10000 --count 10000
Job 3: ...
myapp --start 20000 --count 10000

The --count value (iterations per job) should be determined so the expected wall time is approximately one hour. If the workload is uneven across the parameter space, err on the side of shorter jobs and submit more of them.
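
The partitioning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: myapp and its --start/--count flags are the placeholder names from the example, and the trial figures (500 iterations in 180 seconds) and the total of 35000 are hypothetical values chosen for the demonstration.

```python
def calibrate_count(iterations_timed: int, seconds_elapsed: float,
                    target_seconds: float = 3600.0) -> int:
    """Scale a timed trial run up to the iterations that fit in the target wall time."""
    # Multiply before dividing to keep the arithmetic exact for round numbers.
    return max(1, int(iterations_timed * target_seconds / seconds_elapsed))


def job_command_lines(total: int, count: int, app: str = "myapp") -> list[str]:
    """Return one command line per job, covering [0, total) in chunks of `count`."""
    commands = []
    for start in range(0, total, count):
        # The last job may be shorter if `total` is not a multiple of `count`.
        this_count = min(count, total - start)
        commands.append(f"{app} --start {start} --count {this_count}")
    return commands


if __name__ == "__main__":
    # Hypothetical calibration: 500 iterations took 180 seconds on a test machine.
    count = calibrate_count(iterations_timed=500, seconds_elapsed=180.0)
    for line in job_command_lines(total=35000, count=count):
        print(line)
```

Each printed line is the command line for one independent job. If the trial machine is faster than an average volunteer CPU, a smaller target_seconds keeps jobs closer to the one-hour guideline.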

See the instructions for the appropriate interface (listed under "Interfaces", above) for further execution details.

Execution Environments

To run raw code on Charity Engine, upload your code and specify the relevant execution environment, either by using a "named" environment from our App Library or by specifying an appropriate container on Docker Hub or the web (see "Applications", above).

A simple example in Python would be to run a "hello world" command:

hello-world.py

print("hello world")

Upload and execute this file in Python by using the Remote CLI:

ce-cli --app "docker:python:slim" --commandline "python /local/input/hello-world.py > /local/output/hello.out" --inputfile hello-world.py --auth [...]

This produces an output file in the local directory named hello.out that contains the text string "hello world".

An example in Node.js could perform a simple sum of all numbers passed as parameters. Consider a script named calc-sum.js:

calc-sum.js

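const fs = require('node:fs');

// Sum every numeric argument; argv[0] is node and argv[1] is the script path.
let sum = 0;
for (let i = 2; i < process.argv.length; i++) {
  sum += Number(process.argv[i]);
}

// Write the result to the location where output files are collected.
try {
  fs.writeFileSync('/local/output/sum.out', sum.toString());
} catch (e) {
  console.log(e);
}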

Because the Node.js Docker container does not handle I/O redirects in the same way as the Python container, the script itself writes the output to the correct location within the container. The Remote CLI --commandline therefore specifies only the script and the numbers to include in the sum:

ce-cli --app "docker:node:slim" --commandline "node /local/input/calc-sum.js 1 2 3 4.2" --inputfile calc-sum.js --auth [...]

This produces an output file named "sum.out" in the local directory, which contains the sum of the numbers given on the command line.
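
When many such jobs are needed, the command lines can be generated rather than typed. The following Python sketch builds one ce-cli invocation per set of numbers, using only the flags that appear in the examples above (--app, --commandline, --inputfile). The helper names are hypothetical, and the authentication arguments are deliberately left out because they are elided in the examples; append them as described in the Remote CLI documentation.

```python
import shlex


def ce_cli_args(app: str, commandline: str, inputfile: str) -> list[str]:
    """Build a ce-cli argument list from the flags shown in the examples above.

    Authentication arguments are intentionally omitted (assumption: the caller
    adds them as documented for the Remote CLI).
    """
    return ["ce-cli", "--app", app, "--commandline", commandline, "--inputfile", inputfile]


def sum_job_commands(number_sets: list[list[float]]) -> list[str]:
    """Return one shell-quoted ce-cli command per set of numbers to add up."""
    commands = []
    for numbers in number_sets:
        script_args = " ".join(str(n) for n in numbers)
        args = ce_cli_args(
            app="docker:node:slim",
            commandline=f"node /local/input/calc-sum.js {script_args}",
            inputfile="calc-sum.js",
        )
        # shlex.join quotes the command line so it survives the shell as one argument.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for command in sum_job_commands([[1, 2, 3, 4.2], [10, 20]]):
        print(command)
```

The output is one command per line, a form that suits line-oriented tools such as GNU Parallel once the authentication arguments have been added.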

See Packaging as a Docker container for details on how to package your application into a self-sufficient Docker container.

See the instructions for the appropriate interface (listed under "Interfaces", above) for further execution details.

Input + Output Files

When the computation environment starts, the input files provided during job submission are downloaded from the internet and made available to applications on the compute nodes in the /local/input virtual disk location.

If the input files are marked as cacheable, which is the default behavior, they might get cached on the compute node (*see Interface documentation for details) and subsequent executions will skip the download and use the local cached files instead. Caching is performed based on the file URL. It should be assumed that files submitted to Charity Engine are immutable and that new URLs are required for any new version of a file to ensure consistent behavior throughout the network.
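
One way to guarantee a new URL for every new version of a file is to embed a hash of the file's content in its name. The Python sketch below illustrates the idea; the naming scheme, the helper names, and the https://example.com/inputs base URL are assumptions made for the illustration, not a Charity Engine requirement.

```python
import hashlib
from pathlib import Path


def content_digest(data: bytes, length: int = 12) -> str:
    """Return a short hex digest that changes whenever the file content changes."""
    return hashlib.sha256(data).hexdigest()[:length]


def versioned_name(filename: str, data: bytes) -> str:
    """Insert the content digest before the extension: params.csv -> params.<digest>.csv."""
    path = Path(filename)
    return f"{path.stem}.{content_digest(data)}{path.suffix}"


def versioned_url(base_url: str, filename: str, data: bytes) -> str:
    """Join a (hypothetical) hosting base URL with the versioned file name."""
    return f"{base_url.rstrip('/')}/{versioned_name(filename, data)}"


if __name__ == "__main__":
    old = versioned_url("https://example.com/inputs", "params.csv", b"alpha=1\n")
    new = versioned_url("https://example.com/inputs", "params.csv", b"alpha=2\n")
    print(old)
    print(new)  # differs from `old`, so a cached copy of the old file is not reused
```

Because identical content always maps to the same name, unchanged files keep their URL and continue to benefit from caching on the compute nodes.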

Computations are expected to write their output files to the /local/output virtual disk location. The output files themselves, or URLs to them, are then made available in the output file section of every interface.
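
As a minimal sketch of this convention, the following Python function reads every file found in the input location and writes one result file per input to the output location. The per-file line count is only a stand-in for real work, and the function name and the .count suffix are hypothetical.

```python
from pathlib import Path


def process_inputs(input_dir: str = "/local/input",
                   output_dir: str = "/local/output") -> list[Path]:
    """Write one `<name>.count` file per input file, containing its line count."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(Path(input_dir).iterdir()):
        if not source.is_file():
            continue
        # Read in binary mode so any input encoding is accepted.
        with source.open("rb") as handle:
            line_count = sum(1 for _ in handle)
        target = out / f"{source.name}.count"
        target.write_text(f"{line_count}\n")
        written.append(target)
    return written
```

Calling process_inputs() with no arguments uses the default locations described above; passing other directories makes the same script easy to test on a local machine before submission.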

Some applications have no option to change the location of their input/output files, or changing the locations is undesirable (for example, to minimize the testing required when such changes are introduced). It is, however, possible to use simple shell tools to move or copy files to their final locations, for example by specifying the application command line as follows:

containerized_app --param example; cp output_file /local/output/output_file
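
Where a shell one-liner becomes unwieldy (several output files, or a need to preserve the application's exit code), the same idea can be expressed as a small wrapper script. The Python sketch below is one possible shape, assuming a Python interpreter is available in the container; run_and_collect is a hypothetical helper name.

```python
import shutil
import subprocess
from pathlib import Path


def run_and_collect(command: list[str], produced: list[str],
                    output_dir: str = "/local/output") -> int:
    """Run `command`, then copy each file listed in `produced` into `output_dir`.

    Returns the application's exit code. Files are copied only if they exist,
    so a failed run still delivers whatever partial output was written.
    """
    result = subprocess.run(command)
    destination = Path(output_dir)
    destination.mkdir(parents=True, exist_ok=True)
    for name in produced:
        source = Path(name)
        if source.is_file():
            # copy2 keeps timestamps, which can help when inspecting results later.
            shutil.copy2(source, destination / source.name)
    return result.returncode
```

With the placeholder names from the command line above, the call would be run_and_collect(["containerized_app", "--param", "example"], ["output_file"]), and the wrapper can pass the returned code to sys.exit so that failures stay visible.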

Working with Large Datasets

Charity Engine's distributed storage feature harnesses our distributed network for large-scale datasets. Among other benefits, this (i) greatly speeds up search of large datasets, via parallelization, and (ii) allows computing to take place directly on the nodes that contain the relevant data. These features enable much more compute-intensive interaction with large datasets, and otherwise reduce delays associated with transferring data to compute resources.

Networking

Network access through HTTP (port 80) and HTTPS (port 443) protocols is allowed. The network speeds may vary based on the node location and the network/system load. Network latency is artificially throttled to several seconds per request, but multiple parallel requests are allowed and recommended.
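
Since each request carries several seconds of latency, issuing requests concurrently lets the delays overlap instead of adding up. The Python sketch below shows one way to do this with a thread pool from the standard library; the worker count, the timeout, and the function names are assumptions for illustration rather than recommended settings.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
from urllib.request import urlopen


def fetch(url: str, timeout: float = 60.0) -> bytes:
    """Download one URL over HTTP(S); the generous timeout allows for throttled latency."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls: list[str], max_workers: int = 8,
              fetch_one: Callable[[str], bytes] = fetch) -> dict[str, bytes]:
    """Fetch all URLs concurrently and return a mapping of URL to response body."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `urls`.
        bodies = list(pool.map(fetch_one, urls))
    return dict(zip(urls, bodies))
```

With eight workers, eight requests that each wait a few seconds complete in roughly the time of one. The fetch_one parameter exists so that the download function can be swapped out, for example for testing without network access.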

Each compute reservation receives a limited amount of free bandwidth. Once the allocated bandwidth is exhausted, any further network communication will incur additional fees.

Additionally, network requests can be routed through the Charity Engine Distributed Proxy Service (See "Web Crawling + Proxy", below).

Instance Types

We offer a range of instance types, for both CPU and GPU computing. For details, see the Instance Type documentation.

Custom arrangements may be possible; contact us if currently available instance types or feature-sets are not suitable for your workloads.

Web Crawling + Proxy

Data collection requests to any domain on the web can be routed through the Charity Engine Distributed Proxy Service, which uses the power and unique, geographically-dispersed nature of our network to originate requests from locations around the world.

Distributed Storage

Distributed Storage is a networked file storage system which can be used to host large quantities of data, with data persistently maintained and replicated across multiple devices distributed worldwide. Access to this network is possible using all of the Charity Engine interfaces such as the Remote API, Remote CLI, and Smart Contracts. [In development]

Support

Please contact us with any questions.
