Charity Engine is "the crowdsourced cloud" – a high-throughput compute service running on a global network of volunteered devices.
Our distributed platform provides an ecosystem of compute, storage, and web crawling capabilities that can be used independently or as integrated services to support data discovery, processing, and analysis.
Our integrated scheduler means there's no need to install and manage batch computing software or server clusters, allowing you to focus on data science rather than data infrastructure.
Getting Started with Computing
Getting Started with Web Crawling
Computing
Overview. Compute jobs can range from a single small task to a massively parallel batch of jobs spanning hundreds of thousands of nodes. The integrated batch scheduler will match jobs to available resources and manage their execution: your input files and application container (*where relevant) are all that's needed.
Compute jobs are run by executing standard Docker images from Docker Hub, custom Docker images hosted anywhere online, or participating proprietary applications (for an additional fee).
Interfaces. A variety of interfaces are available; all are sufficient for running jobs or batches on the Charity Engine network, but each is tailored for specific use cases.
- The graphical web interface makes manual submission of a few jobs easy and is especially useful for testing.
- The Remote API allows programmatic integration with existing systems.
- The Remote CLI is best suited for scripted efforts and massively-parallel execution with tools such as GNU Parallel.
Once jobs are submitted through any of these interfaces, the integrated batch scheduler matches them to available resources and manages their execution from there.
Applications. You will need your Docker image name/URL or a proprietary application name, your input files, and the command line to execute within the execution environment. The command line can be as simple as invoking an application inside a Docker image, a command that executes the first input file, or a more involved script that handles the logic of your job.
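For illustration, a minimal job might combine these three pieces as follows; the image tag is the Docker Hub example from section 1.2, while script.py is a hypothetical input file (the /local/input and /local/output paths are described under "Input + Output Files" below):

# Application:  docker:python:3.7  (public Docker Hub image)
# Input file:   script.py  (hypothetical; appears on the node as /local/input/script.py)
# Command line, executed inside the container:
python3 /local/input/script.py > /local/output/result.txt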
Execution Environments. It's possible to run code in a range of languages directly on Charity Engine, by specifying the relevant execution environment.
Input + Output Files and Networking. All nodes have limited network access for retrieving input data, either from files submitted during job/batch creation or from internet resources available via HTTP(S).
Instance Types. Resources are provisioned in a familiar “instance type” system, where various compute nodes are made available in uniform sets of instances, differing in CPU capacity, memory and disk availability.
Getting Started. To start computing, obtain credentials to the system and submit jobs through one of the supported interfaces. Take a look at the Quick Start Guide for Computing, or read more below.
Interfaces
Remote API
This API is provided for integrating Charity Engine as a backend service, and exposes functions for managing the entire lifecycle of jobs submitted to the network. See: Remote API
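As a rough sketch of the integration pattern only: the endpoint URL, header, and JSON fields below are illustrative assumptions, not the actual Remote API schema; consult the Remote API documentation for the real interface.

# Hypothetical endpoint and payload, for illustration only:
curl -X POST "https://api.example.com/v1/jobs" \
  -H "Authorization: Bearer $CE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"app": "docker:python:3.7", "commandline": "python3 /local/input/script.py"}'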
Remote CLI
The Remote CLI provides a means to run jobs on Charity Engine compute resources using simple command-line tools (e.g. custom scripts or tools such as GNU Parallel). This interface is geared toward running large batches of jobs, but can be used any time a command-line approach is more appropriate than a web API.
- Standalone command-line interface: See: Remote CLI
- GNU Parallel (using Parallel allows you to manage batches of work with just a single command-line tool; see the sketch below)
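A minimal sketch of that batching pattern follows; GNU Parallel itself is real, but the binary name charityengine-cli and its flags are hypothetical placeholders for whatever the Remote CLI actually provides:

# charityengine-cli, "run", and the flags are hypothetical placeholders;
# {} expands to each matched path and {/} to its basename.
parallel charityengine-cli run --app docker:python:3.7 \
  --input {} --commandline "python3 /local/input/{/}" ::: jobs/*.py

Here GNU Parallel fans out one submission per matching file, so a single command line manages the whole batch.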
Wolfram Language / Wolfram Engine / Mathematica
Charity Engine resources can be accessed directly from within Wolfram Language. For details, see Wolfram Research documentation here.
Ethereum
- Smart Contract
- Dapp UI (*contact us for demo access)
Web
[A web-based GUI is coming soon].
Custom Arrangements
Please contact us with any special needs not covered above.
Applications
Charity Engine supports several types of applications:
- Docker images available on Docker Hub (e.g. docker:python:3.7)
- Custom Docker images available anywhere online (e.g. docker:image-name https://example.com/file for 64-bit Docker images, or docker-x86:image-name https://example.com/file for 32-bit Docker images)
- Applications deployed directly on the Charity Engine network, both open-source and proprietary (e.g. charityengine:wolframengine). For a list, see our App Library.
See instructions on the appropriate interface (section 1.1) for further execution details.
Execution Environments
To run raw code on Charity Engine, just upload your code and specify the relevant execution environment, either using a "named" environment from our App Library or by specifying an appropriate container on Docker Hub or elsewhere on the web (see 1.2 "Applications", above).
Examples:
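Both specifier styles from section 1.2 apply here; for instance:
- docker:python:3.7 (a standard Docker Hub image providing a Python 3.7 environment)
- charityengine:wolframengine (a "named" environment from our App Library)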
See instructions on the appropriate interface (section 1.1) for further execution details.
Input + Output Files
Once the computation environment is running, the input files provided during job submission are downloaded from the internet and made available to applications on the compute nodes in the /local/input virtual disk location.
If the input files are marked as cacheable, which is the default behavior, they may be cached on the compute node (*see Interface documentation for details), and subsequent executions will skip the download and use the locally cached files instead. Caching is keyed on the file URL, so files submitted to Charity Engine should be treated as immutable: any new version of a file should be published at a new URL (e.g. dataset-v2.csv rather than overwriting dataset.csv in place) to ensure consistent behavior throughout the network.
The output files of computations are expected to be written to the /local/output virtual disk location. The URLs of the output files, or the output files themselves, are then made available in the output file section of each interface.
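For example, a job that compresses a submitted file under these conventions might use the following command line (data.csv is a hypothetical placeholder for an actual input file name):

# data.csv is hypothetical; paths follow the /local/input and /local/output conventions above
gzip -c /local/input/data.csv > /local/output/data.csv.gz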
Some applications may not offer options to change the locations of their input/output files, or changing those locations may be undesirable, for example to minimize the testing required when such changes are introduced. It is, however, possible to use simple shell tools to move or copy files to their final locations, for example by specifying the application command line as follows:
containerized_app --param example; cp output_file /local/output/output_file
Working with Large Datasets
Charity Engine's distributed storage feature harnesses our distributed network for large-scale datasets. Among other benefits, this (i) greatly speeds up search of large datasets via parallelization, and (ii) allows computing to take place directly on the nodes that contain the relevant data. These features enable much more compute-intensive interaction with large datasets, and otherwise reduce delays associated with transferring data to compute resources.
Networking
Network access through the HTTP (port 80) and HTTPS (port 443) protocols is allowed. Network speeds may vary based on node location and network/system load. Network latency is artificially throttled to several seconds per request, but multiple parallel requests are allowed and recommended.
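Given the per-request latency, overlapping transfers helps; a minimal sketch, assuming a submitted input file urls.txt (hypothetical) that lists one URL per line:

# Fetch up to 8 URLs concurrently to amortize the artificial per-request latency:
xargs -P 8 -n 1 curl -sO < /local/input/urls.txt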
Each compute reservation receives a limited amount of free bandwidth. Once the allocated bandwidth is exhausted, any further network communication will incur additional fees.
Additionally, network requests can be routed through the Charity Engine Distributed Proxy Service (See "Web Crawling + Proxy", below).
Instance Types
We offer a range of instance types, for both CPU and GPU computing. For details, see the Instance Type documentation.
Custom arrangements may be possible; contact us if currently available instance types or feature-sets are not suitable for your workloads.
Web Crawling + Proxy
Data collection requests to any domain on the web can be routed through the Charity Engine Distributed Proxy Service, which uses the power and unique, geographically dispersed nature of our network to originate requests from locations around the world.
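As a sketch only, assuming the service is exposed as a standard HTTP proxy endpoint; the proxy host and port are hypothetical placeholders, not the actual service address:

# proxy.example.com:8080 is a hypothetical address; see the proxy service documentation for real connection details
curl --proxy http://proxy.example.com:8080 https://example.com/target-page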
Distributed Storage
Distributed Storage is a networked file storage system that can host large quantities of data, with data persistently maintained and replicated across multiple devices distributed worldwide. It is accessible through all of the Charity Engine interfaces, such as the Remote API, Remote CLI, and Smart Contracts. [In development]
Support
Please contact us with any questions.