This repository has been archived by the owner on Oct 30, 2019. It is now read-only.

zakaf/glacier-cli

 
 

glacier-cli (modified version)

This tool provides a sysadmin-friendly command line interface to Amazon Glacier, turning Glacier into an easy-to-use storage backend. It automates tasks which would otherwise require a number of separate steps (job submission, polling for job completion and retrieving the results of jobs).

glacier-cli uses Amazon Glacier's archive description field to keep friendly archive names, although you can also address archives directly by their IDs. It keeps a local cache of archive IDs and their corresponding names, as well as housekeeping data to keep the cache up to date. This saves you time because you won't have to spend hours retrieving inventories, and saves you mental effort because you won't have to keep track of the obtuse archive IDs yourself.

glacier-cli is fully interoperable with other applications using the same Glacier vaults. It can deal gracefully with vaults changing from other machines and/or other applications, and introduces no new special formats from the point of view of a vault.

Example

$ glacier vault list

(empty result with zero exit status)

$ glacier vault create example-vault

(silently successful: like other Unix commands, only errors are noisy)

$ glacier vault list
example-vault

(this list is retrieved from Glacier; a relatively quick operation)

$ glacier archive list example-vault

(empty result with zero exit status; nothing is in our vault yet)

$ echo 42 > example-content
$ glacier archive upload example-vault example-content

(Glacier has now stored example-content in an archive with description example-content and in a vault called example-vault)

$ glacier archive list example-vault
example-content

(this happens instantly, since glacier-cli maintains a cached inventory)

$ rm example-content

(now the only place the content is stored is in Glacier)

$ glacier archive retrieve example-vault example-content
glacier: queued retrieval job for archive 'example-content'
$ glacier archive retrieve example-vault example-content
glacier: job still pending for archive 'example-content'
$ glacier job list
a/p 2012-09-19T21:41:35.238Z example-vault example-content
$ glacier archive retrieve --wait example-vault example-content

(...hours pass while Amazon retrieves the content...)

$ cat example-content
42

(content successfully retrieved from Glacier)

Costs

Before you use Amazon Glacier, you should make yourself familiar with how much it costs. Note that archive retrieval costs are complicated and may be a lot more than you expect. Because the pricing depends on so many factors, this program does not attempt to calculate costs. Instead, it tells you what your free retrieval allowance is and how much you have retrieved today. Note that the free retrieval allowance is based on the latest inventory synced to glacier-cli, so adding or deleting archives after that inventory was created can make the reported values inaccurate.

Installation

Check out the glacier branch of boto from GitHub (this branch is not released yet and is still under heavy development).

Create a symlink boto in the same directory as glacier.py to point to the boto directory in the glacier branch. Then you can run glacier.py directly, or symlink /usr/local/bin/glacier to it to make it generally available.

git clone -b glacier https://github.com/boto/boto.git
git clone https://github.com/basak/glacier-cli.git
ln -s ../boto/boto glacier-cli/boto

Then either, for all users:

sudo ln -s $PWD/glacier-cli/glacier.py /usr/local/bin/glacier

or for just yourself, if you have ~/bin in your path:

ln -s $PWD/glacier-cli/glacier.py ~/bin/glacier

Next, glacier-cli also requires two Python modules:

sudo easy_install iso8601
sudo easy_install sqlalchemy

If you haven't done so already, create a credential file for boto:

vim ~/.boto

In this file, include the following lines:

[Credentials]
aws_access_key_id =
aws_secret_access_key =
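As a quick sanity check before running any glacier command, the credential file can be parsed with Python's standard configparser module. This is only a sketch: the function name read_boto_credentials is made up for illustration, and it assumes the file uses exactly the [Credentials] section and the two key names shown above. It is not how boto itself loads the file.

```python
import configparser
import os


def read_boto_credentials(path="~/.boto"):
    """Return (access_key, secret_key) from a boto-style INI file.

    Hypothetical helper for illustration; boto has its own loader.
    """
    # Interpolation is disabled so that a '%' in a key is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(os.path.expanduser(path)) as handle:
        parser.read_file(handle)

    if not parser.has_section("Credentials"):
        raise ValueError("missing [Credentials] section")

    section = parser["Credentials"]
    access_key = section.get("aws_access_key_id", fallback="").strip()
    secret_key = section.get("aws_secret_access_key", fallback="").strip()
    if not access_key or not secret_key:
        raise ValueError("both keys must be set in [Credentials]")
    return access_key, secret_key
```

Calling read_boto_credentials() with no argument checks ~/.boto; a ValueError means the section or one of the two values is still missing.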

Commands

  • glacier --region region-name
  • glacier config create config-file
  • glacier config load config-file
  • glacier config change {region,free,allowance} new-value
  • glacier config download config-file #PROGRESS (probably once the config file is converted to an actual config file format)
  • glacier config stat
  • glacier vault list
  • glacier vault create vault-name
  • glacier vault delete vault-name
  • glacier vault sync [--wait] [--fix] [--max-age hours] vault-name
  • glacier archive list vault-name
  • glacier archive upload [--name archive-name] vault-name filename
  • glacier archive retrieve [--wait] [-o filename] [--multipart-size bytes] vault-name archive-name #PROGRESS
  • glacier archive retrieve [--wait] [--multipart-size bytes] vault-name archive-name [archive-name...]
  • glacier archive delete vault-name archive-name
  • glacier job list

Delayed Completion

An archive retrieval requires a job that takes some number of hours to complete. You have two options:

  1. If the command fails with a temporary failure—printed to stderr and with an exit status of EX_TEMPFAIL (75)—then a job is pending, and you must retry the command until it succeeds.
  2. If you prefer to just wait, then use --wait (or retry with --wait if you didn't use it the first time). This will just do everything and exit when it is done. Amazon Glacier jobs typically take around four hours to complete.
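The first option can be scripted. The sketch below re-runs a command for as long as it exits with EX_TEMPFAIL (75), the status documented above. The function name retry_until_done, the 15-minute default delay and the max_attempts parameter are illustrative assumptions, not part of glacier-cli. The run and sleep parameters exist only so the loop can be exercised without calling Glacier.

```python
import subprocess
import time

EX_TEMPFAIL = 75  # exit status this README documents for a pending job


def retry_until_done(argv, delay_seconds=900, max_attempts=None,
                     run=subprocess.call, sleep=time.sleep):
    """Re-run argv until it stops exiting with EX_TEMPFAIL.

    Returns the final exit status. If max_attempts is reached while the
    job is still pending, EX_TEMPFAIL is returned.
    """
    attempts = 0
    while True:
        status = run(argv)
        attempts += 1
        if status != EX_TEMPFAIL:
            return status  # success (0) or a permanent error
        if max_attempts is not None and attempts >= max_attempts:
            return status  # still pending, but we give up
        sleep(delay_seconds)
```

For example, retry_until_done(["glacier", "archive", "retrieve", "example-vault", "example-content"]) polls every 15 minutes until the archive has been downloaded or a non-temporary error occurs.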

Without --wait, glacier-cli will follow this logic:

  1. Look for a suitable existing archive retrieval job.
  2. If such a job exists and it is pending, then exit with a temporary failure.
  3. If such a job exists and it has finished, then retrieve the data and exit with success.
  4. Otherwise, submit a new job to retrieve the archive and exit with a temporary failure. Subsequent calls requesting the same archive will find this job and follow these same four steps with it, resulting in a downloaded archive when the job is complete.
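The four steps above can be expressed as a small decision function. This is a sketch of the logic as described here, not the code in glacier.py: the job dictionaries and their keys archive_id and completed are hypothetical stand-ins for whatever job objects the tool really uses. One assumption goes beyond the text: if several matching jobs exist, a finished one is preferred over a pending one.

```python
from enum import Enum


class Action(Enum):
    TEMPFAIL_PENDING = "job pending: exit with temporary failure"
    DOWNLOAD = "job finished: retrieve data and exit with success"
    SUBMIT_NEW = "no job: submit one and exit with temporary failure"


def next_action(jobs, archive_id):
    """Decide what a retrieve call without --wait should do.

    jobs: iterable of dicts with hypothetical keys 'archive_id' (str)
    and 'completed' (bool).
    """
    matching = [job for job in jobs if job["archive_id"] == archive_id]
    if any(job["completed"] for job in matching):
        return Action.DOWNLOAD           # step 3
    if matching:
        return Action.TEMPFAIL_PENDING   # step 2
    return Action.SUBMIT_NEW             # step 4
```

Because the function is stateless, calling it again after the job completes naturally moves from SUBMIT_NEW through TEMPFAIL_PENDING to DOWNLOAD, which mirrors the repeated invocations shown in the example session.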

Cache Reconstruction

glacier-cli follows the XDG Base Directory Specification and keeps its cache in ${XDG_CACHE_HOME:-$HOME/.cache}/glacier-cli/db.
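To locate the cache from a script, the shell expression above can be mirrored in Python. The function name cache_db_path is illustrative. Note that ${VAR:-default} falls back when the variable is unset or empty, which is why the sketch uses "or" rather than a plain dictionary default.

```python
import os


def cache_db_path(environ=None):
    """Resolve ${XDG_CACHE_HOME:-$HOME/.cache}/glacier-cli/db."""
    env = os.environ if environ is None else environ
    home = env.get("HOME") or os.path.expanduser("~")
    # An unset *or empty* XDG_CACHE_HOME falls back to $HOME/.cache.
    base = env.get("XDG_CACHE_HOME") or os.path.join(home, ".cache")
    return os.path.join(base, "glacier-cli", "db")
```

Passing an explicit dictionary, such as cache_db_path({"HOME": "/home/alice"}), makes it easy to check the result without touching the real environment.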

After a disaster, or if you have modified a vault from another machine, you can reconstruct your cache by running:

$ glacier vault sync example-vault

This will set off an inventory job if required. This command is subject to delayed completion semantics as above but will also respond to --wait as needed.

By default, existing inventory jobs that completed more than 24 hours ago are ignored, since they may be out of date. You can override this with --max-age=hours. Specify --max-age=0 to force a new inventory job request.
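The age rule can be illustrated with a short filter over job completion times. This is a sketch under assumptions: the function name usable_inventory_jobs and the idea of passing plain datetime values are made up for illustration, and the explicit check for zero simply encodes the statement above that --max-age=0 always forces a new job.

```python
from datetime import datetime, timedelta, timezone


def usable_inventory_jobs(completion_times, max_age_hours=24, now=None):
    """Return the completion times that are still recent enough to use.

    completion_times: timezone-aware datetimes of finished inventory jobs.
    """
    if max_age_hours <= 0:
        return []  # nothing is acceptable, so a new job must be requested
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    # Jobs that completed more than max_age_hours ago are ignored.
    return [t for t in completion_times if t >= cutoff]
```

An empty result corresponds to the case where vault sync has to set off a fresh inventory job.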

Note that there is a lag between creation or deletion of an archive and the archive's corresponding appearance or disappearance in a subsequent inventory, since Amazon only periodically regenerates vault inventories. glacier-cli will show you newer information if it knows about it, but if you perform vault operations that do not update the cache (e.g. on another machine, as another user, or from another program), then updates may take a while to show up here. You will need to run a vault sync operation after Amazon has updated your vault's inventory, which could be a good day or two after the operation took place.

If something doesn't go as expected (e.g. an archive that glacier-cli knows it created fails to appear in the inventory after a couple of days, or an archive disappears from the inventory after it showed up there), then vault sync will warn you about it. You can use --fix to accept the correction and update the cache to match the official inventory.
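Conceptually, the comparison behind these warnings is a set difference between what the cache holds and what the official inventory reports. The sketch below is a simplification: the function name and the returned keys are hypothetical, and it ignores the timing rules (such as waiting a couple of days before treating a missing archive as a problem) that the real sync applies.

```python
def inventory_discrepancies(cached_ids, inventory_ids):
    """Compare archive IDs in the local cache with an official inventory."""
    cached, official = set(cached_ids), set(inventory_ids)
    return {
        # Known locally but absent from Amazon's inventory.
        "missing_from_inventory": sorted(cached - official),
        # Present in Amazon's inventory but unknown locally.
        "missing_from_cache": sorted(official - cached),
    }
```

Accepting a correction, in the sense of --fix, would then mean making the cache contain exactly the official set.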

Addressing Archives

Normally, you can just address an archive by its name (which, from Amazon's perspective, is the Glacier archive description).

However, you may end up with multiple archives with the same name, or archives with no name, since Amazon allows this. In this case, you can refer to an archive by ID instead by prefixing your reference with id:.

To avoid ambiguity, prefixing a reference with name: works as you would expect. If you end up with archive names or IDs that start with name: or id:, then you must use a prefix to disambiguate.
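The prefix rules can be summarized in a few lines. This is an illustrative sketch, not the parser used by glacier.py, and the function name parse_archive_ref is an assumption. Only the first prefix is stripped, which is what lets a name that itself begins with id: or name: be addressed by adding an explicit prefix in front of it.

```python
def parse_archive_ref(ref):
    """Split an archive reference into (kind, value).

    kind is 'id' or 'name'; an unprefixed reference is treated as a name.
    """
    for kind in ("id", "name"):
        prefix = kind + ":"
        if ref.startswith(prefix):
            # Strip only the first prefix; the rest is taken literally.
            return kind, ref[len(prefix):]
    return "name", ref
```

For instance, an archive literally named id:backup would be addressed as name:id:backup, which this sketch parses as the name id:backup.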

Using Pipes

Use glacier archive upload <vault> --name=<name> - to upload data from standard input. In this case you must use --name to name your archive correctly.

Use glacier archive retrieve <vault> <name> -o- to download data to standard output. glacier-cli will not output any data to standard output apart from the archive data in order to prevent corrupting the output data stream.

Contact

Progress

  • Having looked into the pricing model of Amazon Glacier, it seems impossible to calculate even an estimate of the cost. The factors that make it almost impossible are the "peak billable rate" and timing.
  • When the user requests a retrieval, the program now displays how much data has been retrieved today and how much data this request will retrieve. Whether the retrieval proceeds depends on the config's only_free row: the only case in which the retrieval won't run is when the retrieval is not free and only_free is set to yes.
  • The config stat command now gives basic stats.

About

Command-line interface to Amazon Glacier


Languages

  • Python 100.0%