onsdag den 23. marts 2016

irods 2 figshare integration.

Post :  https://groups.google.com/forum/#!topic/irod-chat/bSB89t1Q2pc

We have been doing a iRODS to Figshare PoC py plugin.
Feel free to test it ;-)   

We are missing  : Use the script from the iDROP-Web and maybe have a figshare-button or menu thing.  (any got that working ? let me know! )


irods_figshare playground for irods figshare integration.
irods_to_figshare.py script is a PoC for copying files from iRODS to figshare via the figshare API.

Figshare access token
Create a figshare account, login and then access the Apllications section. There you can generate a Personal Access token that can be used to access the figshare API.
Required configuration
Check out irods_figshare.conf.json for some sample values.
Running the script

Requires python-irodsclient available here. 
Make sure that the file passed as argument to the script exists in iRODS and that it has associated the following metadata fields: title, description, tags. The script will search for and use these in order to create the figshare article.

Execute the script: python irods_to_figshare.py figshare_test.txt

Code :
file : irods_figshare.conf.json
{
"irods": {
"host": "localhost",
"port": 1247,
"user": "irods",
"passwd": "irods",
"zone": "tempZone",
"collection": "/tempZone/home/irods"
},
"figshare": {
"access_token": "6c5e3a8beb74a4e817f8994525d3af60e3ac2e21cbd70a54eabdcf9e925d9672d7a31364cf7233a957",
"api_endpoint": "https://api.figshare.com/v2"
}
}

----  CUT -----
file :  irods_to_figshare.py

import argparse
import requests
import json
import hashlib
import os

from irods.session import iRODSSession



#load config
config = json.loads(open('irods_figshare.conf.json').read())

#argument parser
arg_parser = argparse.ArgumentParser()
arg_parser.add_argument('file', help="iRODS file to work on")
args = arg_parser.parse_args()

#irods stuff
irods_session = iRODSSession(host=config['irods']['host'],
     port=config['irods']['port'],
                             user=config['irods']['user'],
                             password=str(config['irods']['passwd']),
                             zone=config['irods']['zone'])

irods_collection = irods_session.collections.get(config['irods']['collection'])

#figshare stuff
access_token = config['figshare']['access_token']
api_endpoint = config['figshare']['api_endpoint']

#request session
http_session = requests.Session()
api_headers = {
        "Authorization": "token {}".format(access_token),
        "Content-Type": "application/json"
}
http_session.headers.update(api_headers)

#lets get the file and associated metadata
file_path = os.path.join(config['irods']['collection'], args.file)
file_obj = irods_session.data_objects.get(file_path)
file_metadata = file_obj.metadata.items()

#create article
article_metadata = {
"title": "",
"description": "",
"tags": ""
}

for meta in file_metadata:
    name = meta.name.lower()
    if name == "title":
        article_metadata["title"] = meta.value
    elif name == "description":
        article_metadata["description"] = meta.value
    elif name == "tags":
        article_metadata["tags"] = meta.value.split(',')
    else:
        continue

article_create_endpoint = "{}/account/articles".format(api_endpoint)
request = http_session.post(article_create_endpoint, data=json.dumps(article_metadata), headers=api_headers)
article_id = request.headers["Location"].split('/')[-1]

#Add file to article - upload file
##initiate upload

m = hashlib.md5()
with file_obj.open('r') as fd:
for line in fd:
    m.update(line)
fd.seek(0, 0)
md5sum = m.hexdigest()

file_data = {
"name": file_obj.name,
"size": file_obj.size,
"md5": md5sum
}

create_file_endpoint = "{}/account/articles/{}/files".format(api_endpoint, article_id)
request = http_session.post(create_file_endpoint, data=json.dumps(file_data))
file_url = '{location}'.format(**request.json())
file_id = file_url.split('/')[-1]

file_info = http_session.get(file_url).json()
file_parts = http_session.get(file_info["upload_url"]).json()["parts"]

with file_obj.open('r') as fd:
    for part in file_parts:
        size = part['endOffset'] - part['startOffset'] + 1
        address = '{}/{}'.format(file_info['upload_url'], part['partNo'])
        http_session.put(address, data=fd.read(size))

#complete file
file_completed = http_session.post(file_url)

print "Done!"
print "Check out the article at {}/{}?access_token={}".format(article_create_endpoint, article_id, access_token)
print "Check out the file at {}/{}/files/{}?access_token={}".format(article_create_endpoint, article_id, file_id, access_token)