downloader

This module provides several functions for downloading files from given URLs.

Functions                 Description
download_data()           Download a single file from a given URL.
download_datas()          Sequentially download multiple files from given URLs.
async_download_datas()    Asynchronously download multiple files from given URLs.
mp_download_datas()       Download multiple files from given URLs using multiprocessing.

Functions

downloader.download_data(url, folder=None, file_name=None, client=None, engine='requests', follow_redirects=True, retry=0, authorize_from_browser=False)

Download a single file.

Parameters:

url: str

URL of the web file.

folder: str

The folder in which to store output files. Defaults to the current folder.

file_name: str

The file name. If None, it is parsed from the web response or the URL. If folder is None, file_name can be an absolute path.

client: requests.Session for the requests engine, or httpx.Client for the httpx engine

A client that maintains the connection. Default is None.

engine: one of ["requests", "httpx"]

The engine used for downloading. Default is "requests".

follow_redirects: bool

Whether to follow HTTP redirects. Default is True.

retry: int

Number of reconnection attempts when the server responds with status code 503. Default is 0.

authorize_from_browser: bool

Whether to load the cookies used by your web browser for authorization. This lets you download data with Python after logging in to the website in your browser. Supported browsers so far: Chrome, Firefox, Opera, Edge, and Chromium. This is useful when a website does not support HTTP Basic Auth. Default is False.
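The retry parameter controls reconnection when the server answers 503 (Service Unavailable). As a hedged sketch, assuming retry counts extra attempts made only after a 503 response, the logic looks roughly like this. fetch_with_retry, its fetch callable, and the delay parameter are hypothetical and are not part of data_downloader.

```python
import time
from typing import Callable


def fetch_with_retry(fetch: Callable[[], int], retry: int = 0, delay: float = 0.0) -> int:
    """Call fetch() and reconnect up to `retry` times while it returns 503.

    `fetch` is a hypothetical stand-in for one HTTP request that returns the
    response's status code; the status of the last attempt is returned.
    """
    status = fetch()
    attempts_left = retry
    # Only 503 (Service Unavailable) triggers a reconnection; any other status,
    # success or failure, is returned immediately.
    while status == 503 and attempts_left > 0:
        attempts_left -= 1
        time.sleep(delay)  # optional pause before reconnecting
        status = fetch()
    return status


# Simulated server: two 503 responses, then 200.
responses = iter([503, 503, 200])
print(fetch_with_retry(lambda: next(responses), retry=2))  # 200
```

With retry=0 (the documented default) a single 503 is returned to the caller without any reconnection.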

downloader.download_datas(urls, folder=None, file_names=None, engine='requests', authorize_from_browser=False, desc='')

Download data from a list-like object containing URLs. Files are downloaded one at a time.

Parameters:

urls: iterable

An iterable containing the URLs.

folder: str

The folder in which to store output files. Defaults to the current folder.

engine: one of ["requests", "httpx"]

The engine used for downloading. Default is "requests".

file_names: iterable

An iterable containing the file names. Leave it as None to have the names parsed from the website. If folder is None, file_names can contain absolute paths.

authorize_from_browser: bool

Whether to load the cookies used by your web browser for authorization. This lets you download data with Python after logging in to the website in your browser. Supported browsers so far: Chrome, Firefox, Opera, Edge, and Chromium. This is useful when a website does not support HTTP Basic Auth. Default is False.

desc: str

A description of the data being downloaded.
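When file_names (or file_name) is None, the name is parsed from the web response or the URL. The sketch below shows one plausible way to do that, assuming the "web response" part means the Content-Disposition header; the library may use other cues. guess_file_name and resolve_output_path are hypothetical helpers, not data_downloader functions.

```python
import posixpath
from email.message import Message
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlsplit


def guess_file_name(url: str, content_disposition: Optional[str] = None) -> str:
    """Prefer the filename in a Content-Disposition header, else use the URL."""
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # Keep only the last path component so a header cannot point
            # the output outside the target folder.
            return posixpath.basename(name)
    # Fall back to the last segment of the URL path, percent-decoded.
    return unquote(posixpath.basename(urlsplit(url).path))


def resolve_output_path(
    url: str,
    folder: Optional[str] = None,
    file_name: Optional[str] = None,
    content_disposition: Optional[str] = None,
) -> Path:
    """Combine folder and file name the way the parameters above describe."""
    name = file_name or guess_file_name(url, content_disposition)
    if folder is None:
        # No folder: the name may be an absolute path; a relative one
        # resolves against the current working directory.
        return Path(name)
    return Path(folder) / name


url = ("http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/"
       "106D_05049_131313/interferograms/20141117_20141211/20141117_20141211.geo.unw.tif")
print(resolve_output_path(url, folder="data").name)  # 20141117_20141211.geo.unw.tif
```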

Examples:

>>> from data_downloader import downloader

Specify the URLs and the folder:

>>> urls = ['http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_20141211.geo.unw.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150221/20141024_20150221.geo.unw.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.cc.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.unw.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141211_20150128/20141211_20150128.geo.cc.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150317/20141117_20150317.geo.cc.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150221/20141117_20150221.geo.cc.tif']
>>> folder = r'D:\data'

Download the data from the URLs and store the files in the folder:

>>> downloader.download_datas(urls, folder)

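download_datas downloads files one by one. The loop below is a minimal sketch of that behaviour, assuming each URL is paired with its entry in file_names, or with None when file_names is None. sequential_download and fake_download are hypothetical names, not part of data_downloader.

```python
from itertools import repeat
from typing import Callable, Iterable, List, Optional


def sequential_download(
    urls: Iterable[str],
    download_one: Callable[[str, Optional[str]], str],
    file_names: Optional[Iterable[str]] = None,
) -> List[str]:
    """Call download_one for each URL in turn and collect the results."""
    # Without file_names, pair every URL with None so the name is parsed later.
    names = repeat(None) if file_names is None else file_names
    results = []
    for url, name in zip(urls, names):
        # Each call returns before the next one starts: one file at a time.
        results.append(download_one(url, name))
    return results


def fake_download(url: str, name: Optional[str]) -> str:
    """Hypothetical stand-in that returns the name the file would be saved as."""
    return name or url.rsplit("/", 1)[-1]


urls = ["http://example.com/a.tif", "http://example.com/b.tif"]
print(sequential_download(urls, fake_download))              # ['a.tif', 'b.tif']
print(sequential_download(urls, fake_download, ["x", "y"]))  # ['x', 'y']
```

Because zip stops at the shorter input, a real implementation would likely check that urls and file_names have the same length.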
downloader.async_download_datas(urls, folder=None, file_names=None, limit=30, desc='', follow_redirects=True, retry=0, authorize_from_browser=False)

Download multiple files simultaneously.

Parameters:

urls: iterable

An iterable containing the URLs.

folder: str

The folder in which to store output files. Defaults to the current folder.

authorize_from_browser: bool

Whether to load the cookies used by your web browser for authorization. This lets you download data with Python after logging in to the website in your browser. Supported browsers so far: Chrome, Firefox, Opera, Edge, and Chromium. This is useful when a website does not support HTTP Basic Auth. Default is False.

file_names: iterable

An iterable containing the file names. Leave it as None to have the names parsed from the website. If folder is None, file_names can contain absolute paths.

limit: int

The maximum number of files downloaded simultaneously. Default is 30.

desc: str

A description of the data being downloaded.

follow_redirects: bool

Whether to follow HTTP redirects. Default is True.

retry: int

Number of reconnection attempts when the server responds with status code 503. Default is 0.

Examples:

>>> from data_downloader import downloader

Specify the URLs and the folder:

>>> urls = ['http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_20141211.geo.unw.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150221/20141024_20150221.geo.unw.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.cc.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.unw.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141211_20150128/20141211_20150128.geo.cc.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150317/20141117_20150317.geo.cc.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150221/20141117_20150221.geo.cc.tif']
>>> folder = r'D:\data'

Download the data from the URLs and store the files in the folder:

>>> downloader.async_download_datas(urls, folder, desc='interferograms')

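The limit parameter caps how many files are downloaded at once. A common way to enforce such a cap in asyncio is a semaphore; the sketch below assumes that approach, which may differ from the library's internals. limited_gather and fake_download are hypothetical names, and fake_download only sleeps instead of making a request.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def limited_gather(
    urls: Iterable[str],
    download_one: Callable[[str], Awaitable[str]],
    limit: int = 30,
) -> List[str]:
    """Await download_one(url) for every URL, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url: str) -> str:
        # A task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await download_one(url)

    # gather returns results in the same order as the input URLs.
    return list(await asyncio.gather(*(worker(u) for u in urls)))


# Usage with a fake downloader that records how many calls overlap.
stats = {"in_flight": 0, "peak": 0}


async def fake_download(url: str) -> str:
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # stand-in for network I/O
    stats["in_flight"] -= 1
    return url.rsplit("/", 1)[-1]


urls = [f"http://example.com/{i}.tif" for i in range(10)]
names = asyncio.run(limited_gather(urls, fake_download, limit=3))
print(names[:3])      # ['0.tif', '1.tif', '2.tif']
print(stats["peak"])  # never more than 3
```

All ten tasks are created up front, but the semaphore keeps only three of them inside fake_download at any moment.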
downloader.mp_download_datas(urls, folder=None, file_names=None, ncore=None, desc='', follow_redirects=True, retry=0, engine='requests', authorize_from_browser=False)

Download data from a list-like object containing URLs. Multiple files are downloaded simultaneously using multiprocessing.

Parameters:

urls: iterable

An iterable containing the URLs.

folder: str

The folder in which to store output files. Defaults to the current folder.

engine: one of ["requests", "httpx"]

The engine used for downloading. Default is "requests".

file_names: iterable

An iterable containing the file names. Leave it as None to have the names parsed from the website. If folder is None, file_names can contain absolute paths.

ncore: int

Number of cores used for parallel processing. If None, the number returned by os.cpu_count() is used. Default is None.

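desc: str

A description of the data being downloaded.

follow_redirects: bool

Whether to follow HTTP redirects. Default is True.

retry: int

Number of reconnection attempts when the server responds with status code 503. Default is 0.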

authorize_from_browser: bool

Whether to load the cookies used by your web browser for authorization. This lets you download data with Python after logging in to the website in your browser. Supported browsers so far: Chrome, Firefox, Opera, Edge, and Chromium. This is useful when a website does not support HTTP Basic Auth. Default is False.

Examples:

>>> from data_downloader import downloader

Specify the URLs and the folder:

>>> urls = ['http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_20141211.geo.unw.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150221/20141024_20150221.geo.unw.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.cc.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.unw.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141211_20150128/20141211_20150128.geo.cc.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150317/20141117_20150317.geo.cc.tif',
...         'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150221/20141117_20150221.geo.cc.tif']
>>> folder = r'D:\data'

Download the data from the URLs and store the files in the folder:

>>> downloader.mp_download_datas(urls, folder)
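The ncore default described above can be illustrated with a process pool. This is a sketch only, not the library's code: it assumes the downloads are spread over a multiprocessing.Pool with ncore worker processes. resolve_ncore, fake_download, and mp_download are hypothetical names, and fake_download only derives a file name instead of downloading.

```python
import os
from multiprocessing import Pool
from typing import Iterable, List, Optional


def resolve_ncore(ncore: Optional[int] = None) -> int:
    """Apply the documented default: ncore=None means os.cpu_count()."""
    if ncore is None:
        # os.cpu_count() may itself return None; fall back to one process.
        return os.cpu_count() or 1
    if ncore < 1:
        raise ValueError("ncore must be a positive integer")
    return ncore


def fake_download(url: str) -> str:
    """Hypothetical stand-in for downloading one URL. It must be a top-level
    function so that worker processes can unpickle it."""
    return url.rsplit("/", 1)[-1]


def mp_download(urls: Iterable[str], ncore: Optional[int] = None) -> List[str]:
    """Spread the URLs over a pool of worker processes."""
    with Pool(processes=resolve_ncore(ncore)) as pool:
        # map blocks until every URL is handled and keeps the input order.
        return pool.map(fake_download, list(urls))
```

To try it, call mp_download(urls, ncore=2) from under an `if __name__ == "__main__":` guard in a script; the guard is required where worker processes are started with the spawn method, which is the default on Windows and macOS.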