downloader
This module provides several functions for downloading files from given URLs.
| Functions | Description |
|---|---|
| download_data | download a single file from a given url |
| download_datas | sequentially download multiple files from given urls |
| async_download_datas | asynchronously download multiple files from given urls |
| mp_download_datas | download multiple files from given urls using multiprocessing |
Functions
- downloader.download_data(url, folder=None, file_name=None, client=None, engine='requests', follow_redirects=True, retry=0, authorize_from_browser=False)
Download a single file.
Parameters:
- url: str
URL of the file to download
- folder: str
the folder to store output files. Default current folder.
- file_name: str
the file name. If None, will parse from web response or url. file_name can be the absolute path if folder is None.
- client: requests.Session() for requests engine or httpx.Client() for httpx engine
client maintaining connection. Default None
- engine: one of ["requests", "httpx"]
engine for downloading
- follow_redirects: bool
Enables or disables HTTP redirects
- retry: int
number of reconnection attempts when the status code is 503
- authorize_from_browser: bool
Whether to load the cookies used by your web browser for authorization. This lets Python download data from a website you have already logged in to via the browser (browsers supported so far: Chrome, Firefox, Opera, Edge, Chromium). It is especially useful when the website doesn't support HTTP Basic Auth. Default is False.
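The retry parameter can be pictured with a small standalone sketch. This is not the library's implementation: `fetch_with_retry`, the `fetch` callable, and the `wait` argument are hypothetical names used only for illustration, and the sketch assumes that a 503 response is the only condition that triggers another attempt, as the parameter description suggests.

```python
import time


def fetch_with_retry(fetch, url, retry=0, wait=0.0):
    """Call fetch(url); while it answers 503, try again up to `retry` more times.

    `fetch` is a hypothetical callable returning (status_code, body).
    Returns (status_code, body, number_of_retries_used).
    """
    retries_used = 0
    while True:
        status, body = fetch(url)
        # Stop on any non-503 answer, or when the retry budget is spent.
        if status != 503 or retries_used >= retry:
            return status, body, retries_used
        retries_used += 1
        time.sleep(wait)  # optional pause before reconnecting


if __name__ == "__main__":
    # A stand-in server that is unavailable twice, then succeeds.
    answers = iter([(503, b""), (503, b""), (200, b"payload")])
    print(fetch_with_retry(lambda url: next(answers), "http://example.com/file", retry=3))
```

With `retry=0` (the documented default) the first 503 is returned as is, which matches the idea that no reconnection is attempted unless asked for.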
- downloader.download_datas(urls, folder=None, file_names=None, engine='requests', authorize_from_browser=False, desc='')
Download data from a list-like object containing URLs. This function downloads the files one by one.
Parameters:
- urls: iterator
iterator containing the urls
- folder: str
the folder to store output files. Default current folder.
- engine: one of ["requests", "httpx"]
engine for downloading
- file_names: iterator
iterator containing the file names. Leave it as None if you want the program to parse them from the website. file_names can contain absolute paths if folder is None.
- authorize_from_browser: bool
Whether to load the cookies used by your web browser for authorization. This lets Python download data from a website you have already logged in to via the browser (browsers supported so far: Chrome, Firefox, Opera, Edge, Chromium). It is especially useful when the website doesn't support HTTP Basic Auth. Default is False.
- desc: str
description of data downloading
Examples:
>>> from data_downloader import downloader
specify the urls and folder
>>> urls = [
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_20141211.geo.unw.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150221/20141024_20150221.geo.unw.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.cc.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.unw.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141211_20150128/20141211_20150128.geo.cc.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150317/20141117_20150317.geo.cc.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150221/20141117_20150221.geo.cc.tif',
... ]
>>> folder = r'D:\data'
download data from urls and store them in folder
>>> downloader.download_datas(urls,folder)
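When file_names is left as None, the names are parsed from the web response or the URL. The sketch below shows one plausible way to do that with the standard library only. `guess_file_name` is a hypothetical helper, and preferring the `Content-Disposition` header over the URL path is an assumption, not confirmed behavior of data_downloader.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlparse


def guess_file_name(url, content_disposition=None):
    """Pick a file name from a Content-Disposition header, else from the URL path.

    Hypothetical helper for illustration; not part of data_downloader.
    """
    if content_disposition:
        # Let the stdlib header parser extract the filename parameter.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # Drop any directory part the server may have included.
            return posixpath.basename(name)
    # Fall back to the last segment of the (percent-decoded) URL path.
    return posixpath.basename(unquote(urlparse(url).path))


if __name__ == "__main__":
    url = ("http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/"
           "106/106D_05049_131313/interferograms/20141117_20141211/"
           "20141117_20141211.geo.unw.tif")
    print(guess_file_name(url))
```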
- downloader.async_download_datas(urls, folder=None, file_names=None, limit=30, desc='', follow_redirects=True, retry=0, authorize_from_browser=False)
Download multiple files simultaneously.
Parameters:
- urls: iterator
iterator containing the urls
- folder: str
the folder to store output files. Default current folder.
- authorize_from_browser: bool
Whether to load the cookies used by your web browser for authorization. This lets Python download data from a website you have already logged in to via the browser (browsers supported so far: Chrome, Firefox, Opera, Edge, Chromium). It is especially useful when the website doesn't support HTTP Basic Auth. Default is False.
- file_names: iterator
iterator containing the file names. Leave it as None if you want the program to parse them from the website. file_names can contain absolute paths if folder is None.
- limit: int
the maximum number of files downloaded simultaneously
- desc: str
description of data downloading
- follow_redirects: bool
Enables or disables HTTP redirects
- retry: int
number of reconnection attempts when the status code is 503
Example:
>>> from data_downloader import downloader
specify the urls and folder
>>> urls = [
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_20141211.geo.unw.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150221/20141024_20150221.geo.unw.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.cc.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.unw.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141211_20150128/20141211_20150128.geo.cc.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150317/20141117_20150317.geo.cc.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150221/20141117_20150221.geo.cc.tif',
... ]
>>> folder = r'D:\data'
download data from urls and store them in folder
>>> downloader.async_download_datas(urls,folder,None,desc='interferograms')
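The limit parameter caps how many files are in flight at once. A common way to express such a cap with asyncio is a semaphore, sketched below. Whether data_downloader uses a semaphore internally is not stated here, so treat this as an illustration of the idea only; `fake_download` and `download_all` are hypothetical names, and the sleep stands in for real network I/O.

```python
import asyncio


async def fake_download(url, sem, state):
    """Pretend to download `url`; the semaphore bounds concurrent entries."""
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for the network transfer
        state["active"] -= 1
    return url


async def download_all(urls, limit=30):
    """Run all downloads concurrently, with at most `limit` active at a time."""
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(fake_download(u, sem, state) for u in urls))
    # gather() returns results in the same order as the input urls.
    return results, state["peak"]


if __name__ == "__main__":
    urls = [f"http://example.com/file_{i}.tif" for i in range(10)]
    done, peak = asyncio.run(download_all(urls, limit=3))
    print(len(done), "files, peak concurrency", peak)
```

A lower limit is gentler on the remote server; a higher one helps mainly when individual transfers are small and latency-bound.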
- downloader.mp_download_datas(urls, folder=None, file_names=None, ncore=None, desc='', follow_redirects=True, retry=0, engine='requests', authorize_from_browser=False)
Download data from a list-like object containing URLs. This function downloads multiple files simultaneously using multiprocessing.
Parameters:
- urls: iterator
iterator containing the urls
- folder: str
the folder to store output files. Default current folder.
- engine: one of ["requests", "httpx"]
engine for downloading
- file_names: iterator
iterator containing the file names. Leave it as None if you want the program to parse them from the website. file_names can contain absolute paths if folder is None.
- ncore: int
Number of cores for parallel processing. If ncore is None then the number returned by os.cpu_count() is used. Default None.
- desc: str
description of data downloading
- authorize_from_browser: bool
Whether to load the cookies used by your web browser for authorization. This lets Python download data from a website you have already logged in to via the browser (browsers supported so far: Chrome, Firefox, Opera, Edge, Chromium). It is especially useful when the website doesn't support HTTP Basic Auth. Default is False.
Examples:
>>> from data_downloader import downloader
specify the urls and folder
>>> urls = [
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_20141211.geo.unw.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150221/20141024_20150221.geo.unw.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.cc.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128.geo.unw.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141211_20150128/20141211_20150128.geo.cc.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150317/20141117_20150317.geo.cc.tif',
...     'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150221/20141117_20150221.geo.cc.tif',
... ]
>>> folder = r'D:\data'
download data from urls and store them in folder
>>> downloader.mp_download_datas(urls,folder)
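The ncore default described above (fall back to `os.cpu_count()` when ncore is None) can be sketched with the standard `multiprocessing.Pool`. This is an illustration under assumptions, not the library's code: `resolve_ncore`, `fake_download`, and `mp_map` are hypothetical names, and the download itself is replaced by a placeholder that only extracts the file name.

```python
import os
from multiprocessing import Pool


def resolve_ncore(ncore=None):
    """Return `ncore` if given, otherwise os.cpu_count() (or 1 if unknown)."""
    return ncore if ncore is not None else (os.cpu_count() or 1)


def fake_download(url):
    """Placeholder for a real download: returns the last URL path segment."""
    return url.rsplit("/", 1)[-1]


def mp_map(urls, ncore=None):
    """Distribute the urls over a pool of `ncore` worker processes."""
    with Pool(processes=resolve_ncore(ncore)) as pool:
        return pool.map(fake_download, urls)


if __name__ == "__main__":
    # The __main__ guard matters on platforms that spawn worker processes
    # (for example Windows), where the module is re-imported in each worker.
    urls = [f"http://example.com/data/file_{i}.tif" for i in range(4)]
    print(mp_map(urls, ncore=2))
```

Because downloads are usually I/O-bound, more processes than cores can still help in practice, but the documented default is the core count.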