downloader: Download files from urls#

There are several functions in the downloader module. The following table lists all the functions used to download files from urls.

Functions

Description

download_data()

download a single file from a given url

download_datas()

sequentially download multiple files from given urls

async_download_datas()

asynchronously download multiple files from given urls

mp_download_datas()

download multiple files from given urls using multiprocessing

You can import downloader at the beginning.

from data_downloader import downloader

Following is a brief introduction to those functions.

download_data#

This function is design for downloading a single file.

Note

This function can only download one file at a time. If a lot of files need to be downloaded, try to use download_datas, async_download_datas or mp_download_datas functions.

Example:

In [6]: url = 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_201
   ...: 41211.geo.unw.tif'
   ...:
   ...: folder = 'D:\\data'
   ...: downloader.download_data(url,folder)

20141117_20141211.geo.unw.tif:   2%|                   | 455k/22.1M [00:52<42:59, 8.38kB/s]

download_datas#

Download multiple files from a list like object that contains urls one by one.

Example:

In [12]: from data_downloader import downloader
    ...:
    ...: urls=['http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_20
    ...: 141211.geo.unw.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150221/20141024_20150221
    ...: .geo.unw.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128
    ...: .geo.cc.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128
    ...: .geo.unw.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141211_20150128/20141211_20150128
    ...: .geo.cc.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150317/20141117_20150317
    ...: .geo.cc.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150221/20141117_20150221
    ...: .geo.cc.tif']
    ...:
    ...: folder = 'D:\\data'         G, param_names = GC.ftc_model1(t1s, t2s, t3s, t4s, years, ftc)
    ...: downloader.download_datas(urls,folder)

20141117_20141211.geo.unw.tif:   6%|           | 1.37M/22.1M [03:09<2:16:31, 2.53kB/s]

async_download_datas#

Download files simultaneously with asynchronous mode.

Note

Not all websites support downloading multiple files simultaneously. If the website doesn’t allow parallel downloading, the download by this function may be incomplete. Try to use download_datas instead if you encounter this problem.

Example:

In [3]: from data_downloader import downloader
   ...:
   ...: urls=['http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049
   ...: _131313/interferograms/20141117_20141211/20141117_20141211.geo.unw.tif',
   ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_13131
   ...: 3/interferograms/20141024_20150221/20141024_20150221.geo.unw.tif',
   ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_13131
   ...: 3/interferograms/20141024_20150128/20141024_20150128.geo.cc.tif',
   ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_13131
   ...: 3/interferograms/20141024_20150128/20141024_20150128.geo.unw.tif',
   ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_13131
   ...: 3/interferograms/20141211_20150128/20141211_20150128.geo.cc.tif',
   ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_13131
   ...: 3/interferograms/20141117_20150317/20141117_20150317.geo.cc.tif',
   ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_13131
   ...: 3/interferograms/20141117_20150221/20141117_20150221.geo.cc.tif']
   ...:
   ...: folder = 'D:\\data'
   ...: downloader.async_download_datas(urls,folder,limit=3,desc='interferograms')

>>> Total | Interferograms :   0%|                          | 0/7 [00:00<?, ?it/s]
    20141024_20150221.geo.unw.tif:  11%|    | 2.41M/21.2M [00:11<41:44, 7.52kB/s]
    20141117_20141211.geo.unw.tif:   9%|    | 2.06M/22.1M [00:11<25:05, 13.3kB/s]
    20141024_20150128.geo.cc.tif:  36%|██▏   | 1.98M/5.42M [00:12<04:17, 13.4kB/s]
    20141117_20150317.geo.cc.tif:   0%|               | 0.00/5.44M [00:00<?, ?B/s]
    20141117_20150221.geo.cc.tif:   0%|               | 0.00/5.47M [00:00<?, ?B/s]
    20141024_20150128.geo.unw.tif:   0%|              | 0.00/23.4M [00:00<?, ?B/s]
    20141211_20150128.geo.cc.tif:   0%|               | 0.00/5.44M [00:00<?, ?B/s]

mp_download_datas#

Download files simultaneously using multiprocessing.

Note

Not all websites support downloading multiple files simultaneously. If the website doesn’t allow parallel downloading, the download by this function may be incomplete. Try to use download_datas instead if you encounter this problem.

Example:

In [12]: from data_downloader import downloader
    ...:
    ...: urls=['http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20141211/20141117_20
    ...: 141211.geo.unw.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150221/20141024_20150221
    ...: .geo.unw.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128
    ...: .geo.cc.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141024_20150128/20141024_20150128
    ...: .geo.unw.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141211_20150128/20141211_20150128
    ...: .geo.cc.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150317/20141117_20150317
    ...: .geo.cc.tif',
    ...: 'http://gws-access.ceda.ac.uk/public/nceo_geohazards/LiCSAR_products/106/106D_05049_131313/interferograms/20141117_20150221/20141117_20150221
    ...: .geo.cc.tif']
    ...:
    ...: folder = 'D:\\data'
    ...: downloader.mp_download_datas(urls,folder)

 >>> 12 parallel downloading
 >>> Total | :   0%|                                         | 0/7 [00:00<?, ?it/s]
20141211_20150128.geo.cc.tif:  15%|██▊                | 803k/5.44M [00:00<?, ?B/s]

status_ok#

Simultaneously detecting whether the given links are accessible.

Example:

In [1]: from data_downloader import downloader
   ...: import numpy as np
   ...:
   ...: urls = np.array(['https://www.baidu.com',
   ...: 'https://www.bai.com/wrongurl',
   ...: 'https://cn.bing.com/',
   ...: 'https://bing.com/wrongurl',
   ...: 'https://bing.com/'] )
   ...:
   ...: status_ok = downloader.status_ok(urls)
   ...: urls_accessable = urls[status_ok]
   ...: print(urls_accessable)

['https://www.baidu.com' 'https://cn.bing.com/' 'https://bing.com/']