How to Download Multiple Files Concurrently in Python
Python has a powerful library called requests for making HTTP requests programmatically. You can use it to download files served over HTTP. Run the following command to install the requests library. This assumes that you already have Python 3 installed on your system.
    pip install requests
You may need to prefix the above command with sudo if you get a permission error on Linux.
The following Python 3 program downloads a given URL to a local file. It assumes that the URL ends with the file name and uses that as the name of the locally saved file.
```python
import requests


def download_url(url):
    # Assumes that the last segment after the / represents the file name.
    # If the url is http://abc.com/xyz/file.txt, the file name will be file.txt.
    file_name_start_pos = url.rfind("/") + 1
    file_name = url[file_name_start_pos:]

    r = requests.get(url, stream=True)
    if r.status_code == requests.codes.ok:
        with open(file_name, 'wb') as f:
            # Write the body in chunks so large files are not held in memory.
            for data in r.iter_content(chunk_size=8192):
                f.write(data)


# Download a single url.
# The file name at the end is used as the local file name.
download_url("https://jsonplaceholder.typicode.com/posts")
```
After running the above program, you will find a file named "posts" in the current working directory, which is the folder you ran the script from.
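Taking everything after the last / works for the example URL, but it keeps any query string in the name and yields an empty name when the URL ends with a slash. A minimal sketch of a more tolerant approach follows, using only the standard library. The helper name file_name_from_url and the fallback name "download" are assumptions made for illustration; they are not part of the original program.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def file_name_from_url(url, default="download"):
    """Return the last path segment of url, ignoring query string and fragment."""
    # urlsplit separates the path from ?query and #fragment.
    path = unquote(urlsplit(url).path)
    # Drop trailing slashes so ".../folder/" yields "folder" instead of "".
    name = posixpath.basename(path.rstrip("/"))
    # Fall back to a fixed name (an arbitrary choice) when the path is empty.
    return name or default
```

The result could replace the file_name computed inside download_url. This sketch does not guard against unusual names such as "..", so untrusted URLs would need extra checks.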
The following Python 3 program downloads several URLs, saving each one to its own local file. However, the downloads may take some time because they run sequentially, one after another.
```python
import requests


def download_url(url):
    print("downloading: ", url)
    # Assumes that the last segment after the / represents the file name.
    # If url is abc/xyz/file.txt, the file name will be file.txt.
    file_name_start_pos = url.rfind("/") + 1
    file_name = url[file_name_start_pos:]

    r = requests.get(url, stream=True)
    if r.status_code == requests.codes.ok:
        with open(file_name, 'wb') as f:
            # Write the body in chunks so large files are not held in memory.
            for data in r.iter_content(chunk_size=8192):
                f.write(data)


# Download each url in turn.
# The file name at the end of each url is used as the local file name.
download_url("https://jsonplaceholder.typicode.com/posts")
download_url("https://jsonplaceholder.typicode.com/comments")
download_url("https://jsonplaceholder.typicode.com/photos")
download_url("https://jsonplaceholder.typicode.com/todos")
download_url("https://jsonplaceholder.typicode.com/albums")
```
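To see what sequential execution costs, the sketch below replaces the network with a stand-in. The function fake_download and its 0.05 second delay are hypothetical and exist only so the timing can be tried without a server. The total time comes out at roughly the sum of the individual delays.

```python
import time


def fake_download(url, delay=0.05):
    """Stand-in for a real download: it only waits for `delay` seconds."""
    time.sleep(delay)
    return url


urls = ["posts", "comments", "photos", "todos", "albums"]

start = time.perf_counter()
for url in urls:
    # Each call must finish before the next one starts.
    fake_download(url)
elapsed = time.perf_counter() - start

print(f"{len(urls)} sequential downloads took {elapsed:.2f}s")
```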
The downloads above can be sped up substantially by running them concurrently, because each one spends most of its time waiting on the network. The following Python program downloads multiple files concurrently using the ThreadPool class from the standard multiprocessing library. Note the loop over results: imap_unordered returns an iterator that yields each return value as its download completes, and iterating over it keeps the main program running until every download has finished. Without that loop, the script can reach its end and exit before the downloads are done. Also note that the script runs 5 threads concurrently; you may want to increase this number if you have a large number of files to download. However, more threads put more load on the server, so make sure the server can handle that many concurrent requests.
```python
import requests
from multiprocessing.pool import ThreadPool


def download_url(url):
    print("downloading: ", url)
    # Assumes that the last segment after the / represents the file name.
    # If url is abc/xyz/file.txt, the file name will be file.txt.
    file_name_start_pos = url.rfind("/") + 1
    file_name = url[file_name_start_pos:]

    r = requests.get(url, stream=True)
    if r.status_code == requests.codes.ok:
        with open(file_name, 'wb') as f:
            # Write the body in chunks so large files are not held in memory.
            for data in r.iter_content(chunk_size=8192):
                f.write(data)
    return url


urls = ["https://jsonplaceholder.typicode.com/posts",
        "https://jsonplaceholder.typicode.com/comments",
        "https://jsonplaceholder.typicode.com/photos",
        "https://jsonplaceholder.typicode.com/todos",
        "https://jsonplaceholder.typicode.com/albums"]

# Run 5 threads. Each call takes the next element in the urls list.
results = ThreadPool(5).imap_unordered(download_url, urls)
# Iterating over the results waits for every download to complete.
for r in results:
    print(r)
```
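One caveat with the script above: if download_url raises an exception, for example on a connection error, the exception is re-raised while iterating over the results and the loop stops. A minimal sketch of one way to handle this wraps each call so that failures are reported per URL. The names run_downloads and safe_download are hypothetical, and the download function is passed in as a parameter so the sketch can be tried with a stand-in instead of a real server.

```python
from multiprocessing.pool import ThreadPool


def run_downloads(urls, download, workers=5):
    """Run download(url) for every url on a thread pool.

    Returns a list of (url, error) pairs, where error is None on success.
    """
    def safe_download(url):
        try:
            download(url)
            return url, None
        except Exception as exc:
            # Report the failure instead of interrupting the other downloads.
            return url, exc

    # Leaving the with-block terminates the pool, so the results are
    # collected into a list while the pool is still open.
    with ThreadPool(workers) as pool:
        return list(pool.imap_unordered(safe_download, urls))
```

With the download_url function from the script above, the call would be run_downloads(urls, download_url). Each returned pair can then be checked for an error and the failed URLs retried or logged.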