In this tutorial I'll show you how to use the Wikipedia API in Python to fetch information from a Wikipedia article. Let's see how to do it.
First we have to install the wikipedia package. To install it, open your command prompt or terminal and type this command:
pip install wikipedia
That’s all we have to do. Now we can fetch the data from Wikipedia very easily.
To Get the Summary of an Article
import wikipedia
print(wikipedia.summary("google"))
It will fetch the summary of the Google article from Wikipedia and print it on the screen.
To Get a Given Number of Sentences From the Summary of an Article
import wikipedia
print(wikipedia.summary("google", sentences=1))
Output:
Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, search engine, cloud computing, software, and hardware.
In the same way, you can pass any number as the sentences parameter to get as many sentences as you want.
To Change the Language of the Article
import wikipedia
wikipedia.set_lang("fr")
print(wikipedia.summary("google", sentences=1))
Output:
Google (prononcé [ˈguːgəl]) est une entreprise américaine de services technologiques fondée en 1998 dans la Silicon Valley, en Californie, par Larry Page et Sergueï Brin, créateurs du moteur de recherche Google.
Here fr stands for French. You can use any other language code instead of fr to get the information in another language. But make sure that Wikipedia actually has the article in the language you want.
To see the codes for other languages, open this link: https://www.loc.gov/standards/iso639-2/php/code_list.php
Search to Get the Titles of the Articles
import wikipedia
print(wikipedia.search("google"))
Output:
['Google', 'Google+', 'Google Maps', 'Google Search', 'Google Translate', 'Google Chrome', '.google', 'Google Earth', 'Gmail', 'Google Scholar']
The search() method returns a list of all the article titles that we can open.
To Get the URL of the Article
import wikipedia
page = wikipedia.page("google")
print(page.url)
Output:
https://en.wikipedia.org/wiki/Google
First, wikipedia.page() stores all the relevant information in the variable page. Then we can use the url attribute to get the link to the page.
To Get the Title of the Article
import wikipedia
page = wikipedia.page("google")
print(page.title)
Output:
Google
To Get Complete Article
import wikipedia
page = wikipedia.page("google")
print(page.content)
The complete article, from beginning to end, will be printed on the screen.
To Get the Images Included in Article
import wikipedia
page = wikipedia.page("google")
print(page.images[0])
Output:
https://upload.wikimedia.org/wikipedia/commons/1/1d/20_colleges_with_the_most_alumni_at_Google.png
So it returns the URL of the image at index 0. To fetch another image, use index 1, 2, 3, etc., depending on how many images are present in the article.
But if you want the image to be downloaded to your local directory instead of just printing the URL, you can use urllib. Here's a program that will download an image from the link.
import urllib.request
import wikipedia

page = wikipedia.page("Google")
image_link = page.images[0]
urllib.request.urlretrieve(image_link, "local-filename.jpg")
The image at index 0 will be saved as local-filename.jpg in the same directory as your program. The above program works for Python 3.x; if you're using Python 2.x, see the program below.
import urllib
import wikipedia

page = wikipedia.page("Google")
image_link = page.images[0]
urllib.urlretrieve(image_link, "local-filename.jpg")
That's all for this article. For more information, please visit https://pypi.org/project/wikipedia/
If you have any problem or suggestion related to the wikipedia Python API, please comment below.