It is time for another blog post on Cosmos DB, this one explaining how to stream tweets from Twitter by hashtag and store them in Cosmos DB in real time. You should be able to set up and run this demo within 15 minutes.
Prerequisites:
I have the following in my local environment. Hopefully you already have them too 😊; if not, start setting them up.
· Windows 10 OS
· Python 2.7
· Visual Studio Code or PyCharm (Any editor)
· Azure subscription
OK folks, let's get started.
Step 1: Install Python
Step 2: Install Tweepy and PyDocumentDB
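Tweepy:
If pip is not recognized on the command line, add the Python Scripts folder to your PATH first, then install Tweepy.
C:\>set PATH=%PATH%;C:\Python27\Scripts
C:\>pip install tweepy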
PyDocumentDB:
As mentioned above, we will be storing the tweets in Azure Cosmos DB. To do that we need the Python package for Cosmos DB, which is pydocumentdb. Install it with the following command.
C:\>pip install pydocumentdb
Now we have everything we need. Let's dive into the code.
Step 3: Create a listener to invoke the Cosmos DB client
Create a listener named CosmosDBListener with the following methods:
__init__ stores the Cosmos DB client and the collection link so that the connection is available to the other methods.
on_data loads each message received from the stream and writes it to Cosmos DB.
on_error prints the status code to the console when there is a network or key issue.
from config import *
import json
from tweepy.streaming import StreamListener


class CosmosDBListener(StreamListener):
    def __init__(self, client, collLink):
        # Keep the Cosmos DB client and the link of the target collection.
        self.client = client
        self.collLink = collLink

    def on_data(self, data):
        try:
            dictData = json.loads(data)
            # Cosmos DB expects the document id to be a string.
            dictData["id"] = str(dictData["id"])
            self.client.CreateDocument(self.collLink, dictData)
            return True
        except Exception as e:
            # Log the failure and keep the stream running.
            print("Error on data: %s" % str(e))
            return True

    def on_error(self, status):
        # Print the HTTP status code reported by Twitter.
        print(status)
        return True
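Before connecting to Twitter, the on_data logic can be checked offline. The following is a minimal sketch and not part of the original script: FakeClient and store_tweet are hypothetical names, and the fake client only mimics the CreateDocument(collection link, document) call shape used by the listener. The sample message is made up for illustration.

```python
import json


class FakeClient(object):
    """Hypothetical stand-in for the Cosmos DB client; it only records calls."""

    def __init__(self):
        self.created = []

    def CreateDocument(self, coll_link, document):
        # Mimics the call shape used by the listener; nothing is sent anywhere.
        self.created.append((coll_link, document))


def store_tweet(client, coll_link, raw):
    """Same steps as on_data: parse the JSON, stringify the id, write the document."""
    doc = json.loads(raw)
    doc["id"] = str(doc["id"])
    client.CreateDocument(coll_link, doc)
    return doc


fake = FakeClient()
sample = '{"id": 1234567890, "text": "Hello #CosmosDB"}'  # made-up message
stored = store_tweet(fake, "dbs/tweepyDemo/colls/tweets", sample)
print(stored["id"])  # prints 1234567890, now held as a string
```

Swapping the fake for the real client should leave store_tweet unchanged, which is the point of passing the client in from outside.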
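Step 4: Stream data from Twitter to Cosmos DB
import tweepy
from tweepy import OAuthHandler, Stream
import pydocumentdb.documents as documents
import pydocumentdb.document_client as document_client

# Authenticate against Twitter with the keys from config.py.
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

# Configure the Cosmos DB connection.
connectionPolicy = documents.ConnectionPolicy()
connectionPolicy.EnableEndpointDiscovery = True
connectionPolicy.PreferredLocations = preferredLocations

client = document_client.DocumentClient(host, {'masterKey': masterKey}, connectionPolicy)
dbLink = 'dbs/' + databaseId
collLink = dbLink + '/colls/' + collectionId

# Start streaming tweets that match any of the hashtags below.
# Tweepy 3.7 and later name the last parameter is_async instead of async.
twitter_stream = Stream(auth, CosmosDBListener(client, collLink))
twitter_stream.filter(track=['#CosmosDB', '#Microsoft', '#MVP', '#BigData', '#DataScience', '#Mongo', '#Graph'], async=True)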
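The script above assumes that the database and collection already exist in your Cosmos DB account. A minimal sketch for creating them on first run follows. It assumes pydocumentdb's ReadDatabase, CreateDatabase, ReadCollection and CreateCollection calls, and that a missing resource raises an error carrying status_code 404; ensure_collection is a hypothetical helper name, not something from the original script.

```python
def ensure_collection(client, database_id, collection_id):
    """Create the database and collection if missing; return the collection link."""
    db_link = 'dbs/' + database_id
    coll_link = db_link + '/colls/' + collection_id

    try:
        client.ReadDatabase(db_link)
    except Exception as e:
        # Assumption: "not found" surfaces as an error with status_code 404.
        if getattr(e, 'status_code', None) != 404:
            raise
        client.CreateDatabase({'id': database_id})

    try:
        client.ReadCollection(coll_link)
    except Exception as e:
        if getattr(e, 'status_code', None) != 404:
            raise
        client.CreateCollection(db_link, {'id': collection_id})

    return coll_link
```

It could be called once after the client is created, for example collLink = ensure_collection(client, databaseId, collectionId). Depending on the account, a collection may also need a partition key or throughput options, which this sketch leaves out.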
The connection settings and keys live in a separate file, config.py:
# Enter CosmosDB config details below.
masterKey = ' '
host = ' '

# Enter your database, collection and preferredLocations here.
databaseId = 'tweepyDemo'
collectionId = 'tweets'
preferredLocations = ''

# Enter twitter OAuth keys here.
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
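Blank values in these settings tend to surface only later, as connection or authentication errors printed by the listener. A small check before connecting can make that easier to spot. This is a sketch under assumptions: missing_settings is a hypothetical helper, and the dictionary below simply mirrors a few of the config names with placeholder values. preferredLocations is left out on purpose, since the original config leaves it empty.

```python
def missing_settings(settings):
    """Return the names of settings that are empty or whitespace only, sorted."""
    return sorted(name for name, value in settings.items()
                  if not str(value).strip())


# Placeholder values that mirror the unfilled config.py above.
settings = {
    'masterKey': ' ',
    'host': ' ',
    'databaseId': 'tweepyDemo',
    'collectionId': 'tweets',
    'consumer_key': '',
}
print(missing_settings(settings))  # ['consumer_key', 'host', 'masterKey']
```

If the returned list is not empty, the script can stop with a clear message instead of starting the stream.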
You also need to register the script as a new application on the Twitter developer portal. After choosing a name for your app and filling in the application details, you will be provided with a consumer key, consumer secret, access token and access token secret. Fill these into the config.py above to give the app programmatic access to Twitter.
Save the code from Steps 3 and 4 as cosmosdbdriver.py next to config.py and run it:
py cosmosdbdriver.py
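Once the script has been running for a while, the stored tweets can be read back to confirm that documents are arriving. The following is a rough sketch assuming pydocumentdb's QueryDocuments(collection link, query) call, which returns an iterable of documents; sample_tweet_texts is a hypothetical helper name. No ORDER BY is used, so the rows returned are just a sample, not necessarily the newest tweets.

```python
def sample_tweet_texts(client, coll_link, limit=5):
    """Return the text field of up to `limit` stored tweets."""
    query = 'SELECT TOP %d c.text FROM c' % int(limit)
    # Each result is a dict; non-tweet messages may have no text field.
    return [doc.get('text') for doc in client.QueryDocuments(coll_link, query)]
```

With the client and collLink from Step 4, calling sample_tweet_texts(client, collLink) from a Python prompt should return a short list of tweet texts. A partitioned collection may need extra query options, which are not covered here.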