Downloading Tweets using the Twitter Public Stream API

In this post I will show how to connect to the Twitter Public Stream API and download live tweets. The tweets can be saved in any delimited format, such as comma- or pipe-separated. They can then be used for data mining or data analysis; I have used them to analyse the current news in a given region.

The program writes files of about 50 MB each and keeps rotating them for as long as the stream is running. The resulting files can be processed with a MapReduce program on the Hadoop framework to analyse the data at a much faster rate.

Prerequisites:

1. A Twitter application
2. Twitter OAuth tokens
     https://dev.twitter.com/docs/auth/obtaining-access-tokens
3. The Twitter4J libraries
    http://twitter4j.org/en/


The following twitter4j API is used:

TwitterStreamFactory: this factory class returns an object of type TwitterStream.

The authorization tokens below must be set in order to connect to the Twitter Public Stream. You will find these tokens once you have created your Twitter application.

-Dtwitter4j.oauth.consumerKey=XX  
-Dtwitter4j.oauth.consumerSecret=XX
-Dtwitter4j.oauth.accessToken=XX
-Dtwitter4j.oauth.accessTokenSecret=XX
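
Since a missing token usually only shows up as an authentication failure at connect time, it can help to check the four properties before starting the stream. The following is a minimal sketch using only the JDK; the class name OAuthPropertyCheck is hypothetical and not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original TwitterConnector code.
class OAuthPropertyCheck {
    // Property names taken from the -D flags listed above.
    static final String[] REQUIRED = {
        "twitter4j.oauth.consumerKey",
        "twitter4j.oauth.consumerSecret",
        "twitter4j.oauth.accessToken",
        "twitter4j.oauth.accessTokenSecret"
    };

    /** Returns the names of required properties that are unset or blank. */
    static List<String> missingProperties() {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = System.getProperty(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingProperties();
        if (!missing.isEmpty()) {
            System.err.println("Missing OAuth properties: " + missing);
            System.exit(1);
        }
        System.out.println("All four OAuth properties are set.");
    }
}
```

Calling missingProperties() at the top of the main method would let the program stop with a clear message instead of failing later inside the stream connection.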

If you are running this program behind a firewall, you need to set the proxy details to connect to the Internet.

-Dhttp.proxyHost=proxy 
-Dhttp.proxyPort=80 

I have created a simple TwitterListener.java class that implements the StatusListener interface. This class acts as the listener for public tweets. Each tweet is delivered to it as a Status object, which is the Java representation of the JSON string returned by Twitter. This object contains all the information related to the tweet, e.g. location, message, date and time.

I have created a simple logger (TwitterLogger) to write the tweets to a file. It uses "|||" as the field separator and a newline as the record separator. The logger rotates the file when its size reaches around 50 MB.
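
The logging scheme just described can be sketched with the JDK alone. This is not the actual TwitterLogger; the class name RotatingTweetLog, the file names tweets-N.log, and the choice to replace line breaks inside a field with spaces are all assumptions made for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a size-based rotating log; not the original TwitterLogger.
class RotatingTweetLog implements AutoCloseable {
    static final String FIELD_SEPARATOR = "|||";

    private final File dir;
    private final long maxBytes;
    private OutputStream out;
    private long written;   // bytes written to the current file
    private int index;      // number used for the next file name

    RotatingTweetLog(File dir, long maxBytes) {
        this.dir = dir;
        this.maxBytes = maxBytes;
        openNext();
    }

    /** Joins fields with "|||"; line breaks inside a field become spaces (an assumption). */
    static String formatRecord(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(FIELD_SEPARATOR);
            String f = fields[i] == null ? "" : fields[i];
            sb.append(f.replace('\r', ' ').replace('\n', ' '));
        }
        return sb.toString();
    }

    /** Writes one record, starting a new file first if this one would exceed the limit. */
    void log(String... fields) {
        byte[] line = (formatRecord(fields) + "\n").getBytes(StandardCharsets.UTF_8);
        try {
            if (written > 0 && written + line.length > maxBytes) {
                out.close();
                openNext();
            }
            out.write(line);
            written += line.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void openNext() {
        try {
            out = new FileOutputStream(new File(dir, "tweets-" + (index++) + ".log"));
            written = 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "logs");
        dir.mkdirs();
        try (RotatingTweetLog log = new RotatingTweetLog(dir, 50L * 1024 * 1024)) {
            log.log("2014-01-01 00:00:00", "example_user", "hello\nworld");
        }
    }
}
```

Replacing line breaks matters because tweet text can itself contain newlines, which would otherwise split one record across several lines and confuse a line-oriented job downstream.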

You will also find Windows command-line scripts to start the download. They simply call the main method of TwitterConnector. Set the user.home variable to the location where you want the tweets to be downloaded. That location must contain a logs folder.
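
Because the download depends on the logs folder being present under user.home, a small start-up check can make a wrong setting obvious. The sketch below is an assumption about how such a check might look; LogDirCheck and requireLogsDir are hypothetical names and are not taken from the original code.

```java
import java.io.File;

// Hypothetical start-up check; not part of the original scripts or classes.
class LogDirCheck {
    /** Returns home/logs, or throws if it is not an existing directory. */
    static File requireLogsDir(String home) {
        File logs = new File(home, "logs");
        if (!logs.isDirectory()) {
            throw new IllegalStateException(
                "Missing logs folder: " + logs.getAbsolutePath());
        }
        return logs;
    }

    public static void main(String[] args) {
        // user.home can be overridden on the command line with -Duser.home=...
        File logs = requireLogsDir(System.getProperty("user.home"));
        System.out.println("Tweets will be written under " + logs.getAbsolutePath());
    }
}
```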

In one day it downloads around 1-2 GB of data. This data can be used for analysis or as input to big data programs.

References:

https://dev.twitter.com/docs/streaming-apis
https://dev.twitter.com/docs/auth/obtaining-access-tokens
https://dev.twitter.com/docs
http://twitter4j.org/en/javadoc.html

Code:
TwitterConnector Download










