Posts

Showing posts with the label Hadoop

Downloading Tweets Using the Twitter Public Stream API

In this post I will try to show how to connect to the Twitter Public Stream API and download live tweets. We can save these tweets in any format, such as comma- or pipe-delimited text. The tweets can be used for data mining or data analysis; I have used them to analyse the current news in a given region. This program creates files of 50 MB each and keeps rotating them for as long as the stream is running. We can then run a MapReduce program on the Hadoop framework to analyse the data at a much faster rate.

Prerequisites:
1. A Twitter application
2. Twitter OAuth tokens: https://dev.twitter.com/docs/auth/obtaining-access-tokens
3. Twitter4j libraries: http://twitter4j.org/en/

Below are the twitter4j APIs used:

TwitterStreamFactory: this API returns an object of type TwitterStream.

We need to set the authorization tokens below to connect to the Twitter Public Stream. You will find these tokens once you create a Twitter application.

-Dtwitter4j.oauth...
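The pipe-delimited output and the 50 MB file rotation mentioned in this post can be sketched in plain Java. This is only an illustration under stated assumptions, not the original program: the class name `RotatingTweetWriter`, the `prefix-N.txt` file naming, and the choice of fields are all hypothetical, and the twitter4j listener that would feed it is not shown.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: writes pipe-delimited records and rotates files by size. */
class RotatingTweetWriter implements AutoCloseable {
    private final Path dir;
    private final String prefix;
    private final long maxBytes;
    private BufferedWriter out;
    private long bytesWritten;
    private int fileIndex;

    RotatingTweetWriter(Path dir, String prefix, long maxBytes) throws IOException {
        this.dir = Files.createDirectories(dir);
        this.prefix = prefix;
        this.maxBytes = maxBytes;
        openNextFile();
    }

    /** Joins fields with '|' after blanking out delimiters and line breaks inside each field. */
    static String toRecord(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append('|');
            String f = fields[i] == null ? "" : fields[i];
            sb.append(f.replace('|', ' ').replace('\n', ' ').replace('\r', ' '));
        }
        return sb.toString();
    }

    /** Writes one record, starting a new file first if this record would exceed the limit. */
    void write(String... fields) throws IOException {
        String line = toRecord(fields) + "\n";
        long size = line.getBytes(StandardCharsets.UTF_8).length;
        if (bytesWritten > 0 && bytesWritten + size > maxBytes) {
            out.close();
            openNextFile();
        }
        out.write(line);
        bytesWritten += size;
    }

    private void openNextFile() throws IOException {
        fileIndex++;
        Path file = dir.resolve(prefix + "-" + fileIndex + ".txt");
        out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        bytesWritten = 0;
    }

    int filesCreated() {
        return fileIndex;
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        long fiftyMb = 50L * 1024 * 1024; // rotation threshold from the post
        Path dir = Files.createTempDirectory("tweets"); // assumed output location
        try (RotatingTweetWriter w = new RotatingTweetWriter(dir, "tweets", fiftyMb)) {
            // Assumed field order: id, screen name, text
            w.write("123", "some_user", "Hello | world\nsecond line");
        }
        System.out.println("Wrote files under " + dir);
    }
}
```

In a twitter4j setup, `write(...)` would presumably be called from the `onStatus` callback of a `StatusListener`, passing whichever fields of the `Status` object are needed. Keeping every record on a single line makes the files easy to split line by line in a later MapReduce job.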

Download Hadoop 1.0.3 plugin for Eclipse

I tried searching for the Hadoop 1.0.3 plugin for Eclipse but was not able to find it. Apache has removed the plugin from the Hadoop installation folder. Instead, you can find the Eclipse plugin source code, with a build.xml file, at "${HADOOP_HOME}\hadoop-1.0.3\src\contrib\eclipse-plugin". I have built the plugin project and created hadoop-eclipse-plugin-1.0.3.jar. You can simply download it (hadoop-eclipse-plugin-1.0.3.jar) and use it. Thanks, Rohan L