Twitter data source
The Twitter data source gathers content using the Twitter V1.1 API. Twitter has deprecated this API, and the data source will most likely stop working when Twitter disables it. When this happens, it is unlikely that any free option for indexing Tweets will remain. A Twitter V2 API gatherer plugin, which gathers content from Twitter using the V2 API, is also available, but using the V2 API requires a paid Twitter API account. We recommend that most users continue to use the older Twitter data source and start considering whether to discontinue their Twitter index or upgrade to a paid service.
Twitter data sources are used to index content from the Twitter social media platform.
Usage of Funnelback to gather content from Twitter must comply with Twitter’s terms of service.
Getting authentication keys and secrets
Before you can crawl Twitter, ensure that you have:
- A Twitter account
- Created an application within Twitter (your Twitter login will be required)
  - Application Name
  - Application Description
  - Website
  - CallbackURL

Once complete, note your OAuth consumer key and consumer secret, and your OAuth access token and token secret.
Discarding old tweets
Tweets older than a certain date can be discarded by enabling the date filter plugin and configuring it to discard the older items.
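The idea behind date-based discarding can be sketched in a few lines of Python. This is only an illustration of the concept, not the date filter plugin's implementation: the function names `parse_created_date` and `keep_recent` are hypothetical, and the timestamp format is assumed to match the `createdDate` value shown in the sample XML record on this page.

```python
from datetime import datetime, timezone


def parse_created_date(value: str) -> datetime:
    """Parse a timestamp such as '2018-06-20 14:58:03.0 UTC' (assumed format)."""
    text = value.strip()
    if not text.endswith(" UTC"):
        raise ValueError(f"expected a timestamp ending in ' UTC': {value!r}")
    # Drop the ' UTC' suffix, parse the rest, then mark the result as UTC.
    naive = datetime.strptime(text[:-4], "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


def keep_recent(records, cutoff: datetime):
    """Return only the records created on or after the cutoff."""
    return [r for r in records if parse_created_date(r["createdDate"]) >= cutoff]


# Hypothetical records, reduced to the two fields this sketch needs.
records = [
    {"id": "1", "createdDate": "2018-06-20 14:58:03.0 UTC"},
    {"id": "2", "createdDate": "2015-01-01 00:00:00.0 UTC"},
]
cutoff = datetime(2017, 1, 1, tzinfo=timezone.utc)
recent = keep_recent(records, cutoff)
```

Here `recent` contains only the record created in 2018; the 2015 record falls before the cutoff and is dropped.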
User mentions and hash-tags
User mentions and hash-tags within Twitter content can be made searchable by enabling the social tags plugin.
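As a rough illustration of what extracting these tags involves, the sketch below pulls hash-tags and user mentions out of tweet text with regular expressions. It does not reproduce the social tags plugin's behaviour; the patterns and the function name `extract_tags` are assumptions made for this example.

```python
import re

# A tag only counts when '#' or '@' is not preceded by a word character,
# which avoids matching the '@' inside an email address.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def extract_tags(tweet: str) -> dict:
    """Return the hash-tags and user mentions found in a tweet's text."""
    return {
        "hashtags": HASHTAG_RE.findall(tweet),
        "mentions": MENTION_RE.findall(tweet),
    }


tags = extract_tags("Thanks @alice for the #search demo, email me@example.com #Funnelback")
```

With this input, `tags["hashtags"]` holds `search` and `Funnelback`, and `tags["mentions"]` holds only `alice`.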
Metadata mappings
Twitter data sources include a number of default Twitter specific metadata mappings:
| Class ID | Type | Behaviour | Explanation | Metadata fields included |
|---|---|---|---|---|
|  | text | content |  |  |
|  | text | display |  |  |
|  | text | content | Tweet |  |
|  | text | content |  |  |
|  | date | date | Date |  |
|  | text | content |  |  |
|  | text | display |  |  |
|  | text | display |  |  |
|  | text | display |  |  |
|  | geospatial x/y co-ordinate | N/A |  |  |
|  | text | display |  |  |
|  | text | display |  |  |
|  | text | display |  |  |
|  | text | content |  |  |
|  | text | content |  |  |
Use the `-SF` query processor option to access these metadata fields in the search response and in templates (e.g. `-SF=[author,hashtag]`).
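If the option value is assembled programmatically, a small helper along the following lines may be handy. This is a sketch only: `build_sf_option` is a hypothetical name, and the class names are taken from the example above rather than from a confirmed list of default mappings.

```python
def build_sf_option(classes):
    """Build an -SF query processor option from a list of metadata class names."""
    # Trim whitespace and skip empty names so the bracketed list stays well formed.
    cleaned = [name.strip() for name in classes if name.strip()]
    if not cleaned:
        raise ValueError("at least one metadata class is required")
    return "-SF=[" + ",".join(cleaned) + "]"


option = build_sf_option(["author", "hashtag"])
```

The resulting string has the same shape as the example above, `-SF=[author,hashtag]`.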
Limits
Twitter limits the volume of content that can be retrieved from its APIs, so for large Twitter streams Funnelback may be unable to gather all historical content.
Working with the fetched data
Funnelback will crawl Twitter and convert responses into XML. You can use the metadata customization tool to map elements to a metadata class.
To preview the crawled records, enable debug mode by setting the `twitter.debug=true` data source configuration option.
The XML that Funnelback generates for a Twitter data source is as follows:
<com.funnelback.socialmedia.twitter.TwitterXmlRecord>
  <id>tweet_id</id>
  <username>username</username>
  <screenName>some username</screenName>
  <profileImageUrl/>
  <tweet>tweet content</tweet>
  <createdDate>2018-06-20 14:58:03.0 UTC</createdDate>
  <url>https://twitter.com/user_name/status/tweet_id</url>
  <hashtags>
    <Hashtag>
      <start>110</start>
      <end>119</end>
      <text>hashtag content</text>
    </Hashtag>
  </hashtags>
  <linkedURLs>
    <URL>
      <start>133</start>
      <end>156</end>
      <shortUrl>https://t.co/qwert</shortUrl>
      <expandedURL>http://bit.ly/qwert</expandedURL>
      <displayURL>bit.ly/qwert</displayURL>
    </URL>
  </linkedURLs>
  <isReTweet>false</isReTweet>
  <linkedMediaURLs>
    <MediaURL>
      <baseUrl>http://pbs.twimg.com/media/qwert.jpg</baseUrl>
      <thumbnail>
        <pictureUrl>http://pbs.twimg.com/media/qwert.jpg:thumb</pictureUrl>
        <width>150</width>
        <height>150</height>
        <resizeMethod>CROP</resizeMethod>
      </thumbnail>
      <small>
        <pictureUrl>http://pbs.twimg.com/media/qwert.jpg:small</pictureUrl>
        <width>430</width>
        <height>430</height>
        <resizeMethod>FIT</resizeMethod>
      </small>
      <medium>
        <pictureUrl>http://pbs.twimg.com/media/qwert.jpg:medium</pictureUrl>
        <width>430</width>
        <height>430</height>
        <resizeMethod>FIT</resizeMethod>
      </medium>
      <large>
        <pictureUrl>http://pbs.twimg.com/media/qwert.jpg:large</pictureUrl>
        <width>430</width>
        <height>430</height>
        <resizeMethod>FIT</resizeMethod>
      </large>
    </MediaURL>
  </linkedMediaURLs>
</com.funnelback.socialmedia.twitter.TwitterXmlRecord>
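When inspecting records previewed in debug mode, it can help to read them programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element names come from the sample record above, while `parse_record`, the keys of the returned dictionary and the trimmed `SAMPLE` string are choices made for this example.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the sample record, kept short for the example.
SAMPLE = """
<com.funnelback.socialmedia.twitter.TwitterXmlRecord>
  <id>tweet_id</id>
  <screenName>some username</screenName>
  <tweet>tweet content</tweet>
  <url>https://twitter.com/user_name/status/tweet_id</url>
  <hashtags>
    <Hashtag><start>110</start><end>119</end><text>hashtag content</text></Hashtag>
  </hashtags>
  <linkedURLs>
    <URL><expandedURL>http://bit.ly/qwert</expandedURL></URL>
  </linkedURLs>
  <isReTweet>false</isReTweet>
  <linkedMediaURLs>
    <MediaURL><baseUrl>http://pbs.twimg.com/media/qwert.jpg</baseUrl></MediaURL>
  </linkedMediaURLs>
</com.funnelback.socialmedia.twitter.TwitterXmlRecord>
"""


def parse_record(xml_text: str) -> dict:
    """Extract a few commonly used fields from one Twitter XML record."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.findtext("id"),
        "screen_name": root.findtext("screenName"),
        "tweet": root.findtext("tweet"),
        "url": root.findtext("url"),
        # The record stores the flag as the text 'true' or 'false'.
        "is_retweet": root.findtext("isReTweet") == "true",
        "hashtags": [h.findtext("text") for h in root.findall("hashtags/Hashtag")],
        "links": [u.findtext("expandedURL") for u in root.findall("linkedURLs/URL")],
        "media": [m.findtext("baseUrl") for m in root.findall("linkedMediaURLs/MediaURL")],
    }


record = parse_record(SAMPLE)
```

Fields missing from a record come back as `None` (or an empty list for the repeated elements), so the same function can be applied to tweets that have no hashtags, links or media.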
See also
- Twitter V2 API gatherer plugin: alternative plugin that gathers content from Twitter using the Twitter V2 API.