Jay Taylor's notes
What are good tools to extract key words and/or topics/tags from a random paragraph of text? - Quora
Original source (www.quora.com)
Tags:
nlp
www.quora.com
Clipped on: 2016-06-29
What are good tools to extract key words and/or topics/tags from a random paragraph of text?
I know Open Calais -- but what other tools could pull out keywords, names and phrases?
22 Answers
Abhishek Shivkumar, IBM Watson Labs
First, check the answers to "What is the best text analytics API + service?" All the answers there point to good APIs that extract keywords and/or topics.
Also, Topicmarks [1] recently released a service that reads your text and returns facts, summaries and keywords. I looked at their results, and it is something you might want to explore.
[1]: http://topicmarks.com/ (closed operations)
Vineet Yadav, M.Tech Computational linguistics , IIIT-H Text analytics and Natural language...
Open source tools
- KEA (http://www.nzdl.org/Kea/): a supervised approach that uses training data and a controlled vocabulary.
- Maui indexer (http://code.google.com/p/maui-in...): essentially an extension of KEA that can also use an encyclopedia for key phrase extraction.
- Carrot2 (http://project.carrot2.org/): an unsupervised approach to key phrase extraction. It supports many input and output formats and many parameters for key phrase extraction.
- MALLET topic modeling module (http://mallet.cs.umass.edu/topic...)
- Stanford topic modeling tool (http://nlp.stanford.edu/software...)
- Mahout clustering algorithms (http://mahout.apache.org/)
Commercial APIs
- Alchemy API (http://www.alchemyapi.com/api/ke...)
- Zemanta API (http://developer.zemanta.com/)
- Yahoo term extraction API (http://developer.yahoo.com/searc...)
Antonio Matarranz, High-tech Marketing, Madrid (Spain)
Textalytics (Meaning as a Service) is a cloud-based semantic API that offers a Topic Extraction service (entities, concepts).
In addition, you can tag your text with theme categories, feature-level sentiment, etc.
Disclosure: I work for Daedalus, the company that develops Textalytics.
UPDATE: Textalytics has been rebranded to MeaningCloud (Web Services for Text Analytics and Mining | MeaningCloud), and it features a Topics Extraction service (Topics extraction & named entity recognition | MeaningCloud).
Sujit Pal, search engineer interested in semantic search, text analytics and NLP, machin...
All great answers, but I see no one has mentioned RAKE (Rapid Automatic Keyword Extraction), so I am mentioning it. It works quite nicely and is lightweight. The algorithm works by splitting the text at stopwords and punctuation to get runs of content words as candidate phrases, scoring them by word frequency and co-occurrence, and then also picking up frequent runs that span a stopword, if any. It is described in Michael W. Berry's book Text Mining: Applications and Theory (free PDF available if you search, or use Amazon's look inside feature). There is also a Python implementation at https://github.com/aneesha/RAKE/...
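To make the idea concrete, here is a minimal sketch of RAKE-style scoring in Python. It is not the code from the repository linked above. The tiny stopword list and the function names candidate_phrases and rake_keywords are assumptions made for illustration, and the step that joins keywords across interior stopwords is left out.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real use needs a fuller list).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def candidate_phrases(text):
    """Split text at punctuation and stopwords into runs of content words."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keywords(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word occurs in candidates
    degree = defaultdict(int)  # co-occurrences within candidates, incl. itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    # Word score is degree / frequency; a phrase scores the sum of its words.
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {
        " ".join(phrase): sum(word_score[w] for w in phrase)
        for phrase in set(phrases)
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = ("Keyword extraction is the task of finding important phrases. "
              "Keyword extraction tools are useful.")
    for phrase, score in rake_keywords(sample):
        print(f"{score:5.1f}  {phrase}")
```

Because each word's degree grows with the length of the phrases it appears in, this scoring tends to favor longer phrases, which is worth keeping in mind when reading the ranking.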
Kumar Ishan, Founder @ ReaderDeck
You can also look at jatetoolkit, the Java Automatic Term Extraction toolkit.
It implements the following statistical algorithms for keyword/keyphrase extraction:
- Basic term frequency
- Average term frequency in the corpus (term frequency / document frequency)
- TF-IDF
- RIDF (residual IDF) - Inverse Document Frequency (IDF): A Measure of Deviation from Poisson
- Weirdness - Weirdness indexing for logical document extrapolation and retrieval
- C-value - A methodology for automatic term recognition.
- GlossEx - Glossary extraction and knowledge in large organisations via semantic web technologies.
- TermEx - Termextractor: a web application to learn the shared terminology of emergent web communities.
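As a rough illustration of the first four measures in the list, here is a small Python sketch. It uses the common textbook definitions, which are an assumption here and may differ from the exact formulas jatetoolkit implements. The function name term_statistics is made up for this example. RIDF is computed as the observed IDF minus the IDF expected if the term's occurrences followed a Poisson distribution.

```python
import math
from collections import Counter


def term_statistics(documents):
    """documents: list of token lists. Returns {term: {measure: value}}."""
    n_docs = len(documents)
    corpus_tf = Counter()  # total occurrences across the corpus
    doc_freq = Counter()   # number of documents containing the term
    for tokens in documents:
        corpus_tf.update(tokens)
        doc_freq.update(set(tokens))

    stats = {}
    for term, tf in corpus_tf.items():
        df = doc_freq[term]
        idf = math.log2(n_docs / df)
        # IDF a Poisson model would predict for this many occurrences.
        expected_idf = -math.log2(1.0 - math.exp(-tf / n_docs))
        stats[term] = {
            "tf": tf,                    # basic term frequency
            "avg_tf": tf / df,           # term frequency / document frequency
            "tf_idf": tf * idf,          # corpus-level TF-IDF
            "ridf": idf - expected_idf,  # residual IDF
        }
    return stats


if __name__ == "__main__":
    docs = [
        ["topic", "model", "topic"],
        ["topic", "model"],
        ["keyword", "keyword", "keyword"],
        ["model"],
    ]
    for term, values in sorted(term_statistics(docs).items()):
        print(term, values)
```

In this toy corpus all three terms occur three times, but "keyword" is concentrated in a single document, so it gets the highest TF-IDF and RIDF; "model" is spread over three documents and gets the lowest.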
Yura Koroliov, Natural Language Processing developer
There is a good tutorial on significant phrase extraction at the LingPipe site: http://alias-i.com/lingpipe/demo...
The most useful and scalable implementation I found is the Xtract algorithm in the Dragon Toolkit (dragon.ischool.drexel.edu). It is English-only, but it uses a smart WordNet stemmer and POS taggers in addition to purely probabilistic (chi-square/information gain) phrase scoring.
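For a feel of what probabilistic phrase scoring looks like, here is a small Python sketch that scores adjacent word pairs with Pearson's chi-square test on a 2x2 contingency table. It is not Xtract and not Dragon Toolkit code, and it skips stemming and POS tagging entirely. The function name chi_square_bigrams is an assumption made for this example.

```python
from collections import Counter


def chi_square_bigrams(tokens):
    """Score each adjacent word pair by Pearson's chi-square statistic."""
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    bigram_counts = Counter(bigrams)
    first_counts = Counter(w1 for w1, _ in bigrams)
    second_counts = Counter(w2 for _, w2 in bigrams)

    scores = {}
    for (w1, w2), o11 in bigram_counts.items():
        o12 = first_counts[w1] - o11   # w1 followed by some other word
        o21 = second_counts[w2] - o11  # some other word followed by w2
        o22 = n - o11 - o12 - o21      # neither w1 first nor w2 second
        denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
        # A zero denominator means the table is degenerate; score it as 0.
        scores[(w1, w2)] = n * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    text = "new york is big and new york is old and the city is new"
    ranked = sorted(chi_square_bigrams(text.split()).items(),
                    key=lambda kv: -kv[1])
    for pair, score in ranked[:5]:
        print(f"{score:6.2f}  {' '.join(pair)}")
```

Pairs whose words almost always appear together, such as "new york" in the sample, score high, while pairs that co-occur about as often as chance predicts score near zero. On small samples the statistic is unreliable, so real systems usually add a minimum frequency threshold.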