Jay Taylor's notes
Original source (www.quora.com)
Clipped on: 2017-02-21
What is the best entity extraction API + service?
There are a lot of entity extraction APIs and services out there. What is your experience and what API do you think is the best one?
Answer Wiki
In alphabetical order:
- AlchemyAPI
- Ambiverse Natural Language Understanding API
- Aylien.com
- BitextAPI
- CogitoAPI
- Diffbot
- Entitizer (Romanian, Russian, Bulgarian, Hungarian)
- Ingen.io
- Magnet by Klangoo (40+ languages, including Klingon ...)
- MeaningCloud
- Rosette (many European, Middle Eastern and Asian languages)
- Saplo
- Semantria
- TextRazor
18 Answers
The Self Service Semantic Suite (S4) by Ontotext provides various services for text analytics and semantic data management.
In particular:
- entity and relation extraction service for biomedical texts (Semantic Biomedical Tagger)
- entity and relation extraction service for news (News Annotation)
- a News Classifier service
- and a service for Twitter analytics (based on the GATE Twitter IE component)
The S4 text analytics services are easily accessible either as RESTful services or via the Java, C#, and uimaFIT SDKs for S4. There are also S4 plugins for the General Architecture for Text Engineering (GATE) and Apache UIMA, which make it easier to access S4 services from these language engineering platforms.
Ontotext S4 also provides RDF database-as-a-service capabilities, so information extracted from text documents can be integrated into knowledge graphs and published or queried as Linked Data.
In the near future, more text analytics services will be available via S4, including sentiment analytics and entity/relation extraction for multiple languages.
Disclaimer: I work for Ontotext and I'm involved in the S4 product line.
Statistical entity extractors do best on the data they have been trained on, which is usually news articles. So if your data is significantly different in format, style, or language from well-formed news, the accuracy of the extractor will degrade. Basis Technology has a field training mechanism where the user can add a quantity of their own data (annotated with entities or unannotated) to increase accuracy in those cases.
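One way to check whether that degradation affects your own data is to hand-annotate a small sample and score the extractor's output against it. A minimal sketch, assuming entities are compared as exact (text, type) pairs per document; the function name `prf1` and the sample entities are made up for illustration:

```python
def prf1(gold: set[tuple[str, str]], predicted: set[tuple[str, str]]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of (text, type) pairs."""
    true_positives = len(gold & predicted)  # same text AND same type
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical hand-annotated gold standard vs. extractor output for one document.
gold = {("Toyota", "Company"), ("Honda", "Company"), ("Tokyo", "Place")}
predicted = {("Toyota", "Company"), ("Tokyo", "Person")}
print([round(x, 2) for x in prf1(gold, predicted)])  # [0.5, 0.33, 0.4]
```

Here the type error (Tokyo tagged as a Person) costs both precision and recall. Comparing character offsets instead of strings would be stricter, and in practice you would average over many documents.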
Basis Technology’s Rosette Entity Extractor supports 16 languages (covering Europe, Asia, and the Middle East) at production-level speeds, whether running on a development machine or in the cloud, and has Java and C++ APIs.
It has some unique features: support for 8 languages, entity disambiguation, linked data, entity quotation extraction, entity relation extraction, entity subtype/ontology mapping, and entity-targeted sentiment analysis.
I'm obviously biased, as I work at Alchemy; however, the API is used more than 2 billion times a month, has tens of thousands of registered developers, and powers the semantic engines of dozens of companies.
Other engines worth checking out are Calais, Basis Tech, Attensity, IBM LanguageWare, etc.
The API extracts:
- People
- Places
- Companies/Products
- Email addresses
- URLs
- Twitter handles
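People, places and companies usually need a trained model, but emails, URLs and Twitter handles follow fairly regular surface patterns. A rough regex sketch of that part, not Saplo's implementation; the patterns are simplified assumptions and the function name is hypothetical:

```python
import re

# Deliberately simplified patterns (assumptions for illustration); real
# extractors handle many more edge cases.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
URL_RE = re.compile(r"\bhttps?://[^\s<>\"']+")
# "@" not preceded by a word character, so the "@" inside an email is skipped;
# handles are taken as 1 to 15 word characters.
HANDLE_RE = re.compile(r"(?<!\w)@(\w{1,15})\b")


def extract_pattern_entities(text: str) -> dict[str, list[str]]:
    """Return emails, URLs and Twitter handles in order of appearance."""
    return {
        "emails": EMAIL_RE.findall(text),
        # drop trailing punctuation that usually ends the sentence, not the URL
        "urls": [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)],
        "twitter_handles": ["@" + h for h in HANDLE_RE.findall(text)],
    }


print(extract_pattern_entities(
    "Mail sales@example.com or see https://example.com/demo. Follow @saplo for news."))
# {'emails': ['sales@example.com'], 'urls': ['https://example.com/demo'], 'twitter_handles': ['@saplo']}
```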
You can try several demos at Saplo Demo Applications.
Disclosure: I'm one of the founders of Saplo.
From what I’ve read on their website (www.semantria.com), their entity extraction is model-based (i.e., it doesn’t require curated dictionaries that constantly go out of date) and detects entities even with typos. By default they extract:
- People
- Places
- Companies
- Products
- Job Titles
- Dates/Times
- Quotes
- Twitter handles/hashtags
- Addresses
- Currencies
- Etc
Another feature that helped us a lot is the ability to submit your own entity dictionary with a custom entity type to augment the out-of-the-box output. This way, we submitted a list of our competitors with the type “Competitor” and were able to deploy a simple social media competitive analysis dashboard in minutes.
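Semantria's own dictionary format isn't described here, so this is only a generic sketch of the idea: scan the text for user-supplied phrases and merge those matches into whatever the model returned, letting the user's label win where spans overlap. The `Entity` class, `augment` function, and company names are all hypothetical:

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # e.g. "Company", "Place", or a custom type such as "Competitor"
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity


def dictionary_matches(text: str, dictionary: dict[str, str]) -> list[Entity]:
    """Whole-word, case-insensitive matches of user-supplied phrases."""
    found = []
    for phrase, label in dictionary.items():
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            found.append(Entity(m.group(0), label, m.start(), m.end()))
    return found


def augment(text: str, model_entities: list[Entity], dictionary: dict[str, str]) -> list[Entity]:
    """Merge model output with dictionary matches; dictionary labels win on overlap."""
    custom = dictionary_matches(text, dictionary)

    def overlaps(a: Entity, b: Entity) -> bool:
        return a.start < b.end and b.start < a.end

    kept = [e for e in model_entities if not any(overlaps(e, c) for c in custom)]
    return sorted(kept + custom, key=lambda e: e.start)


text = "Acme shipped faster than Globex in Berlin."
model_output = [  # pretend this came from the extraction API
    Entity("Acme", "Company", 0, 4),
    Entity("Globex", "Company", 25, 31),
    Entity("Berlin", "Place", 35, 41),
]
for e in augment(text, model_output, {"globex": "Competitor"}):
    print(e.text, e.label)
# Acme Company
# Globex Competitor
# Berlin Place
```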
The cool part about Semantria's entity extraction is its ability to detect granular sentiment toward each entity it extracts. So in the sentence “Unfortunately for Toyota with their bad recall history, new Honda products are working very well,” you will get the entity Toyota with negative sentiment and the entity Honda with positive sentiment. Try it for yourself at www.semantria.com/demo!
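To make that kind of output concrete, here is a deliberately naive sketch of entity-targeted sentiment: it splits text into clauses at punctuation and scores each entity by the positive and negative words in the clauses that mention it. This is not how Semantria does it (real systems use trained models and syntactic analysis); the tiny word lists are assumptions chosen just for this example:

```python
import re

# Toy polarity lexicons, assumed for this example only.
POSITIVE = {"good", "great", "well", "excellent"}
NEGATIVE = {"bad", "poor", "unfortunately", "terrible"}


def entity_sentiment(text: str, entities: list[str]) -> dict[str, str]:
    """Label each entity by the lexicon words found in the clauses that mention it."""
    clauses = re.split(r"[,;.!?]", text)  # crude clause boundaries
    labels = {}
    for entity in entities:
        mention = re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)
        score = 0
        for clause in clauses:
            if mention.search(clause):
                words = re.findall(r"[a-z']+", clause.lower())
                score += sum(w in POSITIVE for w in words)
                score -= sum(w in NEGATIVE for w in words)
        labels[entity] = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return labels


sentence = ("Unfortunately for Toyota with their bad recall history, "
            "new Honda products are working very well")
print(entity_sentiment(sentence, ["Toyota", "Honda"]))
# {'Toyota': 'negative', 'Honda': 'positive'}
```

The clause split is what keeps "bad" from bleeding over to Honda; a document-level score would have blurred the two.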
Besides API access (we used the Java SDK, but they provide SDKs for all popular languages), there is a very cool, easy-to-use Excel add-in (www.semantria.com/excel) that lets you run entity extraction from within Excel. With our limited dev resources, it was very easy to get 90% of the work done without ever leaving the Excel environment.
We’ve tried other services, such as AlchemyAPI and OpenAmplify, and none of them were as easy to use; furthermore, none worked as well out of the box as Semantria. A free trial is available at www.semantria.com/trial.
If you do not wish to upload your data, you can use GATE (gate.ac.uk) to do NER yourself and write your own web service that calls GATE.
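The "own web service" part can be very small. A minimal sketch using only Python's standard library, assuming a plain-text POST in and JSON out; `extract_entities` is a hypothetical placeholder (a capitalized-words heuristic) marking where your GATE pipeline call would go, since wiring GATE itself in (it runs on Java) is not shown:

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_entities(text: str) -> list[str]:
    """Placeholder NER: runs of capitalized words.

    Hypothetical stand-in; replace with a call into your own GATE pipeline.
    Note it also picks up sentence-initial words such as "The".
    """
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", text)


class NERHandler(BaseHTTPRequestHandler):
    """POST raw UTF-8 text to any path; get back {"entities": [...]} as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        body = json.dumps({"entities": extract_entities(text)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    # Bind to localhost only so documents never leave the machine.
    HTTPServer(("127.0.0.1", port), NERHandler).serve_forever()
```

Calling `serve()` starts it; then `curl --data "Angela Merkel visited Paris." http://127.0.0.1:8000/` should come back with `{"entities": ["Angela Merkel", "Paris"]}`. Binding to 127.0.0.1 is the point here: the text is processed on your own machine instead of being uploaded.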
http://www.monkeylearn.com
It's cheap and has different classifiers and extractors, including an entity extractor for locations, people, and organizations.
paper." http://kmi.open.ac.uk/publicatio...