Jay Taylor's notes


What is the best entity extraction API + service? - Quora

Original source (www.quora.com)
Tags: nlp text-classification text-analytics www.quora.com
Clipped on: 2017-02-21


What is the best entity extraction API + service?

There are a lot of entity extraction APIs and services out there. What is your experience and what API do you think is the best one?

Answer Wiki

In alphabetical order:

18 Answers
[...] shares a great discussion comparing NLP APIs such as Yahoo Term Extractor, OpenCalais, BriefNetworks, OpenAmplify, AlchemyAPI and Envi.
7.5k Views

The Self Service Semantic Suite (S4) by Ontotext provides various services for text analytics and semantic data management.

In particular:
  1. entity and relation extraction service for biomedical texts (Semantic Biomedical Tagger)
  2. entity and relation extraction service for news (News Annotation)
  3. a News Classifier service
  4. and a service for Twitter analytics (based on the GATE Twitter IE component)
The entity extraction services also perform disambiguation and mapping to knowledge graphs such as DBpedia, Freebase, and GeoNames (news analytics), or to specialised biomedical databases (biomedical tagger).

The S4 text analytics services are easily accessible either as RESTful services or via the Java/C#/uimaFIT SDKs for S4. There are also S4 plugins for General Architecture for Text Engineering (GATE) and Apache UIMA, which make it easier to access S4 services from these language engineering platforms.

Ontotext S4 also provides RDF database-as-a-service capabilities, so that information extracted from text documents can be integrated into knowledge graphs and published and queried as Linked Data.

In the near future, more text analytics services will be available via S4, including sentiment analytics and entity/relation extraction for multiple languages.

Disclaimer: I work for Ontotext and I'm involved in the S4 product line.
9.3k Views
Statistical entity extractors do best on the data they have been trained on, which is usually news articles. So if your data is significantly different in format, style, or language from well-formed news, the accuracy of the extractor will degrade. Basis Technology has a field training mechanism where the user can add a quantity of their own data (annotated with entities or unannotated) to increase accuracy in those cases.

Basis Technology’s Rosette Entity Extractor supports 16 languages (covering Europe, Asia, and the Middle East) at production-level speeds, whether running on a development machine or in the cloud, and has a Java or C++ API.
5.3k Views
It has some unique features: support for 8 languages, entity disambiguation, linked data, entity quotation extraction, entity relation extraction, entity subtype/ontology mapping, and entity-targeted sentiment analysis.

I'm obviously biased, as I work at Alchemy. However, the API is used more than 2 billion times monthly, has tens of thousands of registered developers, and powers the semantic engines for dozens of companies.

Other engines worth checking out are Calais, Basis Tech, Attensity, IBM LanguageWare, etc.
10.9k Views
The API extracts:
  • People
  • Places
  • Companies/Products
  • Email addresses
  • URLs
  • Twitter handles
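
Three of the types listed above (email addresses, URLs, and Twitter handles) follow regular surface patterns, so they can be roughly approximated with regular expressions. The Python sketch below is only an illustration and is not how the API described here works internally. The patterns and the name `extract_pattern_entities` are my own assumptions. People, places, and companies have no such surface pattern and need a trained model or a dictionary instead.

```python
import re

# Deliberately simple patterns (assumptions for illustration only);
# real emails and URLs have many more edge cases than these cover.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>]+")
# A handle is "@" plus 1-15 word characters, not preceded by a word character,
# "@" or "." (which would suggest an email address or a path).
HANDLE_RE = re.compile(r"(?<![A-Za-z0-9_@.])@([A-Za-z0-9_]{1,15})\b")


def extract_pattern_entities(text):
    """Return the emails, URLs and Twitter handles found in text."""
    emails = EMAIL_RE.findall(text)
    # Strip trailing sentence punctuation that the URL pattern swallows.
    urls = [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]
    # Blank out emails first so their "@domain" part is not read as a handle.
    without_emails = EMAIL_RE.sub(" ", text)
    handles = ["@" + h for h in HANDLE_RE.findall(without_emails)]
    return {"email": emails, "url": urls, "twitter_handle": handles}
```

Calling `extract_pattern_entities("Mail jane@example.com or ping @example_user")` returns a dict with one list per entity type.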

You can try several demos at Saplo Demo Applications.


Disclosure: I'm one of the founders of Saplo.
4.7k Views
From what I’ve read on their website (www.semantria.com), their entity extraction is model-based (i.e. it doesn’t require curated dictionaries that constantly go out of date) and detects entities even with typos. By default they extract:
- People
- Places
- Companies
- Products
- Job Titles
- Dates/Times
- Quotes
- Twitter handles/hashtags
- Addresses
- Currencies
- Etc.
Another feature that helped us a lot is the ability to submit your own entity dictionary with a custom entity type to augment their out-of-the-box output. We submitted a list of our competitors with the type “Competitor” and were able to deploy a simple social media competitive analysis dashboard in minutes.
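
To make the custom dictionary idea concrete, here is a minimal Python sketch of dictionary (gazetteer) matching with a user-defined entity type. It is not Semantria's API or implementation. The function name `tag_entities` and the company names in the usage note are hypothetical, and a real service would combine a dictionary like this with its model-based output.

```python
import re


def tag_entities(text, dictionary):
    """Return (matched text, entity type) pairs in order of appearance.

    dictionary maps an entity type to a list of names,
    e.g. {"Competitor": ["Acme Motors", "Globex"]} (hypothetical names).
    """
    # Map each lower-cased name to its type for case-insensitive lookup.
    lookup = {}
    for entity_type, names in dictionary.items():
        for name in names:
            lookup[name.lower()] = entity_type
    if not lookup:
        return []
    # Try longer names first so "Acme Motors" wins over a shorter "Acme".
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [(m.group(1), lookup[m.group(1).lower()]) for m in pattern.finditer(text)]
```

For example, `tag_entities("Globex cut prices again", {"Competitor": ["Acme Motors", "Globex"]})` tags `Globex` as a `Competitor`. Exact matching like this misses typos and name variants, which is the gap the model-based extraction is meant to cover.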
The cool part about Semantria's entity extraction is its ability to detect granular sentiment towards each entity it extracts. So in the sentence “Unfortunately for Toyota with their bad recall history, new Honda products are working very well” you will get the entity Toyota with negative sentiment and the entity Honda with positive sentiment. Try it for yourself at www.semantria.com/demo!
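
As a rough illustration of what entity-level sentiment means for that sentence, here is a toy Python sketch. It is not Semantria's method. It swaps in a much simpler technique: split the sentence into clauses, score each clause with a tiny hand-made word list, and give every entity the label of its clause. The lexicon, the clause-splitting rule, and the name `entity_sentiment` are all assumptions.

```python
import re

# Tiny hand-made sentiment lexicon, for illustration only.
LEXICON = {"unfortunately": -1, "bad": -1, "poor": -1,
           "well": 1, "good": 1, "great": 1}


def entity_sentiment(sentence, entities):
    """Label each entity with the polarity of the clause it appears in."""
    results = {}
    # Treat commas, semicolons and the word "but" as clause boundaries.
    for clause in re.split(r"[,;]|\bbut\b", sentence):
        lowered = clause.lower()
        words = re.findall(r"[a-z']+", lowered)
        score = sum(LEXICON.get(word, 0) for word in words)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for entity in entities:
            if entity.lower() in lowered:
                results[entity] = label
    return results
```

Running it on the Toyota/Honda sentence above with `["Toyota", "Honda"]` labels Toyota as negative and Honda as positive. A production system would need real parsing and negation handling, since clause splitting breaks down quickly on longer sentences.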
Besides API access (we used the Java SDK, but they provide SDKs for all popular languages), there is a very cool and easy-to-use Excel add-in (www.semantria.com/excel) that lets you run entity extraction from within Excel. With our limited dev resources, it was very easy to get 90% of the work done without ever leaving the Excel environment.
We’ve tried other services, such as AlchemyAPI and OpenAmplify. None of them were as easy to use, and none worked as well out of the box as Semantria. A free trial is available at www.semantria.com/trial
24.4k Views
If you do not wish to upload your data, you can run GATE (gate.ac.uk) yourself and write your own web service that uses GATE to do NER.
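
The shape of such a self-hosted service can be sketched as below. GATE itself is a Java toolkit, so this Python sketch does not call it: `extract_entities` is a hypothetical placeholder marking where the call into a locally running GATE pipeline would go, and it only returns capitalized words so the service has something to send back. The port, the JSON request format, and all function names are assumptions.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_entities(text):
    """Placeholder (hypothetical): stands in for a call into a local NER pipeline."""
    return [{"text": m.group(0), "start": m.start(), "end": m.end()}
            for m in re.finditer(r"\b[A-Z][a-z]+\b", text)]


def handle_body(raw_body):
    """Turn a JSON request body into (HTTP status, JSON response bytes)."""
    try:
        text = json.loads(raw_body.decode("utf-8"))["text"]
        if not isinstance(text, str):
            raise TypeError("text must be a string")
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a string 'text' field"}
        return 400, json.dumps(error).encode("utf-8")
    return 200, json.dumps({"entities": extract_entities(text)}).encode("utf-8")


class NerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_body(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Listen on localhost only, so documents never leave the machine."""
    HTTPServer(("127.0.0.1", port), NerHandler).serve_forever()
```

Calling `serve()` starts the service, and clients then POST a body such as `{"text": "..."}` to it. Keeping the request parsing in `handle_body` means it can be tested without opening a socket.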
2.1k Views · Answer requested by Michael R. Bernstein
http://www.monkeylearn.com
It's cheap and has different classifiers and extractors, including an entity extractor for locations, people, and organizations.
2.3k Views