How to use Stanford NER with Spanish text

Last week I was looking for a Java library to perform NER (Named Entity Recognition) on Spanish text. I have used FreeLing in the past and I have to say that it is quite good, but this time I wanted to avoid calling native C++ code from Java. My first idea was to try OpenNLP. I ran some tests with its Spanish models, but I was quite disappointed with the accuracy, so I decided to move on and look for another library. The next one was the Stanford Named Entity Recognizer, a pure Java 8 library from the Stanford Natural Language Processing Group. After some tests, I realized that this would be my choice: the precision was really good in both English and Spanish.

These are the steps to follow in order to integrate it as a Java library:

  1. Download the latest version of Stanford Named Entity Recognizer (currently 3.6.0).
  2. Unzip it. You will find some shell scripts to play with, the jar (stanford-ner.jar), the javadoc and the sources. There is also a folder named ‘classifiers’ with the English models and a folder named ‘lib’ with the extra dependencies it needs.
  3. Download the Spanish models. This is a jar file: unzip it and go to the /edu/stanford/nlp/models/ner folder. Then copy the “spanish.ancora.distsim.s512.crf.ser.gz” file to a folder on your application’s classpath, for example src/main/resources.
  4. Add stanford-ner.jar to the classpath of your application. I decided to “mavenize” this jar as a local Maven artifact, using the standard Maven command:
mvn install:install-file -Dfile=stanford-ner.jar -DgroupId=YOUR_GROUP_ID -DartifactId=stanford-jar -Dversion=3.6.0 -Dpackaging=jar

After this I added the dependency to my pom.xml.

(You may need some extra dependencies, like slf4j or joda-time. In my case I only had to add slf4j-api.)
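
As a rough sketch, the dependency entry for the locally installed jar mirrors the coordinates given to install:install-file. The artifactId and version below are taken from the command above; the groupId `com.example.nlp` is a hypothetical placeholder and has to match the group ID you used when installing the jar.

```xml
<!-- Sketch only: groupId is a made-up placeholder, not the one used in the original project -->
<dependency>
    <groupId>com.example.nlp</groupId>
    <artifactId>stanford-jar</artifactId>
    <version>3.6.0</version>
</dependency>
```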

Finally, to use it you can try something like:

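import java.util.List;
import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;

// The model file name is resolved against the classpath (e.g. src/main/resources)
String spanishSerializedClassifier = "spanish.ancora.distsim.s512.crf.ser.gz";
String englishSerializedClassifier = "english.all.3class.distsim.crf.ser.gz";
// getClassifier() throws checked exceptions (e.g. IOException), so call it from a method that handles them
AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifier(spanishSerializedClassifier);

// classify() returns one list of tokens per sentence
List<List<CoreLabel>> sentences = classifier.classify("David Bowie toma las calles del mundo.");

for (List<CoreLabel> coreLabels : sentences) {
    for (CoreLabel word : coreLabels) {
        System.out.print(word.word() + '/' + word.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
    }
    System.out.println();
}
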
The result will be:

[David, Bowie, toma, las, calles, del, mundo, .]
David/PERSON Bowie/PERSON toma/O las/O calles/O del/O mundo/O ./O  
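
The output above tags every token on its own, so "David" and "Bowie" show up as two PERSON tokens instead of one name. If you want whole entities, one simple option is to merge adjacent tokens that share the same label. The following is a minimal sketch of that idea working on the printed "word/LABEL" format; `EntityMerger` and `Entity` are hypothetical helper names of mine and are not part of Stanford NER.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (not part of Stanford NER): merges "word/LABEL" tokens into entities. */
final class EntityMerger {

    /** A run of adjacent tokens that share the same entity label. */
    static final class Entity {
        final String text;
        final String label;

        Entity(String text, String label) {
            this.text = text;
            this.label = label;
        }

        @Override
        public String toString() {
            return text + " [" + label + "]";
        }
    }

    /** Parses space-separated "word/LABEL" pairs and joins adjacent tokens with the same non-O label. */
    static List<Entity> merge(String taggedLine) {
        List<Entity> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = "O"; // "O" marks tokens outside any entity

        for (String pair : taggedLine.trim().split("\\s+")) {
            // Split on the last slash so that words containing '/' keep their text.
            int slash = pair.lastIndexOf('/');
            if (slash < 0) {
                continue; // skip empty or malformed tokens
            }
            String word = pair.substring(0, slash);
            String label = pair.substring(slash + 1);

            if (!label.equals("O") && label.equals(currentLabel)) {
                current.append(' ').append(word); // the open entity continues
                continue;
            }
            // The label changed: close the open entity, if there is one.
            if (!currentLabel.equals("O")) {
                entities.add(new Entity(current.toString(), currentLabel));
            }
            current.setLength(0);
            current.append(word);
            currentLabel = label;
        }
        // Close an entity that runs up to the end of the line.
        if (!currentLabel.equals("O")) {
            entities.add(new Entity(current.toString(), currentLabel));
        }
        return entities;
    }

    public static void main(String[] args) {
        String tagged = "David/PERSON Bowie/PERSON toma/O las/O calles/O del/O mundo/O ./O";
        for (Entity entity : merge(tagged)) {
            System.out.println(entity);
        }
    }
}
```

Keep in mind that this simple approach also glues together two different entities when they are adjacent and carry the same label, so treat it as a starting point only.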

You can download a sample project from my GitHub page.
