Jsoup is a Java library for parsing HTML from a URL, a String, or a File.


The Jsoup dependency can be downloaded from http://jsoup.org/download
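The Quick Start below fetches a live URL. The String and File inputs mentioned above go through Jsoup.parse instead. The sketch below uses the Jsoup.parse(String html, String baseUri) and Jsoup.parse(File in, String charsetName) overloads. The class name, the sample markup, the temporary file and the base URI "https://example.com/" are illustrative assumptions, not part of the original tutorial.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupParseSourcesExample {

  // Parse HTML held in a String. The base URI (an assumed placeholder)
  // is what relative links would later be resolved against.
  static Document parseString(String html) {
    return Jsoup.parse(html, "https://example.com/");
  }

  // Parse HTML stored in a local file, decoding it as UTF-8 (assumed charset).
  static Document parseFile(File file) throws IOException {
    return Jsoup.parse(file, "UTF-8");
  }

  public static void main(String[] args) throws IOException {
    String html = "<html><head><title>From a String</title></head>"
        + "<body><p>Hello, jsoup!</p></body></html>";

    Document fromString = parseString(html);
    System.out.println("Title : " + fromString.title());

    // Write the same markup to a temporary file (hypothetical input) and parse it back
    File tmp = File.createTempFile("jsoup-demo", ".html");
    tmp.deleteOnExit();
    Files.write(tmp.toPath(), html.getBytes(StandardCharsets.UTF_8));

    Document fromFile = parseFile(tmp);
    System.out.println("Paragraph : " + fromFile.select("p").first().text());
  }
}
```

Running it should print "Title : From a String" followed by "Paragraph : Hello, jsoup!". Both paths return the same Document type, so the select and title calls used in the Quick Start apply unchanged.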

Quick Start Example

package com.codeforeach.demo;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupQuickStartExample {
  public static void main(String[] args) throws IOException {

    // Connect to the URL, fetch the page and parse the HTML response into a Document
    Document doc = Jsoup.connect("http://www.jsoup.org").get();

    // Print the title of the response page
    System.out.println("Title : " + doc.title());

    // Print every anchor tag that has an href attribute, with its href and link text
    Elements anchorTagsWithHrefs = doc.select("a[href]");
    for (Element tag : anchorTagsWithHrefs) {
      System.out.println("\nhref : " + tag.attr("href"));
      System.out.println("text : " + tag.text());
    }
  }
}

Note: the select(cssQuery) method accepts a CSS-like selector pattern for finding elements.

Output

Title : jsoup Java HTML Parser, with best of DOM, CSS, and jquery

href : /
text : jsoup

href : /news/
text : News

href : /bugs
text : Bugs

href : /discussion
text : Discussion

href : /download
text : Download

href : /apidocs/
text : API Reference

*** omitted the remaining output for readability ***
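The select(cssQuery) note above covers more than a[href]. Also, the hrefs in the output (/news/, /bugs, ...) are relative. The sketch below tries a few common selector forms and resolves relative links against a base URI, all without network access. It covers id, direct child, class, attribute prefix and attribute suffix selectors. For link resolution it uses absUrl, or equivalently attr("abs:href"). The class name, the sample HTML and the base URI "https://example.com/" are hypothetical stand-ins, not taken from the jsoup.org page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupSelectorExample {

  // Hypothetical sample markup so the example runs offline
  static final String HTML =
      "<div id='nav'>"
      + "<a href='/news/'>News</a>"
      + "<a class='ext' href='https://github.com/jhy/jsoup'>Source</a>"
      + "</div>"
      + "<div class='content'><img src='logo.png'><p>Intro</p></div>";

  // Parse with an assumed base URI so relative URLs can be made absolute
  static Document parse() {
    return Jsoup.parse(HTML, "https://example.com/");
  }

  public static void main(String[] args) {
    Document doc = parse();

    // Id plus direct-child combinator: anchors directly inside #nav
    Elements navLinks = doc.select("#nav > a");
    System.out.println("nav links : " + navLinks.size());

    // Class selector: anchors with class "ext"
    System.out.println("external : " + doc.select("a.ext").text());

    // Attribute prefix match: hrefs starting with "http"
    System.out.println("absolute hrefs : " + doc.select("a[href^=http]").size());

    // Attribute suffix match: images whose src ends in ".png"
    for (Element img : doc.select("img[src$=.png]")) {
      System.out.println("png : " + img.absUrl("src"));
    }

    // Relative hrefs resolved against the base URI; attr("abs:href") gives the same result
    for (Element a : navLinks) {
      System.out.println(a.attr("href") + " -> " + a.absUrl("href"));
    }
  }
}
```

Here "/news/" resolves to https://example.com/news/, while the already-absolute GitHub link is returned unchanged. For pages fetched with Jsoup.connect(...).get(), the page URL itself serves as the base URI.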



