Apache Nutch · JSON Structure

Apache Nutch Fetch Node Db Info Structure

Information about a fetched node in the FetchDB.

Type: object
Properties: 4
Required: 1
Tags: Web Crawler, Indexing, Search, Apache, Java, Hadoop, Open Source

FetchNodeDbInfo is a JSON Structure definition published by Apache Nutch that describes 4 properties, of which only 1 (children) is required. It conforms to the https://json-structure.org/meta/core/v0/# meta-schema.

Properties

url, status, numOfOutlinks, children
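As a rough illustration, the four properties can be mirrored in Python dataclasses. This is a sketch, not Nutch code. The class names follow the schema, but the `to_dict`/`to_json` helpers are hypothetical, and the example URLs and values are made up. Per the schema below, `url` is a string, `status` and `numOfOutlinks` are int32 values in the range 0 to 2147483647, and `children` is an array of objects with optional `childUrl` and `anchorText` strings. Only `children` is required, so the other fields default to `None` and are omitted from the serialized JSON.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChildNode:
    """One element of `children`: an outlink of the fetched node. Both fields are optional."""
    childUrl: Optional[str] = None
    anchorText: Optional[str] = None

    def to_dict(self) -> dict:
        # Hypothetical helper: leave out fields that were never set.
        return {k: v for k, v in vars(self).items() if v is not None}


@dataclass
class FetchNodeDbInfo:
    """Mirror of the FetchNodeDbInfo structure. Only `children` is required."""
    children: List[ChildNode]
    url: Optional[str] = None
    status: Optional[int] = None          # int32, 0..2147483647 per the schema
    numOfOutlinks: Optional[int] = None   # int32, 0..2147483647 per the schema

    def to_json(self) -> str:
        # Hypothetical helper: emit properties in schema order, skipping unset optional ones.
        doc = {}
        for name in ("url", "status", "numOfOutlinks"):
            value = getattr(self, name)
            if value is not None:
                doc[name] = value
        doc["children"] = [child.to_dict() for child in self.children]  # always present
        return json.dumps(doc)


# Made-up example values, for illustration only.
info = FetchNodeDbInfo(
    children=[ChildNode(childUrl="https://example.org/about", anchorText="About")],
    url="https://example.org/",
    status=200,
    numOfOutlinks=1,
)
print(info.to_json())
```

Making `children` the only field without a default mirrors the schema's `required` list: Python will refuse to construct a `FetchNodeDbInfo` without it, while `FetchNodeDbInfo(children=[])` serializes to `{"children": []}`, a minimal instance that satisfies the one required property.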

Meta-schema: https://json-structure.org/meta/core/v0/#

JSON Structure

{
  "$schema": "https://json-structure.org/meta/core/v0/#",
  "$id": "https://raw.githubusercontent.com/api-evangelist/apache-nutch/refs/heads/main/json-structure/apache-nutch-fetch-node-db-info-structure.json",
  "name": "FetchNodeDbInfo",
  "description": "Information about a fetched node in the FetchDB.",
  "type": "object",
  "properties": {
    "url": {
      "type": "string",
      "description": "The URL of the fetched node."
    },
    "status": {
      "type": "int32",
      "minimum": 0,
      "maximum": 2147483647,
      "description": "The HTTP status code of the fetch."
    },
    "numOfOutlinks": {
      "type": "int32",
      "minimum": 0,
      "maximum": 2147483647,
      "description": "The number of outgoing links discovered."
    },
    "children": {
      "type": "array",
      "items": {
        "type": "object",
        "description": "A child (outlink) of a fetched node.",
        "properties": {
          "childUrl": {
            "type": "string",
            "description": "The URL of the child node."
          },
          "anchorText": {
            "type": "string",
            "description": "The anchor text of the link."
          }
        }
      },
      "description": "The outgoing links from this node."
    }
  },
  "required": [
    "children"
  ]
}
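The constraints in the schema above can also be checked by hand. The following is a minimal sketch, not a general JSON Structure validator and not part of Apache Nutch: the function name `validate_fetch_node_db_info` is hypothetical, and it checks only what this schema states (the required `children` property, string types for `url`, `childUrl`, and `anchorText`, and the 0 to 2147483647 int32 range for `status` and `numOfOutlinks`). It does not consult the meta-schema or decide how undeclared properties should be treated.

```python
import json
from typing import Any, List

INT32_MAX = 2147483647  # upper bound given in the schema for status and numOfOutlinks


def _is_non_negative_int32(value: Any) -> bool:
    # bool is a subclass of int in Python, so reject it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= INT32_MAX


def validate_fetch_node_db_info(doc: Any) -> List[str]:
    """Hypothetical checker for FetchNodeDbInfo; returns a list of problems (empty if none found)."""
    if not isinstance(doc, dict):
        return ["document must be a JSON object"]
    errors: List[str] = []
    if "children" not in doc:
        errors.append("missing required property 'children'")
    if "url" in doc and not isinstance(doc["url"], str):
        errors.append("'url' must be a string")
    for name in ("status", "numOfOutlinks"):
        if name in doc and not _is_non_negative_int32(doc[name]):
            errors.append(f"'{name}' must be an int32 in [0, {INT32_MAX}]")
    children = doc.get("children", [])
    if not isinstance(children, list):
        errors.append("'children' must be an array")
        return errors
    for i, child in enumerate(children):
        if not isinstance(child, dict):
            errors.append(f"children[{i}] must be an object")
            continue
        # childUrl and anchorText are optional, but must be strings when present.
        for key in ("childUrl", "anchorText"):
            if key in child and not isinstance(child[key], str):
                errors.append(f"children[{i}].{key} must be a string")
    return errors


if __name__ == "__main__":
    # Made-up sample payload, for illustration only.
    sample = json.loads(
        '{"url": "https://example.org/", "status": 200, "numOfOutlinks": 1,'
        ' "children": [{"childUrl": "https://example.org/about", "anchorText": "About"}]}'
    )
    print(validate_fetch_node_db_info(sample))  # []
    print(validate_fetch_node_db_info({"url": "https://example.org/"}))
```

Collecting every problem into a list, rather than raising on the first one, makes it easier to report all issues with a payload at once.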