Golang encoding/xml

From wikinotes

go's builtin library for parsing xml.
See also: golang encoding, xml, golang x/net

A more detailed introduction to go's encoding interface can be seen in golang encoding/json.

NOTE:

Golang's builtin xml library does not support schema validation.

Documentation

official docs https://pkg.go.dev/encoding/xml@go1.18.3

Tutorials

tutorialedge https://tutorialedge.net/golang/parsing-xml-with-golang/

Struct Tags

See full details here

type User struct {
    Name  `xml:"Name"`       // <User><Name>value</Name></User>
    Color `xml:"color,attr"` // <User color="value"></User>
    Skip  `xml:"-"`          // <User></User>
}

XML namespaces are supported, by providing a prefix to the xml tag.

type Html struct {
    XMLName xml.Name `xml:"http://www.w3.org/1999/xhtml html`
}

// <html xmlns="http://www.w3.org/1999/xhtml">
//   <!-- ... -->
// </html>

Known issues:

  • schema validation does not appear to be supported

Serialization

Basics

type User struct {
    Id   int    `xml:"id"`
    Name string `xml:"name,attr"`
}

func main() {
    user := User{123, "will"}
    bytes, _ := xml.Marshal(&user)
    fmt.Println(string(bytes))  // <User name="will"><id>123</id></User>
}

You can also marshall with an indent

xml.MarshallIndent(
  &user,
  "  ",    // (2) indent entire object this many spaces
  "    ",  // (4) indent-width
)

//| | <-- 2x spaces
//  <User name="will">
//      <id>123</id>     // <-- indented 4x spaces
//  </User>

Deserialization

Basics

type User struct {
    Id   int    `xml:"id"`
    Name string `xml:"name,attr"`
}

func main() {
    raw := `<User name="will"><id>123</id></User>`
    var user User
    xml.Unmarshall([]byte(raw), &user)
    fmt.Println(user)
}

Non-Homogenous XML

XML is generally not homogenous.
Record each possible sub-element as a field on your object.
If an element can occur multiple times, declare it as an array.
You can ignore elements by not defining fields for them.

encoded := `
  <mediawiki>
      <siteinfo>
          abc
      </siteinfo>
      <page>
          <title>Main Page</title>
      </page>
      <page>
          <title>Linux</title>
      </page>
  </mediawiki>
`

type Result struct {
    XMLName  xml.Name `xml:"mediawiki"` // root node
    SiteInfo string   `xml:"siteinfo"`  // only one 'siteinfo' element under 'mediawiki'
    Page     []Page   `xml:"page"`      // multiple 'page' elements under 'mediawiki'
}

type Page struct {
    Title string `xml:"title"`
}

var result Result
xml.Unmarshall([]byte(encoded), &result)
fmt.Println(result.Page[0].Title.Text)  // Linux

Arbitrary XML

type Node struct {
	XMLName xml.Name
	Attrs   []*xml.Attr `xml:",any,attr"` // each attr.Name, attr.Value
	Data    string      `xml:",chardata"` // 'Title' in <h1>Title</h1>
	Nodes   []*Node     `xml:",any"`      // child-nodes
}

// deserialize
var parsed Node
xml.Unmarshal([]byte(raw), &parsed)

// re-serialize
bytes, err := xml.Marshal(&parsed)

Example: Deserialize, Modify, Re-serialize an arbitrary xml object.


WARNING:

Do not parse HTML with XML (elements like <br/> are invalid XML).

import (
	"encoding/xml"
	"fmt"
)

type Node struct {
	XMLName xml.Name
	Attrs   []*xml.Attr `xml:",any,attr"` // each attr.Name, attr.Value
	Data    string      `xml:",chardata"` // 'Title' in <h1>Title</h1>
	Nodes   []*Node     `xml:",any"`      // child-nodes
}

func addDotHtmlToAHrefs(node *Node) {
    // Adds a '.html' suffix to each href in a '<a href="foo">foo</a>'
	if node.XMLName.Local == "a" {
		for _, attr := range node.Attrs {
			if attr.Name.Local == "href" {
				attr.Value = fmt.Sprint(attr.Value, ".html")
			}
		}
	}

	for _, child := range node.Nodes {
		addDotHtmlToAHrefs(child)
	}
}

func main() {
	// define xml
	raw := `<html>
          <p><a href="abc">ABC</a></p>
          <blockquote><p><a href="def">DEF</a></p></blockquote>
        </html>`

	// unmarshall
	var parsed Node
	xml.Unmarshal([]byte(raw), &parsed)

	// modify
	addDotHtmlToAHrefs(&parsed)

	// re-marshall, modified
	bytes, _ := xml.MarshalIndent(&parsed, "", "  ")
	fmt.Println(string(bytes))
}

Outputs

<html>
  <p>
    <a href="abc.html">ABC</a>
  </p>
  <blockquote>
    <p>
      <a href="def.html">DEF</a>
    </p>
  </blockquote>
</html>


Custom Deserializers

TODO:

finish

func (revision *Revision) UnmarshalXML(d *xml.Decoder, start xml.Startelement) error {
    return nil
}