src/pkg/encoding/gob/doc.go - The Go Programming Language

Source file src/pkg/encoding/gob/doc.go

     1	// Copyright 2009 The Go Authors. All rights reserved.
     2	// Use of this source code is governed by a BSD-style
     3	// license that can be found in the LICENSE file.
     4	
     5	/*
     6	Package gob manages streams of gobs - binary values exchanged between an
     7	Encoder (transmitter) and a Decoder (receiver).  A typical use is transporting
     8	arguments and results of remote procedure calls (RPCs) such as those provided by
     9	package "net/rpc".
    10	
    11	A stream of gobs is self-describing.  Each data item in the stream is preceded by
    12	a specification of its type, expressed in terms of a small set of predefined
    13	types.  Pointers are not transmitted, but the things they point to are
    14	transmitted; that is, the values are flattened.  Recursive types work fine, but
    15	recursive values (data with cycles) are problematic.  This may change.
    16	
    17	To use gobs, create an Encoder and present it with a series of data items as
    18	values or addresses that can be dereferenced to values.  The Encoder makes sure
    19	all type information is sent before it is needed.  At the receive side, a
    20	Decoder retrieves values from the encoded stream and unpacks them into local
    21	variables.
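
As an illustration of the paragraph above, here is a minimal sketch of one
round trip.  The Point type and the roundTrip helper are names invented for
this example, and a bytes.Buffer stands in for a real connection.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point is a hypothetical type used only for this illustration.
type Point struct{ X, Y int }

// roundTrip encodes p into an in-memory buffer and decodes it back.
func roundTrip(p Point) (Point, error) {
	var network bytes.Buffer // stands in for a network connection
	// An address that can be dereferenced to a value is accepted.
	if err := gob.NewEncoder(&network).Encode(&p); err != nil {
		return Point{}, err
	}
	var q Point
	err := gob.NewDecoder(&network).Decode(&q)
	return q, err
}

func main() {
	q, err := roundTrip(Point{22, 33})
	fmt.Println(q, err)
}
```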
    22	
    23	The source and destination values/types need not correspond exactly.  For structs,
    24	fields (identified by name) that are in the source but absent from the receiving
    25	variable will be ignored.  Fields that are in the receiving variable but missing
    26	from the transmitted type or value will be ignored in the destination.  If a field
    27	with the same name is present in both, their types must be compatible. Both the
    28	receiver and transmitter will do all necessary indirection and dereferencing to
    29	convert between gobs and actual Go values.  For instance, a gob type that is
    30	schematically,
    31	
    32		struct { A, B int }
    33	
    34	can be sent from or received into any of these Go types:
    35	
    36		struct { A, B int }	// the same
    37		*struct { A, B int }	// extra indirection of the struct
    38		struct { *A, **B int }	// extra indirection of the fields
    39		struct { A, B int64 }	// different concrete value type; see below
    40	
    41	It may also be received into any of these:
    42	
    43		struct { A, B int }	// the same
    44		struct { B, A int }	// ordering doesn't matter; matching is by name
    45		struct { A, B, C int }	// extra field (C) ignored
    46		struct { B int }	// missing field (A) ignored; data will be dropped
    47		struct { B, C int }	// missing field (A) ignored; extra field (C) ignored.
    48	
    49	Attempting to receive into these types will draw a decode error:
    50	
    51		struct { A int; B uint }	// change of signedness for B
    52		struct { A int; B float64 }	// change of type for B
    53		struct { }			// no field names in common
    54		struct { C, D int }		// no field names in common
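
The matching rules above can be tried with a small sketch.  The types Sent,
Reordered and Disjoint and the helper transfer are hypothetical names chosen
for this example; the exact error text is not specified here and may vary.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Hypothetical types for illustration only.
type Sent struct{ A, B int }
type Reordered struct{ B, A, C int } // different order, extra field C
type Disjoint struct{ C, D int }     // no field names in common with Sent

// transfer encodes src and decodes the result into dst (a pointer).
func transfer(src, dst interface{}) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(src); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(dst)
}

func main() {
	var r Reordered
	fmt.Println(transfer(Sent{1, 2}, &r), r)

	var d Disjoint
	fmt.Println(transfer(Sent{1, 2}, &d))
}
```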
    55	
    56	Integers are transmitted two ways: arbitrary precision signed integers or
    57	arbitrary precision unsigned integers.  There is no int8, int16 etc.
    58	discrimination in the gob format; there are only signed and unsigned integers.  As
    59	described below, the transmitter sends the value in a variable-length encoding;
    60	the receiver accepts the value and stores it in the destination variable.
    61	Floating-point numbers are always sent using IEEE-754 64-bit precision (see
    62	below).
    63	
    64	Signed integers may be received into any signed integer variable: int, int16, etc.;
    65	unsigned integers may be received into any unsigned integer variable; and floating
    66	point values may be received into any floating point variable.  However,
    67	the destination variable must be able to represent the value or the decode
    68	operation will fail.
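
A short sketch of the range check described above, assuming a hypothetical
helper named transfer: a signed value may be received into a narrower signed
variable only if it fits.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// transfer encodes src and decodes the result into dst (a pointer).
// It is a hypothetical helper for this illustration.
func transfer(src, dst interface{}) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(src); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(dst)
}

func main() {
	var small int8
	// 100 fits in an int8, so the decode succeeds.
	fmt.Println(transfer(int64(100), &small), small)
	// 300 does not fit in an int8, so the decode reports an error.
	fmt.Println(transfer(int64(300), &small))
}
```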
    69	
    70	Structs, arrays and slices are also supported.  Strings and arrays of bytes are
    71	supported with a special, efficient representation (see below).  When a slice is
    72	decoded, if the existing slice has capacity the slice will be extended in place;
    73	if not, a new array is allocated.  Regardless, the length of the resulting slice
    74	reports the number of elements decoded.
    75	
    76	Functions and channels cannot be sent in a gob.  Attempting
    77	to encode a value that contains one will fail.
    78	
    79	The rest of this comment documents the encoding, details that are not important
    80	for most users.  Details are presented bottom-up.
    81	
    82	An unsigned integer is sent one of two ways.  If it is less than 128, it is sent
    83	as a byte with that value.  Otherwise it is sent as a minimal-length big-endian
    84	(high byte first) byte stream holding the value, preceded by one byte holding the
    85	byte count, negated.  Thus 0 is transmitted as (00), 7 is transmitted as (07) and
    86	256 is transmitted as (FE 01 00).
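
The rule above can be sketched as follows.  This is an illustration written
from the description, not the package's own code; encodeUnsigned here is a
stand-alone helper with an assumed uint64 parameter.

```go
package main

import "fmt"

// encodeUnsigned returns the wire form of u as described above:
// one byte for values below 128, otherwise a negated byte count
// followed by the minimal big-endian bytes of the value.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)}
	}
	var buf [8]byte
	n := 0
	for v := u; v > 0; v >>= 8 {
		n++
		buf[8-n] = byte(v) // fill from the low byte backwards
	}
	out := []byte{byte(-n)} // e.g. -2 becomes FE
	return append(out, buf[8-n:]...)
}

func main() {
	for _, u := range []uint64{0, 7, 256} {
		fmt.Printf("%d: % x\n", u, encodeUnsigned(u))
	}
}
```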
    87	
    88	A boolean is encoded within an unsigned integer: 0 for false, 1 for true.
    89	
    90	A signed integer, i, is encoded within an unsigned integer, u.  Within u, bits 1
    91	upward contain the value; bit 0 says whether they should be complemented upon
    92	receipt.  The encode algorithm looks like this:
    93	
    94		var u uint
    95		if i < 0 {
    96			u = (^uint(i) << 1) | 1	// complement i, bit 0 is 1
    97		} else {
    98			u = (uint(i) << 1)	// do not complement i, bit 0 is 0
    99		}
   100		encodeUnsigned(u)
   101	
   102	The low bit is therefore analogous to a sign bit, but making it the complement bit
   103	instead guarantees that the largest negative integer is not a special case.  For
   104	example, -129=^128=(^256>>1) encodes as (FE 01 01).
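
A self-contained sketch of the signed encoding, written from the description
above.  The helper names and the 64-bit parameter types are assumptions made
for this example.

```go
package main

import "fmt"

// encodeUnsigned emits the unsigned wire form described earlier.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)}
	}
	var buf [8]byte
	n := 0
	for v := u; v > 0; v >>= 8 {
		n++
		buf[8-n] = byte(v)
	}
	return append([]byte{byte(-n)}, buf[8-n:]...)
}

// encodeSigned maps i onto an unsigned value whose bit 0 is the
// complement flag, then emits it as an unsigned integer.
func encodeSigned(i int64) []byte {
	var u uint64
	if i < 0 {
		u = (^uint64(i) << 1) | 1 // complement i, bit 0 is 1
	} else {
		u = uint64(i) << 1 // do not complement i, bit 0 is 0
	}
	return encodeUnsigned(u)
}

func main() {
	for _, i := range []int64{3, -1, -129} {
		fmt.Printf("%d: % x\n", i, encodeSigned(i))
	}
}
```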
   105	
   106	Floating-point numbers are always sent as a representation of a float64 value.
   107	That value is converted to a uint64 using math.Float64bits.  The uint64 is then
   108	byte-reversed and sent as a regular unsigned integer.  The byte-reversal means the
   109	exponent and high-precision part of the mantissa go first.  Since the low bits are
   110	often zero, this can save encoding bytes.  For instance, 17.0 is encoded in only
   111	three bytes (FE 31 40).
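
The float encoding can be sketched the same way.  This is an illustration
based on the description above; encodeFloat and encodeUnsigned are helper
names chosen for this example, and bits.ReverseBytes64 is used here to
perform the byte reversal.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// encodeUnsigned emits the unsigned wire form described earlier.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)}
	}
	var buf [8]byte
	n := 0
	for v := u; v > 0; v >>= 8 {
		n++
		buf[8-n] = byte(v)
	}
	return append([]byte{byte(-n)}, buf[8-n:]...)
}

// encodeFloat byte-reverses the IEEE-754 bits of f so that trailing
// zero bytes of the mantissa become leading zeros, which the
// unsigned encoding then drops.
func encodeFloat(f float64) []byte {
	return encodeUnsigned(bits.ReverseBytes64(math.Float64bits(f)))
}

func main() {
	fmt.Printf("% x\n", encodeFloat(17.0))
}
```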
   112	
   113	Strings and slices of bytes are sent as an unsigned count followed by that many
   114	uninterpreted bytes of the value.
   115	
   116	All other slices and arrays are sent as an unsigned count followed by that many
   117	elements using the standard gob encoding for their type, recursively.
   118	
   119	Maps are sent as an unsigned count followed by that many key, element
   120	pairs. Empty but non-nil maps are sent, so if the sender has allocated
   121	a map, the receiver will allocate a map even if no elements are
   122	transmitted.
   123	
   124	Structs are sent as a sequence of (field number, field value) pairs.  The field
   125	value is sent using the standard gob encoding for its type, recursively.  If a
   126	field has the zero value for its type, it is omitted from the transmission.  The
   127	field number is defined by the type of the encoded struct: the first field of the
   128	encoded type is field 0, the second is field 1, etc.  When encoding a value, the
   129	field numbers are delta encoded for efficiency and the fields are always sent in
   130	order of increasing field number; the deltas are therefore unsigned.  The
   131	initialization for the delta encoding sets the field number to -1, so an unsigned
   132	integer field 0 with value 7 is transmitted as unsigned delta = 1, unsigned value
   133	= 7 or (01 07).  Finally, after all the fields have been sent a terminating mark
   134	denotes the end of the struct.  That mark is a delta=0 value, which has
   135	representation (00).
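
The delta encoding of field numbers can be sketched for the simple case of a
struct whose fields are all signed integers.  This is an illustration under
that assumption, not the package's implementation; encodeIntStruct and its
helpers are hypothetical names.

```go
package main

import "fmt"

// encodeUnsigned emits the unsigned wire form described earlier.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)}
	}
	var buf [8]byte
	n := 0
	for v := u; v > 0; v >>= 8 {
		n++
		buf[8-n] = byte(v)
	}
	return append([]byte{byte(-n)}, buf[8-n:]...)
}

// encodeSigned emits the signed wire form described earlier.
func encodeSigned(i int64) []byte {
	if i < 0 {
		return encodeUnsigned((^uint64(i) << 1) | 1)
	}
	return encodeUnsigned(uint64(i) << 1)
}

// encodeIntStruct treats fields[k] as field number k of a struct.
// Zero-valued fields are omitted and each field number is sent as a
// delta from the previously sent one, starting from -1.
func encodeIntStruct(fields []int64) []byte {
	out := []byte{}
	last := -1
	for num, v := range fields {
		if v == 0 {
			continue // zero values are not transmitted
		}
		out = append(out, encodeUnsigned(uint64(num-last))...)
		out = append(out, encodeSigned(v)...)
		last = num
	}
	return append(out, 0) // delta 0 terminates the struct
}

func main() {
	fmt.Printf("% x\n", encodeIntStruct([]int64{22, 33}))
	fmt.Printf("% x\n", encodeIntStruct([]int64{0, 33}))
}
```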
   136	
   137	Interface types are not checked for compatibility; all interface types are
   138	treated, for transmission, as members of a single "interface" type, analogous to
   139	int or []byte - in effect they're all treated as interface{}.  Interface values
   140	are transmitted as a string identifying the concrete type being sent (a name
   141	that must be pre-defined by calling Register), followed by a byte count of the
   142	length of the following data (so the value can be skipped if it cannot be
   143	stored), followed by the usual encoding of the concrete (dynamic) value stored in
   144	the interface value.  (A nil interface value is identified by the empty string
   145	and transmits no value.) Upon receipt, the decoder verifies that the unpacked
   146	concrete item satisfies the interface of the receiving variable.
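
A minimal sketch of sending an interface value, assuming a hypothetical
Shape interface with one concrete type, Square.  The concrete type is
registered before use, and a pointer to the interface variable is passed so
that the value travels as an interface.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Shape and Square are hypothetical types for this illustration.
type Shape interface{ Area() float64 }

type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

// sendShape transmits in as an interface value and returns the copy
// that arrives at the receiving side.
func sendShape(in Shape) (Shape, error) {
	gob.Register(Square{}) // make the concrete type name known
	var buf bytes.Buffer
	// Encode a pointer to the interface so the interface itself,
	// not just the concrete value, is what gets sent.
	if err := gob.NewEncoder(&buf).Encode(&in); err != nil {
		return nil, err
	}
	var out Shape
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := sendShape(Square{3})
	fmt.Println(out, err)
}
```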
   147	
   148	The representation of types is described below.  When a type is defined on a given
   149	connection between an Encoder and Decoder, it is assigned a signed integer type
   150	id.  When Encoder.Encode(v) is called, it makes sure there is an id assigned for
   151	the type of v and all its elements and then it sends the pair (typeid, encoded-v)
   152	where typeid is the type id of the encoded type of v and encoded-v is the gob
   153	encoding of the value v.
   154	
   155	To define a type, the encoder chooses an unused, positive type id and sends the
   156	pair (-type id, encoded-type) where encoded-type is the gob encoding of a wireType
   157	description, constructed from these types:
   158	
   159		type wireType struct {
   160			ArrayT  *arrayType
   161			SliceT  *sliceType
   162			StructT *structType
   163			MapT    *mapType
   164		}
   165		type arrayType struct {
   166			CommonType
   167			Elem typeId
   168			Len  int
   169		}
   170		type CommonType struct {
   171			Name string // the name of the struct type
   172			Id  int    // the id of the type, repeated so it's inside the type
   173		}
   174		type sliceType struct {
   175			CommonType
   176			Elem typeId
   177		}
   178		type structType struct {
   179			CommonType
   180			Field []*fieldType // the fields of the struct.
   181		}
   182		type fieldType struct {
   183			Name string // the name of the field.
   184			Id   int    // the type id of the field, which must be already defined
   185		}
   186		type mapType struct {
   187			CommonType
   188			Key  typeId
   189			Elem typeId
   190		}
   191	
   192	If there are nested type ids, the types for all inner type ids must be defined
   193	before the top-level type id is used to describe an encoded-v.
   194	
   195	For simplicity in setup, the connection is defined to understand these types a
   196	priori, as well as the basic gob types int, uint, etc.  Their ids are:
   197	
   198		bool        1
   199		int         2
   200		uint        3
   201		float       4
   202		[]byte      5
   203		string      6
   204		complex     7
   205		interface   8
   206		// gap for reserved ids.
   207		WireType    16
   208		ArrayType   17
   209		CommonType  18
   210		SliceType   19
   211		StructType  20
   212		FieldType   21
   213		// 22 is slice of fieldType.
   214		MapType     23
   215	
   216	Finally, each message created by a call to Encode is preceded by an encoded
   217	unsigned integer count of the number of bytes remaining in the message.  After
   218	the initial type name, interface values are wrapped the same way; in effect, the
   219	interface value acts like a recursive invocation of Encode.
   220	
   221	In summary, a gob stream looks like
   222	
   223		(byteCount (-type id, encoding of a wireType)* (type id, encoding of a value))*
   224	
   225	where * signifies zero or more repetitions and the type id of a value must
   226	be predefined or be defined before the value in the stream.
   227	
   228	See "Gobs of data" for a design discussion of the gob wire format:
   229	http://golang.org/doc/articles/gobs_of_data.html
   230	*/
   231	package gob
   232	
   233	/*
   234	Grammar:
   235	
   236	Tokens starting with a lower case letter are terminals; int(n)
   237	and uint(n) represent the signed/unsigned encodings of the value n.
   238	
   239	GobStream:
   240		DelimitedMessage*
   241	DelimitedMessage:
   242		uint(lengthOfMessage) Message
   243	Message:
   244		TypeSequence TypedValue
   245	TypeSequence:
   246		(TypeDefinition DelimitedTypeDefinition*)?
   247	DelimitedTypeDefinition:
   248		uint(lengthOfTypeDefinition) TypeDefinition
   249	TypedValue:
   250		int(typeId) Value
   251	TypeDefinition:
   252		int(-typeId) encodingOfWireType
   253	Value:
   254		SingletonValue | StructValue
   255	SingletonValue:
   256		uint(0) FieldValue
   257	FieldValue:
   258		builtinValue | ArrayValue | MapValue | SliceValue | StructValue | InterfaceValue
   259	InterfaceValue:
   260		NilInterfaceValue | NonNilInterfaceValue
   261	NilInterfaceValue:
   262		uint(0)
   263	NonNilInterfaceValue:
   264		ConcreteTypeName TypeSequence InterfaceContents
   265	ConcreteTypeName:
   266		uint(lengthOfName) [already read=n] name
   267	InterfaceContents:
   268		int(concreteTypeId) DelimitedValue
   269	DelimitedValue:
   270		uint(length) Value
   271	ArrayValue:
   272		uint(n) FieldValue*n [n elements]
   273	MapValue:
   274		uint(n) (FieldValue FieldValue)*n  [n (key, value) pairs]
   275	SliceValue:
   276		uint(n) FieldValue*n [n elements]
   277	StructValue:
   278		(uint(fieldDelta) FieldValue)*
   279	*/
   280	
   281	/*
   282	For implementers and the curious, here is an encoded example.  Given
   283		type Point struct {X, Y int}
   284	and the value
   285		p := Point{22, 33}
   286	the bytes transmitted that encode p will be:
   287		1f ff 81 03 01 01 05 50 6f 69 6e 74 01 ff 82 00
   288		01 02 01 01 58 01 04 00 01 01 59 01 04 00 00 00
   289		07 ff 82 01 2c 01 42 00
   290	They are determined as follows.
   291	
   292	Since this is the first transmission of type Point, the type descriptor
   293	for Point itself must be sent before the value.  This is the first type
   294	we've sent on this Encoder, so it has type id 65 (0 through 64 are
   295	reserved).
   296	
   297		1f	// This item (a type descriptor) is 31 bytes long.
   298		ff 81	// The negative of the id for the type we're defining, -65.
   299			// This is one byte (indicated by FF = -1) followed by
   300			// ^-65<<1 | 1.  The low 1 bit signals to complement the
   301			// rest upon receipt.
   302	
   303		// Now we send a type descriptor, which is itself a struct (wireType).
   304		// The type of wireType itself is known (it's built in, as is the type of
   305		// all its components), so we just need to send a *value* of type wireType
   306		// that represents type "Point".
   307		// Here starts the encoding of that value.
   308		// Set the field number implicitly to -1; this is done at the beginning
   309		// of every struct, including nested structs.
   310		03	// Add 3 to field number; now 2 (wireType.structType; this is a struct).
   311			// structType starts with an embedded CommonType, which appears
   312			// as a regular structure here too.
   313		01	// add 1 to field number (now 0); start of embedded CommonType.
   314		01	// add 1 to field number (now 0, the name of the type)
   315		05	// string is (unsigned) 5 bytes long
   316		50 6f 69 6e 74	// wireType.structType.CommonType.name = "Point"
   317		01	// add 1 to field number (now 1, the id of the type)
   318		ff 82	// wireType.structType.CommonType._id = 65
   319		00	// end of embedded wireType.structType.CommonType struct
   320		01	// add 1 to field number (now 1, the field array in wireType.structType)
   321		02	// There are two fields in the type (len(structType.field))
   322		01	// Start of first field structure; add 1 to get field number 0: field[0].name
   323		01	// 1 byte
   324		58	// structType.field[0].name = "X"
   325		01	// Add 1 to get field number 1: field[0].id
   326		04	// structType.field[0].typeId is 2 (signed int).
   327		00	// End of structType.field[0]; start structType.field[1]; set field number to -1.
   328		01	// Add 1 to get field number 0: field[1].name
   329		01	// 1 byte
   330		59	// structType.field[1].name = "Y"
   331		01	// Add 1 to get field number 1: field[1].id
   332		04	// structType.field[1].typeId is 2 (signed int).
   333		00	// End of structType.field[1]; end of structType.field.
   334		00	// end of wireType.structType structure
   335		00	// end of wireType structure
   336	
   337	Now we can send the Point value.  Again the field number resets to -1:
   338	
   339		07	// this value is 7 bytes long
   340		ff 82	// the type number, 65 (1 byte (-FF) followed by 65<<1)
   341		01	// add one to field number, yielding field 0
   342		2c	// encoding of signed "22" (0x2c = 44 = 22<<1); Point.x = 22
   343		01	// add one to field number, yielding field 1
   344		42	// encoding of signed "33" (0x42 = 66 = 33<<1); Point.y = 33
   345		00	// end of structure
   346	
   347	The type encoding is long and fairly intricate but we send it only once.
   348	If p is transmitted a second time, the type is already known so the
   349	output will be just:
   350	
   351		07 ff 82 01 2c 01 42 00
   352	
   353	A single non-struct value at top level is transmitted like a field with
   354	delta tag 0.  For instance, a signed integer with value 3 presented as
   355	the argument to Encode will emit:
   356	
   357		03 04 00 06
   358	
   359	Which represents:
   360	
   361		03	// this value is 3 bytes long
   362		04	// the type number, 2, represents an integer
   363		00	// tag delta 0
   364		06	// value 3
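
The bytes above can be checked with a short sketch.  The helper encodeInt is
a name chosen for this example; it encodes a single int into an in-memory
buffer and returns what was written.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// encodeInt returns the bytes a fresh Encoder emits for the int i.
func encodeInt(i int) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(i); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	b, err := encodeInt(3)
	fmt.Printf("% x %v\n", b, err)
}
```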
   365	
   366	*/