Source file src/pkg/encoding/gob/doc.go
// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

/*
Package gob manages streams of gobs - binary values exchanged between an
Encoder (transmitter) and a Decoder (receiver). A typical use is transporting
arguments and results of remote procedure calls (RPCs) such as those provided by
package "net/rpc".

A stream of gobs is self-describing. Each data item in the stream is preceded by
a specification of its type, expressed in terms of a small set of predefined
types. Pointers are not transmitted, but the things they point to are
transmitted; that is, the values are flattened. Recursive types work fine, but
recursive values (data with cycles) are problematic. This may change.

To use gobs, create an Encoder and present it with a series of data items as
values or addresses that can be dereferenced to values. The Encoder makes sure
all type information is sent before it is needed. At the receive side, a
Decoder retrieves values from the encoded stream and unpacks them into local
variables.

The source and destination values/types need not correspond exactly. For structs,
fields (identified by name) that are in the source but absent from the receiving
variable will be ignored. Fields that are in the receiving variable but missing
from the transmitted type or value will be ignored in the destination. If a field
with the same name is present in both, their types must be compatible. Both the
receiver and transmitter will do all necessary indirection and dereferencing to
convert between gobs and actual Go values.
For instance, a gob type that is schematically,

	struct { A, B int }

can be sent from or received into any of these Go types:

	struct { A, B int }      // the same
	*struct { A, B int }     // extra indirection of the struct
	struct { *A, **B int }   // extra indirection of the fields
	struct { A, B int64 }    // different concrete value type; see below

It may also be received into any of these:

	struct { A, B int }      // the same
	struct { B, A int }      // ordering doesn't matter; matching is by name
	struct { A, B, C int }   // extra field (C) ignored
	struct { B int }         // missing field (A) ignored; data will be dropped
	struct { B, C int }      // missing field (A) ignored; extra field (C) ignored.

Attempting to receive into these types will draw a decode error:

	struct { A int; B uint }      // change of signedness for B
	struct { A int; B float64 }   // change of type for B
	struct { }                    // no field names in common
	struct { C, D int }           // no field names in common

Integers are transmitted two ways: arbitrary precision signed integers or
arbitrary precision unsigned integers. There is no int8, int16 etc.
discrimination in the gob format; there are only signed and unsigned integers. As
described below, the transmitter sends the value in a variable-length encoding;
the receiver accepts the value and stores it in the destination variable.
Floating-point numbers are always sent using IEEE-754 64-bit precision (see
below).

Signed integers may be received into any signed integer variable: int, int16, etc.;
unsigned integers may be received into any unsigned integer variable; and floating
point values may be received into any floating point variable. However,
the destination variable must be able to represent the value or the decode
operation will fail.

Structs, arrays and slices are also supported. Strings and arrays of bytes are
supported with a special, efficient representation (see below).
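
The compatibility rules above can be tried out with a short program. This is
only a sketch: the types sent and received and the helpers roundTrip and
decodeInt8 are hypothetical names chosen for illustration, and only the
Encoder and Decoder calls come from this package.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// sent is the transmitted type (hypothetical, for illustration).
type sent struct{ A, B int }

// received is a differently shaped destination type (also hypothetical).
type received struct {
	B int64 // matched by name; a different signed integer type is allowed
	C int   // not present in the stream, so it is left untouched
}

// roundTrip encodes a sent value and decodes it into a received value.
func roundTrip(in sent) (received, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return received{}, err
	}
	var out received
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

// decodeInt8 encodes the int n and decodes it into an int8. Decoding is
// expected to fail when n cannot be represented in an int8.
func decodeInt8(n int) (int8, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(n); err != nil {
		return 0, err
	}
	var out int8
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTrip(sent{A: 1, B: 2})
	fmt.Println(out.B, out.C, err)

	_, err = decodeInt8(300)
	fmt.Println("decoding 300 into int8 failed:", err != nil)
}
```

Field A is dropped because the destination has no field of that name, and the
out-of-range integer is reported as a decode error rather than truncated.
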
When a slice is decoded, if the existing slice has capacity the slice will be
extended in place; if not, a new array is allocated. Regardless, the length of
the resulting slice reports the number of elements decoded.

Functions and channels cannot be sent in a gob. Attempting
to encode a value that contains one will fail.

The rest of this comment documents the encoding, details that are not important
for most users. Details are presented bottom-up.

An unsigned integer is sent one of two ways. If it is less than 128, it is sent
as a byte with that value. Otherwise it is sent as a minimal-length big-endian
(high byte first) byte stream holding the value, preceded by one byte holding the
byte count, negated. Thus 0 is transmitted as (00), 7 is transmitted as (07) and
256 is transmitted as (FE 01 00).

A boolean is encoded within an unsigned integer: 0 for false, 1 for true.

A signed integer, i, is encoded within an unsigned integer, u. Within u, bits 1
upward contain the value; bit 0 says whether they should be complemented upon
receipt. The encode algorithm looks like this:

	var u uint
	if i < 0 {
		u = uint(^i<<1) | 1 // complement i, bit 0 is 1
	} else {
		u = uint(i << 1) // do not complement i, bit 0 is 0
	}
	encodeUnsigned(u)

The low bit is therefore analogous to a sign bit, but making it the complement bit
instead guarantees that the largest negative integer is not a special case. For
example, -129=^128=(^256>>1) encodes as (FE 01 01).

Floating-point numbers are always sent as a representation of a float64 value.
That value is converted to a uint64 using math.Float64bits. The uint64 is then
byte-reversed and sent as a regular unsigned integer. The byte-reversal means the
exponent and high-precision part of the mantissa go first. Since the low bits are
often zero, this can save encoding bytes.
For instance, 17.0 is encoded in only three bytes (FE 31 40).

Strings and slices of bytes are sent as an unsigned count followed by that many
uninterpreted bytes of the value.

All other slices and arrays are sent as an unsigned count followed by that many
elements using the standard gob encoding for their type, recursively.

Maps are sent as an unsigned count followed by that many key, element
pairs. Empty but non-nil maps are sent, so if the sender has allocated
a map, the receiver will allocate a map even if no elements are
transmitted.

Structs are sent as a sequence of (field number, field value) pairs. The field
value is sent using the standard gob encoding for its type, recursively. If a
field has the zero value for its type, it is omitted from the transmission. The
field number is defined by the type of the encoded struct: the first field of the
encoded type is field 0, the second is field 1, etc. When encoding a value, the
field numbers are delta encoded for efficiency and the fields are always sent in
order of increasing field number; the deltas are therefore unsigned. The
initialization for the delta encoding sets the field number to -1, so an unsigned
integer field 0 with value 7 is transmitted as unsigned delta = 1, unsigned value
= 7 or (01 07). Finally, after all the fields have been sent a terminating mark
denotes the end of the struct. That mark is a delta=0 value, which has
representation (00).

Interface types are not checked for compatibility; all interface types are
treated, for transmission, as members of a single "interface" type, analogous to
int or []byte - in effect they're all treated as interface{}.
Interface values are transmitted as a string identifying the concrete type being
sent (a name that must be pre-defined by calling Register), followed by a byte
count of the length of the following data (so the value can be skipped if it
cannot be stored), followed by the usual encoding of the concrete (dynamic) value
stored in the interface value. (A nil interface value is identified by the empty
string and transmits no value.) Upon receipt, the decoder verifies that the
unpacked concrete item satisfies the interface of the receiving variable.

The representation of types is described below. When a type is defined on a given
connection between an Encoder and Decoder, it is assigned a signed integer type
id. When Encoder.Encode(v) is called, it makes sure there is an id assigned for
the type of v and all its elements and then it sends the pair (typeid, encoded-v)
where typeid is the type id of the encoded type of v and encoded-v is the gob
encoding of the value v.

To define a type, the encoder chooses an unused, positive type id and sends the
pair (-type id, encoded-type) where encoded-type is the gob encoding of a wireType
description, constructed from these types:

	type wireType struct {
		ArrayT  *ArrayType
		SliceT  *SliceType
		StructT *StructType
		MapT    *MapType
	}
	type arrayType struct {
		CommonType
		Elem typeId
		Len  int
	}
	type CommonType struct {
		Name string // the name of the struct type
		Id   int    // the id of the type, repeated so it's inside the type
	}
	type sliceType struct {
		CommonType
		Elem typeId
	}
	type structType struct {
		CommonType
		Field []*fieldType // the fields of the struct.
	}
	type fieldType struct {
		Name string // the name of the field.
		Id   int    // the type id of the field, which must be already defined
	}
	type mapType struct {
		CommonType
		Key  typeId
		Elem typeId
	}

If there are nested type ids, the types for all inner type ids must be defined
before the top-level type id is used to describe an encoded-v.

For simplicity in setup, the connection is defined to understand these types a
priori, as well as the basic gob types int, uint, etc. Their ids are:

	bool        1
	int         2
	uint        3
	float       4
	[]byte      5
	string      6
	complex     7
	interface   8
	// gap for reserved ids.
	WireType    16
	ArrayType   17
	CommonType  18
	SliceType   19
	StructType  20
	FieldType   21
	// 22 is slice of fieldType.
	MapType     23

Finally, each message created by a call to Encode is preceded by an encoded
unsigned integer count of the number of bytes remaining in the message. After
the initial type name, interface values are wrapped the same way; in effect, the
interface value acts like a recursive invocation of Encode.

In summary, a gob stream looks like

	(byteCount (-type id, encoding of a wireType)* (type id, encoding of a value))*

where * signifies zero or more repetitions and the type id of a value must
be predefined or be defined before the value in the stream.

See "Gobs of data" for a design discussion of the gob wire format:
http://golang.org/doc/articles/gobs_of_data.html
*/
package gob

/*
Grammar:

Tokens starting with a lower case letter are terminals; int(n)
and uint(n) represent the signed/unsigned encodings of the value n.

GobStream:
	DelimitedMessage*
DelimitedMessage:
	uint(lengthOfMessage) Message
Message:
	TypeSequence TypedValue
TypeSequence:
	(TypeDefinition DelimitedTypeDefinition*)?
DelimitedTypeDefinition:
	uint(lengthOfTypeDefinition) TypeDefinition
TypedValue:
	int(typeId) Value
TypeDefinition:
	int(-typeId) encodingOfWireType
Value:
	SingletonValue | StructValue
SingletonValue:
	uint(0) FieldValue
FieldValue:
	builtinValue | ArrayValue | MapValue | SliceValue | StructValue | InterfaceValue
InterfaceValue:
	NilInterfaceValue | NonNilInterfaceValue
NilInterfaceValue:
	uint(0)
NonNilInterfaceValue:
	ConcreteTypeName TypeSequence InterfaceContents
ConcreteTypeName:
	uint(lengthOfName) [already read=n] name
InterfaceContents:
	int(concreteTypeId) DelimitedValue
DelimitedValue:
	uint(length) Value
ArrayValue:
	uint(n) FieldValue*n [n elements]
MapValue:
	uint(n) (FieldValue FieldValue)*n  [n (key, value) pairs]
SliceValue:
	uint(n) FieldValue*n [n elements]
StructValue:
	(uint(fieldDelta) FieldValue)*
*/

/*
For implementers and the curious, here is an encoded example. Given
	type Point struct {X, Y int}
and the value
	p := Point{22, 33}
the bytes transmitted that encode p will be:
	1f ff 81 03 01 01 05 50 6f 69 6e 74 01 ff 82 00
	01 02 01 01 58 01 04 00 01 01 59 01 04 00 00 00
	07 ff 82 01 2c 01 42 00
They are determined as follows.

Since this is the first transmission of type Point, the type descriptor
for Point itself must be sent before the value. This is the first type
we've sent on this Encoder, so it has type id 65 (0 through 64 are
reserved).

	1f      // This item (a type descriptor) is 31 bytes long.
	ff 81   // The negative of the id for the type we're defining, -65.
	        // This is one byte (indicated by FF = -1) followed by
	        // ^-65<<1 | 1. The low 1 bit signals to complement the
	        // rest upon receipt.

	// Now we send a type descriptor, which is itself a struct (wireType).
	// The type of wireType itself is known (it's built in, as is the type of
	// all its components), so we just need to send a *value* of type wireType
	// that represents type "Point".
	// Here starts the encoding of that value.
	// Set the field number implicitly to -1; this is done at the beginning
	// of every struct, including nested structs.
	03      // Add 3 to field number; now 2 (wireType.structType; this is a struct).
	        // structType starts with an embedded CommonType, which appears
	        // as a regular structure here too.
	01      // add 1 to field number (now 0); start of embedded CommonType.
	01      // add 1 to field number (now 0, the name of the type)
	05      // string is (unsigned) 5 bytes long
	50 6f 69 6e 74  // wireType.structType.CommonType.name = "Point"
	01      // add 1 to field number (now 1, the id of the type)
	ff 82   // wireType.structType.CommonType._id = 65
	00      // end of embedded wireType.structType.CommonType struct
	01      // add 1 to field number (now 1, the field array in wireType.structType)
	02      // There are two fields in the type (len(structType.field))
	01      // Start of first field structure; add 1 to get field number 0: field[0].name
	01      // 1 byte
	58      // structType.field[0].name = "X"
	01      // Add 1 to get field number 1: field[0].id
	04      // structType.field[0].typeId is 2 (signed int).
	00      // End of structType.field[0]; start structType.field[1]; set field number to -1.
	01      // Add 1 to get field number 0: field[1].name
	01      // 1 byte
	59      // structType.field[1].name = "Y"
	01      // Add 1 to get field number 1: field[1].id
	04      // structType.field[1].typeId is 2 (signed int).
	00      // End of structType.field[1]; end of structType.field.
	00      // end of wireType.structType structure
	00      // end of wireType structure

Now we can send the Point value.
Again the field number resets to -1:

	07      // this value is 7 bytes long
	ff 82   // the type number, 65 (1 byte (-FF) followed by 65<<1)
	01      // add one to field number, yielding field 0
	2c      // encoding of signed "22" (0x2c = 44 = 22<<1); Point.X = 22
	01      // add one to field number, yielding field 1
	42      // encoding of signed "33" (0x42 = 66 = 33<<1); Point.Y = 33
	00      // end of structure

The type encoding is long and fairly intricate but we send it only once.
If p is transmitted a second time, the type is already known so the
output will be just:

	07 ff 82 01 2c 01 42 00

A single non-struct value at top level is transmitted like a field with
delta tag 0. For instance, a signed integer with value 3 presented as
the argument to Encode will emit:

	03 04 00 06

Which represents:

	03      // this value is 3 bytes long
	04      // the type number, 2, represents an integer
	00      // tag delta 0
	06      // value 3

*/