
We need to clearly define how to serialize a JSON object in order to calculate its hash.

The serialization should produce the same byte output regardless of the architecture running the software. If
there are differences in the serialization, hash validations will fail even though the transaction is correct.

##### Example

```python
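# RethinkDB serializes without a space after the separators
a = r.expr({'a': 1}).to_json().run(b.connection)
# u'{"a":1}'

# Python's json module adds a space after ':' by default
# (named `c` so it does not shadow `b`, which holds the connection)
c = json.dumps({'a': 1})
# '{"a": 1}'

a == c
# False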
```

We should provide the serialization and deserialization so that the following is always true.

##### Example

```python
deserialize(serialize(data)) == data
# True
```
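
As a rough illustration of when this property holds, the sketch below uses hypothetical `serialize` and `deserialize` helpers built directly on the `json` module (the names and the sample data are assumptions, not the project's actual functions). It suggests the round trip is only lossless when the data is restricted to JSON-native types.

```python
import json

def serialize(data):
    # Hypothetical helper: Python object -> JSON text
    return json.dumps(data)

def deserialize(text):
    # Hypothetical helper: JSON text -> Python object
    return json.loads(text)

# Only JSON-native types: the round trip is lossless
data = {'id': 'abc', 'amount': 1, 'tags': ['x', 'y'], 'extra': None}
assert deserialize(serialize(data)) == data

# An integer key comes back as a string, so the round trip is not exact
assert deserialize(serialize({1: 'a'})) == {'1': 'a'}

# A tuple comes back as a list, so the round trip is not exact either
assert deserialize(serialize({'pair': (1, 2)})) == {'pair': [1, 2]}
```

If the property must always hold, one option is to require that the data contains only string keys, lists, strings, numbers, booleans and `None`.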

### Standard serialization for the bigchain

After looking at this further, I think that the Python `json` module is still the best bet because it
complies with the RFC. We can specify the encoding and the separators used, and force it to order by key, to
make sure that we obtain maximum interoperability.

```python
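import json

# On Python 3, json.dumps takes no `encoding` argument and returns a str;
# encode the result as utf-8 when bytes are needed.
json.dumps(data, skipkeys=False, ensure_ascii=False,
           separators=(',', ':'), sort_keys=True)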
```

- `skipkeys`: With `skipkeys` set to `False`, the serialization fails if a key is not of a basic type, instead of
silently skipping it. Basic non-string keys (numbers, booleans, `None`) are converted to strings, so every key in
the output is a string.
- `ensure_ascii`: The RFC recommends `utf-8` for maximum interoperability. By setting `ensure_ascii` to `False` we
allow unicode characters in the output instead of escaping them, and we encode the result as `utf-8`.
- `separators`: We need to define a standard separator to use in the serialization. If we did not do this, different
implementations could use different separators, resulting in a still valid transaction but with
a different hash, e.g. an extra whitespace introduced in the serialization would still produce a valid JSON object,
but the hash would be different.
- `sort_keys`: Ordering by key ensures that the same data always serializes to the same output, regardless of the
order in which the keys were inserted.
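
The effect of these settings can be sketched as follows. The `serialize` name and the sample dictionaries are assumptions made for illustration, and the sketch targets Python 3, where `json.dumps` has no `encoding` argument.

```python
import json

def serialize(data):
    # Hypothetical helper applying the settings discussed above
    return json.dumps(data, skipkeys=False, ensure_ascii=False,
                      separators=(',', ':'), sort_keys=True)

# Same content, different key insertion order (made-up data)
first = {'b': 'é', 'a': 1}
second = {'a': 1, 'b': 'é'}

# Default settings keep insertion order, so the two outputs differ
assert json.dumps(first) != json.dumps(second)

# Fixed separators, sorted keys and unescaped unicode give one canonical form
assert serialize(first) == serialize(second) == '{"a":1,"b":"é"}'

# skipkeys=False: a key of an unsupported type raises instead of being dropped
try:
    serialize({(1, 2): 'x'})
except TypeError:
    print('tuple keys are rejected')
```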

##### Example

Every time we need to perform an operation on the data, such as calculating the hash or signing/verifying the
transaction, we need to serialize the data using the previous criteria and then use the `byte` representation of the
serialized data (if we treat the data as bytes we eliminate possible encoding errors, e.g. with unicode characters).

```python
import hashlib

# `tx` is the transaction, a dictionary; `serialize` is assumed to return a str.
# `sk` and `vk` are the signing and verifying keys.

# calculate the hash of a transaction
tx_serialized = serialize(tx).encode('utf-8')
tx_hash = hashlib.sha3_256(tx_serialized).hexdigest()

# sign a transaction
tx_serialized = serialize(tx).encode('utf-8')
signature = sk.sign(tx_serialized)

# verify the signature
tx_serialized = serialize(tx).encode('utf-8')
vk.verify(signature, tx_serialized)
```
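
A self-contained sketch of the hashing step is shown below. The `serialize` and `hash_data` helpers and the transaction fields are hypothetical, and `hashlib.sha3_256` is assumed to be available (it is part of the standard library from Python 3.6). Signing is left out because it depends on the key library in use.

```python
import hashlib
import json

def serialize(data):
    # Hypothetical helper: canonical JSON text for a dictionary
    return json.dumps(data, skipkeys=False, ensure_ascii=False,
                      separators=(',', ':'), sort_keys=True)

def hash_data(data):
    # Hash the utf-8 bytes of the canonical serialization
    serialized = serialize(data).encode('utf-8')
    return hashlib.sha3_256(serialized).hexdigest()

# Made-up transactions with the same content in a different key order
tx = {'operation': 'TRANSFER', 'amount': 1, 'note': 'café'}
same_tx = {'note': 'café', 'amount': 1, 'operation': 'TRANSFER'}

# Identical content yields an identical hash
assert hash_data(tx) == hash_data(same_tx)

# Any change to the content yields a different hash
assert hash_data(tx) != hash_data({**tx, 'amount': 2})
```

Because the hash is computed over bytes rather than over the text, the result does not depend on how a given platform represents strings internally.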