Quality Coding

How to Safely Parse JSON into Immutable Models, All with TDD

How can we unit test JSON parsing, handling every possible error? Can we generate immutable models? And for Swift, how can we keep our Response Models free of optionals?

Of course, there are many JSON parsing libraries out there. Plug one in, define all fields as non-optional, and you’re good to go! …Until your app crashes, because something was different in the actual JSON data.

Unlikely? “The backend team would never do that to us”? I’ve had a released app crash because the backend folks changed one field from a string to an integer. I’ve seen app development and QA forced to pause because a commit assumed all fields were non-optional. (It crashed on the missing field, because Swift.)

So let’s look at a pattern that will help us:

  • Handle required types
  • Avoid optionals
  • Deliver immutable models

Even if you never plan to do your own parsing, we’ll learn things along the way about design and testing.

Problems with all-in-one unit tests

I often see unit tests like this for JSON parsing:

    func test_parseJSON() {
        let json = // Read JSON saved in external file
        let actual = Model.parse(json)
        let expected = Model(
            // Define each field, including each subcomponent,
            // to match the JSON
        )
        XCTAssertEqual(actual, expected)
    }

This may be a fine acceptance test. But we can do better for unit testing. Here are the things I find problematic:

  1. Large, often externalized input. I discuss this at the very beginning of my JSON parsing screencast.
  2. The need to define Equatables all the way down. I discuss this in “Let’s Stop Overusing Swift Equatables in Unit Tests.”
  3. Hunting for mismatches. When a mismatch occurs, the assertion fails. But the failure message will not tell us which field was wrong. We’ll have to go hunting, because the feedback is too coarse. A good unit test gives precise feedback.
  4. No way to grow a solution. One of the benefits of TDD is emergent design, where you gradually grow and refine a solution. An all-in-one test isn’t friendly toward emergent design. It all works, or it doesn’t.
  5. Difficult to test errors. We should test one error scenario at a time. But this is tricky with large input. We’ll end up copying and pasting the good input, then mutating one small piece of it. It’s the worst kind of duplication: “Everything’s the same, except for this one bit.”

These problems all point to the same thing: the test is too big. How can we break it into smaller parts?

Construction Builder to the rescue

“I want to combine individual parts into an immutable composite.” This leads us to the Construction Builder pattern. Despite the similar name, this is not the same as the GoF Builder pattern. Construction Builder comes from Domain Specific Languages by Martin Fowler. (Disclosure: This book link is an affiliate link. If you buy anything, I earn a commission, at no extra cost to you.)

Here’s the one-line description of the pattern:

Incrementally create an immutable object with a builder that stores constructor arguments in fields.

In a Construction Builder, each field can be set independently of the other fields.

This is the clue we need. We can first parse a flat structure into a set of fields. “Parse this field. Then that other field.” The tests can focus on one field at a time. And we’ll have tests to drive handling of unexpected JSON types.

Finally, everything is assembled into an immutable object. The build step can confirm that we have every field required.

A simple, flat JSON example

[Diagram: JSON parsing, Response Builder creates Response Model]

Here, ResponseModel has two immutable fields. We define a corresponding ResponseBuilder, but its fields are mutable and optional.

The parse method decides how to set each field. It will:

  • Set the field if the dictionary provides the desired type.
  • (Optional) Attempt to coerce the input into the desired type.

The build method creates the ResponseModel. It will:

  • Determine if it has all required fields. If not, either return nil or raise an exception.
  • Call the initializer. It can also do further conversion from the JSON-like fields to other data types.
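The build step described above can either return nil or raise an error. As a sketch of the second option, here is a throwing build that also reports which field was missing. The `BuildError.missingField` type is hypothetical (not from the original project), and the field types `Int` and `String` are assumptions, since the diagram only says there are two fields.

```swift
/// Hypothetical error naming the required field that was missing.
enum BuildError: Error, Equatable {
    case missingField(String)
}

/// Immutable model (field types are assumed for this sketch).
struct ResponseModel {
    let field1: Int
    let field2: String
}

/// Builder with mutable, optional fields, filled in during parsing.
struct ResponseBuilder {
    var field1: Int?
    var field2: String?

    // Throws instead of returning nil, so the caller learns which field failed.
    func build() throws -> ResponseModel {
        guard let field1 = field1 else {
            throw BuildError.missingField("field1")
        }
        guard let field2 = field2 else {
            throw BuildError.missingField("field2")
        }
        return ResponseModel(field1: field1, field2: field2)
    }
}
```

Returning nil keeps call sites simple. Throwing costs a little more ceremony, but it tells you exactly which required field was absent.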

Simple parse

Let’s say the first field is numeric. Here’s how I’d parse it in Swift:

struct ResponseBuilder {
    var field1: Int?

    // Capture field1 only if it is present and is an Int
    mutating func parse(dictionary dict: [String: Any]) {
        field1 = dict["field1"] as? Int
    }
}

It’s straightforward: if there is a dictionary entry and it can be downcast to an Int, store it. Otherwise, field1 is nil.

In Objective-C, things are more wordy. Having TDD’d the need for fields of different types, here’s part of what I refactored to:

static id requireType(id object, Class type) {
    if (![object isKindOfClass:type])
        return nil;
    return object;
}

NSNumber *QCORequireNumber(id object)
{
    return requireType(object, [NSNumber class]);
}

@implementation QCOResponseBuilder

- (void)parseDictionary:(NSDictionary *)dict
{
    self.field1 = QCORequireNumber(dict[@"field1"]);
}

@end

We can potentially extend QCORequireNumber to handle string-to-number conversion. Remember my story of the shipping app that started crashing due to a JSON type change? After that, I began adding forward compatibility with defensive string-to-number and number-to-string converters.
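The same defensive idea can be sketched in Swift. The `requireInt` and `requireString` helpers below are hypothetical (not from the original project); they assume we want to accept either a JSON number or a numeric string for an Int field, and either a string or an integer for a String field.

```swift
/// Hypothetical helper: returns an Int if `value` is an Int,
/// or a String that parses cleanly as an Int. Otherwise returns nil.
func requireInt(_ value: Any?) -> Int? {
    if let number = value as? Int {
        return number
    }
    if let string = value as? String {
        // Int(_:) returns nil for strings like "12abc" or ""
        return Int(string)
    }
    return nil
}

/// Hypothetical helper for the reverse direction: accept a String,
/// or convert an Int into its decimal String representation.
func requireString(_ value: Any?) -> String? {
    if let string = value as? String {
        return string
    }
    if let number = value as? Int {
        return String(number)
    }
    return nil
}
```

With these, a builder could write `field1 = requireInt(dict["field1"])` and accept either `42` or `"42"` from the backend.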

Simple build

Once parsing finishes, the build method can check to see if it has all required fields. In this Swift example, we return nil if anything is missing. But if all is well, we call the initializer:

    func build() -> ResponseModel? {
        guard let field1 = field1,
              let field2 = field2 else {
            return nil
        }
        return ResponseModel(field1: field1, field2: field2)
    }

Here’s the same thing in Objective-C, with a slight twist. “field1” is an NSNumber in the builder, so a nil value means “we didn’t get valid input for field1.” But in the response model, it’s easier to use C types instead of NSNumbers. build can handle that final conversion:

- (QCOResponseModel *)build
{
    if (!self.field1 || !self.field2)
        return nil;
    return [[QCOResponseModel alloc]
        initWithField1:self.field1.integerValue field2:self.field2];
}
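Putting the Swift pieces together, here is a self-contained sketch of the flat example. The second field’s name and type (`field2: String`) are assumptions for illustration, since the article only shows field1’s parsing.

```swift
/// Immutable model: every field is non-optional.
struct ResponseModel {
    let field1: Int
    let field2: String
}

/// Construction Builder: fields are mutable and optional,
/// so each one can be captured independently.
struct ResponseBuilder {
    var field1: Int?
    var field2: String?

    // Capture each field only if the JSON provides the expected type.
    mutating func parse(dictionary dict: [String: Any]) {
        field1 = dict["field1"] as? Int
        field2 = dict["field2"] as? String
    }

    // Create the immutable model only if every required field was captured.
    func build() -> ResponseModel? {
        guard let field1 = field1,
              let field2 = field2 else {
            return nil
        }
        return ResponseModel(field1: field1, field2: field2)
    }
}
```

For example, parsing `["field1": 7, "field2": "seven"]` builds a model, while `["field1": "7", "field2": "seven"]` leaves field1 nil, so build returns nil.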

What about nested JSON data?

So far, we’ve only looked at a flat example. How can we use Construction Builders to parse JSON for more complex data?

The MarvelBrowser TDD projects call the Marvel API for information about comic characters. Here’s an example response:

{
    "code": 200,
    "status": "Ok",
    "data": {
        "offset": 20,
        "total": 22,
        "results": [
            {
                "name": "Cyclops"
            },
            {
                "name": "Phoenix"
            }
        ]
    }
}

The “data” section represents paging. There are 22 total results, and we’re skipping over the first 20. There are other fields, but I’m only showing a subset. And for each character in the results array, I’m only showing the name.

For each nested data structure, we define nested builders:

[Diagram: JSON parsing, nested Response Builders create nested Response Models]

Let’s focus for now on the object at the top center. In CharactersSliceResponseBuilder, “offset” and “total” each have their own properties for incremental construction. But what is the type of “results”? It’s an array… of what?

It turns out to be an array of builders for the next level:

    let offset: Int?
    let total: Int?
    let results: [CharacterResponseBuilder]?

Tests to drive nested data

While writing the Swift version, I discovered that I could simplify the code by changing the parse methods to initializers.

Then, I needed tests to drive handling of nested data. First, I wanted to handle an array with a single item:

    func testInit_WithOneResult_ShouldCaptureOneCharacterInBuilder() {
        let dict: [String: Any] = ["results": [
                ["name": "ONE"],
        ]]

        let sut = CharactersSliceResponseBuilder(dictionary: dict)

        XCTAssertEqual(sut.results?.count, 1)
        XCTAssertEqual(sut.results?[0].name, "ONE")
    }

With TDD, the production code to pass this test has no loops. We tell ourselves, “Forget looping for now. Let’s solve this little bit first.”

The second test has two items. This drives us to iterate the array, not just take the first element.

    func testInit_WithTwoResults_ShouldCaptureTwoCharactersInBuilder() {
        let dict: [String: Any] = ["results": [
                ["name": "ONE"],
                ["name": "TWO"],
        ]]

        let sut = CharactersSliceResponseBuilder(dictionary: dict)

        XCTAssertEqual(sut.results?.count, 2)
        XCTAssertEqual(sut.results?[0].name, "ONE")
        XCTAssertEqual(sut.results?[1].name, "TWO")
    }

Once this second test passes, we can delete the first test. It was a “stepping stone” test, and is no longer needed.

If the input isn’t an array, the captured array should be nil:

    func testInit_WithNonArrayResult_ShouldCaptureNilInBuilder() {
        let dict: [String: Any] = ["results": ["name": "DUMMY"]]

        let sut = CharactersSliceResponseBuilder(dictionary: dict)

        XCTAssertNil(sut.results)
    }

Finally, what if the input is an array, but not everything inside is a dictionary? Let’s ignore non-dictionary entries:

    func testInit_WithTwoResultsButFirstNotDictionary_ShouldCaptureValidSecondResult() {
        let dict: [String: Any] = ["results": [
                "DUMMY",
                ["name": "TWO"],
        ]]

        let sut = CharactersSliceResponseBuilder(dictionary: dict)

        XCTAssertEqual(sut.results?.count, 1)
        XCTAssertEqual(sut.results?[0].name, "TWO")
    }

I encourage you to try TDDing the initializer. Do one test at a time, writing the simplest code that passes. Then see what you can refactor. Remember to keep everything green during refactoring!
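If you want something to compare against afterward, here is one possible end state for the initializer. It is a sketch, not the MarvelBrowser code: it assumes CharacterResponseBuilder also takes a dictionary in its initializer and captures only `name`.

```swift
/// Builder for one character; captures only "name" for this sketch.
struct CharacterResponseBuilder {
    let name: String?

    init(dictionary dict: [String: Any]) {
        name = dict["name"] as? String
    }
}

/// Builder for the "data" slice of the response.
struct CharactersSliceResponseBuilder {
    let offset: Int?
    let total: Int?
    let results: [CharacterResponseBuilder]?

    init(dictionary dict: [String: Any]) {
        offset = dict["offset"] as? Int
        total = dict["total"] as? Int
        // nil if "results" is missing or is not an array.
        // Entries inside the array that aren't dictionaries are skipped.
        results = (dict["results"] as? [Any])?.compactMap { $0 as? [String: Any] }
            .map { CharacterResponseBuilder(dictionary: $0) }
    }
}
```

The `as? [Any]` cast handles the non-array case, and `compactMap` quietly drops non-dictionary entries, which covers all four tests above without any explicit loop.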

Building a response model with nested data

What does the build method look like when we have an array of builders? Basically, we call build on each sub-builder, and put the results into an array:

    func build() -> CharactersSliceResponseModel? {
        guard let offset = offset,
              let total = total else {
            return nil
        }
        return CharactersSliceResponseModel(offset: offset, total: total, characters: buildCharacters())
    }

    private func buildCharacters() -> [CharacterResponseModel] {
        return results?.compactMap { $0.build() } ?? []
    }

I arrived at this production code through various tests that examine CharactersSliceResponseModel. These tests focus on the characters array. But the model’s initializer requires offset and total, so build returns nil unless both are present. The tests don’t care about those fields, yet each one must supply them. Specifying them over and over is a problem, because:

  • The repetition is noise. It hides the important input, namely the character names.
  • What happens when we add more required fields? I don’t want to have to change every single test.

To solve this, I extracted a test helper:

    private func addRequiredFields(to dict: [String: Any]) -> [String: Any] {
        var dictPlusData = dict
        dictPlusData["offset"] = 0
        dictPlusData["total"] = 0
        return dictPlusData
    }

Here’s an example of a test that uses it:

    func testBuild_WithRequiredFieldsButNoResults_ShouldHaveEmptyCharactersArray() {
        let dict = addRequiredFields(to: [:])
        let sut = CharactersSliceResponseBuilder(dictionary: dict)

        let response = sut.build()

        XCTAssertEqual(response?.characters.count, 0)
    }

I’d still have to change a few tests if I add a new required field. But thanks to the helper, not that many.
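To see the nested build end to end, here is a self-contained sketch that combines the initializer and build method with minimal models. It assumes CharacterResponseModel holds only a name, and that CharacterResponseBuilder’s build returns nil when the name is missing; neither detail is spelled out above.

```swift
struct CharacterResponseModel {
    let name: String
}

struct CharactersSliceResponseModel {
    let offset: Int
    let total: Int
    let characters: [CharacterResponseModel]
}

struct CharacterResponseBuilder {
    let name: String?

    init(dictionary dict: [String: Any]) {
        name = dict["name"] as? String
    }

    // Assumption: a character is only valid if it has a name.
    func build() -> CharacterResponseModel? {
        guard let name = name else { return nil }
        return CharacterResponseModel(name: name)
    }
}

struct CharactersSliceResponseBuilder {
    let offset: Int?
    let total: Int?
    let results: [CharacterResponseBuilder]?

    init(dictionary dict: [String: Any]) {
        offset = dict["offset"] as? Int
        total = dict["total"] as? Int
        results = (dict["results"] as? [Any])?.compactMap { $0 as? [String: Any] }
            .map { CharacterResponseBuilder(dictionary: $0) }
    }

    // offset and total are required; the characters array may be empty.
    func build() -> CharactersSliceResponseModel? {
        guard let offset = offset,
              let total = total else {
            return nil
        }
        return CharactersSliceResponseModel(offset: offset, total: total,
                                             characters: buildCharacters())
    }

    // Sub-builders that fail to build (for example, no name) are skipped.
    private func buildCharacters() -> [CharacterResponseModel] {
        return results?.compactMap { $0.build() } ?? []
    }
}
```

Because buildCharacters uses compactMap, a character that fails to build is dropped instead of failing the whole slice. For example, offset 20, total 22, and results containing Cyclops, an entry without a name, and Phoenix yield a model whose characters are Cyclops and Phoenix.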

Final JSON edge cases to consider

Here’s that diagram again of the relationship between the Construction Builders and the Response Models:

[Diagram: JSON parsing, nested Response Builders create nested Response Models]

I want to guard against a few more edge cases:

  • What if the FetchCharactersResponseBuilder receives no code?
  • What if it receives a success code of 200, but has no data?

These both indicate a problem in the data provided by the backend service. So what I’ll do is pretend I received a 500 code for “Internal Server Error”.

For Objective-C, the outermost response model contains the code, the status, and the slice. So the edge cases above can be handled by the FetchCharactersResponseBuilder.
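The rule itself is small. Here it is sketched in Swift as a hypothetical `effectiveCode(code:hasData:)` function; in the Objective-C project this logic lives inside FetchCharactersResponseBuilder, whose exact code isn’t shown here.

```swift
/// Hypothetical helper mirroring the rule above: a missing code,
/// or a 200 with no data, is treated as 500 Internal Server Error.
func effectiveCode(code: Int?, hasData: Bool) -> Int {
    guard let code = code else {
        return 500  // No code at all: the backend sent something unusable
    }
    if code == 200 && !hasData {
        return 500  // Claims success but carries no payload
    }
    return code
}
```

Any other code, such as a 404, passes through unchanged.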

But there’s one last edge case: What if the JSON is malformed and can’t be parsed?

The outermost Construction Builder is initialized from a dictionary. The JSON is already parsed before then. Let’s initiate the parsing from a standalone function:

QCOFetchCharactersResponseModel *QCOParseFetchCharactersJSONData(NSData *jsonData)
{
    id object = [NSJSONSerialization JSONObjectWithData:jsonData
                                                options:(NSJSONReadingOptions)0
                                                  error:NULL];
    QCOFetchCharactersResponseBuilder *builder = [[QCOFetchCharactersResponseBuilder alloc]
            initWithDictionary:QCORequireDictionary(object)];
    return [builder build];
}

If the JSON parsing fails, object will be nil. But a builder will still be created. When we call build on it, no code has been set, because there was nothing to parse. The end result, validated by a test, is that we’ll get a 500 error code:

- (void)testParse_WithMalformedJSON_ShouldReturnCode500
{
    NSString *json = @"{\"cod";  // Truncated JSON: deliberately malformed
    NSData *jsonData = [json dataUsingEncoding:NSUTF8StringEncoding];

    QCOFetchCharactersResponseModel *response = QCOParseFetchCharactersJSONData(jsonData);

    assertThat(@(response.code), is(@500));
}
@end

But it’s best to package the response differently in Swift. We don’t want downstream code to check whether the code is 200 or not. We simply want success or failure. (And for a successful response, we really don’t care about the status message.) So the Swift version of FetchCharactersResponseModel is:

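// Result is assumed to be a success/failure enum whose failure case carries a
// message, as .failure("Bad data") below implies (not Swift 5's Result<Success, Failure>)
typealias FetchCharactersResponseModel = Result<CharactersSliceResponseModel>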

The FetchCharactersResponseBuilder handles one edge case. When there is no data, we’ll return a “Bad data” error:

    func build() -> FetchCharactersResponseModel {
        return data?.build()
                    .map { .success($0) } ?? .failure("Bad data")
    }

It presumes that the code has already been confirmed to be 200. So the standalone function takes care of:

  • JSON parsing,
  • checking that the result is a dictionary, and
  • looking for code and status.
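Here is a sketch of a Swift standalone function covering those three responsibilities. Several pieces are assumptions: it uses Swift’s built-in Result with a hypothetical ResponseError instead of a plain String failure, it stubs the slice down to a single total field, and it reports the backend’s status message for non-200 codes, which the article doesn’t specify. Splitting it into a Data-level function and an object-level function lets most cases be tested with plain dictionary literals.

```swift
import Foundation

/// Assumed error type: wraps a message, since Swift's Result needs an Error.
struct ResponseError: Error, Equatable {
    let message: String
}

/// Stand-in slice model and builder, reduced to one field for brevity.
struct CharactersSliceResponseModel: Equatable {
    let total: Int
}

struct CharactersSliceResponseBuilder {
    let total: Int?

    init(dictionary dict: [String: Any]) {
        total = dict["total"] as? Int
    }

    func build() -> CharactersSliceResponseModel? {
        return total.map { CharactersSliceResponseModel(total: $0) }
    }
}

typealias FetchCharactersResponseModel = Result<CharactersSliceResponseModel, ResponseError>

struct FetchCharactersResponseBuilder {
    let data: CharactersSliceResponseBuilder?

    init(dictionary dict: [String: Any]) {
        data = (dict["data"] as? [String: Any]).map(CharactersSliceResponseBuilder.init(dictionary:))
    }

    // Presumes the caller has already confirmed that the code is 200.
    func build() -> FetchCharactersResponseModel {
        guard let slice = data?.build() else {
            return .failure(ResponseError(message: "Bad data"))
        }
        return .success(slice)
    }
}

/// Hypothetical entry point: turns raw JSON data into success or failure.
func parseFetchCharacters(jsonData: Data) -> FetchCharactersResponseModel {
    // Malformed JSON yields nil here instead of throwing.
    let object = try? JSONSerialization.jsonObject(with: jsonData)
    return parseFetchCharacters(object: object)
}

/// Checks for a dictionary with a code, handles non-200 codes,
/// then delegates to the builder.
func parseFetchCharacters(object: Any?) -> FetchCharactersResponseModel {
    guard let dict = object as? [String: Any],
          let code = dict["code"] as? Int else {
        return .failure(ResponseError(message: "Bad data"))
    }
    guard code == 200 else {
        // Assumption: surface the backend's status message for non-200 codes.
        let status = (dict["status"] as? String) ?? "Error \(code)"
        return .failure(ResponseError(message: status))
    }
    return FetchCharactersResponseBuilder(dictionary: dict).build()
}
```

Malformed JSON, a non-dictionary top level, and a missing code all collapse into the same “Bad data” failure, so downstream code only has to distinguish success from failure.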

Conclusion

By using the Construction Builder pattern, we separated parsing (incremental construction) from building (an immutable Response Model).

Even if you avoid all this by using a JSON library, I hope this exercise has illustrated some TDD principles:

  • If something is too big to handle with small tests, break the problem into smaller parts. Write tests against the intermediate results.
  • There is no TDD rule saying, “Don’t do any design.” We need just enough design to proceed.
  • If there’s repeated test input that is necessary but irrelevant, look for a way to isolate it.

Have any questions or feedback? Please share your thoughts in the comments below!

About the Author Jon Reid

Jon is a coach and consultant on iOS Clean Code (Test Driven Development, unit testing, refactoring, design). He's been practicing TDD since 2001. You can learn more about his background, or see what services he can bring to your organization.
