Data Separation

As mentioned on the Dark Launch page, there are different ways an application can collect and store data. This guide sticks to the two that are used in the prodtest-demo project: a database and a message queue.

Database

When your application stores data in a database, you want to make sure that the data generated by Vnext is never shown to the user. You therefore need a way to separate that data from production data. There are various ways to do this. This guide describes three of them: a separate database, a separate table, and extending the data model.

Note: For a quick overview of the options and their pros and cons, see the Overview at the end of this section.

Separate database

One option is to use a separate database for Vnext. Vnext sets up a connection with the separate database and stores all generated data there. This raises a question: what should you do if your application also needs data from the production database to function?

There are two basic options here: retrieving data directly from the production database, or creating and maintaining a copy of that database. Both options are far from perfect. With the first, data is retrieved twice (once for Vnext and once for Vlatest), which can double the load on the production database, so you may have to double its resources as well. The second option might seem attractive at first, but you now have to keep two databases up to date at all times. If the Vnext database falls behind Vlatest, you will most likely get errors on reads or writes because data is missing.

This does not mean, however, that this option will never be a valid choice. It will always depend on your application.
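
As a rough illustration of the separate-database option, the sketch below picks a connection based on the service version. It is a minimal sketch under assumptions: the `DATABASE_URLS` mapping, the `connect_for` helper and the `orders` table are hypothetical names, and two in-memory SQLite databases stand in for two real database servers.

```python
import sqlite3

# Hypothetical mapping from service version to database location.
# Each ":memory:" connection is its own independent SQLite database,
# standing in here for two separate database servers.
DATABASE_URLS = {
    "Vlatest": ":memory:",
    "Vnext": ":memory:",
}


def connect_for(version: str) -> sqlite3.Connection:
    """Open the database that belongs to the given service version."""
    conn = sqlite3.connect(DATABASE_URLS[version])
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, item TEXT)"
    )
    return conn


prod_db = connect_for("Vlatest")
dark_db = connect_for("Vnext")

# Vnext writes only to its own database.
dark_db.execute("INSERT INTO orders (item) VALUES (?)", ("dark-launch order",))

prod_count = prod_db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
dark_count = dark_db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(prod_count, dark_count)
```

The production database never sees the Vnext record. The sketch also shows the cost described above: any production data that Vnext needs would have to be copied into, or read from outside, its own database.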

Separate table

Another option is to use a separate 'shadow' table. This is a table within the same database that is created specifically for, and used solely by, Vnext. The Vnext application uses the same database for all other requests, but writes its data only to the shadow table(s). That means that you no longer need to keep two separate databases up to date. You might still see extra load on your database, since both Vlatest and Vnext use some of the same tables.

Again, whether this option is a good fit depends on your application and your needs.
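
The shadow-table idea could look roughly like the sketch below. The table names (`customers`, `orders`, `orders_shadow`) and the `place_order` helper are assumptions made for illustration, and an in-memory SQLite database stands in for the shared production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    CREATE TABLE orders_shadow (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers (id, name) VALUES (1, 'Alice');
    """
)

# Hypothetical mapping: which table each version writes to.
# The names are fixed constants, never user input, so formatting them into SQL is safe.
ORDERS_TABLE = {"Vlatest": "orders", "Vnext": "orders_shadow"}


def place_order(version: str, customer_id: int, item: str) -> None:
    # Both versions read the same shared customers table.
    row = conn.execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown customer {customer_id}")
    # Only the write target differs per version.
    table = ORDERS_TABLE[version]
    conn.execute(
        f"INSERT INTO {table} (customer_id, item) VALUES (?, ?)",
        (customer_id, item),
    )


def count_rows(table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


place_order("Vlatest", 1, "book")
place_order("Vnext", 1, "book")
place_order("Vnext", 1, "pen")

print(count_rows("orders"), count_rows("orders_shadow"))
```

Both versions hit the shared `customers` table, which is where the extra load mentioned above comes from, while the Vnext writes stay out of the production `orders` table.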

Extend data model

The last option discussed here is to extend your data model. You add a new field to your model, for example generatedBy, that specifies which version of the service produced the record. You could, for example, use Vlatest for the service that responds to user input, and Vnext for the service that is 'in' dark launch. In your code, you can then specify in your queries that you only want data with a specific tag.

This option seems quite good. It requires little to no extra work, and it is almost the same as a separate table. However, you should consider the cleanliness of your code and database: your table will contain data from every version you have, and your code or query has to check the version on each request. This is generally considered to be a bad practice.
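
A minimal sketch of the extended data model is shown below. The `generatedBy` field comes from the text above. The `orders` table and the `save_order` and `orders_for` helpers are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, item TEXT, generatedBy TEXT NOT NULL)"
)


def save_order(item: str, generated_by: str) -> None:
    """Store a record tagged with the version that produced it."""
    conn.execute(
        "INSERT INTO orders (item, generatedBy) VALUES (?, ?)",
        (item, generated_by),
    )


def orders_for(version: str) -> list:
    """Return only the items produced by the given version."""
    rows = conn.execute(
        "SELECT item FROM orders WHERE generatedBy = ? ORDER BY id",
        (version,),
    ).fetchall()
    return [item for (item,) in rows]


save_order("book", "Vlatest")
save_order("book", "Vnext")
save_order("pen", "Vnext")

print(orders_for("Vlatest"))
print(orders_for("Vnext"))
```

Every read has to carry the `generatedBy` filter. A single query that forgets it would mix Vnext records into production results, which is the pollution risk this option carries.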

Overview

| Option | Description | Pros | Cons |
| --- | --- | --- | --- |
| Separate database | Create a separate database to be used by Vnext. Could start as a replica. | Most robust. Lowest risk of pollution. | Quite complex, possibly expensive. |
| Separate table | Create a separate table to be used by Vnext (for example TableShadow). | Quite simple, low cost. | Less robust, minor data pollution risk. |
| Extend data model | Extend the existing data model with a new field, for example generatedBy, that specifies what is production data and what is not. | Cheapest solution, quite simple. | Not very clean, higher risk of pollution. |

Message Queue

A message queue receives messages from a publisher. A consumer in turn subscribes to new messages. Depending on the implementation, messages may or may not be deleted from the queue once they are consumed. There are various ways to make sure that certain messages are only received by certain consumers. The three main options are a separate queue, routing, and topics/exchanges.

Note: For a quick overview of the options and their pros and cons, see the Overview at the end of this section.

Note: this guide uses the technical terms as used by RabbitMQ for queues/topics etc. This may result in different terminology than you are used to. In that case, please consult the documentation of RabbitMQ.

Separate queue

The first and probably most robust solution is to use an entirely different queue for your dark launched service. This minimizes the risk of data pollution, and you can be sure that each queue only contains messages from one specific publisher. The downside of this approach, depending on your queue provider, is that it will likely double the cost in money and/or resources, since you now have to keep two queues running all the time.
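
The separation can be sketched without a broker. The example below is an in-memory stand-in, not RabbitMQ client code: Python's standard `queue.Queue` plays the role of each queue, and the `QUEUES` mapping and the `publish` and `drain` helpers are hypothetical names.

```python
import queue

# One queue per version; in a real setup these would be two broker queues.
QUEUES = {
    "Vlatest": queue.Queue(),
    "Vnext": queue.Queue(),
}


def publish(version: str, body: str) -> None:
    """Each publisher writes only to the queue of its own version."""
    QUEUES[version].put(body)


def drain(version: str) -> list:
    """Consume every message currently waiting on one version's queue."""
    q = QUEUES[version]
    messages = []
    while not q.empty():  # safe here because the sketch is single-threaded
        messages.append(q.get_nowait())
    return messages


publish("Vlatest", "order 1")
publish("Vnext", "order 1 (dark)")
publish("Vnext", "order 2 (dark)")

latest_messages = drain("Vlatest")
next_messages = drain("Vnext")
print(latest_messages, next_messages)
```

A Vlatest consumer cannot see a Vnext message here because it never touches the Vnext queue, at the price of running two queues.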

Routing

Routing, for message queues, is a principle that lets specific workers listen to only specific messages. For example, you can add a parameter version to each message on the queue, whose value is either Vlatest or Vnext. You can then make sure that the parts of your application that are not in a dark launch only listen for messages with the version Vlatest. You can publish all messages to the same queue and let your consumers pick which messages they should pay attention to. In this way, you do not have to double all of your resources immediately. You will, however, see an increase in traffic to your queue, and possibly from your queue as well, depending on your implementation. The downside is that you could say this does pollute your data: since all messages are placed on the same queue, the queue technically does not conform to the single responsibility principle.
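
In RabbitMQ terms, this kind of routing is typically done with a direct exchange, which delivers a message to the queues whose binding key equals the message's routing key. The sketch below models that behaviour in memory, using the version as the routing key. The `DirectExchange` class is a hypothetical stand-in written for this example and is not part of any RabbitMQ client library.

```python
class DirectExchange:
    """In-memory model of exact-match routing (not a real broker)."""

    def __init__(self) -> None:
        # routing key -> list of bound queues (plain lists act as queues)
        self._bindings = {}

    def bind(self, routing_key: str, inbox: list) -> None:
        """Subscribe an inbox to messages carrying exactly this routing key."""
        self._bindings.setdefault(routing_key, []).append(inbox)

    def publish(self, routing_key: str, body: str) -> None:
        """Deliver the message to every inbox bound to its routing key."""
        for inbox in self._bindings.get(routing_key, []):
            inbox.append(body)


exchange = DirectExchange()
latest_inbox = []
next_inbox = []

# Each consumer declares which version it wants to hear about.
exchange.bind("Vlatest", latest_inbox)
exchange.bind("Vnext", next_inbox)

# Both publishers send to the same exchange; only the routing key differs.
exchange.publish("Vlatest", "order 1")
exchange.publish("Vnext", "order 1 (dark)")

print(latest_inbox, next_inbox)
```

All traffic flows through one shared entry point, which keeps resources low but also means a publisher that sets the wrong version would deliver dark-launch messages to production consumers.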

Topics (definition by RabbitMQ)

RabbitMQ also describes topics in its documentation. The benefits and costs of topics are considered the same as those of routing.
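
With a RabbitMQ topic exchange, a routing key is a list of words separated by dots, and a binding pattern may use `*` for exactly one word and `#` for zero or more words. The sketch below reimplements only that matching rule in plain Python to show how a version prefix could separate messages. The `topic_matches` function and the example keys such as `vnext.orders.created` are assumptions made for illustration, not RabbitMQ code.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a dot-separated routing key matches a topic pattern.

    '*' stands for exactly one word, '#' for zero or more words.
    """

    def match(p: list, k: list) -> bool:
        if not p:
            return not k  # pattern used up: match only if the key is too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb any number of words, including none.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


# A dark-launch consumer subscribes to everything prefixed with its version.
print(topic_matches("vnext.#", "vnext.orders.created"))
print(topic_matches("vnext.#", "vlatest.orders.created"))
# '*' matches one word only, so a longer key is not matched.
print(topic_matches("vlatest.*", "vlatest.orders"))
print(topic_matches("vlatest.*", "vlatest.orders.created"))
```

Prefixing the routing key with the version gives the same separation as routing, with the added option of subscribing to subsets of messages per version.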

Overview

| Option | Description | Pros | Cons |
| --- | --- | --- | --- |
| Separate queue | Create a separate queue to be used by Vnext. | Most robust. No pollution. Simple. | Possibly expensive. |
| Routing | Add a specific parameter to your message to specify the version of the publisher. | Cheap. Simple. | Less robust. Data pollution risk. |
| Topics | Pre- or postfix your message to differentiate the origin. | Cheap. Simple. | Less robust. Data pollution risk. |