The Singapore Law Gazette

Data Beyond Borders

Tapping into Emerging Data for Effective Cross-Border Investigations and Disputes in Asia

The U.S. Department of Justice (DOJ) recently issued guidance for prosecutors evaluating whether, and to what extent, a corporation’s compliance programme was effective at the time of a violation.1 In this guidance, the DOJ has made clear that communications from ephemeral chat applications are now explicitly in scope as relevant sources of information during investigations and document productions.2

This DOJ guidance adds to a growing list of references by regulatory agencies, including the U.S. Federal Trade Commission and the U.S. Securities and Exchange Commission, to modern and emerging forms of data in investigations and compliance reviews. While this guidance is useful for prosecutors, organisations continue to grapple with managing, monitoring and reviewing data from an increasing number of messaging applications, collaboration platforms and users worldwide.

The variety of data sources today and the pace at which new functionality is introduced are unprecedented. New challenges exist regarding shared access, linked content, versions and chat interfaces.

Research by eMarketer projected that more than 75 per cent of internet users worldwide will use a messaging application monthly by 2024. Furthermore, the latest messaging application usage statistics show that WhatsApp has two billion users, Facebook Messenger 1.3 billion users and WeChat one billion users worldwide.3 Several financial institutions are also creating and using their own proprietary applications for internal communications and for storing high-value intellectual property. As a result of the proliferation of these mobile chat applications in business, chat communication data now features in almost every investigation.

While this shifting data landscape affects disputes and investigations workflows globally, a more region-specific nuance is becoming apparent in Asia. For example, while WhatsApp Messenger, Snapchat and Facebook Messenger are the top three messaging applications in the U.S.,4 none of them is among the top three messaging applications in China. This underscores the vast diversity of data types that organisations must address across domestic and international matters.

Sina Weibo, commonly referred to as the Twitter of China, is one of the most popular Chinese social media applications, with more than 573 million monthly active users.5 Many other chat and social applications are widely used across Asia. Zalo in Vietnam has more than 73 million active users,6 and in India 81 per cent of digital users use WhatsApp and 57 per cent use Telegram. Similarly, Alibaba, Huawei and Tencent Cloud are the largest cloud providers in China, again representing a different set of platforms from those commonly used in the U.S. and Europe.7

Despite apparent similarities in the user interface and functionality of these various cloud and communication platforms, there are significant technological differences in the way they are designed, structured and encoded — nuances which require unique solutions during an investigation.

Emerging Data Collection and Processing in Asia

One of the fundamental requirements with any data source is a thorough understanding of the underlying data structures. Where that understanding is lacking, the common challenges organisations encounter when handling emerging data sources in Asia Pacific include:

  • Limited expertise with Asia-specific communication platforms. Different communication and cloud platforms typically have different underlying data structures. Each one requires in-depth knowledge and experience with the specific data source. Without the proper expertise to handle each step from preservation and collection to processing, analysis, review and production, data can be overlooked, misinterpreted or otherwise mishandled, leading to mistakes and downstream issues.
  • Data loss resulting from the use of misaligned processing tools. Given the specificity of data format and architecture within each chat application, custom processing methods are often required. Many tools have not caught up to the current state of emerging data, and if processing tools are used without the necessary adjustments, data may be lost. For example, specific tools are required to transpose and accurately interpret WeChat data, as compared with more “traditional” processing tools that were once considered the industry standard.
  • Collection mistakes. In addition to processing challenges, collecting data from emerging data sources is not as straightforward as taking a full disk image of a computer device. Ensuring that the appropriate methodologies are followed during the forensic collection phase of emerging data is crucial in order to prevent issues such as incompatible data formats, unusable or inaccessible data or data loss.

It’s critical for legal and compliance teams to understand that emerging data sources cannot be handled in the same way as traditional data sources such as electronic documents or email. Because handling emerging data sources requires specialised experience, and the types of sources in Asia Pacific are particularly nuanced, it is recommended that organisations assemble e-discovery teams with significant experience in the region. A team experienced with regional needs and emerging data sources generally will be best equipped to develop custom workflows and adapt to the requirements of various regulatory agencies and courts.

Review and Interpretation

Beyond the challenges that emerging data poses for forensic collection and processing of data, other factors need to be considered surrounding the review workflow in order to analyse the substantive content of these data sources. Although these emerging data platforms were not designed with e-discovery in mind, the courts and regulators have made clear that if emerging data systems contain relevant communications, documents or other records, these must be collected, reviewed and produced just as traditional sources must be, or the company will face the risk of spoliation or other violations.

For example, chat data is perhaps the most critical and yet most difficult emerging data source to review. Chat messages generally increase the overall volume of data and can dilute the context across dozens, hundreds or thousands of short messages. Also, chat rooms, groups and channels pose challenges when identifying specific custodians or recipients, and when it comes to the inevitable questions every investigator must answer: who knew what and when, who said what and when, and who did what and when. Only properly collected and translated messaging data can answer these key questions.

In some highly regulated industries such as finance, chat has become the primary communication mechanism. As a result, chat logs are of great importance during investigations or for legal and regulatory responses. While logs are usually available, they are viewable only in difficult-to-parse formats, which forces legal teams to review them at a much slower rate than for traditional data sources such as email or office documents. To provide the full context of an issue during the review process, it is important to normalise the chat data and weave it in with other structured and unstructured data.
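As a rough illustration of what normalising chat data and weaving it in with other sources can involve, the Python sketch below converts chat lines into a common record and merges them with email records into a single timeline ordered by UTC time. The export layout (`timestamp|author|message`), the `Event` record and the function names are assumptions made for this example; real chat exports differ by platform and usually need platform-specific parsing.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """One normalised communication event, whatever its source."""
    timestamp: datetime  # always stored in UTC
    source: str          # e.g. "chat" or "email"
    author: str
    text: str


def parse_chat_line(line: str) -> Event:
    """Parse a hypothetical export line: 'ISO-8601 timestamp|author|message'.

    The timestamp is assumed to carry a UTC offset (e.g. +08:00) so that
    messages sent from different time zones can be compared reliably.
    """
    stamp, author, text = line.split("|", 2)
    when = datetime.fromisoformat(stamp).astimezone(timezone.utc)
    return Event(when, "chat", author.strip(), text.strip())


def build_timeline(chat_lines, emails):
    """Merge chat lines and already-parsed email events into one UTC-ordered list."""
    events = [parse_chat_line(line) for line in chat_lines if line.strip()]
    events.extend(emails)
    return sorted(events, key=lambda event: event.timestamp)


# Illustrative data only: two chat messages sent in UTC+8 and one email in UTC.
sample_chat = [
    "2023-03-01T09:15:00+08:00|Alice|Please send the revised contract",
    "2023-03-01T09:40:00+08:00|Bob|Sent by email just now",
]
sample_emails = [
    Event(datetime(2023, 3, 1, 1, 30, tzinfo=timezone.utc),
          "email", "Bob", "Revised contract attached"),
]

for event in build_timeline(sample_chat, sample_emails):
    print(event.timestamp.isoformat(), event.source, event.author, event.text)
```

Storing every timestamp in UTC is one possible design choice; it lets a reviewer read the chat request, the email and the chat confirmation in the order in which they actually occurred.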

When reviewing substantive content, it is also critical to understand shared content — that is, who owns and has access to what. Shared content items also significantly increase the volume of data. In investigations where it’s important to ascertain who has access to what, shared content documents play a key role.

During the review of documents, necessary care should be taken not to overlook critical information. For example, hyperlinked information may not be available within the review set, but it gives a sense of what additional information needs to be collected and reviewed in order to see the complete picture.
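One way to make this gap visible is to list the hyperlinks found in reviewed documents and flag those that point to items not yet collected. The Python sketch below is a minimal illustration under stated assumptions: documents are available as plain text keyed by a document identifier, links appear as ordinary `http` or `https` URLs, and the function name and sample identifiers are hypothetical.

```python
import re

# Simple URL matcher; real review platforms may store links in metadata instead.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def missing_linked_items(documents, collected_urls):
    """Return {url: [doc_ids]} for links that point outside the collected set.

    documents      -- dict mapping a document ID to its extracted text
    collected_urls -- set of URLs whose targets are already in the review set
    """
    missing = {}
    for doc_id, text in documents.items():
        for url in URL_PATTERN.findall(text):
            url = url.rstrip(".,;")  # drop trailing sentence punctuation
            if url not in collected_urls:
                missing.setdefault(url, set()).add(doc_id)
    return {url: sorted(doc_ids) for url, doc_ids in missing.items()}


# Illustrative data only.
review_set = {
    "DOC-1": "See https://example.com/share/budget.xlsx, thanks.",
    "DOC-2": "Draft at https://example.com/share/budget.xlsx "
             "and https://example.com/share/memo.docx",
}
already_collected = {"https://example.com/share/memo.docx"}

print(missing_linked_items(review_set, already_collected))
```

The result gives the review team a starting list of linked items that may need to be collected, together with the documents that refer to each one.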

Version control is another area where a review team perusing multiple document versions needs to be aware of potential issues. Tracking numerous versions of files can provide a historical view of content that previously was difficult to access. Sufficient planning also needs to take place during the production stage in order to determine what version to produce.

Ephemeral Chat Applications and Bring-Your-Own-Device Policies

Ephemeral chat applications offer the ability to set disappearing or self-destructing messages. Snapchat is a classic example of an ephemeral chat application, where messages and multimedia can be set to disappear once the content has been viewed — a feature that has now been extended to other applications such as WhatsApp and Telegram. This means corporations that utilise applications such as WhatsApp for Business face the risk of being unable to produce relevant communication data in the event of a legal dispute or regulatory query.

This is where a policy framework comes into play. A policy methodology needs to be designed — and recorded — to support best practices; this needs to be initiated by clarifying strategic intent through the development of top-level policies. The governance framework applicable to the organisation’s communications will require a holistic approach to identifying data risk, with privacy, security and data governance as the three pillars of the data lifecycle. This framework will enable the development of strategies for proactive risk minimisation related to both ephemeral chat applications and BYOD scenarios.

The risk management data lifecycle outlines various steps that will need to be considered in relation to mobile chat application data, such as:

  • Assess security risk. Most mobile chat applications do not offer a way to enforce security policies across a corporation, hence the importance of developing policies around best practices.
  • Archive. Regulations may require the corporation to ensure that chat communication data is retained for a certain period. The risk assessment should outline whether a WhatsApp data archiving solution8 needs to be implemented.
  • Destroy. As with archiving, this holistic approach will also outline the end-of-lifecycle requirements for the data. Alternatively, the ephemeral chat feature can be utilised to ensure that communication data is retained only for a certain period on mobile devices, and third-party vendors can control the expiration of archived data.
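The archive and destroy steps above can be pictured as a simple rule applied to each message. The Python sketch below is illustrative only: the function name, the retention period and the idea of a preservation flag for data relevant to a pending dispute or regulatory query are assumptions made for the example, not requirements drawn from any particular regulation or archiving product.

```python
from datetime import date, timedelta


def lifecycle_action(sent_on, today, retention_days, on_legal_hold=False):
    """Decide what a retention policy would do with one chat message.

    sent_on        -- date the message was sent
    today          -- date on which the policy is evaluated
    retention_days -- hypothetical retention period set by the organisation
    on_legal_hold  -- assumed flag: data needed for a dispute or regulatory query
    """
    if on_legal_hold:
        return "preserve"  # never expire data that may have to be produced
    if today - sent_on > timedelta(days=retention_days):
        return "destroy"   # retention period has elapsed
    return "archive"       # still inside the retention period


# Illustrative values only.
print(lifecycle_action(date(2020, 1, 1), date(2023, 1, 1), retention_days=365))
print(lifecycle_action(date(2020, 1, 1), date(2023, 1, 1), retention_days=365,
                       on_legal_hold=True))
print(lifecycle_action(date(2022, 12, 1), date(2023, 1, 1), retention_days=365))
```

Checking the preservation flag before the expiry date reflects the risk discussed earlier: a message that has disappeared or been destroyed cannot be produced later.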


In summary, there are significant technological differences between the communication platforms utilised across the Asia Pacific region. The resulting challenges will require investigation teams with significant experience, both in the region and with emerging data sources generally. Having an in-depth understanding of the underlying data structures, inclusive of linguistic challenges, is a key requirement for enabling efficient and accurate collection, processing, review and production of emerging data sources. With that foundation, investigators will be positioned to uncover the facts and answer the crucial questions of who knew what and when, who said what and when, and who did what and when.

FTI Technology
