Azure Workflow data in Bing

(German text, possibly paywall?)

A reader of c’t in Germany stumbled over personal data for “clients” when searching DuckDuckGo (whose results are provided by Bing) for IFA, the German technology show. The Bing-sourced results showed a list of personal information - names, addresses, social security numbers etc. - each entry with an identifier starting with IFA (e.g. IFA0203545), all linking back to an Azure workflow written in Azure Logic Apps. Because of the generic address (westeurope.logic.azure.com), it wasn’t possible to identify the owner of the data. There were also no identifying marks on the “client cards” that were being called up, apart from the leaked PII.

The reader contacted the editors (part of the heise publishing group and home of the heise Security researchers). They took on the case and investigated further.

It looked like the information came from the agency dealing with asylum seekers, so the editors contacted that agency. It checked its systems and quickly came back with the answer that it didn’t use any Azure applications at all. It also suggested that, because of the language used, the data probably came from its colleagues over in Austria - the heise team had already ruled out German-speaking Switzerland.

The team then contacted the Austrian agency responsible for asylum seeker registration, which replied quickly (within 2 hours): it had confirmed the leak and was working on the problem, the links had been closed off from direct access, and it had contacted Microsoft to try to find out why the quasi-private pages were being indexed. This, obviously, still left the problem of the search results themselves. They no longer led to a page when clicked, returning a 404 error instead, but the preview on the search page still held all of the PII that had been leaked.

Microsoft were also quick to react; the results disappeared from Bing within 2 days.

The team working on the Azure Workflow application had an idea about how the data came to be indexed, but it has as yet neither been confirmed nor rebutted.

There was no robots.txt file telling search engines not to crawl the pages, and the results only appeared in Bing and the search engines that draw on it.
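To illustrate what a robots.txt file does and doesn't do, here is a minimal sketch using Python's standard urllib.robotparser. The path in the URL is made up for illustration, and whether a tenant could even place a robots.txt on a shared host like this is a separate question that the article doesn't answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical URL: the host is the generic one from the article,
# the path is invented purely for this example.
URL = "https://westeurope.logic.azure.com/hypothetical-workflow-path"


def crawler_may_fetch(robots_lines, user_agent, url):
    """Return True if a well-behaved crawler would be allowed to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# With a blanket "Disallow: /", a compliant crawler stays away.
blocked = crawler_may_fetch(["User-agent: *", "Disallow: /"], "bingbot", URL)

# With no rules at all, nothing tells the crawler to stay away.
open_to_all = crawler_may_fetch([], "bingbot", URL)

print(blocked, open_to_all)
```

Note that robots.txt is only a request to crawlers, not access control; anyone with the link can still open the page.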

The agency that processes the data uses MS Edge as its standard browser, and if the text entered in the address bar doesn’t start with a URI scheme (e.g. http, https, ftp etc.), Edge automatically sends it to Bing as a search. The process should run automatically through the internal workflow, but sometimes users want or need to kick off an action manually, which they can do by copying a link out of the application and pasting it into the browser. The theory was that if users kept doing this with a space or other stray character in front of the URI, the text went to Bing each time, and at some point Bing got “fed up” with the whole thing, interpreted the supplied link and added it to its index. A total of 35 such IFA cards were found in Bing and Bing-derived search engines, such as Yahoo!, DuckDuckGo or Ecosia.
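The theory can be sketched as a toy model. This is not Edge's actual logic, which isn't described in the article and is certainly more involved; the function name, the scheme list and the URL path are all assumptions made up for illustration.

```python
# Toy model of the theory only: text that starts with a known scheme is
# treated as an address, everything else is handed to the search engine.
KNOWN_SCHEMES = ("http://", "https://", "ftp://")


def classify_address_bar_input(text: str) -> str:
    """Return "navigate" if text starts with a known scheme, else "search"."""
    if text.lower().startswith(KNOWN_SCHEMES):
        return "navigate"
    return "search"


# Hypothetical link: generic host from the article, invented path.
link = "https://westeurope.logic.azure.com/hypothetical-workflow-path"

print(classify_address_bar_input(link))        # clean paste
print(classify_address_bar_input(" " + link))  # stray leading space
print(classify_address_bar_input('"' + link))  # stray leading quote
```

In this model the clean paste is navigated to, while the two pastes with a stray leading character end up as search queries, which is the step where the full link would be handed to the search engine.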

The heise team tried to replicate the process, but after several days their tests still came up negative. That isn’t conclusive proof one way or the other.

But this is a big problem with many cloud applications and sharing services: they work on the premise that the links are “private”. For example, links for Google Drive, OneDrive, Nextcloud etc., and even Teams, Zoom etc., are “public but private”, in that whoever has or knows the link has full access to the data behind it, either read-only or read/write.

A good example of this in this community is the Google Drive link for the photos from the recent TWiT Cruise. Whoever has the link can view the photos or, if they have a read/write link, add their own photos to the album.

There is often absolutely no verification when opening the links; you are just given complete access because you passed the hurdle of having the “private” link. If, as in this case, the link somehow gets indexed, anyone can find it. Likewise, careless sharing with 3rd parties can quickly spread the link beyond the originally intended audience.

When I share something like this, I often open it up for a couple of hours, double check the other side has made a copy of the files and close the link again, or I use a service, like Strato HyperDrive, which limits the time and number of accesses to the data.
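As a rough illustration of a share link that limits both time and number of accesses, here is a minimal sketch. It is not how Strato or any other service implements this; the ShareLink class and its fields are hypothetical.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ShareLink:
    """Hypothetical share link that expires and allows limited accesses."""
    expires_at: float   # Unix timestamp after which the link is dead
    max_accesses: int   # how many successful opens are allowed
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))
    accesses: int = 0

    def try_access(self, presented_token: str, now: Optional[float] = None) -> bool:
        """Grant access only for the right token, in time, within the quota."""
        now = time.time() if now is None else now
        # Constant-time comparison of the presented token with the real one.
        if not secrets.compare_digest(presented_token, self.token):
            return False
        if now >= self.expires_at or self.accesses >= self.max_accesses:
            return False
        self.accesses += 1
        return True


# A link that is valid for two hours and can be opened twice.
link = ShareLink(expires_at=time.time() + 2 * 60 * 60, max_accesses=2)
print(link.try_access(link.token))  # first open
print(link.try_access(link.token))  # second open
print(link.try_access(link.token))  # quota used up
```

Even if a link like this leaks or gets indexed, it stops working once the time or the quota runs out, which limits the damage without needing a login.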

I use “private” link shares like this (albeit with an expiration date set) pretty regularly on my personal OneDrive. It’s a convenient feature but I can see how we’d end up in a situation like this if folks are using the feature for sensitive data. The security group at my company started requiring authentication on any links shared through our O365 service last year. Sadly another case of “this is why we can’t have nice things.”
