Can It Be Searchable — But Not Readable?
When you work with brainiacs, great discussions occur spontaneously. The following situation arose: a client wanted all data to be searchable, but not all data to be readable. The example they gave was an HR person who could know that specific forms exist without being allowed to read them. What are the best practices? Our engineering and field teams weighed in:
- Generally speaking, database permissions do not distinguish between letting you know a document exists and letting you read it. By default, security is role-based, so if you cannot read a document, you aren’t even allowed to know it exists. You could write a function that runs with elevated permissions and sees documents the calling user may not otherwise be able to see, but that returns to the caller only whether or not matching documents exist.
We implemented something like this for a national lab. The search ran in an elevated user context and returned a set of search results with limited metadata for each asset. When users attempted to view an asset for which they lacked permission, they could fill out a form to request access.
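The pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the national lab implementation and not MarkLogic code: the `Document` and `Hit` types, the role names, and the in-memory `STORE` are all hypothetical. The search deliberately ignores the caller's roles when matching, but returns only an ID, a title, and a flag saying whether the caller could open the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    body: str
    read_roles: frozenset  # roles allowed to read the full document (hypothetical)


@dataclass(frozen=True)
class Hit:
    doc_id: str
    title: str
    can_read: bool  # False means the user must request access


def elevated_search(store, term, user_roles):
    """Match against every document, ignoring the caller's roles,
    but return only limited metadata, never the body."""
    needle = term.lower()
    return [
        Hit(doc.doc_id, doc.title, bool(doc.read_roles & user_roles))
        for doc in store
        if needle in doc.body.lower()
    ]


def read_document(store, doc_id, user_roles):
    """Normal read path: enforce the caller's roles."""
    for doc in store:
        if doc.doc_id == doc_id:
            if doc.read_roles & user_roles:
                return doc.body
            raise PermissionError(f"access to {doc_id} must be requested")
    raise KeyError(doc_id)


# Hypothetical sample data for illustration only.
STORE = [
    Document("hr-001", "Leave request form",
             "Annual leave request for J. Smith", frozenset({"hr-admin"})),
    Document("pub-001", "Leave policy",
             "Company policy on annual leave", frozenset({"employee", "hr-admin"})),
]
```

An employee searching for "annual leave" would get both hits back, with `can_read` set to `False` for `hr-001`; calling `read_document` on that ID raises `PermissionError`, which is the point at which an application could show the access request form.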
- While this is a capability we can deliver well, the prospect should think through the security implications of even letting users know that documents exist that match a query. For example, if you allow full-text search on HR docs and store comp plans in a standard format, it would be easy to brute-force everyone’s salary by searching for names and possible salary values (“john smith ‘salary: $1,000’”, “john smith ‘salary: $2,000’”, etc.) until you get a hit back, even if you can’t see the matching document.
- In media, clients often make documents searchable, so that the asset is discoverable, but gate them behind a paywall if the reader lacks the proper credentials.
- In some cases, people require that parts of a document be secured against reading but are fine with those parts being in the search index. Take national security: ‘Show me every document pertaining to Organisation X.’ Knowing a document exists is a Good Thing™, as it allows intelligence personnel to request access to the full content. In other cases, some users will be able to read the information in an intelligence report, but not the report section titled ‘Future surveillance,’ for example. It is possible to embed security within document content and use MarkLogic to redact these documents on the fly. This relies on the application applying a standard in-document security tagging mechanism, though.
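To make the idea of in-document security tagging concrete, here is a rough Python sketch of redaction on the fly. It does not use or imitate MarkLogic's redaction features; the `clearance` attribute, the level names, and the `<redacted>` placeholder are assumptions invented for this example. Any element tagged above the reader's level is swapped for a placeholder before the document is returned, and unknown labels are treated as unreadable.

```python
import xml.etree.ElementTree as ET

# Hypothetical clearance ordering; a real scheme would come from the application.
LEVELS = {"public": 0, "restricted": 1, "secret": 2}


def redact(xml_text, user_level):
    """Return the document with every element tagged above the
    reader's clearance replaced by an empty <redacted> placeholder."""
    root = ET.fromstring(xml_text)
    limit = LEVELS[user_level]
    # Snapshot the tree first so that edits do not disturb iteration.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            label = child.get("clearance")
            if label is None:
                continue  # untagged elements are left as they are
            # Unknown labels fail closed: treat them as above any clearance.
            if LEVELS.get(label, float("inf")) > limit:
                placeholder = ET.Element("redacted")
                placeholder.tail = child.tail
                parent.remove(child)
                parent.insert(index, placeholder)
    return ET.tostring(root, encoding="unicode")


# Hypothetical intelligence report with per-section tagging.
REPORT = (
    "<report>"
    '<section clearance="restricted"><title>Summary</title></section>'
    '<section clearance="secret"><title>Future surveillance</title></section>'
    "</report>"
)
```

Calling `redact(REPORT, "restricted")` keeps the summary section and replaces the surveillance section with a placeholder, while `redact(REPORT, "secret")` leaves both sections in place. The sketch only works because every sensitive section carries the same attribute, which is the dependency on a standard tagging mechanism noted above.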
According to my colleague Adam Fowler, “There really is no easy answer – it depends entirely on the data, what you need to index — and the organizational mandates and restrictions that exist.”