Review of Synthesized Social Signals
A novel concept for social platforms charting a path in the open source ecosystem.
published 2024-01-03

Introduction

Recently I read through a paper in the field of Computer-Human Interaction (CHI) describing a novel concept for social software platforms called "Synthesized Social Signals: Computationally-Derived Social Signals from Account Histories" by Jane Im, Sonali Tandon, Eshwar Chandrasekharan, Taylor Denby, & Eric Gilbert. I found the concept useful to think about and something I wish were considered more as a standard design practice across social platforms.

The paper centers on a concept called synthesized social signals, or S3s. These are signals that can be computationally derived from historical behavioral data associated with a given user’s account and then rendered into an easy-to-parse interface. These signals attempt to give a more accurate projection of user behavior by taking a hard-to-fake signal (e.g. posting history) and then using algorithms to distill the data into an easier-to-read signal.

Personally, I think S3s may be more common than the paper seems to imply and they are likely already being generated by corporations for advertising or internal moderation purposes. The drive of this paper though is primarily to discuss techniques and benefits of putting these S3s into the hands of actual end users.

The paper also discusses a reference implementation of an S3-driven user experience plugin called Sig. Sig is a Chrome extension that is responsible for computing and visualizing S3s for a given Twitter (or I guess, it’s X now) profile. The tool was specifically designed to generate and surface S3s related to toxic accounts and misinformation.

At the end of the paper, the authors also discuss opportunities and challenges. I think they do a great job of outlining core theory, but given my software development background, I think the conversation is left unfinished. Towards the end of this piece, I will talk about the opportunity we developers have to help foster an ecosystem of S3s by discussing what a more formal spec & interface template could look like.

Synthesized Social Signals (S3s)

The paper opens by discussing the importance of social signals while communicating. Talking face-to-face gives more feedback, but on digital platforms something as simple as having a profile pic or not can be interpreted as a positive or negative signal.

When discussing these sorts of signals, one must also take into account the cost of generating & interpreting such signals. Those on the receiving end of social interactions often value signals that are more costly to generate, seeing them as more reliable and harder to fake.

The flipside, though, is that it is also often costly to evaluate such reliable signals. This results in people often relying on simpler, cheaper-to-interpret social signals (such as the aforementioned profile pictures). Reducing this receiver cost is exactly what S3s are meant to address. By utilizing algorithms to analyze and collate harder-to-fake signals like friends lists or posting histories, combined with a user experience designed around surfacing these signals, users can gain quicker access to better signals, resulting in a better user experience all around.

The authors specifically call out that “how and where to render this information as accessible information so that end-users can use it remains as an open question.” I would interject here though and suggest that it is not as much an open question as an ignored one on most commercial platforms. S3s are often generated internally for business use by most major platforms, but integrating them for end users is a path either ignored or intentionally closed off and hidden.

These sorts of signals, while valuable to discrete end-users, tend to ultimately produce friction inside social platforms and thus a lot of these cues that could be (or already are) generated and displayed to users are intentionally hidden or suppressed. Because the incentive structures of these platforms demand maximizing end-user engagement, any sort of signals that might reduce the user base or engagements are not surfaced.

We know for a fact this is the case. X includes a particularly useful S3 in Tweet data related to whether or not a Tweet should be displayed in France & Germany. This is because France & Germany have explicit laws against advocating for hate, violence and Nazism. But these attributes are not utilized or surfaced to end users in the USA because the algorithm generating this particular S3 also flags Republican electeds.

So the answer to the “open” question is already itself answered by the paper. The location for rendering this information must be a space that end-users have full control over. In the paper's case, a browser extension like Sig. Perhaps in the future, more open-source programmable social platforms will proliferate, providing an embedded context for users to develop and share these signals as first class experiences on their own terms.

Sig: A Tool for S3s in Practice

Moving on to the actual reference implementation for S3s brings us to Sig, the aforementioned third-party browser extension. Sig sits as a view and conduit layer that scrapes data from X, passes it to some pre-existing ML APIs that attempt to assess “toxic language”, and compares links against a large database of sites to assess “misinformation”.

An astute implementation detail in Sig is that it doesn't do any checks if you already follow someone. No need to confirm if you want to see something if you've already opted in! It also allows for configuration of notification thresholds. This sort of user-driven configuration is key to providing user-friendly experiences. Including why something tripped the system (i.e. auditability) is also an important inclusion.
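To make that behavior concrete, here is a rough sketch of the decision logic in Python. This is not Sig's actual code: the names `Settings`, `Flag`, and `assess_account`, the 0.0-1.0 score scale, and the default thresholds are all assumptions of mine for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Settings:
    # Hypothetical user-configurable thresholds on an assumed 0.0-1.0 scale.
    toxicity_threshold: float = 0.5
    misinformation_threshold: float = 0.5


@dataclass
class Flag:
    flagged: bool
    # Reasons are kept so the user can audit why a flag fired.
    reasons: list = field(default_factory=list)


def assess_account(following: bool, toxicity: float,
                   misinformation: float, settings: Settings) -> Flag:
    # Accounts the user already follows are skipped entirely.
    if following:
        return Flag(False)
    reasons = []
    if toxicity >= settings.toxicity_threshold:
        reasons.append(f"toxicity {toxicity:.2f} at or above threshold")
    if misinformation >= settings.misinformation_threshold:
        reasons.append(f"misinformation {misinformation:.2f} at or above threshold")
    return Flag(bool(reasons), reasons)
```

The point of returning the reasons alongside the flag is that the UI never has to show a warning it cannot explain.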

One amusing observation in this section is how in the initial tests, Sig’s S3 flagged a Twitter blue checkmark (historically a positive social signal on the platform) as spreading misinformation. I found this funny, since at the time of writing this post the blue check signal has since inverted. In a post-Elon world they are commonly seen as a negative signal, to the point that Elon has made it so users can hide these signals (in almost spiritual and ideological opposition to the goal of this research). This also lines up with my thoughts on the “open question” of where S3s should live in the previous section.

Later in the paper, the discussion around compounding S3s to trigger flagging, and finding other S3s to potentially integrate into the UX, was an interesting path to think on. The discussion around whether to display “posts” as safe, while not wanting the mere lack of a flag to imply that something is safe, is also worth considering. A lot of the focus of this paper is on identifying and mitigating anti-social posters, but how could S3s also be used to elevate and encourage pro-social posting?

Amidst all the potential benefits of S3s though, a good point was brought up by two participants in the study: a lot of the ML that can be assumed to be driving these signals is also well known to bake in biases from training data. This leads to my personal takeaway from the whole paper.

Let's Talk About a Spec

Implementing S3s as a feature in modern social platforms has clearly been shown by the authors to be broadly a net positive for end users and something that can be implemented by a development team working with off-the-shelf tech. While this suggests that adoption in the social platform space for S3s might be quick and easy, I think the historical evidence and an analysis of most corporate platforms indicate this will not be the case.

However, there is still hope! With the concept now outlined and the viability of implementation validated, we can begin cultivating adoption through the development of an open specification around S3s. An open specification would allow for many different actors and developers across the open source ecosystem to begin experimenting on various forms and implementations while collectively honing the design, collecting buy-in, and propagating the concept to a wider audience.

This has happened before: a spec like ActivityPub has been adopted by many open source social platform projects, to the point where it is now table stakes, and even a corporate platform like Threads has begun adopting it.

So, where to start? First, let's look at the ActivityStreams spec. This gives a common language for describing underlying data models and constructs that could be analyzed for synthesis. Which is to say, instead of having to write a custom data scraper implementation for Twitter, then Instagram, and on and on, the spec could assume a common input data model and thus only need to manage one form of data for ingestion.
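As a rough sketch of what that single ingestion path might look like, the Python below pulls the `content` of everything a given actor created out of a list of ActivityStreams-style activities. The function name `contents_by_actor` and the example URLs in the usage note are hypothetical, and real ActivityStreams data has more shapes than this handles (for instance, `object` given as a bare IRI is simply skipped here).

```python
def contents_by_actor(activities, actor_id):
    """Collect the `content` of objects created by one actor."""
    texts = []
    for activity in activities:
        if activity.get("type") != "Create":
            continue
        actor = activity.get("actor")
        # `actor` may be an IRI string or an embedded object with an `id`.
        if isinstance(actor, dict):
            actor = actor.get("id")
        if actor != actor_id:
            continue
        obj = activity.get("object")
        # Only embedded objects that carry a text `content` are collected.
        if isinstance(obj, dict) and isinstance(obj.get("content"), str):
            texts.append(obj["content"])
    return texts
```

Calling `contents_by_actor(activities, "https://a.example/users/alice")` would return the text of that (made up) actor's posts, regardless of which platform the activities came from.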

With a common set of input signals to draw from, a standard way of describing S3 specifications could be developed. For example:

  • Toxicity:
    • A value from 0-100 with 100 being the most toxic.
    • Assessed on an `Actor`.
    • Calculated based on the `content` field of `Objects` created by the given `Actor`.
  • Veracity:
    • A value from 1-5 with 1 being demonstrably false & 5 being factually true.
    • Assessed on a `Link`.
    • Based on the root domain of link compared against a collection of lists.

And so on and so forth.
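The two example signals above could be written down as data, roughly like this. It is only a sketch in Python: the `SignalSpec` class and its field names are my own invention, not part of any existing spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalSpec:
    name: str
    minimum: int
    maximum: int
    assessed_on: str   # ActivityStreams type the signal attaches to
    derived_from: str  # human-readable description of the inputs

    def validate(self, value: int) -> int:
        """Reject values outside the range the spec promises."""
        if not self.minimum <= value <= self.maximum:
            raise ValueError(
                f"{self.name} must be between {self.minimum} and {self.maximum}"
            )
        return value


TOXICITY = SignalSpec(
    "Toxicity", 0, 100, "Actor",
    "content field of Objects created by the Actor",
)
VERACITY = SignalSpec(
    "Veracity", 1, 5, "Link",
    "root domain of the link compared against a collection of lists",
)
```

With this in place, `VERACITY.validate(5)` returns 5, while `VERACITY.validate(0)` raises a `ValueError`, so a misbehaving backend cannot hand clients a value the spec does not define.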

This then creates an interface contract that allows for common discussion of S3s between clients and the backing ML algorithms that generate the S3s in the first place. In practice, this would mean that when platforms A & B pass a Link to a veracity service and get a 5 back, it gets interpreted the same way in both platforms’ user experiences.
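A minimal sketch of that contract in Python might look like the following. Everything here is an assumption for illustration: the `VeracityService` protocol, the list-backed stand-in for a real service, and the labels for scores 2 through 4 (only 1 and 5 are defined above). The stand-in also matches on the full hostname rather than a true root domain, to keep the example short.

```python
from typing import Protocol
from urllib.parse import urlparse

# Shared meaning of each score; labels 2-4 are assumed for this sketch.
VERACITY_LABELS = {
    1: "demonstrably false",
    2: "mostly false",
    3: "mixed or unknown",
    4: "mostly true",
    5: "factually true",
}


class VeracityService(Protocol):
    def assess(self, href: str) -> int: ...


class ListBackedService:
    """Stand-in backend that scores links from a hostname lookup table."""

    def __init__(self, scores: dict, default: int = 3):
        self._scores = scores
        self._default = default

    def assess(self, href: str) -> int:
        host = urlparse(href).hostname or ""
        return self._scores.get(host, self._default)


def render(platform: str, service: VeracityService, href: str) -> str:
    # Every platform reads the score through the same shared label table.
    score = service.assess(href)
    return f"[{platform}] Veracity {score}/5: {VERACITY_LABELS[score]}"
```

Because `render` depends only on the `assess` method and the shared labels, the backend behind `VeracityService` can be replaced without touching any front end code.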

If A & B are then configured to trust each other (either via federation or some other system) they can also begin to pass S3 data on the objects themselves. For example, A checks a Link and assigns it a Veracity:5. When that link is passed to B, it doesn't need to check the Link again as it's already verified.
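Sketched in Python, B's side of that exchange could look something like this. The `s3` property on the Link, its `issuer` field, and the function name `veracity_for` are hypothetical; nothing like them exists in ActivityStreams today.

```python
from typing import Callable


def veracity_for(link: dict, trusted_issuers: set,
                 assess: Callable[[str], int]) -> int:
    """Return a Veracity score, reusing one attached by a trusted peer."""
    attached = link.get("s3", {}).get("veracity")
    if attached is not None and attached.get("issuer") in trusted_issuers:
        # A trusted platform already checked this Link, so skip the re-check.
        return attached["value"]
    # Unknown or untrusted issuer: fall back to a local assessment.
    return assess(link["href"])
```

If B later stops trusting A, removing A from `trusted_issuers` is enough to make every one of A's scores get re-checked locally.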

Likewise, if the ML algorithm verifying links is discovered to be biased or compromised in some manner, a new algorithm could be swapped in on the backend. But a 5 would still mean a 5, and so any front end or other system work assessing or rendering a Veracity S3 would not need to change. Perhaps, in the interest of auditing, some mechanism for “signing” S3s might also be needed.
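As one very simplified illustration of what “signing” could mean, the Python sketch below uses an HMAC over a canonical JSON encoding of the signal. This is an assumption on my part, not a proposal from the paper: a shared secret key only works between parties that already trust each other, and a real spec would more likely use public-key signatures. The function names and the signal fields in the usage note are made up.

```python
import hashlib
import hmac
import json


def sign_signal(signal: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical encoding of the signal."""
    # Sorted keys and fixed separators make the encoding deterministic.
    payload = json.dumps(signal, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_signal(signal: dict, signature: str, key: bytes) -> bool:
    """Check that the signal has not changed since it was signed."""
    expected = sign_signal(signal, key)
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)
```

For example, signing `{"name": "Veracity", "value": 5, "algorithm": "lists-v1"}` and later changing `value` to 1 would make `verify_signal` return `False`. Including an algorithm identifier in the signed payload also leaves an audit trail of which backend produced a given score.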

This would also mean platforms large or small that implemented ActivityStreams and this as yet unnamed S3 spec could depend on common, collectively governed services for generating S3s. This could allow for a potentially wider consensus of what is or isn’t toxic behavior, misinformation, etc. Or if a platform has a different view or standards on a given signal, a custom implementation could always be swapped in. And yet, at the end of the day, if something is marked as a 5 in the platform's system that means it's a 5. Assuming you trust the backing algorithm, but hey, we already have to do that anyway.

All this is to say that S3s are neat and Sig is a great proof of concept, both of the underlying concept and of how the only way we’re really going to get these sorts of user experience concerns in our software is by reclaiming the front end UX for ourselves. We’re certainly not going to get them first class in the corporate platforms without a fight. For anyone else who has found S3s as interesting and compelling as I have, I hope we can begin to develop a common language and interface for describing & utilizing S3s across the free and open source software ecosystem in a way which will help pave the road for widespread adoption both in hobby projects and eventually in future corporate platforms.