The Concert You Watch Together: What Social Co-Watch Will Feel Like on Apple Vision Pro
You are standing inside the Sahara Tent. The bass is in your ribs. The visuals are wrapping around you in every direction. The set is building toward the moment you have been waiting for — the transition you watched three times already, the one where the melodic line breaks open and the room ignites.
The drop hits. You yell. Nobody hears you.
That is the loneliness problem. And it is the single biggest objection people have to immersive music on Apple Vision Pro. Not the resolution. Not the field of view. Not the weight of the headset or the price of admission. The fact that you are alone in the room. Every room. Every show. Every time.
We have heard it in every user interview. Every demo. Every email from a new user who loved the first Teleport but did not come back for a second. The technology works. The spatial audio is convincing. The visual fidelity is extraordinary. And it does not matter — because a concert is not a concert if there is nobody next to you when the drop hits.
This post is about fixing that. VPORT Social v1 is in development. It will land mid-2026. Here is what it will do, what it will not do, what we are still figuring out, and why we believe co-watching is the most important feature we will ever ship. More important than 8K. More important than Immersive mode. More important than any single piece of content in the catalog.
The Loneliness Problem No VR App Has Solved
This is not unique to VPORT. Every immersive media platform on every headset has the same problem. You put on the device. The world goes away. Your friends go away. Your partner goes away. The person sitting three feet from you on the couch might as well be in another city.
VR gaming addressed this years ago with multiplayer. You can shoot aliens with your friends. You can play mini-golf with strangers. You can attend a virtual comedy show with a room full of avatars. The social infrastructure exists for games. It barely exists for passive media consumption — and it does not exist at all for immersive concert experiences at the fidelity level VPORT delivers.
Why? Because the technical requirements are different and the design constraints are harder.
In a VR game, the environment is rendered in real time by the headset. Adding another player means adding another rendered avatar to the same engine. Hard, but understood. Decades of multiplayer game development have solved this.
In an immersive concert experience, the environment is a pre-recorded spatial video. The viewer is inside a captured 360-degree sphere of real-world footage. Adding another person to that space means placing a social presence — a voice, an identity, a reaction — inside a recording of a real room. The social layer has to coexist with the captured reality without breaking the sense of presence that makes the experience work.
Get it wrong and you shatter the illusion. A cartoon avatar floating inside a photorealistic concert hall. A voice that does not match the acoustic signature of the room. A notification popup that pulls you out of the set. Every social feature we have seen attempted in VR media playback has failed on at least one of these dimensions.
We think we know how to get it right. Here is the approach.
What Social v1 Will Do
Friend System
Simple. Necessary. You add friends by username or through a QR code scan (phone-to-headset, Vision Pro-to-Vision Pro, or through the VPORT web app). Your friends list persists across sessions. When a friend is actively watching a show, you see it. When you are watching, they see it.
No follower counts. No public profiles. No social graph optimization. This is not a platform for building an audience. It is a utility for knowing when the people you care about are inside a concert so you can join them.
Spatial Voice Chat
When you and a friend are watching the same show simultaneously, you can talk to each other. Voice only. No video feed. No avatar. Just a voice that is spatialized to feel like it is coming from next to you in the venue — to your right, slightly behind, at a distance that matches where a friend would stand in a real crowd.
The spatial positioning is critical. A voice that comes through flat, like a phone call layered on top of the music, breaks presence. A voice that sounds like it is coming from a specific point in the room reinforces it. Your friend is not on the phone. Your friend is next to you. That is the difference.
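To make "to your right, slightly behind" concrete, here is a minimal sketch of the geometry, not the shipping implementation. The coordinate convention (+x right, +y up, -z forward), the `VoicePlacement` type, and the default angle and distance are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicePlacement:
    azimuth_deg: float  # 0 = straight ahead, positive = clockwise (toward the right)
    distance_m: float   # distance from the listener, in metres


def voice_position(placement: VoicePlacement) -> tuple[float, float, float]:
    """Convert a placement into (x, y, z) in a listener-centred frame.

    Assumed convention: +x is right, +y is up, -z is forward.
    The voice is kept at ear height (y = 0).
    """
    theta = math.radians(placement.azimuth_deg)
    x = placement.distance_m * math.sin(theta)
    z = -placement.distance_m * math.cos(theta)
    return (x, 0.0, z)


# Hypothetical default: to the right and slightly behind, about an arm's length away.
FRIEND_DEFAULT = VoicePlacement(azimuth_deg=110.0, distance_m=0.8)
```

Feeding `voice_position(FRIEND_DEFAULT)` to a spatial audio renderer would place the voice on the listener's right (positive x) and a little behind them (positive z). Whether that point stays fixed in the venue or follows the listener's head as they turn is a separate design choice this sketch does not settle.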
The concert audio is ducked automatically whenever someone speaks. When neither of you is talking, the concert plays at full volume. When one of you speaks, the music dips just enough to make the voice clear without pulling you out of the experience. It is the same way your brain processes a friend yelling in your ear at a real show: you hear them, but the music is still there, still dominant, still the reason you are both in the room.
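The ducking behavior can be sketched as a smoothed gain on the music bus. This is an illustration under assumptions, not our audio engine: the `Ducker` class, the 0.6 duck level, and the attack and release times are placeholder values, and voice activity detection is assumed to happen elsewhere.

```python
import math
from dataclasses import dataclass


@dataclass
class Ducker:
    duck_gain: float = 0.6   # music gain while someone is speaking (assumed value)
    attack_s: float = 0.08   # time constant for dipping the music (assumed)
    release_s: float = 0.40  # time constant for restoring it (assumed)
    gain: float = 1.0        # current music gain; starts at full volume

    def step(self, voice_active: bool, dt: float) -> float:
        """Advance by dt seconds and return the music gain to apply."""
        target = self.duck_gain if voice_active else 1.0
        tau = self.attack_s if voice_active else self.release_s
        # One-pole smoothing: move a fraction of the way toward the target
        # so the music dips and recovers without audible jumps.
        alpha = 1.0 - math.exp(-dt / tau)
        self.gain += (target - self.gain) * alpha
        return self.gain
```

Calling `step` once per audio callback with the current voice-activity flag gives a fast dip when a friend starts talking and a slower recovery when they stop. The asymmetry is deliberate: a slow release avoids the music pumping up and down between words.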
Reactions
Sometimes you do not want to talk. You want to share a moment without narrating it. A reaction system — lightweight, spatial, non-verbal — lets you express the thing your face is already doing.
The reaction set is intentionally small. No emoji keyboard. No sticker packs. Five reactions: fire, applause, shock, tears, and a simple pulse that says "I am here and I felt that." Each one manifests as a subtle visual and haptic cue — a brief glow at the periphery of your vision, a gentle tap from the headset. Present but not disruptive. More like a shared glance than a text message.
We debated whether reactions should be visible to all viewers in the room or only to your friends. For v1, they are friend-only. We do not want a room full of strangers flooding your experience with visual noise during the best part of the set. The concert is the content. The social layer supports it. It does not compete with it.
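As a rough sketch of the friend-only rule, assuming friendships are stored as a simple mapping from a user to their set of friends: the five reaction names come from this post, while `recipients` and the data shapes are hypothetical.

```python
from enum import Enum


class Reaction(Enum):
    FIRE = "fire"
    APPLAUSE = "applause"
    SHOCK = "shock"
    TEARS = "tears"
    PULSE = "pulse"  # "I am here and I felt that"


def recipients(sender: str, room_members: set[str],
               friends_of: dict[str, set[str]]) -> set[str]:
    """Return the room members who should receive the sender's reaction.

    v1 rule from the post: reactions are friend-only, so strangers in the
    same room never see them. The sender is excluded from their own cue.
    """
    sender_friends = friends_of.get(sender, set())
    return {m for m in room_members if m != sender and m in sender_friends}
```

With a room of `{"ana", "ben", "cy"}` where only Ben is Ana's friend, `recipients("ana", ...)` returns `{"ben"}` and Cy's view stays clean.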
Session Sync
When you join a friend's session, the playback synchronizes automatically. You are watching the same frame at the same time. No countdown. No manual alignment. You Teleport into the show and land at the same moment your friend is experiencing.
For past events — which are the majority of the VPORT catalog — this is straightforward. Both clients request the same playback position from the server. For live events, you are both watching the same real-time stream by default.
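For a past event, landing a joiner on the same moment comes down to simple arithmetic once both clients share a clock. The sketch below shows only that arithmetic; `PlaybackSnapshot` and `join_position` are hypothetical names, the shared clock is assumed to exist, and on Vision Pro the actual coordination is expected to come from SharePlay rather than hand-rolled code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackSnapshot:
    position_s: float  # media position when the snapshot was taken, in seconds
    taken_at_s: float  # shared-clock time of the snapshot, in seconds
    playing: bool      # False while the session is paused


def join_position(snapshot: PlaybackSnapshot, now_s: float,
                  duration_s: float) -> float:
    """Where a joining client should seek so it matches the session.

    While playing, the position advances with the shared clock; while
    paused, it stays where it was. The result is clamped to the end of
    the recording.
    """
    elapsed = max(0.0, now_s - snapshot.taken_at_s) if snapshot.playing else 0.0
    return min(duration_s, snapshot.position_s + elapsed)
```

If the session was at 600 seconds when the snapshot was taken and the joiner arrives 2.5 seconds later, `join_position` returns 602.5. A real client would also add its own startup and buffering delay, which this sketch leaves out.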
The important nuance: you maintain independent head orientation. You and your friend are in the same room at the same time, but you are not forced to look in the same direction. You might be watching the stage. Your friend might be looking at the crowd. When your friend says "look left," you look left. That shared discovery — that moment of pointing something out — is one of the most human things about attending a concert with someone. We want to preserve it.
Moderation
Social features create social problems. We know this. Everyone knows this.
VPORT Social v1 ships with a moderation framework built in from day one. Not bolted on after the first harassment incident. Built in.
- Mute any user instantly. One gesture. Their voice disappears. They do not know they have been muted.
- Block any user. They cannot see your online status, cannot join your sessions, cannot send you friend requests. Permanent unless you undo it.
- Report any user. Voice chat abuse, harassment, disruptive behavior. Reports are reviewed by the trust and safety team. Repeat offenders lose social features permanently.
- Room size limits. For v1, a co-watch session supports up to six people. Small enough to maintain intimacy. Large enough for a friend group. We will expand this cautiously as we learn how larger groups affect the experience quality.
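The rules above can be sketched as a few checks. This is an illustration under stated assumptions, not the moderation framework itself: the six-person limit comes from this post, while the `CoWatchSession` class, the data shapes, and the choice to treat a block in either direction as disqualifying are assumptions.

```python
MAX_SESSION_SIZE = 6  # v1 co-watch limit stated in the post


class CoWatchSession:
    def __init__(self, host: str) -> None:
        self.members: list[str] = [host]

    def can_join(self, user: str, blocks: dict[str, set[str]]) -> bool:
        """blocks maps a user to the set of users they have blocked."""
        if user in self.members:
            return False
        if len(self.members) >= MAX_SESSION_SIZE:
            return False
        # Assumption: a block in either direction keeps two users apart.
        for member in self.members:
            if user in blocks.get(member, set()) or member in blocks.get(user, set()):
                return False
        return True

    def join(self, user: str, blocks: dict[str, set[str]]) -> bool:
        if not self.can_join(user, blocks):
            return False
        self.members.append(user)
        return True


def audible_speakers(listener: str, speakers: list[str],
                     mutes: dict[str, set[str]]) -> list[str]:
    """Filter speakers on the listener's side only.

    Nothing is sent back to the muted user, which is what keeps a mute
    invisible to them.
    """
    muted = mutes.get(listener, set())
    return [s for s in speakers if s != listener and s not in muted]
```

Keeping mute as a purely local filter is the simplest way to honor "they do not know they have been muted": the muted user's audio still flows to everyone else in the session.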
Why Co-Watch Matters More Than Any Technical Milestone
We have spent the past year and a half pushing the technical boundaries of immersive concert playback. 8K streaming. Spatial audio. Three playback modes. Creator tools that lower the capture barrier. All of it matters. None of it solves the retention problem.
The retention problem is this: people try VPORT, they are blown away, and then they stop using it. Not because the experience is not good. Because the experience is lonely.
Music is social. Always has been. The concert is a communal ritual. You go with friends. You meet strangers. You share the moment with a room full of people who are feeling the same thing at the same time. Remove the social layer and the concert becomes a sensory demo. Impressive, but hollow. You watch once to see the technology. You do not come back because there is no one to come back with.
Every attempt to grow the VR media audience — by every platform, not just us — has hit this wall. Individual sessions are breathtaking. Repeat usage is anemic. And the reason is not content quality or hardware comfort or library size. The reason is isolation.
Co-watch is not a feature. It is the unlock. The thing that turns VPORT from a technology showcase into a place. A place you go because your friends are there. A place you return to because the last time you watched a set together, something happened between the two of you — a reaction, a conversation, a shared moment of awe — that you want to have again.
That is what we are building toward. Not a better video player. A venue.
SharePlay + VPORT
Apple's SharePlay framework is the technical foundation for co-watch on Vision Pro. It handles session coordination, playback synchronization, and the communication layer between devices. VPORT Social v1 is built on top of SharePlay, which means co-watching works natively within the visionOS ecosystem.
What that means practically: if you are on a FaceTime call with a friend and you start a VPORT session, you can invite them directly. They Teleport in and you are watching together. No separate app. No invite codes. No friction.
SharePlay also handles the hard problems of network synchronization — latency compensation, buffer management, playback state coordination across devices on different networks. We do not have to solve those from scratch. Apple solved them. We build the experience on top.
The VPORT-specific layer is everything that makes co-watching a concert different from co-watching a movie. The spatial voice positioning. The music-aware audio ducking. The reaction system. The friend list. These are ours. SharePlay is the plumbing. VPORT Social is the room.
Social Experiences That Do Not Work Yet
Honesty section. We promised it. Here it is.
Large-scale shared spaces do not work yet.
The dream — a hundred people Teleporting into the same show and experiencing it together, a virtual crowd inside a captured crowd — is technically and experientially unsolved. A hundred simultaneous voice channels create noise, not community. A hundred reaction cues create visual clutter. The sense of shared presence that works with six people collapses at fifty.
We will solve this eventually. Probably not through voice. Probably through some combination of ambient social presence (knowing that 847 other people are watching right now), lightweight shared reactions (a wave of fire emojis that ripples across the room when the drop hits), and optional matchmaking into small pods of six to eight people with similar taste profiles. But that is v2 or v3. Not v1.
Avatars do not work yet.
We tried them. Early prototypes included simple spatial avatars — a translucent silhouette positioned where your friend would be standing. It looked wrong. A ghostly shape floating inside a photorealistic concert recording breaks the visual coherence of the space. The captured room is real. The avatar is not. Your brain sees the conflict immediately.
The right approach might be avatar-free social presence — voice, reactions, and the knowledge that someone is there, without a visual representation. Or it might be a highly abstracted representation — a subtle glow, a directional indicator, a spatial marker that does not attempt to look human. We are still testing.
Shared experiences across headsets do not work yet.
VPORT Social v1 is Vision Pro to Vision Pro. Co-watching between a Vision Pro user and a Meta Quest user is not supported at launch. The playback pipelines are different. The spatial audio rendering is different. The session coordination layers are different. Cross-platform co-watch is technically possible but experientially inconsistent, and we would rather ship something that works beautifully on one platform than something that works poorly on two.
This will change. The cross-platform question is one we think about constantly. But v1 ships where it ships.
What "Going to a Concert with Friends" Means in 2026
Here is the thing nobody talks about at VR conferences: the definition of "going to a concert" has already changed. It changed before spatial video. It changed before Vision Pro. It changed during the pandemic, when livestreams became the primary way most people experienced live music. It changed again when those livestreams ended and the return to live was not universal — not everyone came back, not everyone could afford to come back, not everyone lived close enough to come back.
In 2026, "going to a concert" means at least four different things.
Being in the room. Physical presence. The original. Still the best. Nothing replaces it.
Watching the livestream. Flat, scheduled, passive. Better than nothing. The experience most people settle for.
Teleporting in. VPORT on Vision Pro. Full spatial video. On your schedule. Alone. The experience that makes you gasp the first time and feel lonely the third time.
Teleporting in with friends. VPORT Social. Full spatial video. With the people you care about. The experience we believe will make immersive concerts a habit, not a novelty.
That fourth category does not exist yet. In a few months, it will. And when it does, the question stops being "is VR good enough to replace being there?" — a question that has always had the wrong framing — and starts being "who do you want to watch this with?"
That is the right question. That has always been the right question. The technology just needed to catch up to it.
What Is Next
VPORT Social v1 ships mid-2026. We will announce the exact date when the feature passes internal testing and Apple's review process. Early access will be available to existing VPORT users with active libraries.
Between now and then, we are publishing a series of posts on the social features as they solidify. Voice chat design decisions. Reaction system details. Moderation framework specifics. We want to build this in the open, with feedback from the people who will actually use it.
If you have thoughts — what co-watching should feel like, what would make you invite a friend, what you are afraid we will get wrong — we want to hear them. The social layer is not just a technical problem. It is a design problem. And the best design happens when the people who will live inside it help shape it.
The concert you watch alone is impressive. The concert you watch together is the one you remember. We are building the room. Come help us furnish it.