https://www.hamedshirzad.com/
🧵 Or check out a great summary thread from Yi (Joshua): bsky.app/profile/josh...
If you're attending ICLR, stop by their poster and talk:
📍 Poster Hall 3+2B #376 on Fri, Apr 25 at 15:00
🎤 Oral in Session 6A on Sat, Apr 26 at 16:30
We’d love to see others take this analysis further! To get you started, our attention scores are available via the "Attention Score Analysis" notebook in our repo:
github.com/hamed1375/Sp...
On the Photo dataset (homophilic), attention mainly comes from graph edges. On the Actor dataset (heterophilic), self-loops and expander edges play a major role.
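If you want to reproduce this kind of breakdown on your own model, here's a minimal sketch. The `(dst, src) -> score` dict layout and the function name are assumptions for illustration, not the format our notebook actually uses:

```python
def attention_mass_by_source(att, graph_edges, expander_edges):
    """Split total attention mass by where each score comes from:
    self-loops, original graph edges, or expander edges.

    att: dict mapping (dst, src) node pairs to attention scores.
    graph_edges / expander_edges: sets of (dst, src) pairs.
    (Hypothetical data layout, for illustration only.)
    """
    mass = {"self": 0.0, "graph": 0.0, "expander": 0.0}
    total = sum(att.values())
    for (dst, src), score in att.items():
        if dst == src:
            mass["self"] += score
        elif (dst, src) in graph_edges:
            mass["graph"] += score
        elif (dst, src) in expander_edges:
            mass["expander"] += score
    return {k: v / total for k, v in mass.items()}

# Toy example: one node attending to itself, a graph neighbour,
# and an expander neighbour.
example = attention_mass_by_source(
    {(0, 0): 0.5, (0, 1): 0.3, (0, 2): 0.2},
    graph_edges={(0, 1)},
    expander_edges={(0, 2)},
)
```

On a homophilic graph you'd expect the "graph" bucket to dominate; on a heterophilic one, the "self" and "expander" buckets.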
A. The top-k scores rarely account for most of a node's total attention, unless the graph has a very small average degree. Results are consistent for both dim=4 and dim=64.
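A quick way to check this on your own scores: measure what fraction of a node's attention mass its k largest scores capture. A minimal sketch with toy distributions (the numbers below are illustrative, not our data):

```python
import numpy as np

def topk_coverage(att_row, k):
    """Fraction of a node's (already normalized) attention mass
    captured by its k largest scores."""
    return float(np.sort(np.asarray(att_row, dtype=float))[::-1][:k].sum())

# Near-uniform attention over many nodes: top-k covers very little.
uniform = np.full(1000, 1 / 1000)
# Sharp attention over few neighbours (low-degree-like): top-k covers a lot.
peaked = np.array([0.6, 0.25, 0.1, 0.05])
```

With near-uniform attention, `topk_coverage(uniform, 8)` is only 0.008, while a few sharp scores are covered almost entirely by a small k.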
A. In all experiments, the first layer's attention scores differed significantly, but scores were very consistent for all the other layers.
A. The first layer consistently shows much higher entropy (more uniform attention across nodes), while deeper layers have sharper attention scores.
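The entropy here is just the Shannon entropy of each node's attention row: it peaks at log(n) for uniform attention and drops to 0 when all mass sits on one node. A minimal sketch:

```python
import numpy as np

def attention_entropy(att_row):
    """Shannon entropy (in nats) of one node's attention distribution.
    log(n) for uniform attention over n nodes; 0 for a one-hot row."""
    p = np.asarray(att_row, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())
```

Higher first-layer entropy means the model starts by looking broadly before deeper layers commit to a few nodes.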
Q. Are attention scores consistent across widths?
A. The distributions of where a node attends are pretty consistent.
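One simple way to quantify "pretty consistent" is a distance between the two attention distributions for the same node at different widths. Here's a total-variation sketch (an illustrative choice of metric, not necessarily the one in our notebook):

```python
import numpy as np

def total_variation(p, q):
    """Total-variation distance between two attention distributions:
    0 means identical, 1 means disjoint support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * float(np.abs(p - q).sum())
```

Averaging this over nodes gives a single consistency score per pair of widths.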
🗓️ Thursday, Dec 12
⏰ 11:00 AM–2:00 PM PST
📍 East Exhibit Hall A-C, Poster #3010
📄 Paper: arxiv.org/abs/2411.16278
💻 Code: github.com/hamed1375/Sp...
See you there! 🙌✨
[13/13]
Workshop paper link: arxiv.org/abs/2411.13028