AI & ML interests

Collaborating towards Good ML!

Recent Activity

lunarflu  updated a Space 30 days ago
discord-community/HuggingMod
lunarflu  updated a Space about 2 months ago
discord-community/LevelBot

hesamation posted an update 27 days ago
this is big... 50 AI researchers from ByteDance, Alibaba, Tencent, and other labs/universities just published a 300-page paper with surprising lessons about coding models and agents (data, pre- and post-training, etc.).

key highlights:

> small LLMs can beat proprietary giants
RL (specifically RLVR, reinforcement learning with verifiable rewards) gives small open-source models an edge over big models in reasoning: a 14B model trained with RLVR on high-quality verified problems can match the performance of OpenAI's o3.

> models have a hard time learning Python.
mixing programming languages during pre-training is good, but Python behaves differently from statically typed languages. languages with similar syntax (Java and C#, or JavaScript and TypeScript) create high positive synergy, while mixing Python heavily into the training data for statically typed languages can actually hurt because of Python's dynamic typing.

> not all languages are equal (coding scaling laws)
the amount of data required to specialize a model on a language depends drastically on the language. the paper argues that languages like C# and Java are easier to learn (less training data required), while languages like Python and JavaScript are actually trickier, ironically (these are the languages AI gets used for the most :)

> MoE vs Dense (ability vs stability)
MoE models offer higher capacity, but are much more fragile during SFT than dense models. training hyperparameters have a more drastic effect on MoE models, while dense models are more stable. MoE models also require constant learning rate schedules to avoid routing instability.

> code models are "insecure" by default (duh)
training on public repos makes models absorb years of accumulated insecure coding patterns. safety fine-tuning often fails to transfer to code: a model might refuse to write a hate-speech email but will happily generate an SQL-injection-vulnerable function because it "works" (toy illustration below).
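
To make that last point concrete, here's a toy illustration (mine, not taken from the paper) of the kind of SQL-injection-vulnerable helper a model will cheerfully produce, next to the parameterized version it should produce:

```python
import sqlite3

def get_user_unsafe(conn: sqlite3.Connection, username: str):
    # vulnerable: user input is spliced directly into the SQL string,
    # so username = "x' OR '1'='1" returns every row in the table
    return conn.execute(f"SELECT * FROM users WHERE name = '{username}'").fetchall()

def get_user_safe(conn: sqlite3.Connection, username: str):
    # parameterized query: the driver escapes the value, so no injection
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()
```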

read the full paper:
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence (2511.18538)

grimjim posted an update about 1 month ago
I wanted to call attention to Arli AI's success in applying my recent modifications to refusal ablation to a MoE model. Nice work, @OwenArli !
ArliAI/GLM-4.5-Air-Derestricted
Ablation on a MoE model is no small thing; I expect preserving norms/magnitudes during intervention better respects routing compared to naive refusal ablation.

(I would have tagged their org earlier, but that feature seemed to be broken via "@")

grimjim posted an update about 1 month ago
Going forward, I will be adopting the term Magnitude-Preserving Orthogonal Ablation (MPOA) for my recent work on mitigating model damage from abliteration. The technique potentially unlocks reasoning capacity previously occupied by safety refusal processing.

For details, start here: https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration

Showcase results: grimjim/gemma-3-12b-it-norm-preserved-biprojected-abliterated (outperforms base instruct on UGI Leaderboard NatInt)

(The existing name, while technically accurate, was a bit of a mouthful.)
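
For intuition only, here is a rough sketch of what a magnitude-preserving orthogonal ablation step could look like, guessed purely from the name; the actual algorithm is the one described in the linked blog post:

```python
import torch

def magnitude_preserving_ablation(W: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    # Guess-level sketch, not the published method: remove the refusal
    # direction from each weight row, then rescale rows back to their
    # original norms so per-row magnitudes are preserved by the intervention.
    v = refusal_dir / refusal_dir.norm()
    original_norms = W.norm(dim=-1, keepdim=True)
    W_ablated = W - (W @ v).unsqueeze(-1) * v          # orthogonal projection
    new_norms = W_ablated.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    return W_ablated * (original_norms / new_norms)    # restore magnitudes
```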

grimjim posted an update about 1 month ago
Implemented a proof-of-concept sampler in pure PyTorch and transformers.

Max P is a dynamic token filter which applies Winsorization to cap the probabilities of top tokens. Specifically, a base probability in the range [0,1] caps each individual token's probability; the sampler then redistributes the excess mass proportionally.

https://github.com/jim-plus/maxp-sampler-poc

Combined with Temperature and Min P, this could represent a more intuitive way of reducing repetition in text generation.
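
A minimal sketch of the capping-and-redistribution step (illustrative only; see the repo for the actual PoC):

```python
import torch

def max_p_filter(probs: torch.Tensor, max_p: float = 0.2) -> torch.Tensor:
    # Illustrative sketch: winsorize token probabilities at max_p, then hand
    # the clipped mass back to the uncapped tokens in proportion to their
    # original probabilities (single pass).
    capped = probs.clamp(max=max_p)                       # cap the top tokens
    excess = (probs - capped).sum(dim=-1, keepdim=True)   # mass removed by capping
    uncapped = torch.where(probs < max_p, probs, torch.zeros_like(probs))
    share = uncapped / uncapped.sum(dim=-1, keepdim=True).clamp_min(1e-12)
    return capped + excess * share                        # still sums to 1

# e.g.: probs = torch.softmax(logits / temperature, dim=-1); probs = max_p_filter(probs, 0.2)
```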

grimjim posted an update 3 months ago
I've uploaded abliteration code with support for sparsification of the refusal vector. It's poorly documented, but the code should be straightforward.
https://github.com/jim-plus/llm-abliteration
The code is built atop a fork that enabled abliteration to be performed on models loaded in 4-bit or 8-bit bitsandbytes quantization. TransformerLens is not required, just plain Transformers. For those previously unaware, this opens up abliteration experimentation to more people with local VRAM limitations.
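
For anyone who hasn't tried it, loading a model in 4-bit with plain Transformers plus bitsandbytes looks roughly like this (the model id is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# example model id; the point is that plain Transformers + bitsandbytes
# is enough to fit experimentation into limited local VRAM
model_id = "google/gemma-3-12b-it"
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```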

Since performing abliteration on a quant involves precision and perplexity loss, it stands to reason that a small amount of magnitude sparsification could filter out some noise and possibly even reduce the damage inflicted on latent space via ablation of the refusal vector.

There's a small but real speedup in ablating the refusal vector, first by reducing the outer-product computation from O(d²×n) to O(d×n), and then by pushing that computation layerwise to the GPU. The code is currently hardcoded for CUDA acceleration. Normalization of the refusal vector was deferred in order to allow sparsification. In principle, other behavior-vector interventions could also be added and explored.
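
To illustrate the complexity point (a sketch, not the repo's exact code): the same result as multiplying by the d×d projection matrix (I − v vᵀ) can be obtained without ever materializing it:

```python
import torch

def ablate_direction(W: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Equivalent to (I - v v^T) @ W for a unit refusal direction v, but the
    # d x d outer product is never formed: cost drops from O(d^2 * n) to O(d * n).
    # (This sketch normalizes v up front for simplicity; the repo defers
    # normalization to allow sparsification.)
    v = v / v.norm()
    coeffs = v @ W                       # (n,) projections of W's columns onto v
    return W - torch.outer(v, coeffs)    # subtract the rank-1 component

# moving W and v layerwise to the GPU (W.cuda(), v.cuda()) gives the CUDA speedup
```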

hesamation posted an update 4 months ago
a senior engineer at Google just dropped a 400-page free book on Google Docs for review: Agentic Design Patterns.

the table of contents looks like everything you need to know about agents + code:
> advanced prompt techniques
> multi-agent patterns
> tool use and MCP
> you name it

read it here: https://docs.google.com/document/d/1rsaK53T3Lg5KoGwvf8ukOUvbELRtH-V0LnOIFDxBryE/edit?tab=t.0#heading=h.pxcur8v2qagu

you can also pre-order it on Amazon (published by Springer), and the royalties go to Save the Children: https://www.amazon.com/Agentic-Design-Patterns-Hands-Intelligent/dp/3032014018/

Update README.md
#10 opened 4 months ago by Sissniko

test
👀 3
#2 opened 10 months ago by lunarflu