llama : remove Persimmon #7408

Merged: 2 commits, May 20, 2024

ggerganov (Owner)

The Persimmon arch does not seem to work correctly, and it is implemented in a convoluted way that does not fit the existing patterns. It's better to reimplement it from scratch.

github-actions bot added the python label (python script changes) on May 20, 2024
mofosyne added the Review Complexity : Low label (Trivial changes to code that most beginner devs (or those who want a break) can tackle, e.g. UI fix)
Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 536 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8703.72ms p(95)=21184.22ms fails=, finish reason: stop=480 truncated=56
  • Prompt processing (pp): avg=101.29tk/s p(95)=463.69tk/s
  • Token generation (tg): avg=47.82tk/s p(95)=46.93tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=gg/remove-persimmon commit=5d777e9c22d370bd5944c9002771b2f52da18637

prompt_tokens_seconds: chart "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 536 iterations" (y-axis: llamacpp:prompt_tokens_seconds; x-axis: timestamps 1716200155 to 1716200777)
predicted_tokens_seconds: chart of llamacpp:predicted_tokens_seconds during the bench-server-baseline run (timestamps 1716200155 to 1716200777)

kv_cache_usage_ratio: chart of llamacpp:kv_cache_usage_ratio during the bench-server-baseline run (timestamps 1716200155 to 1716200777)
requests_processing: chart of llamacpp:requests_processing during the bench-server-baseline run (timestamps 1716200155 to 1716200777)

mofosyne merged commit fabf30b into master on May 20, 2024
80 checks passed
mofosyne deleted the gg/remove-persimmon branch on May 20, 2024 at 16:35

3 participants